[RT-DETR Effective Improvement] EMO, a Network of Inverted Residual Blocks | A Lightweight CNN Architecture (reduces parameters by roughly 7M)

Preface

Hello everyone, welcome to the RT-DETR Effective Improvements column.

This column improves the ultralytics version of RT-DETR. Content is updated continuously, with 3-10 new articles per week.

This column builds its modifications on the ResNet18 and ResNet50 baselines, and the changes also work with the ResNet34, ResNet101, and PPHGNet versions. The ResNet backbones are ported 1:1 from the official RT-DETR release, so their parameter counts match almost exactly (the error is tiny), unlike the ResNet bundled with the ultralytics repository. Some ultralytics parameters also conflict with RT-DETR, so I will show you how to adjust those parameters and the code, making the ultralytics version behave exactly like the official RT-DETR.

👑 Everyone is welcome to subscribe to this column and study RT-DETR together 👑

1. Introduction

The improvement presented in this article is EMO, a network built from inverted residual blocks. I have already covered its building block, the iRMB, in an earlier post together with a secondary innovation on it. EMO is the network assembled from iRMB blocks, so the iEMA block from that secondary innovation can also be used inside this network, giving yet another round of innovation. The backbone introduced here is a lightweight CNN architecture that reduces the parameter count by roughly 7M.

Column link: RT-DETR paper-oriented column, continuously reproducing top-conference content — Paper Harvester RT-DETR

Contents

1. Introduction

2. How the EMO Model Works

3. Core Code of EMO

4. Step-by-Step Guide to Adding EMO

4.1 Modification 1

4.2 Modification 2

4.3 Modification 3

4.4 Modification 4

4.5 Modification 5

4.6 Modification 6

4.7 Modification 7

4.8 Modification 8

4.9 Fixing the Issue Where RT-DETR Cannot Print FLOPs

4.10 Optional Modification

5. The yaml File for EMO

5.1 The yaml File

5.2 Training Script

5.3 Screenshot of a Successful Training Run

6. Summary


2. How the EMO Model Works

Paper: official paper link

Code: official code link


The Efficient MOdel (EMO) is built on the Inverted Residual Block (IRB), the basic building block of lightweight CNNs, while also absorbing the effective components of the Transformer. Through this combination, EMO offers a unified perspective on lightweight model design, innovatively marrying CNNs with attention mechanisms. EMO shows strong results across benchmarks, notably on ImageNet-1K, COCO2017, and ADE20K. The model strikes a balance between efficiency and accuracy and represents a real advance in lightweight design.

The key ideas behind EMO can be summarized in the following points:

1. Application of the Inverted Residual Block (IRB): the IRB serves as the foundational block of lightweight CNNs, and EMO extends it to attention-based models (see the sketch after this list).

2. Abstraction of the Meta-Mobile Block (MMB): EMO proposes a new approach to lightweight design, the single-residual Meta-Mobile Block (MMB), abstracted from the IRB and from the effective components of the Transformer.

3. Construction of the modern Inverted Residual Mobile Block (iRMB): following simple but effective design criteria, EMO derives the iRMB and uses it to build a ResNet-like efficient model (EMO).
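To make point 1 concrete, here is a minimal, self-contained PyTorch sketch of the classic inverted residual block that EMO generalizes — expand with a 1x1 convolution, mix spatially with a depth-wise convolution, project back, and add the shortcut. The expansion ratio, kernel size, and SiLU activation here are illustrative assumptions, not EMO's exact configuration.

import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, dim, exp_ratio=4.0, kernel_size=3):
        super().__init__()
        mid = int(dim * exp_ratio)
        self.block = nn.Sequential(
            nn.Conv2d(dim, mid, 1, bias=False),           # 1x1 expand
            nn.BatchNorm2d(mid), nn.SiLU(),
            nn.Conv2d(mid, mid, kernel_size, padding=kernel_size // 2,
                      groups=mid, bias=False),            # depth-wise spatial mixing
            nn.BatchNorm2d(mid), nn.SiLU(),
            nn.Conv2d(mid, dim, 1, bias=False),           # 1x1 project back
            nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        return x + self.block(x)  # residual shortcut

x = torch.randn(1, 64, 56, 56)
print(InvertedResidual(64)(x).shape)  # torch.Size([1, 64, 56, 56])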

The figure below shows the structural details of the EMO model:

On the left is the abstract, unified Meta-Mobile Block, which fuses Multi-Head Self-Attention, a Feed-Forward Network, and the Inverted Residual Block. This composite module is instantiated with different expansion ratios and efficient operators.

On the right is a ResNet-like EMO architecture composed entirely of the derived iRMB. The figure highlights the combinations of micro-operations within EMO (such as depth-wise separable convolution and window Transformer) and the network hierarchy at different scales, used for classification (CLS), detection (Det), and segmentation (Seg) tasks. This design underlines EMO's flexibility and efficiency across downstream tasks.


3. Core Code of EMO

The core code of EMO is given below; see Section 4 for how to use it!

import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from functools import partial
from einops import rearrange, reduce
from timm.models.layers import trunc_normal_, DropPath

inplace = True

__all__ = ['EMO_1M', 'EMO_2M', 'EMO_5M', 'EMO_6M']


class SELayerV2(nn.Module):
    def __init__(self, in_channel, reduction=4):
        super(SELayerV2, self).__init__()
        assert in_channel >= reduction and in_channel % reduction == 0, 'invalid in_channel in SaElayer'
        self.reduction = reduction
        self.cardinality = 4
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # cardinality 1
        self.fc1 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # cardinality 2
        self.fc2 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # cardinality 3
        self.fc3 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # cardinality 4
        self.fc4 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        self.fc = nn.Sequential(
            nn.Linear(in_channel // self.reduction * self.cardinality, in_channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y1 = self.fc1(y)
        y2 = self.fc2(y)
        y3 = self.fc3(y)
        y4 = self.fc4(y)
        y_concate = torch.cat([y1, y2, y3, y4], dim=1)
        y_ex_dim = self.fc(y_concate).view(b, c, 1, 1)
        return x * y_ex_dim.expand_as(x)


def get_act(act_layer='relu'):
    act_dict = {
        'none': nn.Identity,
        'relu': nn.ReLU,
        'relu6': nn.ReLU6,
        'silu': nn.SiLU,
        'gelu': nn.GELU
    }
    return act_dict[act_layer]


class LayerNorm2d(nn.Module):
    def __init__(self, normalized_shape, eps=1e-6, elementwise_affine=True):
        super().__init__()
        self.norm = nn.LayerNorm(normalized_shape, eps, elementwise_affine)

    def forward(self, x):
        x = rearrange(x, 'b c h w -> b h w c').contiguous()
        x = self.norm(x)
        x = rearrange(x, 'b h w c -> b c h w').contiguous()
        return x


def get_norm(norm_layer='in_1d'):
    eps = 1e-6
    norm_dict = {
        'none': nn.Identity,
        'in_1d': partial(nn.InstanceNorm1d, eps=eps),
        'in_2d': partial(nn.InstanceNorm2d, eps=eps),
        'in_3d': partial(nn.InstanceNorm3d, eps=eps),
        'bn_1d': partial(nn.BatchNorm1d, eps=eps),
        'bn_2d': partial(nn.BatchNorm2d, eps=eps),
        # 'bn_2d': partial(nn.SyncBatchNorm, eps=eps),
        'bn_3d': partial(nn.BatchNorm3d, eps=eps),
        'gn': partial(nn.GroupNorm, eps=eps),
        'ln_1d': partial(nn.LayerNorm, eps=eps),
        'ln_2d': partial(LayerNorm2d, eps=eps),
    }
    return norm_dict[norm_layer]


class LayerScale(nn.Module):
    def __init__(self, dim, init_values=1e-5, inplace=True):
        super().__init__()
        self.inplace = inplace
        self.gamma = nn.Parameter(init_values * torch.ones(1, 1, dim))

    def forward(self, x):
        return x.mul_(self.gamma) if self.inplace else x * self.gamma


class LayerScale2D(nn.Module):
    def __init__(self, dim, init_values=1e-5, inplace=True):
        super().__init__()
        self.inplace = inplace
        self.gamma = nn.Parameter(init_values * torch.ones(1, dim, 1, 1))

    def forward(self, x):
        return x.mul_(self.gamma) if self.inplace else x * self.gamma


class ConvNormAct(nn.Module):
    def __init__(self, dim_in, dim_out, kernel_size, stride=1, dilation=1, groups=1, bias=False,
                 skip=False, norm_layer='bn_2d', act_layer='relu', inplace=True, drop_path_rate=0.):
        super(ConvNormAct, self).__init__()
        self.has_skip = skip and dim_in == dim_out
        padding = math.ceil((kernel_size - stride) / 2)
        self.conv = nn.Conv2d(dim_in, dim_out, kernel_size, stride, padding, dilation, groups, bias)
        self.norm = get_norm(norm_layer)(dim_out)
        self.act = nn.GELU()
        self.drop_path = DropPath(drop_path_rate) if drop_path_rate else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.conv(x)
        x = self.norm(x)
        x = self.act(x)
        if self.has_skip:
            x = self.drop_path(x) + shortcut
        return x


# ========== Multi-Scale Populations, for down-sampling and inductive bias ==========
class MSPatchEmb(nn.Module):
    def __init__(self, dim_in, emb_dim, kernel_size=2, c_group=-1, stride=1, dilations=[1, 2, 3],
                 norm_layer='bn_2d', act_layer='silu'):
        super().__init__()
        self.dilation_num = len(dilations)
        assert dim_in % c_group == 0
        c_group = math.gcd(dim_in, emb_dim) if c_group == -1 else c_group
        self.convs = nn.ModuleList()
        for i in range(len(dilations)):
            padding = math.ceil(((kernel_size - 1) * dilations[i] + 1 - stride) / 2)
            self.convs.append(nn.Sequential(
                nn.Conv2d(dim_in, emb_dim, kernel_size, stride, padding, dilations[i], groups=c_group),
                get_norm(norm_layer)(emb_dim),
                get_act(act_layer)(emb_dim)))

    def forward(self, x):
        if self.dilation_num == 1:
            x = self.convs[0](x)
        else:
            x = torch.cat([self.convs[i](x).unsqueeze(dim=-1) for i in range(self.dilation_num)], dim=-1)
            x = reduce(x, 'b c h w n -> b c h w', 'mean').contiguous()
        return x


class iRMB(nn.Module):
    def __init__(self, dim_in, dim_out, norm_in=True, has_skip=True, exp_ratio=1.0, norm_layer='bn_2d',
                 act_layer='relu', v_proj=True, dw_ks=3, stride=1, dilation=1, se_ratio=0.0, dim_head=64,
                 window_size=7, attn_s=True, qkv_bias=False, attn_drop=0., drop=0., drop_path=0., v_group=False,
                 attn_pre=False):
        super().__init__()
        self.norm = get_norm(norm_layer)(dim_in) if norm_in else nn.Identity()
        dim_mid = int(dim_in * exp_ratio)
        self.has_skip = (dim_in == dim_out and stride == 1) and has_skip
        self.attn_s = attn_s
        if self.attn_s:
            assert dim_in % dim_head == 0, 'dim should be divisible by num_heads'
            self.dim_head = dim_head
            self.window_size = window_size
            self.num_head = dim_in // dim_head
            self.scale = self.dim_head ** -0.5
            self.attn_pre = attn_pre
            self.qk = ConvNormAct(dim_in, int(dim_in * 2), kernel_size=1, bias=qkv_bias, norm_layer='none',
                                  act_layer='none')
            self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, groups=self.num_head if v_group else 1,
                                 bias=qkv_bias, norm_layer='none', act_layer=act_layer, inplace=inplace)
            self.attn_drop = nn.Dropout(attn_drop)
        else:
            if v_proj:
                self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, bias=qkv_bias, norm_layer='none',
                                     act_layer=act_layer, inplace=inplace)
            else:
                self.v = nn.Identity()
        self.conv_local = ConvNormAct(dim_mid, dim_mid, kernel_size=dw_ks, stride=stride, dilation=dilation,
                                      groups=dim_mid, norm_layer='bn_2d', act_layer='silu', inplace=inplace)
        self.se = SELayerV2(dim_mid)
        self.proj_drop = nn.Dropout(drop)
        self.proj = ConvNormAct(dim_mid, dim_out, kernel_size=1, norm_layer='none', act_layer='none', inplace=inplace)
        self.drop_path = DropPath(drop_path) if drop_path else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.norm(x)
        B, C, H, W = x.shape
        if self.attn_s:
            # padding
            if self.window_size <= 0:
                window_size_W, window_size_H = W, H
            else:
                window_size_W, window_size_H = self.window_size, self.window_size
            pad_l, pad_t = 0, 0
            pad_r = (window_size_W - W % window_size_W) % window_size_W
            pad_b = (window_size_H - H % window_size_H) % window_size_H
            x = F.pad(x, (pad_l, pad_r, pad_t, pad_b, 0, 0,))
            n1, n2 = (H + pad_b) // window_size_H, (W + pad_r) // window_size_W
            x = rearrange(x, 'b c (h1 n1) (w1 n2) -> (b n1 n2) c h1 w1', n1=n1, n2=n2).contiguous()
            # attention
            b, c, h, w = x.shape
            qk = self.qk(x)
            qk = rearrange(qk, 'b (qk heads dim_head) h w -> qk b heads (h w) dim_head', qk=2, heads=self.num_head,
                           dim_head=self.dim_head).contiguous()
            q, k = qk[0], qk[1]
            attn_spa = (q @ k.transpose(-2, -1)) * self.scale
            attn_spa = attn_spa.softmax(dim=-1)
            attn_spa = self.attn_drop(attn_spa)
            if self.attn_pre:
                x = rearrange(x, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                x_spa = attn_spa @ x
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head,
                                  h=h, w=w).contiguous()
                x_spa = self.v(x_spa)
            else:
                v = self.v(x)
                v = rearrange(v, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                x_spa = attn_spa @ v
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head,
                                  h=h, w=w).contiguous()
            # unpadding
            x = rearrange(x_spa, '(b n1 n2) c h1 w1 -> b c (h1 n1) (w1 n2)', n1=n1, n2=n2).contiguous()
            if pad_r > 0 or pad_b > 0:
                x = x[:, :, :H, :W].contiguous()
        else:
            x = self.v(x)

        x = x + self.se(self.conv_local(x)) if self.has_skip else self.se(self.conv_local(x))

        x = self.proj_drop(x)
        x = self.proj(x)
        x = (shortcut + self.drop_path(x)) if self.has_skip else x
        return x


class EMO(nn.Module):
    def __init__(self, dim_in=3, num_classes=1000, img_size=224,
                 depths=[1, 2, 4, 2], stem_dim=16, embed_dims=[64, 128, 256, 512], exp_ratios=[4., 4., 4., 4.],
                 norm_layers=['bn_2d', 'bn_2d', 'bn_2d', 'bn_2d'], act_layers=['relu', 'relu', 'relu', 'relu'],
                 dw_kss=[3, 3, 5, 5], se_ratios=[0.0, 0.0, 0.0, 0.0], dim_heads=[32, 32, 32, 32],
                 window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True], qkv_bias=True,
                 attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False, pre_dim=0):
        super().__init__()
        self.num_classes = num_classes
        assert num_classes > 0
        dprs = [x.item() for x in torch.linspace(0, drop_path, sum(depths))]
        self.stage0 = nn.ModuleList([
            MSPatchEmb(  # down to 112
                dim_in, stem_dim, kernel_size=dw_kss[0], c_group=1, stride=2, dilations=[1],
                norm_layer=norm_layers[0], act_layer='none'),
            iRMB(  # ds
                stem_dim, stem_dim, norm_in=False, has_skip=False, exp_ratio=1,
                norm_layer=norm_layers[0], act_layer=act_layers[0], v_proj=False, dw_ks=dw_kss[0],
                stride=1, dilation=1, se_ratio=1,
                dim_head=dim_heads[0], window_size=window_sizes[0], attn_s=False,
                qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=0.,
                attn_pre=attn_pre)
        ])
        emb_dim_pre = stem_dim
        for i in range(len(depths)):
            layers = []
            dpr = dprs[sum(depths[:i]):sum(depths[:i + 1])]
            for j in range(depths[i]):
                if j == 0:
                    stride, has_skip, attn_s, exp_ratio = 2, False, False, exp_ratios[i] * 2
                else:
                    stride, has_skip, attn_s, exp_ratio = 1, True, attn_ss[i], exp_ratios[i]
                layers.append(iRMB(
                    emb_dim_pre, embed_dims[i], norm_in=True, has_skip=has_skip, exp_ratio=exp_ratio,
                    norm_layer=norm_layers[i], act_layer=act_layers[i], v_proj=True, dw_ks=dw_kss[i],
                    stride=stride, dilation=1, se_ratio=se_ratios[i],
                    dim_head=dim_heads[i], window_size=window_sizes[i], attn_s=attn_s,
                    qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=dpr[j], v_group=v_group,
                    attn_pre=attn_pre))
                emb_dim_pre = embed_dims[i]
            self.__setattr__(f'stage{i + 1}', nn.ModuleList(layers))
        self.norm = get_norm(norm_layers[-1])(embed_dims[-1])
        if pre_dim > 0:
            self.pre_head = nn.Sequential(nn.Linear(embed_dims[-1], pre_dim),
                                          get_act(act_layers[-1])(inplace=inplace))
            self.pre_dim = pre_dim
        else:
            self.pre_head = nn.Identity()
            self.pre_dim = embed_dims[-1]
        self.head = nn.Linear(self.pre_dim, num_classes)
        self.apply(self._init_weights)
        self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]

    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.LayerNorm, nn.GroupNorm,
                            nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
                            nn.InstanceNorm1d, nn.InstanceNorm2d, nn.InstanceNorm3d)):
            nn.init.zeros_(m.bias)
            nn.init.ones_(m.weight)

    @torch.jit.ignore
    def no_weight_decay(self):
        return {'token'}

    @torch.jit.ignore
    def no_weight_decay_keywords(self):
        return {'alpha', 'gamma', 'beta'}

    @torch.jit.ignore
    def no_ft_keywords(self):
        # return {'head.weight', 'head.bias'}
        return {}

    @torch.jit.ignore
    def ft_head_keywords(self):
        return {'head.weight', 'head.bias'}, self.num_classes

    def get_classifier(self):
        return self.head

    def reset_classifier(self, num_classes):
        self.num_classes = num_classes
        self.head = nn.Linear(self.pre_dim, num_classes) if num_classes > 0 else nn.Identity()

    def check_bn(self):
        for name, m in self.named_modules():
            if isinstance(m, nn.modules.batchnorm._NormBase):
                m.running_mean = torch.nan_to_num(m.running_mean, nan=0, posinf=1, neginf=-1)
                m.running_var = torch.nan_to_num(m.running_var, nan=0, posinf=1, neginf=-1)

    def forward(self, x):
        unique_tensors = {}
        for blk in self.stage0:
            x = blk(x)
            width, height = x.shape[2], x.shape[3]
            unique_tensors[(width, height)] = x
        for blk in self.stage1:
            x = blk(x)
            width, height = x.shape[2], x.shape[3]
            unique_tensors[(width, height)] = x
        for blk in self.stage2:
            x = blk(x)
            width, height = x.shape[2], x.shape[3]
            unique_tensors[(width, height)] = x
        for blk in self.stage3:
            x = blk(x)
            width, height = x.shape[2], x.shape[3]
            unique_tensors[(width, height)] = x
        for blk in self.stage4:
            x = blk(x)
            width, height = x.shape[2], x.shape[3]
            unique_tensors[(width, height)] = x
        result_list = list(unique_tensors.values())[-4:]
        return result_list


def EMO_1M(pretrained=False, **kwargs):
    model = EMO(
        # dim_in=3, num_classes=1000, img_size=224,
        depths=[2, 2, 8, 3], stem_dim=24, embed_dims=[32, 48, 80, 168], exp_ratios=[2., 2.5, 3.0, 3.5],
        norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
        dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 21], window_sizes=[7, 7, 7, 7],
        attn_ss=[False, False, True, True],
        qkv_bias=True, attn_drop=0., drop=0., drop_path=0.04036, v_group=False, attn_pre=True, pre_dim=0,
        **kwargs)
    return model


def EMO_2M(pretrained=False, **kwargs):
    model = EMO(
        # dim_in=3, num_classes=1000, img_size=224,
        depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[32, 48, 120, 200], exp_ratios=[2., 2.5, 3.0, 3.5],
        norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
        dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 20], window_sizes=[7, 7, 7, 7],
        attn_ss=[False, False, True, True],
        qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0,
        **kwargs)
    return model


def EMO_5M(pretrained=False, **kwargs):
    model = EMO(
        # dim_in=3, num_classes=1000, img_size=224,
        depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 288], exp_ratios=[2., 3., 4., 4.],
        norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
        dw_kss=[3, 3, 5, 5], dim_heads=[24, 24, 32, 32], window_sizes=[7, 7, 7, 7],
        attn_ss=[False, False, True, True],
        qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0,
        **kwargs)
    return model


def EMO_6M(pretrained=False, **kwargs):
    model = EMO(
        # dim_in=3, num_classes=1000, img_size=224,
        depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 320], exp_ratios=[2., 3., 4., 5.],
        norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
        dw_kss=[3, 3, 5, 5], dim_heads=[16, 24, 20, 32], window_sizes=[7, 7, 7, 7],
        attn_ss=[False, False, True, True],
        qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0,
        **kwargs)
    return model


if __name__ == "__main__":
    # Generating Sample image
    image_size = (1, 3, 640, 640)
    image = torch.rand(*image_size)

    # Model
    model = EMO_6M()
    out = model(image)
    print(len(out))
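For a quick sanity check of what this backbone hands to RT-DETR, the short snippet below prints the four multi-scale feature maps. It assumes you saved the core code above as EMO.py — the file name is a placeholder, so match it to whatever file you create in step 4.1. The shapes shown are what EMO_1M produces for a 640x640 input.

import torch
from EMO import EMO_1M  # hypothetical module name; match the file created in 4.1

feats = EMO_1M()(torch.randn(1, 3, 640, 640))
for f in feats:
    print(f.shape)
# Expected: four maps at strides 4/8/16/32 with EMO_1M's widths [32, 48, 80, 168], i.e.
# torch.Size([1, 32, 160, 160]) ... torch.Size([1, 168, 20, 20])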

4. Step-by-Step Guide to Adding EMO

4.1 Modification 1

The first step, as usual, is to create the file. Go to the ultralytics/nn/modules folder and create a directory named 'Addmodules' (if you are using the files from the group, it already exists and there is no need to create it!). Then create a new .py file inside it and paste the core code above into it.


4.2 Modification 2

In the second step, create a new .py file in that directory named '__init__.py' (if you are using the files from the group, it already exists and there is no need to create it), and import our module inside it, as shown in the figure below.
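Since the original figure is not reproduced here, this is a minimal sketch of what that __init__.py can contain, assuming the core code was saved as EMO.py (the file name is an assumption — match it to the file you actually created):

# ultralytics/nn/modules/Addmodules/__init__.py — a minimal sketch
from .EMO import *  # re-exports EMO_1M / EMO_2M / EMO_5M / EMO_6M via __all__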


4.3 Modification 3

In the third step, open 'ultralytics/nn/tasks.py' and import and register our module there (if you are using the files from the group, it is already imported; skip straight to step four).
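As the figure is not reproduced here either, this is a sketch of the import line, assuming the Addmodules package sits where step 4.1 created it:

# Near the other imports at the top of ultralytics/nn/tasks.py — a sketch
from ultralytics.nn.modules.Addmodules import *  # brings EMO_1M / EMO_2M / EMO_5M / EMO_6M into scope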

From today on, every tutorial will follow this same format, since I assume you are working from the files shared in the group!!


4.4 Modification 4

Add the two lines of code shown in the figure!!!


4.5 Modification 5

Find the spot at roughly line 700 — see the figure for the exact location — and modify it as shown, adding the part inside the red box. Note that there are no parentheses; it is just the class name.

        elif m in {EMO_1M, EMO_2M, EMO_5M, EMO_6M}:  # add the corresponding model names here; the pattern is the same for all
            m = m(*args)
            c2 = m.width_list  # return the channel list
            backbone = True


4.6 Modification 6

Replace the content inside the red box with the code below.

        if isinstance(c2, list):
            m_ = m
            m_.backbone = True
        else:
            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
        m.np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t  # attach index, 'from' index, type
        if verbose:
            LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f}  {t:<45}{str(args):<30}')  # print
        save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        if isinstance(c2, list):
            ch.extend(c2)
            if len(c2) != 5:
                ch.insert(0, 0)
        else:
            ch.append(c2)


4.7 Modification 7

Be very careful with this modification: it is not the YOLOv8 predict at the top of the file, but RTDETR's predict at around line 400+!!! The original method is shown in the figure; simply replace it with the code I provide!!!

The code is as follows ->

    def predict(self, x, profile=False, visualize=False, batch=None, augment=False, embed=None):
        """
        Perform a forward pass through the model.

        Args:
            x (torch.Tensor): The input tensor.
            profile (bool, optional): If True, profile the computation time for each layer. Defaults to False.
            visualize (bool, optional): If True, save feature maps for visualization. Defaults to False.
            batch (dict, optional): Ground truth data for evaluation. Defaults to None.
            augment (bool, optional): If True, perform data augmentation during inference. Defaults to False.
            embed (list, optional): A list of feature vectors/embeddings to return.

        Returns:
            (torch.Tensor): Model's output tensor.
        """
        y, dt, embeddings = [], [], []  # outputs
        for m in self.model[:-1]:  # except the head part
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            if hasattr(m, 'backbone'):
                x = m(x)
                if len(x) != 5:  # 0 - 5
                    x.insert(0, None)
                for index, i in enumerate(x):
                    if index in self.save:
                        y.append(i)
                    else:
                        y.append(None)
                x = x[-1]  # pass the last output on to the next layer
            else:
                x = m(x)  # run
                y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
            if embed and m.i in embed:
                embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1))  # flatten
                if m.i == max(embed):
                    return torch.unbind(torch.cat(embeddings, 1), dim=0)
        head = self.model[-1]
        x = head([y[j] for j in head.f], batch)  # head inference
        return x

4.8 Modification 8

Replace the s shown below with 640. Some backbones work without this step, but others will throw an error if you skip it, so it is best to make the change anyway.
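For orientation (the figure is not reproduced here), this is roughly the spot in ultralytics/nn/tasks.py inside DetectionModel.__init__, assuming your version still computes strides from a dummy forward pass at size s; only the value of s changes:

# ultralytics/nn/tasks.py, DetectionModel.__init__ — a sketch, not a drop-in file
s = 640  # was: s = 256  # 2x min stride; use 640 so fixed-window backbones don't break
m.stride = torch.tensor([s / x.shape[-2] for x in forward(torch.zeros(1, ch, s, s))])  # forward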


4.9 Fixing the Issue Where RT-DETR Cannot Print FLOPs

The GFLOPs computation fails and nothing is printed, so one more place needs to be modified. Open 'ultralytics/utils/torch_utils.py' and find the code shown in the image — just make sure you locate the right function; the 640 in the red box may differ from yours — then replace the whole function with the code I provide.

def get_flops(model, imgsz=640):
    """Return a YOLO model's FLOPs."""
    try:
        model = de_parallel(model)
        p = next(model.parameters())
        # stride = max(int(model.stride.max()), 32) if hasattr(model, 'stride') else 32  # max stride
        stride = 640
        im = torch.empty((1, 3, stride, stride), device=p.device)  # input image in BCHW format
        flops = thop.profile(deepcopy(model), inputs=[im], verbose=False)[0] / 1E9 * 2 if thop else 0  # stride GFLOPs
        imgsz = imgsz if isinstance(imgsz, list) else [imgsz, imgsz]  # expand if int/float
        return flops * imgsz[0] / stride * imgsz[1] / stride  # 640x640 GFLOPs
    except Exception:
        return 0


4.10 Optional Modification

Some readers' datasets contain unusual images that trigger shape-mismatch errors during validation. If you hit a shape-mismatch error at validation time, you can fix the validation image size as follows ->

Open ultralytics/models/yolo/detect/train.py, find the build_dataset function inside the DetectionTrainer class, and change the parameter rect=mode == 'val' to rect=False, as sketched below.
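A sketch of the edited function, assuming the upstream ultralytics signature (de_parallel and build_yolo_dataset are already imported in that file); only the rect argument changes:

# ultralytics/models/yolo/detect/train.py — DetectionTrainer.build_dataset (sketch)
def build_dataset(self, img_path, mode='train', batch=None):
    """Build YOLO dataset with a fixed (non-rectangular) validation shape."""
    gs = max(int(de_parallel(self.model).stride.max() if self.model else 0), 32)
    return build_yolo_dataset(self.args, img_path, batch, self.data, mode=mode,
                              rect=False,  # was: rect=mode == 'val'
                              stride=gs)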


5. The yaml File for EMO

5.1 The yaml File

Copy the yaml file below and run it with the training script I provide. RT-DETR parameter tuning will be covered in later articles; that part is not available yet, while this part is currently free to read.

# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

# Any of the following versions can be swapped in:
#  __all__ = ['EMO_1M', 'EMO_2M', 'EMO_5M', 'EMO_6M']
backbone:
  # [from, repeats, module, args]
  - [-1, 1, EMO_1M, []]  # 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 5 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]  # 6
  - [-1, 1, Conv, [256, 1, 1]]  # 7, Y5, lateral_convs.0

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 8
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 9 input_proj.1
  - [[-2, -1], 1, Concat, [1]]  # 10
  - [-1, 3, RepC3, [256, 0.5]]  # 11, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]]  # 12, Y4, lateral_convs.1

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 13
  - [2, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 14 input_proj.0
  - [[-2, -1], 1, Concat, [1]]  # 15 cat backbone P4
  - [-1, 3, RepC3, [256, 0.5]]  # X3 (16), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]]  # 17, downsample_convs.0
  - [[-1, 12], 1, Concat, [1]]  # 18 cat Y4
  - [-1, 3, RepC3, [256, 0.5]]  # F4 (19), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]]  # 20, downsample_convs.1
  - [[-1, 7], 1, Concat, [1]]  # 21 cat Y5
  - [-1, 3, RepC3, [256, 0.5]]  # F5 (22), pan_blocks.1

  - [[16, 19, 22], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]]  # Detect(P3, P4, P5)


5.2 Training Script

Create a train.py file, paste in the code below, fill in your own paths, and run it to start training.

import warnings
from ultralytics import RTDETR

warnings.filterwarnings('ignore')

if __name__ == '__main__':
    model = RTDETR('replace with the yaml file you want to run')
    # model.load('')  # optionally load pretrained weights for your version
    model.train(data=r'replace with the path to your dataset',
                cache=False,
                imgsz=640,
                epochs=72,
                batch=4,
                workers=0,
                device='0',
                project='runs/RT-DETR-train',
                name='exp',
                # amp=True
                )


5.3 Screenshot of a Successful Training Run

Below is a screenshot of a successful run (proof that the improvement works). One epoch of training has completed; the image is too large to capture the second epoch.


6. Summary

Today marks the official launch of the RT-DETR paper-oriented column. Content will roll out quickly, with many updates in the short term, and the price will rise in stages, so if you want to study RT-DETR improvements with me, consider subscribing early. This column aims to be the best RT-DETR column on the web, serving readers who want to publish papers.

Column link: RT-DETR paper-oriented column, continuously reproducing top-conference content — Paper Harvester RT-DETR

