Training Walkthrough Using Deformable DETR as an Example


This post uses Deformable DETR as an example to walk through training a model on a rented cloud server.

Download the code

From the GitHub repository given in the paper, fundamentalvision/Deformable-DETR (Deformable DETR: Deformable Transformers for End-to-End Object Detection), download the project as a zip to your local machine.

Rent a server

Rent a server on the AutoDL platform: register an account, add some credit, then go to the marketplace and pick a GPU. Prefer machines that still have plenty of idle GPUs, otherwise the instance may be unavailable when you actually need it.

[Screenshot: choosing a GPU in the AutoDL marketplace]

Then, further down, choose the image you need. You can set the PyTorch and Python versions yourself, or use a community image: if the project you want to reproduce is popular, you can simply reuse an environment someone else has already configured.

[Screenshot: choosing a base or community image]

Here I use a community image, which you can find by searching for the project name directly. That said, it is still recommended to start from a base image and pick the versions that match the project yourself.

Connect to the server

Here I use PyCharm to connect to the server and upload the files (there are faster ways to transfer files; look them up if you need them).

First start the instance in no-GPU mode, which is much cheaper. Then in PyCharm go to Settings → Project → Python Interpreter → Add Interpreter → On SSH.

Copy the SSH login string shown for the container: type the port number into the Port field, delete the @ and everything before it so that only the host remains, then enter the password and the connection should succeed. For a step-by-step video, see Bilibili BV1gn4y1o7PB.
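As an illustration of how that login string splits up (my own sketch with a made-up host and port; the real AutoDL string will differ), the pieces map onto PyCharm's fields like this:

# Illustrative only: splitting an SSH login string (hypothetical values) into the
# Host / Port / Username fields that PyCharm asks for.
login = "ssh -p 12345 root@region-1.autodl.example.com"   # made-up example

parts = login.split()
port = parts[parts.index("-p") + 1]        # "12345"  -> Port field
user, host = parts[-1].split("@")          # "root"   -> Username, the rest -> Host
print(f"Host: {host}  Port: {port}  Username: {user}")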

Start training

(This part was actually written first, because once training was running I wanted to shut the instance down as soon as possible to save money. You would do the same.)

First install the required libraries; torch and friends are already included in the image. After that, just follow the instructions in the GitHub README:

pip install -r requirements.txt

Compile the CUDA operators

cd ./models/ops
sh ./make.sh    # builds the MultiScaleDeformableAttention CUDA extension
python test.py  # forward/gradient consistency check for the compiled op
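A quick way to confirm the build worked (my own sanity check, assuming you are inside the project's environment and test.py finished without assertion errors) is to import the compiled extension directly; the module name below is the one the repo's CUDA op is built under:

# Sanity check: if this import succeeds, the deformable-attention CUDA op was built.
import MultiScaleDeformableAttention as MSDA

print("compiled op loaded from:", MSDA.__file__)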

Download the dataset. The project uses COCO, but the full COCO is large: a run takes a long time and uploading it to the server is slow. Here I use subsets cut from COCO with 50, 1000, and 3000 images (the full version is also included). Link: https://pan.baidu.com/s/17wpMzJvzSQ-a3qbaqIbmdg?pwd=data (extraction code: data).

Lay the dataset out in the following directory structure:

Deformable_DETR/
├── data/
│   └── coco/
│       ├── train2017/
│       ├── val2017/
│       └── annotations/
│           ├── instances_train2017.json
│           └── instances_val2017.json
├── configs/
└── ....../
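Before launching training you can optionally confirm the layout with a small script like the one below (my own convenience sketch run from the repo root, not part of the project):

# Check that the dataset paths the training code expects actually exist.
from pathlib import Path

coco = Path("./data/coco")
for rel in ["train2017", "val2017",
            "annotations/instances_train2017.json",
            "annotations/instances_val2017.json"]:
    status = "OK" if (coco / rel).exists() else "MISSING"
    print(f"{status:8s} {coco / rel}")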

Start training. For a single GPU on a single node, just change the 8 in the command from the original repo to 1 (simply deleting it would probably also work):

GPUS_PER_NODE=1 ./tools/run_dist_launch.sh 1 ./configs/r50_deformable_detr.sh

Before I switched images I ran into all sorts of problems; with the community image most of them went away. The main cause was that the image I configured myself did not have the same library versions as the community one.

With the community image, the error that remained was:

(MV) root@autodl-container-8250118952-357c0e14:~/autodl-tmp/Deformable_DETR# GPUS_PER_NODE=1 ./tools/run_dist_launch.sh 1 ./configs/r50_deformable_detr.sh
+ GPUS=1
+ RUN_COMMAND=./configs/r50_deformable_detr.sh
+ '[' 1 -lt 8 ']'
+ GPUS_PER_NODE=1
+ MASTER_ADDR=127.0.0.1
+ MASTER_PORT=29500
+ NODE_RANK=0
+ let NNODES=GPUS/GPUS_PER_NODE
+ python ./tools/launch.py --nnodes 1 --node_rank 0 --master_addr 127.0.0.1 --master_port 29500 --nproc_per_node 1 ./configs/r50_deformable_detr.sh
+ EXP_DIR=exps/r50_deformable_detr
+ PY_ARGS=
+ python -u main.py --output_dir exps/r50_deformable_detr
Traceback (most recent call last):
  File "main.py", line 21, in <module>
    import datasets
  File "/root/autodl-tmp/Deformable_DETR/datasets/__init__.py", line 13, in <module>
    from .coco import build as build_coco
  File "/root/autodl-tmp/Deformable_DETR/datasets/coco.py", line 22, in <module>
    from util.misc import get_local_rank, get_local_size
  File "/root/autodl-tmp/Deformable_DETR/util/misc.py", line 32, in <module>
    from torchvision.ops.misc import _NewEmptyTensorOp
ImportError: cannot import name '_NewEmptyTensorOp' from 'torchvision.ops.misc' (/root/miniconda3/envs/MV/lib/python3.8/site-packages/torchvision/ops/misc.py)
Traceback (most recent call last):
  File "./tools/launch.py", line 192, in <module>
    main()
  File "./tools/launch.py", line 187, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['./configs/r50_deformable_detr.sh']' returned non-zero exit status 1.

The error message shows that the program cannot find _NewEmptyTensorOp when importing torchvision.ops.misc. This is caused by a mismatch between the installed torchvision version and the API the code expects.

First, check which torch and torchvision versions Deformable-DETR is compatible with.
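For example, the following one-off snippet (a minimal sketch, nothing project-specific) prints the installed versions so you can compare them against the ones listed in the Deformable-DETR README:

# Print the versions that matter for the compatibility check.
import torch
import torchvision

print("torch        :", torch.__version__)
print("torchvision  :", torchvision.__version__)
print("built w/ CUDA:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())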

Then you can try to fix the _NewEmptyTensorOp import manually. In util/misc.py, locate:

from torchvision.ops.misc import _NewEmptyTensorOp

and replace that line with:

try:
    from torchvision.ops.misc import _NewEmptyTensorOp
except ImportError:
    # Define a fallback for _NewEmptyTensorOp if not available
    class _NewEmptyTensorOp(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, new_shape):
            ctx.shape = x.shape          # remember the input shape for backward
            return x.new_empty(new_shape)

        @staticmethod
        def backward(ctx, grad):
            # reshape the gradient back to the input's shape so autograd's
            # shape check passes (mirrors the old torchvision definition)
            return _NewEmptyTensorOp.apply(grad, ctx.shape), None

What this change does:

1. Checks for and repairs the _NewEmptyTensorOp import

_NewEmptyTensorOp is a utility class provided by torchvision, normally used for tensor operations. The import can fail for the following reasons:

  • Version incompatibility: the definition and location of _NewEmptyTensorOp differ between torchvision versions.
  • Removal or renaming: newer versions have dropped the class, so the import fails.

Why the change helps
The try-except block checks at run time whether _NewEmptyTensorOp can be imported. If it cannot, a custom implementation is supplied instead, so the code runs on both old and new versions of torchvision.


2. The custom _NewEmptyTensorOp implementation

A fallback implementation is provided in case _NewEmptyTensorOp cannot be imported:

  • forward: creates an empty tensor with the new shape new_shape, keeping the dtype and device of the original tensor.
  • backward: provides gradient propagation so the operation can take part in backpropagation.

Effects

  • Keeps the code intact: the program no longer aborts just because _NewEmptyTensorOp is missing.
  • Supports gradient computation: the custom forward and backward ensure the tensor operation does not break autograd during training (see the short usage sketch below).
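As a small usage sketch (illustrative only; it assumes the patched util/misc.py above is importable, e.g. when run from the repo root), the fallback behaves like this:

import torch
from util.misc import _NewEmptyTensorOp   # available after applying the patch above

x = torch.zeros(0, 256, 32, 32, requires_grad=True)   # an "empty" batch of feature maps
y = _NewEmptyTensorOp.apply(x, (0, 256, 64, 64))      # empty tensor with a new shape, same dtype/device
print(y.shape)                                        # torch.Size([0, 256, 64, 64])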

After the fix, clean the cached build artifacts and recompile the CUDA ops (the setup.py here is the one under ./models/ops), then re-run the build:

python setup.py clean
python setup.py build develop

Then re-run the training command from the "Start training" step above.

The pretrained ResNet-50 backbone weights are downloaded automatically, and then training starts:

# downloading the backbone weights
Namespace(aux_loss=True, backbone='resnet50', batch_size=2, bbox_loss_coef=5, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='./data/coco', dataset_file='coco', dec_layers=6, dec_n_points=4, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=1024, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.1, enc_layers=6, enc_n_points=4, epochs=50, eval=False, focal_alpha=0.25, frozen_weights=None, giou_loss_coef=2, gpu=0, hidden_dim=256, lr=0.0002, lr_backbone=2e-05, lr_backbone_names=['backbone.0'], lr_drop=40, lr_drop_epochs=None, lr_linear_proj_mult=0.1, lr_linear_proj_names=['reference_points', 'sampling_offsets'], mask_loss_coef=1, masks=False, nheads=8, num_feature_levels=4, num_queries=300, num_workers=2, output_dir='exps/r50_deformable_detr', position_embedding='sine', position_embedding_scale=6.283185307179586, rank=0, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_giou=2, sgd=False, start_epoch=0, two_stage=False, weight_decay=0.0001, with_box_refine=False, world_size=1)
/root/miniconda3/envs/MV/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
  warnings.warn(
/root/miniconda3/envs/MV/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████████████████████████████████████████████████████████| 97.8M/97.8M [00:08<00:00, 12.0MB/s]
# training starts: 50 epochs in total
Start training
/root/autodl-tmp/Deformable_DETR/models/position_encoding.py:49: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
/root/miniconda3/envs/MV/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Epoch: [0]  [  0/500]  eta: 0:17:25  lr: 0.000200  class_error: 90.91  grad_norm: 90.38  loss: 33.3953 (33.3953)  loss_ce: 2.1406 (2.1406)  loss_bbox: 1.6266 (1.6266)  loss_giou: 1.6883 (1.6883)  loss_ce_0: 2.0890 (2.0890)  loss_bbox_0: 1.7336 (1.7336)  loss_giou_0: 1.7052 (1.7052)  loss_ce_1: 2.1477 (2.1477)  loss_bbox_1: 1.7186 (1.7186)  loss_giou_1: 1.6883 (1.6883)  loss_ce_2: 2.2523 (2.2523)  loss_bbox_2: 1.7217 (1.7217)  loss_giou_2: 1.6883 (1.6883)  loss_ce_3: 2.3523 (2.3523)  loss_bbox_3: 1.6521 (1.6521)  loss_giou_3: 1.6883 (1.6883)  loss_ce_4: 2.1289 (2.1289)  loss_bbox_4: 1.6850 (1.6850)  loss_giou_4: 1.6883 (1.6883)  loss_ce_unscaled: 1.0703 (1.0703)  class_error_unscaled: 90.9091 (90.9091)  loss_bbox_unscaled: 0.3253 (0.3253)  loss_giou_unscaled: 0.8442 (0.8442)  cardinality_error_unscaled: 294.5000 (294.5000)  loss_ce_0_unscaled: 1.0445 (1.0445)  loss_bbox_0_unscaled: 0.3467 (0.3467)  loss_giou_0_unscaled: 0.8526 (0.8526)  cardinality_error_0_unscaled: 293.0000 (293.0000)  loss_ce_1_unscaled: 1.0739 (1.0739)  loss_bbox_1_unscaled: 0.3437 (0.3437)  loss_giou_1_unscaled: 0.8442 (0.8442)  cardinality_error_1_unscaled: 294.5000 (294.5000)  loss_ce_2_unscaled: 1.1262 (1.1262)  loss_bbox_2_unscaled: 0.3443 (0.3443)  loss_giou_2_unscaled: 0.8442 (0.8442)  cardinality_error_2_unscaled: 294.5000 (294.5000)  loss_ce_3_unscaled: 1.1762 (1.1762)  loss_bbox_3_unscaled: 0.3304 (0.3304)  loss_giou_3_unscaled: 0.8442 (0.8442)  cardinality_error_3_unscaled: 294.5000 (294.5000)  loss_ce_4_unscaled: 1.0644 (1.0644)  loss_bbox_4_unscaled: 0.3370 (0.3370)  loss_giou_4_unscaled: 0.8442 (0.8442)  cardinality_error_4_unscaled: 294.5000 (294.5000)  time: 2.0913  data: 0.0000  max mem: 2595
Epoch: [0]  [ 10/500]  eta: 0:04:08  lr: 0.000200  class_error: 100.00  grad_norm: 59.73  loss: 33.8059 (36.6621)  loss_ce: 2.1492 (2.2435)  loss_bbox: 1.9925 (2.2554)  loss_giou: 1.7524 (1.7051)  loss_ce_0: 2.0454 (2.0046)  loss_bbox_0: 2.0563 (2.2853)  loss_giou_0: 1.7421 (1.7168)  loss_ce_1: 2.1477 (2.1195)  loss_bbox_1: 2.0127 (2.2745)  loss_giou_1: 1.7430 (1.7075)  loss_ce_2: 2.1196 (2.0959)  loss_bbox_2: 1.9863 (2.2725)  loss_giou_2: 1.7475 (1.7035)  loss_ce_3: 2.0785 (2.1923)  loss_bbox_3: 1.9909 (2.2674)  loss_giou_3: 1.7291 (1.7033)  loss_ce_4: 2.1289 (2.1414)  loss_bbox_4: 1.9995 (2.2686)  loss_giou_4: 1.7426 (1.7052)  loss_ce_unscaled: 1.0746 (1.1217)  class_error_unscaled: 100.0000 (93.7190)  loss_bbox_unscaled: 0.3985 (0.4511)  loss_giou_unscaled: 0.8762 (0.8525)  cardinality_error_unscaled: 295.0000 (294.2273)  loss_ce_0_unscaled: 1.0227 (1.0023)  loss_bbox_0_unscaled: 0.4113 (0.4571)  loss_giou_0_unscaled: 0.8710 (0.8584)  cardinality_error_0_unscaled: 295.0000 (293.7727)  loss_ce_1_unscaled: 1.0739 (1.0597)  loss_bbox_1_unscaled: 0.4025 (0.4549)  loss_giou_1_unscaled: 0.8715 (0.8538)  cardinality_error_1_unscaled: 295.0000 (294.2273)  loss_ce_2_unscaled: 1.0598 (1.0479)  loss_bbox_2_unscaled: 0.3973 (0.4545)  loss_giou_2_unscaled: 0.8738 (0.8518)  cardinality_error_2_unscaled: 295.0000 (294.2273)  loss_ce_3_unscaled: 1.0393 (1.0962)  loss_bbox_3_unscaled: 0.3982 (0.4535)  loss_giou_3_unscaled: 0.8645 (0.8516)  cardinality_error_3_unscaled: 295.0000 (294.2273)  loss_ce_4_unscaled: 1.0644 (1.0707)  loss_bbox_4_unscaled: 0.3999 (0.4537)  loss_giou_4_unscaled: 0.8713 (0.8526)  cardinality_error_4_unscaled: 295.0000 (294.2273)  time: 0.5081  data: 0.0000  max mem: 8155
Epoch: [0]  [ 20/500]  eta: 0:03:13  lr: 0.000200  class_error: 96.00  grad_norm: 66.10  loss: 33.1548 (35.3786)  loss_ce: 2.1193 (2.1767)  loss_bbox: 1.6732 (2.0712)  loss_giou: 1.7230 (1.7049)  loss_ce_0: 1.9877 (2.0215)  loss_bbox_0: 1.7229 (2.0959)  loss_giou_0: 1.7421 (1.7148)  loss_ce_1: 2.1756 (2.1125)  loss_bbox_1: 1.6997 (2.0856)  loss_giou_1: 1.7295 (1.7072)  loss_ce_2: 2.0581 (2.0707)  loss_bbox_2: 1.6692 (2.0807)  loss_giou_2: 1.7245 (1.7052)  loss_ce_3: 2.0403 (2.1157)  loss_bbox_3: 1.6682 (2.0794)  loss_giou_3: 1.7242 (1.7036)  loss_ce_4: 2.1486 (2.1487)  loss_bbox_4: 1.6742 (2.0805)  loss_giou_4: 1.7218 (1.7038)  loss_ce_unscaled: 1.0597 (1.0884)  class_error_unscaled: 100.0000 (92.4284)  loss_bbox_unscaled: 0.3346 (0.4142)  loss_giou_unscaled: 0.8615 (0.8525)  cardinality_error_unscaled: 295.0000 (293.8810)  loss_ce_0_unscaled: 0.9938 (1.0107)  loss_bbox_0_unscaled: 0.3446 (0.4192)  loss_giou_0_unscaled: 0.8710 (0.8574)  cardinality_error_0_unscaled: 295.0000 (293.6429)  loss_ce_1_unscaled: 1.0878 (1.0563)  loss_bbox_1_unscaled: 0.3399 (0.4171)  loss_giou_1_unscaled: 0.8648 (0.8536)  cardinality_error_1_unscaled: 295.0000 (293.8810)  loss_ce_2_unscaled: 1.0290 (1.0354)  loss_bbox_2_unscaled: 0.3338 (0.4161)  loss_giou_2_unscaled: 0.8623 (0.8526)  cardinality_error_2_unscaled: 295.0000 (293.8810)  loss_ce_3_unscaled: 1.0201 (1.0579)  loss_bbox_3_unscaled: 0.3336 (0.4159)  loss_giou_3_unscaled: 0.8621 (0.8518)  cardinality_error_3_unscaled: 295.0000 (293.8810)  loss_ce_4_unscaled: 1.0743 (1.0744)  loss_bbox_4_unscaled: 0.3348 (0.4161)  loss_giou_4_unscaled: 0.8609 (0.8519)  cardinality_error_4_unscaled: 295.0000 (293.8810)  time: 0.3184  data: 0.0000  max mem: 8155
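The UserWarnings in the log above are deprecation notices, not errors, and training proceeds anyway. If you want to silence the __floordiv__ one, the replacement the warning itself suggests looks roughly like this (a standalone sketch with illustrative values; in the repo the expression lives in models/position_encoding.py):

import torch

# Deprecated form:      temperature ** (2 * (dim_t // 2) / num_pos_feats)
# Suggested rewrite:    use torch.div with explicit floor rounding.
temperature, num_pos_feats = 10000, 128          # illustrative values
dim_t = torch.arange(num_pos_feats, dtype=torch.float32)
dim_t = temperature ** (2 * torch.div(dim_t, 2, rounding_mode="floor") / num_pos_feats)
print(dim_t[:4])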
