Qwen2.5-0.5B siglip pretraining / fine-tuning experiment

2025/2/27 22:31:02 | Source: https://www.cnblogs.com/isumi/p/18742184

Preface

Video | Repository

I followed the tutorial step by step; this post records what I learned along the way.

Environment

Dual RTX 3090, 24 GB each
CUDA 12.2
transformers 4.49

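Overview

A VLM is built by training an intermediate layer between Qwen2.5-0.5B and siglip. The tutorial author mentions that in the first fine-tuning attempt only the vision model's parameters were frozen, which produced a staircase-shaped loss curve, i.e. overfitting. All parameters of both the text model and the vision model should be frozen, so that only the intermediate layer is trained.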
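The freezing scheme described above can be sketched in PyTorch as follows. This is only an illustration, not the tutorial's code: `TinyVLM`, its attribute names (`vision_model`, `projector`, `llm`), the layer sizes, and the helper `freeze_all_but_projector` are all made up here, and small `nn.Linear` layers stand in for siglip and Qwen2.5-0.5B so the snippet runs without downloading anything.

```python
import torch
from torch import nn


class TinyVLM(nn.Module):
    """Toy stand-in for the real VLM; every submodule here is a hypothetical placeholder."""

    def __init__(self, vision_dim=32, text_dim=16):
        super().__init__()
        self.vision_model = nn.Linear(8, vision_dim)  # placeholder for siglip
        self.projector = nn.Sequential(               # the intermediate layer to train
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )
        self.llm = nn.Linear(text_dim, 4)             # placeholder for Qwen2.5-0.5B

    def forward(self, x):
        return self.llm(self.projector(self.vision_model(x)))


def freeze_all_but_projector(model):
    """Freeze the vision and text models; return the parameters that stay trainable."""
    for p in model.vision_model.parameters():
        p.requires_grad = False
    for p in model.llm.parameters():
        p.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]


model = TinyVLM()
trainable = freeze_all_but_projector(model)

# Only the still-trainable parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

# One dummy step: gradients flow through the frozen text model into the projector.
loss = model(torch.randn(2, 8)).sum()
loss.backward()
optimizer.step()
```

After the backward pass, the parameters of `vision_model` and `llm` have no gradient, while those of `projector` do. With real checkpoints the same loop over `parameters()` would apply, but the attribute names depend on how the model class in the repository is defined.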

Experiments

Pretraining

I left the parameters unchanged. The run took about 5 hours for 5 epochs, and the final loss was around 3.6.

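Fine-tuning

The tutorial used four A100s; with only two 3090s I just had to grit my teeth and run it. Fine-tuning takes somewhat longer than pretraining and also needs more GPU memory. The first run went out of memory after about 33% of the samples. For the second run I reduced batch_size, which made little difference.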

Testing

I tested directly with the model as fine-tuned up to that point, and it still managed to answer.

2025-02-27 17:04:50.278186: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-02-27 17:04:50.323774: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-27 17:04:51.275953: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
老虎在水里 ("The tiger is in the water")