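Introduction to GPTQ
In 2022, Frantar et al. published the paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".
The paper describes a post-training quantization algorithm that applies to any general pre-trained Transformer model and causes only a small loss in model quality.
GPTQ calibrates the quantized weights by running inference on the model being quantized. The quantization algorithm itself is described in detail in the original paper.
Building on the open-source auto-gptq library, transformers supports models quantized with the GPTQ algorithm.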
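To make the idea of weight quantization concrete, here is a minimal sketch of plain round-to-nearest quantization of one group of weights to 4 bits. This is deliberately not GPTQ: GPTQ additionally uses calibration data to compensate for rounding error, as described in the paper. The function names and sample weights are illustrative assumptions and are not part of auto-gptq or transformers.

```python
def quantize_group(weights, bits=4):
    """Round-to-nearest asymmetric quantization of one group of floats."""
    maxq = 2 ** bits - 1                      # largest integer code, 15 for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / maxq or 1.0     # fall back to 1.0 for a constant group
    zero = round(-w_min / scale)              # integer code that represents 0.0
    codes = [min(maxq, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [scale * (q - zero) for q in codes]


# Illustrative weights, not taken from any real model.
weights = [-0.30, -0.10, 0.00, 0.05, 0.20, 0.45]
codes, scale, zero = quantize_group(weights)
restored = dequantize_group(codes, scale, zero)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, round(scale, 4), zero, round(max_error, 6))
```

Each weight is stored as a small integer code plus a shared scale and zero point per group, which is the same kind of data that appears later as qweight, scales, and qzeros.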
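Quantizing a model with GPTQ
To quantize a model with the auto-gptq library, we need to pass a dataset to the quantizer.
There are usually two ways to provide the dataset:
- one of the default datasets supported by the quantizer (['wikitext2','c4','c4-new','ptb','ptb-new'])
- a list of strings (these strings are used as the dataset)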
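The two options can be told apart with a simple check. The sketch below is a hypothetical helper (not part of transformers or auto-gptq) that validates a value before it is passed as the dataset. The list of default names is taken from the text above.

```python
DEFAULT_DATASETS = ["wikitext2", "c4", "c4-new", "ptb", "ptb-new"]


def describe_dataset(dataset):
    """Classify a calibration dataset argument as 'default' or 'custom'."""
    if isinstance(dataset, str):
        if dataset not in DEFAULT_DATASETS:
            raise ValueError(f"unknown default dataset: {dataset!r}")
        return "default"
    if isinstance(dataset, list) and dataset and all(isinstance(s, str) for s in dataset):
        return "custom"
    raise TypeError("dataset must be a known name or a non-empty list of strings")


print(describe_dataset("wikitext2"))
print(describe_dataset(["first calibration sample", "second calibration sample"]))
```

Failing early on a misspelled dataset name is cheaper than discovering it after the model weights have been loaded.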
Quantizing with a default dataset supported by GPTQ
In the example below, we use the "wikitext2" dataset to quantize the model to 4-bit precision. The precisions supported by GPTQConfig are [2, 3, 4, 8].
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
import torch

model_name_or_path = "/models/opt-2.7b/"

quantization_config = GPTQConfig(
    bits=4,  # quantization precision
    group_size=128,
    dataset="wikitext2",
    desc_act=False,
)
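As a rough illustration of what bits=4 and group_size=128 imply, the sketch below estimates the storage for a single hypothetical 2560 x 2560 linear layer. It assumes a layout commonly used by GPTQ kernels: 4-bit codes packed into 32-bit integers, plus one fp16 scale and one packed 4-bit zero point per group and output column. Bias and group-index buffers are ignored, and the exact layout depends on the kernel, so treat the numbers as an estimate only.

```python
def gptq_layer_bytes(in_features, out_features, bits=4, group_size=128):
    """Estimate bytes for packed weights, scales and zero points of one linear layer."""
    groups = in_features // group_size
    qweight = in_features * out_features * bits // 8   # packed integer codes
    scales = groups * out_features * 2                 # one fp16 scale per group and column
    qzeros = groups * out_features * bits // 8         # packed zero points
    return qweight + scales + qzeros


fp16_bytes = 2560 * 2560 * 2
quant_bytes = gptq_layer_bytes(2560, 2560)
print(f"fp16: {fp16_bytes} bytes, 4-bit GPTQ: {quant_bytes} bytes, "
      f"ratio: {fp16_bytes / quant_bytes:.2f}x")
```

A smaller group_size adds more scales and zero points, so it trades some of the size reduction for finer-grained quantization.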
Layer-by-layer quantization
For an explanation of the "CUDA extension not installed" message, see: https://github.com/AutoGPTQ/AutoGPTQ/issues/249
quant_model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=quantization_config,
    device_map='auto',
)
CUDA extension not installed.
Generating test split: 100%|████████████████████████████████████████████████████████████████| 4358/4358 [00:00<00:00, 546042.62 examples/s]
Generating train split: 100%|████████████████████████████████████████████████████████████| 36718/36718 [00:00<00:00, 1074751.07 examples/s]
Generating validation split: 100%|██████████████████████████████████████████████████████████| 3760/3760 [00:00<00:00, 787033.79 examples/s]
Quantizing model.decoder.layers blocks :   0%|          | 0/32 [00:00<?, ?it/s]
Quantizing layers inside the block: 100%|██████████| 6/6 [00:12<00:00, 3.03s/it]
Quantizing model.decoder.layers blocks :   3%|▎         | 1/32 [00:13<06:46, 13.11s/it]
...
Quantizing model.decoder.layers blocks : 100%|██████████| 32/32 [06:59<00:00, 13.11s/it]
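Verifying the quantized model
To confirm that the model was quantized correctly, inspect the attributes of a linear layer. It should contain qweight and qzeros attributes, and both should have the torch.int32 data type.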
quant_model.model.decoder.layers[0].self_attn.q_proj.__dict__
{'training': True,
 '_parameters': {},
 '_buffers': {'qweight': tensor([[ 1766754698, -1249142373,  1183631034,  ..., -2038658921, -2037544795, -1956877206],
                                 [ 1772710025,  1739893370, -1500087466,  ...,  2021033895,  -662329995,  1756019066],
                                 [ -895658394, -2007414633, -1951893913,  ..., -1429760649,  -980833883,  1451914633],
                                 ...,
                                 [ 2025363833, -1412855115,  1539086490,  ...,  -342189415, -1737062521, -1950833303],
                                 [ 2023326856, -1432974698, -1788251003,  ..., -1161734024,  2043176297,  1449571741],
                                 [ 1769506966, -2021291654, -1182103157,  ...,  1999993942,  1753778600,  1970816134]], device='cuda:0', dtype=torch.int32),
              'qzeros': tensor([[2004318071, 2004318071, 2004318071,  ..., 2004318071, 2004318071, 2004318071],
                                [2004318071, 2004318071, 2004318071,  ..., 2004318071, 2004318071, 2004318071],
                                [2004318071, 2004318071, 2004318071,  ..., 2004318071, 2004318071, 2004318071],
                                ...,
                                [2004318071, 2004318071, 2004318071,  ..., 2004318071, 2004318071, 2004318071],
                                [2004318071, 2004318071, 2004318071,  ..., 2004318071, 2004318071, 2004318071],
                                [2004318071, 2004318071, 2004318071,  ..., 2004318071, 2004318071, 2004318071]], device='cuda:0', dtype=torch.int32),
              'scales': tensor([[0.0046, 0.0046, 0.0046,  ..., 0.0078, 0.0068, 0.0056],
                                [0.0053, 0.0041, 0.0071,  ..., 0.0097, 0.0085, 0.0078],
                                [0.0083, 0.0074, 0.0055,  ..., 0.0076, 0.0074, 0.0089],
                                ...,
                                [0.0050, 0.0055, 0.0056,  ..., 0.0068, 0.0073, 0.0088],
                                [0.0043, 0.0046, 0.0046,  ..., 0.0078, 0.0097, 0.0060],
                                [0.0093, 0.0063, 0.0062,  ..., 0.0061, 0.0069, 0.0057]], device='cuda:0', dtype=torch.float16),
              'g_idx': tensor([ 0,  0,  0,  ..., 19, 19, 19], device='cuda:0', dtype=torch.int32),
              'bias': tensor([-0.1272,  0.0172,  0.0103,  ..., -0.0928,  0.0567,  0.0510], device='cuda:0', dtype=torch.float16)},
 '_non_persistent_buffers_set': set(),
 '_backward_pre_hooks': OrderedDict(),
 '_backward_hooks': OrderedDict(),
 '_is_full_backward_hook': None,
 '_forward_hooks': OrderedDict(),
 '_forward_hooks_with_kwargs': OrderedDict(),
 '_forward_hooks_always_called': OrderedDict(),
 '_forward_pre_hooks': OrderedDict(),
 '_forward_pre_hooks_with_kwargs': OrderedDict(),
 '_state_dict_hooks': OrderedDict(),
 '_state_dict_pre_hooks': OrderedDict(),
 '_load_state_dict_pre_hooks': OrderedDict(),
 '_load_state_dict_post_hooks': OrderedDict(),
 '_modules': {},
 'infeatures': 2560,
 'outfeatures': 2560,
 'bits': 4,
 'group_size': 128,
 'maxq': 15,
 'half_indim': 1280,
 'use_cuda_fp16': True,
 'wf': tensor([[ 0,  4,  8, 12, 16, 20, 24, 28]], dtype=torch.int32),
 'kernel_switch_threshold': 128,
 'autogptq_cuda_available': False,
 'autogptq_cuda': None,
 'trainable': False,
 'device': device(type='cuda', index=0)}
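Each int32 entry in qweight and qzeros packs eight 4-bit values. As a sketch of how to read them, the snippet below unpacks two entries from the output above using the shift offsets shown in wf (0, 4, ..., 28). The helper name is illustrative, and the code assumes the low-bits-first packing implied by those shifts.

```python
def unpack_int32(value, bits=4):
    """Split a 32-bit integer into 32 // bits unsigned fields, lowest bits first."""
    value &= 0xFFFFFFFF                       # reinterpret a negative int32 as unsigned
    mask = (1 << bits) - 1
    return [(value >> shift) & mask for shift in range(0, 32, bits)]


print(unpack_int32(2004318071))    # a qzeros entry from the output above
print(unpack_int32(-1249142373))   # a qweight entry from the output above
```

The qzeros entry 2004318071 equals 0x77777777, so it unpacks to eight copies of the code 7, which is why every qzeros entry in this layer looks identical.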
# Save the quantized model weights
quant_model.save_pretrained("models/opt-2.7b-gptq")
Load the model on the GPU and generate text
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
text = "Merry Christmas! I'm glad to"
# Move the tokenized inputs to GPU 0 before generating
inputs = tokenizer(text, return_tensors="pt").to(0)
out = quant_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
Merry Christmas! I'm glad to see you're still here.
Thank you! Merry Christmas to you too!
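Quantizing a model with a custom dataset
A custom dataset is defined as a list of strings. At least 128 samples are recommended, because too few samples will degrade the quantized model's performance.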
import torch
from transformers import AutoModelForCausalLM, GPTQConfig, AutoTokenizer

model_name_or_path = "facebook/opt-2.7b"
# Calibration data: a single string here, far fewer than the recommended 128 samples
custom_dataset = ["auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."]

custom_quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    desc_act=False,
    dataset=custom_dataset,
)

# Quantization runs while the model is being loaded
custom_quant_model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=custom_quantization_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
Quantizing model.decoder.layers blocks : 100%|█████████████████████████████████████████████████████████████| 32/32 [03:47<00:00, 7.12s/it]
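The results with the custom dataset are very poor. The dataset above contains a single sample, far fewer than the recommended 128.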
text = "Merry Christmas! I'm glad to"
inputs = tokenizer(text, return_tensors="pt").to(0)
out = custom_quant_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
Merry Christmas! I'm glad to.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
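The degenerate output above is consistent with the earlier recommendation of at least 128 calibration samples. Below is a minimal sketch of a guard that cleans a list of strings and refuses to continue when too few samples remain. The helper build_calibration_samples, its min_chars threshold and the placeholder corpus are all assumptions made for illustration; they are not part of transformers or auto-gptq, and real calibration text should come from data that resembles the model's intended use.

```python
# Illustrative sketch; build_calibration_samples is a hypothetical helper.
def build_calibration_samples(texts, min_samples=128, min_chars=32):
    """Return cleaned, de-duplicated samples, or raise if too few remain."""
    seen = set()
    samples = []
    for text in texts:
        cleaned = " ".join(text.split())      # collapse runs of whitespace
        if len(cleaned) < min_chars or cleaned in seen:
            continue                          # skip short or duplicate samples
        seen.add(cleaned)
        samples.append(cleaned)
    if len(samples) < min_samples:
        raise ValueError(
            f"only {len(samples)} usable samples, need at least {min_samples}"
        )
    return samples


# Placeholder corpus standing in for real text; replace with your own data.
placeholder_corpus = [
    f"Placeholder calibration sentence number {i} about model quantization."
    for i in range(200)
]
calibration_samples = build_calibration_samples(placeholder_corpus)
print(len(calibration_samples))
```

The resulting list can then be passed as dataset=calibration_samples to GPTQConfig in the same way as custom_dataset above. Calling the helper on the single-string custom_dataset raises ValueError instead of silently quantizing with too little data.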