麒麟系统SP2 与昇腾300I芯片测试qwen7B模型记录

1. 查看系统版本

uname -a

Linux localhost.localdomain 4.19.90-24.4.v2101.ky10.aarch64 #1 SMP Mon May 24 14:45:37 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

2. 查看显卡

npu-smi info

前情提要:

官网给出支持昇腾910架构,刚好有300I资源,测试一下,给大家提供参考~~菜鸟一枚还需向各位大佬学习

https://github.com/QwenLM/Qwen/tree/5aa84bdfd3237b37f01bc88cd49b3279b9a71d0b/ascend-supporticon-default.png?t=N7T8https://github.com/QwenLM/Qwen/tree/5aa84bdfd3237b37f01bc88cd49b3279b9a71d0b/ascend-support主要测试参考该方法,暂时不做深入研究。 

暂时了解 该系统可以做简单的算法模型,主要是架构不同,需要重新写算法,可以安装pytorch、tensorflow和mindformers等。

查看具体参数:

uname -m && cat /etc/*release

 

aarch64
Kylin Linux Advanced Server release V10 (Sword)
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"Kylin Linux Advanced Server release V10 (Sword)

3. 配置docker,有两种配置方法,一种在官网下载,一种直接用命令yum 安装即可

4. 安装minconda ,注意安装arrch64版本即可

5.按照教程配置,这里不做详细介绍了,直接给出记录

6.没有使用教程启动docker的命令,使用以下命令。

sudo docker run -it --rm -u root --network=host --ipc=host --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2  --device=/dev/davinci3 --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 --name=6bff46b104b8 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi -v /etc/ascend_install.info:/etc/ascend_install.info  -v /root/qwen/Qwen-7B-Chat:/data/qwen/models/Qwen-7B-Chat -v /var/log/npu/:/usr/slog  qwenllm/qwen-mindspore /bin/bash

成功启动docker。

7.转换模型

python3 /data/qwen/mindformers/research/qwen/convert_weight.py

/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.return self._float_to_str(self.smallest_subnormal)
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.return self._float_to_str(self.smallest_subnormal)
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
Flash attention will be disabled because it does NOT support fp32.
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|??????????????????????????????????????????????????????????????????????????????| 8/8 [00:03<00:00,  2.35it/s]
Parameter (name=transformer.wte.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.wte.weight->transformer.wte.embedding_weight
Parameter (name=transformer.h.0.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.ln_1.weight->transformer.layers.0.attention_norm.weight
Parameter (name=transformer.h.0.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_attn.weight->transformer.layers.0.attn.c_attn.weight
Parameter (name=transformer.h.0.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_attn.bias->transformer.layers.0.attn.c_attn.bias
Parameter (name=transformer.h.0.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.attn.c_proj.weight->transformer.layers.0.attention.wo.weight
Parameter (name=transformer.h.0.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.ln_2.weight->transformer.layers.0.ffn_norm.weight
Parameter (name=transformer.h.0.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.w1.weight->transformer.layers.0.feed_forward.w1.weight
Parameter (name=transformer.h.0.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.w2.weight->transformer.layers.0.feed_forward.w3.weight
Parameter (name=transformer.h.0.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.0.mlp.c_proj.weight->transformer.layers.0.feed_forward.w2.weight
Parameter (name=transformer.h.1.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.ln_1.weight->transformer.layers.1.attention_norm.weight
Parameter (name=transformer.h.1.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_attn.weight->transformer.layers.1.attn.c_attn.weight
Parameter (name=transformer.h.1.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_attn.bias->transformer.layers.1.attn.c_attn.bias
Parameter (name=transformer.h.1.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.attn.c_proj.weight->transformer.layers.1.attention.wo.weight
Parameter (name=transformer.h.1.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.ln_2.weight->transformer.layers.1.ffn_norm.weight
Parameter (name=transformer.h.1.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.w1.weight->transformer.layers.1.feed_forward.w1.weight
Parameter (name=transformer.h.1.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.w2.weight->transformer.layers.1.feed_forward.w3.weight
Parameter (name=transformer.h.1.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.1.mlp.c_proj.weight->transformer.layers.1.feed_forward.w2.weight
Parameter (name=transformer.h.2.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.ln_1.weight->transformer.layers.2.attention_norm.weight
Parameter (name=transformer.h.2.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_attn.weight->transformer.layers.2.attn.c_attn.weight
Parameter (name=transformer.h.2.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_attn.bias->transformer.layers.2.attn.c_attn.bias
Parameter (name=transformer.h.2.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.attn.c_proj.weight->transformer.layers.2.attention.wo.weight
Parameter (name=transformer.h.2.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.ln_2.weight->transformer.layers.2.ffn_norm.weight
Parameter (name=transformer.h.2.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.w1.weight->transformer.layers.2.feed_forward.w1.weight
Parameter (name=transformer.h.2.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.w2.weight->transformer.layers.2.feed_forward.w3.weight
Parameter (name=transformer.h.2.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.2.mlp.c_proj.weight->transformer.layers.2.feed_forward.w2.weight
Parameter (name=transformer.h.3.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.ln_1.weight->transformer.layers.3.attention_norm.weight
Parameter (name=transformer.h.3.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_attn.weight->transformer.layers.3.attn.c_attn.weight
Parameter (name=transformer.h.3.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_attn.bias->transformer.layers.3.attn.c_attn.bias
Parameter (name=transformer.h.3.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.attn.c_proj.weight->transformer.layers.3.attention.wo.weight
Parameter (name=transformer.h.3.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.ln_2.weight->transformer.layers.3.ffn_norm.weight
Parameter (name=transformer.h.3.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.w1.weight->transformer.layers.3.feed_forward.w1.weight
Parameter (name=transformer.h.3.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.w2.weight->transformer.layers.3.feed_forward.w3.weight
Parameter (name=transformer.h.3.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.3.mlp.c_proj.weight->transformer.layers.3.feed_forward.w2.weight
Parameter (name=transformer.h.4.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.ln_1.weight->transformer.layers.4.attention_norm.weight
Parameter (name=transformer.h.4.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_attn.weight->transformer.layers.4.attn.c_attn.weight
Parameter (name=transformer.h.4.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_attn.bias->transformer.layers.4.attn.c_attn.bias
Parameter (name=transformer.h.4.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.attn.c_proj.weight->transformer.layers.4.attention.wo.weight
Parameter (name=transformer.h.4.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.ln_2.weight->transformer.layers.4.ffn_norm.weight
Parameter (name=transformer.h.4.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.w1.weight->transformer.layers.4.feed_forward.w1.weight
Parameter (name=transformer.h.4.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.w2.weight->transformer.layers.4.feed_forward.w3.weight
Parameter (name=transformer.h.4.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.4.mlp.c_proj.weight->transformer.layers.4.feed_forward.w2.weight
Parameter (name=transformer.h.5.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.ln_1.weight->transformer.layers.5.attention_norm.weight
Parameter (name=transformer.h.5.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_attn.weight->transformer.layers.5.attn.c_attn.weight
Parameter (name=transformer.h.5.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_attn.bias->transformer.layers.5.attn.c_attn.bias
Parameter (name=transformer.h.5.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.attn.c_proj.weight->transformer.layers.5.attention.wo.weight
Parameter (name=transformer.h.5.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.ln_2.weight->transformer.layers.5.ffn_norm.weight
Parameter (name=transformer.h.5.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.w1.weight->transformer.layers.5.feed_forward.w1.weight
Parameter (name=transformer.h.5.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.w2.weight->transformer.layers.5.feed_forward.w3.weight
Parameter (name=transformer.h.5.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.5.mlp.c_proj.weight->transformer.layers.5.feed_forward.w2.weight
Parameter (name=transformer.h.6.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.ln_1.weight->transformer.layers.6.attention_norm.weight
Parameter (name=transformer.h.6.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_attn.weight->transformer.layers.6.attn.c_attn.weight
Parameter (name=transformer.h.6.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_attn.bias->transformer.layers.6.attn.c_attn.bias
Parameter (name=transformer.h.6.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.attn.c_proj.weight->transformer.layers.6.attention.wo.weight
Parameter (name=transformer.h.6.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.ln_2.weight->transformer.layers.6.ffn_norm.weight
Parameter (name=transformer.h.6.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.w1.weight->transformer.layers.6.feed_forward.w1.weight
Parameter (name=transformer.h.6.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.w2.weight->transformer.layers.6.feed_forward.w3.weight
Parameter (name=transformer.h.6.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.6.mlp.c_proj.weight->transformer.layers.6.feed_forward.w2.weight
Parameter (name=transformer.h.7.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.ln_1.weight->transformer.layers.7.attention_norm.weight
Parameter (name=transformer.h.7.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_attn.weight->transformer.layers.7.attn.c_attn.weight
Parameter (name=transformer.h.7.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_attn.bias->transformer.layers.7.attn.c_attn.bias
Parameter (name=transformer.h.7.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.attn.c_proj.weight->transformer.layers.7.attention.wo.weight
Parameter (name=transformer.h.7.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.ln_2.weight->transformer.layers.7.ffn_norm.weight
Parameter (name=transformer.h.7.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.w1.weight->transformer.layers.7.feed_forward.w1.weight
Parameter (name=transformer.h.7.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.w2.weight->transformer.layers.7.feed_forward.w3.weight
Parameter (name=transformer.h.7.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.7.mlp.c_proj.weight->transformer.layers.7.feed_forward.w2.weight
Parameter (name=transformer.h.8.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.ln_1.weight->transformer.layers.8.attention_norm.weight
Parameter (name=transformer.h.8.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_attn.weight->transformer.layers.8.attn.c_attn.weight
Parameter (name=transformer.h.8.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_attn.bias->transformer.layers.8.attn.c_attn.bias
Parameter (name=transformer.h.8.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.attn.c_proj.weight->transformer.layers.8.attention.wo.weight
Parameter (name=transformer.h.8.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.ln_2.weight->transformer.layers.8.ffn_norm.weight
Parameter (name=transformer.h.8.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.w1.weight->transformer.layers.8.feed_forward.w1.weight
Parameter (name=transformer.h.8.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.w2.weight->transformer.layers.8.feed_forward.w3.weight
Parameter (name=transformer.h.8.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.8.mlp.c_proj.weight->transformer.layers.8.feed_forward.w2.weight
Parameter (name=transformer.h.9.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.ln_1.weight->transformer.layers.9.attention_norm.weight
Parameter (name=transformer.h.9.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_attn.weight->transformer.layers.9.attn.c_attn.weight
Parameter (name=transformer.h.9.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_attn.bias->transformer.layers.9.attn.c_attn.bias
Parameter (name=transformer.h.9.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.attn.c_proj.weight->transformer.layers.9.attention.wo.weight
Parameter (name=transformer.h.9.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.ln_2.weight->transformer.layers.9.ffn_norm.weight
Parameter (name=transformer.h.9.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.w1.weight->transformer.layers.9.feed_forward.w1.weight
Parameter (name=transformer.h.9.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.w2.weight->transformer.layers.9.feed_forward.w3.weight
Parameter (name=transformer.h.9.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.9.mlp.c_proj.weight->transformer.layers.9.feed_forward.w2.weight
Parameter (name=transformer.h.10.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.ln_1.weight->transformer.layers.10.attention_norm.weight
Parameter (name=transformer.h.10.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_attn.weight->transformer.layers.10.attn.c_attn.weight
Parameter (name=transformer.h.10.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_attn.bias->transformer.layers.10.attn.c_attn.bias
Parameter (name=transformer.h.10.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.attn.c_proj.weight->transformer.layers.10.attention.wo.weight
Parameter (name=transformer.h.10.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.ln_2.weight->transformer.layers.10.ffn_norm.weight
Parameter (name=transformer.h.10.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.w1.weight->transformer.layers.10.feed_forward.w1.weight
Parameter (name=transformer.h.10.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.w2.weight->transformer.layers.10.feed_forward.w3.weight
Parameter (name=transformer.h.10.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.10.mlp.c_proj.weight->transformer.layers.10.feed_forward.w2.weight
Parameter (name=transformer.h.11.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.ln_1.weight->transformer.layers.11.attention_norm.weight
Parameter (name=transformer.h.11.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_attn.weight->transformer.layers.11.attn.c_attn.weight
Parameter (name=transformer.h.11.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_attn.bias->transformer.layers.11.attn.c_attn.bias
Parameter (name=transformer.h.11.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.attn.c_proj.weight->transformer.layers.11.attention.wo.weight
Parameter (name=transformer.h.11.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.ln_2.weight->transformer.layers.11.ffn_norm.weight
Parameter (name=transformer.h.11.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.w1.weight->transformer.layers.11.feed_forward.w1.weight
Parameter (name=transformer.h.11.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.w2.weight->transformer.layers.11.feed_forward.w3.weight
Parameter (name=transformer.h.11.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.11.mlp.c_proj.weight->transformer.layers.11.feed_forward.w2.weight
Parameter (name=transformer.h.12.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.ln_1.weight->transformer.layers.12.attention_norm.weight
Parameter (name=transformer.h.12.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_attn.weight->transformer.layers.12.attn.c_attn.weight
Parameter (name=transformer.h.12.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_attn.bias->transformer.layers.12.attn.c_attn.bias
Parameter (name=transformer.h.12.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.attn.c_proj.weight->transformer.layers.12.attention.wo.weight
Parameter (name=transformer.h.12.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.ln_2.weight->transformer.layers.12.ffn_norm.weight
Parameter (name=transformer.h.12.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.w1.weight->transformer.layers.12.feed_forward.w1.weight
Parameter (name=transformer.h.12.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.w2.weight->transformer.layers.12.feed_forward.w3.weight
Parameter (name=transformer.h.12.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.12.mlp.c_proj.weight->transformer.layers.12.feed_forward.w2.weight
Parameter (name=transformer.h.13.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.ln_1.weight->transformer.layers.13.attention_norm.weight
Parameter (name=transformer.h.13.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_attn.weight->transformer.layers.13.attn.c_attn.weight
Parameter (name=transformer.h.13.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_attn.bias->transformer.layers.13.attn.c_attn.bias
Parameter (name=transformer.h.13.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.attn.c_proj.weight->transformer.layers.13.attention.wo.weight
Parameter (name=transformer.h.13.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.ln_2.weight->transformer.layers.13.ffn_norm.weight
Parameter (name=transformer.h.13.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.w1.weight->transformer.layers.13.feed_forward.w1.weight
Parameter (name=transformer.h.13.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.w2.weight->transformer.layers.13.feed_forward.w3.weight
Parameter (name=transformer.h.13.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.13.mlp.c_proj.weight->transformer.layers.13.feed_forward.w2.weight
Parameter (name=transformer.h.14.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.ln_1.weight->transformer.layers.14.attention_norm.weight
Parameter (name=transformer.h.14.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_attn.weight->transformer.layers.14.attn.c_attn.weight
Parameter (name=transformer.h.14.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_attn.bias->transformer.layers.14.attn.c_attn.bias
Parameter (name=transformer.h.14.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.attn.c_proj.weight->transformer.layers.14.attention.wo.weight
Parameter (name=transformer.h.14.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.ln_2.weight->transformer.layers.14.ffn_norm.weight
Parameter (name=transformer.h.14.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.w1.weight->transformer.layers.14.feed_forward.w1.weight
Parameter (name=transformer.h.14.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.w2.weight->transformer.layers.14.feed_forward.w3.weight
Parameter (name=transformer.h.14.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.14.mlp.c_proj.weight->transformer.layers.14.feed_forward.w2.weight
Parameter (name=transformer.h.15.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.ln_1.weight->transformer.layers.15.attention_norm.weight
Parameter (name=transformer.h.15.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_attn.weight->transformer.layers.15.attn.c_attn.weight
Parameter (name=transformer.h.15.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_attn.bias->transformer.layers.15.attn.c_attn.bias
Parameter (name=transformer.h.15.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.attn.c_proj.weight->transformer.layers.15.attention.wo.weight
Parameter (name=transformer.h.15.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.ln_2.weight->transformer.layers.15.ffn_norm.weight
Parameter (name=transformer.h.15.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.w1.weight->transformer.layers.15.feed_forward.w1.weight
Parameter (name=transformer.h.15.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.w2.weight->transformer.layers.15.feed_forward.w3.weight
Parameter (name=transformer.h.15.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.15.mlp.c_proj.weight->transformer.layers.15.feed_forward.w2.weight
Parameter (name=transformer.h.16.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.ln_1.weight->transformer.layers.16.attention_norm.weight
Parameter (name=transformer.h.16.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_attn.weight->transformer.layers.16.attn.c_attn.weight
Parameter (name=transformer.h.16.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_attn.bias->transformer.layers.16.attn.c_attn.bias
Parameter (name=transformer.h.16.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.attn.c_proj.weight->transformer.layers.16.attention.wo.weight
Parameter (name=transformer.h.16.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.ln_2.weight->transformer.layers.16.ffn_norm.weight
Parameter (name=transformer.h.16.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.w1.weight->transformer.layers.16.feed_forward.w1.weight
Parameter (name=transformer.h.16.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.w2.weight->transformer.layers.16.feed_forward.w3.weight
Parameter (name=transformer.h.16.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.16.mlp.c_proj.weight->transformer.layers.16.feed_forward.w2.weight
Parameter (name=transformer.h.17.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.ln_1.weight->transformer.layers.17.attention_norm.weight
Parameter (name=transformer.h.17.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_attn.weight->transformer.layers.17.attn.c_attn.weight
Parameter (name=transformer.h.17.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_attn.bias->transformer.layers.17.attn.c_attn.bias
Parameter (name=transformer.h.17.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.attn.c_proj.weight->transformer.layers.17.attention.wo.weight
Parameter (name=transformer.h.17.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.ln_2.weight->transformer.layers.17.ffn_norm.weight
Parameter (name=transformer.h.17.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.w1.weight->transformer.layers.17.feed_forward.w1.weight
Parameter (name=transformer.h.17.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.w2.weight->transformer.layers.17.feed_forward.w3.weight
Parameter (name=transformer.h.17.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.17.mlp.c_proj.weight->transformer.layers.17.feed_forward.w2.weight
Parameter (name=transformer.h.18.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.ln_1.weight->transformer.layers.18.attention_norm.weight
Parameter (name=transformer.h.18.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_attn.weight->transformer.layers.18.attn.c_attn.weight
Parameter (name=transformer.h.18.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_attn.bias->transformer.layers.18.attn.c_attn.bias
Parameter (name=transformer.h.18.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.attn.c_proj.weight->transformer.layers.18.attention.wo.weight
Parameter (name=transformer.h.18.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.ln_2.weight->transformer.layers.18.ffn_norm.weight
Parameter (name=transformer.h.18.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.w1.weight->transformer.layers.18.feed_forward.w1.weight
Parameter (name=transformer.h.18.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.w2.weight->transformer.layers.18.feed_forward.w3.weight
Parameter (name=transformer.h.18.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.18.mlp.c_proj.weight->transformer.layers.18.feed_forward.w2.weight
Parameter (name=transformer.h.19.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.ln_1.weight->transformer.layers.19.attention_norm.weight
Parameter (name=transformer.h.19.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_attn.weight->transformer.layers.19.attn.c_attn.weight
Parameter (name=transformer.h.19.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_attn.bias->transformer.layers.19.attn.c_attn.bias
Parameter (name=transformer.h.19.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.attn.c_proj.weight->transformer.layers.19.attention.wo.weight
Parameter (name=transformer.h.19.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.ln_2.weight->transformer.layers.19.ffn_norm.weight
Parameter (name=transformer.h.19.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.w1.weight->transformer.layers.19.feed_forward.w1.weight
Parameter (name=transformer.h.19.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.w2.weight->transformer.layers.19.feed_forward.w3.weight
Parameter (name=transformer.h.19.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.19.mlp.c_proj.weight->transformer.layers.19.feed_forward.w2.weight
Parameter (name=transformer.h.20.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.ln_1.weight->transformer.layers.20.attention_norm.weight
Parameter (name=transformer.h.20.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_attn.weight->transformer.layers.20.attn.c_attn.weight
Parameter (name=transformer.h.20.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_attn.bias->transformer.layers.20.attn.c_attn.bias
Parameter (name=transformer.h.20.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.attn.c_proj.weight->transformer.layers.20.attention.wo.weight
Parameter (name=transformer.h.20.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.ln_2.weight->transformer.layers.20.ffn_norm.weight
Parameter (name=transformer.h.20.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.w1.weight->transformer.layers.20.feed_forward.w1.weight
Parameter (name=transformer.h.20.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.w2.weight->transformer.layers.20.feed_forward.w3.weight
Parameter (name=transformer.h.20.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.20.mlp.c_proj.weight->transformer.layers.20.feed_forward.w2.weight
Parameter (name=transformer.h.21.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.ln_1.weight->transformer.layers.21.attention_norm.weight
Parameter (name=transformer.h.21.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_attn.weight->transformer.layers.21.attn.c_attn.weight
Parameter (name=transformer.h.21.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_attn.bias->transformer.layers.21.attn.c_attn.bias
Parameter (name=transformer.h.21.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.attn.c_proj.weight->transformer.layers.21.attention.wo.weight
Parameter (name=transformer.h.21.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.ln_2.weight->transformer.layers.21.ffn_norm.weight
Parameter (name=transformer.h.21.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.w1.weight->transformer.layers.21.feed_forward.w1.weight
Parameter (name=transformer.h.21.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.w2.weight->transformer.layers.21.feed_forward.w3.weight
Parameter (name=transformer.h.21.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.21.mlp.c_proj.weight->transformer.layers.21.feed_forward.w2.weight
Parameter (name=transformer.h.22.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.ln_1.weight->transformer.layers.22.attention_norm.weight
Parameter (name=transformer.h.22.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_attn.weight->transformer.layers.22.attn.c_attn.weight
Parameter (name=transformer.h.22.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_attn.bias->transformer.layers.22.attn.c_attn.bias
Parameter (name=transformer.h.22.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.attn.c_proj.weight->transformer.layers.22.attention.wo.weight
Parameter (name=transformer.h.22.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.ln_2.weight->transformer.layers.22.ffn_norm.weight
Parameter (name=transformer.h.22.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.w1.weight->transformer.layers.22.feed_forward.w1.weight
Parameter (name=transformer.h.22.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.w2.weight->transformer.layers.22.feed_forward.w3.weight
Parameter (name=transformer.h.22.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.22.mlp.c_proj.weight->transformer.layers.22.feed_forward.w2.weight
Parameter (name=transformer.h.23.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.ln_1.weight->transformer.layers.23.attention_norm.weight
Parameter (name=transformer.h.23.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_attn.weight->transformer.layers.23.attn.c_attn.weight
Parameter (name=transformer.h.23.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_attn.bias->transformer.layers.23.attn.c_attn.bias
Parameter (name=transformer.h.23.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.attn.c_proj.weight->transformer.layers.23.attention.wo.weight
Parameter (name=transformer.h.23.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.ln_2.weight->transformer.layers.23.ffn_norm.weight
Parameter (name=transformer.h.23.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.w1.weight->transformer.layers.23.feed_forward.w1.weight
Parameter (name=transformer.h.23.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.w2.weight->transformer.layers.23.feed_forward.w3.weight
Parameter (name=transformer.h.23.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.23.mlp.c_proj.weight->transformer.layers.23.feed_forward.w2.weight
Parameter (name=transformer.h.24.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.ln_1.weight->transformer.layers.24.attention_norm.weight
Parameter (name=transformer.h.24.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_attn.weight->transformer.layers.24.attn.c_attn.weight
Parameter (name=transformer.h.24.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_attn.bias->transformer.layers.24.attn.c_attn.bias
Parameter (name=transformer.h.24.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.attn.c_proj.weight->transformer.layers.24.attention.wo.weight
Parameter (name=transformer.h.24.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.ln_2.weight->transformer.layers.24.ffn_norm.weight
Parameter (name=transformer.h.24.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.w1.weight->transformer.layers.24.feed_forward.w1.weight
Parameter (name=transformer.h.24.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.w2.weight->transformer.layers.24.feed_forward.w3.weight
Parameter (name=transformer.h.24.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.24.mlp.c_proj.weight->transformer.layers.24.feed_forward.w2.weight
Parameter (name=transformer.h.25.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.ln_1.weight->transformer.layers.25.attention_norm.weight
Parameter (name=transformer.h.25.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_attn.weight->transformer.layers.25.attn.c_attn.weight
Parameter (name=transformer.h.25.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_attn.bias->transformer.layers.25.attn.c_attn.bias
Parameter (name=transformer.h.25.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.attn.c_proj.weight->transformer.layers.25.attention.wo.weight
Parameter (name=transformer.h.25.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.ln_2.weight->transformer.layers.25.ffn_norm.weight
Parameter (name=transformer.h.25.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.w1.weight->transformer.layers.25.feed_forward.w1.weight
Parameter (name=transformer.h.25.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.w2.weight->transformer.layers.25.feed_forward.w3.weight
Parameter (name=transformer.h.25.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.25.mlp.c_proj.weight->transformer.layers.25.feed_forward.w2.weight
Parameter (name=transformer.h.26.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.ln_1.weight->transformer.layers.26.attention_norm.weight
Parameter (name=transformer.h.26.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_attn.weight->transformer.layers.26.attn.c_attn.weight
Parameter (name=transformer.h.26.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_attn.bias->transformer.layers.26.attn.c_attn.bias
Parameter (name=transformer.h.26.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.attn.c_proj.weight->transformer.layers.26.attention.wo.weight
Parameter (name=transformer.h.26.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.ln_2.weight->transformer.layers.26.ffn_norm.weight
Parameter (name=transformer.h.26.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.w1.weight->transformer.layers.26.feed_forward.w1.weight
Parameter (name=transformer.h.26.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.w2.weight->transformer.layers.26.feed_forward.w3.weight
Parameter (name=transformer.h.26.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.26.mlp.c_proj.weight->transformer.layers.26.feed_forward.w2.weight
Parameter (name=transformer.h.27.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.ln_1.weight->transformer.layers.27.attention_norm.weight
Parameter (name=transformer.h.27.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_attn.weight->transformer.layers.27.attn.c_attn.weight
Parameter (name=transformer.h.27.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_attn.bias->transformer.layers.27.attn.c_attn.bias
Parameter (name=transformer.h.27.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.attn.c_proj.weight->transformer.layers.27.attention.wo.weight
Parameter (name=transformer.h.27.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.ln_2.weight->transformer.layers.27.ffn_norm.weight
Parameter (name=transformer.h.27.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.w1.weight->transformer.layers.27.feed_forward.w1.weight
Parameter (name=transformer.h.27.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.w2.weight->transformer.layers.27.feed_forward.w3.weight
Parameter (name=transformer.h.27.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.27.mlp.c_proj.weight->transformer.layers.27.feed_forward.w2.weight
Parameter (name=transformer.h.28.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.ln_1.weight->transformer.layers.28.attention_norm.weight
Parameter (name=transformer.h.28.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_attn.weight->transformer.layers.28.attn.c_attn.weight
Parameter (name=transformer.h.28.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_attn.bias->transformer.layers.28.attn.c_attn.bias
Parameter (name=transformer.h.28.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.attn.c_proj.weight->transformer.layers.28.attention.wo.weight
Parameter (name=transformer.h.28.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.ln_2.weight->transformer.layers.28.ffn_norm.weight
Parameter (name=transformer.h.28.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.w1.weight->transformer.layers.28.feed_forward.w1.weight
Parameter (name=transformer.h.28.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.w2.weight->transformer.layers.28.feed_forward.w3.weight
Parameter (name=transformer.h.28.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.28.mlp.c_proj.weight->transformer.layers.28.feed_forward.w2.weight
Parameter (name=transformer.h.29.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.ln_1.weight->transformer.layers.29.attention_norm.weight
Parameter (name=transformer.h.29.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_attn.weight->transformer.layers.29.attn.c_attn.weight
Parameter (name=transformer.h.29.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_attn.bias->transformer.layers.29.attn.c_attn.bias
Parameter (name=transformer.h.29.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.attn.c_proj.weight->transformer.layers.29.attention.wo.weight
Parameter (name=transformer.h.29.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.ln_2.weight->transformer.layers.29.ffn_norm.weight
Parameter (name=transformer.h.29.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.w1.weight->transformer.layers.29.feed_forward.w1.weight
Parameter (name=transformer.h.29.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.w2.weight->transformer.layers.29.feed_forward.w3.weight
Parameter (name=transformer.h.29.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.29.mlp.c_proj.weight->transformer.layers.29.feed_forward.w2.weight
Parameter (name=transformer.h.30.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.ln_1.weight->transformer.layers.30.attention_norm.weight
Parameter (name=transformer.h.30.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_attn.weight->transformer.layers.30.attn.c_attn.weight
Parameter (name=transformer.h.30.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_attn.bias->transformer.layers.30.attn.c_attn.bias
Parameter (name=transformer.h.30.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.attn.c_proj.weight->transformer.layers.30.attention.wo.weight
Parameter (name=transformer.h.30.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.ln_2.weight->transformer.layers.30.ffn_norm.weight
Parameter (name=transformer.h.30.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.w1.weight->transformer.layers.30.feed_forward.w1.weight
Parameter (name=transformer.h.30.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.w2.weight->transformer.layers.30.feed_forward.w3.weight
Parameter (name=transformer.h.30.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.30.mlp.c_proj.weight->transformer.layers.30.feed_forward.w2.weight
Parameter (name=transformer.h.31.ln_1.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.ln_1.weight->transformer.layers.31.attention_norm.weight
Parameter (name=transformer.h.31.attn.c_attn.weight, shape=torch.Size([12288, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_attn.weight->transformer.layers.31.attn.c_attn.weight
Parameter (name=transformer.h.31.attn.c_attn.bias, shape=torch.Size([12288]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_attn.bias->transformer.layers.31.attn.c_attn.bias
Parameter (name=transformer.h.31.attn.c_proj.weight, shape=torch.Size([4096, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.attn.c_proj.weight->transformer.layers.31.attention.wo.weight
Parameter (name=transformer.h.31.ln_2.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.ln_2.weight->transformer.layers.31.ffn_norm.weight
Parameter (name=transformer.h.31.mlp.w1.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.w1.weight->transformer.layers.31.feed_forward.w1.weight
Parameter (name=transformer.h.31.mlp.w2.weight, shape=torch.Size([11008, 4096]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.w2.weight->transformer.layers.31.feed_forward.w3.weight
Parameter (name=transformer.h.31.mlp.c_proj.weight, shape=torch.Size([4096, 11008]), dtype=torch.float32, requires_grad=True)
name:  transformer.h.31.mlp.c_proj.weight->transformer.layers.31.feed_forward.w2.weight
Parameter (name=transformer.ln_f.weight, shape=torch.Size([4096]), dtype=torch.float32, requires_grad=True)
Parameter (name=lm_head.weight, shape=torch.Size([151936, 4096]), dtype=torch.float32, requires_grad=True)
Saving converted weights to /data/qwen/models/Qwen-7B-Chat/qwen-7b-chat.ckpt...
Done

配置路径,启动推理脚本。

cd /data/qwen/mindformers/research/qwen

export PYTHONPATH=/data/qwen/mindformers:$PYTHONPATH

python3 infer_qwen.py

/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.return self._float_to_str(self.smallest_subnormal)
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.setattr(self, word, getattr(machar, word).flat[0])
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.return self._float_to_str(self.smallest_subnormal)
[Warning]Can not find libascendalog.so
[Warning]Can not find libascendalog.so
Traceback (most recent call last):File "/data/qwen/mindformers/research/qwen/infer_qwen.py", line 4, in <module>from mindformers.trainer import TrainerFile "/data/qwen/mindformers/mindformers/__init__.py", line 17, in <module>from mindformers import core, auto_class, dataset, \File "/data/qwen/mindformers/mindformers/core/__init__.py", line 19, in <module>from .metric import build_metricFile "/data/qwen/mindformers/mindformers/core/metric/__init__.py", line 17, in <module>from .metric import *File "/data/qwen/mindformers/mindformers/core/metric/metric.py", line 37, in <module>from mindformers.models import BasicTokenizerFile "/data/qwen/mindformers/mindformers/models/__init__.py", line 21, in <module>from .blip2 import *File "/data/qwen/mindformers/mindformers/models/blip2/__init__.py", line 17, in <module>from .blip2_config import Blip2ConfigFile "/data/qwen/mindformers/mindformers/models/blip2/blip2_config.py", line 23, in <module>from mindformers.models.llama import LlamaConfigFile "/data/qwen/mindformers/mindformers/models/llama/__init__.py", line 18, in <module>from .llama import LlamaForCausalLM, LlamaForCausalLMWithLora, LlamaModelFile "/data/qwen/mindformers/mindformers/models/llama/llama.py", line 30, in <module>from mindspore.nn.layer.flash_attention import FlashAttentionFile "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/nn/layer/flash_attention.py", line 24, in <module>from mindspore.ops._op_impl._custom_op.flash_attention.flash_attention_impl import get_flash_attentionFile "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/__init__.py", line 17, in <module>from mindspore.ops._op_impl._custom_op.dsd_impl import dsd_matmulFile "/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/dsd_impl.py", line 17, in <module>from te import tikFile "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/te/__init__.py", line 128, in <module>from tbe import tvmFile "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/__init__.py", line 44, in <module>import tvmFile "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/__init__.py", line 26, in <module>from ._ffi.base import TVMError, __version__, _RUNTIME_ONLYFile "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/__init__.py", line 28, in <module>from .base import register_errorFile "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 72, in <module>_LIB, _LIB_NAME = _load_lib()File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/base.py", line 52, in _load_liblib_path = libinfo.find_lib_path()File "/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/tvm/_ffi/libinfo.py", line 147, in find_lib_pathraise RuntimeError(message)
RuntimeError: Cannot find the files.
List of candidates:
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm.so
/usr/local/Ascend/driver/libtvm.so
/data/qwen/mindformers/research/qwen/libtvm.so
/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm.so
/root/miniconda3/envs/mindspore2.2_py39/bin/libtvm.so
/root/miniconda3/condabin/libtvm.so
/usr/local/sbin/libtvm.so
/usr/local/bin/libtvm.so
/usr/sbin/libtvm.so
/usr/bin/libtvm.so
/usr/sbin/libtvm.so
/usr/bin/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm.so
/root/miniconda3/envs/mindspore2.2_py39/lib/python3.9/site-packages/mindspore/lib/plugin/cpu/libtvm_runtime.so
/usr/local/Ascend/driver/libtvm_runtime.so
/data/qwen/mindformers/research/qwen/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/bin/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/aarch64-linux/ccec_compiler/bin/libtvm_runtime.so
/root/miniconda3/envs/mindspore2.2_py39/bin/libtvm_runtime.so
/root/miniconda3/condabin/libtvm_runtime.so
/usr/local/sbin/libtvm_runtime.so
/usr/local/bin/libtvm_runtime.so
/usr/sbin/libtvm_runtime.so
/usr/bin/libtvm_runtime.so
/usr/sbin/libtvm_runtime.so
/usr/bin/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/python/site-packages/tbe/libtvm_runtime.so
/usr/local/Ascend/ascend-toolkit/7.0.RC1/libtvm_runtime.so

报错信息,应该是和配置芯片架构中缺少的文件,当前不做深入探究。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/286897.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

java线程池执行任务时异常被吃掉

问题 今天在测试环境通过线程池执行任务时突然被中断&#xff0c;跟踪日志发现代码跑到一半后面的日志就不再打印&#xff0c;而且也没有任何异常堆栈信息&#xff0c;也就是说程序执行被中断了&#xff0c;后面反复尝试经排查发现是线程池使用不当导致。 测试验证 我们用线程…

〖Python网络爬虫实战㊸〗- 极验滑块介绍(五)

订阅&#xff1a;新手可以订阅我的其他专栏。免费阶段订阅量1000 python项目实战 Python编程基础教程系列&#xff08;零基础小白搬砖逆袭) 说明&#xff1a;本专栏持续更新中&#xff0c;订阅本专栏前必读关于专栏〖Python网络爬虫实战〗转为付费专栏的订阅说明作者&#xff1…

光伏收益计算工具:助力可再生能源发展的关键

随着全球对可再生能源需求的不断增加&#xff0c;光伏发电作为清洁、可再生的能源形式&#xff0c;越来越受到人们的关注。然而&#xff0c;要评估光伏系统的经济效益和投资回报&#xff0c;需要一个准确的光伏收益计算工具。 光伏收益计算工具是一种专门用于计算光伏系统发电量…

vue el-date-picker中datetime类型对今天之后的日期包含时分禁用

vue el-date-picker中datetime类型对今天之后的日期包含时分禁用 目前对选择秒那一列未禁用 <template><div><el-date-pickerv-model"deactivateTime"type"datetime"format"yyyy-MM-dd HH:mm:ss"value-format"yyyy-MM-dd HH…

MATLAB 点云中心化 (40)

MATLAB 点云中心化 一、算法介绍二、算法实现一、算法介绍 使用点云集合中的坐标计算质心,这里将其作为中心,将每个点坐标减去该中心坐标,即可得到中心化的点云,这在很多处理中是必须进行的一个步骤:相当于点云移动到以质心为原点的坐标系 (主要是计算质心和点云偏移两个…

带你学C语言~指针(2)

目录 &#x1f3c9;前言 &#x1f680; 数组名的理解 &#x1f680;使用指针访问数组 ✈一维数组传参的本质 ✈冒泡排序 &#x1f3c6;二级指针 &#x1f3c6;指针数组 &#x1f3c6;指针数组模拟二维数组 &#x1f389;结束语 &#x1f3c9;前言 上一章&#xff0c;小…

Java基础回顾——异常处理

文章目录 介绍语法抛出异常方法定义使用throws多catchfinally 自定义异常断言logging其他日志库commons loggingLog4j 介绍 一些错误是用户造成的&#xff08;类型输入错误&#xff09;&#xff0c;一些错误的随机出现&#xff08;网络终端、内存耗尽。。。&#xff09;&#…

20倍压缩比!微软提出大模型提示压缩框架LLMLingua

近期&#xff0c;越来越多研究在探索大型语言模型&#xff08;LLM&#xff09;在实际应用中的推理和生成能力。随着 ChatGPT 等模型的广泛研究与应用&#xff0c;如何在保留关键信息的同时&#xff0c;压缩较长的提示成为当前大模型研究的问题之一。 为了加速模型推理并降低成本…

题目讲解(1到5)

1 编写一个能够输出 Hello,World! 的程序。 提示&#xff1a; 使用英文标点符号&#xff1b;Hello,World! 逗号后面没有空格。H 和 W 为大写字母。 输入格式 无 输出格式 无 输入输出样例 输入 #1复制 无 输出 #1复制 Hello,World! 这个不会写没啥讲的自裁吧,程序…

C : DS二叉排序树之删除(详细思路解答)

Description 给出一个数据序列&#xff0c;建立二叉排序树&#xff0c;并实现删除功能 对二叉排序树进行中序遍历&#xff0c;可以得到有序的数据序列 Input 第一行输入t&#xff0c;表示有t个数据序列 第二行输入n&#xff0c;表示首个序列包含n个数据 第三行输入n个数据…

JS逆向实战——开发者工具检测

说明&#xff1a;仅供学习使用&#xff0c;请勿用于非法用途&#xff0c;若有侵权&#xff0c;请联系博主删除 作者&#xff1a;zhu6201976 一、背景 在JS逆向领域&#xff0c;Chrome开发者工具是核心&#xff0c;抓包、调试、看调用栈等都离不开它。可以说&#xff0c;逆向人…

Linux基础入门笔记

Linux基础入门笔记&#xff0c;具体可见下载链接 大家阅读时可善用目录功能&#xff0c;可以提高大家的阅读效率 下载地址&#xff1a;Linux笔记 初始linux 操作系统&#xff1a;操作系统是用户和计算机硬件之间的桥梁&#xff0c;调度和管理计算机硬件进行工作常见操作系统 W…