Deployment on Ascend server NPU cards: MindIE container image + large model

Service ports used in this deployment:
    "port" : 1025,
    "managementPort" : 1026,
    "metricsPort" : 1027,

Model file location:
    cd /modelscope/QWQ-32B

The mindie20t17 container image requires installing 4.51.0_transformers.tar. (The T18 image has been updated and no longer needs this step.)
Inside the container:
    tar xf 4.51.0_transformers.tar
    pip install --no-index --find-links=./transformers_offline_packages transformers==4.51.0

When using a self-built image that runs as an ordinary (non-root) user, and to limit container privilege risks, the container can be started with the devices mapped explicitly:
    docker run -it -d --net=host --shm-size=500g \
        --name QWQ-32B-test \
        --device=/dev/davinci_manager \
        --device=/dev/hisi_hdc \
        --device=/dev/devmm_svm \
        --device=/dev/davinci0 \
        --device=/dev/davinci1 \
        --device=/dev/davinci2 \
        --device=/dev/davinci3 \
        --device=/dev/davinci4 \
        --device=/dev/davinci5 \
        --device=/dev/davinci6 \
        --device=/dev/davinci7 \
        --privileged=true \
        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
        -v /usr/local/sbin:/usr/local/sbin:ro \
        -v /modelscope/QWQ-32B:/modelscope/QWQ-32B \
        mindie20t18:latest bash
Note that the command as recorded still passes --privileged=true, which grants the container full access to host devices; drop that flag if the explicit --device mappings are sufficient in your environment.

Enter the container:
    docker exec -it <container_name> bash     # e.g. QWQ-32B-test

    # Set the CANN environment variables
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    # Disable virtual memory (expandable segments)
    export PYTORCH_NPU_ALLOC_CONF=expandable_segments:False

Service-based inference

Open the configuration file:
    vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json

Edit the configuration file. The // notes below are annotations for this guide; standard JSON does not allow comments, so leave them out of the real file.
[root@ip241 /]# cat /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
{
    "Version" : "1.0.0",
    "LogConfig" :
    {
        "logLevel" : "Info",
        "logFileSize" : 20,
        "logFileNum" : 20,
        "logPath" : "logs/mindie-server.log"
    },

    "ServerConfig" :
    {
        "ipAddress" : "10.32.xx.111",
        "managementIpAddress" : "10.32.xx.111",
        "port" : 1025,
        "managementPort" : 1026,
        "metricsPort" : 1027,
        "allowAllZeroIpListening" : false,
        "maxLinkNum" : 1000,
        "httpsEnabled" : false,           // HTTPS disabled
        "fullTextEnabled" : false,
        "tlsCaPath" : "security/ca/",
        "tlsCaFile" : ["ca.pem"],
        "tlsCert" : "security/certs/server.pem",
        "tlsPk" : "security/keys/server.key.pem",
        "tlsPkPwd" : "security/pass/key_pwd.txt",
        "tlsCrlPath" : "security/certs/",
        "tlsCrlFiles" : ["server_crl.pem"],
        "managementTlsCaFile" : ["management_ca.pem"],
        "managementTlsCert" : "security/certs/management/server.pem",
        "managementTlsPk" : "security/keys/management/server.key.pem",
        "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
        "managementTlsCrlPath" : "security/management/certs/",
        "managementTlsCrlFiles" : ["server_crl.pem"],
        "kmcKsfMaster" : "tools/pmt/master/ksfa",
        "kmcKsfStandby" : "tools/pmt/standby/ksfb",
        "inferMode" : "standard",
        "interCommTLSEnabled" : true,
        "interCommPort" : 1121,
        "interCommTlsCaPath" : "security/grpc/ca/",
        "interCommTlsCaFiles" : ["ca.pem"],
        "interCommTlsCert" : "security/grpc/certs/server.pem",
        "interCommPk" : "security/grpc/keys/server.key.pem",
        "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
        "interCommTlsCrlPath" : "security/grpc/certs/",
        "interCommTlsCrlFiles" : ["server_crl.pem"],
        "openAiSupport" : "vllm"
    },

    "BackendConfig" : {
        "backendName" : "mindieservice_llm_engine",
        "modelInstanceNumber" : 1,
        "npuDeviceIds" : [[0,1,2,3]],       // adjust to the NPU cards actually available
        "tokenizerProcessNumber" : 8,
        "multiNodesInferEnabled" : false,
        "multiNodesInferPort" : 1120,
        "interNodeTLSEnabled" : true,
        "interNodeTlsCaPath" : "security/grpc/ca/",
        "interNodeTlsCaFiles" : ["ca.pem"],
        "interNodeTlsCert" : "security/grpc/certs/server.pem",
        "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
        "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
        "interNodeTlsCrlPath" : "security/grpc/certs/",
        "interNodeTlsCrlFiles" : ["server_crl.pem"],
        "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
        "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
        "ModelDeployConfig" :
        {
            "maxSeqLen" : 102400,           // maximum total sequence length (input + output)
            "maxInputTokenLen" : 71680,     // maximum number of input tokens
            "truncation" : false,
            "ModelConfig" : [
                {
                    "modelInstanceType" : "Standard",
                    "modelName" : "QwQ-32B",
                    "modelWeightPath" : "/modelscope/QWQ-32B",
                    "worldSize" : 4,          // must match the number of NPU cards in npuDeviceIds above
                    "cpuMemSize" : 5,
                    "npuMemSize" : -1,
                    "backendType" : "atb",
                    "trustRemoteCode" : false
                }
            ]
        },

        "ScheduleConfig" :
        {
            "templateType" : "Standard",
            "templateName" : "Standard_LLM",
            "cacheBlockSize" : 128,

            "maxPrefillBatchSize" : 50,
            "maxPrefillTokens" : 86016,        // maximum number of input tokens allowed in prefill
            "prefillTimeMsPerReq" : 150,
            "prefillPolicyType" : 0,

            "decodeTimeMsPerReq" : 50,
            "decodePolicyType" : 0,

            "maxBatchSize" : 200,
            "maxIterTimes" : 30720,      // bounds the model's maximum output length; here equal to maxSeqLen - maxInputTokenLen
            "maxPreemptCount" : 0,
            "supportSelectBatch" : false,
            "maxQueueDelayMicroseconds" : 5000
        }
    }
}

Load the environment variables inside the container:
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    source /usr/local/Ascend/mindie/set_env.sh
    source /usr/local/Ascend/nnal/atb/set_env.sh
    source /usr/local/Ascend/atb-models/set_env.sh

Set permissions on the model path (when running as a non-root user, the directory itself also needs the execute bit, e.g. 750, to remain traversable):
    chmod -R 640 /modelscope/QWQ-32B/

Start the service:
    cd /usr/local/Ascend/mindie/latest/mindie-service/bin
    nohup ./mindieservice_daemon &

Stop the service (this pattern also kills every other python process in the container):
    pkill -9 'mindie|python'

Test from a new terminal (vLLM interface):
    curl http://10.32.xx.111:1025/generate -d '{"prompt": "你是谁?","max_tokens": 100,"stream": false,"do_sample":true,"repetition_penalty": 1.00,"temperature": 0.01,"top_p": 0.001,"top_k": 1,"model": "llama"}'

Common issues

ImportError: cannot import name 'shard_checkpoint' from 'transformers.modeling_utils'.
Downgrading transformers resolves this:
    pip install transformers==4.46.3 --force-reinstall
    pip install numpy==1.26.4 --force-reinstall

============================================================================
Monitoring API calls

In the terminal that starts the service, export this environment variable:
    export MIES_SERVICE_MONITOR_MODE=1

Then query the metrics port:
    curl -H "Accept: application/json" -H "Content-type: application/json" -X GET http://10.32.xx.111:1027/metrics

============ Performance test ====== run inside the container ============
    export MINDIE_LOG_TO_STDOUT="benchmark:1; client:1"   # print the results to the screen when the test finishes

Benchmark configuration directory:
    cd /usr/local/lib/python3.11/site-packages/mindiebenchmark/config

[root@ip241 config]# cat config.json
{
    "LogConfig": {
        "LOG_PATH": "~/mindie/log",
        "LOG_TO_FILE": "1",
        "LOG_TO_STDOUT": "benchmark:1; client:1",
        "LOG_LEVEL": "INFO",
        "LOG_VERBOSE": "true",
        "LOG_ROTATE": "-fs 20 -r 10"
    },
    "CertConfig": {
        "CA_CERT": "/path/to/cacert.pem",
        "KEY_FILE": "/path/to/client.pem.key",
        "CERT_FILE": "/path/to/client.pem",
        "CRL_FILE": "/path/to/crl.pem"
    },
    "OutputConfig": {
        "INSTANCE_PATH": "./instance"
    },
    "ServerConfig": {
        "ENABLE_MANAGEMENT": false,
        "MAX_LINK_NUM": 1000
    }
}

[root@ip241 config]# cat synthetic_config.json
{
    "Input":{
        "Method": "uniform",
        "Params": {"MinValue": 1, "MaxValue": 200}
    },
    "Output": {
        "Method": "gaussian",
        "Params": {"Mean": 100, "Var": 200, "MinValue": 1, "MaxValue": 100}
    },
    "RequestCount": 100
}

[root@ip241 config]# cat /usr/local/lib/python3.11/site-packages/mindieclient/python/config/config.json
{
    "LogConfig": {
        "LOG_PATH": "~/mindie/log",
        "LOG_TO_FILE": "1",
        "LOG_TO_STDOUT": "benchmark:1; client:1",
        "LOG_LEVEL": "INFO",
        "LOG_VERBOSE": "true",
        "LOG_ROTATE": "-fs 20 -r 10"
    }
}

Set the three configuration files to mode 640:
    chmod 640 /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json
    chmod 640 /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/config.json
    chmod 640 /usr/local/lib/python3.11/site-packages/mindieclient/python/config/config.json

Enter the QWQ-32B container and run the performance test. --ModelPath must point to the weight directory as seen inside the container (this guide mounts it at /modelscope/QWQ-32B), so adjust the path below to your environment:
    benchmark --DatasetType "synthetic" --ModelName QwQ-32B \
        --ModelPath "/iflytek/modelscope/QWQ32B" \
        --TestType vllm_client \
        --Http http://10.32.xx.111:1025 \
        --ManagementHttp http://10.32.xx.111:1026 \
        --Concurrency 128 --MaxOutputLen 512 \
        --TaskKind stream --Tokenizer True \
        --SyntheticConfigPath /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json

[Image: benchmark output] https://jdc100.huawei.com/CommunityGatewayService/com.huawei.ipd.sppm.jdcforum:JDCCommunityUserService/CommunityUserService/user/attachment/v1/download?aid=2009293169388834816

Common Metric            Value                 Description
CurrentTime              2025-03-31 12:15:17   Current time, i.e. when the performance test finished.
TimeElapsed              5.5949 s              Total duration of the performance test, in seconds.
DataSource               None                  Data source; None means no specific dataset was used.
Failed                   0( 0.0% )             Number and share of failed requests. No request failed in this run.
Returned                 100( 100.0% )         Number and share of requests that returned successfully. All 100 returned.
Total                    100[ 100.0% ]         Total number of requests sent. 100 were sent and all succeeded.
Concurrency              128                   Number of requests sent concurrently.
ModelName                QwQ-32B               Name of the model under test.
lpct                     7.9003 ms             Average latency per request (Latency Per Client Time), in milliseconds.
Throughput               17.8733 req/s         Requests processed per second.
GenerateSpeed            1687.6008 token/s     Tokens generated by the model per second.
GenerateSpeedPerClient   13.1844 token/s       Tokens generated per second per client (matches GenerateSpeed / Concurrency).
accuracy                 /                     Accuracy. No accuracy data was produced in this run, so it shows "/".

Note on lpct: 7.9 ms is far below the per-request time implied by a 5.59 s run of 100 requests, so verify the exact definition of this metric in the MindIE Benchmark documentation before relying on the description above.

    cat /usr/local/lib/python3.11/site-packages/mindiebenchmark/config/synthetic_config.json

[Image: synthetic_config.json contents] https://jdc100.huawei.com/CommunityGatewayService/com.huawei.ipd.sppm.jdcforum:JDCCommunityUserService/CommunityUserService/user/attachment/v1/download?aid=2009293169388834817

Explanation of synthetic_config.json

1. Input
    - Method: how the input lengths are generated. Here it is "uniform", i.e. a uniform distribution.
    - Params: parameters of the uniform distribution.
        - MinValue: minimum of the uniform distribution, here 1.
        - MaxValue: maximum of the uniform distribution, here 200.
2. Output
    - Method: how the output lengths are generated. Here it is "gaussian", i.e. a Gaussian (normal) distribution.
    - Params: parameters of the Gaussian distribution.
        - Mean: mean of the distribution, here 100.
        - Var: variance of the distribution, here 200.
        - MinValue: lower truncation bound, here 1.
        - MaxValue: upper truncation bound, here 100.
3. RequestCount
    - RequestCount: number of requests, i.e. how many data points to generate, here 100.
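
To make the synthetic_config.json settings described above more concrete, here is a minimal Python sketch that draws (input length, output length) pairs from the same parameters. It is not the mindiebenchmark implementation: the helper names `sample_length` and `sample_requests` are hypothetical, and the sketch assumes that Gaussian draws outside [MinValue, MaxValue] are clamped to the bounds, which the notes above do not confirm.

```python
import json
import math
import random

# Same values as the synthetic_config.json shown in this guide.
SYNTHETIC_CONFIG = json.loads("""
{
    "Input": {"Method": "uniform", "Params": {"MinValue": 1, "MaxValue": 200}},
    "Output": {"Method": "gaussian",
               "Params": {"Mean": 100, "Var": 200, "MinValue": 1, "MaxValue": 100}},
    "RequestCount": 100
}
""")


def sample_length(spec, rng):
    """Draw one token length from an Input/Output spec (illustrative only)."""
    params = spec["Params"]
    low, high = params["MinValue"], params["MaxValue"]
    if spec["Method"] == "uniform":
        return rng.randint(low, high)  # inclusive on both ends
    if spec["Method"] == "gaussian":
        std = math.sqrt(params["Var"])  # Var is a variance, so take its square root
        value = round(rng.gauss(params["Mean"], std))
        # Assumption: out-of-range draws are clamped, not re-sampled.
        return min(max(value, low), high)
    raise ValueError(f"unsupported Method: {spec['Method']}")


def sample_requests(config, seed=0):
    """Return RequestCount pairs of (input_length, output_length)."""
    rng = random.Random(seed)
    return [
        (sample_length(config["Input"], rng), sample_length(config["Output"], rng))
        for _ in range(config["RequestCount"])
    ]


if __name__ == "__main__":
    pairs = sample_requests(SYNTHETIC_CONFIG)
    outputs = [out_len for _, out_len in pairs]
    print("requests:", len(pairs))
    print("mean input length:", sum(in_len for in_len, _ in pairs) / len(pairs))
    print("mean output length:", sum(outputs) / len(outputs))
    print("outputs at the MaxValue cap:", sum(o == 100 for o in outputs))
```

Because Mean equals MaxValue (both 100), roughly half of the Gaussian draws land above the cap. Under the clamping assumption many requests would therefore ask for exactly 100 output tokens, and the expected output length would be about 94 tokens. That is close to the roughly 94 tokens per request implied by the results table (1687.6 token/s × 5.59 s ÷ 100 requests), although this agreement does not by itself confirm how the tool samples.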