Performance monitoring for a locally deployed large language model, based on the data returned by http://ip/metrics

Option 1: a front-end UI built with JavaScript and Chart.js that renders the charts and visualizes the data dynamically. The front-end code is too long to show here; anyone interested can write it with the help of an AI assistant.

Option 2: deploy Prometheus + Grafana and render the data returned by http://ip/metrics in Grafana. The result looks like this:

[Image]

The rest of this post walks through Option 2, monitoring LLM performance with Prometheus + Grafana.

1. Prometheus server deployment

# Deploy
tar -xzvf prometheus-2.37.5.linux-amd64.tar.gz
mv prometheus-2.37.5.linux-amd64 /usr/local/prometheus/
cd /usr/local/prometheus/
vim prometheus.yml
nohup ./prometheus --config.file=/usr/local/prometheus/prometheus.yml &

# Test: open http://ip:9090 in a browser

[Image]

# After changing the configuration file, restart the service
vim /usr/local/prometheus/prometheus.yml
pkill prometheus
cd /usr/local/prometheus/
nohup ./prometheus --config.file=/usr/local/prometheus/prometheus.yml &

[Image]

2. Grafana web deployment

# Deploy; in an intranet environment, obtain the rpm package in advance
# Install with either rpm or yum (both install the same package; one is enough)
rpm -ivh grafana-enterprise-9.3.2-1.x86_64.rpm
yum install -y --nogpgcheck grafana-enterprise-9.3.2-1.x86_64.rpm
# Start the service, enable it at boot, and check its status
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo systemctl status grafana-server
# Follow the log and review the configuration file
tail -fn200 /var/log/grafana/grafana.log
cat /etc/grafana/grafana.ini
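Before testing in a browser, both services deployed above can also be probed from the shell. The sketch below is only an illustration: it assumes the default ports (9090 for Prometheus, 3000 for Grafana) and uses each service's health endpoint (`/-/healthy` and `/api/health`). `HOST` is a placeholder for the server address, and `probe` is a helper name chosen here, not something from the original setup.

```shell
#!/usr/bin/env bash
# Assumption: HOST is a placeholder; replace it with the server IP.
HOST="${HOST:-127.0.0.1}"

# probe URL: succeeds only if the endpoint answers within 3 seconds
# without a connection error or an HTTP error status.
probe() {
  curl -fs --max-time 3 -o /dev/null "$1"
}

for url in "http://${HOST}:9090/-/healthy" "http://${HOST}:3000/api/health"; do
  if probe "$url"; then
    echo "UP   $url"
  else
    echo "DOWN $url"
  fi
done
```

If a service shows DOWN, the Grafana log mentioned above or the Prometheus nohup.out file is probably the first place to look.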
# Test: open http://ip:3000 in a browser

[Image]

3. Integrating Grafana with Prometheus

- Edit prometheus.yml (see "Prometheus server deployment") and add the http://ip/metrics address of the LLM to be monitored. First make sure this URL opens and returns monitoring data in the Prometheus format, as shown below.

[Image]

# Test: log in to Prometheus at http://ip:9090 and open Status > Targets. The configured target should be listed with a green UP state. If Graph then shows monitoring data, scraping works.

[Image]

[Image]

- In Grafana, go to Settings > Add data source, choose the data source type that matches your setup, change the Name, and fill in the Prometheus URL.

[Image]

[Image]
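The two integration steps above (a scrape job in prometheus.yml and a Prometheus data source in Grafana) can also be written as files. The following is a minimal sketch under stated assumptions, not the configuration used in this post: the job name `llm-metrics`, the 15s interval, the target `ip:80` (port 80 is implied by http://ip/metrics) and the output directory are all placeholders. The second file uses Grafana's data source provisioning format as an alternative to the UI steps.

```shell
#!/usr/bin/env bash
set -e

# Assumptions: every value below is a placeholder, not taken from the original setup.
LLM_TARGET="${LLM_TARGET:-ip:80}"          # host:port that serves /metrics
PROM_URL="${PROM_URL:-http://ip:9090}"     # Prometheus address as seen by Grafana
OUT_DIR="${OUT_DIR:-./monitoring-conf}"    # files are written here, not installed

mkdir -p "$OUT_DIR"

# Scrape job for the LLM metrics endpoint (to be merged into prometheus.yml).
cat > "$OUT_DIR/prometheus.yml" <<EOF
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "llm-metrics"
    metrics_path: /metrics
    static_configs:
      - targets: ["${LLM_TARGET}"]
EOF

# Grafana data source provisioning file (alternative to Add data source in the UI).
cat > "$OUT_DIR/prometheus-datasource.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${PROM_URL}
    isDefault: true
EOF

echo "wrote $OUT_DIR/prometheus.yml and $OUT_DIR/prometheus-datasource.yaml"
```

After review, the first file would go to /usr/local/prometheus/prometheus.yml, followed by the restart described earlier, and the second to /etc/grafana/provisioning/datasources/, followed by a restart of grafana-server. The promtool binary shipped in the Prometheus tarball can validate the first file with `promtool check config prometheus.yml`.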
Once the data source is configured, click Dashboards to view it.

[Image]

[Image]

4. Building a Dashboard by hand

No suitable template was found, so a custom Dashboard was built from the metrics returned by the LLM's http://ip/metrics endpoint:

'num_requests_running': 'Number of requests currently being processed',
'num_requests_waiting': 'Number of requests currently waiting to be processed',
'num_requests_swapped': 'Number of requests swapped out to CPU',
'request_received_total': 'Total number of requests received',
'request_success_total': 'Total number of requests processed successfully',
'request_failed_total': 'Total number of failed requests',
'avg_prompt_throughput_toks_per_s': 'Average number of input tokens processed per second',
'avg_generation_throughput_toks_per_s': 'Average number of output tokens generated per second',
'npu_cache_usage_perc': 'NPU cache usage',
'cpu_cache_usage_perc': 'CPU cache usage',
'npu_prefix_cache_hit_rate': 'NPU prefix cache hit rate',
'failed_request_perc': 'Request failure rate',
'prompt_tokens_total': 'Total number of input tokens',
'generation_tokens_total': 'Total number of output tokens',
'num_preemptions_total': 'Cumulative number of engine preemptions',
'time_to_first_token_seconds': 'Time to first token',
'time_per_output_token_seconds': 'Time per output token',
'e2e_request_latency_seconds': 'End-to-end request latency',
'request_prompt_tokens': 'Number of input tokens per request',
'request_generation_tokens': 'Number of output tokens per request'

Add a panel for each metric one by one, as shown below.

[Image]

[Image]
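When adding panels, it can help to try the expressions against Prometheus first. The sketch below sends a few candidate PromQL expressions to the Prometheus HTTP API (`/api/v1/query`). It rests on several assumptions: the metric names are used exactly as listed above (a real deployment may expose them with a prefix, so check the /metrics output), the `*_total` metrics are counters, and the latency metrics are histograms with `_bucket` series. The panel titles, the 5m window and the `prom_query` helper are illustrative choices.

```shell
#!/usr/bin/env bash
# Assumption: placeholder address; replace with http://ip:9090.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# Hypothetical panel expressions, one "title|PromQL" pair per line.
PANELS='Running requests|num_requests_running
Waiting requests|num_requests_waiting
Successful requests per second|rate(request_success_total[5m])
Output tokens per second|rate(generation_tokens_total[5m])
P95 time to first token|histogram_quantile(0.95, sum by (le) (rate(time_to_first_token_seconds_bucket[5m])))'

# prom_query EXPR: run an instant query and print the JSON response.
prom_query() {
  curl -fs --max-time 5 -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

printf '%s\n' "$PANELS" | while IFS='|' read -r title expr; do
  echo "== ${title}: ${expr}"
  prom_query "$expr" || echo "(query failed; is ${PROM_URL} reachable?)"
  echo
done
```

An expression that returns a non-empty result here can be pasted into the query field of a Grafana panel. Gauges such as num_requests_running are plotted directly, while counters are usually wrapped in rate() so the panel shows a per-second value instead of an ever-growing total.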