my benchmark

Data retrieval

    During the tests, several commands were run to gather information about the system. Each command was run in its own shell on the benchmark server, and its output was saved to a file under /tmp:

dstat -M swengine2 -M php -cdmnt | tee /tmp/fisier.dstat

iostat -xm 1 | tee /tmp/fisier.iostat

iostat -tm 1 | tee /tmp/fisier.iops

while true ; do echo "`date +%F-%T` `cat /proc/loadavg`" >> /tmp/fisier.load && sleep 1; done

    Calculate erlangs
    Edit fisier.dstat and delete the rows at the start of the file whose first column is 0 (there may be a short delay between running the command and the server actually starting to receive calls). Likewise, delete the rows at the end of the file whose first column is 0 (there may be a delay between the server stopping call processing and the monitoring being stopped). Then average the non-zero values of column 1:

    awk '$1 + 0 != 0 {++count; sum+=$1} END {print sum/count}' fisier.dstat
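
    The manual trimming can also be scripted. The sketch below is a hypothetical helper, erlang_avg, that is not part of the original procedure. It assumes the value of interest is a plain number in column 1 of fisier.dstat and that dstat header lines do not start with a number. The one-liner above discards every row whose column 1 is 0. This helper drops only the leading and trailing 0 samples and keeps any zeros recorded in the middle of the run.

```shell
#!/bin/sh
# erlang_avg FILE -- hypothetical helper (not from the original procedure).
# Averages column 1 of a dstat log after removing only the leading and
# trailing 0 samples; zeros in the middle of the run are kept.
erlang_avg() {
  awk '
    # keep rows whose column 1 is a plain number; header lines and
    # values with unit suffixes are skipped
    $1 ~ /^[0-9]+(\.[0-9]+)?$/ { v[++n] = $1 + 0 }
    END {
      first = 1
      while (first <= n && v[first] == 0) first++   # skip leading idle samples
      last = n
      while (last >= first && v[last] == 0) last--  # skip trailing idle samples
      if (last < first) { print "no non-zero samples" > "/dev/stderr"; exit 1 }
      for (i = first; i <= last; i++) sum += v[i]
      print sum / (last - first + 1)
    }' "$1"
}
```

    Usage: erlang_avg /tmp/fisier.dstat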

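    Calculate CPU averages (the avg-cpu values of every report in fisier.iostat; note that iostat's first report covers the time since the system booted, not just the test)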

    awk '{user[NR] = $1; nice[NR] = $2; asystem[NR] = $3; iowait[NR] = $4; steal[NR] = $5; idle[NR] = $6} /Device/ {++count; i = NR-2; sumiowait+=iowait[i]; sumuser+=user[i]; sumnice+=nice[i]; sumasystem+=asystem[i]; sumsteal+=steal[i]; sumidle+=idle[i];} END {print "Averages\n| %iowait | "sumiowait/count" |\n| %user   | "sumuser/count" |\n| %nice   | "sumnice/count" |\n| %system | "sumasystem/count" |\n| %steal  | "sumsteal/count" |\n| %idle   | "sumidle/count" |"}' fisier.iostat
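
    The one-liner above keeps every line of the log in memory and also counts iostat's first report, which covers the time since boot. As an alternative, here is a sketch of a hypothetical helper, cpu_avg. It assumes the sysstat iostat layout, in which the six avg-cpu values sit on the line directly after each avg-cpu header. The helper reads only that line and skips the first report:

```shell
#!/bin/sh
# cpu_avg FILE -- hypothetical helper (not from the original procedure).
# Averages the avg-cpu values of an `iostat -x` log. It reads the line
# right after each "avg-cpu" header instead of storing every line, and
# skips the first report, which iostat computes over the time since boot.
cpu_avg() {
  awk '
    /avg-cpu/ {
      if ((getline line) <= 0) next   # values line that follows the header
      if (++reports == 1) next        # first report = statistics since boot
      split(line, f)                  # f[1..6] = %user %nice %system %iowait %steal %idle
      for (k = 1; k <= 6; k++) sum[k] += f[k]
      n++
    }
    END {
      if (n == 0) { print "no reports after the first" > "/dev/stderr"; exit 1 }
      split("%user %nice %system %iowait %steal %idle", name)
      for (k = 1; k <= 6; k++) printf "| %-8s | %.2f |\n", name[k], sum[k] / n
    }' "$1"
}
```

    Usage: cpu_avg /tmp/fisier.iostat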

    Calculate %CPU Usage

    Subtract %idle (from the output of the command above) from 100 to get the %CPU usage: %CPU usage = 100 - %idle.
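
    The subtraction can be done in the same pipeline. The helper below is a hypothetical sketch named cpu_usage. It assumes the table format printed by the averages command, where each row looks like "| %idle   | value |". Pipe the output of the averages command into it to get a single %CPU usage figure:

```shell
#!/bin/sh
# cpu_usage -- hypothetical helper: reads the "| label | value |" Averages
# table from stdin and prints 100 - %idle, i.e. the %CPU usage.
cpu_usage() {
  # with "|" as separator, $2 is the label cell and $3 the value cell
  awk -F'|' '$2 ~ /%idle/ { printf "%.2f\n", 100 - $3 }'
}
```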

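    Load average (mean 1-minute load: column 1 of fisier.load is the timestamp, column 2 the 1-minute load average from /proc/loadavg)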

    awk '{++count; sum+=$2} END {print (sum/count)}' fisier.load
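
    If the peak load matters as well as the mean, the hypothetical helper load_stats below reports both for the 1-minute load. It is a sketch that assumes the fisier.load layout produced by the loop above: a timestamp followed by the fields of /proc/loadavg.

```shell
#!/bin/sh
# load_stats FILE -- hypothetical helper: mean and peak of the 1-minute
# load average in a fisier.load-style log ("timestamp load1 load5 load15 ...").
load_stats() {
  awk '
    NF >= 2 {
      n++; sum += $2
      if (n == 1 || $2 + 0 > peak) peak = $2 + 0   # track the highest 1-minute load
    }
    END {
      if (n == 0) exit 1                           # empty log
      printf "mean %.2f\npeak %.2f\n", sum / n, peak
    }' "$1"
}
```

    Usage: load_stats /tmp/fisier.load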

    Transactions per second and read/write throughput (replace xvda with the name of the disk or partition you want results for, e.g. sda, sdb1, sda6):

    awk '$1 ~ /^xvda$/ {++count; total+=$2; rps+=$3; wps+=$4} END {print total/count " average transactions per second\n"rps/count" average MB_read/s\n"wps/count" average MB_wrtn/s"}' fisier.iops
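
    The same averages can be wrapped in a hypothetical helper, iops_avg, which takes the device name as a parameter so you don't have to edit the regular expression. As a sketch, it assumes that the device rows of iostat -tm have tps, MB_read/s and MB_wrtn/s in columns 2 to 4. It also skips the device's first report, which covers the time since boot.

```shell
#!/bin/sh
# iops_avg DEVICE FILE -- hypothetical helper: averages tps, MB_read/s and
# MB_wrtn/s (columns 2-4) for one device in an `iostat -tm` log, skipping
# the device's first report, which covers the time since boot.
iops_avg() {
  awk -v dev="$1" '
    $1 == dev {
      if (++seen == 1) next            # first report = statistics since boot
      n++; tps += $2; rd += $3; wr += $4
    }
    END {
      if (n == 0) { print "no samples for " dev > "/dev/stderr"; exit 1 }
      printf "%.2f tps\n%.2f MB_read/s\n%.2f MB_wrtn/s\n", tps / n, rd / n, wr / n
    }' "$2"
}
```

    Usage: iops_avg sda /tmp/fisier.iops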

    In the Workdir folder, open PRODUCT/<virtual IP of the server>/tmp/manager/dstat.csv, delete the first 7 rows (the header rows), then save the result as dstat.csv.1.
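
    The row deletion can be done with tail instead of an editor. strip_dstat_header is a hypothetical name. The sketch assumes, as the step above does, that exactly the first 7 rows must be removed:

```shell
#!/bin/sh
# strip_dstat_header FILE -- hypothetical helper: writes FILE without its
# first 7 rows to FILE.1 (e.g. dstat.csv -> dstat.csv.1).
strip_dstat_header() {
  tail -n +8 "$1" > "$1.1"   # output starts at row 8
}
```

    Usage, from the manager directory: strip_dstat_header dstat.csv
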
    Refused calls (percentage)

    awk -F, '{if($3 > count) count=$3} {if($4 > max) max=$4} END {if (count > 0) printf "%.3f%%\n", (max/count)*100}' dstat.csv.1
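
    A slightly more defensive variant is sketched below as the hypothetical helper refused_pct. It does the same calculation but ignores rows whose columns 3 and 4 are not plain numbers, so quoted header rows are skipped, and it will not divide by zero. Like the one-liner, it assumes the largest values in columns 3 and 4 are the total and refused call counts:

```shell
#!/bin/sh
# refused_pct FILE -- hypothetical helper: largest value in column 4 divided
# by largest value in column 3, as a percentage. Rows whose columns 3 and 4
# are not plain numbers (e.g. quoted CSV headers) are ignored.
refused_pct() {
  awk -F, '
    $3 ~ /^[0-9.]+$/ && $4 ~ /^[0-9.]+$/ {
      if ($3 + 0 > calls) calls = $3 + 0          # total calls
      if ($4 + 0 > refused) refused = $4 + 0      # refused calls
    }
    END {
      if (calls == 0) { print "no calls recorded" > "/dev/stderr"; exit 1 }
      printf "%.3f%%\n", refused / calls * 100
    }' "$1"
}
```

    Usage: refused_pct dstat.csv.1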

    Memory used

    awk -F, '{++count; sum+=$18} END {print (sum/count)/1024/1024" MB"}' dstat.csv.1

    Network received (average rate)

    awk -F, '{++count; sum+=$13} END {print (sum/count)/1024/1024" MB/s"}' dstat.csv.1

    Network sent (average rate)

    awk -F, '{++count; sum+=$14} END {print (sum/count)/1024/1024" MB/s"}' dstat.csv.1
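
    The three commands above differ only in the column number, so they can be combined into one hypothetical helper, csv_col_avg. As a sketch, it averages the numeric cells of the given column, skips empty or non-numeric cells, and divides by 1024*1024 as the one-liners do:

```shell
#!/bin/sh
# csv_col_avg FILE COLUMN -- hypothetical helper: averages one numeric column
# of a CSV file and prints it divided by 1024*1024, as the one-liners above
# do for columns 18 (memory used), 13 (network received) and 14 (sent).
csv_col_avg() {
  awk -F, -v col="$2" '
    $col ~ /^[0-9.]+$/ { n++; sum += $col }   # skip empty or non-numeric cells
    END {
      if (n == 0) { print "no numeric values in column " col > "/dev/stderr"; exit 1 }
      printf "%.2f\n", sum / n / 1024 / 1024
    }' "$1"
}
```

    Usage:

    csv_col_avg dstat.csv.1 18    # memory used
    csv_col_avg dstat.csv.1 13    # network received
    csv_col_avg dstat.csv.1 14    # network sent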
