Optimizing for the CPU / compiler

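This article discusses the mismatch between standard C and particular processor architectures, especially in embedded systems. It uses examples to show how specialized processor features such as bit rotation, saturated addition, and nibble swapping can be used to make code more efficient.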


By Nigel Jones

Original article: http://embeddedgurus.com/stack-overflow/2012/06/optimizing-for-the-cpu-compiler/

It is well known that standard C language features map horribly on to the architecture of many processors. While the mapping is obvious and appalling for some processors (low end PICs, 8051 spring to mind), it’s still not necessarily great at the 32 bit end of the spectrum where processors without floating point units can be hit hard with C’s floating point promotion rules. While this is all obvious stuff, it’s essentially about what those CPUs are lacking. Where it gets really interesting in the embedded space is when you have a processor that has all sorts of specialized features that are great for embedded systems – but which simply do not map on to the C language view of the world. Some examples will illustrate my point.

Shifting vs. rotation

The C language does of course have support for performing shift operations. However, these are strictly shifts. That is, when bits get shifted off the end of an integer type, they are simply lost. Rotation is different in that the bits get rotated back around (often through the carry bit, but not always). [Rotation is sometimes loosely called logical shifting, but strictly speaking a logical shift fills the vacated bits with zeros and an arithmetic right shift fills them with copies of the sign bit; neither brings the lost bits back.] Now while shifting is great for, well, arithmetic operations, there are plenty of occasions in which I find myself wanting to perform a rotation. Now can I write a rotation function in C? Sure, but it’s a real pain in the tuches.
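To make the point concrete, here is a minimal sketch of what a rotation looks like when written in portable C. The function names `rotl32` and `rotr32` are hypothetical, chosen for illustration only. The masking of the shift count is there because shifting a 32-bit value by 32 or more is undefined behavior in C.

```c
#include <stdint.h>

/* Rotate x left by n bit positions; n is reduced modulo 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    /* The second mask keeps the right-shift count in 0..31 when n == 0. */
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Rotate x right by n bit positions; n is reduced modulo 32. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```

Some compilers recognize this idiom and emit a single rotate instruction, but that is not guaranteed, and a rotate through the carry bit cannot be expressed this way at all.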

Saturated addition

If you have ever had to design and implement an integer digital filter, I am sure you found yourself yearning for an addition operator that will saturate rather than overflow. [In this form of arithmetic, if the integer type would overflow as the result of an operation, then the processor simply returns the minimum or maximum value as appropriate.] Processors that the designers think might be required to perform digital filtering will often have this feature built directly into their instruction sets. By contrast, the C language has zero direct support for such operations, which must be coded using nasty range checks.
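The kind of checking this requires can be sketched as follows. This is an illustrative sketch only, and the names `sat_add_i16` and `sat_add_i32` are hypothetical. The 16-bit version does the sum in a wider type and clamps it. The 32-bit version has to test the operands before adding, because signed overflow is undefined behavior in C.

```c
#include <stdint.h>

/* Saturating 16-bit add: compute in 32 bits, then clamp to the int16_t range. */
static inline int16_t sat_add_i16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;

    if (sum > INT16_MAX) {
        return INT16_MAX;
    }
    if (sum < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)sum;
}

/* Saturating 32-bit add without a wider type: check before adding. */
static inline int32_t sat_add_i32(int32_t a, int32_t b)
{
    if (b > 0 && a > INT32_MAX - b) {
        return INT32_MAX;   /* would overflow upwards */
    }
    if (b < 0 && a < INT32_MIN - b) {
        return INT32_MIN;   /* would overflow downwards */
    }
    return a + b;
}
```

On a processor with a saturating add instruction, each of these functions collapses to a single operation, which is exactly the gap being described.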

Nibble swapping

Swapping the upper and lower nibbles of a byte is a common operation in cryptography and related fields. As a result, many processors include this ever so useful instruction in their instruction sets. While you can of course write C code to do it, it’s horrible-looking and can be grossly inefficient when compared to the built-in instruction.
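For reference, a plain C version might look like the sketch below. The name `swap_nibbles_u8` is hypothetical. The cast matters: the operand is promoted to `int` before shifting, so the result has to be truncated back to eight bits.

```c
#include <stdint.h>

/* Exchange the upper and lower 4-bit halves of a byte. */
static inline uint8_t swap_nibbles_u8(uint8_t x)
{
    /* x is promoted to int, so (x << 4) can exceed 0xFF; the cast trims it. */
    return (uint8_t)((x << 4) | (x >> 4));
}
```

Whether the compiler turns this into a single swap instruction depends entirely on the compiler and target.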

Implications

If you look over the examples quoted I’m sure you noticed a theme:

  1. Yes I can write C code to achieve the desired functionality.
  2. The resultant C code is usually ugly and horribly inefficient when compared to the intrinsic function of the processor.

Now in many cases, C compilers simply don’t give you access to these instructions other than by resorting to the inline assembler. Unfortunately, using the inline assembler causes a lot of problems. For example:

  1. It will often force the compiler to not optimize the enclosing function.
  2. It’s really easy to screw it up.
  3. It’s banned by most coding standards.

As a result, these processor features often go unused. However, there are embedded compilers out there that support intrinsic functions. For example, here’s how to swap nibbles using IAR’s AVR compiler:

foo = __swap_nibbles(bar);

There are several things to note about this:

  1. Because it’s a compiler intrinsic function, there are no issues with optimization.
  2. Similarly because one works with standard variable names, there is no particular likelihood of getting this wrong.
  3. Because it looks like a function call, there isn’t normally a problem with coding standards.
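One way to use an intrinsic like this while keeping the code buildable elsewhere is to hide it behind a single macro. The sketch below is an assumption-laden illustration, not something taken from IAR's documentation: it assumes the compiler predefines `__ICCAVR__` and that `__swap_nibbles` is declared in `<intrinsics.h>`, both of which should be checked against your compiler manual. The names `SWAP_NIBBLES` and `swap_nibbles_fallback` are hypothetical.

```c
#include <stdint.h>

#if defined(__ICCAVR__)
/* Assumed: IAR's AVR compiler defines __ICCAVR__ and declares the
   intrinsic in <intrinsics.h>. Verify both against the compiler manual. */
#include <intrinsics.h>
#define SWAP_NIBBLES(x) __swap_nibbles(x)
#else
/* Portable fallback for any other compiler, such as a host-side test build. */
static inline uint8_t swap_nibbles_fallback(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}
#define SWAP_NIBBLES(x) swap_nibbles_fallback((uint8_t)(x))
#endif
```

Application code then writes `foo = SWAP_NIBBLES(bar);` everywhere, and the non-standard part is confined to one header.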

This then leads to one of the essential quandaries of embedded systems. Is it better to write completely standard (and hence presumably portable) C code, or should one take every advantage of the neat features offered by your CPU (and, if it is any good, your compiler)?

I made my peace with this decision many years ago and fall firmly into the camp of take advantage of every neat feature offered by the CPU / compiler – even if it is non-standard. My rationale for doing so is as follows:

  1. Porting code from one CPU to another happens rarely. Thus to burden the bulk of systems with this mythical possibility seems weird to me.
  2. End users do not care. When was the last time you heard someone extol the use of standard code in the latest widget? Instead, end users care about speed, power and battery life, all of which benefit from having the most efficient code possible.
  3. It seems downright rude not to use those features that the CPU designer built in to the CPU just because some purist says I should not.

Having said this, I do of course understand completely if you are in the business of selling software components (e.g. an AES library), where using intrinsic / specialized instructions could be a veritable pain. However for the rest of the industry I say use those intrinsic functions! As always, let the debate begin.
