BITCODE DEMYSTIFIED

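This article takes a close look at Apple’s Bitcode feature, explains how it saves disk space by removing unused object code from apps, and discusses the security challenges the feature may introduce.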


A few months ago Apple announced a ‘new feature’ called ‘Bitcode.’ In this article, I will try to answer a few questions: what Bitcode is, what problems it aims to solve, what issues it introduces, and so on.

What is Bitcode?

To answer this question let’s look at what compilers do for us. Here is a brief overview of the compilation process:

  • Lexer: takes source code as an input and translates it into a stream of tokens;
  • Parser: takes a stream of tokens as an input and translates it into an AST;
  • Semantic Analysis: takes an AST as an input, checks whether the program is correct (a method is called with the correct number of parameters, a method called on an object actually exists and is non-private, etc.), fills in ‘missing types’ (e.g., in let x = y, x gets the type of y) and passes the AST to the next phase;
  • Code Generation: takes an AST as an input and emits some high-level IR (intermediate representation);
  • Optimization: takes IR, applies optimizations and emits IR that is potentially faster and/or smaller;
  • AsmPrinter: another code generation phase; it takes IR and emits assembly for a particular CPU;
  • Assembler: takes assembly and converts it into object code (a stream of 0s and 1s);
  • Linker: programs usually refer to already compiled routines from other programs (e.g., printf) to avoid recompiling the same code over and over. Until this phase these references do not have correct addresses; they are just placeholders. The linker’s job is to resolve those placeholders so that they point to the correct addresses of their corresponding routines.

You can find more details here: The Compiler.
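To make the first phase a bit more tangible, here is a minimal sketch of a lexer in C. It is a toy, not how Clang does it: the token kinds, the Token struct and the lex function are names I made up for illustration, and it only knows identifiers, integers and single-character punctuation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds for a toy language (hypothetical, for illustration only). */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT } TokenKind;

typedef struct {
  TokenKind kind;
  const char *start; /* points into the source buffer */
  size_t length;     /* number of characters in the token */
} Token;

/* Splits `src` into at most `max` tokens and returns how many were produced. */
size_t lex(const char *src, Token *out, size_t max) {
  assert(src != NULL && out != NULL);
  size_t count = 0;
  const char *p = src;
  while (*p != '\0' && count < max) {
    if (isspace((unsigned char)*p)) { /* whitespace only separates tokens */
      p++;
      continue;
    }
    const char *start = p;
    TokenKind kind;
    if (isalpha((unsigned char)*p) || *p == '_') {
      kind = TOK_IDENT;
      while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
      kind = TOK_NUMBER;
      while (isdigit((unsigned char)*p)) p++;
    } else {
      kind = TOK_PUNCT; /* any other character is a one-character token */
      p++;
    }
    out[count].kind = kind;
    out[count].start = start;
    out[count].length = (size_t)(p - start);
    count++;
  }
  return count;
}
```

Calling lex("x = y + 42;", tokens, 8) fills the array with six tokens: x, =, y, +, 42 and ;. A parser would then consume that array instead of raw characters.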

In the modern world these phases are split into two parts: the compiler frontend (lexer, parser, semantic analysis, code generation) and the compiler backend (optimization, asm printer, assembler, linker). This separation makes a lot of sense for both language designers and hardware manufacturers. If you want to create a new programming language you ‘just’ need to implement a frontend, and you get all available optimizations and support for different CPUs for free. On the other hand, if you have created a new chip, you ‘just’ need to extend the backend and you get support for all the available languages (frontends) on your CPU.

Below you can see a picture that illustrates compilation process using Clang and LLVM:

This picture shows that the frontend and the backend communicate through IR. LLVM has its own IR format, which can be encoded using the LLVM bitstream file format - Bitcode.

Just to recall it explicitly - Bitcode is a bitstream representation of LLVM IR.
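Since Bitcode is just a file format, it can be recognized by its first bytes. The sketch below checks for the two magic numbers described in the LLVM bitcode documentation: the raw stream starts with 'B' 'C' 0xC0 0xDE, and the optional wrapper starts with the little-endian 32-bit value 0x0B17C0DE. The function names are mine, and the code only looks at the magic; it does not validate the rest of the stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `buf` starts with the raw bitcode magic: 'B' 'C' 0xC0 0xDE. */
int is_raw_bitcode(const uint8_t *buf, size_t len) {
  assert(buf != NULL);
  return len >= 4 && buf[0] == 'B' && buf[1] == 'C' &&
         buf[2] == 0xC0 && buf[3] == 0xDE;
}

/* Returns 1 if `buf` starts with the bitcode wrapper magic 0x0B17C0DE,
   which is stored little-endian, i.e. as the bytes DE C0 17 0B. */
int is_wrapped_bitcode(const uint8_t *buf, size_t len) {
  assert(buf != NULL);
  return len >= 4 && buf[0] == 0xDE && buf[1] == 0xC0 &&
         buf[2] == 0x17 && buf[3] == 0x0B;
}
```

Reading the first four bytes of a .bc file produced by clang -c -emit-llvm and passing them to these functions is enough to tell whether you are looking at Bitcode or at something else.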

What problems does Apple’s Bitcode aim to solve?

Again, we need to dive a bit deeper and look at how an OS runs programs. This description is not precise and is given just to illustrate the process. For more details I can recommend reading this article: How OS X Executes Applications.

OS X and iOS can run on different CPUs (i386, x86_64, arm, arm64, etc.). If you want to run a program on any OS X/iOS setup, the program should contain object code for each platform. Here is what a binary might look like:

When you run a program, the OS reads the ‘Table Of Contents’ and looks for the slice that corresponds to the CPU it is running on. For instance, if the operating system runs on x86_64, it will load the object code for x86_64 into memory and run the program.

What’s happening with other slices? Nothing, they just waste your disk space.
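The ‘Table Of Contents’ lookup can be sketched in a few lines of C. In a fat (universal) binary the table is a big-endian header: a magic number 0xCAFEBABE, the number of slices, and then one 20-byte entry per slice (CPU type, CPU subtype, offset, size, alignment). The constants below mirror the ones in Apple’s <mach-o/fat.h> and <mach/machine.h>, but I define them locally so the code compiles anywhere; find_slice is a hypothetical helper, and the sketch ignores the 64-bit fat variant and CPU subtypes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC 0xCAFEBABEu       /* 32-bit fat header magic */
#define CPU_TYPE_X86_64 0x01000007u /* CPU_TYPE_X86 | CPU_ARCH_ABI64 */
#define CPU_TYPE_ARM64 0x0100000Cu  /* CPU_TYPE_ARM | CPU_ARCH_ABI64 */
#define FAT_HEADER_SIZE 8u
#define FAT_ARCH_SIZE 20u

/* Fat headers are always stored big-endian, whatever the host CPU is. */
static uint32_t read_be32(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Looks up the slice for `cputype`. Returns 1 and fills in `offset` and
   `size` when the slice exists, 0 otherwise. */
int find_slice(const uint8_t *buf, size_t len, uint32_t cputype,
               uint32_t *offset, uint32_t *size) {
  assert(buf != NULL && offset != NULL && size != NULL);
  if (len < FAT_HEADER_SIZE || read_be32(buf) != FAT_MAGIC) return 0;
  uint32_t count = read_be32(buf + 4);
  for (uint32_t i = 0; i < count; i++) {
    size_t entry = FAT_HEADER_SIZE + (size_t)i * FAT_ARCH_SIZE;
    if (entry + FAT_ARCH_SIZE > len) return 0; /* truncated table */
    if (read_be32(buf + entry) == cputype) {
      *offset = read_be32(buf + entry + 8);
      *size = read_be32(buf + entry + 12);
      return 1;
    }
  }
  return 0;
}
```

Everything outside the [offset, offset + size) range of the matching slice is never touched when the program runs, which is exactly the wasted space discussed above.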

This is the problem Apple wants to solve: currently, all the apps on the AppStore contain object code for both arm and arm64 CPUs. Moreover, third-party proprietary libraries and frameworks contain object code for i386, x86_64, arm and arm64, so that you can use them to test the app on a device or in the simulator. (Can you imagine how many copies of Google Analytics for i386 you have in your pocket?)

UPD: I do not know why, but I was sure that the final executable contains these slices as well (i386, x86_64, etc.). It seems they are stripped during the build phase.

Apple did not give us many details about how Bitcode and App Thinning work, so let me speculate about how it might look:

When you submit an app (including Bitcode), Apple’s ‘BlackBox’ recompiles it for each supported platform and drops any ‘useless’ object code, so the AppStore has a copy of the app for each CPU. When an end user wants to install the app, she installs only the version built for her particular processor, without any unused stuff.

Bitcode might save up to 50% of disk space per program.

UPD: Of course, I do not take resources into account; this is just about the binary itself. For instance, an app I am currently working on is ~40 megabytes in size (including assets, xibs, fonts), while the binary itself is ~16 megabytes. I checked the size of each slice: ~7MB for armv7 and ~9MB for arm64. If we drop just one of them, the size of the app decreases by ~20%.
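The ~20% figure is simple arithmetic on the numbers above: the saving is the size of the dropped slice divided by the total app size. Here is a tiny sketch in C; percent_saved is a hypothetical helper, and the sizes are the rounded ones quoted in this article, not exact measurements.

```c
#include <assert.h>

/* Share of the whole app (in percent) that disappears when a single
   slice of `slice_mb` megabytes is dropped from an app of `app_mb`. */
double percent_saved(double slice_mb, double app_mb) {
  assert(app_mb > 0.0 && slice_mb >= 0.0 && slice_mb <= app_mb);
  return 100.0 * slice_mb / app_mb;
}
```

With a 40MB app, percent_saved(7.0, 40.0) gives 17.5 for the armv7 slice and percent_saved(9.0, 40.0) gives 22.5 for the arm64 slice, hence roughly 20% either way. Relative to the 16MB binary alone, the same slices account for about 44% and 56%.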

What problems does Bitcode introduce?

The idea of Bitcode and recompiling for each platform looks really great, and it is a huge improvement, though it has downsides as well: the biggest one is security.

To get the benefits of Bitcode, you should submit your app including Bitcode (surprisingly). If you use some proprietary third-party library, then it also should contain Bitcode, hence as a maintainer of a proprietary library, you should distribute the library with Bitcode.

To recall: Bitcode is just another form of LLVM IR.

LLVM IR

Let’s write some code to see LLVM IR in action.

// main.c
extern int printf(const char *fmt, ...);

int main() {
  printf("Hello World\n");
  return 0;
}

Run the following:

clang -S -emit-llvm main.c

And you’ll have main.ll containing IR:

@.str = private unnamed_addr constant [13 x i8] c"Hello World\0A\00", align 1

; Function Attrs: nounwind ssp uwtable
define i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  %2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @printf(i8*, ...) #1

What can we see here? It is a bit more verbose than the original C code, but it is still much more readable than assembly. Malefactors will be much happier to work with this representation than with a disassembled version of a binary (and they do not even have to pay for tools such as Hopper or IDA).

How could a malefactor get the IR?

iOS and OS X executables have their own format - Mach-O (read Parsing Mach-O files for more details). A Mach-O file contains several segments such as Read-Only Data, Code, Symbol Table, etc. One of those sections contains a xar archive with Bitcode:

It is really easy to retrieve it automatically. I wrote a simple C program that does just that: bitcode_retriever. The workflow is pretty straightforward. Let’s assume that some_binary is a Mach-O file that contains object code for two CPUs (arm and x86_64), and that the object code for each CPU is built from two source files:

$ bitcode_retriever some_binary
arm.xar
x86_64.xar
$ xar -xvf arm.xar
1
2
$ llvm-dis 1 # outputs 1.ll
$ llvm-dis 2 # outputs 2.ll

Bitcode does not store any information about the original filenames but uses numbers instead (1, 2, 3, etc.). Also, you probably do not have llvm-dis installed/built on your machine, but you can easily obtain it; see this article for more details: Getting Started with Clang/LLVM on OS X.
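Each .xar file extracted above starts with a small fixed header followed by a table of contents (zlib-compressed XML, as far as I know), so it is easy to sanity-check a blob before handing it to the xar tool. The sketch below reads that header under the layout I am assuming from the xar format: a big-endian magic ‘xar!’, a 16-bit header size, a 16-bit version, two 64-bit TOC lengths and a 32-bit checksum algorithm id. XarInfo and parse_xar_header are hypothetical names and are not taken from bitcode_retriever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XAR_MAGIC 0x78617221u /* the ASCII bytes "xar!" */
#define XAR_HEADER_MIN 28u    /* 4 + 2 + 2 + 8 + 8 + 4 bytes */

typedef struct {
  uint16_t header_size;      /* size of the header in bytes */
  uint16_t version;          /* format version */
  uint64_t toc_compressed;   /* length of the compressed TOC */
  uint64_t toc_uncompressed; /* length of the TOC after inflating */
} XarInfo;

/* Reads an `nbytes`-wide big-endian unsigned integer (nbytes <= 8). */
static uint64_t read_be(const uint8_t *p, size_t nbytes) {
  uint64_t value = 0;
  for (size_t i = 0; i < nbytes; i++) value = (value << 8) | p[i];
  return value;
}

/* Returns 1 and fills `out` if `buf` starts with a xar header, else 0. */
int parse_xar_header(const uint8_t *buf, size_t len, XarInfo *out) {
  assert(buf != NULL && out != NULL);
  if (len < XAR_HEADER_MIN || read_be(buf, 4) != XAR_MAGIC) return 0;
  out->header_size = (uint16_t)read_be(buf + 4, 2);
  out->version = (uint16_t)read_be(buf + 6, 2);
  out->toc_compressed = read_be(buf + 8, 8);
  out->toc_uncompressed = read_be(buf + 16, 8);
  return 1;
}
```

If the call succeeds, the compressed table of contents occupies the toc_compressed bytes that follow the header, and the numbered Bitcode files come after it.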

Another potential issue (I can’t confirm it): the Bitcode-based thinning works only on iOS 9, so if you submit your app to the AppStore and it includes Bitcode, a malefactor may be able to get the whole IR from your app using iOS 7/8 and a jailbroken device.

I know only one way to secure the IR - obfuscation. This task is not trivial by itself, and it requires even more effort if you want to introduce this phase into your Xcode-driven development flow.

Summary

  • Bitcode is a bitstream file format for LLVM IR
  • one of its goals is to decrease the size of an app by eliminating unused object code
  • a malefactor can obtain your app or library, retrieve the IR from it and steal your ‘secret algorithm.’