Testing the Effect of Multithreading on Branch Prediction on Multi-core CPUs

Latest recommended article published on 2024-02-19 23:43:26

iteye_5282


Tags: java

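This article measures branch prediction performance on an Intel Core i5 processor under multithreading and shows how complex and challenging branch prediction becomes in multithreaded scenarios. It analyzes the branch prediction results of the different test cases and discusses the predictor's history table, prediction differences between cores, and the importance of avoiding having multiple threads execute the same code section.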


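Preface:

Modern CPUs are pipelined and use branch prediction. Prediction accuracy can exceed 98%, but when a prediction fails the pipeline has to be flushed, and the performance loss is severe.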
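The cost of a failed prediction can be illustrated with a small experiment. The sketch below is not from the original article: `BranchDemo`, `countAtLeast`, `timeNanos`, and the array size and repeat count are names and values assumed for illustration. It runs the same counting loop over a shuffled and a sorted copy of the same data, so the branch is hard to predict in one case and easy in the other. Whether a timing gap actually appears depends on the CPU and on the JIT, which may compile the `if` into branch-free code.

```java
import java.util.Arrays;
import java.util.Random;

class BranchDemo {
    // Counts the values that are >= threshold; the if is the branch under study.
    static long countAtLeast(int[] data, int threshold) {
        long count = 0;
        for (int v : data) {
            if (v >= threshold) {
                count++;
            }
        }
        return count;
    }

    // Times `repeats` passes over the data and returns the elapsed nanoseconds.
    static long timeNanos(int[] data, int threshold, int repeats) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < repeats; ++r) {
            sink += countAtLeast(data, threshold);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true; keeps the result in use
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Same values in both arrays; only the order differs.
        int[] shuffled = new Random(42).ints(1 << 20, 0, 256).toArray();
        int[] sorted = shuffled.clone();
        Arrays.sort(sorted);

        System.out.println("shuffled ns: " + timeNanos(shuffled, 128, 100));
        System.out.println("sorted   ns: " + timeNanos(sorted, 128, 100));
    }
}
```

Both arrays hold the same values, so `countAtLeast` returns the same count for each; only the order in which the branch outcomes arrive changes.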

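For the branch prediction techniques that CPUs use, see:

处理器分支预测研究的历史和现状.pdf (History and Current State of Processor Branch Prediction Research)

同时多线程处理器上的动态分支预测器设计方案研究.pdf (A Study of Dynamic Branch Predictor Designs for Simultaneous Multithreading Processors)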

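Used correctly, these features make it possible to write efficient programs.

For example, when writing an if/else statement, put the high-probability case in the if branch and the low-probability case in the else branch.

Such reasoning usually assumes a single thread, though. With multiple threads, unexpected things can happen, for example when several threads execute the same piece of code at the same time.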

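Tests:

The following multithreaded branch prediction tests were run on an Intel Core i5.

Test plan (the real tests turned out to involve more than the three cases below; see the test results for details):

Two threads execute the same code, and the if condition is always true.

Two threads execute the same code; in one the if condition is always true, and in the other it is sometimes true and sometimes false.

Two threads execute different code (the same logic, only at a different location); in one the if condition is always true, and in the other it is sometimes true and sometimes false.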

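The code is as follows:

test1 tests a single code location. When it is passed an even argument, the if condition is always true; when it is passed an odd argument, the if condition is sometimes true and sometimes false.

test2 tests two different code locations. When it is passed an even argument, the if condition is always true; when it is passed an odd argument, the if condition is sometimes true and sometimes false.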
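To make the even/odd behaviour concrete, the sketch below replays the first few iterations of the test loop and records the outcome of each if check. `BranchPattern` and `pattern` are hypothetical helper names that are not part of the original test code; only the `temp += step` and `temp % 2 == 0` logic is taken from it.

```java
class BranchPattern {
    // Replays the first n iterations of the test loop for the given step
    // and records 'T' when the if condition holds and 'F' when it does not.
    static String pattern(int step, int n) {
        StringBuilder outcomes = new StringBuilder();
        int temp = 0;
        for (int i = 0; i < n; ++i) {
            temp += step;
            outcomes.append(temp % 2 == 0 ? 'T' : 'F');
        }
        return outcomes.toString();
    }

    public static void main(String[] args) {
        System.out.println(pattern(2, 8)); // prints TTTTTTTT
        System.out.println(pattern(3, 8)); // prints FTFTFTFT
    }
}
```

An even step keeps `temp` even on every iteration, so the if branch is always taken. An odd step flips the parity of `temp` each time, so the outcome strictly alternates between false and true.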

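import java.util.concurrent.CountDownLatch;

public class Test {
	// Number of iterations of the timed loop in each worker thread.
	public static int loop = 1000000000;
	// Receives each worker's result so the loop's work is actually used; updates are unsynchronized.
	public static int sum = 0;
	// Released once by the test method so both workers start their loops together.
	public static CountDownLatch startGate;
	// Counted down by each worker when it finishes.
	public static CountDownLatch endGate;

	// Both threads run T1, i.e. the same code location.
	public static void test1(int x1, int x2) throws InterruptedException{
		startGate = new CountDownLatch(1);
		endGate = new CountDownLatch(2);
		new Thread(new T1(x1)).start();
		new Thread(new T1(x2)).start();
		Test.startGate.countDown();
		Test.endGate.await();
	}

	// One thread runs T1 and the other runs T2: same logic, different code locations.
	public static void test2(int x1, int x2) throws InterruptedException{
		startGate = new CountDownLatch(1);
		endGate = new CountDownLatch(2);
		new Thread(new T1(x1)).start();
		new Thread(new T2(x2)).start();
		Test.startGate.countDown();
		Test.endGate.await();
	}
}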

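class T1 implements Runnable{
	// Step added to temp on every iteration; its parity determines the branch pattern.
	int xxx = 0;
	public T1(int xxx){
		this.xxx = xxx;
	}
	@Override
	public void run() {
		try {
			int sum = 0;
			int temp = 0;
			// Wait until the test method releases both workers.
			Test.startGate.await();
			long start = System.nanoTime();
			for(int i = 0; i < Test.loop; ++i){
				temp += xxx;
				// Even xxx: always true. Odd xxx: alternates false/true.
				// int overflow does not change the parity of temp.
				if(temp % 2 == 0){
					sum += 100;
				}else{
					sum += 200;
				}
			}
			Test.sum += sum;
			long end = System.nanoTime();
			// Elapsed time is printed in nanoseconds.
			System.out.format("%s, T1(%d): %d\n", Thread.currentThread().getName(), xxx, end - start);
		} catch (InterruptedException e) {
			e.printStackTrace();
		}finally{
			Test.endGate.countDown();
		}
	}
}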
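// Deliberate copy of T1: the logic is identical, but the branch lives at a different code location.
class T2 implements Runnable{
	// Step added to temp on every iteration; its parity determines the branch pattern.
	int xxx = 0;
	public T2(int xxx){
		this.xxx = xxx;
	}
	@Override
	public void run() {
		try {
			int sum = 0;
			int temp = 0;
			// Wait until the test method releases both workers.
			Test.startGate.await();
			long start = System.nanoTime();
			for(int i = 0; i < Test.loop; ++i){
				temp += xxx;
				// Even xxx: always true. Odd xxx: alternates false/true.
				if(temp % 2 == 0){
					sum += 100;
				}else{
					sum += 200;
				}
			}
			Test.sum += sum;
			long end = System.nanoTime();
			// Elapsed time is printed in nanoseconds.
			System.out.format("%s, T2(%d): %d\n", Thread.currentThread().getName(), xxx, end - start);
		} catch (InterruptedException e) {
			e.printStackTrace();
		}finally{
			Test.endGate.countDown();
		}
	}
}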

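Because there are many test cases, they are described concisely as follows:

Each call to test1 produces two results. For example, test1(2, 3) giving 2.1s and 2.2s means that two threads execute the same code; in one the if condition is always true (2 is even), and in the other it alternates between true and false (3 is odd). The first thread took 2.1s on average for one run of the computation, and the second took 2.2s.

The test main function contains two for loops of 10 iterations each. The results are reported in abbreviated form, for example:

Each row of data represents the result of one run of the main function; the time that follows each call is a rough average.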

test1(2,3)    2.1s    test1(2,4)    2.1s
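The test threads print elapsed times in nanoseconds, while the results are reported as rough averages in seconds. One way to do that conversion is sketched below; `TimingSummary` and `averageSeconds` are hypothetical names, and the sample values in `main` are made up for illustration rather than measured.

```java
class TimingSummary {
    // Averages a set of nanosecond timings and converts the result to seconds.
    static double averageSeconds(long[] nanos) {
        if (nanos.length == 0) {
            throw new IllegalArgumentException("no timings given");
        }
        long total = 0;
        for (long n : nanos) {
            total += n;
        }
        return total / (double) nanos.length / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Illustrative values only, not measurements from the article.
        long[] sample = {2_100_000_000L, 2_200_000_000L};
        System.out.printf("%.2fs%n", averageSeconds(sample));
    }
}
```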

The main function:

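	// Belongs in class Test. Each loop repeats one test case 10 times.
	public static void main(String[] args) throws InterruptedException {
		// Same code location: one thread always true, the other alternating.
		for(int i = 0; i < 10; ++i){
			test1(2, 3);
		}
		// Separator between the two test cases in the output.
		System.out.println("!!!!!!!!!!!!!!!!!!!!");
		// Same code location: both threads always true.
		for(int i = 0; i < 10; ++i){
			test1(2, 4);
		}
	}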

The test results are as follows: