VC11, std functional bind/function, big performance hit under x64

lyljp

Published 2013-05-07 14:06:34


In testing the following code with both VC10 and VC11 (beta, Ultimate) I've noticed a remarkable performance difference. When compiled for Win32 they perform about the same, with VC11 gaining a slight edge. But when compiled for x64, the VC11 version is several times slower (measured in performance counter ticks). In both cases I use the default project settings, adding a new config for x64 and not modifying the compiler command line.

*Edit: I should mention that VC10 is not patched to SP1.

#include <Windows.h>
#include <functional>
#include <vector>
#include <iostream>
#include <conio.h>


typedef std::function<void(void)> func; 
typedef std::vector<func> funcVector;
typedef std::vector<func>::iterator ifuncVector;


void Function1()
{
	int j = 0;
	for( int i = 0; i < 100; i++ )
		j += i;
}

void Function2()
{
	float j = 1.0f;
	for( int i = 0; i < 100; i++ )
		j *= (i+1);
}

void Function3()
{
	double j = 1.0;
	for( int i = 0; i < 100; i++ )
		j *= (i+1);
}


LARGE_INTEGER timingStart, timingEnd;
__int64 diff;
int nrCalls = 1;

int main()
{
	funcVector funcs;

	while ( true )
	{
		std::cout << "Enter number of calls or 0 to quit" << std::endl;
		std::cin >> nrCalls;
		if( !std::cin.good() || nrCalls == 0 )
			break;

		QueryPerformanceCounter(&timingStart);
		for( int i = 0; i < nrCalls; i++ )
		{
			funcs.push_back( std::bind( &Function1 ) );
			funcs.push_back( std::bind( &Function2 ) );
			funcs.push_back( std::bind( &Function3 ) );
			for( ifuncVector i = funcs.begin(); i != funcs.end(); i++ )
			{
				(*i)();
			}
			funcs.clear();
		}
		QueryPerformanceCounter(&timingEnd);

		diff = timingEnd.QuadPart - timingStart.QuadPart;

		std::cout << "Timing for " << nrCalls << " calls:" << std::endl;
		std::cout <<  diff << " ticks " << std::endl;
		std::cout << "***************************" << std::endl;
	}

	return 0;
}

==============================================

Hi, I maintain VC's STL, and I was just alerted to this 5-month-old post. We're now tracking this as DevDiv#490878 in our internal database, and I've analyzed it to figure out what's going on here (and why it appeared to be x64-specific).

To summarize: your function pointer is 8 bytes on x64. VC11 changed the representation of bound functors to store their bound arguments in tuples; you have no bound arguments, so we store a tuple<>. That's an empty 1-byte class, so the overall bound functor is 16 bytes (for alignment), when it used to be 8 with VC10. Then, you give it to std::function. That adds a vtable pointer (necessary for std::function's magic type erasure), another 8 bytes. It also stores a std::allocator (empty 1-byte class), which is a second regression from VC10 (where we avoided storing std::allocator here).

The total size is 32 bytes, which is greater than our Small Functor Optimization limit on x64 of 24 bytes. (That's undocumented and we could change it at any time, but that's the current value.) Functors less than or equal to 24 bytes, when the necessary vtable pointer is included, are stored directly within the std::function. Larger functors must be dynamically allocated. This dynamic memory allocation (and deallocation) is responsible for the massive performance hit you've observed.

Note that while you can't do anything about std::function storing std::allocator unnecessarily, you can avoid using bind(), which is responsible for half of the bloat here. Function pointers can be directly given to std::function. Otherwise, you can "bind" things with lambdas, which are more efficient and have more natural syntax even aside from this unnecessary bloat issue.

If you have any further questions, please E-mail me at stl@microsoft.com (I am unfortunately too busy to continuously monitor the forums).