Fast Cube Roots

本文探讨了在缺乏特定立方根函数的情况下,利用幂函数计算立方根的方法,并对比了Kahan、Turkowski等人的实现。重点介绍了Newton's方法及其变种Halley's方法在立方根计算中的应用,分析了各种方法的速度、精度和成本。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >


原文:http://metamerist.com/cbrt/cbrt.htm



 home


In Search of a Fast Cube Root


Following are thoughts and observations regarding the machine extraction ofcube roots with a particular emphasis on speed of calculations.

Cube roots are a topic of interest, because they're used in many commonformulae and a fast cube root function isn't included with Microsoft VisualStudio.

In the absence of a special cube root function, a typical strategy iscalculation via a power function (e.g., pow(x, 1.0/3.0)).This can be problematic in terms of speed and in terms of accuracywhen negative numbers aren't handled properly.

All of the implementations of cube root functions I discovered rely on Newton'smethod.  Notable implementations include William. M.Kahan's cube root function as found in the BSD sources (kahan'scbrt) and Ken Turkowski's as presented in an Apple technical report (CubeRoot).Kahan's implementation is particularly robust, handling negative numbers, NaN,INF and subnormals. Turkowski's paper includes interesting facts and insights.  



Cube Root by Newton's method

Given anapproximation ‘a’ of the cube root of ‘R’,the following formula will produce ‘b’, a closerapproximation to the cube root of ‘R’.



or

 

Repeated application yields results progressively closer to the true cube rootof R, with the number of bits of precision doubling with each iteration.Consequently, the accuracy of the initial guess fed into the iteration isespecially important. Desired degrees of precision can be achieved with feweriterations when initial estimates are more accurate. Computationally, each iterationis relatively inexpensive: one divide, two multiplies and two adds.

Turkowski's approach factors the cube root of R into the product of a power oftwo and a cube root of a small value with a limited range. He then uses aquadratic approximation to estimate the cube root of the small value. Thisapproach yields an initial value with approximately 6 bits of precision to seedthe Newton-Raphson iteration.

Kahan uses a bit hack to generate an initial estimate with 5 bits of precision.The approach exploits the properties of IEEE 754 floating point numbers byleveraging the fact that their binary representation is close to a log2reppresentation. The following code demonstrates the technique applied for thedouble precision case. (Note: Kahan's code also includes support forhandling subnormal numbers. The following code assumes 32-bit integers).


inline double cbrt_5d(double d)
{
   const unsigned int B1 = 715094163;
   double t = 0.0;
   unsigned int* pt = (unsigned int*) &t;
   unsigned int* px = (unsigned int*) &d;
   pt[1]=px[1]/3+B1;
   return t;



General Nth Root Approximations

I've generalized this approach to provide initial approximations for nth rootsgenerally for IEEE 754 floats and doubles in the following:

template<int n>
inline float nth_rootf(floatx)
{
   const int ebits = 8;
   const int fbits = 23;

   int& i = (int&) x;
   const int bias = (1 << (ebits-1))-1;
   i = (i - (bias << fbits)) / n + (bias << fbits);

   return x;
}

template <int n>
inline double nth_rootd(doublex)
{
   const int ebits = 11;
   const int fbits = 52;

   _int64& i = (_int64&) x;
   const _int64 bias = (1 << (ebits-1))-1;
   i = (i - (bias << fbits)) / n + (bias << fbits);

   return x;
}


To use the code to generate an nth root approximation, n is specified as a C++template parameter. For example, a float approximation for the cube root of 5.0can be can be generated with the call
nth_rootf<3>(5.0f). In my experience, thesefunctions provide estimates in the range of 5 to 6 bits of precision.



Halley's Method applied to CubeRoots

Recent investigations into faster converging cube root approximations led to amethod offered by Otis E. Lancaster in Machine Method for the Extraction ofa Cube Root
1. Lancaster’s method is calculated relativelyefficiently, and it exhibits cubic convergence in contrast to the quadraticconvergence of Newton's method. It takes the following form (where ‘b’ is a closer approximation tothe cube root of R than ‘a’).




The method is actually an application of a method devised by comet discovererEdmond Halley (see Halley'sMethod).  Over the years it hasbeen repeatedly rediscovered and attributed to others including Lambert andBailey. With slight massaging, the equation can be reduced to the following:
 



In this form, the calculation cost amounts to one divide, three multiplies andthree additions. Although this exceeds Newton-Raphson by a multiply and an add,speed improvements can be reaped thanks to the faster rate at whichHalley’s method converges and the relatively high cost of divide, whichis a factor in each iteration of either method.


Comparisonof Various Methods

 

Followingare times and accuracy estimates of various methods including the bit hackapproximation (cbrt_5f, cbrt_5d), up to three iterations of Halley’smethod and up to four iterations of Newton’s methods. These results wereobtained on an Intel Core Duo system using C++ compiled with Microsoft VisualStudio 2005. 

 

Inthe tests, one million cube roots are calculated uniformly across the rangefrom 0.0 to 1.0. Times are reported in milliseconds. The second columnindicates an estimate of the minimum number of bits of precision in the worstcase found. The last column indicates an estimate of the average number of bitsof precision per evaluation. 

 

Thecbrt_5f and cbrt_5d functions were used to seed the iterative methods.It’s important to note that a better approximation will converge evenmore quickly. Given the strategy used, a single iteration of Halley’smethod appears sufficient for 16 bits of precision. Also, note the costs ofvarious methods in comparison to the standard pow( ) function.

 

 

32-bit float tests

----------------------------------------

cbrt_5f      8.8 ms    5 mbp   6.223 abp

pow       144.5 ms   23 mbp  23.000 abp

halley x 1  31.8ms   15 mbp  18.961 abp

halley x 2  59.0ms   23 mbp  23.000 abp

newton x 1  23.4ms   10 mbp  12.525 abp

newton x 2  48.9ms   20 mbp  22.764 abp

newton x 3  72.0ms   23 mbp  23.000 abp

newton x 4  89.6ms   23 mbp  23.000 abp

 

 

64-bit double tests

----------------------------------------

cbrt_5d     23.5 ms    5 mbp   6.299 abp

pow       151.3 ms   52 mbp  52.000 abp

halley x 1  56.5ms   16 mbp  19.503 abp

halley x 2  95.3ms   47 mbp  51.575 abp

halley x 3 122.1 ms   51 mbp  51.965 abp

newton x 1  48.8ms   10 mbp  12.596 abp

newton x 2  78.6ms   20 mbp  25.205 abp

newton x 3 104.5 ms   40 mbp  47.681 abp

newton x 4 134.7 ms   51 mbp  51.992 abp

 

 

Thesource code is available here.



1. Otis E. Lancaster, Machine Method for the Extraction of Cube Root Journal ofthe American Statistical Association, Vol. 37, No. 217. (Mar., 1942), pp.112-115.


// CubeRoot.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <windows.h>
#include <math.h>

// get accurate timer (Win32)
double GetTimer()
{
	LARGE_INTEGER F, N;
	QueryPerformanceFrequency(&F);
	QueryPerformanceCounter(&N);
	return double(N.QuadPart)/double(F.QuadPart);
}

typedef float  (*cuberootfnf) (float);
typedef double (*cuberootfnd) (double);

// estimate bits of precision (32-bit float case)
inline int bits_of_precision(float a, float b)
{
	const double kd = 1.0 / log(2.0);

	if (a==b)
		return 23;

	const double kdmin = pow(2.0, -23.0);

	double d = fabs(a-b);
	if (d < kdmin)
		return 23;

	return int(-log(d)*kd);
}

// estiamte bits of precision (64-bit double case)
inline int bits_of_precision(double a, double b)
{
	const double kd = 1.0 / log(2.0);

	if (a==b)
		return 52;

	const double kdmin = pow(2.0, -52.0);

	double d = fabs(a-b);
	if (d < kdmin)
		return 52;

	return int(-log(d)*kd);
}

// cube root via x^(1/3)
float pow_cbrtf(float x)
{
	return pow(x, 1.0f/3.0f);
}

// cube root via x^(1/3)
double pow_cbrtd(double x)
{
	return pow(x, 1.0/3.0);
}

// cube root approximation using bit hack for 32-bit float
__forceinline float cbrt_5f(float f)
{
	unsigned int* p = (unsigned int *) &f;
	*p = *p/3 + 709921077;
	return f;
}

// cube root approximation using bit hack for 64-bit float 
// adapted from Kahan's cbrt
__forceinline double cbrt_5d(double d)
{
	const unsigned int B1 = 715094163;
	double t = 0.0;
	unsigned int* pt = (unsigned int*) &t;
	unsigned int* px = (unsigned int*) &d;
	pt[1]=px[1]/3+B1;
	return t;
}

// cube root approximation using bit hack for 64-bit float 
// adapted from Kahan's cbrt
__forceinline double quint_5d(double d)
{
	return sqrt(sqrt(d));

	const unsigned int B1 = 71509416*5/3;
	double t = 0.0;
	unsigned int* pt = (unsigned int*) &t;
	unsigned int* px = (unsigned int*) &d;
	pt[1]=px[1]/5+B1;
	return t;
}

// iterative cube root approximation using Halley's method (float)
__forceinline float cbrta_halleyf(const float a, const float R)
{
	const float a3 = a*a*a;
    const float b= a * (a3 + R + R) / (a3 + a3 + R);
	return b;
}

// iterative cube root approximation using Halley's method (double)
__forceinline double cbrta_halleyd(const double a, const double R)
{
	const double a3 = a*a*a;
    const double b= a * (a3 + R + R) / (a3 + a3 + R);
	return b;
}

// iterative cube root approximation using Newton's method (float)
__forceinline float cbrta_newtonf(const float a, const float x)
{
//    return (1.0 / 3.0) * ((a + a) + x / (a * a));
	return a - (1.0f / 3.0f) * (a - x / (a*a));
}

// iterative cube root approximation using Newton's method (double)
__forceinline double cbrta_newtond(const double a, const double x)
{
	return (1.0/3.0) * (x / (a*a) + 2*a);
}

// cube root approximation using 1 iteration of Halley's method (double)
double halley_cbrt1d(double d)
{
	double a = cbrt_5d(d);
	return cbrta_halleyd(a, d);
}

// cube root approximation using 1 iteration of Halley's method (float)
float halley_cbrt1f(float d)
{
	float a = cbrt_5f(d);
	return cbrta_halleyf(a, d);
}

// cube root approximation using 2 iterations of Halley's method (double)
double halley_cbrt2d(double d)
{
	double a = cbrt_5d(d);
	a = cbrta_halleyd(a, d);
	return cbrta_halleyd(a, d);
}

// cube root approximation using 3 iterations of Halley's method (double)
double halley_cbrt3d(double d)
{
	double a = cbrt_5d(d);
	a = cbrta_halleyd(a, d);
	a = cbrta_halleyd(a, d);
	return cbrta_halleyd(a, d);
}


// cube root approximation using 2 iterations of Halley's method (float)
float halley_cbrt2f(float d)
{
	float a = cbrt_5f(d);
	a = cbrta_halleyf(a, d);
	return cbrta_halleyf(a, d);
}

// cube root approximation using 1 iteration of Newton's method (double)
double newton_cbrt1d(double d)
{
	double a = cbrt_5d(d);
	return cbrta_newtond(a, d);
}

// cube root approximation using 2 iterations of Newton's method (double)
double newton_cbrt2d(double d)
{
	double a = cbrt_5d(d);
	a = cbrta_newtond(a, d);
	return cbrta_newtond(a, d);
}

// cube root approximation using 3 iterations of Newton's method (double)
double newton_cbrt3d(double d)
{
	double a = cbrt_5d(d);
	a = cbrta_newtond(a, d);
	a = cbrta_newtond(a, d);
	return cbrta_newtond(a, d);
}

// cube root approximation using 4 iterations of Newton's method (double)
double newton_cbrt4d(double d)
{
	double a = cbrt_5d(d);
	a = cbrta_newtond(a, d);
	a = cbrta_newtond(a, d);
	a = cbrta_newtond(a, d);
	return cbrta_newtond(a, d);
}

// cube root approximation using 2 iterations of Newton's method (float)
float newton_cbrt1f(float d)
{
	float a = cbrt_5f(d);
	return cbrta_newtonf(a, d);
}

// cube root approximation using 2 iterations of Newton's method (float)
float newton_cbrt2f(float d)
{
	float a = cbrt_5f(d);
	a = cbrta_newtonf(a, d);
	return cbrta_newtonf(a, d);
}

// cube root approximation using 3 iterations of Newton's method (float)
float newton_cbrt3f(float d)
{
	float a = cbrt_5f(d);
	a = cbrta_newtonf(a, d);
	a = cbrta_newtonf(a, d);
	return cbrta_newtonf(a, d);
}

// cube root approximation using 4 iterations of Newton's method (float)
float newton_cbrt4f(float d)
{
	float a = cbrt_5f(d);
	a = cbrta_newtonf(a, d);
	a = cbrta_newtonf(a, d);
	a = cbrta_newtonf(a, d);
	return cbrta_newtonf(a, d);
}

double TestCubeRootf(const char* szName, cuberootfnf cbrt, double rA, double rB, int rN)
{
	const int N = rN;
 	
	float dd = float((rB-rA) / N);

	// calculate 1M numbers
	int i=0;
	float d = (float) rA;

	double t = GetTimer();
	double s = 0.0;

	for(d=(float) rA, i=0; i<N; i++, d += dd)
	{
		s += cbrt(d);
	}

	t = GetTimer() - t;

	printf("%-10s %5.1f ms ", szName, t*1000.0);

	double maxre = 0.0;
	double bits = 0.0;
	double worstx=0.0;
	double worsty=0.0;
	int minbits=64;

	for(d=(float) rA, i=0; i<N; i++, d += dd)
	{
		float a = cbrt((float) d);	
		float b = (float) pow((double) d, 1.0/3.0);

		int bc = bits_of_precision(a, b);
		bits += bc;

		if (b > 1.0e-6)
		{
			if (bc < minbits)
			{
				minbits = bc;
				worstx = d;
				worsty = a;
			}
		}
	}

	bits /= N;

    printf(" %3d mbp  %6.3f abp\n", minbits, bits);

	return s;
}


double TestCubeRootd(const char* szName, cuberootfnd cbrt, double rA, double rB, int rN)
{
	const int N = rN;
	
	double dd = (rB-rA) / N;

	int i=0;

	double t = GetTimer();
	
	double s = 0.0;
	double d = 0.0;

	for(d=rA, i=0; i<N; i++, d += dd)
	{
		s += cbrt(d);
	}

	t = GetTimer() - t;

	printf("%-10s %5.1f ms ", szName, t*1000.0);

	double bits = 0.0;
	double maxre = 0.0;	
	double worstx = 0.0;
	double worsty = 0.0;
	int minbits = 64;
	for(d=rA, i=0; i<N; i++, d += dd)
	{
		double a = cbrt(d);	
		double b = pow(d, 1.0/3.0);

		int bc = bits_of_precision(a, b); // min(53, count_matching_bitsd(a, b) - 12);
		bits += bc;

		if (b > 1.0e-6)
		{
			if (bc < minbits)
			{
				bits_of_precision(a, b);
				minbits = bc;
				worstx = d;
				worsty = a;
			}
		}
	}

	bits /= N;

    printf(" %3d mbp  %6.3f abp\n", minbits, bits);

	return s;
}

int _tmain(int argc, _TCHAR* argv[])
{
	// a million uniform steps through the range from 0.0 to 1.0
	// (doing uniform steps in the log scale would be better)
	double a = 0.0;
	double b = 1.0;
	int n = 1000000;

	printf("32-bit float tests\n");
	printf("----------------------------------------\n");
	TestCubeRootf("cbrt_5f", cbrt_5f, a, b, n);
	TestCubeRootf("pow", pow_cbrtf, a, b, n);
	TestCubeRootf("halley x 1", halley_cbrt1f, a, b, n);
	TestCubeRootf("halley x 2", halley_cbrt2f, a, b, n);
	TestCubeRootf("newton x 1", newton_cbrt1f, a, b, n);
	TestCubeRootf("newton x 2", newton_cbrt2f, a, b, n);
	TestCubeRootf("newton x 3", newton_cbrt3f, a, b, n);
	TestCubeRootf("newton x 4", newton_cbrt4f, a, b, n);
	printf("\n\n");

	printf("64-bit double tests\n");
	printf("----------------------------------------\n");
	TestCubeRootd("cbrt_5d", cbrt_5d, a, b, n);
	TestCubeRootd("pow", pow_cbrtd, a, b, n);
	TestCubeRootd("halley x 1", halley_cbrt1d, a, b, n);
	TestCubeRootd("halley x 2", halley_cbrt2d, a, b, n);
	TestCubeRootd("halley x 3", halley_cbrt3d, a, b, n);
	TestCubeRootd("newton x 1", newton_cbrt1d, a, b, n);
	TestCubeRootd("newton x 2", newton_cbrt2d, a, b, n);
	TestCubeRootd("newton x 3", newton_cbrt3d, a, b, n);
	TestCubeRootd("newton x 4", newton_cbrt4d, a, b, n);
	printf("\n\n");

	getchar();

	return 0;
}



资源下载链接为: https://pan.quark.cn/s/22ca96b7bd39 在当今的软件开发领域,自动化构建与发布是提升开发效率和项目质量的关键环节。Jenkins Pipeline作为一种强大的自动化工具,能够有效助力Java项目的快速构建、测试及部署。本文将详细介绍如何利用Jenkins Pipeline实现Java项目的自动化构建与发布。 Jenkins Pipeline简介 Jenkins Pipeline是运行在Jenkins上的一套工作流框架,它将原本分散在单个或多个节点上独立运行的任务串联起来,实现复杂流程的编排与可视化。它是Jenkins 2.X的核心特性之一,推动了Jenkins从持续集成(CI)向持续交付(CD)及DevOps的转变。 创建Pipeline项目 要使用Jenkins Pipeline自动化构建发布Java项目,首先需要创建Pipeline项目。具体步骤如下: 登录Jenkins,点击“新建项”,选择“Pipeline”。 输入项目名称和描述,点击“确定”。 在Pipeline脚本中定义项目字典、发版脚本和预发布脚本。 编写Pipeline脚本 Pipeline脚本是Jenkins Pipeline的核心,用于定义自动化构建和发布的流程。以下是一个简单的Pipeline脚本示例: 在上述脚本中,定义了四个阶段:Checkout、Build、Push package和Deploy/Rollback。每个阶段都可以根据实际需求进行配置和调整。 通过Jenkins Pipeline自动化构建发布Java项目,可以显著提升开发效率和项目质量。借助Pipeline,我们能够轻松实现自动化构建、测试和部署,从而提高项目的整体质量和可靠性。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值