IEEE floating-point exceptions in C

Category: VC++
2009.10.29 21:33 Author: shengzhcn

This page will answer the following questions.

  • My program just printed out 1.#IND or 1.#INF (on Windows) or nan or inf (on Linux). What happened?
  • How can I tell if a number is really a number and not a NaN or an infinity?
  • How can I find out more details at runtime about kinds of NaNs and infinities?
  • Do you have any sample code to show how this works?
  • Where can I learn more?

These questions have to do with floating point exceptions. If you get some strange non-numeric output where you're expecting a number, you've either exceeded the finite limits of floating point arithmetic or you've asked for some result that is undefined. To keep things simple, I'll stick to working with the double floating point type. Similar remarks hold for float types.

Debugging 1.#IND, 1.#INF, nan, and inf

If your operation would generate a larger positive number than could be stored in a double, the operation will return 1.#INF on Windows or inf on Linux. Similarly your code will return -1.#INF or -inf if the result would be a negative number too large to store in a double. Dividing a positive number by zero produces a positive infinity and dividing a negative number by zero produces a negative infinity. Example code at the end of this page will demonstrate some operations that produce infinities.
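
As a quick illustration of the two sources of infinity just described, here is a minimal sketch. It assumes IEEE 754 arithmetic with the default (non-trapping) floating point environment, and the helper names OverflowToInfinity and DivideByZero are made up for this example. The volatile variables are only there to keep the compiler from evaluating everything at compile time.

```cpp
#include <cfloat>

// Hypothetical helper: doubling DBL_MAX exceeds the range of a double,
// so the result overflows to positive infinity.
double OverflowToInfinity()
{
    volatile double big = DBL_MAX;
    volatile double result = big * 2.0;
    return result;
}

// Hypothetical helper: dividing a nonzero number by zero gives an infinity
// with the sign of the numerator. The zero lives in a variable because
// a literal division by zero may be rejected or flagged by the compiler.
double DivideByZero(double numerator)
{
    volatile double zero = 0.0;
    volatile double result = numerator / zero;
    return result;
}
```

Printing OverflowToInfinity(), DivideByZero(1.0), and DivideByZero(-1.0) with cout should show the platform's spelling of positive and negative infinity.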

Some operations don't make mathematical sense, such as taking the square root of a negative number. (Yes, this operation makes sense in the context of complex numbers, but a double represents a real number and so there is no double to represent the result.) The same is true for logarithms of negative numbers. Both sqrt(-1.0) and log(-1.0) would return a NaN, the generic term for a "number" that is "not a number". Windows displays a NaN as -1.#IND ("IND" for "indeterminate") while Linux displays nan. Other operations that would return a NaN include 0/0, 0*∞, and ∞/∞. See the sample code below for examples.
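
The operations listed above can be checked in a few lines. This is a sketch that assumes a C++11 compiler (for std::isnan and the range-based for loop) and IEEE 754 arithmetic. The function name InvalidOperationsGiveNaN is hypothetical.

```cpp
#include <cmath>

// Returns true if every invalid operation named in the text produced a NaN.
bool InvalidOperationsGiveNaN()
{
    // volatile prevents the compiler from folding these at compile time.
    volatile double zero = 0.0;
    volatile double minus_one = -1.0;
    double inf = 1.0 / zero;

    double results[] = {
        std::sqrt(minus_one),  // square root of a negative number
        std::log(minus_one),   // logarithm of a negative number
        zero / zero,           // 0/0
        zero * inf,            // 0 * infinity
        inf / inf              // infinity / infinity
    };

    for (double r : results)
    {
        if (!std::isnan(r)) return false;
    }
    return true;
}
```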

In short, if you get 1.#INF or inf, look for overflow or division by zero. If you get 1.#IND or nan, look for illegal operations. Maybe you simply have a bug. If it's more subtle and you have something that is difficult to compute, see Avoiding Overflow, Underflow, and Loss of Precision. That article gives tricks for computing results whose intermediate steps would overflow if computed directly.
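
To give a flavor of what such a trick looks like, here is a sketch of one textbook example: computing sqrt(x*x + y*y) when x*x alone would overflow. I'm not claiming this is the example the linked article uses. The names NaiveHypot and ScaledHypot are made up for illustration.

```cpp
#include <cmath>

// Direct formula: x*x overflows to infinity once |x| exceeds roughly 1.3e154,
// even though the final answer would fit comfortably in a double.
double NaiveHypot(double x, double y)
{
    return std::sqrt(x*x + y*y);
}

// Rescaled formula: factor out the larger magnitude so the only quantity
// that gets squared is a ratio no larger than 1.
double ScaledHypot(double x, double y)
{
    double ax = std::fabs(x);
    double ay = std::fabs(y);
    double hi = (ax > ay) ? ax : ay;
    double lo = (ax > ay) ? ay : ax;
    if (hi == 0.0) return 0.0;  // both inputs are zero
    double r = lo / hi;
    return hi * std::sqrt(1.0 + r*r);
}
```

With x = 3e200 and y = 4e200 the naive version returns infinity while the rescaled version returns a value close to 5e200. In practice you may not need to write this yourself, since C++11 provides std::hypot in cmath.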

Testing for NaNs and infinities

Next suppose you want to test whether a number is an infinity or a NaN. For example, you may want to write to a log file or print a debug message when a numerical result goes bad, or you may want to execute some sort of alternate logic in your code. There are simple, portable ways to get summary information and more complicated, less portable ways to get more information.

First, the simple solution. If you want to test whether a double variable contains a valid number, you can check whether x == x. This looks like it should always be true, but it's not! Ordinary numbers always equal themselves, but NaNs do not. I've used this trick on Windows, Linux, and Mac OSX. If you ever use this trick, put big bold comments around your code so that some well-meaning person won't come behind you and delete what he or she thinks is useless code. Better yet, put the test in a well-documented function in a library that has controlled access. The following function will test whether x is a (possibly infinite) number.

        
    bool IsNumber(double x) 
    {
        // This looks like it should always be true, 
        // but it's false if x is a NaN.
        return (x == x); 
    }
    

To test whether a variable contains a finite number, (i.e. not a NaN and not an infinity) you can use code like the following.

        
    bool IsFiniteNumber(double x) 
    {
        return (x <= DBL_MAX && x >= -DBL_MAX); 
    }    
    

Here DBL_MAX is a constant defined in float.h as the largest double that can be represented. Comparisons with NaNs always fail, even when comparing to themselves, and so the test above will fail for a NaN. If x is not a NaN but is infinite, one of the two tests will fail depending on whether it is a positive infinity or negative infinity.
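
If a C++11 compiler is available, the standard library has ready-made versions of both tests. The sketch below assumes C++11. The names IsNumberStd, IsFiniteNumberStd, kInfinity, and kQuietNaN are hypothetical, chosen only to mirror the functions above.

```cpp
#include <cmath>
#include <limits>

// Standard-library counterpart of IsNumber:
// true for finite values and for infinities, false for NaN.
bool IsNumberStd(double x)
{
    return !std::isnan(x);
}

// Standard-library counterpart of IsFiniteNumber:
// false for NaN and for either infinity.
bool IsFiniteNumberStd(double x)
{
    return std::isfinite(x);
}

// Convenient sources of special values for exercising such predicates.
const double kInfinity = std::numeric_limits<double>::infinity();
const double kQuietNaN = std::numeric_limits<double>::quiet_NaN();
```

One caution: aggressive optimization options such as GCC's -ffast-math allow the compiler to assume NaNs never occur, which can defeat both the x == x trick and these library calls.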

Getting more information programmatically

To get more detail about the type of a floating point number, there is a function _fpclass on Windows and a corresponding function fp_class_d on Linux. I have not been able to get the corresponding Linux code to work and so I'll stick to what I've tested and just talk about Windows from here on out.

The Windows function _fpclass returns one of the following values:


        _FPCLASS_SNAN   // signaling NaN
        _FPCLASS_QNAN   // quiet NaN
        _FPCLASS_NINF   // negative infinity
        _FPCLASS_NN     // negative normal
        _FPCLASS_ND     // negative denormal
        _FPCLASS_NZ     // -0
        _FPCLASS_PZ     // +0
        _FPCLASS_PD     // positive denormal
        _FPCLASS_PN     // positive normal
        _FPCLASS_PINF   // positive infinity    
        

The following code illustrates which kinds of operations result in which kinds of numbers. To port this code to Linux, the FPClass function would need to use fp_class_d and its corresponding constants.

        #include <cfloat>    // DBL_MAX, DBL_MIN, _fpclass (Microsoft-specific)
        #include <iostream>
        #include <sstream>
        #include <string>
        #include <cmath>

        using namespace std;

        string FPClass(double x)
        {
            int i = _fpclass(x);
            string s;
            switch (i)
            {
            case _FPCLASS_SNAN: s = "Signaling NaN";                break;
            case _FPCLASS_QNAN: s = "Quiet NaN";                    break; 
            case _FPCLASS_NINF: s = "Negative infinity (-INF)";     break; 
            case _FPCLASS_NN:   s = "Negative normalized non-zero"; break;
            case _FPCLASS_ND:   s = "Negative denormalized";        break; 
            case _FPCLASS_NZ:   s = "Negative zero (-0)";           break; 
            case _FPCLASS_PZ:   s = "Positive 0 (+0)";              break; 
            case _FPCLASS_PD:   s = "Positive denormalized";        break; 
            case _FPCLASS_PN:   s = "Positive normalized non-zero"; break; 
            case _FPCLASS_PINF: s = "Positive infinity (+INF)";     break;
            }
            return s;
        }

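        // Prints the two 32-bit halves of x in hex, low-order word first.
        // Assumes a little-endian platform where unsigned long is 32 bits,
        // which holds on Windows, the platform this sample targets.
        string HexDump(double x)
        {
            unsigned long* pu;
            pu = (unsigned long*)&x;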
            ostringstream os;
            os << hex << pu[0] << " " << pu[1];
            return os.str();
        }

        // ----------------------------------------------------------------------------
        int main()
        {
            double x, y, z;

            cout << "Testing z = 1/0\n";
            // cannot set x = 1/0 directly or would produce compile error.
            x = 1.0; y = 0; z = x/y;
            cout << "z = " << x/y << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting z = -1/0\n";
            x = -1.0; y = 0; z = x/y;
            cout << "z = " << x/y << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting z = sqrt(-1)\n";
            x = -1.0;
            z = sqrt(x);
            cout << "z = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting z = log(-1)\n";
            x = -1.0;
            z = log(x);
            cout << "z = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting overflow\n";
            z = DBL_MAX;
            cout << "z = DBL_MAX = " << z; 
            z *= 2.0;
            cout << "; 2z = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting denormalized underflow\n";
            z = DBL_MIN;
            cout << "z = DBL_MIN = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";
            z /= pow(2.0, 52);
            cout << "z = DBL_MIN / 2^52= " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";
            z /= 2;
            cout << "z = DBL_MIN / 2^53= " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting z = infinity + -infinity\n";
            x = 1.0; y = 0.0; x /= y; y = -x; z = x + y;
            cout << x << " + " << y << " = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting z = 0 * infinity\n";
            x = 1.0; y = 0.0; x /= y; z = 0.0*x;
            cout << "x = " << x << "; z = 0*x = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting 0/0\n";
            x = 0.0; y = 0.0; z = x/y;
            cout << "z = 0/0 = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting z = infinity/infinity\n";
            x = 1.0; y = 0.0; x /= y; y = x; z = x/y;
            cout << "x = " << x << "; z = x/x = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting x fmod 0\n";
            x = 1.0; y = 0.0; z = fmod(x, y);
            cout << "fmod(" << x << ", " << y << ") = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nTesting infinity fmod x\n";
            y = 1.0; x = 0.0; y /= x; z = fmod(y, x);
            cout << "fmod(" << y << ", " << x << ") = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            cout << "\nGetting cout to print QNAN\n";
            unsigned long nan[2]={0xffffffff, 0x7fffffff};
            z = *( double* )nan;
            cout << "z = " << z << "\n";
            cout << HexDump(z) << " _fpclass(z) = " << FPClass(z) << "\n";

            return 0;
        }    
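
For platforms without _fpclass, one option is the standard std::fpclassify function together with std::signbit, both in cmath since C++11. The following is a sketch of how FPClass might be rewritten that way. The name FPClassPortable is made up, and unlike _fpclass the standard function does not distinguish signaling NaNs from quiet NaNs.

```cpp
#include <cmath>
#include <string>

// Hypothetical portable variant of FPClass built on std::fpclassify.
std::string FPClassPortable(double x)
{
    // signbit also works for zeros and infinities, where x < 0 would not
    // tell -0 apart from +0.
    std::string sign = std::signbit(x) ? "Negative " : "Positive ";

    switch (std::fpclassify(x))
    {
    case FP_NAN:       return "NaN";
    case FP_INFINITE:  return sign + "infinity";
    case FP_ZERO:      return sign + "zero";
    case FP_SUBNORMAL: return sign + "denormalized";
    case FP_NORMAL:    return sign + "normalized non-zero";
    default:           return "Unknown";  // implementation-defined category
    }
}
```

Swapping this in for FPClass in the sample above should give comparable labels, apart from the single "NaN" category.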
    

To learn more

For a brief explanation of numerical limits and how floating point numbers are laid out in memory, see Anatomy of a floating point number.

For much more detail regarding exceptions and IEEE arithmetic in general, see What every computer scientist should know about floating-point arithmetic.

 


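IEEE floating-point exception flags and their meanings

The IEEE floating-point exceptions you will meet in programming fall into five types, each with its own meaning and handling:

  • INVALID: raised when an operand is illegal or the operation itself is undefined. For example, 0 divided by 0 and infinity minus infinity both raise this exception[^1].
  • OVERFLOW: raised when the result of a computation exceeds the largest representable magnitude. This typically happens when two very large numbers are multiplied or when exponential growth runs away.
  • UNDERFLOW: the opposite of overflow, raised when a result is so close to zero that it cannot be represented accurately as a normalized number. It is not as immediately catastrophic as overflow, but it can lead to serious loss of precision.
  • DIVIDE BY ZERO: raised when a finite nonzero number is divided by zero. Note that in some cases (such as those involving NaNs) a nominal "division by zero" is reported as a different exception instead. For instance, 0/0 raises INVALID.
  • INEXACT: set whenever the result of an arithmetic operation is rounded rather than exact. It indicates that at least one rounding error has occurred, which matters for applications that depend on high precision.

Compiler options can control how these exceptions behave. For example, when compiling Fortran with gfortran you can pass -ffpe-trap=invalid,zero,overflow to trap on the listed exceptions (the -ftrapv option, by contrast, concerns signed integer overflow). In C/C++ the process may involve installing a signal handler[^2].

To catch and respond to these events effectively, developers often also need to know the platform-specific APIs and the capabilities of the debugging toolchain. For example, Windows provides the EXCEPTION_POINTERS structure for obtaining details of an exception raised on the current thread. It holds pointers to the exception record (ExceptionRecord) and the context record (ContextRecord)[^3].

Finally, on modern CPU architectures the performance monitoring unit (PMU) can also help detect special conditions caused by floating-point instructions, with counters configured for finer-grained diagnosis[^5].

```cpp
// C++ example of installing a signal handler for floating-point exceptions.
#include <csignal>
#include <cstdlib>
#include <iostream>

void sig_handler(int signum)
{
    // Returning from a SIGFPE handler is undefined behavior, so exit instead.
    (void)signum;
    std::_Exit(EXIT_FAILURE);
}

int main()
{
    // Install the SIGFPE handler before any potential division by zero occurs.
    std::signal(SIGFPE, sig_handler);

    int i = 0;
    // With the default floating-point environment, exceptions are masked:
    // this division sets the divide-by-zero flag and yields inf without
    // raising SIGFPE. The handler runs only if trapping has been enabled
    // through a platform-specific mechanism.
    double d = 1 / (double)i;
    std::cout << "d = " << d << "\n";
    return 0;
}
```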
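
The five flags can also be inspected directly, without traps or signals, through the C++11 cfenv header. The sketch below assumes an IEEE 754 platform with the default non-trapping environment. The helper name FlagsRaisedByDivision is hypothetical. Strictly speaking the standard asks for #pragma STDC FENV_ACCESS ON before touching the flags, but not every compiler honors that pragma, so the volatile variables are used here to force the division to happen at run time.

```cpp
#include <cfenv>

// Hypothetical helper: performs one division and returns the IEEE flags it
// raised as a bitmask of FE_INVALID, FE_DIVBYZERO, FE_OVERFLOW,
// FE_UNDERFLOW, and FE_INEXACT.
int FlagsRaisedByDivision(double numerator, double denominator)
{
    std::feclearexcept(FE_ALL_EXCEPT);  // start from a clean slate

    volatile double n = numerator;
    volatile double d = denominator;
    volatile double q = n / d;          // the operation under test
    (void)q;

    return std::fetestexcept(FE_ALL_EXCEPT);
}
```

For example, FlagsRaisedByDivision(1.0, 0.0) should include FE_DIVBYZERO, FlagsRaisedByDivision(0.0, 0.0) should include FE_INVALID, and FlagsRaisedByDivision(1.0, 3.0) should include FE_INEXACT because one third has no exact binary representation.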