About size_t and ptrdiff_t

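This article explores the role of the size_t and ptrdiff_t types in 64-bit applications: they improve program portability and performance and keep address arithmetic safe. Concrete examples illustrate the advantages of using these types on different platforms.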
    • Abstract
    • Introduction
    • size_t type
    • ptrdiff_t type
    • Portability of size_t and ptrdiff_t
    • Safety of ptrdiff_t and size_t types in address arithmetic
    • Performance of code using ptrdiff_t and size_t
    • Code refactoring with the purpose of moving to ptrdiff_t and size_t
    • References

Abstract

This article explains what the size_t and ptrdiff_t types are, what they are used for, and when they should be used. It is intended for developers who are starting to build 64-bit applications, where using size_t and ptrdiff_t provides high performance, the ability to work with large amounts of data, and portability between platforms.

Introduction

Before we begin, I would like to note that the definitions and recommendations in this article apply to the most popular architectures at the moment (IA-32, Intel 64, IA-64) and may not fully apply to some exotic architectures.

The types size_t and ptrdiff_t were created to perform correct address arithmetic. For a long time it was assumed that the size of int coincides with the size of a machine word (the microprocessor's capacity), so int could be used for indexes and for storing sizes of objects or pointers. Accordingly, address arithmetic was built on the int and unsigned types as well. Most training materials on C and C++ programming use int in loop bodies and as an index. The following example is nearly canonical:

for (int i = 0; i < n; i++)
  a[i] = 0;

As microprocessors developed and their capacity increased, it became impractical to keep increasing the size of int. There are many reasons for that: economy of memory, maximum portability, and so on. As a result, several data models appeared that define the relations between the sizes of the C/C++ base types. Table N1 shows the main data models and lists the most popular systems using them.

Table N1. Data models


As you can see from the table, it is not so easy to choose a variable's type to store a pointer or an object's size. The size_t and ptrdiff_t types were created to solve this problem. They are guaranteed to be suitable for address arithmetic. The following code should now become the canonical form:

for (ptrdiff_t i = 0; i < n; i++)
  a[i] = 0;

It is this code that can provide safety, portability and good performance. The rest of the article explains why.

size_t type

size_t is a basic unsigned integer type of the C/C++ language. It is the type of the result returned by the sizeof operator. Its size is chosen so that it can store the maximum size of a theoretically possible array of any type. On a 32-bit system size_t takes 32 bits, on a 64-bit system 64 bits. In other words, on the architectures considered here a variable of size_t type can safely store a pointer. The exception is pointers to member functions, but this is a special case. Although size_t can store a pointer, it is better to use another unsigned integer type, uintptr_t, for that purpose (its name reflects its capability). On these architectures size_t and uintptr_t have the same size and can be treated as synonyms, although the language standards do not require this. size_t is usually used for loop counters, array indexing and address arithmetic.

The maximum value of size_t is given by the constant SIZE_MAX.
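The properties listed above can be checked directly by the compiler. The following is a minimal sketch, assuming a C++11 or later compiler; the helper max_elements is a hypothetical name introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// The sizeof operator yields a value of type size_t.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof returns size_t");

// size_t is an unsigned type.
static_assert(std::is_unsigned<std::size_t>::value, "size_t is unsigned");

// Hypothetical helper: the largest element count of an array of T
// whose size in bytes still fits into size_t.
template <typename T>
constexpr std::size_t max_elements() {
    return SIZE_MAX / sizeof(T);
}
```

SIZE_MAX comes from <cstdint> and equals std::numeric_limits<std::size_t>::max().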

ptrdiff_t type

ptrdiff_t is a basic signed integer type of the C/C++ language. Its size is chosen so that it can store the maximum size of a theoretically possible array of any type. On a 32-bit system ptrdiff_t takes 32 bits, on a 64-bit system 64 bits. Like size_t, ptrdiff_t can safely store a pointer on the architectures considered here, except for a pointer to a member function. ptrdiff_t is also the type of the result of an expression in which one pointer is subtracted from another (ptr1 - ptr2). ptrdiff_t is usually used for loop counters, array indexing, size storage and address arithmetic. Its counterpart intptr_t has the same size on these architectures, and its name indicates more clearly that it can store a pointer.
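A minimal sketch of pointer subtraction, assuming a C++11 or later compiler; distance_in_elements is a hypothetical helper name, and both pointers are assumed to point into the same array.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Subtracting two pointers yields a value of type ptrdiff_t.
static_assert(std::is_same<decltype(std::declval<int*>() - std::declval<int*>()),
                           std::ptrdiff_t>::value,
              "pointer difference has type ptrdiff_t");

// ptrdiff_t is signed, so a distance can be negative.
static_assert(std::is_signed<std::ptrdiff_t>::value, "ptrdiff_t is signed");

// Hypothetical helper: signed distance, in elements, from `from` to `to`.
// Both pointers must point into (or one past the end of) the same array.
std::ptrdiff_t distance_in_elements(const int* from, const int* to) {
    return to - from;
}
```

For an array a of five ints, distance_in_elements(a, a + 3) gives 3 and distance_in_elements(a + 3, a) gives -3.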

Portability of size_t and ptrdiff_t

The types size_t and ptrdiff_t enable you to write portable code. On the architectures considered here, the size of size_t and ptrdiff_t always coincides with the pointer's size. Because of this, these are the types that should be used as indexes for large arrays, for storing pointers and for pointer arithmetic.

Linux application developers often use the long type for these purposes. Within the 32-bit and 64-bit data models accepted in Linux, this really works: the size of long coincides with the pointer's size. But such code is incompatible with the Windows data model, where long remains 32 bits wide in 64-bit programs, and therefore cannot be considered portable. A more correct solution is to use size_t and ptrdiff_t.

As an alternative to size_t and ptrdiff_t, Windows developers can use the types DWORD_PTR, SIZE_T, SSIZE_T and so on. Still, it is preferable to confine yourself to size_t and ptrdiff_t.
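The difference between the data models can be probed at compile time. The sketch below is illustrative only: the function names are hypothetical, and the model names ILP32, LLP64 and LP64 are the commonly used labels for 32-bit systems, 64-bit Windows and 64-bit Linux respectively.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: names the data model from the sizes of int, long
// and a pointer. Anything else is reported as "other".
std::string data_model_name() {
    const std::size_t i = sizeof(int);
    const std::size_t l = sizeof(long);
    const std::size_t p = sizeof(void*);
    if (i == 4 && l == 4 && p == 4) return "ILP32";  // typical 32-bit systems
    if (i == 4 && l == 4 && p == 8) return "LLP64";  // 64-bit Windows
    if (i == 4 && l == 8 && p == 8) return "LP64";   // 64-bit Linux
    return "other";
}

// True on the Linux data models, false on 64-bit Windows.
constexpr bool long_matches_pointer() {
    return sizeof(long) == sizeof(void*);
}

// Expected to be true on all architectures the article considers.
constexpr bool size_t_matches_pointer() {
    return sizeof(std::size_t) == sizeof(void*);
}
```

Code that relies on long_matches_pointer() being true is exactly the code that breaks when moved to 64-bit Windows, while size_t_matches_pointer() is expected to hold on both platforms.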

Safety of ptrdiff_t and size_t types in address arithmetic

Address arithmetic issues have occurred very frequently since the adoption of 64-bit systems began. Most problems of porting 32-bit applications to 64-bit systems relate to the use of types such as int and long, which are unsuitable for working with pointers and arrays. The problems of porting are not limited to this, but most errors relate to address arithmetic and operations with indexes.

Here is a very simple example:

size_t n = ...;
for (unsigned i = 0; i < n; i++)
  a[i] = 0;

If the array consists of more than UINT_MAX items, this code is incorrect. It is not easy to detect the error or to predict the behavior of this code. The debug version will hang, but hardly anyone processes gigabytes of data in a debug version. The release version, depending on the optimization settings and the peculiarities of the code, can either hang or unexpectedly fill all the array cells correctly, creating an illusion of correct operation. As a result, floating errors appear in the program, occurring and vanishing with the slightest change in the code. To learn more about such phantom errors and their dangerous consequences, see the article "A 64-bit horse that can count" [1].

Here is another example of a "sleeping" error that occurs with a particular combination of input data (the values of the variables A and B):
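Reproducing the hang with a real array would require more than 4 billion elements, so the sketch below uses a scaled-down analogue: an 8-bit counter stands in for the 32-bit unsigned, and 300 stands in for a size above UINT_MAX. The function loop_terminates and the max_steps guard are hypothetical and exist only to make the non-terminating loop observable.

```cpp
#include <cstddef>
#include <cstdint>

// Runs `for (Counter i = 0; i < n; i++)` and reports whether the loop
// finishes within max_steps iterations. A counter type narrower than
// size_t wraps around to 0 before it can reach a large n.
template <typename Counter>
bool loop_terminates(std::size_t n, std::size_t max_steps) {
    std::size_t steps = 0;
    for (Counter i = 0; static_cast<std::size_t>(i) < n; i++) {
        if (++steps > max_steps) {
            return false;  // guard tripped: the counter never reached n
        }
    }
    return true;
}
```

With these assumptions, loop_terminates<std::uint8_t>(300, 1000) returns false because the counter wraps from 255 back to 0, while loop_terminates<std::size_t>(300, 1000) returns true.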

int A = -2;
unsigned B = 1;
int array[5] = { 1, 2, 3, 4, 5 };
int *ptr = array + 3;
ptr = ptr + (A + B); //Error
printf("%i\n", *ptr);

This code runs correctly in the 32-bit version and prints the number "3". After compilation in 64-bit mode, the code fails at run time. Let's examine the sequence of execution and the cause of the error:

  • The variable A of type int is converted to unsigned;
  • A and B are summed. As a result, we get the value 0xFFFFFFFF of unsigned type;
  • The expression "ptr + 0xFFFFFFFFu" is calculated. The result depends on the pointer's size on the current platform. In the 32-bit program, the expression is equivalent to "ptr - 1" and we successfully print the number 3. In the 64-bit program, the value 0xFFFFFFFFu is added to the pointer, and as a result the pointer ends up far beyond the array's limits.

Such errors can be easily avoided by using size_t or ptrdiff_t. In the first case, if the variable "i" has type size_t, there will be no infinite loop. In the second case, if we use size_t or ptrdiff_t for the variables "A" and "B", the program will correctly print the number "3".
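The second example can be sketched with the offset computed both ways. The function names are hypothetical, and the first assertion in the usage note assumes a 32-bit unsigned type, as on the architectures the article considers.

```cpp
#include <climits>
#include <cstddef>

// Mixed int/unsigned arithmetic: `a` is implicitly converted to unsigned,
// so -2 + 1 wraps around to UINT_MAX instead of giving -1.
unsigned mixed_sum(int a, unsigned b) {
    return a + b;
}

// The same sum computed in ptrdiff_t keeps its sign.
std::ptrdiff_t signed_sum(std::ptrdiff_t a, std::ptrdiff_t b) {
    return a + b;
}

// The corrected fragment: A and B are ptrdiff_t, so the pointer moves
// back by one element on both 32-bit and 64-bit platforms.
int read_with_ptrdiff_offset() {
    std::ptrdiff_t A = -2;
    std::ptrdiff_t B = 1;
    int array[5] = { 1, 2, 3, 4, 5 };
    int* ptr = array + 3;
    ptr = ptr + (A + B);  // array + 2
    return *ptr;
}
```

Here mixed_sum(-2, 1u) yields UINT_MAX, signed_sum(-2, 1) yields -1, and read_with_ptrdiff_offset() returns 3.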

Let's formulate a guideline: wherever you deal with pointers or arrays you should use size_t and ptrdiff_t types.


Performance of code using ptrdiff_t and size_t

Besides code safety, using ptrdiff_t and size_t in address arithmetic can give you an additional performance gain. For example, when an int index is used and its size differs from the pointer's size, the binary code will contain additional data conversion instructions. We are speaking about 64-bit code, where pointers are 64 bits wide and int remains 32 bits wide.

It is difficult to give a brief example of the advantage of size_t over unsigned. To be objective we have to rely on the compiler's optimizing abilities, and the two variants of optimized code frequently turn out too different to show this particular difference. We managed to create something like a simple example only on the sixth try. Even so, the example is not ideal, because it demonstrates not the unnecessary data type conversions mentioned above but the fact that the compiler can build more efficient code when size_t is used. Let's consider code that arranges an array's items in reverse order:

unsigned arraySize;
...
for (unsigned i = 0; i < arraySize / 2; i++)
{
  float value = array[i];
  array[i] = array[arraySize - i - 1];
  array[arraySize - i - 1] = value;
}

In the example, the variables "arraySize" and "i" have type unsigned. This type can easily be replaced with size_t. Now compare a small fragment of the assembler code shown in Figure N1.

Figure N1. Comparison of 64-bit assembler code when using unsigned and size_t types


The compiler managed to build more compact code when using 64-bit registers. I am not claiming that the code built with unsigned will run slower than the code using size_t. It is very difficult to compare the speed of code execution on modern processors. But the example shows that when the compiler works with arrays through 64-bit types it can build shorter and faster code.

From my own experience I can say that a reasonable replacement of int and unsigned with ptrdiff_t and size_t can give an additional performance gain of up to 10% on a 64-bit system. You can see an example of the speed increase obtained with ptrdiff_t and size_t in the fourth section of the article "Development of Resource-intensive Applications in Visual C++" [5].
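For reference, the size_t variant of the reversal loop can be sketched as follows. Wrapping it in a function named reverse_in_place is an assumption made here so that the fragment is self-contained; the loop body is the one from the example above.

```cpp
#include <cstddef>

// Reverses the items of `array` in place. The size and the index are
// both size_t, so on a 64-bit system they match the pointer's width.
void reverse_in_place(float* array, std::size_t arraySize) {
    for (std::size_t i = 0; i < arraySize / 2; i++) {
        float value = array[i];
        array[i] = array[arraySize - i - 1];
        array[arraySize - i - 1] = value;
    }
}
```

The behavior is the same as in the unsigned version; only the types of the size and the index have changed.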

Code refactoring with the purpose of moving to ptrdiff_t and size_t

As the reader can see, using ptrdiff_t and size_t gives some advantages for 64-bit programs. However, replacing all unsigned types with size_t is not a good solution. Firstly, it does not guarantee correct operation of a program on a 64-bit system. Secondly, the replacement will most likely introduce new errors, violate data format compatibility, and so on. You should also not forget that after this replacement the amount of memory the program needs will greatly increase. The increase in memory use will in turn slow down the application, because the cache will hold fewer of the objects being processed.

Consequently, introducing ptrdiff_t and size_t into old code is a task of gradual refactoring that demands a great amount of time. In effect, you would have to look through the whole code and make the necessary alterations by hand, an approach that is too expensive and inefficient. There are two possible alternatives:

  • Use specialized tools like Viva64, included in PVS-Studio. Viva64 is a static code analyzer that detects places where it is reasonable to replace data types so that the program becomes correct and works efficiently on 64-bit systems. To learn more, see "PVS-Studio Tutorial" [6].
  • If you do not plan to adapt a 32-bit program for 64-bit systems, there is no sense in refactoring the data types. A 32-bit program will not benefit in any way from using ptrdiff_t and size_t.
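The memory cost of a blanket replacement is easy to see on a data structure. The two structures below are hypothetical and serve only as an illustration; the doubling mentioned in the note assumes a 64-bit data model with a 32-bit unsigned type.

```cpp
#include <cstddef>

// Hypothetical record storing two indexes as 32-bit values.
struct RangeNarrow {
    unsigned begin;
    unsigned end;
};

// The same record after a mechanical replacement of unsigned with size_t.
struct RangeWide {
    std::size_t begin;
    std::size_t end;
};

// Memory needed for `count` records of the given size, in bytes.
constexpr std::size_t bytes_for(std::size_t count, std::size_t element_size) {
    return count * element_size;
}
```

On a 64-bit system sizeof(RangeWide) is typically 16 bytes against 8 bytes for RangeNarrow, so an array of a million such records grows from about 8 MB to about 16 MB and fewer records fit into the cache.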

References
