Saturday, January 14, 2012

On 'int128_t'

Most programming languages have built-in integer types, both signed (representing mathematical integers) and unsigned (representing mathematical nonnegative integers). C compilers usually give these types fancy names like 'unsigned long long', and have a nasty habit of changing the sizes (and meanings) of these types on different platforms. The C type 'short' is usually 16 bits, 'int' either 16 or 32 bits, and 'long' either 32 or 64 bits, depending on the platform. This eventually led to the 'stdint.h' header in C99, which provides exact-width types that can be used across platforms. These are named 'int32_t', 'int64_t', 'uint32_t', 'uint64_t', and so on.
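To see the difference on a given machine, a small sketch like the following prints the storage size of a few types. 'BITS_OF' and 'print_integer_widths' are hypothetical names chosen for this illustration; the numbers for 'short', 'int' and 'long' depend on the platform, while the 'stdint.h' types do not.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Storage size of a type in bits (BITS_OF is a hypothetical helper name). */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Print the sizes so they can be compared across platforms. */
void print_integer_widths(void)
{
    printf("short   : %zu bits\n", BITS_OF(short));
    printf("int     : %zu bits\n", BITS_OF(int));
    printf("long    : %zu bits\n", BITS_OF(long));
    printf("int32_t : %zu bits\n", BITS_OF(int32_t));
    printf("int64_t : %zu bits\n", BITS_OF(int64_t));
}
```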

With everything getting bigger, it's natural to ask: "Why 64?" The answer is generally that the widest integer type most hardware can deal with natively is 64 bits. Can we go higher? Of course! But how? In this article, I will show you how to define 'int128_t' and 'uint128_t' in C without any compiler hacks. They can be used as parameter types and return types of functions, and they don't require any special memory management or allocation, because they're not pointer types.

First, you might say we could just make an array type:
typedef int32_t int128_as_int32x4_t[4];
typedef int64_t int128_as_int64x2_t[2];
but which one do we pick? What we really need is a union of these, so we can decide later which array view to use. However, arrays cannot be used as return values from functions, while structs (and unions) can. So in order to have one named type that can be passed and returned by value, we make a struct of a union of array types, as follows:
typedef struct int128_s {
union int128_u {
int8_t as_int8[16];
int16_t as_int16[8];
int32_t as_int32[4];
int64_t as_int64[2];
} value;
} int128_t;
The typedef gives this type the name 'int128_t'. But how do we use these new integers? First of all we need some way of constructing 'int128_t' values, and in the spirit of 'stdint.h' we can make an 'INT128_C()' macro which expands to a constructed object of type 'int128_t'. We'll need a few functions for this:
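int128_t int128_from_int(int from);          /* widen an int */
int128_t int128_from_str(const char *from);  /* parse the text of an integer */
int int_from_int128(int128_t from);          /* narrow back to an int */
int str_from_int128(char *to, int to_size, int128_t from);  /* format into a buffer of to_size bytes */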
and we can use the second one to define the macro as:
#define INT128_C(x) int128_from_str(#x)
because '#x' tells the preprocessor to turn the argument x into a string literal, which is then passed to int128_from_str at run time, which in turn returns an object of type 'int128_t'. Since standard C does not treat a function call as a constant expression, we can also define simpler macros that build the value directly from its parts, as follows:
/* BIG_ENDIAN must be defined only when targeting a big-endian machine.
   Some system headers define it on every platform as a byte-order constant. */
#ifdef BIG_ENDIAN
#define INT128_C64(a,b)\
    (int128_t){.value = {.as_int64 = {a, b}}}
#define INT128_C32(a,b,c,d)\
    (int128_t){.value = {.as_int32 = {a, b, c, d}}}
#else
#define INT128_C64(a,b)\
    (int128_t){.value = {.as_int64 = {b, a}}}
#define INT128_C32(a,b,c,d)\
    (int128_t){.value = {.as_int32 = {d, c, b, a}}}
#endif
Note that because we use compound literals and designators ('value' and 'as_int##'), this part requires a C99 compiler. The macro arguments run from most significant to least significant.
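To make the pieces above concrete, here is one way the constructors might be sketched. It is an illustration under stated assumptions, not a reference implementation: byte order is detected with the '__BYTE_ORDER__' macro that GCC and Clang predefine, 'INT128_LO', 'INT128_HI' and 'INT128_LIMB32' are hypothetical helper names, the parser accepts only an optional sign followed by decimal digits, and anything that overflows 128 bits is silently discarded.

```c
#include <stdint.h>

typedef struct int128_s {
    union int128_u {
        int8_t  as_int8[16];
        int16_t as_int16[8];
        int32_t as_int32[4];
        int64_t as_int64[2];
    } value;
} int128_t;

/* Assumption: __BYTE_ORDER__ is predefined (GCC and Clang do this). */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define INT128_HI 0                    /* index of the most significant half */
#define INT128_LO 1
#define INT128_LIMB32(i) (3 - (i))     /* limb 0 is the least significant */
#else
#define INT128_HI 1
#define INT128_LO 0
#define INT128_LIMB32(i) (i)
#endif

/* Parse an optionally signed decimal string (const so literals are safe). */
int128_t int128_from_str(const char *from)
{
    uint32_t limb[4] = {0, 0, 0, 0};   /* limb[0] is the least significant */
    int negative = (*from == '-');
    int128_t r;

    if (*from == '-' || *from == '+')
        from++;
    for (; *from >= '0' && *from <= '9'; from++) {
        /* value = value * 10 + digit, carried across the four limbs */
        uint64_t carry = (uint64_t)(*from - '0');
        for (int i = 0; i < 4; i++) {
            uint64_t t = (uint64_t)limb[i] * 10u + carry;
            limb[i] = (uint32_t)t;
            carry = t >> 32;
        }
    }
    if (negative) {                    /* two's complement: invert, add one */
        uint64_t carry = 1;
        for (int i = 0; i < 4; i++) {
            uint64_t t = (uint64_t)(uint32_t)~limb[i] + carry;
            limb[i] = (uint32_t)t;
            carry = t >> 32;
        }
    }
    for (int i = 0; i < 4; i++)
        r.value.as_int32[INT128_LIMB32(i)] = (int32_t)limb[i];
    return r;                          /* returned by value, no allocation */
}

#define INT128_C(x) int128_from_str(#x)

/* Widen an int by sign extension: the high half is all ones if negative. */
int128_t int128_from_int(int from)
{
    int128_t r;
    r.value.as_int64[INT128_LO] = from;
    r.value.as_int64[INT128_HI] = (from < 0) ? -1 : 0;
    return r;
}

/* Narrow back to int; values outside the range of int are not preserved. */
int int_from_int128(int128_t from)
{
    return (int)from.value.as_int64[INT128_LO];
}
```

With these definitions, 'int128_from_int(-5)' and 'INT128_C(-5)' should produce the same value, and 'int_from_int128()' narrows either one back to -5.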

Conclusion

In order to use this integer type, we also need dozens of other functions, such as add, mul, sub, div, mod, and, or, xor, lsh, rsh, pow, etc., just to match the functionality usually associated with C integer types, and from there the possibilities are endless. A future article could revisit these functions. For now, though, I just wanted to bring focus to this integer type, especially considering how many commonplace data types fit into an 'int128_t', such as UUIDs and IPv6 addresses. We may need this sooner than we think.
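As a taste of what those functions involve, addition might be sketched as follows. This is only an illustration under assumptions: 'int128_add', 'INT128_LO' and 'INT128_HI' are hypothetical names, byte order is again detected with the '__BYTE_ORDER__' macro predefined by GCC and Clang, and a result that does not fit in 128 bits simply wraps around.

```c
#include <stdint.h>

typedef struct int128_s {
    union int128_u {
        int8_t  as_int8[16];
        int16_t as_int16[8];
        int32_t as_int32[4];
        int64_t as_int64[2];
    } value;
} int128_t;

/* Assumption: __BYTE_ORDER__ is predefined (GCC and Clang do this). */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define INT128_HI 0   /* index of the most significant 64-bit half */
#define INT128_LO 1
#else
#define INT128_HI 1
#define INT128_LO 0
#endif

/* Add two 128-bit values: add the low halves, then carry into the high. */
int128_t int128_add(int128_t a, int128_t b)
{
    /* Work in uint64_t: unsigned overflow wraps, signed overflow is undefined. */
    uint64_t alo = (uint64_t)a.value.as_int64[INT128_LO];
    uint64_t blo = (uint64_t)b.value.as_int64[INT128_LO];
    uint64_t ahi = (uint64_t)a.value.as_int64[INT128_HI];
    uint64_t bhi = (uint64_t)b.value.as_int64[INT128_HI];

    uint64_t lo = alo + blo;
    uint64_t carry = (lo < alo);       /* the low half wrapped around */
    uint64_t hi = ahi + bhi + carry;

    int128_t r;
    /* Converting back to int64_t keeps the bit pattern on common compilers. */
    r.value.as_int64[INT128_LO] = (int64_t)lo;
    r.value.as_int64[INT128_HI] = (int64_t)hi;
    return r;
}
```

The same carry idea extends to subtraction (with a borrow), and the other operations can be built on top of these two in a similar limb-by-limb fashion.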