Saturday, May 12, 2012

Byte-swap

Today I want to take a brief look at byte swapping from a different point of view. Many algorithms take mathematical concepts and turn them into a step-by-step process, but what if we turn it around, and mathematize a process that originates from computer science?
The idea of mathematizing computer programs is generally done within the context of the design of high-reliability systems and formal methods, which provide a way to prove that a program is correct. Perhaps if we took the time to ensure the correctness of the programs we write, then there wouldn't be so many bugs!

Monday, April 30, 2012

GoLang Highlights

I have been using Go quite frequently lately, and I would like to talk about the experience. There are several introductions to Go floating around, and reactions to its different conventions and ways of doing things than some people are used to. First, I would like to make it clear, though, that I have only used the gc compiler, as it was impossible to install gccgo on Mac OS X at the time of this writing.

What makes Go similar to X?

  • garbage collection
  • interface embedding
  • struct embedding
What makes Go better than X?
  • type syntax (backwards = fowards)
  • typed channels (no serialization)
  • implicit interfaces
  • segmented stacks

Similarities

Every scripting language has some form of garbage collection, so Go is not unique in that respect. Interface embedding is when you used a named interface type instead of a method when defining an interface type. It is similar to Haskell's => when constraining a type class. Struct embedding is when you use a named struct type instead of a field declaration when defining a struct type. This is very similar to inheritance in OOP languages, and it also gives the method sets of those child structs to the parent struct. Both of these are called embedding because they are syntactically similar, you just list the typename. However, they mean different things, and it's important to remember this.

Type Syntax

One of the distinguishing features of Go is its beautiful type syntax. Many programmers have noticed that writing types in C can be a headache, because when you read the type aloud, you have to read backward, or sometimes you have to skip around, back and forth in order to read the type properly. In Go there is no such problem, because the types are written exactly how you would say them in English. While this may be an issue in other languages, most speakers of languages of languages with SVO (Subject-Verb-Object) structure would be happy to see this change. It makes programming in Go a pleasure, and contributed greatly to my own productivity in my Go projects.

Typed Channels

Another Go feature is channels. Go channels are different from most other forms of I/O found in other languages because they are not based on bytes or characters, but data. Typed data. Which means instead of being required to serialize and deserialize your data when communicating between threads, or parts of a larger design, you can send them over a channel as is, knowing that you can use the data immediately instead of validating it first. Serialization is also a big issue with large software projects, since they may be written in many languages, which requires serialization in order to get data from one component to another. Go also has a serialization format, called gob, which can be considered a codec for most major types in the language. So you don't have to serialize, but if you want to, then it's available.

Implicit Interfaces

Another distinguishing feature of Go is implicit interfaces. Most languages with interfaces require explicit statements of implementation, such as Java and Haskell (for type classes). Go has no such requirement (and no such syntax). An interfaces is a collection of methods. A named type has an associated set of methods attached to it, called its method set. A named type implements an interface iff the methods it requires are a subset of the named type's method set. That's it! It's that simple. And because you don't have to make it explicit, it allows you to focus on more important things.

Segmented Stacks

The first and foremost under-appreciated and under-documented feature found only in Go is that of segmented stacks. Segmented stacks is what Go developers call the combination of implementation choices that led to the Go calling convention. It is an intersection of many features, such as: dynamic stack allocation, runtime checks, variable arguments, and deferred functions. Simply put, there is so much that happens between 'return' the statement, and 'RET' the instruction. First, the Go runtime has to check to see if there are any deferred functions to run, then it has to see if it allocated any stack space for the current function, and if it did and it isn't required anymore, then it frees the stack. Similarly, when a function is called, one of the first things that happens is the Go runtime checks to see if more stack space is required. What this means is that - in Go - there are no stack overflows!

Since segmented stacks are such an interesting feature, I suspect that I will revisit them in the future with a more in-depth discussion than I can ever hope to achieve in one paragraph.

Friday, March 2, 2012

Gödel and powerbooleans

What is a boolean?

A boolean is either true or false.

What is a powerboolean?

A powerboolean is a subset of the booleans (B):

  • True = {true}
  • Both = {true, false}
  • False = {false}
  • Null = {}

This set is equivalent to 2B (the powerset of the booleans), hence the name. The powerboolean operations are simply defined as the image of the normal boolean operations over B. The truth tables are as follows:

ANDTBFN
TTBFN
BBBFN
FFFFN
NNNNN
ORTBFN
TTTTN
BTBBN
FTBFN
NNNNN
NOT
TF
BB
FT
NN

These truth tables should not be confused with Belnap logic (which is another 4-valued logic) because there are different definitions for (N AND B), (B AND N), (N OR B), (B OR N). Also, In Belnap logic, B and N can be swapped, and the same rules apply, so they are not unique. In the powerbooleans, B and N cannot be swapped in a similar way, because if you do swap them, the truth tables would have to change. So the powerbooleans have unique values, they aren't just repetitions of some third truth value found in most 3-value logics. The 4-values of the powerbooleans are truely unique.

Who is Gödel?

Gödel is best known for his incompleteness theorems, but he is also known for the fuzzy logic named after him: "Gödel–Dummett" logic (also known as superintuitionistic, intermediate, or minimum t-norm logic). I've talked about product fuzzy logic before, but this time I'd like to talk about Gödel fuzzy logic. The operations are as follows:

  • (x AND y) = min(x, y)
  • (x OR y) = max(x, y)
  • NOT(x) = 1 - x

These operations are defined over a continuous interval from 0 to 1, so I can't really give truth tables for these, but to cut to the chase, they would look very similar to the tables above.

What do they have to do with each other?

If we let NOT(x) = (if x < 0, then x, else 1 - x), extend the interval of Gödel fuzzy logic below zero (which forces OR to be redefined in terms of our new NOT), and assign the following fuzzy values to each powerboolean:

  • True = 1
  • Both = 1/2
  • False = 0
  • Null = -1/2

then Gödel fuzzy logic and powerboolean operations are the same.

Thursday, February 16, 2012

A survey of 'divmod'

Rounding happens, and it happens far more than most people are aware of. As your web browser tries divide a web page (say, 917 pixels across) into 4 equal parts it uses rounding to find how many pixels each column is. Most of the time it is of little importance which direction numbers are rounded, but in some applications, it can be very noticeable. The solution to this problem is a simple calculation involving divmod:

  divmod(917, 4) = (229, 1)

which means a web page which is 917 pixels across can be divided into 4 columns, each of which is at least 229 pixels across, with 1 pixel left over. This concludes our example.

If the numbers being divided are real numbers or integers then there are many, many ways to round the division. If the numbers being divided are positive or unsigned integers, then there are fewer rounding modes (because rtn=rtz and rtp=rta) but there are still many ways. Despite how these modes may seem equivalent for mathematically positive numbers, they are different for fixed machine-size integers. For example, rtn is the same computationally on int32_t and uint32_t, as is rtp, but rtz and rta produce a different computational algorithms on signed and unsigned types. Regardless of the rounding mode chosen, the divmod axiom states:

  divmod(dividend, divisor) = (quotient, remainder)

implies

  dividend = divisor*quotient + remainder

or put in other words, quotient is approximately (dividend/divisor), but which way it is rounded is up to the rounding mode. This is true for every variant in this article except for divmod_euc, which we will discuss later.

This brings us to our analysis of the most common variants, which are usually called truncated division (also called quo-rem), and floored division (also called div-mod). The systems surveyed are: C (POSIX), OpenCL (a variant of C), libMPFR (the R stands for rounding), and Scheme (R5RS, R6RS, and a R7RS draft),

divmod_rtz(a,b)
• "round towards zero"
• 'rtz' suffix (OpenCL)
• x86:idiv (CPU instruction)
• c99:% (C operator)
• c:mod[fl]? (POSIX)
• c:trunc[fl]? (POSIX)
• c:FE_TOWARDZERO (POSIX)
• c:MPFR_RNDZ (libMPFR)
• rnrs:truncate (Scheme)
• r5rs:quotient (Scheme)
• r5rs:remainder (Scheme)
• r7rs:truncate/ (Scheme)
• "rem() has same sign as dividend"

divmod_rtn(a,b)
• "round towards negative infinity"
• 'rtn' suffix (OpenCL)
• c:floor[fl]? (POSIX)
• c:FE_DOWNWARD (POSIX)
• c:MPFR_RNDD (libMPFR)
• rnrs:floor (Scheme)
• r5rs:modulo (Scheme)
• r7rs:floor/ (Scheme)
• "mod() has same sign as divisor"

The problem with these two variants being so common is that it is easy to misuse them. It is quite common to use (A % B) as an index of an array of B elements, without checking if A is positive first. This introduces a buffer overflow error when the program tries to get the -4th element of the array, because that memory address may not be allocated yet, and even worse, if it is allocated then it's probably the wrong data! The core issue with both rtz and rtn rounding modes is that they may give negative remainders. However, there is less possibility of errors with mod_rtn when B is positive, because mod_rtn gives a positive remainder in that case. C99 was the first version of C that actually specified that "%" was mod_rtz, but before the 1999 version, that operator could also be mod_rtn, in which case the operator would be safe. Since C99 standardized on mod_rtz, however, now we know that "%" is unsafe, and therefore, we should always test if A is positive first. Another option would be to use mod_euc, which is discussed later in this article.

The next variant doesn't have a name, but it is a division involving the ceiling() function. Maybe someday we'll have a name for it.

divmod_rtp(a,b) 
• "round towards positive infinity"
• 'rtp' suffix (OpenCL)
• c:ceil[fl]? (POSIX)
• c:FE_UPWARD (POSIX)
• c:MPFR_RNDU (libMPFR)
• rnrs:ceiling (Scheme)
• r7rs:ceiling/ (Scheme)

I couldn't find any implementations of functions for the next two variants, but there is a rounding mode constant in MPFR. Anyways, here they are:

divmod_rta(a,b)
• "round away form zero"
• c:MPFR_RNDA (libMPFR)

divmod_rnz(a,b)
• "round to nearest, ties towards zero"
• (no examples)

There are also some rounding modes used in POSIX (also known as the Single UNIX Specification (the two standards have been synchronized since 2001), what most C programs use) that appear to depend on the environment for which rounding mode to use. The first is completely dependent on the current rounding mode, and the second is basically rn_ (round to nearest), except that ties are rounded according to the current rounding mode. The third seems uncommon in that C is the only language that uses it.

divmod_rtd(a,b)
• "round ... default behavior"
• c:nearbyint[fl]? (POSIX)

divmod_rnd(a,b)
• "round to nearest, ties ... default behavior"
• c:l?l?rint[fl]? (POSIX)

divmod_rna(a,b)
• "round to nearest, ties away from zero"
• c:l?l?round[fl]? (POSIX)

The next variants are vary rare, because they cannot be found in C:

divmod_rnn(a,b)
• "round to nearest, ties towards negative infinity"
• "Round half down" (Wikipedia)
• r6rs:div0, r6rs:mod0 (Scheme)

divmod_rnp(a,b)
• "round to nearest, ties towards positive infinity"
• "Round half up" (Wikipedia)
• elementary school rounding (in the U.S.)

The next variant, known as rte (round ties to even), is probably the fairest variant, in that it has no bias. Most of the variants above have a bias towards positive numbers or negative numbers, or a bias towards zero. The following variant is probably most well-known as being specified by the IEEE-754 floating-point standard. Its claim to fame is that it is unbiased, and it does not introduce any tendancies towards any particular direction.

divmod_rte(a,b)
• "round to nearest, ties to even"
• "Round half to even" (Wikipedia)
• 'rte' suffix (OpenCL)
• c:remquo[fl]? (OpenCL & POSIX)
• c:remainder[fl]? (OpenCL & POSIX)
• c:FE_TONEAREST (POSIX)
• c:MPFR_RNDN (libMPFR)
• r7rs:round/ (Scheme)

The next variant is by far the most advanced divmod on the planet. It is so advanced that it cannot be described by a rounding mode. All of the variants above can be described as div_?(a,b) = round_?(a/b) where the ? are replaced with a rounding mode, but not the next one. Its definition would be div_euc(a,b) = sign(b)*floor(a/abs(b)), but that doesn't even begin to describe it's awesomeness. The reason why div_euc is so amazing is that mod_euc is always nonnegative. Period.

divmod_euc(a,b)
• r6rs:div, r6rs:mod (Scheme)
• r7rs:euclidean/ (Scheme)

I am not the first to notice this. There is a paper titled "The Euclidean definition of the functions div and mod" (from 1992) and R6RS Scheme is also very insistant on this algorithm. In the paper, the author mentions how "definitional engineering" is at fault for the difficulties in using other rounding systems and in particular, fields in computer science such as arithmetic coding, arbitrary precision arithmetic, and programming language foundations all would benefit from (or in my opinion: require) the Euclidean definition, and yet there is no programming language (except for Algol and Scheme) that uses it.

Bibliography

  • OpenCL 1.1 Section 6.2.3.2 Rounding Modes.
  • C99/POSIX <fenv.h> and <math.h>.
  • Revised (5|6|7) Report on Scheme.
  • The libmpfr documentation.
  • The Euclidean definition of the functions div and mod. Raymond Boute. ACM Transactions on Programming Languages and Systems, Vol 14, No. 2, April 1992, Pages 127-144.

Saturday, January 14, 2012

On 'int128_t'

Every programming language has built-in integer types, both signed (representing mathematical integers) and unsigned (representing mathematical nonnegative integers). C compilers usually give these types fancy names like 'unsigned long long', and have a nasty habit of changing the sizes (and meanings) of these types on different platforms. The C type 'short' is usually 16 bits, 'int' either 16 or 32 bits, and 'long' either 32 or 64 bits, depending on platform. This eventually led to the 'stdint.h' header in C99, which provides exact-size types which can be used accross platforms. These are named 'int32_t', 'int64_t', 'uint32_t', 'uint64_t', and so on.

With stuff getting bigger, it's natural to ask the question: "Why 64?" and the answer is generally because the highest integer type most hardware can deal with is 64 bits. Can we go higher? Of course! but how? In this article, I will show you how to define 'int128_t' and 'uint128_t' in C without any compiler hacks. They can be used as parameter types and return types from functions, and they don't require any special memory management or allocation, because they're not pointer types.

First, you might say we could just make an array type:
 typedef int32_t int128_as_int32x4_t[4];
typedef int64_t int128_as_int64x2_t[2];
but which one to we pick? What we really need is a union of each of these, so we can decide later which array type to use. However, neither arrays nor unions can be used as return values from functions, only struct's can be used as return values. So in order to have a type that can be used as a return value we need to make a struct of a union of array types, as follows:
typedef struct int128_s {
union int128_u {
int8_t as_int8[16];
int16_t as_int16[8];
int32_t as_int32[4];
int64_t as_int64[2];
} value;
} int128_t;
and wrap this type in a typedef. But how do we use these new integers? First of all we need some way of constructing 'int128_t's, and in the spirit of 'stdint.h' we can make a 'INT128_C()' macro which expands to a constructed object of type 'int128_t'. We'll need a few functions for this:
int128_t int128_from_int(int from);
int128_t int128_from_str(char *from);
int int_from_int128(int128_t from);
int str_from_int128(char *to, int to_size, int128_t from);
and we can use the second one to define the macro as:
#define INT128_C(x) int128_from_str(#x)
because '(#x)' indicates to the preprocessor to turn x into a string before compile-time, which is then passed to int128_from_str which then returns an object of type 'int128_t'. For compilers that do not support compile-time constant expressions involving function calls, we can also define simpler macros as follows:
#ifdef BIG_ENDIAN
#define INT128_C64(a,b)\
(int128_t){.value = {.as_int64 = {a, b}}}
#define INT128_C32(a,b,c,d)\
(int128_t){.value = {.as_int32 = {a, b, c, d}}}
#else
#define INT128_C64(a,b)\
(int128_t){.value = {.as_int64 = {b, a}}}
#define INT128_C32(a,b,c,d)\
(int128_t){.value = {.as_int32 = {d, c, b, a}}}
#endif
Note that because we use designators (value and as_int##) this part requires a C99 compiler

Conclusion

In order to use this integer type, we also need dozens of other functions, such as add, mul, sub, div, mod, and, or, xor, lsh, rsh, pow, etc., just to match the functionality usually associated with C integer types, and from there the possibilities are endless. A future article could revisit these functions. For now, though, I just wanted to bring focus to this integer type, especially considering how many common-place datatypes fit into an 'int128_t' such as UUID's and IPv6 addresses. We may need this sooner than we think.

Friday, January 28, 2011

Tetrational-point

As you might have guessed, most of my posts are a synthesis of two or more ideas. This time, they are IEEE-754 floating point formats and composite arithmetic formats. To provide some overview, IEEE-754 has been the official standard for representing numbers in computers for over 25 years, and is widely deployed in practically every computer architecture. Composite arithmetic on the other hand is a new approach which uses a combination of integer, rational, scientific, and tetrational representation in unison to achieve greater accuracy, precision, and dynamic range.

References for composite arithmetic include: "Beyond Floating Point" (by C.W.Clenshaw and F.W.J.Olver) which focuses on extending floating point with tetration alone, "Composite Arithmetic, a Proposal for a New Standard" and "Composite Arithmetic, A Storage Form" (both by W. Neville Holmes) which both focus on combining all four approaches, "Design of a Composite Arithmetic Unit for Rational Numbers" (by Tomasz Pinkiewicz, W. Neville Holmes, and Tariq Jamil) which focuses on the implementation of these in hardware, "Design of a 32-bit Arithmetic Unit based on Composite Arithmetic and its Implementation on a Field Programmable Gate Array." (by Tomasz Hubert Pinkiewicz) which focuses on the same, and "Lecture notes on: Computer Arithmetic: Principles, Architectures, and VLSI Design" (by Reto Zimmermann) which seems to be an overview of the subject targeted at students.

The most recent edition of IEEE-754 introduced a 16-bit floating point format, called 'binary16' for short. Since this is the smallest standard floating point format, we will use it as an example of how IEEE-754 represents numbers. To begin we will see how it represents 1.5:

s exponent significand
bin 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0
hex 3 E 0 0
value = 1.5

and this is how the standard binary16 format represents infinity:

s exponent significand
bin 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
hex 7 C 0 0
value = Infinity

As you can see, binary16 uses a 5-bit exponent and a 10-bit significand. One important class of representations which do not represent numbers is known collectively as Not-A-Number (NaN) representations. These are when the exponent is all 1's, regardless of the sign. Infinity is a particular example of this, since the exponent is all 1's and the significand is all 0's. If the significand is nonzero, then the remaining representations fall into two categories: signaling NaNs (sNaN) and quiet NaNs (qNaN). A signaling NaN is traditionally represented with a 0 in the MSB of the significand, and some other bit nonzero. A quiet NaN is traditionally represented with a 1 in the MSB of the significand, with anything in the other bits. Since sNaNs can cause errors and trap handlers to invoke, it is a bad idea to use them to represent numbers, since many architectures will automatically convert sNaNs to qNaNs. So, we will leave sNaNs alone, and replace qNaNs with a tetrational number representation format.

This tetrational format will use the two MSBs to represent quietness (q) and height (h) respectively. The tetrand will become the exponent at the top of the exponential tower. So a height of 0 will indicate a value of 2^2^2^2^2^(0.tetrand), and a height of 1 will indicate a value of 2^2^2^2^2^2^(0.tetrand). In general, the value associated with the tetrational format is:

(-1)sexp2h+5(0.tetrand) -- HTML
-1 s exp 2 h + 5 0.tetrand -- MathML

We will start at 65536, which is just greater than the largest value represented by binary16 (65504). Extending this to larger formats (like binary32) should probably start here as well, because the next height is too much larger than the largest number represented by binary32. Starting from 65536, we can represent it as follows:

binary16: s exponent significand
tetra16: s exponent q h tetrand
bin 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
hex 7 E 0 0
value ≈ 65536

We get this value because 2^2^2^2^2^(0.0) = 2^2^2^2^1 = 2^2^2^2 = 2^2^4 = 2^16 = 65536. The next greater representation is

binary16: s exponent significand
tetra16: s exponent q h tetrand
bin 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1
hex 7 E 0 1
value ≈ 71036

and perhaps another interesting representation is that of googolplex:

binary16: s exponent significand
tetra16: s exponent q h tetrand
bin 0 1 1 1 1 1 1 1 1 0 1 1 0 0 1 0
hex 7 F B 2
value ≈ googolplex = 10^(10^100)

Note that we only used a single bit for the height of the exponential tower, but we could have used more. I think that 2 bits should be used for height in binary32, and 3 bits should be used for height in binary64 and binary128. All floating point formats would encode the height and tetrand in the extra bits of a qNaN so as to avoid conflicts with hardware arithmetic.

Here is an overview of the extensions to the binary16 format:

#x0000 = +0
#x0001 = 5.960464477539063e-8
... (standard binary16 floating-point) ...
#x7BFF = 65504.
#x7C00 = +Infinity
#x7C01 = +sNaN
... (standard binary16 signaling NaN) ...
#x7DFF = +sNaN
#x7E00 = 65536 = 2^2^2^2^2^(0)
#x7E01 = 71036.
... (new tetrational-point) ...
#x7E?? = 2^2^2^2^2^(0.??_16)
...
#x7EFE = 8.9423389054e+15721
#x7EFF = 6.2622606021e+17594
#x7F00 = 2^65536 = 2^2^2^2^2^2^(0) = 2^2^2^2^2
#x7F01 = 6.8504792165e+21383
... (new tetrational-point) ...
#x7F?? = 2^2^2^2^2^2^(0.??_16)
...
#x7FFE = 10^(2.69191e+15721) which is < 2^(2^65536)
#x7FFF = +qNaN (unique)
#x8000 = -0
#x8001 = -5.960464477539063e-8
... (standard binary16 floating-point) ...
#xFBFF = -65504.
#xFC00 = -Infinity
#xFC01 = -sNaN
... (standard binary16 signaling NaN) ...
#xFDFF = -sNaN
#xFE00 = -65536
#xFE01 = -71036.
... (new tetrational-point) ...
#xFE?? = -2^2^2^2^2^(0.??_16)
...
#xFEFE = -8.9423389054e+15721
#xFEFF = -6.2622606021e+17594
#xFF00 = -2^65536
#xFF01 = -6.8504792165e+21383
... (new tetrational-point) ...
#xFF?? = -2^2^2^2^2^2^(0.??_16)
...
#xFFFE = -10^(2.69191e+15721) which is > -2^(2^65536)
#xFFFF = -qNaN (unique)

In conclusion, we have shown exactly how to increase the dynamic range (quite dramatically) of standard IEEE-754 floating point formats by using a tetrational representation encoded in the unused bits of quiet NaNs. This provides a backwards-compatible hardware-accelerated tetrational floating point format which will encode approximate overflows as precisely as possible. So instead of overflows, you get a number! This makes it possible to be more informative when it comes to mathematical functions such as exp, log, or 1/x. With a bigger dynamic range, many of the outputs of these functions will no longer be indeterminate, but instead, finite.

Monday, January 3, 2011

GoLang Proposals

Go is a very new programming language, barely a year old. Since its initial release, it has become an instant success, almost overnight. Google designed Go primarily for high-level systems programming. Since then, Google has replaced a few of its internal servers with custom Go servers. I believe this success is driven by the simplicity of Go, and its special features which focus on concurrency.

Compared to C, it has many more features so I consider its features a strict superset of C's features. Compared to C++ however, Go is very different, possibly described by a classic Venn-diagram in which Go's features and C++'s features intersect, but both have features the other doesn't. For example, method overloading can be accomplished in both languages, but in different ways.One feature that C++ has enjoyed is that of templates, which do not exist in Go. This requires that algorithms that can work on multiple datatypes be written out each time, which negatively affects readability and maintainability. While adding full-fledged C++ templates to Go is certainly possible, I believe we need to look at other languages for guidance.

Java started out as a simple language. Java's success is due in part to this simplicity, however, language designers at Sun (now Oracle) decided very late in the process (2004, a full 9 years after its introduction in 1995) to add generics to the language. In addition, they chose to use C++ syntax, which has conflicts with the shift operator (>>). In my opinion, avoiding the introduction of generics has divided the community, effected incompatibilities, and complicated the standard library with multiple versions of the same function.

Depending on your definition of generics, they may have different kinds of parameters. The most intellectually challenging are Agda generics, which imply that generics are just functions which return types, and any function may take types as a parameter. C++ templates are similar to this kind of generics, but make a distinction between usual functions and functions returning types. Since this is a minority when it comes to generics implementations, we will not consider this kind of generics in the remainder of this article.

The most prevalent form of generics are found in Java and Haskell, which only allow types as parameters. From this point of view, the only generic type found in Go today is the map type, so it will be the primary example used.


Proposal #1: Add first-class types

This is doomed for failure, because it requires Agda-like (or C++/RTTI) semantics, but it is worth discussing. It is the proposal which requires the least syntactic changes, even if it might be the most intellectually demanding on semantics and runtime. If Go did have first-class types, then we could define the Map type as follows:

func Map(KeyT .(type), T .(type)) .(type) {
type mapT []struct{
hash int
key KeyT
value T
}
return mapT
}

Using this proposal, we can simulate map[KeyT]T with Map(KeyT, T). As you can see, type parameters are indicated with the ".(type)" token as is found in Go type-switch statements. One advantage of this system would be that it could be used to define Array(n, T) and Matrix(m, n, T) which would be impossible to define using more restrictive generics. Other functions that could defined using this proposal are new and make, which take a type as their first parameter. Here is a summary of the related changes to the Go specification:

Result        |= ".(type)"
ParameterDecl |= TypeName ".(type)"
Expression |= TypeName

In addition to these syntactic changes, a few paragraphs of rules and discussion maybe required.


Proposal #2: Add 'generic' (or '_Generic') keyword

This is the crux of this blog article. Since first-class types require that types and objects be intermixed, it requires that every parameter be suffixed with .(type), but with the generic keyword, we can enforce that every parameter be a type parameter (Java/Haskell-style generics). This makes declarations much easier to read, and has provides a much more structured style.

generic Map[KeyT]T []struct{
hash int
key KeyT
value T
}

Using this proposal, we can simulate map[KeyT]T with Map[KeyT]T. As you can see, this is very close to the standard built-in map type, with the obvious case difference. Here is an overview of the associated changes to the Go specification:

TypeLit       |= GenericType
TopLevelDecl |= GenericDecl
GenericType = "generic" GSignature Type
GenericDecl = "generic" identifier GSignature Type
GSignature = GParameters identifier
GParameters = "[" [ GParameterList [ "," ] ] "]"
GParameterList = GParameterDecl { "," GParameterDecl }
GParameterDecl = identifier

Having considered adding generic syntax to LiteralType, I don't think it would work well with literal syntax or semantics. What would such a literal mean? Would it simply be syntactic sugar for the base-type of the generic type with all the free variables filled in? If all that would be gained is syntax sugar, then I don't see the value of adding it to literal syntax.

This generics model also allows more than one parameter inside the brackets, but it also requires at least one parameter after the brackets, which is also very similar to how generic types work in Haskell, since the prevalence of monads has made the last parameter more important than the others. This generics model is also much closer to Go's existing map type, the only problem now is that the first letter is uppercase "M" whereas the built-in map type has a lowercase "m". This leads us to our next section, which discusses another proposal.


Proposal #3: Add 'attrib' (or '_Attrib') keyword

This proposal combines two issues into a single solution. The first issue is nonessential attributes (such as alignment, packing, etc.) for which GCC uses the __attribute__((id)) syntax. The second issue is the public/private convention in Go which may be an issue in the future if it is used to implement existing APIs such as POSIX, which require lowercase external symbols. The solution I propose to both of these issues are a new syntax: _Attrib(id), which allows simple annotation of declarations in such a way as to override normal Go semantics. Consider two such attributes: public and private, which would allow us to define the built-in map type as follows:

generic attrib(public) map[KeyT]T []struct{
hash int
key KeyT
value T
}

Using this proposal, there would be no difference between this defined type and the built-in map type. This would allow very small Go compilers which need only handle arrays and slices, and leave the map type, and possibly other built-in functions such as cap and len to a library implementation.

The syntax was designed to force the attrib specifier immediately before the identifier, to make it clear which identifier it is referring to. The changes to the MethodDecl production are questionable, and require more consideration and discussion. I have also considered moving AttribSpec to before the keywords which would simplify the grammar significantly. This is the summary of all the required extensions to the Go specification.

ConstSpec    |= AttribSpec IdentifierList 
[ [ Type ] "=" ExpressionList ]
TypeSpec |= AttribSpec identifier Type
VarSpec |= AttribSpec IdentifierList
( Type [ "=" ExpressionList ]
| "=" ExpressionList )
MethodDecl |= "func" Receiver AttribSpec
MethodName Signature [ Body ]
FunctionDecl |= "func" AttribSpec identifier
Signature [ Body ]
GenericDecl |= "generic" AttribSpec identifier
GSignature Type
AttribSpec = "attrib" "(" identifier
[ "(" ExpressionList ")" ] ")"

Proposal #4: Add 'pragma' (or '_Pragma') keyword

Another method used for nonessential attributes are top-level pragmas. Using pragmas, you can specify information to be used on a per-file or per-package basis. This could be a useful and simple extension to the language that could bring existing GCC syntax to Go compilers. Here is an example of using pragmas in a Go source file:

pragma("GCC poison strdup")
pragma("DSGO prefix pthread_")

Other possibilities for pragmas are to use Go with OpenMP, even though it might sound funny at first, because most of the features of OpenMP are already in Go! However, we should never underestimate what people will do with compilers, given the chance. Here is a summary of the associated extensions to the Go specification:

TopLevelDecl |= PragmaDecl
PragmaDecl = "pragma" "(" string_lit ")"

Conclusion

We have reviewed four possible extensions to the Go programming language, which may not seem in the spirit of Go right now, but over time may become necessary. If Go is to be used for serious projects, developers may come to expect these features, rather than hope for them. As we have learned from Java, if we wait for too long to introduce new features to Go, it may polarize the community, which would be an undesirable outcome for everyone.