Thursday, February 16, 2012

A survey of 'divmod'

Rounding happens, and it happens far more than most people are aware of. As your web browser tries divide a web page (say, 917 pixels across) into 4 equal parts it uses rounding to find how many pixels each column is. Most of the time it is of little importance which direction numbers are rounded, but in some applications, it can be very noticeable. The solution to this problem is a simple calculation involving divmod:

  divmod(917, 4) = (229, 1)

which means a web page which is 917 pixels across can be divided into 4 columns, each of which is at least 229 pixels across, with 1 pixel left over. This concludes our example.

If the numbers being divided are real numbers or integers then there are many, many ways to round the division. If the numbers being divided are positive or unsigned integers, then there are fewer rounding modes (because rtn=rtz and rtp=rta) but there are still many ways. Despite how these modes may seem equivalent for mathematically positive numbers, they are different for fixed machine-size integers. For example, rtn is the same computationally on int32_t and uint32_t, as is rtp, but rtz and rta produce a different computational algorithms on signed and unsigned types. Regardless of the rounding mode chosen, the divmod axiom states:

  divmod(dividend, divisor) = (quotient, remainder)

implies

  dividend = divisor*quotient + remainder

or put in other words, quotient is approximately (dividend/divisor), but which way it is rounded is up to the rounding mode. This is true for every variant in this article except for divmod_euc, which we will discuss later.

This brings us to our analysis of the most common variants, which are usually called truncated division (also called quo-rem), and floored division (also called div-mod). The systems surveyed are: C (POSIX), OpenCL (a variant of C), libMPFR (the R stands for rounding), and Scheme (R5RS, R6RS, and a R7RS draft),

divmod_rtz(a,b)
• "round towards zero"
• 'rtz' suffix (OpenCL)
• x86:idiv (CPU instruction)
• c99:% (C operator)
• c:mod[fl]? (POSIX)
• c:trunc[fl]? (POSIX)
• c:FE_TOWARDZERO (POSIX)
• c:MPFR_RNDZ (libMPFR)
• rnrs:truncate (Scheme)
• r5rs:quotient (Scheme)
• r5rs:remainder (Scheme)
• r7rs:truncate/ (Scheme)
• "rem() has same sign as dividend"

divmod_rtn(a,b)
• "round towards negative infinity"
• 'rtn' suffix (OpenCL)
• c:floor[fl]? (POSIX)
• c:FE_DOWNWARD (POSIX)
• c:MPFR_RNDD (libMPFR)
• rnrs:floor (Scheme)
• r5rs:modulo (Scheme)
• r7rs:floor/ (Scheme)
• "mod() has same sign as divisor"

The problem with these two variants being so common is that it is easy to misuse them. It is quite common to use (A % B) as an index of an array of B elements, without checking if A is positive first. This introduces a buffer overflow error when the program tries to get the -4th element of the array, because that memory address may not be allocated yet, and even worse, if it is allocated then it's probably the wrong data! The core issue with both rtz and rtn rounding modes is that they may give negative remainders. However, there is less possibility of errors with mod_rtn when B is positive, because mod_rtn gives a positive remainder in that case. C99 was the first version of C that actually specified that "%" was mod_rtz, but before the 1999 version, that operator could also be mod_rtn, in which case the operator would be safe. Since C99 standardized on mod_rtz, however, now we know that "%" is unsafe, and therefore, we should always test if A is positive first. Another option would be to use mod_euc, which is discussed later in this article.

The next variant doesn't have a name, but it is a division involving the ceiling() function. Maybe someday we'll have a name for it.

divmod_rtp(a,b) 
• "round towards positive infinity"
• 'rtp' suffix (OpenCL)
• c:ceil[fl]? (POSIX)
• c:FE_UPWARD (POSIX)
• c:MPFR_RNDU (libMPFR)
• rnrs:ceiling (Scheme)
• r7rs:ceiling/ (Scheme)

I couldn't find any implementations of functions for the next two variants, but there is a rounding mode constant in MPFR. Anyways, here they are:

divmod_rta(a,b)
• "round away form zero"
• c:MPFR_RNDA (libMPFR)

divmod_rnz(a,b)
• "round to nearest, ties towards zero"
• (no examples)

There are also some rounding modes used in POSIX (also known as the Single UNIX Specification (the two standards have been synchronized since 2001), what most C programs use) that appear to depend on the environment for which rounding mode to use. The first is completely dependent on the current rounding mode, and the second is basically rn_ (round to nearest), except that ties are rounded according to the current rounding mode. The third seems uncommon in that C is the only language that uses it.

divmod_rtd(a,b)
• "round ... default behavior"
• c:nearbyint[fl]? (POSIX)

divmod_rnd(a,b)
• "round to nearest, ties ... default behavior"
• c:l?l?rint[fl]? (POSIX)

divmod_rna(a,b)
• "round to nearest, ties away from zero"
• c:l?l?round[fl]? (POSIX)

The next variants are vary rare, because they cannot be found in C:

divmod_rnn(a,b)
• "round to nearest, ties towards negative infinity"
• "Round half down" (Wikipedia)
• r6rs:div0, r6rs:mod0 (Scheme)

divmod_rnp(a,b)
• "round to nearest, ties towards positive infinity"
• "Round half up" (Wikipedia)
• elementary school rounding (in the U.S.)

The next variant, known as rte (round ties to even), is probably the fairest variant, in that it has no bias. Most of the variants above have a bias towards positive numbers or negative numbers, or a bias towards zero. The following variant is probably most well-known as being specified by the IEEE-754 floating-point standard. Its claim to fame is that it is unbiased, and it does not introduce any tendancies towards any particular direction.

divmod_rte(a,b)
• "round to nearest, ties to even"
• "Round half to even" (Wikipedia)
• 'rte' suffix (OpenCL)
• c:remquo[fl]? (OpenCL & POSIX)
• c:remainder[fl]? (OpenCL & POSIX)
• c:FE_TONEAREST (POSIX)
• c:MPFR_RNDN (libMPFR)
• r7rs:round/ (Scheme)

The next variant is by far the most advanced divmod on the planet. It is so advanced that it cannot be described by a rounding mode. All of the variants above can be described as div_?(a,b) = round_?(a/b) where the ? are replaced with a rounding mode, but not the next one. Its definition would be div_euc(a,b) = sign(b)*floor(a/abs(b)), but that doesn't even begin to describe it's awesomeness. The reason why div_euc is so amazing is that mod_euc is always nonnegative. Period.

divmod_euc(a,b)
• r6rs:div, r6rs:mod (Scheme)
• r7rs:euclidean/ (Scheme)

I am not the first to notice this. There is a paper titled "The Euclidean definition of the functions div and mod" (from 1992) and R6RS Scheme is also very insistant on this algorithm. In the paper, the author mentions how "definitional engineering" is at fault for the difficulties in using other rounding systems and in particular, fields in computer science such as arithmetic coding, arbitrary precision arithmetic, and programming language foundations all would benefit from (or in my opinion: require) the Euclidean definition, and yet there is no programming language (except for Algol and Scheme) that uses it.

Bibliography

  • OpenCL 1.1 Section 6.2.3.2 Rounding Modes.
  • C99/POSIX <fenv.h> and <math.h>.
  • Revised (5|6|7) Report on Scheme.
  • The libmpfr documentation.
  • The Euclidean definition of the functions div and mod. Raymond Boute. ACM Transactions on Programming Languages and Systems, Vol 14, No. 2, April 1992, Pages 127-144.