# A division and n-root algorithm for big numbers

In a previous post, we introduced a library for performing arithmetic operations with arbitrary precision, and in this update we optimized the multiplication algorithm using the Fast Fourier Transform factorization.

In this post, using the FFT-based multiplication algorithm as the cornerstone of our library, we will optimize the division algorithm and introduce some other methods for computing square and quartic roots. In a previous example, we succeeded to compute the largest known prime number working with integer arithmetic; with the new ingredients, we will be able to compute some other interesting things, such as millions of decimals of Pi or other irrational numbers using convergent series.

## A new division algorithm

We have already optimized the multiplication method, moving from a $O(n^2)$ time complexity algorithm to a $O(n\,log_2(n))$ one. But the division is still computed with a long division algorithm, with a time complexity of $O(n^2)$. We will try to find a faster algorithm using the Newton Raphson method to converge to the solution.

The problem is: having the numbers $a$ and $b$, we want to compute the quotient $c = \dfrac{a}{b}$. The following function has a zero at the desired quotient:

$f(x) = b\,x - a.$

Applying the Newton-Raphson method to it, starting with an initial guess $x_0$, we can compute a sequence $\{x_k\},\,k=1,\,2,\,\ldots$ that converges quadratically to $\dfrac{a}{b}$ in the following way

$x_{k+1} = x_{k} - \dfrac{f(x_k)}{f'(x_{k})} = x_{k} - \dfrac{b\,x_{k} - a}{b}$

Turns out that we didn’t make any progress yet, as long as we still need to compute a division with the same divisor $b$. But, conceiving the division $\dfrac{a}{b}$ as a multiplication by the inverse $a\,\left(b^{-1}\right)$, and being that inverse $b^{-1}$ a zero of the following trivial function:

$f(x) = \dfrac{1}{x} - b,$

we can build a series that converges quadratically to $b^{-1}$

$x_{k+1} = x_{k} - \dfrac{f(x_k)}{f'(x_{k})} = x_{k} - \dfrac{\dfrac{1}{x_{k}} - b}{-\dfrac{1}{x_{k}^2}} = x_{k}\left(2 - b\,x_{k}\right)$

Thus, we only need multiplications and subtractions to compute every element of the sequence. Once we have converged to the inverse, $x_{k} \simeq b^{-1}$, we only need to estimate the division as

$c \simeq = a\,x_{k}$

A quadratic convergence means that the amount of found figures should double from one iteration to the next one.

## A n-root algorithm

In a similar way, we can build a sequence converging to the square root of a given number, which is a root of the following function

$f(x) = x^2 - a \qquad \Rightarrow \qquad x_{k+1} = x_{k} - \dfrac{{x_{k}}^2 - a}{2\,x_{k}}$

Or, if we want to avoid computing the division in each iteration, we can build a sequence converging to the inverse of the square root, being such number a zero of the function

$f(x) = \dfrac{1}{x^2} - a \qquad \Rightarrow \qquad x_{k+1} = \dfrac{x_{k}}{2}\left(3 - a\,{x_k}^2\right).$

Voilà, we only need multiplications and additions/subtractions in each iteration, and the solution will converge quadratically to $\dfrac{1}{\sqrt{a}}$. We can even compute the desired value ${\sqrt{a}}$ at the end without having to invert the solution, only multiplying by $a$

$\sqrt{a} = a\,\dfrac{1}{\sqrt{a}} \simeq a\,x_k.$

This method can be generalized for computing ${\sqrt[n]{a}}$ (the two previous examples correspond to $n = 1$ and $n = 2$):

$f(x) = \dfrac{1}{x^n} - a \qquad \Rightarrow \qquad x_{k+1} = \dfrac{x_{k}}{n}\left(n + 1 - a\,{x_k}^n\right).$

Once we have the final solution $x_{k} \simeq \dfrac{1}{\sqrt[n]{a}}$, we can compute the n-root without inverting the solution, by means of

$\sqrt[n]{a} \simeq a\,x_k^{n-1}$

## Implementation details

The implementation in the project page adds the new features to the previous version:

• Inverse computation (file div.cc).
• Division algorithm using the inverse method (file div.cc). The long division algorithm was already implemented.
• Square and quartic roots (n-root not implemented) using the direct methods or the more efficient inverse ones (file sqrt.cc).

The graphic below shows a scalability comparison of the methods implemented in the library. Notice the scalability problems of Long Division and Long Multiplication algorithms (the latest takes 10,000 times longer than the FFT multiplication algorithm to compute 1 million figures). The Newton inverse algorithm and its related algorithms (inverse division, square root and quartic root), perform well enough for our purposes (much better than the non inverse versions). The non inverse square and quartic root algorithms are based on an inverse division algorithm (they would perform much worse if based on a long division algorithm). There are many possible optimizations, but they are beyond the scope of this post.

Comparison of the methods implemented in the library

Some other important issues we didn’t tackle yet:

• Initial guesses for the iterative methods: check the methods BigNumber::fromDouble(double, int) and toDouble(double&, int&) and their usage for computing an initial guess, i.e. for a multiplicative inverse.
• Concergence criterions: a typical stop criterion with a convergent sequence would be when $x_{k+1}$ and $x_{k}$ are very close each other, $x_{k+1} \simeq x_{k}$, and that means that all (or almost all) of their digits match up. Check the source code again for further detail.
• Rounding errors: using algorithms that converge to multiplicative inverses can yield rounding errors when working with finite arithmetic. These errors can be overcome, but no solution has been implemented in the code nor discussed in this post. As a clue for example for the division algorithm, once we get the approximate quotient $c_1 \simeq \dfrac{a}{b}$ using the inverse approach, we can compute a new quotient $c_2 \simeq \dfrac{r}{b}$ (where $r = a - b\,c_1$ is the first remainder), and estimate the final quotient as $c \simeq c_1 + c_2$. The second division can be performed with a more accurate long division algorithm (not taking too long provided a small remainder).

## Application example

I always wanted to write my own program for computing millions of decimals of Pi, and now we have all the ingredients to do it by using for instance a 4th order Borwein’s algorithm (disclaimer: this is just an application of this general purpose library, not a specially optimized way to compute Pi decimals). Let us start with the following values

$a_0 = 6 - 4\sqrt{2},\qquad y_0 = \sqrt{2} - 1,$

and then construct the sequences $\{y_{k}\}$ and $\{a_{k}\}$

$y_{k+1} = \dfrac{1-(1-y_k^4)^{1/4}}{1+(1-y_k^4)^{1/4}} \\ \\ a_{k+1} = a_k(1+y_{k+1})^4 - 2^{2k+3} y_{k+1} (1 + y_{k+1} + y_{k+1}^2),$

the sequence $\{a_{k}\}$ converges with quartic order to $\dfrac{1}{\pi}$. You can get this result in about 20 minutes with a normal computer (~2.4 GHz CPU) when trying to compute PI with 1,038,090 decimals (the last 7 decimals are wrong due to rounding errors, and the total amount of figures involved in computations is 220 = 1,048,576). Find the implementation in file pi.cc.

# Modeling and simulation of planetary motion

I believe that the maximum exponent of curiosity is trying to understand how our own world works; thus, as a curious minded person, I’ve always been very fond of physics and applied sciences.

One of my favourite problems in classical mechanics is the planetary motion, and one of my first attempts to solve it with computer algorithms dates from my early teens. My first approach was written in BASIC, and the results were completely wrong, as I didn’t know enough about the underlying mathematics. Subsequent attempts yielded good results once I knew some basics of differential calculus during my high-school years.

In this post I’ll rescue one of those implementations, written in C, which uses the laws of motion to compute astronomical body paths.

## Physics background

Let’s suppose that we have two bodies $B_1$ and $B_2$, at positions $\vec{p}_1$ and $\vec{p}_2$, and with masses $m_1$ and $m_2$, respectively. The Newton’s law of universal gravitation states that the force between the two bodies is proportional to the product of the two masses, and inversely proportional to the squared distance

$F = G\,\dfrac{m_1\,m_2}{r^2}.$

Where $r = \left|\vec{p}_1-\vec{p}_2\right|$. For instance, the force with which $B_1$ is attracted towards $B_2$, including magnitude and direction, and being $\hat{r}$ the unitary vector from $\vec{p}_1$ to $\vec{p}_2$, is

$\vec{F}_{12} = G\,\dfrac{m_1\,m_2}{\left|\vec{p}_1-\vec{p}_2\right|^2}\hat{r} = G\,\dfrac{m_1\,m_2}{\left|\vec{p}_1-\vec{p}_2\right|^3}(\vec{p}_2-\vec{p}_1)$

Solution example when two bodies interact.

If we also use the Newton’s second law of motion, which relates force and acceleration

$\vec{F} = m\,\vec{a},$

and knowing that the acceleration $\vec{a}(t)$ is the second time derivative of position $\vec{p}(t)$

$\vec{a}(t) = \dfrac{d^2}{dt^2}\vec{p}(t),$

we can express the problem with the following set of coupled non-linear differential equations:

$\begin{array}{c} \dfrac{d^2}{dt^2}\vec{p}_1(t) = G\,\dfrac{m_2}{\left|\vec{p}_1(t)-\vec{p}_2(t)\right|^3}(\vec{p}_2(t)-\vec{p}_1(t))\\ \dfrac{d^2}{dt^2}\vec{p}_2(t) = G\,\dfrac{m_1}{\left|\vec{p}_1(t)-\vec{p}_2(t)\right|^3}(\vec{p}_1(t)-\vec{p}_2(t)). \end{array}$

Same solution as before, but showing the relative motion. The result is an ellipse.

If we provide the initial conditions, i.e. the initial positions $\vec{p}_1(t_0)$ and $\vec{p}_2(t_0)$, and the initial velocities $\vec{v}_1(t_0)$ and $\vec{v}_2(t_0)$, we can obtain, from the above equations, the paths $\vec{p}_1(t)$ and $\vec{p}_2(t)$, regardless of the number of dimensions. Particularly, for a two-dimensional problem, where

$\begin{array}{c} \vec{p}_1(t) = (x_1(t), y_1(t))\\ \vec{p}_2(t) = (x_2(t), y_2(t)), \end{array}$

the set of equations becomes

$\begin{array}{c} \dfrac{d^2}{dt^2}x_1(t) = G\,\dfrac{m_2}{\bigl((x_1(t)-x_2(t))^2+(y_1(t)-y_2(t))^2\bigr)^\frac{3}{2}}\bigl(x_2(t)-x_1(t)\bigr)\\ \dfrac{d^2}{dt^2}y_1(t) = G\,\dfrac{m_2}{\bigl((x_1(t)-x_2(t))^2+(y_1(t)-y_2(t))^2\bigr)^\frac{3}{2}}\bigl(y_2(t)-y_1(t)\bigr)\\ \dfrac{d^2}{dt^2}x_2(t) = G\,\dfrac{m_1}{\bigl((x_1(t)-x_2(t))^2+(y_1(t)-y_2(t))^2\bigr)^\frac{3}{2}}\bigl(x_1(t)-x_2(t)\bigr)\\ \dfrac{d^2}{dt^2}y_2(t) = G\,\dfrac{m_1}{\bigl((x_1(t)-x_2(t))^2+(y_1(t)-y_2(t))^2\bigr)^\frac{3}{2}}\bigl(y_1(t)-y_2(t)\bigr), \end{array}$

where we have to find $x_1(t), y_1(t), x_2(t)$ and $y_2(t)$. The problem can be easily spread to N dimensions and M bodies.

## Designing an algorithm to solve the equations

If we discretize the time domain

$t_n = t_0 + n\,\Delta_t\,\qquad n \in \mathbb{Z}^+$

and we apply the following first order finite differences approximations of the differential operators (this is the easiest way, and it’s equivalent to Euler method):

$\begin{array}{c} \vec{v}(t) = \dfrac{d}{dt}\vec{p}(t) \simeq \dfrac{\vec{p}(t+\Delta_t)-\vec{p}(t)}{\Delta_t}\\ \vec{a}(t) = \dfrac{d}{dt}\vec{v}(t) \simeq \dfrac{\vec{v}(t+\Delta_t)-\vec{v}(t)}{\Delta_t} \end{array}$

We can compute both the velocity and position of a body $B_i$, given the force exerted over it, by finding $\vec{p}_i(t+\Delta_t)$ from the above equations

$\begin{array}{l} \vec{F}_i = \displaystyle\sum_{j \neq i}G\,\dfrac{m_i\,m_j}{\left|\vec{p}_i(t_n)-\vec{p}_j(t_n)\right|^3}\bigl(\vec{p}_j(t_n)-\vec{p}_i(t_n)\bigr)\\ \vec{a}_i = \dfrac{\vec{F}_i}{m_i}\\ \vec{v}_i(t_n+\Delta_t) = \vec{v}_i(t_n) + \Delta_t\,\vec{a}_i\\ \vec{p}_i(t_n+\Delta_t) = \vec{p}_i(t_n) + \Delta_t\,\vec{v}_i(t_n) \end{array}$

3D solution example.

As simple as that, we compute the forces for each body (the forces due to all the other bodies), hence we have the acceleration, and we integrate it twice to obtain both the velocity and the position.

## The code

In the link below you have an implementation in C for the two-dimensional case. The code is platform-dependent, as it uses some basic graphic routines. Originally this code was written for MS-DOS, using the protected mode and the mode 13h, compiling with the DJGPP programming platform under RHIDE. As this platform is from the past, you can either run it under an emulator like DOSBox, or compile it using the SDL wrapper that I’ve written for rescuing this and other codes (only tested under GNU/Linux). The wrapper is coded in files mode13h.h and mode13h.c (the only platform-dependent code), and to switch between a native MS-DOS platform and SDL you must use the DOS symbol (when present, it assumes a native MS-DOS platform, otherwise it will use SDL).

The main algorithm is coded in file newton.c, and, as I described before, it just computes the force exerted over each body, and then velocity and position.

Example of execution when we show the relative motion following one of the bodies. The result is an ellipse.

A body is modelled with the struct body_t, which considers the position and velocity. Before starting the algorithm, you must create the bodies with the createBody() function, providing initial position, initial velocity and mass. Also, you can specify whether or not the body is fixed at its position, to simulate fictitious situations, suitable to being validated against Keppler’s laws (you can visually check how the two first laws hold when only two bodies interact). Another way to do this with real cases is by using the bodyref variable: when this variable is not null, the camera will follow the selected body.

Other useful parameters are scale_factor, to fit the required space in the screen, and calculations_per_plot, that allows you to enable frame dropping, computing with enough accuracy without having to plot all the solutions.

The source code includes an example of an EarthMoon system, where you can check that, given realistic data for the initial values and masses, some other parameters like the orbital period match the known values.

In the video below you can watch an execution example of the three-body problem, under DOSBox emulator, where you can appreciate the chaotic behaviour.

# A fast circle algorithm for ZX Spectrum

As like many people of my generation, my first steps in computer programming were with a ZX Spectrum. Following a natural order, Sinclair BASIC was the first language I dealt with, but quite often the hardware limitations forced me to move to the Z80 assembler language. A common practice was coding some time critical routines in assembler to avoid many of the bottlenecks, like some graphic routines.

In this post, I’ll explain, as an example, how to outperform the built-in circle drawing routine.

## Algorithm explanation

The original built-in algorithm (command CIRCLE in BASIC) drew the circumference with an angular sweep, from 0 to 360 degrees, and although it was implemented in machine code, it looked quite slow.

It’s possible to make a more efficient implementation without using trigonometric functions nor square roots, and even without using multiplications. The following algorithm is equivalent to the now known mid-point algorithm (but with different error formulation), and it’s based on the implicit equation of the circumference (let’s assume the circumference is centered at (0,0))

$x^2 + y^2 - r^2 = 0$

To draw the contour, we establish $(x,y) = (r,0)$ as the starting point, and draw the first octant, with angles between 0 and 45 degrees. As below 45 degrees the y component goes faster than the x one, we’ll increment the y coordinate inside a loop, and correct the path decreasing the x component whenever we consider we are “far away” from the contour. To evaluate how far we are, let’s compute the squared error

$e(x,y) = x^2 + y^2 - r^2$

Let $\vec{p}_{k} = (x_{k},y_{k})$ be the coordinates of a pixel at a given step k; at each step we have to choose whether to move upwards or towards the upper-left pixel:

$\begin{array}{l} \vec{p}_{1,k+1} = \vec{p}_{1,k} + (0,1)\\ \vec{p}_{2,k+1} = \vec{p}_{2,k} + (-1,1) \end{array}$

Our decision will be based on a minimum squared error criterion; for the two options above we have the following expressions for the error (in our discrete space, we measure at the middle of the pixels, so the mathematical center is at the middle of the pixel (0,0))

$\begin{array}{ll} e_{1,k+1} = & e(\vec{p}_{1,k+1}) = x_{k}^2 + (y_{k}+1)^2 - r^2 = e_{k} + 1 + 2\,y_{k}\\ \\ e_{2,k+1} = & e(\vec{p}_{2,k+1}) = (x_{k}-1)^2 + (y_{k}+1)^2 - r^2\\ &= e_{1,k+1} + 1 - 2\,x_{k} = e_{k} + 1 + 2\,y_{k} + 1 - 2\,x_{k} \end{array}$

Notice that we can know the error at a given step if we know the error previous to it, with only a few additions. Whenever $|e_{1,k+1}| < |e_{2,k+1}|$, we choose $\vec{p}_{1,k+1}$ as the next pixel, otherwise we take $\vec{p}_{2,k+1}$. We can simplify the comparison in the following way, knowing that $e_{2,k+1}$ is always negative and $x_{k}$ and $y_{k}$ are integers

$\begin{array}{l} |e_{1,k+1}| < |e_{2,k+1}| \,\,\,\, \Rightarrow \,\,\,\, e_{1,k+1} < -e_{2,k+1} \,\,\,\, \Rightarrow \,\,\,\, e_{1,k+1} < -e_{1,k+1} - 1 + 2\,x_{k} \\ \,\,\,\, \Rightarrow \,\,\,\, 2\,e_{1,k+1} < 2\,x_{k} - 1 \,\,\,\, \Rightarrow \,\,\,\, e_{1,k+1} < x_{k} - \frac{1}{2} \,\,\,\, \Rightarrow \,\,\,\, e_{1,k+1} < x_{k} \end{array}$

The initial value for the error is $e_{0} = 0$, as the first pixel $\vec{p}_{0} = (r,0)$ satisfies the circle equation. Observe the error evolution in the figure below; the corrections produced by the x component decrements tend to keep the error around 0.

Error made in the angular sweep from 0 to 45º. While incrementing the y component, we correct the x component whenever the error becomes too high, according to a minimum squared error criterion. Each x correction results in an error reduction.

The algorithm can be coded in C as follows:

int x = r;
int y = 0;
int e = 0;

for (;;) {

drawPixel(xc + x, yc + y);

if (x <= y) {
break;
}

e += 2*y + 1;
y++;

if (e > x) {
e += 1 - 2*x;
x--;
}
}

Once we know how to draw the first octant, the rest of the circle can be drawn applying symmetry

Instead of drawing one single pixel, we draw 8 of them:

drawPixel(xc + x, yc + y);
drawPixel(xc + x, yc - y);
drawPixel(xc - x, yc - y);
drawPixel(xc - x, yc + y);
drawPixel(xc + y, yc + x);
drawPixel(xc + y, yc - x);
drawPixel(xc - y, yc - x);
drawPixel(xc - y, yc + x);

## The code

The source code is a .asm text file that you can compile with a z80 assembler like pasmo (you can use other assemblers, but some of them ignore the ORG directives, so be careful with the relocations). With pasmo you can generate directly a .tap file ready to be loaded in an spectrum emulator.

The code contains two main functions: one for drawing pixels and another one for drawing circles. Also, the file includes an execution example placed at the address 53000, that draws a set of concentric circles growing in size.

For drawing pixels, two lookup tables are used: tabpow2, with powers of 2, and tablinidx, with the order of the 192 screen lines (remember that the ZX spectrum used an interlaced access).

You can invoke the routine by placing the point coordinates at the addresses 50998 and 50999, and jumping to the address 51000

POKE 50998, 128
POKE 50999, 88
RANDOMIZE USR 51000

To invoke the circle routine, you must place the center coordinates at 51997 and 51998, and the radius at 51999, and then jump to the address 52000.

POKE 51997, 128
POKE 51998, 88
POKE 51999, 80
RANDOMIZE USR 52000

In the video below you can see the execution of the following code under Spectemu emulator, comparing the performance of the two algorithms.

10 FOR i=1 TO 20
20 CIRCLE 128, 80, i
30 NEXT i
40 RANDOMIZE USR 53000

Source code (includes a .tap tape file generated with pasmo v0.5.3, and a .z80 snapshot file generated with spectemu v0.94/c).
Project allocated at bitbucket.org

# FFT-based multiplication algorithm

We have presented in a previous post a library for operating with arbitrarily big numbers, where we had implemented some basic arithmetic operations, like addition, multiplication and division.

In this post we’ll focus on improving the multiplication algorithm efficiency through Fast Fourier Transforms.

## Algorithm explanation

Let’s take two random numbers, say

$\begin{array}{c} a = 143672,\\ b = 669381 \end{array}$

The multiplication result is

$a.b = 96171307032$

If we see each number as a digit sequence (digits between 0 and 9), we can think of each of those digits as the coefficients of a polynomial. The multiplication of two polynomials and the ordinary multiplication are closely related

$\begin{array}{c} p_a(x) = x^5+4\,x^4+3\,x^3+6\,x^2+7\,x+2\\ p_b(x) = 6\,x^5+6\,x^4+9\,x^3+3\,x^2+8\,x+1 \end{array}$

$\begin{array}{c} p_a(x) . p_b(x) = 6\,x^{10} + 30\,x^9 + 91\,x^8 + 53\,x^7 + 125\,x^6 + 150\,x^5 + \\ 121\,x^4 + 90\,x^3 + 68\,x^2 + 23\,x + 2 \end{array}$

Thus, we obtain the sequence [6 30 91 53 125 150 121 90 68 23 2]. As some of these digits are greater than 9, we need to apply a carry propagation correction, obtaining the sequence [9 6 1 7 1 3 0 7 0 3 2], which is the result we were looking for. The carry correction algorithm has an order of n time complexity.

It’s well known that a polynomial multiplication algorithm is equivalent to a discrete convolution over the coefficients, so we conceive now our digit sequences as discrete signals.

The discrete convolution can be computed as

$(a * b)[n] = \displaystyle{\sum_{m=-\infty}^{\infty}{a[m]\,b[n-m]}}.$

The classical algorithm for the convolution has an order of n2 time complexity, so there is no point at the moment. To reduce this complexity, we’ll use the convolution theorem, which tells us that the Fourier transform of a convolution is the pointwise product of the Fourier transforms.

$\mathcal{F}\{a * b\} = \mathcal{F}\{a\}\,.\,\mathcal{F}\{b\}.$

That is, we can compute the DFTs of the original signals, multiply them pointwise, compute the inverse DFT and apply the decimal correction. The key point is that we can use efficient FFT algorithms to compute both the DFT and the inverse DFT.

From a O(n2) order algorithm, we have moved to the computation of 2 FFTs, with order O(n.log2(n)) each one, a pointwise product, with O(n), an inverse FFT, with O(n.log2(n)), and a carry propagation correction, with O(n), so putting all together, our final algorithm will have an order of O(n.log2(n)), much better than the original algorithm.

As a final optimization, it is possible to avoid computing one of the two FFTs if we take into account that the original signals are both real, so if we build a complex signal which carries one of the signals in its real part and the other signal in the imaginary part

$c[n] = a[n] + i\,b[n],$

we can extract the original signals applying the real part and imaginary part operators

$\begin{array}{c} a[n] = \Re\{y[n]\} = \dfrac{c[n] + c^*[n]}{2}\ \\ b[n] = \Im\{y[n]\} = \dfrac{c[n] - c^*[n]}{2\,i} \end{array}.$

Using the following property of the Fourier transform

$a^*[n] \longrightarrow \mathcal{F}\{a\}^*[-n],$

we can compute now the DFT of the composite signal and extract the DFTs of the original signals

$\begin{array}{c} \mathcal{F}\{a\}[n] = \dfrac{\mathcal{F}\{c\}[n] + \mathcal{F}\{c\}^*[-n]}{2}\ \\ \mathcal{F}\{b\}[n] = \dfrac{\mathcal{F}\{c\}[n] - \mathcal{F}\{c\}^*[-n]}{2\,i} \end{array},$

where the new operations have all order O(n).

## The code

Starting from the original code, some functions have been added: FFT and inverse FFT computation functions, in the files fft.h and fft.c. Also, the memory allocation is now dynamic in order to deal with really huge numbers; furthermore, the show() function has been improved in such a way that it only prints the most and less significant digits with very big numbers, while the whole computation is dumped to the file ‘output.txt’.

To test the new multiplication algorithm, the program evaluates the largest known prime number, the Mersenne prime 243,112,609-1, with 12,978,189 digits. You no longer need to modify N_DIGITS (now BigNumber::N_DIGITS), it will automatically adapt to the selected exponent. In this case, we need 224 figures (must be power of 2), that is, 16,777,216, to compute a number with 13 million digits.

The algorithm used to obtain the Mersenne number is detailed in the source code. You will need about 20 minutes and 1,3 GB of RAM memory to run the program with the current configuration in a normal computer (2 GHz CPU). If that’s too much for you, you can compute a cheaper Mersenne prime; e.g., you can compute the number 23,021,377-1, with 909,526 digits, in about 35 seconds.

Take a look to that numbers to get an idea of what we are talking about. Here you have the execution output for both of them in raw text formatted to 75 columns, do you imagine how hard must be to proof that any of them is indeed a prime number?

It’s possible to do some improvements with parallelization and a more dynamic memory allocation, but that’s beyond the range of this post. In future posts, we’ll present some algorithms to perform more complicated mathematical operations, like square roots.

Source code
Library allocated at bitbubket.org (the code explained here is tagged as “1.0.1”, the posterior code can contain improvements explained in future posts)

# Performing fixed-point arithmetic operations with arbitrary precission

I decided to write this little library in 2000, after watching a rerun of “The Simpsons” episode “Treehouse of Horror VI”.

Homer Simpson in a three-dimensional world.

In the last story of this episode, Homer Simpson jumps into a three-dimensional world. The full story, and specially this parallel world, are full of mathematical tributes: geometrical shapes, the Euler’s identity, a reference to the P versus NP problem, and so on. Among these mathematical references, a disturbing numerical formula attracted our attention:  178212 + 184112 = 192212

The previous affirmation is obviously a joke, since it must be false according to Fermat’s Last Theorem. But, trying to evaluate both sides of the equation with a normal scientific calculator, one can fail to believe in its truthfulness. Try for example using Google as a calculator:

The problem here is that a normal calculator doesn’t handle the enough amount of significant figures. As long as both sides of the equality match up in the 9 first of their 40 digits, it shows apparently the same result. Here is the right answer:

 178212 + 184112 = 2541210258614589176288669958142428526657 192212 = 2541210259314801410819278649643651567616

The main trouble stems obviously from a spatial limitation of the display, unable to represent all the computed figures in scientific notation. But even if we had a larger screen, the above formulas couldn’t be properly evaluated due to the internal floating point storage precision.

Even though there are numerous tools able to perform arithmetic operations with arbitrary big integers (like the phyton language, or bc, into the GNU project), I found interesting to write a lightweight and generic library to deal with this and other similar problems.

In this post I will present the library code and its basic structure, and in future publications we will try to improve the algorithms efficiency and we will show some practical examples.

## Library specification

A natural binary-coded decimal (NBCD) system has been chosen, with sign and module agreement, so an arbitrarily big integer can be coded with a long enough BCD digit sequence.

In order to handle fractional numbers, the library uses fixed-point arithmetic, therefore the format (the BCD sequence) is divided into integer and fractional parts. By this way, fractional numbers can be approximated with arbitrary precision by simply allocating enough memory in the fractional part. This scheme can deal with arbitrary-sized numbers, but in a static way, i.e., having previously allocated the needed memory.

Multiplication with fixed-point arithmetic.

Once the format has been chosen, let us describe some simple arithmetic operations:

• Addition and subtraction: These operations are made by a point-wise addition or subtraction of both BCD sequences, applying a carry propagation from each digit to the next one.
• Multiplication: The operation is performed by a typical long multiplication algorithm.
• Division: The algorithm used is the long division algorithm.

## The code

The library is coded in C++, but almost none of the language features are used in this limited version.

A number is coded in BigNumber class, which consists basically in a byte array and a sign flag. For simplicity and clarity, only one decimal digit is coded in each byte, instead of two.

The file config.h is a configuration point, where we can set the number of digits assigned to the integer and fractional parts in the format.

The file test.cc contains a main function with an example application: it evaluates both sides of the former equation 178212 + 184112 = 192212, and verifies its falseness. At least 40 digits are needed to carry out the operation, so the default configuration, which reserves 50 digits for both the integer and fractional parts, is enough for our purposes.

The rest of the files implement several operations.

• mul.cc: multiplication.
• div.cc: division.
• convert.cc: conversion between bignumbers and floating point numbers.

Source code
Library allocated at bitbubket.org (the code explained here is tagged as “1.0.0”, the posterior code can contain improvements explained in future posts)

Legal Notice: The Simpsons TM and (C) Fox and its related companies. All rights reserved. Any reproduction, duplication, or distribution in any form is expressly prohibited.

Disclaimer: This web site, its operators, and any content contained on this site relating to The Simpsons are not specifically authorized by Fox.

# Sudoku solver

In this post I will propose a way to solve a sudoku puzzle with a computer program, trying to find all the puzzle solutions in an efficient way using an algorithm based on uncertainty reduction.

The basic idea and the source code date from 2005, when the puzzle came into fashion.

## The algorithm basis

A sudoku puzzle.

A sudoku puzzle consists of 9×9 squares accepting digits between 1 and 9. A digit in a square cannot be repeated in its row, column, or 3×3 grid. Therefore, each square belongs to 3 uniqueness constraint groups (a row, a column and a 3×3 grid), having a total amount of 3×9 restriction groups, 9 of each type.

Each square has a state: a set of digits that can be fitted without violating its 3 uniqueness constraints, and therefore the total table state will be the set of all the 81 individual states. Without any information (an empty sudoku), the initial state is the one with the maximum uncertainty: all the digits are possible in each square.

The state of a square gives us an uncertainty metric: the number of possible digits. By summing up the uncertainty of all the squares, we can obtain a global uncertainty metric. Our strategy will be to reduce this uncertainty until we reach the minimum state: the solution.

We have two ways to perform this reduction:

1. Using deductions: given a state, we move to other state with lower uncertainty. The new state will be valid whenever so is the original one. We say a state is valid when it contains the solution, which is actually not easy to determine. Fortunately, we can at least identify those invalid states that do not respect the uniqueness constraints; those, for sure, will not contain the desired solution. Fair enough.
2. Using hypothesis: given a state where no deductions can be made, we make an assumption which takes us to another state with lower uncertainty. We have no guarantee about the validity of our hypothesis, and hence about the validity of our new states, but a wrong decision will result sooner or later in an invalid state, so we will have the chance to rectify.

Using only hypothesis involves a typical brute-force algorithm. But in this approach, the use of deductions will allow us to drastically reduce the space into which the hypothesis are made: the less the uncertainty is, the less assumptions we’ll need. The key point is that every time we reduce the uncertainty of a square, we can trigger recursively new deductions in their related squares (the ones belonging to its restriction groups), resulting in a notable global uncertainty reduction.

The algorithm then will consist of applying deduction rules whenever it is possible, and hypothesis otherwise, until we reach the solution. If we reach an invalid state, produced by a hypothesis, we rule out that hypothesis and make a new one. The algorithm flow is as follows:

Algorithm flow.

1. We start from the maximum uncertainty state: a void table where all the digits (1..9) are possible in all the squares.
2. Use the initial table information to recursively trigger the initial deductions. If the puzzle is solved, we exit. Many cases can be solved using only this technique.
3. If the puzzle is not solved, we make an assumption over the next square with uncertainty (the next within a given sequence). This will yield new deductions recursively, with three possible results:
• If the puzzle is solved, we exit.
• If we arrive at an inconsistent state, the last hypothesis was incorrect and we try the next possible one over the same square (or, if we cannot make more assumptions over that square, we move to the next one in our sequence).
• Otherwise (the puzzle is not solved yet), we jump to the point 3.

But, how do we exactly make hypothesis and deductions?. Well, making a hypothesis is not a big deal. It could be, say, to randomly pick a digit among the state of a given square (set of possible digits), and suppose it is the correct choice. These are the kind of assumptions we will consider in this program, but we could have chosen other options, like, for instance, assuming that one or more of the possible digits are not possible in the given square. This approach, not explored in this post, could yield a better performance, as long as it can minimize the errors by choosing more likely options.

Well then, and what about the deductions?. Let’s start with the easy case: if we know that a square contains a given digit, that digit cannot appear in the same row, column or square 3×3 grid. That digit can therefore be removed from the state of all the related squares.

Uncertainty reduction in the 3×3 subgrid at the middle. As far as we know the initial value of the above four squares, their digits cannot appear in the other squares of the restriction group.

Let’s say it in a different way: if a given restriction group has a square whose state contains one single digit (no uncertainty at all), that digit necessarily has to belong exclusively to that square, and the uncertainty of the other squares can be reduced.

What about if we have in a given restriction group two squares that accept the same pair of digits and nothing else?, don’t panic: those two digits must be distributed only between those two squares, no other square in the group can accept any of them, and hence their uncertainty can be reduced by removing them from their states.

The four numbers 2, 3, 5 and 6 can only be placed at four squares, so those four squares cannot accept any other numbers. Or, alternatively: as long as there are three squares that accepts only elements from the set of three digits { 4, 7, 9}, no other square can accept any of them.

Can we generalize this rule? of course we can. By simple induction, if we want to distribute a set of N digits among the squares of a restriction group, and there are exactly N squares that accepts only digits from that set, those N digits necessarily have to be distributed among those N squares, and the other 9 – N squares can’t receive any of them.

Let’s say it again, even more clear this time: given a restriction group C, i.e., a set of 9 squares, let A be the state of one of them (the set of possible digits of that square) and let B be the set of squares of C with their state contained in A. If the cardinal of A is the same as the one of B, then the squares of C which aren’t contained in B (C \ B) can’t contain in their states any element of A. Thus, those elements can be removed from their states, and, consequently, the uncertainty is reduced.

That is, this is the basic rule, we don’t need anything else, so let’s start programming.

## The code

The program is written in C and has been tested under GNU Linux. More details about its usage and table format can be found in the README file.

The source code is quite clear, so I’ll just give some guidelines to help its understanding:

• About the data model, the structures square_t and square_group_t respectively model a square and a restriction group. No more complex data structures are needed.
• Regarding that the square state is coded as a bit field (an unsigned short where the least significant 9 bits represent the possible digits), the following bitwise operations can be done:
• An uncertainty reduction operation consists in removing a bit subset from a bit set: a &= ~b (the set b is removed from a).
• A set inclusion can be detected checking the equality:  if ((a | b) == b) … (is a contained in b?)
• The number of considered digits (the number of set bits) can be computed with the built-in function __builtin_popcount.
• The digit coded by a set of bits with only one set bit can be obtained with the built-in function __builtin_ffs, wich returns the first set bit index.
• The method uncertainty_reduction performs a deduction rule over a square (the recursion enables to trigger other reductions in the related squares), and the method make_assumption makes a hypothesis.