In a previous post, we introduced a library for performing arithmetic operations with arbitrary precision, and in this update we optimized the multiplication algorithm using the Fast Fourier Transform factorization.
In this post, using the FFT-based multiplication algorithm as the cornerstone of our library, we will optimize the division algorithm and introduce some other methods for computing square and quartic roots. In a previous example, we succeeded to compute the largest known prime number working with integer arithmetic; with the new ingredients, we will be able to compute some other interesting things, such as millions of decimals of Pi or other irrational numbers using convergent series.
A new division algorithm
We have already optimized the multiplication method, moving from a time complexity algorithm to a one. But the division is still computed with a long division algorithm, with a time complexity of . We will try to find a faster algorithm using the Newton Raphson method to converge to the solution.
The problem is: having the numbers and , we want to compute the quotient . The following function has a zero at the desired quotient:
Applying the Newton-Raphson method to it, starting with an initial guess , we can compute a sequence that converges quadratically to in the following way
Turns out that we didn’t make any progress yet, as long as we still need to compute a division with the same divisor . But, conceiving the division as a multiplication by the inverse , and being that inverse a zero of the following trivial function:
we can build a series that converges quadratically to
Thus, we only need multiplications and subtractions to compute every element of the sequence. Once we have converged to the inverse, , we only need to estimate the division as
A quadratic convergence means that the amount of found figures should double from one iteration to the next one.
A n-root algorithm
In a similar way, we can build a sequence converging to the square root of a given number, which is a root of the following function
Or, if we want to avoid computing the division in each iteration, we can build a sequence converging to the inverse of the square root, being such number a zero of the function
Voilà, we only need multiplications and additions/subtractions in each iteration, and the solution will converge quadratically to . We can even compute the desired value at the end without having to invert the solution, only multiplying by
This method can be generalized for computing (the two previous examples correspond to and ):
Once we have the final solution , we can compute the n-root without inverting the solution, by means of
The implementation in the project page adds the new features to the previous version:
- Inverse computation (file div.cc).
- Division algorithm using the inverse method (file div.cc). The long division algorithm was already implemented.
- Square and quartic roots (n-root not implemented) using the direct methods or the more efficient inverse ones (file sqrt.cc).
You will find more information about how to switch the algorithms in the README file.
The graphic below shows a scalability comparison of the methods implemented in the library. Notice the scalability problems of Long Division and Long Multiplication algorithms (the latest takes 10,000 times longer than the FFT multiplication algorithm to compute 1 million figures). The Newton inverse algorithm and its related algorithms (inverse division, square root and quartic root), perform well enough for our purposes (much better than the non inverse versions). The non inverse square and quartic root algorithms are based on an inverse division algorithm (they would perform much worse if based on a long division algorithm). There are many possible optimizations, but they are beyond the scope of this post.
Some other important issues we didn’t tackle yet:
- Initial guesses for the iterative methods: check the methods BigNumber::fromDouble(double, int) and toDouble(double&, int&) and their usage for computing an initial guess, i.e. for a multiplicative inverse.
- Concergence criterions: a typical stop criterion with a convergent sequence would be when and are very close each other, , and that means that all (or almost all) of their digits match up. Check the source code again for further detail.
- Rounding errors: using algorithms that converge to multiplicative inverses can yield rounding errors when working with finite arithmetic. These errors can be overcome, but no solution has been implemented in the code nor discussed in this post. As a clue for example for the division algorithm, once we get the approximate quotient using the inverse approach, we can compute a new quotient (where is the first remainder), and estimate the final quotient as . The second division can be performed with a more accurate long division algorithm (not taking too long provided a small remainder).
I always wanted to write my own program for computing millions of decimals of Pi, and now we have all the ingredients to do it by using for instance a 4th order Borwein’s algorithm (disclaimer: this is just an application of this general purpose library, not a specially optimized way to compute Pi decimals). Let us start with the following values
and then construct the sequences and
the sequence converges with quartic order to . You can get this result in about 20 minutes with a normal computer (~2.4 GHz CPU) when trying to compute PI with 1,038,090 decimals (the last 7 decimals are wrong due to rounding errors, and the total amount of figures involved in computations is 220 = 1,048,576). Find the implementation in file pi.cc.