Restoring Integer Square Root Explained (in binary)
In case you’ve never heard of it, the restoring integer square root algorithm calculates the integer portion of the square root of a given integer, i.e. it is equivalent to the floor of the regular square root, $\lfloor\sqrt{x}\rfloor$. Much like restoring division, it works by analyzing the input number one piece at a time, subtracting a given value from each piece, and acting upon the result, restoring the previous value whenever the subtraction would produce a negative number.
But how and why exactly does it work? This is the question I have sought to answer, and, although I was able to derive an explanation that seems sensible, if incomplete, you might want to check out other people’s perspectives on the matter.
Description
For the following description and later sections, we will consider a number $x$ and its integer square root $q$, and choose $n$ so that $x$ has $2n$ binary digits (we may add a leading zero if needed); $q_{n-1}$ will then be the most significant bit (MSB) in the binary representation of $q$, and $q_0$ the least significant bit (LSB).
Given a number $x$, the algorithm outputs $q$ and $r$ satisfying $x = q^2 + r$ with $r \ge 0$, such that $q$ is maximized. First, we take the binary representation of $x$ and split it up into pairs of bits. We start with the most significant pair and subtract 1 from it. If the result is $\ge 0$, then we know the MSB in the binary representation of $q$, $q_{n-1}$, is 1. Otherwise (if the result is negative), the result is “restored” to the previous value, and the MSB of $q$ will be zero.
Then, we take that result, concatenate it with the next pair and subtract $\overline{q_{n-1}\,0\,1}$ from it (the overline denotes concatenation). The result will determine $q_{n-2}$, in the same way as before. On the next step, we would group the current result with the next pair and subtract $\overline{q_{n-1}\,q_{n-2}\,0\,1}$, determining $q_{n-3}$. This procedure is iterated until we run out of pairs. The remainder of the last subtraction is $r$.
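To make the procedure concrete, here is a minimal Python sketch of the steps just described (the name `restoring_isqrt` and the exact structure are my own; this is an illustration, not a reference implementation):

```python
def restoring_isqrt(x: int) -> tuple[int, int]:
    """Return (q, r) with x == q*q + r, r >= 0 and q maximal."""
    n = (max(x.bit_length(), 1) + 1) // 2  # number of bit pairs
    q = 0  # root bits determined so far
    r = 0  # running remainder
    for i in range(n - 1, -1, -1):
        # Bring down the next pair of bits, most significant pair first.
        r = (r << 2) | ((x >> (2 * i)) & 0b11)
        # Trial subtrahend: the root bits found so far, followed by "01".
        trial = (q << 2) | 1
        if r >= trial:
            r -= trial        # subtraction succeeded: next bit of q is 1
            q = (q << 1) | 1
        else:
            q = q << 1        # "restore" (keep r): next bit of q is 0
    return q, r
```

For instance, `restoring_isqrt(10)` returns `(3, 1)`, since $10 = 3^2 + 1$.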
An idea for a solution
An interesting fact: all perfect squares can be written as a sum of consecutive odd numbers, starting at 1: $q^2 = 1 + 3 + 5 + \cdots + (2q - 1)$.
That allows us to derive a simple but inefficient approach to calculating the integer square root: subtract consecutive odd numbers from $x$, keeping track of the result and incrementing a counter, until the result is less than the next odd number. The counter will then hold $q$, and the remainder of the last subtraction will be $r$.
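A sketch of this naive method (the name `naive_isqrt` is my own):

```python
def naive_isqrt(x: int) -> tuple[int, int]:
    """Return (q, r) by subtracting consecutive odd numbers from x."""
    q, odd = 0, 1
    while x >= odd:
        x -= odd      # subtract the next odd number
        odd += 2
        q += 1        # count the successful subtractions
    return q, x       # x now holds the remainder r
```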
It turns out the way the restoring algorithm works is very similar: it just subtracts a whole set of consecutive odd numbers at once rather than one at a time, narrowing down on the result at each iteration.
Working it out
We should note that if all the previously determined bits of $q$ are zero, then we will be subtracting a power of 4 on the current iteration. Since $4^k = (2^k)^2$, a power of 4 is itself a perfect square and can always be written as a sum of consecutive odd numbers, starting from 1.
A peculiar sequence of sets of consecutive odd numbers is formed if you execute the algorithm on the value 1, extended to $2n$ binary digits. For instance, for $n = 3$, we would first subtract $16 = 1 + 3 + 5 + 7$, then $4 = 1 + 3$, and, finally, 1.
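In general, when all previously determined bits of $q$ are zero, the value subtracted at iteration $i$ is the sum of the first $2^{n-1-i}$ odd numbers:

$$4^{\,n-1-i} = \left(2^{\,n-1-i}\right)^2 = \sum_{j=1}^{2^{\,n-1-i}} (2j - 1)$$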
Note also that if any of the previously determined bits of $q$ are not zero, then we can reason that some of the past subtractions “succeeded” (resulted in a non-negative number). That means we were able to subtract a set of consecutive odd numbers, starting at 1, from the initial value. Anything we subtract from now on should be “offset” to account for that, i.e., we should now subtract a set of consecutive odd numbers that starts at the smallest odd number not in any of the previous sets. We can show that including the bits of $q$ in the terms being subtracted does exactly that.
To that end, let’s consider that we are at the $i$th iteration (where the first iteration would be $i = 0$ and the last $i = n - 1$). The number being subtracted at that iteration, $s_i$, is given by:

$$s_i = 4^{\,n-1-i} + \sum_{j=n-i}^{n-1} q_j \, 2^{\,j+n-i}$$
The first term corresponds to the constant “1”, while the second corresponds to the previously determined bits of $q$ that are added to the term being subtracted. To make more sense of it, see the following depiction of the binary representation of $s_i$:
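$$s_i = \overline{\,q_{n-1}\,q_{n-2}\cdots q_{n-i}\;0\;1\;\underbrace{0\,0\cdots 0}_{2(n-1-i)\text{ zeros}}\,}$$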
Meanwhile, we know that $s_i$ must be equal to the sum of a certain set of consecutive odd numbers. Rather than focusing on the numbers themselves, let’s denote by $O_{k,m}$ the set of consecutive odd numbers starting at the $k$th odd number and containing $m$ elements. Thus:

$$O_{k,m} = \{\, 2j - 1 \;\mid\; k \le j \le k + m - 1 \,\}$$
The parameters of the set subtracted at iteration $i$, $k_i$ and $m_i$, can be derived by realizing that if all the previous bits of $q$ are zero, then we have to start at the first odd number (1), and that the amount of odd numbers subtracted (initially $2^{n-1}$) is halved at each iteration:

$$k_i = 1 + \sum_{j=n-i}^{n-1} q_j \, 2^{\,j}, \qquad m_i = 2^{\,n-1-i}$$
Now, we can show that $s_i$, the number subtracted at iteration $i$, is equal to the sum of the elements of $O_{k_i, m_i}$. For that, let’s start with the sum of the elements of a general $O_{k,m}$, which is a difference of two perfect squares:

$$\sum O_{k,m} = \sum_{j=k}^{k+m-1} (2j - 1) = (k + m - 1)^2 - (k - 1)^2 = m\,(2k + m - 2)$$

Substituting $k_i$ and $m_i$:

$$\sum O_{k_i, m_i} = 2^{\,n-1-i} \left( 2 \sum_{j=n-i}^{n-1} q_j \, 2^{\,j} + 2^{\,n-1-i} \right) = 4^{\,n-1-i} + \sum_{j=n-i}^{n-1} q_j \, 2^{\,j+n-i} = s_i$$
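As a quick check, take $n = 3$ with $q_2 = 0$ and $q_1 = 1$ at iteration $i = 2$: then $k_2 = 1 + 2^1 = 3$ and $m_2 = 1$, so $O_{3,1} = \{5\}$, which matches $s_2 = 4^0 + q_1\,2^2 + q_2\,2^3 = 1 + 4 = 5$.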
But why this particular arrangement? Why is the size of each set a power of 2? While it would be possible to adapt this algorithm to use other bases, sticking to base 2 gives us simpler rules and turns out to be very useful, as it allows for a fast implementation on modern computers and other digital electronic systems. However, the fact that the size of $O_{k_i, m_i}$ is an exponential function of $n$ is fundamental to guaranteeing this algorithm’s performance.
If we consider the input size of the algorithm to be the number of digits of $x$ ($2n$), then, as already mentioned, $q$ will have $n$ digits at most, since $q \le \sqrt{x} < 2^n$. There are $2^n$ possible values for $q$, and each one must be reachable from the starting point of the algorithm. If a linear function of $n$ were used for the size of the sets, the computational complexity would inevitably devolve to $O(2^n / n)$, which is almost as bad as the naïve repeated-subtraction method, since a sum of linear functions of $n$ is only quadratic in $n$, far short of the $2^n$ values that must be reachable. However, using an exponential function for the size brings the computational complexity down to $O(n)$.¹
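Concretely, with set sizes $m_i = 2^{\,n-1-i}$, the $n$ iterations can account for up to $\sum_{i=0}^{n-1} 2^{\,n-1-i} = 2^n - 1$ odd numbers in total, exactly enough to reach every possible root $0 \le q < 2^n$.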
Fun fact
Since each bit in the binary representation of $q$ is set only once, this algorithm can be implemented using combinational logic circuits (in fact, this is how I originally found out about it in the first place).
The following diagram depicts a binary decision tree representing all possible choices the algorithm could take for $n = 3$, where each level corresponds to an iteration. A dashed edge means that the subtraction resulted in a negative number and the previous value had to be restored, while a solid edge means that the subtraction resulted in a number $\ge 0$. Each node shows the value subtracted at that iteration, as a sum of consecutive odd numbers. While only four leaf nodes are drawn, each leaf node could have a dashed and a solid edge out of it; each destination node would then represent a unique value for $q$.
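The same tree can be enumerated programmatically. The following Python sketch (the helper names `subtracted_value` and `walk` are my own) prints the value subtracted at each node, as a sum of consecutive odd numbers, one indentation level per iteration:

```python
def subtracted_value(n, i, q_bits):
    """Value s_i subtracted at iteration i, given the root bits decided
    so far (q_bits[0] is q_{n-1}), and its set of consecutive odd numbers."""
    m = 2 ** (n - 1 - i)  # size of the set, halved at each iteration
    k = 1 + sum(b << (n - 1 - j) for j, b in enumerate(q_bits))  # start index
    odds = [2 * t - 1 for t in range(k, k + m)]
    return sum(odds), odds

def walk(n, i=0, q_bits=()):
    """Depth-first walk over the algorithm's binary decision tree."""
    if i == n:
        return
    s, odds = subtracted_value(n, i, q_bits)
    print("    " * i + f"s_{i} = {s} = " + " + ".join(map(str, odds)))
    walk(n, i + 1, (*q_bits, 0))  # dashed edge: subtraction failed, bit is 0
    walk(n, i + 1, (*q_bits, 1))  # solid edge: subtraction succeeded, bit is 1

walk(3)
```

For $n = 3$, the root prints $s_0 = 16 = 1 + 3 + 5 + 7$; its children print $s_1 = 4 = 1 + 3$ (after a dashed edge, $q_2 = 0$) and $s_1 = 20 = 9 + 11$ (after a solid edge, $q_2 = 1$); and so on.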
Addendum
It is also possible to show, given all $s_i$ and $q_j$ for a given run of the algorithm, that the successful subtractions add up to exactly the square of the root:

$$q^2 = \sum_{i=0}^{n-1} q_{n-1-i} \, s_i$$

Though that will be left as an exercise to the reader.
¹ Only valid if we assume that a subtraction can be performed in constant time, i.e. $O(1)$, which is not true in general.