Tuesday, October 11, 2016

Self-Adjusting Heaps

I am a big fan of self-adjusting data structures. The whole idea behind self-adjusting data structures is that one can use a simple heuristic to adjust the data structure (perform rotations in a tree for example) and get superb bounds on the running time (amortized over a sequence of operations on the data structure). These data structures can be difficult to analyze but it is rather satisfying that simple ideas can lead to practical data structures. 

Pairing heaps are very easy to understand and implement and they perform really well in practice. If you are going to use Dijkstra's algorithm in production code, then you better use pairing heaps to implement it.

We are going to implement a min-heap. Here are the operations that will be used to interface with the data structure:

  • Top: Top element (or minimum).
  • Push: Insert new element.
  • Pop: Remove top element.
  • Join: Merge two heaps together.


We are going to use the left child next sibling tree representation. Each node in the tree stores a key, a pointer to its leftmost child and a pointer to its next sibling (right sibling). An empty heap is just a null pointer. With this representation, adding a new child is easy and fast.

Here is the code for what we have covered so far:

Getting the Top Element

With the above representation, getting the minimum element is easy to implement. We just return the key of the root node.

Merging Heaps Together

This operation is not so hard to implement either and we are going to use it again to implement the rest of the pairing heap interface. Let us imagine that we have two non-empty heaps $A$ and $B$. If the key of the root node of $A$ is smaller than the key of the root node of $B$ we simply add the root node of $B$ as a child of $A$ and vice versa. Note that such an operation moves the entire heap and not only the root node (we are using pointers in the representation).

Inserting a New Element

Again, this is really easy to implement. We can create a new node for the element to be inserted and merge this node with the existing heap by using the merge routine.

Removing the Top Element

So far, all of the operations were easy to implement and they are fast as well; each operations takes $\cal O(1)$ time to execute. Now removing the top element requires some re-adjusting. This is where we use a heuristic to re-adjust the tree. The heuristic we are going to use is called two-pass merge. It is actually convenient to have some figures to explain how this works. So let us imagine that our heap (or tree really) looks like the figure below.

Pairing heap before pop

The two-pass merge heuristic re-adjusts the tree by first removing the root node and then merging the children in pairs. This is a two-pass merge because we have to follow the next pointers in a first pass and then merge on the way back.

Pairing heap after pop

Recursion comes in handy here, we can implement the above idea in a single function.

This operation takes logarithmic time amortized over a sequence of operation.

Wrap Up

We have been working with nodes but it would be nice to have a type for pairing heaps themselves. This is not hard to accomplish. We add a new type for pairing heaps with its own interface. This is a clean way of doing it and in a real world implementation we can hide the node-based implementation and expose the pairing heap interface only.

This is all for this post. The entire code listing can be found here. I hope that it was a good read and as always comments and questions are welcome.

Saturday, September 17, 2016

Matrix Representation of Quaternion Rotation

Quaternions are an amazing tool for performing rotations in 3-space. Every game engine worth of the name has a module for computing with quaternions. Internally, rotations are still performed using matrices but instead of composing rotations by multiplying matrices together, quaternions are multiplied. The final resulting quaternion is then converted into a rotation matrix which is fed to the game engine for further processing. This final step is very important yet many game programmers tend to use the conversion routines as a black box, which is okay if you are not interested in the mathematics; but understanding the mathematics can make a difference.

I am going to assume that you are familiar with quaternion multiplication and that you know the quaternion rotation formula. Here is the formula for reference:


Here $q=\exp(\frac{\theta}{2}n)=\cos(\frac{\theta}{2})+ \sin(\frac{\theta}{2})n_xi + \sin(\frac{\theta}{2})n_yj + \sin(\frac{\theta}{2})n_zk$, where $\theta$ is the rotation angle in radians; $n$ is the unit rotation vector, $n=n_xi + n_yj + n_zk$; and $v$ is the vector we want to rotate. $q^*$ is notation for the conjugate of $q$: $q^*=\cos(\frac{\theta}{2})- \sin(\frac{\theta}{2})n_xi - \sin(\frac{\theta}{2})n_yj - \sin(\frac{\theta}{2})n_zk$.

From Quaternion to Matrix

The formula for rotation involves quaternion multiplication and we are going to use that to represent the multiplication in the matrix world. So let us say that we are given two quaternions $p=a+bi+cj+dk$ and $q=e+fi+gj +hk$. Note that we can think of a quaternion as being a 4 dimensional column vector: $p$ can be represented as the column vector $\left( \begin{matrix} a & b & c & d \end{matrix} \right)^T$. Let us begin by multiplying the quaternions $p$ and $q$.

$$pq=(a+bi+cj+dk)(e+fi+gj +hk).$$

Recall that the rules for quaternion multiplication are: $i^2=j^2=k^2=ijk=-1$.

Now multiplying and collecting terms gives:

$$\begin{align} pq= & (ae-bf-cg-dh)+ \\ & (be+af-dg+ch)i+ \\ & (ce+df+ag-bh)j+ \\ & (de-cf+bg+ah)k  \end{align}$$

I laid out the product above in that way for a reason. If you know some linear algebra then you can tell how to represent the product above as a matrix-vector product just by looking at it. The first column has $e$'s in it, the second column has $f$'s in it, etc.. The product is crying out to be written in matrix form.

$$\left( \begin{matrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \end{matrix} \right) \left( \begin{matrix} e \\ f \\ g \\ h \end{matrix}\right)$$.

That's how you represent a product of two quaternions as a matrix-vector product. In the above matrix the quaternion $q$ is being used as a vector; can we use $p$ as a vector instead ? Yes, we just rearrange the terms.

$$\begin{align} pq= & (ae-bf-cg-dh)+ \\ & (af+be+ch-dg)i+ \\ & (ag-bh+ce+df)j+ \\ & (ah+bg-cf+de)k  \end{align}$$

This gives the following matrix-vector product.

$$\left( \begin{matrix} e & -f & -g & -h \\ f & e & h & -g \\ g & -h & e & f \\ h & g & -f & e \end{matrix} \right) \left( \begin{matrix} a \\ b \\ c \\ d \end{matrix}\right)$$.

That was not very hard. Quaternion multiplication is not commutative so the matrices above correspond to multiplication on the left and on the right. Let us now turn to the problem of expressing quaternion rotations as multiplication by matrices.

From Rotation Quaternion to Rotation Matrix

Now that we know how to represent quaternion multiplication as a matrix-vector product, we are going to use that knowledge to derive a formula to represent quaternion rotation as multiplication by a matrix. There seems to be a lot of mystery surrounding this on the internet; people who write about this give the formula for the reader to use as a black box and they don't worry about giving details about where it came from.

Look at the formula for quaternion rotation. We first multiply $v$ by the quaternion $q$ on the left and then we multiply the result by the quaternion $q^*$ on the right (or you can think of it as multiplying $v$ by $q^*$ on the right and then multiplying the result by $q$ on the left). Now call the matrix corresponding to multiplication by $q$ on the left $A$ and call the matrix corresponding to multiplication by $q^*$ on the right $B$. We first represent the product $qv$ as $Av^T$. Now we have to include $q^*$ in the product but we are multiplying on the right so the matrix product gives $BAv^T$ (here $v^T$ is the vector representation of the quaternion $v$). That's it, we have our matrix representation of quaternion rotation. In fact, $ABv^T$ gives the same effect; it depends on which multiplication you do first: $qv$ or $vq^*$. So this tells us that $A$ and $B$ commute in this case: $AB=BA$. Now let us go ahead and compute this product.

$$AB=\left( \begin{matrix} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \end{matrix} \right) \left( \begin{matrix} a & b & c & d \\ -b & a & -d & c \\ -c & d & a & -b \\ -d & -c & b & a \end{matrix} \right)$$

$$=\left( \begin{matrix} a^2+b^2+c^2+d^2 & 0 & 0 & 0 \\ 0 & a^2+b^2-c^2-d^2 & 2(bc-ad) & 2(ac+bd) \\ 0 & 2(ad+bc) & a^2-b^2+c^2-d^2 & 2(cd-ab) \\ 0 & 2(bd-ac) & 2(ab+cd) & a^2-b^2-c^2+d^2 \end{matrix} \right)$$

I will let you compute $BA$ to make sure that it gives the same result. One last thing to mention is that sometimes people change the order of the components in a quaternion; this will change the matrix but the terms will remain the same.

I hope that this will end up being useful to you. Feel free to drop a comment or ask a question.

Thursday, September 1, 2016

Maintaining Disjoint Sets under Union

Time for a new post. This time, we will cover an elegant data structure for maintaining a class of sets under the operation of union. Formally, we start with $n$ classes each consisting of a single element: $\{1\}, \{2\}, \dots, \{n\}$. These correspond to the equivalent classes of some equivalent relation. We would like to be able to quickly find which class an element belongs to and quickly join two classes together to form one class (destroying the old classes along the way). These two operations are known as find and union and the data structure is often times called union-find.

There are many ways to implement these operations but we will cut to the chase and implement find with path compression and union by rank.

Representation of a Class

We are going to use a tree to represent an equivalent class. Each node in the tree corresponds to an element in the class. Each node also points to its parent in the tree. The root of the tree points back to itself. The root of the tree also serves as the representative element of the class. The root node will also maintain the number of nodes in its tree (we will see why this is useful when we talk about union by rank). Initially, every element lives in a separate class, so the trees will look like this for $n=5$ (the size of the trees is dropped here).

Initial setting

Find with Path Compression

Given an element, we would like to find the class it belongs to (or the representative of that class since each class has a unique representative). Using the representation above, we just climb to the root by following parent pointers and return the root element. The path to the root could be long and so making several calls to find with the same argument could end up taking a lot of time. Path compression avoids this by making the nodes along the path (except for the root itself) point to the root. This renders subsequent calls to find very cheap in terms of running time. Here is a pictorial representation of path compression. The right tree is the new state we get after a call to $find(4)$. The path to the root is highlighted in red.

Find with path compression

Union by Rank

The union operation takes two disjoint classes as input and creates a new class that is the union of the elements in the input classes. An easy way to implement this is to take one of the root elements and make it a child of the other root element. The union by rank heuristic suggests that it is a good idea to make the root of the class with the smaller number of elements a child of the root of the class with the larger number of elements. In the following figure, the class $\{1,2\}$ is joined with the class $\{3,4,5\}$ by making $1$ a child of $3$.

Union by rank


For each node we need to know its parent node and the size of its subtree. We will use one array, $uf$ (for union-find), of integers to account for both pieces of data. $uf[x]$ gives the parent of a node when $x$ is not a root and it gives the size of the subtree when $x$ is a root. How do we differentiate between the two ? We simply negate the size when the element is a root. So if $uf[x]$ is negative we can say that $x$ is a root element and that $-uf[x]$ is the size of the tree rooted at $x$ and when $uf[x]$ is non-negative we can say that $x$ is a non-root element and that it is a child of $uf[x]$.

Initially, every element lives in its own tree and so is the root of that tree. We can write a function to initialize the array as follows:

The union operation is very easy to implement. First we find the root elements and then make the one with the smaller size a child of the one with the larger size.

The find operation is slightly harder to implement. We find the root element in a first pass and then make the nodes along the path children of the root in a second pass.

That is all for this post. I hope that it was helpful. As always, feedback and questions are welcome.

Tuesday, June 28, 2016

The Maximum Contiguous Subarray Problem

Among all the contiguous subarrays of a given array find one in which the sum of elements is maximized. This problem is known as the maximum contiguous subarray sum problem and we will see a few ways in which it can be solved. We will call the array $A$ and assume that it has $n$ numbers.

Brute Force

The brute force solution is to loop over all of the subarrays and take any one that maximizes the sum. There are $\cal O(n^2)$ subarrays and it takes linear time to compute the sum so the brute force approach gives a $\cal{O}(n^3)$ algorithm to solve this problem. Note that we print the sum at the end but it is easy to augment the algorithm to find the ends of the subarray.

Optimized Brute Force

With a bit more cleverness we can chop a linear factor. We keep track of the current sum as we go instead of fixing the right end and then computing the sum in a nested loop.

Divide and Conquer

The divide and conquer approach to problem solving is to take the original problem, split it into smaller subproblems, solve the subproblems recursively and use the solutions to the subproblems to construct the solution to the original problem. Now imagine that we split the array into two halves; what can we say about the optimal subarray ? It can be completely contained in the left half, completely contained in the right half, or it can start somewhere in the left half and end somewhere in the right half. The first two cases are subproblems of the problem we started with so we can solve them recursively. The third case can be solved independently.

Let's call the split point $m$. The left end can be computed by starting at position $m$ and going down to position $0$ while keeping track of the maximum sum. The right end can be computed similarly: start at position $m+1$ and go up to position $n-1$ while keeping track of the maximum sum. This is illustrated in the following figure.

The following is an implementation of the above idea.

The running time of this algorithm can be calculated by using the recurrence $T(n)=2T(n/2)+\cal \Theta(n)$ which gives $\cal O(n \lg n)$ after unfolding.

Dynamic Programming

There is another way to split the problem into smaller subproblems and use the solutions to the smaller subproblems to construct the solution to the original problem. Imagine that you are given the maximum sum ending at a position $p$, i.e., the maximum coniguous subarray sum with the additional constraint that the right end of the array is at position $p$. How can this information be used to find the maximum contiguous subarray ending at position $p+1$ ? First take a look at this figure to see the whole picture.

Note that $S$ is the best sum we can get by fixing the right end at position $p$ (adding cells to the left won't improve $S$). Now to construct the maximum sum ending at position $p+1$ we have two options:

  1. Extend the previous sum by $A[p + 1]$,
  2. Take $A[p + 1]$ and ignore $S$.

These are the only options we have, trying to extend just part of $S$ is useless because we know that $S$ is the maximum sum that ends at the previous position and so adding just a part of that sum is clearly going to give us a suboptimal answer.

This way of splitting problems gives us an efficient way to find the maximum contiguous subarray sum ending at a given position. Since the optimal solution for the entire array has to end somewhere, then we just solve the problem for each of the ending points and take the one that maximizes the sum.

The difference between the divide and conquer approach and the dynamic programming approach is that the subproblems overlap in the case of dynamic programming whereas they don't in the case of divide and conquer. The running time of the dynamic programming algorithm is $\cal O(n)$ because there are $n$ subproblems and computing the answer for each of them takes a constant amount of work.

Monday, June 20, 2016

Variants of Binary Search


Given a sorted array $A$ and a target value $k$ to search for, return a position $p$ in the array for which $A[p]=k$ or return $-1$ to signal that the element was not found. The folklore algorithm to solve this problem is binary search: you look at the middle element, compare it to $k$ and decide whether to terminate, search in the left half-array or search in the right half-array. This is pretty fast and it takes time $O(\lg N)$ where $N$ is the length of the array. In this post we will explore several ways to implement binary search.

Standard Binary Search

Let's start by giving the standard implementation. The recursive implementation is obvious so let's focus on iterative implementations instead.

Meta Binary Search

Meta binary search is a variant in which the position containing the target value is built one bit at a time, starting with the most significant bit.

To get a better idea of how this algorithm works, let's walk through an example. Imagine that the input array is $A=[1, 3, 5, 7, 11]$ and that we are looking for the value $k=7$. The last position of the array is $4$ and we need $2$ bits to store it. The first loop computes this value. Now we start at the most significant bit and we have two options: set it to $1$ or set it to $0$. We try both options and select the correct one. If setting the current bit to $1$ exceeds the length of the array then our only choice is to set it to $0$. In the other case we set the bit to $1$ and check if the value sitting in the position is strictly greater than $k$, in which case we set the bit to $0$ instead because our position never decreases in value (so by setting the current bit to $1$ we'll just be searching in parts of the array in which all values are strictly greater than the target value).

Like standard binary search, meta binary search takes $O(\lg N)$ time. The first loop computes the number of bits required to encode the largest position in the array and the second one iterates over these bits and builds up the answer.

This might be your first encounter with meta binary search but you might have used it in the past without paying attention. If you wrote code to answer lowest common ancestor (LCA) queries by using sparse tables then your LCA procedure is actually a form of meta binary search.

Binary Search over Real Numbers

How to write binary search when the search space is a monotonic real interval ? Many programmers use epsilons as a terminating condition. So you often see code that looks like:

This approach is error prone and might even time out. We'll describe a clean binary search over a real interval. Instead of storing the left and right end points of the interval as we did with the integers, we'll store the left end point and the size of the search space. We divide the size of the search space by 2 after each step and so the algorithm is guaranteed to terminate when the size reaches $0$.

Fractional Cascading

We will close this post with a useful technique that can be used to reduce the cost of searching for a target value $k$ in a list of $M \ge 1$ sorted arrays instead of just one. To simplify the discussion let's assume that all of the arrays have length $N$. An easy way to solve this problem is to binary search in each of the arrays; this takes time $O(M \lg N)$. Can we do better ? Yes !

We will preprocess the list of arrays so that search queries are fast. The idea is to cascade a fraction of the last list to the one right above it and add links from those cascaded elements to their locations in the list they came from. Now take this newly created list and cascade a fraction of its elements to the list on top of it, and so on. Now we perform one binary search in the first array and then follow the pointers to locate the values in the subsequent arrays. This technique is often used in geometric algorithms to reduce the cost of searching.

Saturday, June 11, 2016

Storing a Graph in Memory

Graphs are undoubtedly one of the most important objects in computer science. There are several algorithms that operate on graphs and these algorithms might take different amounts of execution time depending on the graph representation being used. In this post, we will explore some of the ways in which a graph can be stored in memory. Throughout the rest of this post we will assume that the underlying graph contains $n$ nodes $0,1,\dots,n-1$ and $m$ edges $0,1,\dots,m-1$.

Edge List

The simplest representation is to store the edges of the graph in a list. This representation is useful when adjacency information (which edges go out of a given node) is not important. For example, Kruskal's algorithm for minimum spanning tree does not use adjacency information but it needs the edges sorted by weight and so this representation can be used to implement Kruskal.

Adjacency Matrix

When the graph is dense: the number of edges is close to the maximal number of edges, an adjacency matrix can be used. The edges are stored as entries in a $n \times n$ matrix $A$ so that: $$A_{i,j} = \begin{cases}1 \text{, if there is an edge from i to j} \\0 \text{, elsewise}\end{cases}$$

Adjacency List

The adjacency list representation is the one that is used the most. In this representation every node stores a list of outgoing edges. Traversing the graph by using this representation takes time $\mathcal{O} (n+m)$ whereas traversing it by using an adjacency matrix takes time $\mathcal{O} (n^2)$. When the graph is dense it does not really matter because $m=\mathcal{O} (n^2)$ but when it is sparse then algorithms that use the adjacency list representation are way faster. The following is an example implementation.

A Triple-Array Representation

The adjacency list implementation uses dynamic arrays (vectors) and there is a call to push_back each time an edge is added. We can further optimize by using static arrays instead of vectors. In this representation every node remembers the last edge out of it and every edge remembers the previous edge (with respect to the node on which both edges are incident) and the node to which it is pointing. Here is an example implementation.