KD Tree

What is a KD tree (K-dimensional tree)?

A KD tree is a binary search tree that supports queries over more than one dimension. A normal BST has only one dimension; it only has to compare one value. However, if we were trying to find the closest point on the xy plane, we would need to make some modifications, because both the x value and the y value would need to be accounted for.

KD Tree:

A KD tree acts much like a BST. Let's say we are attempting to tackle the closest point problem on a 2D plane. Since the points are two-dimensional, the comparisons cycle through the two dimensions layer by layer: the 0th layer compares x values, the 1st layer compares y values, the 2nd layer goes back to comparing x values, and so on.
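
In code, this layer-by-layer cycling is usually expressed as the depth modulo the number of dimensions. A minimal sketch (the variable names here are illustrative, not from any particular library):

    k = 2  # number of dimensions (x and y)
    for depth in range(5):
        axis = depth % k  # 0 means compare x, 1 means compare y
        print("layer", depth, "compares", "xy"[axis])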

You can think of each point in the tree as a line that splits the xy plane. Where the line falls depends on the point's position, and which direction it runs (vertical for x layers, horizontal for y layers) depends on the layer the node is on.

Inserting into a KD tree:

A KD tree reduces this to a normal BST problem. When inserting a point, descend the tree like a normal BST. If on an x layer, use the x values stored in the nodes to make the decision; if on a y layer, use the y values stored in the nodes to make the decision.
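
As a concrete sketch, insertion could be implemented like this in Python; the Node class and insert function are hypothetical names chosen for illustration, not a standard API:

    class Node:
        def __init__(self, point):
            self.point = point  # e.g. (x, y)
            self.left = None
            self.right = None

    def insert(root, point, depth=0, k=2):
        """Insert a point by descending like a BST, comparing on this layer's axis."""
        if root is None:
            return Node(point)
        axis = depth % k  # x layer on even depths, y layer on odd depths
        if point[axis] < root.point[axis]:
            root.left = insert(root.left, point, depth + 1, k)
        else:
            root.right = insert(root.right, point, depth + 1, k)
        return root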


Querying the tree:

The time complexity of querying a balanced KD tree is O(log n) on average, though with a larger constant factor than a standard balanced BST. Traversing the tree the standard BST way, choosing the comparison axis by layer, is only half the problem: the node reached this way is not necessarily the closest point, so the algorithm must also check whether a closer point was missed on the other side of a split.

The first step is a plain BST descent using the KD layers to find a candidate closest node, remembering the best distance seen so far along the way. Let's call the query point (i, j). Once the descent bottoms out, the recursion begins to unwind. At each node in the recursion, calculate the distance from its point to (i, j); this is usually the Euclidean distance. Update the best distance whenever a closer point is found.
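
For reference, the Euclidean distance could be computed with a small helper like the one below (assumed by the query sketch later in this section):

    import math

    def euclidean(p, q):
        """Straight-line distance between two points of any dimension."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))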

Remember how the KD tree can be seen as a collection of lines? This is where that view becomes useful. At each node on the way back up, we also need the distance from (i, j) to the splitting line produced by that node, which is simply the perpendicular distance along that node's axis. As a reminder, a node's line is always perpendicular to its parent's line, since the layers alternate between x and y splits.

If the distance from (i, j) to the splitting line is less than the best distance so far, then there is a chance that a closer point lies on the other side of the line. Simply traverse into the node's other child and compare distances, updating the best distance if necessary. Once back at the root, return the point with the best distance.
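
Putting the pieces together, a nearest-neighbor query might look like the sketch below. It reuses the illustrative Node class and euclidean helper from the earlier sketches; the descend-then-check-the-other-side logic follows the description above:

    def nearest(root, target, depth=0, k=2, best=None):
        """Return the stored point closest to target."""
        if root is None:
            return best
        # Update the best point with the current node's point if it is closer.
        if best is None or euclidean(target, root.point) < euclidean(target, best):
            best = root.point
        axis = depth % k
        # Descend toward the target first, like a normal BST.
        if target[axis] < root.point[axis]:
            near, far = root.left, root.right
        else:
            near, far = root.right, root.left
        best = nearest(near, target, depth + 1, k, best)
        # If the splitting line is closer than the best distance so far,
        # a closer point may lie on the other side: search that subtree too.
        if abs(target[axis] - root.point[axis]) < euclidean(target, best):
            best = nearest(far, target, depth + 1, k, best)
        return best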

Applications:

The KD tree was invented to solve the closest point problem. In this tutorial, the two-dimensional version was used for simplicity, but the structure extends to any number of dimensions (the "K" in K-dimensional tree). The closest point problem has a lot of applications.

Finding the nearest gas station (based strictly on distance): every point in the KD tree is a two- or three-dimensional point storing the location of a gas station. Given your current location, the KD tree can find the closest gas station.
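
As a usage sketch with the illustrative functions above (a 2D version for brevity, with made-up coordinates):

    stations = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
    root = None
    for s in stations:
        root = insert(root, s)

    here = (6, 3)
    print(nearest(root, here))  # prints (5, 4), the closest station to here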

Diagnostics: let's say a condition is characterized by 10 different measured variables, and doctors want to find the most likely diagnosis given a patient's values. If a 10-dimensional KD tree were built where each stored point holds the measurements of a known case, the patient's data could be queried against the tree to find the most similar known case, and with it the most likely diagnosis.