HLD – eggshellwarriors

What is heavy-light decomposition?

Heavy-light decomposition is a way to modify a tree to optimize for path queries. Operatins such as getting the max along a path or getting the sum along a path are now possible in log^2(n). It also allows for changes in the tree. Heavy-light decomposition is possible by assigning heavy edges and light edges.

Precomputation for Queries (There’s a lot):

Assigning heavy and light edges:

In heavy-light decomposition, there are two types of edges: heavy edges and light edges. Each node in the tree will have one child at most, which is connected by a heavy edge. The heavy edge is the edge that connects the biggest subtree. This will create heavy chains, which are paths of continuous heavy edges.

The biggest subtree can be calculated with a post-order DF. After assigning all edges, any path in the tree will contain at most log(n) light edges. There is proof for this.

Assigning an order:

Values on light edges are just accounted for like normal in O(1). Heavy chains will be done in log (n). The ability to get log(n) time for calculating information over a heavy chain is done by a segment tree. There is a certain order in which the values in the segment tree should be ordered to make range operations occur.

When assigning values 0 to n-1 for indices in the segment tree, whenever a heavy chain is detected, continue through the entire chain until going back to light edges. Now for each vertical chain in the tree, a segment tree will be constructed using the values.

Assigning the top of each chain:

It’s important to assign the highest node to each heavy chain. Initially, every node is simply assigned to itself. Then a search is done. When a node is connected to its parent through a heavy edge, it also inherits the exact same “top value”.

Queries:

Queries on a path can be expanded into two different paths. It can be separated into values from the starting node to the LCA (Lowest Common Ancestor) and the ending node to the LCA.

When calculating the paths, there might be some thought about what to include in the answer. Such as, depending on how the problem is set up, the LCA node may or may not be included.

This tutorial primarily uses edges that are used to compute the query, but it is possible that the questions use nodes. In that case, just push down the edges to their end nodes.

Determining the LCA:

The LCA can be determined using binary lifting. Binary lifting allows for the LCA to be calculated in O(log n) time.

Getting the Answer:

Start from one of the nodes. Continue up the tree until you reach the LCA. If a node is connected through a light edge, then simply process its value as normal in O(1). If a part of a heavy path is found, then calculate using segment tree range queries using the range set by their labels (determined in precomputation step 2). Then proceed to the node at the top of the chain for further calculations.

Sometimes, the LCA may be located inside a heavy chain. If that is the case, the LCA will have the same “top node” as the current value. Now, using the labels (determined in precomputation step 2), do a range query.

After getting to the LCA, just do it to the other node, then combine the answers to get the full answer.

Practice problems with short answer summaries:

All of these problems were shamelessly taken from the USACO Guide article on HLD. The links to the problem can be found on their website. https://usaco.guide/plat/hld?lang=cpp. I chose to include all, but unless I am missing something super obvious, some problems don’t require HLD.

Cow Land

This is more or less the exact scenario talked about in the article. Light edges would just store a value. Then, the segment trees will store XOR values. For each path, XOR the values obtained from light edges, and then also XOR the values from the range queries.

*XOR can be done in any order. 5 XOR 3 XOR 4 is the same as 4 XOR. 5 XOR 3

Vertex Path Composite

This one feels strange at first but, then there is a realization that it is just like all of the other problems except it uses composite functions instead of values. For example let f(x) = x^3 + 4 and g(x) = -3x^2 – 4. Then f(g(x)) = (-3x^2 -4) ^(3) +4. In each segment tree node the composition of the functions will be stored. In this problem it also chooses the path is “directed” since composing functions together doesn’t not allow for the order to be changed. So, in the segment trees if there is a f(x) and g(x) stored in the nodes. Then, it’s parent has to store f(g(x)) and g(f(x)).