Girvan-Newman Algorithim

What is the Girvan-Newman algorithm?

The Girvan Newman algorithm is used to find communities in groups in unweighted graphs. A community is a group of nodes that are all connected to each other, directly or indirectly. The GN algorithm gives a hierarchical decomposition of a graph. Every iteration shows more closely knit communities. It is good at discovering clusters and how clustered they are.

How do I find communities?

The Girvan-Newman algorithm uses the idea of edge betweeness. Edge betweeness is the number of shorter paths that use this edge. Shorter paths are counted using every single vertex pair. The algorithm is attempting to remove the edges that seem to be connecting two different groups loosely.

Conducting the algorithm starts with determining edge betweeness. The algorithm is very difficult to understand without a visual. So, I have provided some

  1. Begin a BFS from each node in the and calculate each node’s value.

    1. A node value is determined by the values that are leading into it in this iteration of BFS.

  2. Now, from the leaf nodes, go back and calculate the edge betweeness scores.

    1. Leaf nodes have an initial score of 0, and everything else starts with a score of 1.

      1. Edge scores are calculated as the sum of the edges going into the start node plus 1 (if not a leaf node).

    2. A node splits a total score of 1 between all of its parents proportionally based on their node scores.

      1. Multiple parents

        1. If there is only one node, the edge between the two nodes gets a score of 1.

        2. If there are two parent nodes, each with a node score of 2, Then, they will split the 1 evenly, with 0.5 going to each

        3. If there are two parents, one with a node score of 1 and one with a node score of 4, The first node will receive 1/5, and the other node will receive 4/5.

  3. Once all the edges have been calculated, go and choose a different node to start the BFS with. Continue until every node is used. The edge between them is a combination of all of the sums, so keep adding to them as it goes.

  4. After a BFS has occurred on every single node, remove the edge(s) with the highest score.