Skip to content

Similarity Metrics

Similarity metrics measure how alike two nodes are based on their network connections.

Overview

Similarity-based link prediction assumes that similar nodes are more likely to connect in the future.

Jaccard Coefficient

The Jaccard coefficient measures the overlap of neighbor sets:

pg.links.jaccard_coefficient(graph) -> Dict[Tuple, float]

Formula: J(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|

Ranges from 0 (no common neighbors) to 1 (identical neighborhoods).

Common Neighbors

The simplest metric - just count shared neighbors:

pg.links.common_neighbors(graph) -> Dict[Tuple, int]

Fast but less sophisticated than other metrics.

Adamic-Adar Index

Weights common neighbors by their degree (low-degree neighbors count more):

pg.links.adamic_adar_index(graph) -> Dict[Tuple, float]

Formula: A(u, v) = Σ(1 / log(degree(w))) for common neighbors w

Good for social networks where rare connections are more meaningful.

Example

import pygraphina as pg

# Create a network
g = pg.PyGraph()
nodes = [g.add_node(i) for i in range(10)]

# Add some edges
edges = [
    (0, 1), (0, 2), (1, 2),  # Triangle
    (2, 3), (3, 4), (4, 5),  # Path
]

for u, v in edges:
    g.add_edge(nodes[u], nodes[v], 1.0)

# Compare metrics
jaccard = pg.links.jaccard_coefficient(g)
adamic = pg.links.adamic_adar_index(g)
common = pg.links.common_neighbors(g)

# Example: nodes 0 and 3
key = (nodes[0], nodes[3])
print(f"Jaccard (0,3): {jaccard.get(key, 'N/A')}")
print(f"Adamic-Adar (0,3): {adamic.get(key, 'N/A')}")
print(f"Common neighbors (0,3): {common.get(key, 'N/A')}")

Comparison

Metric Complexity Range Sensitivity Computes
Common Neighbors O(d) [0, ∞) Simple count Single pair
Jaccard O(d²) [0, 1] Normalized All non-adj pairs
Adamic-Adar O(d² log V) [0, ∞) Low-degree aware All non-adj pairs

Note: d represents the average degree of nodes. Common neighbors computes for a specific pair in O(d) time, while Jaccard and Adamic-Adar compute for all non-adjacent pairs.

Use Cases

  • Friend recommendation in social networks
  • Citation prediction in academic networks
  • Biological network analysis
  • Knowledge graph completion

Strengths

  • Simple and interpretable
  • Fast computation
  • No parameters to tune
  • Good empirical performance

Limitations

  • Only considers local structure
  • Ignores node attributes
  • Equal weight to all common neighbors (except Adamic-Adar)