Similarity Metrics¶
Similarity metrics measure how alike two nodes are based on their network connections.
Overview¶
Similarity-based link prediction assumes that similar nodes are more likely to connect in the future.
Jaccard Coefficient¶
The Jaccard coefficient measures the overlap of neighbor sets:
Formula: J(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
Ranges from 0 (no common neighbors) to 1 (identical neighborhoods).
Common Neighbors¶
The simplest metric - just count shared neighbors:
Fast but less sophisticated than other metrics.
Adamic-Adar Index¶
Weights common neighbors by their degree (low-degree neighbors count more):
Formula: A(u, v) = Σ(1 / log(degree(w))) for common neighbors w
Good for social networks where rare connections are more meaningful.
Example¶
import pygraphina as pg
# Create a network
g = pg.PyGraph()
nodes = [g.add_node(i) for i in range(10)]
# Add some edges
edges = [
(0, 1), (0, 2), (1, 2), # Triangle
(2, 3), (3, 4), (4, 5), # Path
]
for u, v in edges:
g.add_edge(nodes[u], nodes[v], 1.0)
# Compare metrics
jaccard = pg.links.jaccard_coefficient(g)
adamic = pg.links.adamic_adar_index(g)
common = pg.links.common_neighbors(g)
# Example: nodes 0 and 3
key = (nodes[0], nodes[3])
print(f"Jaccard (0,3): {jaccard.get(key, 'N/A')}")
print(f"Adamic-Adar (0,3): {adamic.get(key, 'N/A')}")
print(f"Common neighbors (0,3): {common.get(key, 'N/A')}")
Comparison¶
| Metric | Complexity | Range | Sensitivity | Computes |
|---|---|---|---|---|
| Common Neighbors | O(d) | [0, ∞) | Simple count | Single pair |
| Jaccard | O(d²) | [0, 1] | Normalized | All non-adj pairs |
| Adamic-Adar | O(d² log V) | [0, ∞) | Low-degree aware | All non-adj pairs |
Note: d represents the average degree of nodes. Common neighbors computes for a specific pair in O(d) time, while Jaccard and Adamic-Adar compute for all non-adjacent pairs.
Use Cases¶
- Friend recommendation in social networks
- Citation prediction in academic networks
- Biological network analysis
- Knowledge graph completion
Strengths¶
- Simple and interpretable
- Fast computation
- No parameters to tune
- Good empirical performance
Limitations¶
- Only considers local structure
- Ignores node attributes
- Equal weight to all common neighbors (except Adamic-Adar)