# Spectral Clustering
Spectral clustering uses the graph Laplacian matrix to partition the graph into communities.
## Function Signature
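The exact signature is not shown on this page; the form below is inferred from the example further down (treat it as an assumption rather than the library's authoritative API):

```python
pygraphina.community.spectral_clustering(graph, k)
```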
### Parameters

- `graph`: Undirected graph to analyze
- `k`: Number of communities to find

### Returns

Dictionary mapping node IDs to community labels (`0` to `k - 1`).
## Description

Spectral clustering works by:

1. Computing the graph Laplacian matrix
2. Finding the eigenvectors for the k smallest eigenvalues
3. Running k-means clustering on the eigenvector coordinates
4. Assigning each node to the cluster of its row in the embedding
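The four steps above can be sketched in plain NumPy. This is an illustrative implementation, not pygraphina's actual one (which operates on `PyGraph` objects); it takes a dense adjacency matrix and uses a simple deterministic k-means seeding so the sketch is reproducible:

```python
import numpy as np

def spectral_clustering_sketch(A, k):
    """Cluster the nodes of a graph given its dense adjacency matrix A.

    A NumPy sketch of the four steps above, not pygraphina's
    implementation.
    """
    # Step 1: unnormalized graph Laplacian L = D - A
    L = np.diag(A.sum(axis=1)) - A
    # Step 2: eigenvectors belonging to the k smallest eigenvalues
    # (np.linalg.eigh returns eigenvalues in ascending order)
    _, vecs = np.linalg.eigh(L)
    X = vecs[:, :k]  # row i is node i's spectral embedding
    # Step 3: k-means on the embedded points, seeded deterministically
    # by farthest-point initialization
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(100):  # Lloyd iterations
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Step 4: node i belongs to community labels[i]
    return {i: int(labels[i]) for i in range(len(A))}
```

On a graph made of two disjoint triangles, for example, the two smallest eigenvectors are constant on each component, so k-means with `k=2` separates the triangles exactly.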
## Time Complexity

O(V³) for the eigenvalue decomposition.

## Space Complexity

O(V²) for the Laplacian matrix.
## Example

```python
import pygraphina as pg
from collections import Counter

# Create a graph with clear cluster structure
g = pg.PyGraph()

# Define three clusters of five nodes each
cluster1 = [g.add_node(i) for i in range(5)]
cluster2 = [g.add_node(i + 5) for i in range(5)]
cluster3 = [g.add_node(i + 10) for i in range(5)]

# Connect nodes densely within each cluster
for cluster in [cluster1, cluster2, cluster3]:
    for i in range(len(cluster)):
        for j in range(i + 1, len(cluster)):
            g.add_edge(cluster[i], cluster[j], 1.0)

# Add weak inter-cluster edges
g.add_edge(cluster1[-1], cluster2[0], 0.5)
g.add_edge(cluster2[-1], cluster3[0], 0.5)

# Detect communities
communities = pg.community.spectral_clustering(g, k=3)

# Verify
print(f"Found {len(set(communities.values()))} communities")
print(f"Size distribution: {Counter(communities.values())}")
```
## Advantages

- Theoretically well-founded
- Works well for balanced clusters
- Handles complex network structures
- Solves a relaxation of graph-cut objectives such as ratio cut and normalized cut
## Disadvantages

- Requires knowing k in advance
- Computationally expensive for large graphs: the O(V³) decomposition limits scalability
- Sensitive to noisy edges and to k-means initialization
## When to Use

- Medium-sized graphs (< 10,000 nodes)
- Known number of communities
- Well-separated or balanced clusters expected
- Need mathematical rigor
## Comparison with Other Methods
| Method | Speed | Scalability | Quality | Requires k |
|---|---|---|---|---|
| Spectral | Slow | Poor | Very Good | Yes |
| Louvain | Fast | Excellent | Excellent | No |
| Label Propagation | Very Fast | Excellent | Good | No |
| Girvan-Newman | Very Slow | Very Poor | Good | Yes |
## Mathematical Background

The algorithm is based on the (unnormalized) graph Laplacian:

    L = D - A

where:

- D is the degree matrix (diagonal)
- A is the adjacency matrix

The eigenvectors of L provide information about graph structure: nodes with similar coordinates in the smallest eigenvectors tend to be in the same community.
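A small worked example of why the smallest eigenvectors matter. The graph below is made up for illustration (it does not use the pygraphina API): a triangle plus a separate edge, i.e. two connected components. A classical property of the Laplacian is that the multiplicity of eigenvalue 0 equals the number of connected components:

```python
import numpy as np

# Illustrative 5-node graph: a triangle {0, 1, 2} and an edge {3, 4},
# giving two connected components.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
A = np.zeros((5, 5))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

D = np.diag(A.sum(axis=1))  # degree matrix (diagonal)
L = D - A                   # graph Laplacian

# Multiplicity of eigenvalue 0 = number of connected components.
eigenvalues = np.linalg.eigvalsh(L)
n_zero = int(np.sum(np.isclose(eigenvalues, 0.0)))
print(n_zero)  # -> 2
```

For weakly linked (rather than disconnected) communities, those zero eigenvalues become small-but-nonzero, and the corresponding eigenvectors are approximately constant on each community.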
## Practical Notes

- Initialize k-means carefully for stable results
- Normalize eigenvectors for consistent clustering
- Works on both connected and disconnected graphs
- May need preprocessing for weighted graphs
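The normalization note above commonly refers to the row-normalization step of Ng, Jordan & Weiss (2001): after stacking the k eigenvectors as columns of a matrix X, rescale each row to unit length before running k-means. A minimal sketch (X here is a made-up embedding, and all rows are assumed nonzero):

```python
import numpy as np

# Made-up 3-node embedding in k = 2 spectral coordinates.
X = np.array([[3.0, 4.0],
              [0.6, 0.8],
              [-1.0, 1.0]])

# Rescale each row to unit Euclidean length (assumes no all-zero rows).
Y = X / np.linalg.norm(X, axis=1, keepdims=True)

# Rows that pointed in the same direction now coincide, so the
# subsequent k-means step depends on direction rather than magnitude.
print(np.allclose(Y[0], Y[1]))  # -> True
```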
## Implementation Notes

- Uses spectral decomposition of the Laplacian matrix
- Applies k-means to the eigenvector coordinates
- Returns community labels (`0` to `k - 1`)
- Deterministic for a given input
## See Also

- Louvain Algorithm - Faster practical alternative
- Label Propagation - Fast scalable method
- Girvan-Newman - Hierarchical divisive approach
- Connected Components - Finding trivial communities
## References

- Ng, A. Y., Jordan, M. I., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. *Advances in Neural Information Processing Systems 14*.
- von Luxburg, U. (2007). A tutorial on spectral clustering. *Statistics and Computing, 17*(4), 395-416.