Parallel Algorithms

Parallel implementations of graph algorithms for large-scale graphs.

Overview

For very large graphs (millions of nodes, tens of millions of edges), PyGraphina provides parallel processing to speed up computations. The parallel module offers multi-threaded implementations of popular algorithms to maximize performance on multi-core systems.

Available Algorithms

| Algorithm | Description |
|---|---|
| PageRank | Multi-threaded power iteration |
| BFS | Breadth-first search from multiple starting points |
| Connected Components | Parallel union-find for connected component detection |
| Degrees | Parallel degree calculation for all nodes |
| Triangles | Parallel triangle counting |
| Clustering Coefficients | Parallel clustering coefficient calculation |
| Shortest Paths | Parallel shortest path computation |

Usage

Basic Usage

import pygraphina.parallel as pgp
import pygraphina as pg

g = pg.PyGraph()
# ... add nodes and edges ...

# Parallel PageRank
scores = pgp.pagerank_parallel(g, damping=0.85, max_iterations=100, tolerance=1e-6)

# Parallel BFS from multiple starting points
results = pgp.bfs_parallel(g, starts=[0, 10, 20])

# Parallel degree computation
degrees = pgp.degrees_parallel(g)

Processing Multiple Graphs

Use Python's multiprocessing for independent graphs:

from multiprocessing import Pool
import pygraphina as pg

def analyze_graph(graph_data):
    """Analyze a single graph."""
    g = pg.PyGraph()
    # Build graph from graph_data
    for node_attr in graph_data['nodes']:
        g.add_node(node_attr)
    for src, tgt, weight in graph_data['edges']:
        g.add_edge(src, tgt, weight)

    # Run analysis
    pagerank = pg.centrality.pagerank(g, 0.85, 100, 1e-6)
    communities = pg.community.label_propagation(g, max_iter=100)

    return {
        'pagerank': pagerank,
        'communities': communities
    }

# Process multiple graphs in parallel
graphs = [graph1_data, graph2_data, graph3_data, ...]
with Pool(processes=4) as pool:
    results = pool.map(analyze_graph, graphs)
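
When individual graphs are small, per-task pickling and IPC overhead can dominate. One common mitigation is to batch several graphs into each task before handing them to `pool.map`. Below is a minimal, pure-Python sketch of that batching pattern; `chunked` and `analyze_batch` are hypothetical helpers (not part of PyGraphina), and the placeholder analysis just counts nodes so the flow is easy to follow.

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def analyze_batch(batch):
    """Run a per-graph analysis over one batch of graph payloads."""
    # Placeholder analysis: count nodes. In practice, call analyze_graph here.
    return [len(graph_data["nodes"]) for graph_data in batch]

graphs = [{"nodes": list(range(n)), "edges": []} for n in (3, 5, 2, 4)]
batches = chunked(graphs, 2)

# With a Pool this would be: nested = pool.map(analyze_batch, batches)
nested = [analyze_batch(b) for b in batches]  # serial equivalent for illustration

# Flatten batch results back into one list, in the original order.
results = [r for batch_results in nested for r in batch_results]
print(results)  # [3, 5, 2, 4]
```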

Graph Construction

Build graphs efficiently:

# Recommended: Batch operations
g = pg.PyGraph()
node_ids = [g.add_node(attr) for attr in node_attributes]
for src, tgt, weight in edge_list:
    g.add_edge(src, tgt, weight)

# Slower: Many small operations
for attr in node_attributes:
    node_id = g.add_node(attr)
    # Do something with node_id immediately
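
Since `add_node` returns integer ids, raw data keyed by arbitrary names usually needs an id-mapping pass before batch insertion. The sketch below shows one way to normalize an edge list first (assign stable integer ids, drop duplicate edges); the variable names are illustrative, not PyGraphina API.

```python
raw_edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "b", 1.0)]  # note the duplicate

# Assign a stable integer id to each distinct endpoint, in first-seen order.
node_ids = {}
for src, tgt, _ in raw_edges:
    for name in (src, tgt):
        node_ids.setdefault(name, len(node_ids))

# Deduplicate edges while preserving order.
seen = set()
edges = []
for src, tgt, w in raw_edges:
    key = (node_ids[src], node_ids[tgt])
    if key not in seen:
        seen.add(key)
        edges.append((*key, w))

print(node_ids)  # {'a': 0, 'b': 1, 'c': 2}
print(edges)     # [(0, 1, 1.0), (1, 2, 2.0)]
```

The resulting `edges` list can then be fed to the batch-insertion loop shown above.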

API Reference

pagerank_parallel()

Compute PageRank scores using parallel processing.

import pygraphina.parallel as pgp

scores = pgp.pagerank_parallel(
    graph,
    damping=0.85,
    max_iterations=100,
    tolerance=1e-6,
    nstart=None
)
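
A frequent follow-up is extracting the highest-ranked nodes. Assuming the scores come back as a mapping of node id to score (verify the return type against your PyGraphina version), the post-processing is plain Python; the `scores` dict below is stand-in data:

```python
# Stand-in for the output of pagerank_parallel (node id -> score).
scores = {0: 0.05, 1: 0.30, 2: 0.15, 3: 0.50}

# Top 3 nodes by PageRank score, highest first.
top3 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top3)  # [(3, 0.5), (1, 0.3), (2, 0.15)]
```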

bfs_parallel()

Perform breadth-first search from multiple starting nodes.

import pygraphina.parallel as pgp

results = pgp.bfs_parallel(
    graph,
    starts=[0, 10, 20]
)
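
Multi-start BFS is typically used to compare reachability from several seeds. Assuming the result maps each start node to a `{reached node: hop distance}` mapping (an assumption; check your version's return type), summarizing it is straightforward. The `results` dict here is stand-in data:

```python
# Stand-in for the output of bfs_parallel: start node -> {node: distance}.
results = {
    0: {0: 0, 1: 1, 2: 2},
    10: {10: 0, 11: 1},
}

reach = {start: len(dist) for start, dist in results.items()}        # nodes reached
radius = {start: max(dist.values()) for start, dist in results.items()}  # farthest hop
print(reach)   # {0: 3, 10: 2}
print(radius)  # {0: 2, 10: 1}
```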

connected_components_parallel()

Find connected components using parallel union-find.

import pygraphina.parallel as pgp

components = pgp.connected_components_parallel(graph)
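
To work with the components themselves, group nodes by label. This sketch assumes the function returns one component label per node, indexed by node id (an assumption worth verifying); `labels` is stand-in data:

```python
from collections import defaultdict

# Stand-in for the output of connected_components_parallel (label per node).
labels = [0, 0, 1, 1, 1, 2]

groups = defaultdict(list)
for node, label in enumerate(labels):
    groups[label].append(node)

largest = max(groups.values(), key=len)  # nodes in the biggest component
print(dict(groups))  # {0: [0, 1], 1: [2, 3, 4], 2: [5]}
print(largest)       # [2, 3, 4]
```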

degrees_parallel()

Compute node degrees in parallel.

degrees = pgp.degrees_parallel(graph)
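
Degree output is often summarized as a histogram. Assuming one degree per node is returned (`degrees` below is stand-in data), `collections.Counter` does the bookkeeping, and for an undirected graph the degree sum is twice the edge count:

```python
from collections import Counter

# Stand-in for the output of degrees_parallel (one degree per node).
degrees = [1, 3, 2, 3, 1, 2]

histogram = Counter(degrees)   # degree -> number of nodes with that degree
print(histogram[3])            # 2
print(sum(degrees) // 2)       # edge count for an undirected graph: 6
```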

triangles_parallel()

Count triangles in parallel.

triangle_counts = pgp.triangles_parallel(graph)
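
If the counts are reported per node (each triangle counted once at each of its three vertices, which is the common convention but worth confirming here), the global triangle count is the sum divided by three. The `per_node` list below is stand-in data for K4, where every node sits on 3 triangles and the graph has 4 in total:

```python
# Stand-in for per-node triangle counts on K4 (complete graph on 4 nodes).
per_node = [3, 3, 3, 3]

# Each triangle is counted at three vertices, so divide the sum by 3.
total_triangles = sum(per_node) // 3
print(total_triangles)  # 4
```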

clustering_coefficients_parallel()

Calculate clustering coefficients in parallel.

clustering = pgp.clustering_coefficients_parallel(graph)
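
A common aggregate is the average clustering coefficient over all nodes. Assuming one coefficient per node is returned (`coeffs` below is stand-in data), the reduction is a one-liner:

```python
# Stand-in for the output of clustering_coefficients_parallel.
coeffs = [1.0, 0.5, 0.0, 0.5]

avg_clustering = sum(coeffs) / len(coeffs)
print(avg_clustering)  # 0.5
```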

shortest_paths_parallel()

Compute shortest paths in parallel.

paths = pgp.shortest_paths_parallel(graph, sources=[0, 1, 2])
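
One typical use of multi-source shortest paths is assigning each node to its nearest source (e.g. for facility-location or partitioning tasks). This sketch assumes the result maps each source to a `{node: distance}` mapping (an assumption; verify against your version); the `paths` dict is stand-in data:

```python
import math

# Stand-in for the output of shortest_paths_parallel: source -> {node: distance}.
paths = {
    0: {0: 0.0, 1: 1.0, 2: 4.0},
    2: {0: 4.0, 1: 2.0, 2: 0.0},
}

# For each node, pick the source with the smallest distance to it.
nodes = {n for dist in paths.values() for n in dist}
nearest = {
    n: min(paths, key=lambda s: paths[s].get(n, math.inf))
    for n in sorted(nodes)
}
print(nearest)  # {0: 0, 1: 0, 2: 2}
```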

Contributing

Interested in parallel algorithms or optimizations? We welcome contributions!

  1. Check open issues for parallel algorithm tasks
  2. See CONTRIBUTING.md for guidelines
  3. Parallel implementations should maintain API compatibility

See Also