Sparse Graphs

For large or sparse graphs, the sparse storage format provides significant memory savings whilst maintaining efficient access patterns.

CSR Format

Compressed Sparse Row (CSR) is an efficient format for storing sparse matrices. Instead of storing an entire V×V adjacency matrix, CSR stores only the non-zero elements.

Structure:

adj_ia(:): Row pointer array of size V+1 indicating the start of each vertex’s edge list
adj_ja(:,:): 2D array of size (2, E) storing edge information:
- adj_ja(1, :): Indices of connecting vertices (column indices)
- adj_ja(2, :): Indices of the edges connecting the vertices

Memory usage: O(V + 2E) instead of O(V²) for dense format

How it works: For vertex i, its edges are stored in adj_ja(:, adj_ia(i):adj_ia(i+1)-1). This gives O(degree) access to a vertex’s neighbours rather than O(V) scanning.

Example: For a graph with 10,000 vertices and 50,000 edges:

Dense: 10,000 × 10,000 = 100,000,000 values (400 MB for real32)
Sparse: 10,001 + 2×50,000 = 110,001 values (440 KB) - over 900× smaller!

Format Conversion

Convert between formats at any time:

! Dense → Sparse
call graph%convert_to_sparse()

! Sparse → Dense
call graph%convert_to_dense()

! Check current format
if (graph%storage_format == 'sparse') then
  write(*,*) 'Using sparse storage'
else
  write(*,*) 'Using dense storage'
end if

The conversion preserves all graph data - vertices, edges, weights, and features. You can convert as many times as needed without losing information.

Accessing Neighbours

In sparse format, use the CSR arrays to find neighbours:

! Get all neighbours of vertex i
integer :: j, start_idx, end_idx, neighbour, edge_idx

start_idx = graph%adj_ia(i)
end_idx = graph%adj_ia(i+1) - 1

do j = start_idx, end_idx
  neighbour = graph%adj_ja(1, j)
  edge_idx = graph%adj_ja(2, j)

  write(*,*) 'Vertex', i, 'connects to vertex', neighbour
  write(*,*) '  via edge', edge_idx, 'with weight', graph%edge(edge_idx)%weight
end do

Count neighbours:

! Number of neighbours (degree)
num_neighbours = graph%adj_ia(i+1) - graph%adj_ia(i)
write(*,*) 'Vertex', i, 'has', num_neighbours, 'neighbours'

When to Use Sparse Format

Choose sparse format when:

Large graphs:

V > 1000 vertices
Memory becomes a constraint with dense format

Sparse graphs:

E << V² (far fewer edges than possible)
Typical for real-world networks (social networks, web graphs, molecular structures)

Memory-constrained environments:

Limited RAM
Need to fit multiple graphs in memory simultaneously

When to Use Dense Format

Choose dense format when:

Small graphs:

V < 1000 vertices
Memory overhead is negligible

Dense graphs:

E ≈ V² (many edges)
Little benefit from sparse storage

Fast lookups needed:

Frequent edge existence queries
O(1) adjacency lookup vs O(degree) in sparse

Complete Example

Building and querying a large sparse graph:

program sparse_example
  use graphstruc, only: graph_type
  implicit none

  type(graph_type) :: graph
  integer :: i, j, start_idx, end_idx, neighbour
  integer :: num_neighbours
  real :: sparsity

  ! Build large graph
  graph%directed = .false.
  call graph%set_num_vertices(10000)

  ! Add edges (sparse connectivity pattern)
  do i = 1, 50000
    call graph%add_edge(index=[src(i), tgt(i)])
  end do

  ! Calculate sparsity
  sparsity = real(graph%num_edges) / real(graph%num_vertices**2)

  write(*,*) 'Graph statistics:'
  write(*,*) '  Vertices:', graph%num_vertices
  write(*,*) '  Edges:', graph%num_edges
  write(*,'(A,F8.6)') '  Sparsity:', sparsity

  ! Convert to sparse format
  call graph%convert_to_sparse()
  write(*,*) '  Format:', trim(graph%storage_format)

  ! Query neighbours of vertex 100
  write(*,*) ''
  write(*,*) 'Neighbours of vertex 100:'

  start_idx = graph%adj_ia(100)
  end_idx = graph%adj_ia(101) - 1
  num_neighbours = end_idx - start_idx + 1

  do j = start_idx, end_idx
    neighbour = graph%adj_ja(1, j)
    write(*,*) '  Vertex', neighbour
  end do

  write(*,*) 'Total:', num_neighbours, 'neighbours'

  ! Calculate all vertex degrees
  call graph%calculate_degree()

  ! Find high-degree vertices
  write(*,*) ''
  write(*,*) 'High-degree vertices (degree > 10):'
  do i = 1, graph%num_vertices
    if (graph%vertex(i)%degree > 10) then
      write(*,*) '  Vertex', i, 'degree:', graph%vertex(i)%degree
    end if
  end do

  call graph%clear()
end program sparse_example

Performance Considerations

Sparse format advantages:

Memory efficiency: Dramatically lower memory usage for sparse graphs
Cache locality: Better cache performance when iterating through neighbours
Scalability: Enables working with very large graphs (millions of vertices)
Iteration speed: Faster to iterate through actual neighbours vs scanning entire rows

Sparse format trade-offs:

Edge queries: Slower to check if a specific edge exists - O(degree) vs O(1)
Access patterns: Slightly more complex code for neighbour access
Conversion cost: One-time overhead to convert (though typically negligible)
Random access: Less efficient for random edge lookups

Performance tips:

Build graphs in dense format, convert once to sparse after construction
Use sparse for iteration-heavy algorithms (breadth-first search, shortest paths)
Use dense for algorithms requiring frequent edge existence queries
Consider graph sparsity: if E > 0.1×V², dense may be competitive

Sparse vs Dense Comparison

Operation	Dense	Sparse
Memory usage	O(V²)	O(V + E)
Check edge exists	O(1)	O(degree)
Get all neighbours	O(V)	O(degree)
Add vertex	O(V)	O(1)
Add edge	O(1)	O(1)
Iterate all edges	O(V²)	O(E)

For most real-world graphs where E << V², sparse format provides significant advantages.