SPACEc

A Stanford academic library for multiplexed imaging analysis — from cell segmentation to spatial analysis.

19,123 lines of Python · 6 core modules · 19 notebooks · Published on PyPI & bioRxiv

A streamlined, interactive Python workflow for multiplexed image processing — from tissue extraction to spatial analysis in a single library.

# From raw images to spatial analysis
sp.tl.cellpose_segmentation(img, ...)
sp.pp.filter_cells(adata, ...)
sp.tl.leiden(adata, resolution=1.0)
sp.tl.annotate(adata, method='stellar')
sp.tl.patch_proximity(adata, ...)
sp.pl.spatial(adata, color='cell_type')

The Problem

Multiplexed imaging lets researchers see 40+ proteins in a single tissue section — a massive leap for understanding how cells interact in cancer, autoimmunity, and neuroscience. But the analysis pipeline is duct tape: Cellpose for segmentation, scanpy for clustering, custom scripts for spatial analysis, matplotlib for visualization.

Researchers were spending more time wrangling code than doing science. Each lab had its own fragile collection of notebooks that broke when dependencies updated.

The Idea

A unified library that handles the entire workflow — raw TIFF images to spatial analysis results — in a single, consistent API. SPACEc follows the scanpy convention: sp.tl (tools), sp.pp (preprocessing), sp.pl (plotting), sp.hf (helpers). Built on AnnData objects, the standard format in single-cell biology.
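
To make the namespace convention concrete, here is a minimal sketch of how a package can group functions under pp, tl, and pl. Everything in it is hypothetical: sp_demo, normalize, threshold_labels, and summary are toy stand-ins, not SPACEc functions, and the real library is organized as ordinary Python modules rather than SimpleNamespace objects.

```python
from types import SimpleNamespace


# Toy stand-ins; in a real package these would live in private modules.
def _normalize(values):
    """Preprocessing step: rescale values to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def _threshold_labels(values, cutoff=0.5):
    """Tool step: label each value as 'high' or 'low'."""
    return ["high" if v >= cutoff else "low" for v in values]


def _summary(labels):
    """Plotting stand-in: return a text summary instead of drawing."""
    return ", ".join(f"{name}={labels.count(name)}" for name in sorted(set(labels)))


# The public surface: one namespace per workflow stage.
sp_demo = SimpleNamespace(
    pp=SimpleNamespace(normalize=_normalize),
    tl=SimpleNamespace(threshold_labels=_threshold_labels),
    pl=SimpleNamespace(summary=_summary),
)

scaled = sp_demo.pp.normalize([2.0, 4.0, 10.0])
labels = sp_demo.tl.threshold_labels(scaled)
report = sp_demo.pl.summary(labels)
```

The call sites read as preprocess, then analyze, then plot, regardless of how the underlying files are laid out.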


Architecture

Figure: SPACEc pipeline overview (tissue extraction, segmentation, preprocessing, clustering, annotation, and spatial analysis), from the SPACEc paper.

Input: multi-channel TIFF images, CSV markers
Segmentation: Cellpose, DeepCell (Mesmer), QuPath import
Preprocessing: spillover compensation, noise removal, normalization
Clustering: Leiden / Louvain, FlowSOM, GPU (RAPIDS)
Annotation: SVM / k-NN, STELLAR (GNN), hyperparameter tuning
Spatial: neighborhoods, cell interactions, Patch Proximity*

*Patch Proximity Analysis is the novel research contribution.


The Novel Contribution

Patch Proximity Analysis

Traditional neighborhood analysis looks at individual cell-to-cell distances. It tells you that cell A is near cell B, but it misses the bigger picture: how are groups of cells organized at the tissue level?

Patch Proximity Analysis works differently. First, DBSCAN identifies spatial clusters of a given cell type — patches of tumor cells, immune cell aggregates, stromal clusters. Then concave hull algorithms draw tight boundaries around each patch. Finally, the analysis measures how other cell types relate to those boundaries: which cells are inside? Which are at the border? Which are nearby?

This captures tissue-level spatial organization that cell-level analysis misses entirely. A tumor microenvironment isn't just individual interactions — it's the architecture of which cell populations are adjacent to which other populations.
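
The three steps above (cluster, outline, relate) can be sketched in plain Python. This is an illustration under stated assumptions, not SPACEc's implementation: the DBSCAN here is a minimal O(n^2) version, a convex hull is swapped in for the concave hull to keep the code short, and all names and thresholds (classify, border_width, radius, the toy coordinates) are made up for the example.

```python
from math import dist


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal O(n^2) DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point of this patch
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive when o->a->b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return dist(p, (a[0] + t * dx, a[1] + t * dy))


def classify(p, hull, border_width, radius):
    """Relate a cell to a patch outline: border, inside, nearby, or far."""
    edges = list(zip(hull, hull[1:] + hull[:1]))
    d = min(segment_distance(p, a, b) for a, b in edges)
    if d <= border_width:
        return "border"
    if all(cross(a, b, p) >= 0 for a, b in edges):
        return "inside"
    return "nearby" if d <= radius else "far"


# Toy data: a 3x3 patch of tumor cells plus one isolated tumor cell.
tumor = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
labels = dbscan(tumor, eps=1.5, min_samples=3)
patch = [p for p, label in zip(tumor, labels) if label == 0]
hull = convex_hull(patch)

immune = [(1.0, 1.0), (2.1, 1.0), (3.0, 1.0), (8.0, 8.0)]
relations = [classify(p, hull, border_width=0.25, radius=1.5) for p in immune]
# relations == ["inside", "border", "nearby", "far"]
```

The isolated tumor cell is labeled as noise and never contributes to the outline, which is the point of clustering before drawing boundaries. A concave hull would hug irregular patches more tightly than the convex stand-in used here.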


Decisions

Why AnnData?

AnnData is the standard data structure in single-cell biology. By building on it, SPACEc integrates immediately with scanpy, the dominant analysis framework. Researchers don't learn a new format — they move between SPACEc and scanpy seamlessly. The .obs, .var, .obsm structure maps naturally to cell metadata, marker metadata, and spatial coordinates.

Why support both Cellpose and DeepCell?

Different imaging modalities need different segmentation approaches. Cellpose excels at cytoplasm segmentation with limited training data. DeepCell (Mesmer) is better for nuclear segmentation in whole-cell multiplex images. Supporting both means researchers pick the best tool for their data without leaving the library.

Why GPU acceleration?

Multiplexed imaging datasets are enormous — tens of thousands of cells per sample, hundreds of samples per study. CPU-based Leiden clustering that takes 20 minutes drops to under 2 minutes on GPU via RAPIDS. For iterative analysis, that's the difference between interactive exploration and going for coffee.
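
One common way to offer a GPU path without making it mandatory is to check for the GPU package at runtime and fall back to CPU. The sketch below shows that pattern only; the function name pick_clustering_backend is hypothetical, and the default module name rapids_singlecell is an assumption about how the RAPIDS path is packaged.

```python
from importlib.util import find_spec


def pick_clustering_backend(prefer_gpu=True, gpu_module="rapids_singlecell"):
    """Return "gpu" if the GPU package is importable, otherwise "cpu".

    gpu_module is an assumed package name, used only to detect availability.
    """
    if prefer_gpu and find_spec(gpu_module) is not None:
        return "gpu"
    return "cpu"


backend = pick_clustering_backend()
```

With the figures quoted above, going from 20 minutes to under 2 minutes is a speedup of at least 10x per clustering run.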

Why the scanpy API convention?

Familiarity reduces adoption friction. Every computational biologist already knows sc.tl, sc.pp, sc.pl. Using the same pattern (sp.tl, sp.pp, sp.pl) means the learning curve is the domain, not the API. It's a small decision that makes a large difference in whether people actually use the library.


Tradeoffs

Python 3.9–3.10 only. TensorFlow, PyTorch, and RAPIDS all have strict version requirements that don't overlap cleanly above 3.10. This is the single biggest friction point for adoption. Docker containers (CPU and GPU variants) are the recommended install path for a reason.
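
A version constraint like this can be checked up front instead of surfacing as a dependency-resolution failure. The guard below is a hypothetical sketch (is_supported and SUPPORTED are not part of SPACEc); it only encodes the 3.9 to 3.10 range stated above.

```python
import sys

# Inclusive range of supported (major, minor) versions, per the text above.
SUPPORTED = ((3, 9), (3, 10))


def is_supported(version=None):
    """True if the given (or running) interpreter is within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED[0] <= major_minor <= SUPPORTED[1]


if not is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is outside "
        "the supported 3.9-3.10 range; consider the Docker image instead."
    )
```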

Monolithic modules. _general.py, at 5,446 lines, is hard to navigate. The scanpy convention prioritizes API consistency (sp.tl.function()) over file organization. This was a conscious tradeoff: the public API is clean even if the internals are dense.

Dependency sprawl. Cellpose, DeepCell, RAPIDS, PyTorch Geometric, GeoPandas, TensorFlow — each brings transitive dependencies that conflict with each other. A fresh install without Docker is an afternoon project. This is the cost of wrapping the best tool for each step rather than reimplementing everything.


Talk

SPACEc: A Streamlined, Interactive Python Workflow
Stanford University · Nolan Lab

Python · AnnData · Cellpose · DeepCell · TensorFlow · PyTorch · RAPIDS · scanpy · GeoPandas · Docker · PyPI