Project Sandbox: Archival Devlog

Cosimaging

An archival coding challenge and personal sandbox exploring SMLM analysis algorithms, GPU acceleration, and zero-dependency pipeline orchestration.

The Engineering Challenge

In single-molecule localisation microscopy (SMLM), getting from raw camera data to cluster statistics often means moving data between disparate tools such as ImageJ, MATLAB, and custom Python scripts.

The objective of this personal sandbox project was to test whether a fully integrated, zero-dependency data pipeline could be orchestrated within a single environment. The benchmarks were to handle millions of localisations, implement published analysis methods faithfully, and run the entire workflow without the command line.

Architectural Pillars

Unified Orchestration

The primary design goal was data continuity. The architecture was built to handle localisation, filtering, drift correction, clustering, and 3D visualisation within a single memory space, removing the need to export and re-format files between steps.

Algorithmic Fidelity

Analysis modules were written from scratch to follow peer-reviewed methods closely: Thompson-Larson-Webb localisation precision, Costes auto-thresholding, Manders coefficients, Fourier Ring Correlation, and DBSCAN.
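For reference, the Thompson-Larson-Webb estimate gives the lateral localisation variance as Δx² = (s² + a²/12)/N + 8πs⁴b²/(a²N²), where s is the PSF standard deviation, a the pixel size, N the number of detected photons and b the background noise per pixel in photons. The minimal sketch below evaluates that formula directly; the function name and the choice of nm and photons as units are illustrative assumptions, not the project's actual code.

```python
import math


def thompson_precision(s_nm: float, a_nm: float, n_photons: float, b_photons: float) -> float:
    """Lateral localisation precision (nm) from Thompson, Larson & Webb (2002).

    Hypothetical helper for illustration.
    s_nm      -- standard deviation of the PSF (nm)
    a_nm      -- effective pixel size in the sample plane (nm)
    n_photons -- photons detected from the molecule
    b_photons -- background noise per pixel (standard deviation, photons)
    """
    # Photon shot noise plus the pixelation (finite pixel size) term
    shot_and_pixel = (s_nm ** 2 + a_nm ** 2 / 12.0) / n_photons
    # Background noise term
    background = 8.0 * math.pi * s_nm ** 4 * b_photons ** 2 / (a_nm ** 2 * n_photons ** 2)
    return math.sqrt(shot_and_pixel + background)


# Usage: 130 nm PSF sigma, 100 nm pixels, 1500 photons, background noise of 10 photons
print(round(thompson_precision(130.0, 100.0, 1500.0, 10.0), 2))
```

The two terms make the trade-off explicit: more photons shrink both, while background noise matters most for dim molecules because its term scales with 1/N².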

Zero-CLI Execution

Engineered a full graphical interface covering every step from raw data arrays to PDF reports, demonstrating that complex imaging pipelines can run without scripts or the command line.

Standalone Compilation

To avoid environment setup, the entire Python stack was compiled into a standalone Windows executable that bundles all of its dependencies.

Automated Pipeline

Prototyped a one-click Auto-Analysis wizard to run the entire SMLM pipeline with smart defaults, automatically generating comprehensive PDF + CSV reports.

Batch Processing

Engineered batch capabilities to process hundreds of datasets sequentially using saved workflow parameters, ensuring analytical reproducibility.
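Replaying saved parameters over a folder of datasets can be sketched as below. This is a minimal illustration assuming a hypothetical JSON schema ({"steps": [{"name": ..., "params": {...}}]}) and a hypothetical step registry; the project's actual workflow format is not documented here.

```python
import csv
import json
import tempfile
from pathlib import Path


def filter_photons(rows, min_photons):
    """Hypothetical step: keep localisations with at least `min_photons` photons."""
    return [r for r in rows if float(r["photons"]) >= min_photons]


# Hypothetical registry mapping step names in the saved JSON to functions.
REGISTRY = {"filter_photons": filter_photons}


def load_csv(path):
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def run_workflow(rows, workflow):
    """Apply each saved step in order, with exactly the saved parameters."""
    for step in workflow["steps"]:
        rows = REGISTRY[step["name"]](rows, **step["params"])
    return rows


def batch_process(folder, workflow_path):
    """Replay one saved workflow over every CSV in `folder`, in sorted (deterministic) order."""
    workflow = json.loads(Path(workflow_path).read_text())
    return {p.name: run_workflow(load_csv(p), workflow) for p in sorted(Path(folder).glob("*.csv"))}


# Usage: two small datasets and a one-step workflow in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.csv").write_text("x,y,photons\n1,2,100\n3,4,500\n")
    (folder / "b.csv").write_text("x,y,photons\n5,6,800\n")
    wf_path = folder / "workflow.json"
    wf_path.write_text(json.dumps({"steps": [{"name": "filter_photons", "params": {"min_photons": 300}}]}))
    results = batch_process(folder, wf_path)
    print({name: len(rows) for name, rows in results.items()})
```

Sorting the file list keeps the processing order identical between runs, which matters when per-dataset outputs are later concatenated.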

Technical Deep-Dive

The sandbox successfully prototyped and validated core processing layers across multiple modalities.

🔬 Super-Resolution Microscopy (SMLM) Pipeline

A complete algorithmic workflow designed for single-molecule localisation microscopy, from raw blinking movies to cluster statistics.

  • Molecule Localisation — Implemented a batched 2D Gaussian fitting engine (linearised weighted least-squares) to convert raw TIFF blinking movies into precise molecular coordinates. Outputs included X/Y position, photon count, background, PSF width, Thompson-formula precision, and SNR.
  • Data Quality Check — Designed an automated grading matrix to evaluate localisation data from A (Excellent) to D (Poor). Included technique-specific thresholds for dSTORM, PALM, and DNA-PAINT.
  • Localisation Filters — Built non-destructive, interactive range sliders for per-localisation attributes (Photons, Background, Precision, PSF Sigma, SNR), with the 2D view updating in real time.
  • Single Peak Removal — Integrated removal of isolated noise spikes using KD-tree-based spatio-temporal nearest-neighbour analysis with Z-axis scaling.
  • Drift Correction (DME) — Implemented cross-correlation between temporal segments, with parabolic sub-pixel refinement of the correlation peak.
  • Image Resolution (FRC) — Coded Fourier Ring Correlation using a deterministic odd/even frame split, Tukey-windowed FFTs, and curve smoothing, reporting resolution where the FRC curve falls below the 1/7 threshold (Nieuwenhuizen et al., 2013).
  • Cluster Analysis (DBSCAN) — Integrated density-based spatial clustering featuring separate XY and Z epsilon parameters for 3D topologies, validated via Ripley’s K analysis.
  • Positivity Analysis — Developed spatial centroid merging algorithms to classify clusters as single-, double-, or triple-positive across multi-channel datasets.
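The linearised fit used for localisation can be illustrated with one standard approach: after background subtraction, the logarithm of an isotropic Gaussian is a quadratic in x and y, ln I = c0 + c1·x + c2·y + c3·(x² + y²), so each spot reduces to a small linear least-squares problem. The NumPy sketch below solves the weighted normal equations for a whole batch of regions of interest (ROIs) at once, weighting each pixel by its intensity so that noisy low-signal pixels do not dominate the log-space fit. The function name, the isotropic PSF and the known per-ROI background are assumptions for illustration; the project's engine may differ in weighting and parameterisation.

```python
import numpy as np


def fit_gaussians_linearised(rois, background):
    """Fit isotropic 2D Gaussians to a batch of ROIs by weighted least squares in log space.

    rois       -- (n, k, k) array of pixel values, one spot per ROI
    background -- (n,) array of per-ROI background levels (assumed known)
    Returns x0, y0 (pixels, ROI coordinates), sigma (pixels) and amplitude, each shape (n,).
    """
    n, k, _ = rois.shape
    yy, xx = np.mgrid[0:k, 0:k].astype(float)
    x, y = xx.ravel(), yy.ravel()
    # Design matrix for ln I = c0 + c1*x + c2*y + c3*(x^2 + y^2)
    X = np.stack([np.ones_like(x), x, y, x ** 2 + y ** 2], axis=1)          # (k*k, 4)
    signal = np.clip(rois.reshape(n, -1) - background[:, None], 1e-6, None)  # log needs > 0
    w2 = signal ** 2                                   # squared weights (weight = intensity)
    # Batched weighted normal equations: (X^T W^2 X) c = X^T W^2 ln(signal)
    XtWX = np.einsum("pi,np,pj->nij", X, w2, X)
    XtWz = np.einsum("pi,np->ni", X, w2 * np.log(signal))
    c0, c1, c2, c3 = np.linalg.solve(XtWX, XtWz[..., None])[..., 0].T
    sigma = np.sqrt(-1.0 / (2.0 * c3))                 # c3 = -1 / (2 sigma^2)
    x0 = -c1 / (2.0 * c3)
    y0 = -c2 / (2.0 * c3)
    amplitude = np.exp(c0 - c3 * (x0 ** 2 + y0 ** 2))
    return x0, y0, sigma, amplitude


# Usage: two noise-free synthetic spots on constant backgrounds
k = 9
yy, xx = np.mgrid[0:k, 0:k]


def spot(x0, y0, sigma, amp, bg):
    return amp * np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * sigma ** 2)) + bg


rois = np.stack([spot(3.3, 4.1, 1.3, 500.0, 10.0), spot(4.8, 3.9, 1.1, 800.0, 20.0)])
x0, y0, sigma, amp = fit_gaussians_linearised(rois, np.array([10.0, 20.0]))
print(x0, y0, sigma, amp)
```

Because the problem is linear, every ROI is solved in one vectorised call with no iterations, which is what makes a batched engine of this kind fast; the price is sensitivity to the background estimate, since it is subtracted before the logarithm.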
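For the drift step, the shift between two rendered segment images can be estimated from the peak of their cross-correlation, refined to sub-pixel precision by fitting a parabola through the peak and its two neighbours along each axis. The sketch below assumes each temporal segment has already been rendered to a 2D histogram image of equal size; the function names are hypothetical. The FFT-based correlation is circular, so images should be zero-padded if the drift is a large fraction of the field of view.

```python
import numpy as np


def _parabolic_offset(left, centre, right):
    """Sub-pixel offset of the vertex of a parabola through three equally spaced samples."""
    denom = left - 2.0 * centre + right
    return 0.0 if denom == 0 else 0.5 * (left - right) / denom


def refine_peak(corr):
    """(row, col) of the maximum of `corr`, refined independently along each axis."""
    r, c = np.unravel_index(np.argmax(corr), corr.shape)
    dr = dc = 0.0
    if 0 < r < corr.shape[0] - 1:
        dr = _parabolic_offset(corr[r - 1, c], corr[r, c], corr[r + 1, c])
    if 0 < c < corr.shape[1] - 1:
        dc = _parabolic_offset(corr[r, c - 1], corr[r, c], corr[r, c + 1])
    return r + dr, c + dc


def estimate_shift(reference, image):
    """Shift (rows, cols) of `image` relative to `reference` via FFT cross-correlation."""
    corr = np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(reference))).real
    corr = np.fft.fftshift(corr)            # zero shift now sits at the array centre
    r, c = refine_peak(corr)
    return r - corr.shape[0] // 2, c - corr.shape[1] // 2


# Usage: a Gaussian blob and a copy displaced by (3.4, -2.7) pixels
yy, xx = np.mgrid[0:64, 0:64]


def blob(r0, c0, sigma=3.0):
    return np.exp(-((yy - r0) ** 2 + (xx - c0) ** 2) / (2 * sigma ** 2))


shift = estimate_shift(blob(32.0, 32.0), blob(35.4, 29.3))
print(shift)
```

In a full drift correction, each segment's shift relative to the first segment would be computed this way and then interpolated over frame number before being subtracted from the coordinates.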
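Single-peak removal can be sketched with a KD-tree. Z is rescaled before the search (axial precision is usually worse than lateral, so a factor below 1 loosens the axial criterion), frame number is optionally converted into an equivalent distance so that the neighbourhood becomes spatio-temporal, and any localisation whose nearest neighbour lies beyond a radius is flagged as an isolated spike. The parameter names and this particular way of combining space and time are assumptions for illustration, not the project's documented method.

```python
import numpy as np
from scipy.spatial import cKDTree


def single_peak_mask(xyz, frames, radius_nm, z_scale=0.5, frame_scale_nm=0.0):
    """Return a boolean mask that is False for isolated localisations.

    xyz            -- (n, 3) coordinates in nm
    frames         -- (n,) frame numbers
    radius_nm      -- a localisation is kept if its nearest neighbour is within this distance
    z_scale        -- factor applied to Z before the search (< 1 down-weights axial separation)
    frame_scale_nm -- nm-equivalent of one frame of separation; 0 ignores time
    """
    pts = np.column_stack([
        xyz[:, 0],
        xyz[:, 1],
        xyz[:, 2] * z_scale,
        np.asarray(frames, dtype=float) * frame_scale_nm,
    ])
    # k=2: column 0 is the point itself (distance 0), column 1 its nearest neighbour
    dist, _ = cKDTree(pts).query(pts, k=2)
    return dist[:, 1] <= radius_nm


# Usage: a small cluster plus one far-away spike
xyz = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [5, 5, 0], [1000, 1000, 0]], dtype=float)
keep = single_peak_mask(xyz, np.zeros(len(xyz)), radius_nm=50.0)
print(keep)
```

Filtering is then a single boolean index (`xyz[keep]`), which keeps the operation non-destructive if the mask is stored rather than applied in place.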

📐 Widefield & Confocal Image Analysis

Mathematical toolsets engineered to process standard TIFF microscopy image arrays.

  • Co-localisation — Implemented colocalisation metrics following Manders (1993) and Costes (2004), using Costes automatic thresholding to separate signal from background. Integrated a 200-iteration block-scramble significance test (significant at P ≥ 0.95, i.e. the observed Pearson's r exceeds at least 95% of the scrambled r values) and cytofluorogram scatter plots.
  • Auto Mask (Otsu) — Programmed automatic signal/background separation using Otsu’s thresholding to generate binary mask layers for downstream clustering.
  • Z-Projection (MIP) — Built a streaming Maximum Intensity Projection (MIP) that reads multi-frame TIFF stacks frame by frame to keep memory use low.
  • Intensity Line Profile — Engineered interactive point-placement tools to sample intensity along user-defined paths.
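Otsu's method picks the grey level that maximises the between-class variance of the two classes it creates. A compact NumPy version of the histogram-based formulation is sketched below; the 256-bin histogram and the convention that pixels at or above the returned value form the foreground are assumptions of this sketch (scikit-image provides an equivalent `skimage.filters.threshold_otsu`).

```python
import numpy as np


def otsu_threshold(image, nbins=256):
    """Histogram-based Otsu threshold.

    Returns the upper edge of the lower class, so pixels >= threshold are foreground.
    """
    values = np.asarray(image, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=nbins)
    centres = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                  # lower-class pixel count for a split after each bin
    w1 = w0[-1] - w0                      # upper-class pixel count
    s0 = np.cumsum(hist * centres)        # lower-class intensity sum (using bin centres)
    mean0 = s0 / np.maximum(w0, 1)
    mean1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mean0 - mean1) ** 2   # between-class variance times N^2
    return edges[np.argmax(between) + 1]


# Usage: binary mask from a bimodal synthetic image
rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(50, 5, 1000), rng.normal(150, 5, 1000)]).reshape(40, 50)
t = otsu_threshold(img)
mask = img >= t
print(round(float(t), 1), int(mask.sum()))
```

Returning a bin edge rather than a bin centre means the mask reproduces exactly the split that the histogram scored.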
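The block-scramble significance test can be sketched as follows, based on the randomisation idea in Costes (2004): one channel is cut into blocks roughly the size of the PSF, the blocks are shuffled, and Pearson's r is recomputed against the unshuffled second channel. The P-value is the fraction of scrambled r values below the observed r. The block size, seed and function names are assumptions, and edge pixels that do not fill a whole block are simply cropped here.

```python
import numpy as np


def pearson_r(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


def costes_p_value(ch1, ch2, block=4, n_iter=200, seed=0):
    """Block-scramble randomisation test for colocalisation.

    Returns (observed Pearson r, fraction of scrambled r values below it).
    Pixels that do not fill a whole block at the right/bottom edges are cropped.
    """
    h = (ch1.shape[0] // block) * block
    w = (ch1.shape[1] // block) * block
    a = ch1[:h, :w].astype(float)
    b = ch2[:h, :w].astype(float)
    r_obs = pearson_r(a, b)
    nby, nbx = h // block, w // block
    # Cut channel 1 into (nby * nbx) tiles of shape (block, block)
    tiles = a.reshape(nby, block, nbx, block).swapaxes(1, 2).reshape(-1, block, block)
    rng = np.random.default_rng(seed)
    below = 0
    for _ in range(n_iter):
        shuffled = tiles[rng.permutation(len(tiles))]
        # Reassemble the shuffled tiles into an (h, w) image
        scrambled = shuffled.reshape(nby, nbx, block, block).swapaxes(1, 2).reshape(h, w)
        if pearson_r(scrambled, b) < r_obs:
            below += 1
    return r_obs, below / n_iter


# Usage: two channels sharing the same block structure plus independent noise
rng = np.random.default_rng(1)
base = rng.random((16, 16)).repeat(4, axis=0).repeat(4, axis=1)    # 64 x 64
ch1 = base + 0.05 * rng.random(base.shape)
ch2 = base + 0.05 * rng.random(base.shape)
r, p = costes_p_value(ch1, ch2, block=4)
print(round(r, 3), p)
```

Scrambling whole blocks rather than single pixels preserves the spatial autocorrelation introduced by the PSF, so the null distribution is not artificially narrow.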

🎨 Visualisation & Rendering

  • 2D Viewer — Engineered a high-performance rendering canvas featuring Level of Detail (LOD) management, smooth panning, and customisable LUTs for multi-layer data compositing.
  • 3D Scatter Viewer — Utilised PyVista and VTK to build a GPU-accelerated 3D scatter viewer that handles datasets of millions of points, with eye-dome lighting.
  • Orthogonal Projections — Constructed simultaneous XY, XZ, and YZ viewports for inspecting 3D structure.
  • 3D Volume Projector — Integrated cluster-based density projections with inter-cluster distance measurement tools.

⚡ Automation & Batch Processing

  • Auto-Analysis Wizard — Designed a 3-step orchestration framework to run the full algorithmic pipeline automatically: quality check → filter → drift → FRC → cluster → co-localisation.
  • Batch Processing — Enabled folder-level ingestion of CSV or TIFF files, replaying saved JSON workflows with identical, deterministic parameters.
  • Workflow Logs — Ensured every action in the application generates a timestamped entry in a JSON workflow log, supporting scientific reproducibility.
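The wizard's fixed step order and the per-action JSON log can be combined in a small runner: each step is applied in turn and recorded with a UTC timestamp and its exact parameters, so the log could later be replayed. This is a minimal sketch; the class name, JSON layout and the toy steps in the usage example are hypothetical placeholders rather than the project's actual modules.

```python
import json
from datetime import datetime, timezone


class WorkflowLog:
    """Hypothetical recorder: one timestamped entry per pipeline action."""

    def __init__(self):
        self.entries = []

    def record(self, name, params):
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "name": name,
            "params": dict(params),
        })

    def to_json(self):
        return json.dumps({"steps": self.entries}, indent=2)


def run_pipeline(data, steps, log):
    """Run (name, function, params) steps in order, logging each one after it succeeds."""
    for name, func, params in steps:
        data = func(data, **params)
        log.record(name, params)
    return data


# Usage with toy stand-ins for real modules (a filter, then a summary step)
def min_photons(values, threshold):
    return [v for v in values if v >= threshold]


def count_only(values):
    return len(values)


log = WorkflowLog()
steps = [("filter", min_photons, {"threshold": 300}), ("count", count_only, {})]
result = run_pipeline([120, 480, 950], steps, log)
print(result)
print(log.to_json())
```

Logging only after a step returns means a failed step never appears in the log, so the log always describes a sequence that actually ran.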

🧰 Architecture Quality of Life

  • Layer System — Built a management tree for multiple data sources (CSV, TIFF, Cluster, Mask) with visibility toggles and per-layer properties.
  • Column Mapping — Implemented auto-detection for non-standard CSV column names from ThunderSTORM and rapidSTORM outputs.
  • Multi-Channel TIFF Handling — Coded routines to load dual-channel side-by-side TIFFs and auto-split them into separate rendering layers.
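Column auto-detection can be approached by normalising each header (lower-casing it, dropping bracketed units and punctuation) and looking it up in an alias table. In the sketch below the alias table is a hypothetical example. It covers ThunderSTORM-style headers such as `x [nm]` and `intensity [photon]`; other exporters such as rapidSTORM would need their own entries.

```python
import re

# Hypothetical alias table: canonical column -> header variants (matched after normalisation).
ALIASES = {
    "x": ["x", "x_nm", "position x"],
    "y": ["y", "y_nm", "position y"],
    "z": ["z", "z_nm", "position z"],
    "frame": ["frame", "frame_number"],
    "photons": ["intensity", "photons", "photon_count"],
    "background": ["offset", "background", "bg"],
    "background_std": ["bkgstd"],
    "sigma": ["sigma", "psf_sigma"],
    "precision": ["uncertainty", "uncertainty_xy", "precision"],
}


def normalise(header):
    """Lower-case, drop bracketed units such as '[nm]' or '(px)', then drop non-alphanumerics."""
    header = re.sub(r"[\[(].*?[\])]", "", header.lower())
    return re.sub(r"[^a-z0-9]", "", header)


LOOKUP = {normalise(alias): canon for canon, aliases in ALIASES.items() for alias in aliases}


def map_columns(headers):
    """Map recognised headers to canonical names; unrecognised headers are left out."""
    return {h: LOOKUP[normalise(h)] for h in headers if normalise(h) in LOOKUP}


# Usage: ThunderSTORM-style headers
print(map_columns(["id", "frame", "x [nm]", "y [nm]", "intensity [photon]", "bkgstd [photon]"]))
```

Normalising both the aliases and the incoming headers with the same function means new variants only need adding once, in whatever spelling the exporter uses.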

Automation Orchestration

1. Data Ingestion

Raw TIFF stacks or CSV arrays are ingested via drag-and-drop, triggering auto-detection of column mapping and camera presets.

2. Pipeline Execution

The wizard orchestrates the full pipeline run, or modules can be engaged individually for granular algorithmic control.

3. Deterministic Export

Outputs are compiled into publication-ready PDF reports, CSV statistics, high-resolution renders, and JSON workflow logs.

Technical Specifications

Platform: Windows 10 / 11 (standalone compiled executable)
Core Language: Python 3.10+
Acceleration Stack: PyVista/VTK for 3D, CuPy for GPU-accelerated FFT
Data Throughput: Validated with 10M+ localisations and 4K×4K TIFF stacks
Ingest Formats: CSV (any delimiter), TIFF (8/16/32-bit, multi-frame)
Core Libraries: NumPy, SciPy, Pandas, Matplotlib, scikit-learn, scikit-image, ttkbootstrap
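Accepting CSV files with any delimiter can be handled with the standard library's `csv.Sniffer`, which infers the dialect from a sample of the file. A minimal sketch follows; the function name is hypothetical, and restricting the candidate delimiters to comma, semicolon, tab and pipe is an assumption intended to make detection more robust.

```python
import csv
import io


def read_localisation_csv(text, sample_size=4096):
    """Parse localisation CSV text whose delimiter is not known in advance."""
    dialect = csv.Sniffer().sniff(text[:sample_size], delimiters=",;\t|")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


# Usage: the same table written with semicolons and with tabs
rows_semicolon = read_localisation_csv("x;y;frame\n1.5;2.0;1\n3.0;4.5;2\n")
rows_tab = read_localisation_csv("x\ty\tframe\n1.5\t2.0\t1\n3.0\t4.5\t2\n")
print(rows_semicolon[0], rows_tab[1])
```

`sniff` raises `csv.Error` when no consistent delimiter is found, which gives a natural point to fall back to asking the user.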
