Overview
Tempo is a 4D volumetric telepresence and chrono-spatial memory system built at TreeHacks 2026, where it was shortlisted for the Grand Prize. It captures navigable 3D point clouds from distributed Kinect v2 sensors and streams them in real time to desktop, web (Three.js), and HoloLens viewers — letting you walk around a live scene or scrub back through a 900-frame ring buffer replay.
Architecture
The system is split across three layers:
Capture clients (C++) — One process per Kinect v2 sensor. Each client encodes depth + color into a 9-byte-per-vertex quantized frame (optional ZSTD compression) and streams it over a framed TCP protocol to the orchestrator.
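For concreteness, here is a minimal C++ sketch of how a capture client might pack one vertex into the 9-byte wire format and wrap a batch of vertices in a length-prefixed TCP frame. The struct layout, field names, and single-uint32 length prefix are illustrative assumptions; the actual protocol likely carries additional header fields (sensor id, timestamp, compression flag).

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

#pragma pack(push, 1)
struct PackedVertex {          // 9 bytes per vertex on the wire
    int16_t x_mm, y_mm, z_mm;  // XYZ quantized to millimetres (Int16)
    uint8_t r, g, b;           // color (Uint8)
};
#pragma pack(pop)
static_assert(sizeof(PackedVertex) == 9, "vertex must be 9 bytes on the wire");

// Quantize a metric point (metres, float) plus color into the 9-byte form.
// Clamping is omitted for brevity; Kinect v2 range fits comfortably in Int16 mm.
inline PackedVertex packVertex(float x, float y, float z,
                               uint8_t r, uint8_t g, uint8_t b) {
    return PackedVertex{
        static_cast<int16_t>(x * 1000.0f),  // m -> mm
        static_cast<int16_t>(y * 1000.0f),
        static_cast<int16_t>(z * 1000.0f),
        r, g, b};
}

// Build one length-prefixed frame: [uint32 payload length][payload bytes].
std::vector<uint8_t> frameMessage(const std::vector<PackedVertex>& verts) {
    const uint32_t payloadLen =
        static_cast<uint32_t>(verts.size() * sizeof(PackedVertex));
    std::vector<uint8_t> out(sizeof(uint32_t) + payloadLen);
    std::memcpy(out.data(), &payloadLen, sizeof(payloadLen));           // length prefix
    std::memcpy(out.data() + sizeof(payloadLen), verts.data(), payloadLen);
    return out;
}
```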
Orchestrator (C#) — Receives frames from all clients, applies OpenCV solvePnP extrinsic calibration to align sensors into a shared world frame, and optionally runs C++ ICP refinement for tighter registration. Fused frames are broadcast to connected viewers via WebSocket.
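Below is a minimal sketch of the extrinsic calibration step: estimate one sensor's pose with solvePnP, then invert it into a sensor-to-world transform that fusion can apply to that sensor's points. It is written in C++ against OpenCV's C++ API for consistency with the capture clients, although the real orchestrator is C#; the function name and the source of the 3D-2D correspondences (e.g. a calibration target) are assumptions.

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Given known 3D world points and their observed 2D pixel locations in the
// sensor's color camera, return a 4x4 sensor-to-world transform.
cv::Matx44d estimateSensorToWorld(const std::vector<cv::Point3f>& worldPoints,
                                  const std::vector<cv::Point2f>& imagePoints,
                                  const cv::Mat& cameraMatrix,
                                  const cv::Mat& distCoeffs) {
    cv::Mat rvec, tvec;
    cv::solvePnP(worldPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);

    cv::Mat R;
    cv::Rodrigues(rvec, R);  // rotation vector -> 3x3 rotation matrix

    // solvePnP yields world -> camera (X_cam = R*X_world + t); invert it:
    // X_world = R^T * X_cam - R^T * t
    cv::Mat Rt = R.t();
    cv::Mat t = -Rt * tvec;

    cv::Matx44d T = cv::Matx44d::eye();
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) T(r, c) = Rt.at<double>(r, c);
        T(r, 3) = t.at<double>(r);
    }
    return T;
}
```

Once each sensor has such a transform, every incoming frame from that sensor can be mapped into the shared world frame before the optional ICP pass tightens the alignment.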
Viewers — A Three.js web client renders the live point cloud with orbit controls and a scrub bar for the replay buffer. A HoloLens client overlays the capture in AR. Multi-user annotations (point, circle, freehand) are synced across all connected sessions and stored in the .lssnap binary format.
Technical Details
- Protocol: custom framed TCP (length-prefixed) between C++ clients and C# orchestrator; WebSocket fan-out to viewers
- Calibration: OpenCV solvePnP for extrinsics; optional C++ ICP refinement per fused frame
- Compression: 9 bytes/vertex (3 × Int16 XYZ quantized to mm, 3 × Uint8 RGB); optional ZSTD for bandwidth-constrained links
- Replay: 900-frame ring buffer in the orchestrator with per-frame timestamps; viewers can scrub the timeline (see the sketch after this list)
- Annotations: serialized as JSON inside .lssnap files alongside the point cloud geometry
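As referenced in the Replay bullet above, here is a minimal sketch of a fixed-capacity ring buffer with per-frame timestamps that a viewer's scrub bar can index into. The real orchestrator implements this in C#; the class and field names here are illustrative.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct FusedFrame {
    int64_t timestampMs;           // capture time, used for timeline scrubbing
    std::vector<uint8_t> payload;  // fused, quantized point-cloud bytes
};

class ReplayBuffer {
public:
    explicit ReplayBuffer(size_t capacity = 900) : slots_(capacity) {}

    // Overwrites the oldest frame once the buffer is full.
    void push(FusedFrame frame) {
        slots_[head_] = std::move(frame);
        head_ = (head_ + 1) % slots_.size();
        if (count_ < slots_.size()) ++count_;
    }

    // index 0 = oldest buffered frame, count()-1 = newest; a scrub bar maps
    // its position onto this range.
    std::optional<FusedFrame> at(size_t index) const {
        if (index >= count_) return std::nullopt;
        size_t oldest = (head_ + slots_.size() - count_) % slots_.size();
        return slots_[(oldest + index) % slots_.size()];
    }

    size_t count() const { return count_; }

private:
    std::vector<FusedFrame> slots_;
    size_t head_ = 0;   // next write position
    size_t count_ = 0;  // number of valid frames (<= capacity)
};
```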