Implement Merkle Search Tree (MST) data structure #19

Closed
opened 2026-04-12 17:28:37 +00:00 by Grandiras · 0 comments
Owner

Summary

The AT Protocol spec defines a Merkle Search Tree (MST) data structure as the core of repository storage. ATProto.NET can read CAR files and fetch repos, but lacks a proper MST implementation for creating, mutating, and verifying repository trees locally.

Spec Reference

From the Repository specification:

Required functionality

  1. MST Construction - Build MST from key/value pairs (path → CID)
  2. MST Mutation - Insert, update, delete entries with deterministic tree rebalancing
  3. Key Depth Calculation - SHA-256 hash of key, count leading zeros in 2-bit chunks (fanout of 4)
  4. Key Prefix Compression - Compact node encoding with shared prefix elision
  5. MST Verification - Validate tree structure, node integrity, key ordering
  6. Diff Generation - Generate "CAR slices" showing changes between two revisions
  7. Commit Object Handling - Create and sign commit objects (version 3 format)
  8. Repository Path Handling - <collection>/<rkey> path format

MST Node Schema (CBOR)

  • l (CID, nullable): left subtree link
  • e (array): ordered TreeEntry objects
    • p (int): prefix length shared with previous entry
    • k (bytes): key suffix after prefix removal
    • v (CID): link to record data
    • t (CID, nullable): right subtree link

Why this matters

  • Required for PDS implementation (issue #2)
  • Enables local repository verification
  • Enables offline repo construction/manipulation
  • Required for firehose commit verification ("inductive firehose" in Sync v1.1)
  • Needed for proper CAR file export generation
## Summary The AT Protocol spec defines a Merkle Search Tree (MST) data structure as the core of repository storage. ATProto.NET can read CAR files and fetch repos, but lacks a proper MST implementation for creating, mutating, and verifying repository trees locally. ## Spec Reference From the [Repository specification](https://atproto.com/specs/repository#mst-structure): ### Required functionality 1. **MST Construction** - Build MST from key/value pairs (path → CID) 2. **MST Mutation** - Insert, update, delete entries with deterministic tree rebalancing 3. **Key Depth Calculation** - SHA-256 hash of key, count leading zeros in 2-bit chunks (fanout of 4) 4. **Key Prefix Compression** - Compact node encoding with shared prefix elision 5. **MST Verification** - Validate tree structure, node integrity, key ordering 6. **Diff Generation** - Generate "CAR slices" showing changes between two revisions 7. **Commit Object Handling** - Create and sign commit objects (version 3 format) 8. **Repository Path Handling** - `<collection>/<rkey>` path format ### MST Node Schema (CBOR) - `l` (CID, nullable): left subtree link - `e` (array): ordered TreeEntry objects - `p` (int): prefix length shared with previous entry - `k` (bytes): key suffix after prefix removal - `v` (CID): link to record data - `t` (CID, nullable): right subtree link ### Why this matters - Required for PDS implementation (issue #2) - Enables local repository verification - Enables offline repo construction/manipulation - Required for firehose commit verification ("inductive firehose" in Sync v1.1) - Needed for proper CAR file export generation
Sign in to join this conversation.
No description provided.