FERRUM DB.
Engineering a high-performance, log-structured embedded storage engine from the ground up to solve the latency-security trade-off.
System Architecture
Incoming SET/GET requests are processed via NAPI-RS (Node) or PyO3 (Python) bindings.
AES-256-GCM encryption with unique IV generation per entry.
The Architecture
Inspired by the Bitcask model, FerrumDB uses a log-structured hash table. All writes are appended to an append-only data file, and existing records are never modified in place. An in-memory 'KeyDir' maps each key to the offset and size of its latest value, so a read costs one O(1) hash lookup plus a single positioned disk read. Because writes are sequential and reads need at most one seek, the design suits write-heavy embedded workloads.
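The append-plus-index idea can be sketched in a few lines of std-only Rust. This is a minimal illustration, not FerrumDB's actual code: the names `MiniStore` and `Entry` are hypothetical, only raw values are written to the file, and there is no record header, checksum, encryption, or index recovery on restart.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Where a value lives inside the data file (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Entry {
    pub offset: u64,
    pub size: u32,
}

/// One append-only data file plus an in-memory index (the "KeyDir").
pub struct MiniStore {
    file: File,
    end: u64, // current length of the data file
    key_dir: HashMap<Vec<u8>, Entry>,
}

impl MiniStore {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .append(true)
            .create(true)
            .open(path)?;
        let end = file.metadata()?.len();
        // A real engine would rebuild the index by scanning the log here.
        Ok(Self { file, end, key_dir: HashMap::new() })
    }

    /// Append the value and point the key at the new location.
    /// An overwritten value stays in the file; only the index entry moves.
    pub fn set(&mut self, key: &[u8], value: &[u8]) -> io::Result<()> {
        self.file.write_all(value)?;
        let entry = Entry { offset: self.end, size: value.len() as u32 };
        self.end += value.len() as u64;
        self.key_dir.insert(key.to_vec(), entry);
        Ok(())
    }

    /// One hash lookup, then one seek and one read of exactly `size` bytes.
    pub fn get(&mut self, key: &[u8]) -> io::Result<Option<Vec<u8>>> {
        let entry = match self.key_dir.get(key) {
            Some(e) => *e,
            None => return Ok(None),
        };
        let mut buf = vec![0u8; entry.size as usize];
        self.file.seek(SeekFrom::Start(entry.offset))?;
        self.file.read_exact(&mut buf)?;
        Ok(Some(buf))
    }
}
```

Because the file is opened in append mode, the seek performed by `get` does not disturb later writes, which always land at the end of the file. Stale values accumulate until some form of compaction rewrites the log, which this sketch leaves out.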
Security by Design
Implemented AES-256-GCM authenticated encryption at the storage layer. Unlike unauthenticated encryption modes, GCM is an AEAD scheme that provides both confidentiality and integrity, so tampering with encrypted data at rest is detected. Every entry is stored with an authentication tag that is verified on read.
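To make "unique IV per entry" and "tag verified on read" concrete, here is a std-only sketch of one common way to derive nonces and frame an encrypted record. Everything in it is an assumption for illustration: the prefix-plus-counter nonce scheme, the `nonce || ciphertext || tag` layout, and the names `NonceSequence`, `Record`, and `split_record` are hypothetical, and the encryption itself (which needs an AEAD library) is deliberately left out.

```rust
pub const NONCE_LEN: usize = 12; // 96-bit nonce, the size GCM is designed around
pub const TAG_LEN: usize = 16; // 128-bit authentication tag

/// Yields a distinct nonce per entry: 4-byte fixed prefix + 8-byte counter.
/// Assumption: the prefix is re-randomized (or the counter persisted) across
/// restarts so that a nonce is never reused under the same key.
pub struct NonceSequence {
    prefix: [u8; 4],
    counter: u64,
}

impl NonceSequence {
    pub fn new(prefix: [u8; 4]) -> Self {
        Self { prefix, counter: 0 }
    }

    /// Returns `None` when the counter cannot advance, instead of repeating.
    pub fn next_nonce(&mut self) -> Option<[u8; NONCE_LEN]> {
        let mut nonce = [0u8; NONCE_LEN];
        nonce[..4].copy_from_slice(&self.prefix);
        nonce[4..].copy_from_slice(&self.counter.to_be_bytes());
        self.counter = self.counter.checked_add(1)?;
        Some(nonce)
    }
}

/// Borrowed view of one stored record: nonce || ciphertext || tag.
pub struct Record<'a> {
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
    pub tag: &'a [u8],
}

/// Splits a stored record into its parts without copying.
/// Returns `None` if the record is too short to hold a nonce and a tag.
pub fn split_record(bytes: &[u8]) -> Option<Record<'_>> {
    if bytes.len() < NONCE_LEN + TAG_LEN {
        return None;
    }
    let (nonce, rest) = bytes.split_at(NONCE_LEN);
    let (ciphertext, tag) = rest.split_at(rest.len() - TAG_LEN);
    Some(Record { nonce, ciphertext, tag })
}
```

On read, the three parts would be handed to the AEAD library's decrypt call, which recomputes the tag and fails if any byte of the nonce, ciphertext, or tag was altered. The counter-based scheme is one option; random 96-bit nonces are another, with different limits on how many entries can safely share a key.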
Ferrum Studio
Built an embedded web dashboard (Axum) for real-time observability. Ferrum Studio provides a visual REPL, key-space visualization, and live metrics (ops/sec, throughput, memory usage). It is served directly from the FerrumDB binary and requires no external processes.
The Hard Truths (Learnings)
Concurrency is hard. Managing cross-language memory safety between Rust and Python via PyO3 requires strict ownership rules. Over-abstraction degrades performance; I learned to favor zero-copy references over copying buffers wherever possible.
Async Disk I/O. Using Tokio for file operations is efficient, but kernel synchronization (fsync) can become a bottleneck. I discovered that batching writes into segments significantly improves throughput by reducing how often fsync and other syscalls are issued.
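The batching idea can be sketched as follows. To keep the example dependency-free it uses blocking `std::fs` rather than Tokio, and `BatchWriter`, `max_batch_bytes`, and the `syncs` counter are hypothetical names for illustration, not FerrumDB's API.

```rust
use std::fs::File;
use std::io::{self, Write};

/// Buffers records in memory and issues one write plus one sync per batch.
pub struct BatchWriter {
    file: File,
    buf: Vec<u8>,
    max_batch_bytes: usize,
    /// Number of sync calls issued so far (exposed to show the savings).
    pub syncs: u64,
}

impl BatchWriter {
    pub fn new(file: File, max_batch_bytes: usize) -> Self {
        Self {
            file,
            buf: Vec::with_capacity(max_batch_bytes),
            max_batch_bytes,
            syncs: 0,
        }
    }

    /// Queue a record; flush only once the batch threshold is reached.
    pub fn append(&mut self, record: &[u8]) -> io::Result<()> {
        self.buf.extend_from_slice(record);
        if self.buf.len() >= self.max_batch_bytes {
            self.flush()?;
        }
        Ok(())
    }

    /// Write the whole batch with one call, then sync once.
    pub fn flush(&mut self) -> io::Result<()> {
        if self.buf.is_empty() {
            return Ok(());
        }
        self.file.write_all(&self.buf)?;
        self.file.sync_data()?; // one sync for the batch, not one per record
        self.buf.clear();
        self.syncs += 1;
        Ok(())
    }
}
```

With a 64-byte threshold, ten 16-byte records trigger two syncs during `append` and a third on the final `flush`, instead of ten. The trade-off is durability: records still sitting in the buffer are lost if the process crashes before the next flush, so the batch size bounds how much recent data is at risk.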