A list of recent open-source research systems developed by the research group of Professor David G. Andersen and Dr. Michael Kaminsky in the Carnegie Mellon University Computer Science Department.
Memory efficient and cache-optimized algorithm for simultaneously searching for a large number of patterns in a very large corpus [SIAM ALENEX 2011]
Rank & select structures that use a cache-centric design approach to achieve performance competitive with the highest-performance prior designs while imposing space overhead as low as the most space-efficient, but slower, prior designs [SEA 2013]
Data structure that can replace Bloom filters for approximate set membership tests, supporting dynamic item addition and removal items while achieving even higher performance than Bloom filters [CoNEXT 2014]
High-throughput and memory-efficient concurrent hash table that supports multiple readers and writers [EuroSys 2014]
Fast and compact data structure for approximate membership tests that supports both single-key lookups and common range queries [SIGMOD 2018]
Live interactive demo (source code)
Fast dictionary-based compressor that encodes arbitrary keys while preserving their order [SIGMOD 2020]
Distributed consensus algorithm based on Paxos that efficiently achieves optimal commit latency in the wide-area when tolerating one and two failures, uniform load balancing across all replicas, and graceful performance degradation when replicas are slow or crash [SOSP 2013]
Technique that allows Paxos-based systems to perform reads with high throughput and low latency without sacrificing consistency or write latency or accepting more than minimal impact on system availability [CMU PDL 2014]
Benchmark demonstrating that cache partitioning can protect the performance of latency-sensitive networked applications from local resource contention [CMU CSD 2017]
Benchmark demonstrating a likely explanation for contemporary serverless platforms' high microservice invocation latencies in the hundreds of milliseconds for cold starts and tens of milliseconds for warm starts [ATC 2018]
Program primitives for limiting function execution time and isolating memory within a process
Mechanism for synchronously performing a function call with a precise timeout that is lightweight, efficient, and composable, all while being portable between programming languages [ATC 2020]
libturquoise (preemptive tokio-threadpool)
Set of architecturally and workload inspired algorithmic and engineering improvements to the popular Memcached system that substantially improve both its memory efficiency and throughput [NSDI 2013]
Key-value system designed to make the best use of an RDMA network by reducing network round trips while using efficient RDMA primitives [SIGCOMM 14]
Scalable in-memory key-value store that achieves a record-setting throughput of 120 million requests per second (MRPS) on a single commodity server, 9.2X the performance (RPS) and 2.8X the system energy efficiency (RPS/watt) of the best-published FPGA-based claims [NSDI 2014, ISCA 2015]
Version presented in NSDI paper
Version presented in ISCA paper
Updated implementation (MICA 2)
Single-node multi-core in-memory transactional database with serializability that provides high performance under diverse workloads by leveraging optimistic and multi-version concurrency control schemes and multiple loosely synchronized clocks [SIGMOD 2017]
Systems design proposals for consistent, durable, and safe memory management for future byte-addressable non-volatile (NV) memory [CMU PDL 2011]
New analytic primitives and MSLS design models that quickly give accurate performance estimates for evaluation of systems such as LevelDB, RocksDB, HBase, and Cassandra [FAST 2016]
Software-based Ethernet switch design built around a memory-efficient, high-performance, and highly-concurrent hash table for compact and fast FIB lookup [CoNEXT 2013]
Exploration of a hypothesis that many of the benefits of using Graphics Processing Units (GPUs) as accelerators for software-based routing and packet handling applications arise less from the GPU hardware itself as from the expression of the problem in a language such as CUDA or OpenCL that facilitates memory latency hiding and vectorization through massive concurrency [NSDI 2015]
Framework for understanding RDMA performance and helping system designers to navigate the RDMA design space [ATC 2016]
RDMA-based system that provides distributed in-memory transactions with serializability and durability, yet eschews one-sided RDMA for fast RPCs using two-sided unreliable datagrams [OSDI 2016]