Dragonfly: A Fast and Efficient In-Memory Data Store


Are you looking for a way to speed up your applications, reduce your infrastructure costs, and simplify your data management? If so, you might want to consider using Dragonfly, a fast and efficient in-memory data store that is compatible with Redis and Memcached APIs.

What is Dragonfly?

Dragonfly is a new in-memory data store that is designed to handle modern cloud workloads and provide high performance, scalability, efficiency, and simplicity. It is a drop-in replacement for Redis and Memcached, meaning you can migrate your existing workloads without touching your code. It also ships as a single binary that can be deployed via Docker, Kubernetes, or anywhere else you choose.

What are the features of Dragonfly?

Dragonfly has many features that make it a fast and efficient in-memory data store for your applications. In this section, we will explore some of the main features of Dragonfly and what they can do for you.

Redis/Memcached API-compatible

This means you can use Dragonfly as a drop-in replacement for Redis or Memcached without changing your code or infrastructure: the commands, data types, and wire protocols you already know work unchanged, and you get Dragonfly's performance, scalability, efficiency, and simplicity on top. Existing client libraries and tools, such as redis-cli, memcached-tool, and redis-py, also work as-is.
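To make "same protocol" concrete, here is the standard RESP encoding that Redis clients use on the wire, and that an API-compatible server like Dragonfly accepts. The helper below is purely illustrative, not part of any client library:

```python
def encode_resp(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings,
    the wire format shared by Redis-compatible servers."""
    out = [f"*{len(parts)}\r\n".encode()]
    for p in parts:
        data = p.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)

# The same bytes a redis-cli or redis-py client would send:
print(encode_resp("SET", "greeting", "hello"))
```

Because the protocol is identical, pointing an existing client at a Dragonfly host and port is all the "migration" that is required.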

Single binary

You can deploy Dragonfly easily and quickly, without any dependencies or installation steps. You can download the binary from the website or use Docker, Kubernetes, or any other platform that supports binary deployment. You can also use the Dragonfly dashboard to monitor and manage your instances.

Native monitoring

You can also monitor your Dragonfly instances natively with Prometheus, without any agent or extra configuration required, and use OpenTelemetry to trace and monitor them with any backend of your choice.

[Image: the Dragonfly dashboard]

Simple vertical scaling

You can scale your Dragonfly instances vertically up to 1 TB of memory on each instance, without the need for resharding or clustering. You can also migrate your existing Redis or Memcached workloads to Dragonfly without changing your code, as it is fully compatible with their APIs.

Non-contending, multi-threaded processing

Dragonfly can handle millions of queries per second per instance with consistent sub-millisecond latency, and can achieve higher cache hit rates than Redis, which in turn better protects your primary database. This is because Dragonfly uses a non-contending, multi-threaded processing model that eliminates lock contention and maximizes CPU utilization, together with a caching algorithm called LRFU (Least Recently/Frequently Used) that balances recency and frequency of data access to optimize hit rates.
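The core idea behind non-contending processing is to hash-partition the keyspace so that each shard is owned by exactly one thread, leaving no shared state to lock. The toy class below sketches only that routing idea; it is not Dragonfly's actual code, and a real implementation would pin each shard to its own thread or core:

```python
import hashlib

class ShardedStore:
    """Toy sketch of a shared-nothing layout: keys are hash-partitioned
    across N shards, and each shard would be owned by exactly one thread,
    so the hot path needs no locks. (Illustrative only.)"""

    def __init__(self, num_shards: int = 4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, key: str) -> dict:
        # Stable hash so the same key always routes to the same shard.
        h = int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")
        return self.shards[h % len(self.shards)]

    def set(self, key: str, value) -> None:
        self._shard(key)[key] = value

    def get(self, key: str):
        return self._shard(key).get(key)

store = ShardedStore()
store.set("user:1", "alice")
print(store.get("user:1"))  # -> alice
```

Because a key's shard never changes, two threads never touch the same dictionary, which is what removes the contention that a single shared keyspace would create.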

Unique data structures and algorithms

Dragonfly uses unique data structures and algorithms, such as dashtable and denseSet, to store and compress data efficiently, achieving 30-60% better memory utilization than Redis. Tasks can also run asynchronously, freeing CPU cycles. Its parallel snapshotting algorithm reduces memory usage during backups and completes up to 30X faster than Redis.

Dashtable

Dashtable is a data structure that combines the advantages of hash tables and arrays. It allows constant-time access, insertion, and deletion of key-value pairs, along with efficient iteration. It also supports variable-length keys and values and can store any data type.

Dashtable is the default data structure for storing strings, hashes, lists, sets, and sorted sets in Dragonfly. It uses a novel hashing algorithm that minimizes collisions and memory fragmentation. It also uses a dynamic array that grows and shrinks automatically, without wasting memory or causing performance degradation.

Dashtable has several benefits over the traditional hash table used by Redis:

  • It can store any data type as values, not just strings. This means that you can store complex objects such as JSON documents or images in Dashtable without serialization or deserialization overhead.

  • It can store variable-length keys and values, without padding or truncating them. This means that you can use any key size or value size in Dashtable without wasting memory or losing information.

  • It can iterate over all key-value pairs efficiently, without scanning empty buckets across the entire hash table. This means that you can perform operations such as SCAN, KEYS, or HGETALL in Dashtable much faster than in Redis.

  • It can resize itself automatically based on the load factor, without blocking or copying the data. This means that you can add or remove key-value pairs in Dashtable without affecting the performance or memory usage.
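The incremental-resize property in the last bullet comes from segmenting the table in the style of extendible hashing: when one small segment overflows, only that segment is split, and the rest of the table is untouched. The following is a toy model of that idea, not Dragonfly's implementation (the real dashtable is a far more engineered C++ structure):

```python
class _Segment:
    def __init__(self, depth: int):
        self.depth = depth      # local depth: hash-prefix bits this segment owns
        self.items = {}

class DashSketch:
    """Toy extendible-hashing table in the spirit of dashtable: a directory
    maps hash prefixes to small segments, and an overflowing segment is
    split on its own, without rehashing the whole table. (Illustrative only.)"""

    SEGMENT_CAP = 4  # tiny, to force splits in a demo

    def __init__(self):
        self.global_depth = 0
        self.directory = [_Segment(0)]

    def _slot(self, key) -> int:
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def set(self, key, value):
        seg = self.directory[self._slot(key)]
        seg.items[key] = value
        while len(seg.items) > self.SEGMENT_CAP:
            self._split(seg)
            seg = self.directory[self._slot(key)]

    def _split(self, seg):
        if seg.depth == self.global_depth:
            # Double the directory: a cheap pointer copy, segments untouched.
            self.directory += self.directory
            self.global_depth += 1
        seg.depth += 1
        sibling = _Segment(seg.depth)
        bit = 1 << (seg.depth - 1)
        # Repoint half of the directory slots that referenced this segment.
        for i, s in enumerate(self.directory):
            if s is seg and (i & bit):
                self.directory[i] = sibling
        # Rehash only this one segment's items between itself and its sibling.
        old_items, seg.items = seg.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v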

DenseSet

DenseSet is a data structure that stores integers in a compact and efficient way. It uses a bit array to represent the presence or absence of integers in a range. It also uses a prefix tree to index the bit array segments and support fast operations such as union, intersection, difference, and membership testing.

DenseSet is used for storing bitmaps and hyperloglogs in Dragonfly. It can achieve up to 10X better compression than Redis for sparse bitmaps. It also supports fast cardinality estimation and probabilistic counting with hyperloglogs.

DenseSet has several benefits over the traditional bitmap used by Redis:

  • It can store integers in any range, not just from 0 to 2^32 - 1. This means that you can use any integer value in DenseSet without worrying about the range or overflow.

  • It can compress the bit array segments based on the density of the integers. This means that you can use less memory for storing sparse bitmaps in DenseSet than in Redis.

  • It can index the bit array segments using a prefix tree, which allows for fast and easy access to any segment. This means that you can perform operations such as union, intersection, difference, or membership testing in DenseSet much faster than in Redis.

  • It can estimate the cardinality of the integers using hyperloglogs, which are probabilistic data structures that use a small amount of memory. This means that you can count the number of unique integers in DenseSet without scanning the entire bit array.
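The compression behavior described above can be sketched with a chunked bitmap: integers are grouped into fixed-size chunks and only non-empty chunks are stored, so a sparse set over a huge range stays small. This is an illustration of that idea only, not Dragonfly's DenseSet code:

```python
class SparseBitSet:
    """Toy chunked bitmap: only non-empty chunks are materialized,
    so sparse integer sets cost memory proportional to the chunks
    actually used, not to the full range. (Illustrative only.)"""

    CHUNK = 1 << 16  # integers per chunk

    def __init__(self):
        self.chunks = {}  # chunk index -> bitmask (Python int as a bit array)

    def add(self, n: int) -> None:
        idx, off = divmod(n, self.CHUNK)
        self.chunks[idx] = self.chunks.get(idx, 0) | (1 << off)

    def __contains__(self, n: int) -> bool:
        idx, off = divmod(n, self.CHUNK)
        return bool(self.chunks.get(idx, 0) >> off & 1)

    def union(self, other: "SparseBitSet") -> "SparseBitSet":
        # Set operations work chunk-by-chunk, skipping empty chunks entirely.
        out = SparseBitSet()
        for idx in self.chunks.keys() | other.chunks.keys():
            out.chunks[idx] = self.chunks.get(idx, 0) | other.chunks.get(idx, 0)
        return out

s = SparseBitSet()
s.add(7)
s.add(10**12)          # a huge value costs only one extra chunk
print(7 in s, 8 in s)  # -> True False
```

Note how adding 10**12 allocates one chunk rather than a bit array spanning the whole range; that is the essence of the sparse-bitmap savings claimed above.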

LRFU

LRFU is a caching algorithm that balances recency and frequency of data access to optimize cache hit rates. It assigns a weight to each item in the cache, based on how recently and frequently it was accessed. It then evicts the item with the lowest weight when the cache is full.

LRFU is used for managing the memory usage of Dragonfly. It allows Dragonfly to keep the most relevant data in memory and reduce the load on the primary database. It also adapts to different workloads and patterns, such as temporal locality, spatial locality, Zipfian distribution, etc.

[Image: memory usage chart]

LRFU has several benefits over the traditional LRU (Least Recently Used) algorithm used by Redis:

  • It considers both recency and frequency of data access, not just recency. This means that it can keep both hot and cold data in memory, depending on their popularity and importance.

  • It assigns a weight to each item based on a decay function, not just a timestamp. This means that it can adjust the weight of each item dynamically, based on its access history and current status.

  • It evicts the item with the lowest weight, not just the oldest item. This means that it can evict the least relevant item from the cache, regardless of its age or position.
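The weight-with-decay idea in these bullets can be sketched with the classic LRFU scoring rule: each access adds 1 to an item's score, and the old score decays by a factor of (1/2)^(lambda * elapsed time). A decay parameter `lam` near 0 behaves like LFU, near 1 like LRU. This is a toy model of the algorithm family, not Dragonfly's eviction code:

```python
class LRFUCache:
    """Toy LRFU cache: each item carries a combined recency/frequency
    score that decays over logical time; the lowest-scoring item is
    evicted when the cache is full. (Illustrative only.)"""

    def __init__(self, capacity: int, lam: float = 0.5):
        self.capacity = capacity
        self.lam = lam
        self.clock = 0
        self.data = {}     # key -> value
        self.score = {}    # key -> (score, last_access_time)

    def _decay(self, dt: int) -> float:
        return 0.5 ** (self.lam * dt)

    def _touch(self, key) -> None:
        # New access contributes 1; the old score decays with elapsed time.
        crf, last = self.score.get(key, (0.0, self.clock))
        self.score[key] = (1.0 + crf * self._decay(self.clock - last), self.clock)

    def get(self, key):
        self.clock += 1
        if key in self.data:
            self._touch(key)
            return self.data[key]
        return None

    def put(self, key, value) -> None:
        self.clock += 1
        if key not in self.data and len(self.data) >= self.capacity:
            def current(k):
                crf, last = self.score[k]
                return crf * self._decay(self.clock - last)
            victim = min(self.data, key=current)
            del self.data[victim], self.score[victim]
        self.data[key] = value
        self._touch(key)
```

With capacity 2, repeatedly reading one key keeps its score high, so inserting a third key evicts the rarely-read one rather than the oldest one; that is exactly the behavior plain LRU cannot express.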

Parallel snapshotting

Parallel snapshotting is a feature that allows Dragonfly to take consistent backups of its data without blocking or slowing down the operations. It uses a parallel algorithm that divides the data into segments and writes them to disk concurrently. It also uses copy-on-write semantics to avoid locking or copying the data during snapshotting.

Parallel snapshotting is used for persisting and restoring the data of Dragonfly. It can reduce memory usage and complete up to 30X faster than Redis during snapshotting. It also supports incremental backups and point-in-time recovery.

Parallel snapshotting has several benefits over the traditional fork-and-copy method used by Redis:

  • It does not fork a child process to perform snapshotting, which can consume a lot of memory and CPU resources. This means that it does not affect the performance or availability of Dragonfly during snapshotting.

  • It does not copy the entire data set to disk, which can take a long time and cause disk I/O bottlenecks. This means that it can complete snapshotting much faster and more predictably than Redis.

  • It can ensure data consistency and durability across snapshots. This means that it can recover from data loss or corruption events with minimal data loss.
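The divide-and-serialize step described above can be modeled in a few lines: split the keyspace into segments, serialize each segment concurrently, and rebuild the dataset from the segment blobs on restore. This sketch deliberately omits the copy-on-write versioning a real engine needs for keys mutated mid-snapshot, and it is not Dragonfly's implementation:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def snapshot(data: dict, num_segments: int = 4) -> list:
    """Partition the keyspace and serialize the segments concurrently,
    instead of forking a child process and copying the whole dataset."""
    segments = [dict() for _ in range(num_segments)]
    for k, v in data.items():
        segments[hash(k) % num_segments][k] = v
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        return list(pool.map(
            lambda seg: json.dumps(seg, sort_keys=True).encode(), segments))

def restore(blobs: list) -> dict:
    """Rebuild the dataset by merging the independently written segments."""
    out = {}
    for blob in blobs:
        out.update(json.loads(blob))
    return out
```

Because each segment is written independently, the writers never wait on one another, which is where the speedup over a single sequential dump comes from.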

Conclusion

Dragonfly is a fast and efficient in-memory data store that is compatible with the Redis and Memcached APIs. It offers high performance, scalability, efficiency, and simplicity for your applications. You can use it as a drop-in replacement for Redis or Memcached, without changing your code or infrastructure, and benefit from its unique features, such as Dashtable, DenseSet, LRFU caching, and parallel snapshotting.

If you are interested in trying out Dragonfly, you can check out the official Dragonfly website and documentation.