This blog post introduces the Haskey project, an ACID compliant embedded key-value store entirely written in Haskell, based on an MVCC B+-tree implementation. It was developed as part of the Summer of Haskell 2017. We will take a look at two libraries that we wrote this summer.
Before introducing the libraries, I must say I’m extremely grateful that I got the opportunity to be part of this year’s Summer of Haskell. I had really great fun working on this project for the last two months, even though they were incredibly challenging for me, as I had very little experience with the internals of database systems. This is entirely thanks to Steven Keuchel who not only helped me and guided me during the challenging moments, but also made sure we were able to bond outside of the project. Furthermore, I’d like to thank the people who organized SoH 2017, not only because of the opportunities they have created for me, but also because of all the work they have done. I especially want to thank Jasper, who helped me a lot with the necessary paperwork
Disclaimer: Haskey is not production ready yet. We are still actively making changes to the public API and the internals, as well as the binary format.
EDIT (9-Apr-2018): Haskey is now considered stable.
EDIT (9-Apr-2018): Note that this document gives a good overview on how the library is structured, but might use outdated APIs. See this up to date tutorial, which covers a lot of the content in this article, to get started with Haskey.
Haskey’s goal and design
The goal of Haskey is to deliver a space efficient, fast and scalable key-value store to the Haskell community, which can be embedded into Haskell projects. To achieve this, we modeled Haskey after LMDB. Our design is based on an MVCC B+-tree, which allows us to provide non-blocking concurrent read access, and serialized write access, where readers and writers do not block each other.
The past two months we have implemented this design, which resulted in two libraries, available on Hackage. The first is haskey-btree, which implements both a copy-in-place and copy-on-write B+-tree. The copy-on-write variant abstracts over an allocator, which is able to write and read B+tree nodes on demand. The second package is called haskey. This package implements such an allocator. It allocates and manages pages to write the nodes to, and it provides transaction support and concurrent access, as well as data integrity. haskey-btree modifies the tree, while haskey supports tree modifications.
The B+-tree implementation
The haskey-btree package includes two separate implementations of B+-trees. The first one can be found in the Data.BTree.Pure module, which contains a pure, standard, textbook implementation of an in-memory B+-tree of a certain minimum degree. It is a standalone implementation, not used anywhere else in the code base, but still a nice thing to have. More interesting, however, is the copy-on-write B+-tree implementation in Data.BTree.Impure.
If we look at some functions in Data.BTree.Impure that are used to modify and query the tree, we notice the following type signatures:
class Monad m => AllocReaderM m where readNode :: (Key key, Value val) => Height height -> NodeId height key val -> m (Node height key val) readOverflow :: Value val => OverflowId -> m val -- Not all functions included... class AllocReaderM m => AllocM m where allocNode :: (Key key, Value val) => Height height -> Node height key val -> m (NodeId height key val) freeNode :: Height height -> NodeId height key val -> m () allocOverflow :: Value val => val -> m OverflowId freeOverflow :: OverflowId -> m ()
We can see that the
AllocReaderM type class supports operations to read B+-tree nodes and overflow values with a certain ID, while the
AllocM type class supports operations to write and free B+-tree nodes and overflow values. How do we get such an allocator? Well, the Date.BTree.Alloc.Debug module contains an in-memory allocator based on Data.Map. It is very inefficient, and solely intended for testing and debugging purposes, but it provides a good example on how to start implementing an allocator from scratch.
import Data.BTree.Alloc.Debug (Pages, runDebugT, emptyPages) import Data.BTree.Impure (Tree, empty, insertTree) debugAllocatorExample :: (Tree ByteString ByteString, Pages) debugAllocatorExample = runDebugT emptyPages $ do return empty >>= insertTree "key1" "wrong value" >>= insertTree "key1" "correct value" >>= insertTree "key2" "other value"
In the code listing above you can see the debug allocator in action. For a more robust, space efficient and thread-safe allocator we have to turn to the haskey library.
Robust, thread-safe allocator
The haskey library builds upon the haskey-btree library by providing a robust, thread-safe page allocator with transaction support. It allows for multiple concurrent readers which do not block each other, and serialized write access. Furthermore, readers do not block writers, and vice versa.
The actual allocator is implemented in an internal module called Database.Haskey.Alloc.Concurrent.Monad, but you don’t really need to know about it. The exported functions in Database.Haskey.Alloc.Concurrent use this monad internally, and they are sufficient to use the allocator. The following functions are particularly interesting:
-- Simplified type signatures... createConcurrentDb :: (ConcurrentMetaStoreM m, Key k, Value v) => ConcurrentHandles -> m (ConcurrentDb k v) openConcurrentDb :: (Key k, Value v, ConcurrentMetaStoreM m) => ConcurrentHandles -> m (Maybe (ConcurrentDb k v)) transact :: (ConcurrentMetaStoreM m, Key k, Value v) => (forall n. AllocM n => Tree key val -> n (Transaction key val a)) -> ConcurrentDb key val -> m a transactReadOnly :: (ConcurrentMetaStoreM m, Key k, Value v) => (forall n. AllocReaderM n => Tree key val -> n a) -> ConcurrentDb key val -> m a
createConcurrentDb functions are used to create new and open existing databases. While the
transact and the
transactReadOnly functions are used to start read-write and read-only transactions.
We can see that the
transactReadOnly functions take a function that takes a tree root and can do
AllocReaderM actions. Exactly what we need to run tree modifications and queries from Data.BTree.Impure! Excellent!
But there is one more catch! We need to run these functions in a
ConcurrentMetaStoreM! Where can we find such a monad?
The storage back-end
class StoreM FilePath m => ConcurrentMetaStoreM m where -- Omitted type class body... -- Simplified type signatures, not all functions included... class Monad m => StoreM hnd m | m -> hnd where putNodePage :: (Key key, Value val) => hnd -> Height height -> NodeId height key val -> Node height key val -> m () getNodePage :: (Key key, Value val) => hnd -> Height height -> NodeId height key val -> m (Node height key val) putOverflow :: Value val => hnd -> val m () getOverflow :: Value val => hnd -> m val
We can see that the
StoreM type class simply abstracts over a storage back-end that can read and write pages and overflow values. There are two built-in storage back-ends. One is the InMemory storage back-end, which we’ll let you discover by yourself. The other one is the on-disk store in Database.Haskey.Store.File. Let’s look at the interesting functions.
newFileStore function creates a new uninitialized state for the file storage back-end. This is a simple
IORef that can safely be accessed by concurrent threads. Those threads can safely call
runFileStoreT with that state.
Full code example
module Main where import Control.Concurrent.Async (async, wait) import Control.Monad (void, replicateM) import Data.BTree.Impure (toList, insertTree) import Data.ByteString (ByteString) import Data.Int (Int32) import Data.Text.Encoding (encodeUtf8) import qualified Data.Text as Text import Database.Haskey.Alloc.Concurrent (ConcurrentDb, ConcurrentHandles, concurrentHandles, openConcurrentDb, createConcurrentDb, transact_, transactReadOnly, commit_) import Database.Haskey.Store.File (FileStoreT, Files, newFileStore, runFileStoreT, defFileStoreConfig) main :: IO () main = do store <- newFileStore db <- openOrCreate store writers <- mapM (async . writer store db) [1..100] readers <- replicateM 100 $ async (reader store db) mapM_ wait writers mapM_ wait readers putStrLn "Done" writer :: Files FilePath -> ConcurrentDb Int32 ByteString -> Int32 -> IO () writer store db i = runDatabase store $ transact_ tx db where bs = encodeUtf8 $ Text.pack (show i) tx tree = insertTree i bs tree >>= commit_ reader :: Files FilePath -> ConcurrentDb Int32 ByteString -> IO () reader files db = void $ replicateM 100 $ runDatabase files $ transactReadOnly toList db openOrCreate :: Files FilePath -> IO (ConcurrentDb Int32 ByteString) openOrCreate store = runDatabase store $ do maybeDb <- openConcurrentDb handles case maybeDb of Nothing -> createConcurrentDb handles Just db -> return db runDatabase :: Files FilePath -> FileStoreT FilePath m a -> m a runDatabase files action = runFileStoreT action defFileStoreConfig files handles :: ConcurrentHandles handles = concurrentHandles "example-database.haskey"
Note that at the moment concurrent readers are disabled, because of a race condition in the code. However, the code example will work.↩