This blog post introduces the Haskey project, an ACID-compliant embedded key-value store written entirely in Haskell and based on an MVCC B+-tree implementation. It was developed as part of the Summer of Haskell 2017. We will take a look at the two libraries we wrote this summer.
Before introducing the libraries, I must say I’m extremely grateful that I got the opportunity to be part of this year’s Summer of Haskell. I had really great fun working on this project for the last two months, even though they were incredibly challenging for me, as I had very little experience with the internals of database systems. This is entirely thanks to Steven Keuchel, who not only helped and guided me during the challenging moments, but also made sure we were able to bond outside of the project. Furthermore, I’d like to thank the people who organized SoH 2017, not only for the opportunities they have created for me, but also for all the work they have done. I especially want to thank Jasper, who helped me a lot with the necessary paperwork.
Disclaimer: Haskey is not production ready yet. We are still actively making changes to the public API and the internals, as well as the binary format.
EDIT (9-Apr-2018): Haskey is now considered stable.
EDIT (9-Apr-2018): Note that this document gives a good overview on how the library is structured, but might use outdated APIs. See this up to date tutorial, which covers a lot of the content in this article, to get started with Haskey.
Haskey’s goal and design
The goal of Haskey is to deliver a space-efficient, fast and scalable key-value store to the Haskell community, one that can be embedded into Haskell projects. To achieve this, we modeled Haskey after LMDB. Our design is based on an MVCC B+-tree, which allows us to provide non-blocking concurrent read access and serialized write access, such that readers and writers do not block each other.
Over the past two months we have implemented this design, which resulted in two libraries, both available on Hackage. The first is haskey-btree, which implements both a copy-in-place and a copy-on-write B+-tree. The copy-on-write variant abstracts over an allocator, which reads and writes B+-tree nodes on demand. The second package is called haskey. This package implements such an allocator: it allocates and manages the pages the nodes are written to, and it provides transaction support, concurrent access and data integrity. In short, haskey-btree provides the operations that modify and query the tree, while haskey provides the machinery that makes those operations possible on disk.
The B+-tree implementation
The haskey-btree package includes two separate implementations of B+-trees. The first one can be found in the Data.BTree.Pure module, which contains a pure, standard, textbook implementation of an in-memory B+-tree of a certain minimum degree. It is a standalone implementation, not used anywhere else in the code base, but still a nice thing to have. More interesting, however, is the copy-on-write B+-tree implementation in Data.BTree.Impure.
If we look at some functions in Data.BTree.Impure that are used to modify and query the tree, we notice the following type signatures:
lookupTree :: (AllocReaderM m, Key key, Value val)
           => key
           -> Tree key val
           -> m (Maybe val)

insertTree :: (AllocM m, Key key, Value val)
           => key
           -> val
           -> Tree key val
           -> m (Tree key val)

deleteTree :: (AllocM m, Key key, Value val)
           => key
           -> Tree key val
           -> m (Tree key val)
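Because these functions are polymorphic in the allocator monad, tree logic can be written once and reused with any allocator that provides it. As a minimal sketch, here is how they compose: insertIfAbsent is a hypothetical helper, not part of haskey-btree, and it assumes the functions above plus the Key and Value classes are in scope.

-- A sketch: only insert the value when the key is not present yet. The
-- lookup needs AllocReaderM, which is a superclass of AllocM (defined just
-- below), so the whole helper runs in any AllocM monad.
insertIfAbsent :: (AllocM m, Key key, Value val)
               => key -> val -> Tree key val -> m (Tree key val)
insertIfAbsent k v tree = do
    existing <- lookupTree k tree
    case existing of
        Just _  -> return tree           -- key already present: keep the current root
        Nothing -> insertTree k v tree   -- key absent: insert, yielding a new root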
We can see that the functions operate in either the AllocReaderM or the AllocM monad. Let’s take a closer look at the definitions of these type classes:
class Monad m => AllocReaderM m where
    readNode :: (Key key, Value val)
             => Height height
             -> NodeId height key val
             -> m (Node height key val)

    readOverflow :: Value val => OverflowId -> m val

    -- Not all functions included...

class AllocReaderM m => AllocM m where
    allocNode :: (Key key, Value val)
              => Height height
              -> Node height key val
              -> m (NodeId height key val)

    freeNode :: Height height -> NodeId height key val -> m ()

    allocOverflow :: Value val => val -> m OverflowId

    freeOverflow :: OverflowId -> m ()
We can see that the AllocReaderM type class supports operations to read B+-tree nodes and overflow values with a certain ID, while the AllocM type class supports operations to write and free B+-tree nodes and overflow values. How do we get such an allocator? Well, the Data.BTree.Alloc.Debug module contains an in-memory allocator based on Data.Map. It is very inefficient, and solely intended for testing and debugging purposes, but it provides a good example of how to start implementing an allocator from scratch.
import Data.BTree.Alloc.Debug (Pages, runDebugT, emptyPages)
import Data.BTree.Impure (Tree, empty, insertTree)

debugAllocatorExample :: (Tree ByteString ByteString, Pages)
debugAllocatorExample = runDebugT emptyPages $ do
    return empty
        >>= insertTree "key1" "wrong value"
        >>= insertTree "key1" "correct value"
        >>= insertTree "key2" "other value"
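To check what ended up in the tree, we can feed the resulting pages back into the debug allocator and run a query. This is only a sketch: debugLookupExample is a hypothetical name, and it assumes lookupTree from Data.BTree.Impure is imported and the OverloadedStrings extension is enabled for the ByteString literal.

-- A sketch: reuse the pages produced by debugAllocatorExample as the
-- allocator state and look up "key1" in the resulting tree. The second
-- insert above overwrote the first, so this evaluates to Just "correct value".
debugLookupExample :: Maybe ByteString
debugLookupExample = fst $ runDebugT pages (lookupTree "key1" tree)
  where
    (tree, pages) = debugAllocatorExample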
The code listings above show the debug allocator in action. For a more robust, space-efficient and thread-safe allocator, we have to turn to the haskey library.
Robust, thread-safe allocator
The haskey library builds upon the haskey-btree library by providing a robust, thread-safe page allocator with transaction support. It allows for multiple concurrent readers which do not block each other, and serialized write access. Furthermore, readers do not block writers, and vice versa.
The actual allocator is implemented in an internal module called Database.Haskey.Alloc.Concurrent.Monad, but you don’t really need to know about it. The exported functions in Database.Haskey.Alloc.Concurrent use this monad internally, and they are sufficient to use the allocator. The following functions are particularly interesting:
-- Simplified type signatures...
createConcurrentDb :: (ConcurrentMetaStoreM m, Key key, Value val)
                   => ConcurrentHandles
                   -> m (ConcurrentDb key val)

openConcurrentDb :: (ConcurrentMetaStoreM m, Key key, Value val)
                 => ConcurrentHandles
                 -> m (Maybe (ConcurrentDb key val))

transact :: (ConcurrentMetaStoreM m, Key key, Value val)
         => (forall n. AllocM n => Tree key val -> n (Transaction key val a))
         -> ConcurrentDb key val
         -> m a

transactReadOnly :: (ConcurrentMetaStoreM m, Key key, Value val)
                 => (forall n. AllocReaderM n => Tree key val -> n a)
                 -> ConcurrentDb key val
                 -> m a
The openConcurrentDb and createConcurrentDb functions are used to open existing databases and to create new ones, while the transact and transactReadOnly functions are used to start read-write and read-only transactions, respectively.
We can see that the transact and transactReadOnly functions take a function that receives a tree root and can perform AllocM and AllocReaderM actions. Exactly what we need to run the tree modifications and queries from Data.BTree.Impure! Excellent!
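To make that concrete, here is a sketch of a transaction body that could be passed to transact. The renameKey name is hypothetical; it assumes the Data.BTree.Impure functions from earlier are in scope, together with Transaction and the commit_ helper that the full example at the end of this post imports from Database.Haskey.Alloc.Concurrent.

-- A sketch of a read-write transaction body: it only uses the AllocM
-- interface, so transact can run it against whatever tree root the current
-- transaction hands us. commit_ wraps the final root into a Transaction.
renameKey :: (AllocM n, Key key, Value val)
          => key -> key -> Tree key val -> n (Transaction key val ())
renameKey old new tree = do
    mVal <- lookupTree old tree
    case mVal of
        Nothing -> commit_ tree  -- nothing to rename: keep the current root
        Just v  -> deleteTree old tree >>= insertTree new v >>= commit_

Calling transact (renameKey oldKey newKey) db would then perform the rename atomically in a single read-write transaction.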
But there is one more catch: we need to run these functions in a ConcurrentMetaStoreM! Where can we find such a monad?
The storage back-end
Let’s take a closer look at the ConcurrentMetaStoreM type class. The functions in the type class aren’t all that interesting, but the superclass is! It’s the StoreM class:
class StoreM FilePath m => ConcurrentMetaStoreM m where
    -- Omitted type class body...

-- Simplified type signatures, not all functions included...
class Monad m => StoreM hnd m | m -> hnd where
    putNodePage :: (Key key, Value val)
                => hnd
                -> Height height
                -> NodeId height key val
                -> Node height key val
                -> m ()

    getNodePage :: (Key key, Value val)
                => hnd
                -> Height height
                -> NodeId height key val
                -> m (Node height key val)

    putOverflow :: Value val => hnd -> val -> m ()

    getOverflow :: Value val => hnd -> m val
We can see that the StoreM type class simply abstracts over a storage back-end that can read and write pages and overflow values. There are two built-in storage back-ends. One is the InMemory storage back-end, which we’ll let you discover by yourself. The other one is the on-disk store in Database.Haskey.Store.File. Let’s look at the interesting functions.
newFileStore :: IO (Files fp)

runFileStoreT :: FileStoreT fp m a  -- ^ Database actions
              -> FileStoreConfig    -- ^ Database configuration
              -> Files fp           -- ^ Open database files
              -> m a
The newFileStore function creates a new, uninitialized state for the file storage back-end. This is a simple IORef that can safely be accessed by concurrent threads, and those threads can safely call runFileStoreT with that state.
The runFileStoreT function takes a FileStoreT action. Lucky for us, there are AllocM and ConcurrentMetaStoreM instances for FileStoreT! This allows us to run our beloved transactions.
Full code example
We now have all the pieces to stitch together an example. Let the code do the talking! Example code is also available on GitHub.¹
module Main where

import Control.Concurrent.Async (async, wait)
import Control.Monad (void, replicateM)

import Data.BTree.Impure (toList, insertTree)
import Data.ByteString (ByteString)
import Data.Int (Int32)
import Data.Text.Encoding (encodeUtf8)
import qualified Data.Text as Text

import Database.Haskey.Alloc.Concurrent (ConcurrentDb,
                                         ConcurrentHandles,
                                         concurrentHandles,
                                         openConcurrentDb,
                                         createConcurrentDb,
                                         transact_,
                                         transactReadOnly,
                                         commit_)
import Database.Haskey.Store.File (FileStoreT, Files, newFileStore,
                                   runFileStoreT, defFileStoreConfig)

main :: IO ()
main = do
    store   <- newFileStore
    db      <- openOrCreate store
    writers <- mapM (async . writer store db) [1..100]
    readers <- replicateM 100 $ async (reader store db)
    mapM_ wait writers
    mapM_ wait readers
    putStrLn "Done"

writer :: Files FilePath
       -> ConcurrentDb Int32 ByteString
       -> Int32
       -> IO ()
writer store db i =
    runDatabase store $ transact_ tx db
  where
    bs = encodeUtf8 $ Text.pack (show i)

    tx tree = insertTree i bs tree >>= commit_

reader :: Files FilePath
       -> ConcurrentDb Int32 ByteString
       -> IO ()
reader files db = void $ replicateM 100 $ runDatabase files $
    transactReadOnly toList db

openOrCreate :: Files FilePath
             -> IO (ConcurrentDb Int32 ByteString)
openOrCreate store = runDatabase store $ do
    maybeDb <- openConcurrentDb handles
    case maybeDb of
        Nothing -> createConcurrentDb handles
        Just db -> return db

runDatabase :: Files FilePath
            -> FileStoreT FilePath m a
            -> m a
runDatabase files action = runFileStoreT action defFileStoreConfig files

handles :: ConcurrentHandles
handles = concurrentHandles "example-database.haskey"
¹ Note that at the moment concurrent readers are disabled because of a race condition in the code. However, the code example will still work.