Sharing structure when Tying the Knot

2020-10-02 16:19:35

I'm exploring an encoding of B+ Trees that embeds some of the invariant properties of the structure in the types. A B+ Tree is an m-ary tree with a few properties but the important one here is that the leaves at the bottom of the tree are linked together like a linked-list. I've modeled it like so:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}

module BTree where

-- | This is used to represent the @height@ of a B+ Tree which is
-- tracked at the type level.
data Nat = Z | S Nat

data Node h k v
  = Node
  { nodeKeys     :: [k] -- ^ length is < @order@
  , nodeChildren :: [Node h k v] -- ^ length is 1 greater than keys
  }

data Leaf h k v
  = Leaf
  { leafKeys   :: [k] -- ^ length is < @order@
  , leafValues :: [v] -- ^ length is 1 greater than keys
  , leafNext   :: Maybe (Leaf h k v)
  }

-- | This representation of the B+ Tree maintains the @height@ and
-- @order@ invariants at the type level.
data BTree o h k v where
  BNode :: Ord k => Node h k v -> BTree o ('S h) k v
  -- ^ where we construct a new B+ Tree node by splitting we add a
  -- parent and increase the height of the tree to maintain the
  -- invariant
  BLeaf :: Ord k => Leaf h k v -> BTree o 'Z k v
  -- ^ A leaf node is at the "bottom" of the tree and has no height.

insert :: Ord k => k -> v -> BTree o h k v -> BTree o h k v
insert = undefined

When constructing such a tree, if on insert, we need to split a Leaf in order to maintain the order invariant, would we also need to rebuild the entire tree in order to tie the knot or is there some way to share the structure in order to maintain performance and immutability without having to pepperIORef's throughout the tree and perform all of our tree operations in IO...? :thinking:

TheMatten

2020-10-02 16:51:44

Hmm, you'll probably have to rebuild leafs before yours to point to it, so that really means rebuilding big part of the tree

Hjulle

2020-10-02 20:12:46

I'm pretty sure that it would be impossible to modify a cyclic immutable structure without copying everything that transitively links to the element you are modifying. Your structure is not quite cyclic, but the same still holds: everything transitively pointing to a modified cell will need to be rebuilt.

Otoh, there are probably some clever tricks one can do with laziness to get a similar performance from some similar, but not quite the same, structure. That is often the case.

James King

2020-10-02 20:56:27

That's where I feel like this is heading. I could manipulate pointers but having IO in all of the operations to manipulate the IORef's would be a bit painful.

Some ad-joint structure that could maintain the linked list... :thinking:

TheMatten

2020-10-03 07:16:23

I mean, I guess this layout just inherently assumes mutable references - same as with array-based binary heap

TheMatten

2020-10-03 07:16:58

But you can use ST instead of IO, and now we even have linear types...

James King

2020-10-05 16:29:03

I do intend to eventually provide concurrent versions of the operations on the structure so I suppose IO is unavoidable anyway.

Daniel Bramucci

2020-10-09 22:48:54

This sounds like something you probably can solve with lazyness.
In particular, see Chapter 5 - Lazy Rebuilding [PDF] of Chris Okasaki's Purely Functional Datastructures thesis (it should also be in his book if you have a copy).

Sharing structure when Tying the Knot - Haskell