Sharing structure when Tying the Knot - Haskell

Welcome to the Functional Programming Zulip Chat Archive. You can join the chat here.

James King

I'm exploring an encoding of B+ Trees that embeds some of the invariant properties of the structure in the types. A B+ Tree is an m-ary tree with a few properties but the important one here is that the leaves at the bottom of the tree are linked together like a linked-list. I've modeled it like so:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}

module BTree where

-- | This is used to represent the @height@ of a B+ Tree which is
-- tracked at the type level.
data Nat = Z | S Nat

data Node h k v
  = Node
  { nodeKeys     :: [k] -- ^ length is < @order@
  , nodeChildren :: [Node h k v] -- ^ length is 1 greater than keys
  }

data Leaf h k v
  = Leaf
  { leafKeys   :: [k] -- ^ length is < @order@
  , leafValues :: [v] -- ^ length is 1 greater than keys
  , leafNext   :: Maybe (Leaf h k v)
  }

-- | This representation of the B+ Tree maintains the @height@ and
-- @order@ invariants at the type level.
data BTree o h k v where
  BNode :: Ord k => Node h k v -> BTree o ('S h) k v
  -- ^ where we construct a new B+ Tree node by splitting we add a
  -- parent and increase the height of the tree to maintain the
  -- invariant
  BLeaf :: Ord k => Leaf h k v -> BTree o 'Z k v
  -- ^ A leaf node is at the "bottom" of the tree and has no height.

insert :: Ord k => k -> v -> BTree o h k v -> BTree o h k v
insert = undefined

When constructing such a tree, if on insert, we need to split a Leaf in order to maintain the order invariant, would we also need to rebuild the entire tree in order to tie the knot or is there some way to share the structure in order to maintain performance and immutability without having to pepperIORef's throughout the tree and perform all of our tree operations in IO...? :thinking:

TheMatten

Hmm, you'll probably have to rebuild leafs before yours to point to it, so that really means rebuilding big part of the tree

Hjulle

I'm pretty sure that it would be impossible to modify a cyclic immutable structure without copying everything that transitively links to the element you are modifying. Your structure is not quite cyclic, but the same still holds: everything transitively pointing to a modified cell will need to be rebuilt.

Otoh, there are probably some clever tricks one can do with laziness to get a similar performance from some similar, but not quite the same, structure. That is often the case.

James King

That's where I feel like this is heading. I could manipulate pointers but having IO in all of the operations to manipulate the IORef's would be a bit painful.

Some ad-joint structure that could maintain the linked list... :thinking:

TheMatten

I mean, I guess this layout just inherently assumes mutable references - same as with array-based binary heap

TheMatten

But you can use ST instead of IO, and now we even have linear types...

James King

I do intend to eventually provide concurrent versions of the operations on the structure so I suppose IO is unavoidable anyway.

Daniel Bramucci

This sounds like something you probably can solve with lazyness.
In particular, see Chapter 5 - Lazy Rebuilding [PDF] of Chris Okasaki's Purely Functional Datastructures thesis (it should also be in his book if you have a copy).