I'm exploring an encoding of B+ Trees that embeds some of the invariant properties of the structure in the types. A B+ Tree is an m-ary tree with a few properties but the important one here is that the leaves at the bottom of the tree are linked together like a linked-list. I've modeled it like so:
{-# LANGUAGE GADTs #-}{-# LANGUAGE DataKinds #-}{-# LANGUAGE PolyKinds #-}moduleBTreewhere-- | This is used to represent the @height@ of a B+ Tree which is-- tracked at the type level.dataNat=Z|SNatdataNodehkv=Node{nodeKeys::[k]-- ^ length is < @order@,nodeChildren::[Nodehkv]-- ^ length is 1 greater than keys}dataLeafhkv=Leaf{leafKeys::[k]-- ^ length is < @order@,leafValues::[v]-- ^ length is 1 greater than keys,leafNext::Maybe(Leafhkv)}-- | This representation of the B+ Tree maintains the @height@ and-- @order@ invariants at the type level.dataBTreeohkvwhereBNode::Ordk=>Nodehkv->BTreeo('Sh)kv-- ^ where we construct a new B+ Tree node by splitting we add a-- parent and increase the height of the tree to maintain the-- invariantBLeaf::Ordk=>Leafhkv->BTreeo'Zkv-- ^ A leaf node is at the "bottom" of the tree and has no height.insert::Ordk=>k->v->BTreeohkv->BTreeohkvinsert=undefined
When constructing such a tree, if on insert, we need to split a Leaf in order to maintain the order invariant, would we also need to rebuild the entire tree in order to tie the knot or is there some way to share the structure in order to maintain performance and immutability without having to pepperIORef's throughout the tree and perform all of our tree operations in IO...? :thinking:
I'm pretty sure that it would be impossible to modify a cyclic immutable structure without copying everything that transitively links to the element you are modifying. Your structure is not quite cyclic, but the same still holds: everything transitively pointing to a modified cell will need to be rebuilt.
Otoh, there are probably some clever tricks one can do with laziness to get a similar performance from some similar, but not quite the same, structure. That is often the case.
That's where I feel like this is heading. I could manipulate pointers but having IO in all of the operations to manipulate the IORef's would be a bit painful.
Some ad-joint structure that could maintain the linked list... :thinking:
This sounds like something you probably can solve with lazyness.
In particular, see Chapter 5 - Lazy Rebuilding [PDF] of Chris Okasaki's Purely Functional Datastructures thesis (it should also be in his book if you have a copy).
I'm exploring an encoding of B+ Trees that embeds some of the invariant properties of the structure in the types. A B+ Tree is an m-ary tree with a few properties but the important one here is that the leaves at the bottom of the tree are linked together like a linked-list. I've modeled it like so:
When constructing such a tree, if on
insert
, we need to split aLeaf
in order to maintain the order invariant, would we also need to rebuild the entire tree in order to tie the knot or is there some way to share the structure in order to maintain performance and immutability without having to pepperIORef
's throughout the tree and perform all of our tree operations inIO
...? :thinking:Hmm, you'll probably have to rebuild leafs before yours to point to it, so that really means rebuilding big part of the tree
I'm pretty sure that it would be impossible to modify a cyclic immutable structure without copying everything that transitively links to the element you are modifying. Your structure is not quite cyclic, but the same still holds: everything transitively pointing to a modified cell will need to be rebuilt.
Otoh, there are probably some clever tricks one can do with laziness to get a similar performance from some similar, but not quite the same, structure. That is often the case.
That's where I feel like this is heading. I could manipulate pointers but having
IO
in all of the operations to manipulate theIORef
's would be a bit painful.Some ad-joint structure that could maintain the linked list... :thinking:
I mean, I guess this layout just inherently assumes mutable references - same as with array-based binary heap
But you can use
ST
instead ofIO
, and now we even have linear types...I do intend to eventually provide concurrent versions of the operations on the structure so I suppose
IO
is unavoidable anyway.This sounds like something you probably can solve with lazyness.
In particular, see Chapter 5 - Lazy Rebuilding [PDF] of Chris Okasaki's Purely Functional Datastructures thesis (it should also be in his book if you have a copy).