`streaming-bytestring` vs `streaming` `ByteString`s - Haskell

Welcome to the Functional Programming Zulip Chat Archive. You can join the chat here.

Asad Saeeduddin

Is there a reason there is a separate streaming-bytestring library with this type:

data ByteString m r
  = Empty r
  | Chunk S.ByteString (ByteString m r)
  | Go (m (ByteString m r))

as opposed to just using the following types from the basic streaming package?

data Stream f m r
  = Return r
  | Step (f (Stream f m r))
  | Effect (m (Stream f m r))

data Of a b = a :> b

type ByteString m r = Stream (Of S.ByteString) m r
Asad Saeeduddin

is there some performance optimization the specialized version can benefit from that the corresponding instantiation of the general Stream cannot?

Daniel Díaz Carrete

I'm not knowledgeable about Haskell performance optimization, but the Stream version has more memory indirections because of the Of functor. Also, in the ByteString version the inner bytestrings are unpacked, which I guess can help. It would be nice to have benchmarks comparing the performance of ByteString vs. vanilla Stream (Of ByteString).
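To make the indirection point concrete, here is a minimal sketch that defines both representations locally (copied from the types quoted in this thread, not imported from the libraries) and converts between them. The names fromChunks, toChunks and chunkList are local illustrations; the real packages offer functions with similar names, but these bodies are assumptions, not the library implementations.

```haskell
import qualified Data.ByteString as S
import Data.Functor.Identity (Identity (..))

-- General stream, as quoted from the streaming package.
data Stream f m r
  = Return r
  | Step (f (Stream f m r))
  | Effect (m (Stream f m r))

data Of a b = a :> b

-- Specialized stream, as quoted from streaming-bytestring.
data ByteString m r
  = Empty r
  | Chunk {-# UNPACK #-} !S.ByteString (ByteString m r)
  | Go (m (ByteString m r))

-- General -> specialized: a Step cell pointing at a (:>) cell pointing at
-- a chunk becomes a single Chunk cell holding the chunk's fields.
fromChunks :: Functor m => Stream (Of S.ByteString) m r -> ByteString m r
fromChunks (Return r)       = Empty r
fromChunks (Step (bs :> k)) = Chunk bs (fromChunks k)
fromChunks (Effect m)       = Go (fmap fromChunks m)

-- Specialized -> general: the inverse direction.
toChunks :: Functor m => ByteString m r -> Stream (Of S.ByteString) m r
toChunks (Empty r)    = Return r
toChunks (Chunk bs k) = Step (bs :> toChunks k)
toChunks (Go m)       = Effect (fmap toChunks m)

-- Collect the chunks of a pure stream, to check that the round trip
-- preserves them.
chunkList :: Stream (Of S.ByteString) Identity r -> [S.ByteString]
chunkList (Return _)            = []
chunkList (Step (bs :> k))      = bs : chunkList k
chunkList (Effect (Identity k)) = chunkList k
```

The two types carry the same information; the difference discussed here is only in how many heap objects sit between a stream cell and the bytes of a chunk.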

Asad Saeeduddin

is there no F equivalent to Of S.ByteString that we can write such that Stream F performs identically to ByteString from streaming-bytestring?

Daniel Díaz Carrete

Any pluggable F would incur memory indirections like Of does; I think it's inevitable (?) given how Haskell handles polymorphism in datatypes. But again, I'm not sure how much it impacts performance.

Daniel Díaz Carrete

I've toyed with the idea of writing a streaming-chunked package very similar to streaming-bytestring, but where the "packed" inner type would be configurable by way of Backpack. conduit has useful -E suffixed versions of functions, for example lengthE http://hackage.haskell.org/package/conduit-1.3.2/docs/Data-Conduit-Combinators.html#v:lengthE, for working with chunked data. They abstract the chunked type using typeclasses; abstracting it using Backpack would be an interesting experiment. And it would let you keep the UNPACK pragma in the implementation.
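The typeclass approach to chunked data can be sketched as follows. The class Chunked and its methods are hypothetical and much smaller than the classes conduit actually uses, and a plain list of chunks stands in for a stream; the functions lengthE and takeE below only borrow the names of the conduit combinators.

```haskell
import qualified Data.ByteString as S
import Data.List (foldl')

-- Hypothetical class: the operations a "packed" chunk type must provide.
class Chunked c where
  chunkLength :: c -> Int
  chunkTake   :: Int -> c -> c

instance Chunked S.ByteString where
  chunkLength = S.length
  chunkTake   = S.take

instance Chunked [a] where
  chunkLength = length
  chunkTake   = take

-- Count elements inside the chunks, not the number of chunks.
lengthE :: Chunked c => [c] -> Int
lengthE = foldl' (\n c -> n + chunkLength c) 0

-- Keep the first n elements, splitting the chunk that crosses the boundary.
takeE :: Chunked c => Int -> [c] -> [c]
takeE n _ | n <= 0 = []
takeE _ []         = []
takeE n (c : cs)
  | len >= n  = [chunkTake n c]
  | otherwise = c : takeE (n - len) cs
  where
    len = chunkLength c
```

Because the chunk type is only known through a dictionary here, it cannot be unpacked into the stream constructor, which is the limitation the Backpack idea is meant to avoid.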

Asad Saeeduddin

that's interesting. i wonder if there are theoretical limitations that make things like memory layout inherently anticompositional, or if it's just hard to implement

Asad Saeeduddin

I don't know anything about performance or memory layout, so it isn't obvious to me why it has to be the case that this:

data ByteString m r
  = Empty r
  | Chunk {-# UNPACK #-} !S.ByteString (ByteString m r)
  | Go (m (ByteString m r))

is more efficient than this:

data BS r = BS {-# UNPACK #-} !S.ByteString r

data Stream f m r
  = Return r
  | Step !(f (Stream f m r))
  | Effect (m (Stream f m r))

type ByteString' m r = Stream BS m r
Asad Saeeduddin

Regarding your idea about typeclasses, do you mean something like the following?

class ChunkyStream s a m r | s -> a, s -> m, s -> r
  where
  empty :: r -> s
  chunk :: a -> s -> s
  effect :: m s -> s
  match :: (r -> x) -> (a -> s -> x) -> (m s -> x) -> s -> x
Asad Saeeduddin

Then like instance ChunkyStream (Stream (Of a) m r) a m r and instance ChunkyStream (ByteString m r) S.ByteString m r etc.
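A minimal sketch of how the class and the two instances above could be made to compile, assuming the language extensions listed at the top and with both stream types redefined locally rather than imported. The consumer toListPure is a hypothetical helper added only to show code written against the class interface.

```haskell
{-# LANGUAGE FlexibleContexts       #-}
{-# LANGUAGE FlexibleInstances      #-}
{-# LANGUAGE FunctionalDependencies #-}

import qualified Data.ByteString as S
import Data.Functor.Identity (Identity (..))

data Stream f m r
  = Return r
  | Step (f (Stream f m r))
  | Effect (m (Stream f m r))

data Of a b = a :> b

data ByteString m r
  = Empty r
  | Chunk {-# UNPACK #-} !S.ByteString (ByteString m r)
  | Go (m (ByteString m r))

-- The stream type s determines its element, effect and result types.
class ChunkyStream s a m r | s -> a, s -> m, s -> r where
  empty  :: r -> s
  chunk  :: a -> s -> s
  effect :: m s -> s
  match  :: (r -> x) -> (a -> s -> x) -> (m s -> x) -> s -> x

instance ChunkyStream (Stream (Of a) m r) a m r where
  empty     = Return
  chunk a s = Step (a :> s)
  effect    = Effect
  match onEmpty onChunk onEffect s = case s of
    Return r      -> onEmpty r
    Step (a :> k) -> onChunk a k
    Effect m      -> onEffect m

instance ChunkyStream (ByteString m r) S.ByteString m r where
  empty  = Empty
  chunk  = Chunk
  effect = Go
  match onEmpty onChunk onEffect s = case s of
    Empty r    -> onEmpty r
    Chunk bs k -> onChunk bs k
    Go m       -> onEffect m

-- Hypothetical consumer that works for any instance with a pure effect layer.
toListPure :: ChunkyStream s a Identity r => s -> [a]
toListPure =
  match (const []) (\a k -> a : toListPure k) (toListPure . runIdentity)
```

Code written against this class goes through a dictionary for every constructor and match, so whether it keeps the performance of the specialized type would depend on GHC specializing or inlining the calls.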

Daniel Díaz Carrete

I meant typeclasses like IsSequence in conduit http://hackage.haskell.org/package/conduit-1.3.2/docs/Data-Conduit-Combinators.html#v:takeE that define the operations a "packed" datatype like Text or ByteString might have. That way you can have a generic takeE function that works with any IsSequence instance.

About the BS datatype: that has one less indirection than Stream (Of ByteString), but the one from the Step constructor to BS itself would remain. And you wouldn't be able to reuse any function that works over Stream (Of ByteString).
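One way to bridge the reuse gap, at the cost of rebuilding the stream, is to translate the functor layer. The sketch below uses local copies of the types from this thread; hoistStep, bsToOf, ofToBS and chunkList are hypothetical names chosen for illustration, not functions from either package.

```haskell
{-# LANGUAGE RankNTypes #-}

import qualified Data.ByteString as S
import Data.Functor.Identity (Identity (..))

data Stream f m r
  = Return r
  | Step !(f (Stream f m r))
  | Effect (m (Stream f m r))

data Of a b = a :> b

-- The functor proposed in the thread: the chunk is unpacked into BS,
-- but Step still points at a separate BS cell.
data BS r = BS {-# UNPACK #-} !S.ByteString r

instance Functor (Of a) where
  fmap f (a :> b) = a :> f b

instance Functor BS where
  fmap f (BS bs r) = BS bs (f r)

-- Swap the functor layer of every step using a natural transformation.
hoistStep
  :: (Functor m, Functor f)
  => (forall x. f x -> g x) -> Stream f m r -> Stream g m r
hoistStep phi = go
  where
    go (Return r) = Return r
    go (Step fs)  = Step (phi (fmap go fs))
    go (Effect m) = Effect (fmap go m)

bsToOf :: BS x -> Of S.ByteString x
bsToOf (BS bs x) = bs :> x

ofToBS :: Of S.ByteString x -> BS x
ofToBS (bs :> x) = BS bs x

-- Stand-in for "an existing function over Stream (Of a)".
chunkList :: Stream (Of a) Identity r -> [a]
chunkList (Return _)            = []
chunkList (Step (a :> k))       = a : chunkList k
chunkList (Effect (Identity k)) = chunkList k
```

For example, chunkList (hoistStep bsToOf s) applies the Of-based function to a Stream BS value s, but the conversion allocates a fresh (:>) cell per chunk, so it does not make the existing functions directly reusable.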

Daniel Díaz Carrete

Spurred by curiosity, I wrote a small criterion microbenchmark (my first ever use of criterion!) comparing Stream (Of ByteString) with Data.ByteString.Streaming.ByteString. I didn't find much of a difference, but maybe my benchmark is too naive. https://github.com/danidiaz/streaming-benchmarks
