I'm not knowledgeable about Haskell performance optimization, but the Stream version has more memory indirections, because of the Of functor. Also, in the ByteString version, the inner bytestrings are unpacked, which I guess can help. It would be nice if there were benchmarks comparing the performance of ByteString vs. vanilla Stream (Of ByteString).
Any pluggable F would incur memory indirections like Of does; I think it's inevitable (?) given how Haskell handles polymorphism in datatypes. But again, I'm not sure how much it impacts performance.
I've toyed with the idea of writing a streaming-chunked package very similar to streaming-bytestring, but where the "packed" inner type would be configurable by way of Backpack. conduit has useful -E suffixed versions of functions, for example lengthE (http://hackage.haskell.org/package/conduit-1.3.2/docs/Data-Conduit-Combinators.html#v:lengthE), for working with chunked data. They abstract the chunked type using typeclasses; abstracting it using Backpack would be an interesting experiment. And it would let you keep the {-# UNPACK #-} in the implementation.
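To make the typeclass approach a bit more concrete, here is a minimal sketch of the idea. Everything in it is an assumption for illustration: the Chunk class and the lengthChunks / takeChunks functions are hypothetical names (this is not conduit's IsSequence or its lengthE / takeE), and a plain list of chunks stands in for the stream so that it only needs the bytestring package.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical class (NOT conduit's IsSequence): the operations a
-- "packed" chunk type has to provide.
class Chunk c where
  chunkLength :: c -> Int
  chunkTake   :: Int -> c -> c

instance Chunk B.ByteString where
  chunkLength = B.length
  chunkTake   = B.take

instance Chunk [a] where
  chunkLength = length
  chunkTake   = take

-- Count elements across all chunks, in the spirit of lengthE.
-- A list of chunks stands in for the stream here.
lengthChunks :: Chunk c => [c] -> Int
lengthChunks = sum . map chunkLength

-- Take the first n elements across chunk boundaries, in the spirit of takeE.
takeChunks :: Chunk c => Int -> [c] -> [c]
takeChunks _ [] = []
takeChunks n (c : cs)
  | n <= 0             = []
  | chunkLength c >= n = [chunkTake n c]
  | otherwise          = c : takeChunks (n - chunkLength c) cs

main :: IO ()
main = do
  let bs = map B.pack ["hello", " ", "world"]
  print (lengthChunks bs)             -- 11
  print (takeChunks 7 bs)             -- ["hello"," ","w"]
  print (lengthChunks ["ab", "cde"])  -- 5, using String chunks
```

The generic functions only see the class dictionary, so the chunk type stays abstract. With Backpack the class would instead become a signature that gets filled in at build time, which is what would allow the chunk field to stay unpacked.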
That's interesting. I wonder if there are theoretical limitations that make things like memory layout inherently anticompositional, or if it's just hard to implement.
I don't know anything about performance or memory layout, so it isn't obvious to me why it has to be the case that this:
data ByteString m r
  = Empty r
  | Chunk {-# UNPACK #-} !S.ByteString (ByteString m r)
  | Go (m (ByteString m r))
is more efficient than this:
data BS r = BS {-# UNPACK #-} !S.ByteString r

data Stream f m r
  = Return r
  | Step !(f (Stream f m r))
  | Effect (m (Stream f m r))

type ByteString' m r = Stream BS m r
About the BS datatype: that has one less indirection than Stream (Of ByteString), but the one from the Step constructor to BS itself would remain. And you wouldn't be able to reuse any function which worked over Stream (Of ByteString).
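For what it's worth, the two representations carry the same information and differ only in layout, which a pair of conversion functions makes explicit. The sketch below uses local, simplified copies of the types (an assumption: they mirror the declarations in this thread, they are not imported from streaming or streaming-bytestring), and toStream, fromStream, chunks and example are names made up for the illustration. As far as I understand it, the chunk's fields sit directly inside the Chunk constructor, while Step has to point at a separate :> cell, which in turn points at a boxed ByteString because a polymorphic field cannot be unpacked.

```haskell
import qualified Data.ByteString.Char8 as S
import Data.Functor.Identity (Identity (..))

-- Local, simplified copies of the types discussed in this thread.
data Of a b = !a :> b

data Stream f m r
  = Return r
  | Step !(f (Stream f m r))     -- pointer to whatever f is
  | Effect (m (Stream f m r))

data ByteString m r
  = Empty r
  | Chunk {-# UNPACK #-} !S.ByteString (ByteString m r)  -- chunk stored inline
  | Go (m (ByteString m r))

-- Hypothetical conversions: constructor-for-constructor, nothing is lost.
toStream :: Functor m => ByteString m r -> Stream (Of S.ByteString) m r
toStream (Empty r)      = Return r
toStream (Chunk c rest) = Step (c :> toStream rest)
toStream (Go m)         = Effect (fmap toStream m)

fromStream :: Functor m => Stream (Of S.ByteString) m r -> ByteString m r
fromStream (Return r)         = Empty r
fromStream (Step (c :> rest)) = Chunk c (fromStream rest)
fromStream (Effect m)         = Go (fmap fromStream m)

-- Collect all chunks, running the effects along the way.
chunks :: Monad m => ByteString m r -> m [S.ByteString]
chunks (Empty _)      = return []
chunks (Chunk c rest) = fmap (c :) (chunks rest)
chunks (Go m)         = m >>= chunks

example :: ByteString Identity ()
example = Chunk (S.pack "ab") (Go (Identity (Chunk (S.pack "cd") (Empty ()))))

main :: IO ()
main = do
  print (runIdentity (chunks example))
  print (runIdentity (chunks (fromStream (toStream example))))
```

Both lines print the same list of chunks, so the round trip preserves the content. Whether the extra pointer hops in the Stream version matter in practice is exactly what a benchmark would have to show.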
Spurred by curiosity, I wrote a small criterion microbenchmark (my first ever use of criterion!) comparing Stream (Of ByteString) with Data.ByteString.Streaming.ByteString. I didn't find much of a difference, but maybe my benchmark is too naive. https://github.com/danidiaz/streaming-benchmarks
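For readers who want to try something similar, a comparison along these lines might look roughly like the sketch below. This is not the code from the linked repository, just an illustration under some assumptions: it needs the criterion, streaming and streaming-bytestring packages, it uses the Data.ByteString.Streaming module name mentioned above (newer releases of streaming-bytestring use Streaming.ByteString instead), and the chunk count and chunk size are arbitrary.

```haskell
import Criterion.Main (bench, bgroup, defaultMain, nfIO)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Streaming as Q
import qualified Streaming.Prelude as S

-- Arbitrary test data: 10000 chunks of 1 KiB each.
chunkList :: [B.ByteString]
chunkList = replicate 10000 (B.replicate 1024 'x')

main :: IO ()
main =
  defaultMain
    [ bgroup
        "total length"
        [ -- Plain Stream (Of ByteString): sum the chunk lengths.
          bench "Stream (Of ByteString)" $
            nfIO (S.sum_ (S.map B.length (S.each chunkList)))
        , -- Specialised type: convert the chunks, then ask for the length.
          bench "streaming-bytestring" $
            nfIO (Q.length_ (Q.fromChunks (S.each chunkList)))
        ]
    ]
```

Note that the second benchmark also pays for the fromChunks conversion, so it does not isolate the cost of the representation itself. A fairer comparison would build each stream natively, for example by reading the same file through each library.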
Is there a reason there is a separate streaming-bytestring library with its own ByteString m r type, as opposed to just using Stream (Of ByteString) from the basic streaming package? Is there some performance optimization the specialized version can benefit from that the corresponding instantiation of the general Stream cannot?
Is there no equivalent F of Of S.ByteString that we can write such that Stream F performs identically to ByteString from streaming-bytestring?
Regarding your idea about typeclasses, do you mean something like a multi-parameter ChunkyStream class, with instances like
instance ChunkyStream (Stream (Of a) m r) a m r
and
instance ChunkyStream (ByteString m r) S.ByteString m r
etc.?
I meant typeclasses like IsSequence in conduit (http://hackage.haskell.org/package/conduit-1.3.2/docs/Data-Conduit-Combinators.html#v:takeE) that define the operations a "packed" datatype like Text or ByteString might have. That way you can have a generic takeE function that works with any IsSequence instance.