I've noticed something that I might be using wrong.
This:
appears to perform much worse than
by a factor of 50!
Is that me doing it wrong or is it intended to work that way?
Maybe Core could tell us something
BTW, thanks for investigating this! It seems like all of us are busy right now - I can try to look into this next week personally
great!
then I shall look into the Core code.
Use -ddump-simpl -dsuppress-coercions -dsuppress-idinfo -dsuppress-module-prefixes -dsuppress-ticks -dsuppress-timestamps -dsuppress-type-applications -dsuppress-uniques flags
thanks!
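For reference, the same flags can also be set per module rather than on the command line. A minimal sketch, assuming a hypothetical module with a small traverse_ call to inspect (the notify function is made up for illustration and is not the code discussed in this thread):

```haskell
-- Per-module GHC flags: dump the simplified Core and suppress noisy details.
{-# OPTIONS_GHC -ddump-simpl #-}
{-# OPTIONS_GHC -dsuppress-coercions -dsuppress-idinfo -dsuppress-module-prefixes #-}
{-# OPTIONS_GHC -dsuppress-ticks -dsuppress-timestamps #-}
{-# OPTIONS_GHC -dsuppress-type-applications -dsuppress-uniques #-}
module Main (main) where

import Data.Foldable (traverse_)

-- Hypothetical function whose Core we want to read.
notify :: Maybe Int -> IO ()
notify = traverse_ print

main :: IO ()
main = notify (Just 1)
```

The Core you get depends on the optimisation level the build uses, so it is worth dumping with the same settings as the slow build.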
(Everything after -ddump-simpl is just to make output cleaner :big_smile: )
so, I couldn't reproduce the TVar thing in a test case, but I took a glance at Applicative…this is current master, so the fix that's supposed to eliminate performance issues is included.
Setup:
10k iterations, printing current memory every second, taking about 10 seconds.
The first variant that does an explicit pattern match takes 70kB of memory and grows with about 300 bytes/s.
The variant that uses Applicative grows constantly with about 500kB/s, terminating with 5MB memory.
While traverse_ has an Applicative constraint, it doesn't show up in the Core. With traverse it does, but the performance is identical. traverse_ also uses foldr, but traverse @Maybe just pattern matches.
The Core differs like this:
traverse_:
so while the Applicative variant gets desugared to the same pattern match, there is another bind and two matches on (), and everything else is identical.
Any clue where the difference in the memory footprint comes from, @TheMatten ? Are the ()s building up on the stack?
Would be interesting to see with profiling enabled
I tried that a few times before, couldn't make any sense of it. got any tips on how to get meaningful output for this case?
@TheMatten :red_triangle_up:
You could maybe try hs-speedscope to get some readable output
thanks!
profiteur also has nice output
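In case it helps anyone reading along, here is a rough, self-contained sketch of how the two shapes compared above could be set up for profiling. Everything in it is an assumption for illustration: it uses plain IO with an IORef counter instead of the actual effect stack from the thread, and the names bump, stepMatch, stepTraverse, runWith and reportLiveBytes are hypothetical. A toy like this is not expected to reproduce the memory growth reported above; it only shows where manual cost centres and a live-heap readout could go.

```haskell
module Main (main) where

import Control.Monad (forM_)
import Data.Foldable (traverse_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import GHC.Stats (gc, gcdetails_live_bytes, getRTSStats, getRTSStatsEnabled)

-- Hypothetical stand-in for the effectful action: strictly add to a counter.
bump :: IORef Int -> Int -> IO ()
bump ref n = modifyIORef' ref (+ n)

-- Variant 1: explicit pattern match on the Maybe.
-- The SCC pragma names a cost centre; it is ignored unless built with -prof.
stepMatch :: IORef Int -> Maybe Int -> IO ()
stepMatch ref m =
  {-# SCC "stepMatch" #-}
  ( case m of
      Just n -> bump ref n
      Nothing -> pure ()
  )

-- Variant 2: the same logic expressed through traverse_.
stepTraverse :: IORef Int -> Maybe Int -> IO ()
stepTraverse ref m =
  {-# SCC "stepTraverse" #-}
  (traverse_ (bump ref) m)

-- Run one variant for the given number of iterations and return the counter.
-- Even iterations pass Just i, odd iterations pass Nothing.
runWith :: (IORef Int -> Maybe Int -> IO ()) -> Int -> IO Int
runWith step iterations = do
  ref <- newIORef 0
  forM_ [1 .. iterations] $ \i ->
    step ref (if even i then Just i else Nothing)
  readIORef ref

-- Print the live heap size recorded at the most recent garbage collection.
-- RTS statistics are only collected when the program runs with +RTS -T.
reportLiveBytes :: String -> IO ()
reportLiveBytes label = do
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      stats <- getRTSStats
      putStrLn (label ++ ": " ++ show (gcdetails_live_bytes (gc stats)) ++ " live bytes")
    else putStrLn (label ++ ": RTS stats disabled (run with +RTS -T)")

main :: IO ()
main = do
  a <- runWith stepMatch 10000
  reportLiveBytes "pattern match"
  b <- runWith stepTraverse 10000
  reportLiveBytes "traverse_"
  -- Both variants should compute the same counter value.
  print (a == b)
```

Built with -prof and -rtsopts and run with +RTS -p -T, this should write a .prof file with separate entries for the two cost centres, which is the kind of file profiteur reads; hs-speedscope works from an eventlog instead, so check its README for the exact RTS options.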