Sem memory leaks - Polysemy

2020-06-25 16:26:18

I've noticed something, though I might just be using it wrong.
This:

interpretFoo :: InterpreterFor Foo r
interpretFoo s = do
  tv <- newTVarIO def
  runAtomicStateTVar tv $
    interpretFooState $
    raiseUnder $
    s

main1 :: Sem r ()
main1 =
  runFinal $ interpretOthers $ interpretFoo prog

appears to perform much worse than

main2 :: Sem r ()
main2 = do
  tv <- newTVarIO def
  runFinal $ interpretOthers $ runAtomicStateTVar tv $ interpretFooState prog

by a factor of 50!

Is that me doing it wrong or is it intended to work that way?

TheMatten

2020-06-25 16:37:49

Maybe Core could tell us something
BTW, thanks for investigating this! It seems like all of us are busy right now - I can try to look into this next week personally

Torsten Schmits

2020-06-25 17:02:33

great!
then I shall look into the Core code.

TheMatten

2020-06-25 17:38:53

Use -ddump-simpl -dsuppress-coercions -dsuppress-idinfo -dsuppress-module-prefixes -dsuppress-ticks -dsuppress-timestamps -dsuppress-type-applications -dsuppress-uniques flags

thanks!

(Everything after -ddump-simpl is just to make output cleaner :big_smile: )
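If it's easier than passing the flags on the command line, the same set can go into the module being inspected. This is only a sketch: `-O2` and `-ddump-to-file` are additions not mentioned above (the latter writes the Core to a `.dump-simpl` file instead of stdout), and `sumTo` is a made-up placeholder for the code you actually want to look at.

```haskell
-- Flags from the message above, set per module.
-- Assumption: -O2 and -ddump-to-file are extras added for this sketch.
{-# OPTIONS_GHC -O2 -ddump-simpl -ddump-to-file #-}
{-# OPTIONS_GHC -dsuppress-coercions -dsuppress-idinfo -dsuppress-module-prefixes #-}
{-# OPTIONS_GHC -dsuppress-ticks -dsuppress-timestamps #-}
{-# OPTIONS_GHC -dsuppress-type-applications -dsuppress-uniques #-}
module Main (main) where

-- Hypothetical placeholder: any function whose Core you want to read.
sumTo :: Int -> Int
sumTo n = go 0 0
  where
    go acc i
      | i > n = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 10)
```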

Torsten Schmits

2020-08-20 17:51:10

so, I couldn't reproduce the TVar thing in a test case, but I took a glance at Applicative. This is current master, so the fix that's supposed to eliminate performance issues is included.

Setup:

progInputMonad ::
  Member (Input (Maybe ())) r =>
  Int ->
  Sem r ()
progInputMonad limit =
  go 0
  where
    go iteration = do
      a <- input @(Maybe ())
      when (iteration < limit) $
        case a of
          Just () -> go (iteration + 1)
          Nothing -> pure ()

progInputApplicative ::
  Member (Input (Maybe ())) r =>
  Int ->
  Sem r ()
progInputApplicative limit =
  go 0
  where
    go iteration = do
      a <- input @(Maybe ())
      when (iteration < limit) $
        traverse_ (\ () -> go (iteration + 1)) a

10k iterations, printing current memory every second, taking about 10 seconds.
The first variant, which does an explicit pattern match, uses about 70 kB of memory and grows by about 300 bytes/s.
The variant that uses Applicative grows steadily by about 500 kB/s, ending at 5 MB of memory.
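The message doesn't say how the memory was sampled. One way to get a once-per-second reading, assuming the program is run with `+RTS -T` so the RTS collects statistics, is a small monitor thread built on `GHC.Stats`. `formatLiveBytes` and `startMemoryMonitor` are hypothetical names, and the figure printed is the live heap as of the most recent GC, which may not be the same number reported above.

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Monad (forever)
import Data.Word (Word64)
import GHC.Stats
  ( GCDetails (gcdetails_live_bytes)
  , RTSStats (gc)
  , getRTSStats
  , getRTSStatsEnabled
  )

-- | Render a byte count as whole KiB (hypothetical helper).
formatLiveBytes :: Word64 -> String
formatLiveBytes bytes = "live heap: " <> show (bytes `div` 1024) <> " KiB"

-- | Fork a thread that prints the live heap size once per second.
-- Returns Nothing when RTS statistics are disabled (no `+RTS -T`).
startMemoryMonitor :: IO (Maybe ThreadId)
startMemoryMonitor = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then pure Nothing
    else fmap Just . forkIO . forever $ do
      stats <- getRTSStats
      -- Live data as of the most recent garbage collection.
      putStrLn (formatLiveBytes (gcdetails_live_bytes (gc stats)))
      threadDelay 1000000 -- one second, in microseconds
```

Call `startMemoryMonitor` at the top of `main`, before running the program under test.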

While traverse_ has an Applicative constraint, it doesn't show up in the Core. With traverse it does, but the performance is identical.
traverse_ also uses foldr, but traverse @Maybe just pattern matches.
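For context, a sketch of what `traverse_` on a `Maybe` boils down to. `traverseViaFoldr_` mirrors how `Data.Foldable` defines `traverse_` in recent versions of base, as far as I know; `traverseMaybe_` is what that reduces to once `foldr` for `Maybe` is inlined. Both names, and the `agreesWithBase` check, are made up for this illustration.

```haskell
import Data.Foldable (traverse_)

-- | traverse_ written via foldr, in the style of base (illustrative copy).
traverseViaFoldr_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f ()
traverseViaFoldr_ f = foldr (\x k -> f x *> k) (pure ())

-- | The same thing specialised to Maybe: foldr on Maybe is a single
-- pattern match, so only the match and one (*>) remain.
traverseMaybe_ :: Applicative f => (a -> f b) -> Maybe a -> f ()
traverseMaybe_ _ Nothing = pure ()
traverseMaybe_ f (Just x) = f x *> pure ()

-- | Check that all three agree, using the writer-like tuple Applicative
-- to record which elements were visited.
agreesWithBase :: Bool
agreesWithBase = all same [Nothing, Just (1 :: Int)]
  where
    record x = ([show x], ())
    same m =
      traverse_ record m == traverseMaybe_ record m
        && traverse_ record m == traverseViaFoldr_ record m
```

Note the trailing `*> pure ()` in the `Just` case: the action for the element is never the last step.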

The Core differs like this. Explicit pattern match:

              let {
                lvl3 :: m1 ()
                lvl3 = pure ww2 () } in
              ww3
                (w4 lvl)
                (\ (z :: Maybe ()) ->
                   case lvl2 of {
                     False -> lvl3;
                     True ->
                       case z of {
                         Nothing -> lvl3;
                         Just ds -> case ds of { () -> lvl1 }
                       }

traverse_:

              let {
                lvl3 :: m1 ()
                lvl3 = pure ww2 () } in
              let {
                lvl4 :: () -> m1 ()
                lvl4 = \ _ -> lvl3 } in
              ww3
                (w4 lvl)
                (\ (z :: Maybe ()) ->
                   case lvl2 of {
                     False -> lvl3;
                     True ->
                       case z of {
                         Nothing -> lvl3;
                         Just x -> ww3 (case x of { () -> lvl1 }) lvl4
                       }

so while the Applicative variant gets desugared to the same pattern match, there is another bind and two matches on (), and everything else is identical.
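To make the extra bind concrete, here is a minimal sketch of the two loops with the effect system stripped away, generic over any `Monad`. `readInput` stands in for `input @(Maybe ())`, and `loopMatch`, `loopTraverse`, and `countInputs` are hypothetical names. In `loopTraverse` the recursive call is followed by one more step (the `lvl4` continuation in the Core), so it is no longer the last action of `go`. Whether those pending steps account for the memory growth is an assumption; nothing in this thread confirms it.

```haskell
import Control.Monad (when)
import Data.Foldable (traverse_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Explicit pattern match: the recursive call is the last action of go.
loopMatch :: Monad m => m (Maybe ()) -> Int -> m ()
loopMatch readInput limit = go (0 :: Int)
  where
    go iteration = do
      a <- readInput
      when (iteration < limit) $
        case a of
          Just () -> go (iteration + 1)
          Nothing -> pure ()

-- | traverse_ variant: same behaviour, but each recursive call is
-- sequenced before a trailing `pure ()`.
loopTraverse :: Monad m => m (Maybe ()) -> Int -> m ()
loopTraverse readInput limit = go (0 :: Int)
  where
    go iteration = do
      a <- readInput
      when (iteration < limit) $
        traverse_ (\() -> go (iteration + 1)) a

-- | Run a loop in IO against an input that always yields Just (),
-- and count how many times the input was read.
countInputs :: (IO (Maybe ()) -> Int -> IO ()) -> Int -> IO Int
countInputs loop limit = do
  ref <- newIORef (0 :: Int)
  loop (modifyIORef' ref (+ 1) >> pure (Just ())) limit
  readIORef ref
```

Both loops read the input the same number of times, so any difference in footprint comes from how the steps are sequenced rather than from what they do.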

Any clue where the difference in the memory footprint comes from, @TheMatten ? Are the ()s building up on the stack?

TheMatten

2020-08-20 20:07:50

Would be interesting to see with profiling enabled

Torsten Schmits

2020-08-20 20:46:57

I tried that a few times before, couldn't make any sense of it. got any tips on how to get meaningful output for this case?

Torsten Schmits

2020-08-23 10:26:55

@TheMatten :red_triangle_up:

TheMatten

2020-08-23 11:15:32