Input memory leak - Polysemy

Welcome to the Functional Programming Zulip Chat Archive. You can join the chat here.

Torsten Schmits
leak :: IO ()
leak =
  runM $ runInputViaStream (Stream.repeat ()) prog
  where
    prog =
      input >> prog

can someone explain why this grows unbounded, and how to avoid it?

Torsten Schmits

I see that it can be avoided by forcing evaluation of the input element, so this test case isn't representative of my real world problem. I'll have to revisit
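Forcing the consumed value is the usual strictness pattern here. A polysemy-free sketch (with made-up names, not the code from the thread) of how a bang pattern stops a thunk chain from accumulating:

```haskell
{-# LANGUAGE BangPatterns #-}

-- consumeLazy retains a growing chain of (acc + x) thunks;
-- consumeStrict forces the accumulator at each step, so it
-- runs in constant space on the accumulator.
consumeLazy :: [Int] -> Int
consumeLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs  -- acc stays unevaluated

consumeStrict :: [Int] -> Int
consumeStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs  -- acc forced each step

main :: IO ()
main = print (consumeLazy (replicate 100000 1), consumeStrict (replicate 100000 1))
```

Both produce the same value; the difference only shows up in heap residency.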

Torsten Schmits

I'm not getting anywhere. One thing that's quite obvious to me: the more interpreters I remove from my problematic program, the smaller the memory growth, going down from 10 MB/s to almost nothing. I assume that the interpreter state is somehow being flooded, but even this trivial recursion grows:

loop ::
  Members '[Embed IO, Input (Maybe Int)] r =>
  Int ->
  Sem r ()
loop n = do
  input >>= \case
    Just !_ | n > 100000 -> do
      printCurrentMemory blah
      loop 0
    Just !_ ->
      loop (n + 1)
    _ ->
      pure ()

runInputList (repeat 100) (loop 0)

In this case, it increases only by 100 bytes per second.
Is this unavoidable, and should it scale that massively with the size of the interpreter stack?

Torsten Schmits

btw profiling shows polysemy as the main cost center

Torsten Schmits

after substituting some *> with bind, the main app now appears to be constant, while the test case still grows. weird

Georgi Lyubenov // googleson78

hm, this reminds me of the disclaimer on parser-combinators, warning that Applicative versions may leak memory, while Monad ones don't

Georgi Lyubenov // googleson78

did you also try using -fexpose-all-unfoldings, or is it unfeasible?

Georgi Lyubenov // googleson78

not sure if it will actually help, just randomly guessing

Torsten Schmits

thanks, I'll give that a look

Torsten Schmits

added that ghc option, appears to have no impact

Torsten Schmits

Torsten Schmits said:

after substituting some *> with bind, the main app now appears to be constant, while the test case still grows. weird

so that was a lie, it just looked fine in htop. measuring inside the program shows that it's growing at 5MB/s with 10 recursions/s

Georgi Lyubenov // googleson78

your current minimal repro is the "loop" thing from above?

Torsten Schmits

not really, I'm struggling to produce a sensible test case. The leak is proportional to the number of interpreters in the stack, so it's very visible in a larger program. But for the thing I posted, even the version with IO grows, about 100 times slower than with Sem (I removed Input and just recursed). I would speculate that it's the calls to getRTSStats, but if I run only those it stays constant.

Maybe it's ghci, I only ran a slightly larger test case as native code. I'm going to do more analysis of my dev environment

Torsten Schmits

finally I have a reliable case where the same semi-dummy recursion with 200k loops runs in constant 500 kB of memory with a small interpreter stack, while the large stack grows to 190 MB. But only when compiled to native code; in ghci they both just grow at different rates :joy: ridiculous

And in ghci, the larger stack is about 10 times faster than native, and grows to 150 MB :dizzy:

Now to find out which interpreter causes all that drama :upside_down:

Georgi Lyubenov // googleson78

could be the machinery itself, there was another space leak issue reported - https://github.com/polysemy-research/polysemy/issues/340

Torsten Schmits

got the fucker!!

in this interpreter, I used

  atomicModify' $ over someLens (items <>)

and apparently that is lazy in the existing value of the field. So I replaced the lens with a function with a strictness annotation and the problem disappeared :big_smile: Both the old value and items are empty lists, btw.
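A base-only sketch of the same failure mode, using IORef as a stand-in for polysemy's AtomicState (this is an analogue, not the thread's actual code): a modify that is lazy in the old value piles up thunks even when both operands are empty lists.

```haskell
import Data.IORef (atomicModifyIORef, atomicModifyIORef', newIORef, readIORef)

main :: IO ()
main = do
  ref <- newIORef ([] :: [()])
  -- lazy: each step wraps the previous thunk in another ([] <>) application
  mapM_ (\_ -> atomicModifyIORef ref (\old -> ([] <> old, ()))) [1 .. 1000 :: Int]
  -- strict: forces the new value to WHNF, so nothing accumulates
  mapM_ (\_ -> atomicModifyIORef' ref (\old -> ([] <> old, ()))) [1 .. 1000 :: Int]
  xs <- readIORef ref
  print (length xs)
```

The final value is the empty list either way; only the strict variant keeps the heap flat while getting there.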

now when running the entire app, the space still grows, but slower, so I assume that there's another one of those rascals lurking somewhere in there.
but at least I'm on the right track now.

thanks for coming to my show! :mic:

Georgi Lyubenov // googleson78

but wait, so it wasn't actually polysemy

Georgi Lyubenov // googleson78

was this the biggest memory leak?

Torsten Schmits

not sure yet, but pretty likely

Torsten Schmits

found another one, this time more obvious: using modify instead of modify'
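The modify vs modify' distinction can be shown without polysemy, using transformers' strict State (an analogue, not the app's code): plain `modify (+ 1)` leaves the state as a chain of unevaluated `(+ 1)` applications, while `modify'` forces it at every step.

```haskell
import Control.Monad.Trans.State.Strict (State, execState, modify')

-- bump n applies (+ 1) n times, forcing the state each step so it
-- stays a plain Int rather than a thunk chain.
bump :: Int -> State Int ()
bump n = mapM_ (\_ -> modify' (+ 1)) [1 .. n]

main :: IO ()
main = print (execState (bump 100000) 0)
```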

Torsten Schmits

so, interestingly, when I run this program as native code with -O2, it uses constant memory, as expected:

memTestIO ::
  Word64 ->
  IO ()
memTestIO base = do
  loop (0 :: Int)
  where
    loop n = do
      when (n > 1000000) $ do
        cur <- currentMemRelIO base
        print cur
      loop (if n > 1000000 then 0 else n + 1)

But, if I replace IO with Sem and use Embed IO or Final IO, the space grows linearly.
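The `currentMemRelIO` helper isn't shown in the thread; a hypothetical reconstruction via GHC.Stats might look like this (live heap bytes relative to a baseline, which requires running with `+RTS -T`, and `-rtsopts` when compiling):

```haskell
import Data.Word (Word64)
import GHC.Stats

-- Hypothetical helper: live bytes after the last GC, minus a baseline.
-- Only meaningful when RTS stats are enabled with +RTS -T.
currentMemRelIO :: Word64 -> IO Word64
currentMemRelIO base = do
  stats <- getRTSStats
  pure (gcdetails_live_bytes (gc stats) - base)

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if enabled
    then print =<< currentMemRelIO 0
    else putStrLn "enable stats with +RTS -T"
```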

Torsten Schmits

also, when running the IO version in ghci, as mentioned before, there's growth.

Torsten Schmits

so I'm wondering, is this related to the issue you linked, @Georgi Lyubenov // googleson78 ?

Georgi Lyubenov // googleson78

I unfortunately have no clue or time to investigate right now :( but it is definitely worth posting about this in that issue, or a new one, because it's a huge effort to track these things down!

Torsten Schmits

I'll do that right away!

Torsten Schmits

I've noticed something that I might be using wrong.
This:

interpretFoo :: Member (Embed IO) r => InterpreterFor Foo r
interpretFoo s = do
  tv <- embed (newTVarIO def)
  runAtomicStateTVar tv $
    interpretFooState $
    raiseUnder s

main1 :: IO ()
main1 =
  runFinal $ interpretOthers $ interpretFoo prog

appears to perform much worse than

main2 :: IO ()
main2 = do
  tv <- newTVarIO def
  runFinal $ interpretOthers $ runAtomicStateTVar tv $ interpretFooState prog

by a factor of 50!

Is that me doing it wrong or is it intended to work that way?

TheMatten

Maybe Core could tell us something
BTW, thanks for investigating this! It seems like all of us are busy right now - I can try to look into this next week personally

Torsten Schmits

small update: I found the worst culprit in my memory problems story, an unbounded queue.
Now I still have a bit of leaking space, but it's relatively small, about 200kB/s with ten threads.
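A minimal sketch of the bounded-queue fix, using stm's TBQueue (which is not necessarily the queue type used in the app): once the queue is full, a producer blocks in STM instead of letting the queue, and the heap, grow without bound.

```haskell
import Control.Concurrent.STM

main :: IO ()
main = do
  q <- newTBQueueIO 16                 -- capacity: 16 items
  atomically (writeTBQueue q "item")   -- would retry (block) when full
  x <- atomically (readTBQueue q)
  putStrLn x
```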

Torsten Schmits

seems I didn't predict that one item per second with a no-op as consumer would be too much for the queue. though maybe the stream that's feeding the queue is somehow being consumed eagerly…