I've been using Polysemy in a larger project and have run into some memory problems. I was able to create a super simple example at https://github.com/jhgarner/Polysemy-Testing where I compare ...
can someone explain why this grows unbounded, and how to avoid it?
I see that it can be avoided by forcing evaluation of the input element, so this test case isn't representative of my real world problem. I'll have to revisit
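To illustrate what "forcing evaluation of the input element" means here, a minimal sketch (not the original repro; the loop and names are made up): a recursion that threads a value through each step, once lazily and once forced with `($!)`:

```haskell
module Main where

-- Hypothetical stand-in for the repro: a recursion threading a value.
-- Passed lazily, the argument grows into a chain of (+) thunks, one
-- layer per iteration, only collapsed when the result is demanded.
loopLazy :: Int -> Int -> Int
loopLazy 0 acc = acc
loopLazy n acc = loopLazy (n - 1) (acc + n)

-- Forcing the argument with ($!) before the recursive call evaluates it
-- at every step, so the loop runs in constant space.
loopStrict :: Int -> Int -> Int
loopStrict 0 acc = acc
loopStrict n acc = loopStrict (n - 1) $! acc + n

main :: IO ()
main = print (loopLazy 100000 0 == loopStrict 100000 0)
```

(With `-O2`, GHC's strictness analysis often fixes the lazy version automatically, which is consistent with the behavior differing between native code and ghci later in this thread.)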
I'm not getting anywhere. One thing that is quite obvious to me: the more interpreters I remove from my problematic program, the smaller the memory growth, going from 10MB/s down to almost nothing. I assume that the interpreter state is somehow being flooded, but even this trivial recursion grows:
In this case, it increases only by 100 bytes per second.
Is this unavoidable, and should it scale that massively with the size of the interpreter stack?
btw profiling shows polysemy as the main cost center
after substituting some `*>` with bind, the main app now appears to be constant, while the test case still grows. weird
hm, this reminds me of the disclaimer on `parser-combinators`, warning that `Applicative` versions may leak memory, while `Monad` ones don't
did you also try using `-fexpose-all-unfoldings`, or is it unfeasible?
not sure if it will actually help, just randomly guessing
thanks, I'll give that a look
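The substitution in isolation, as a sketch (the app's real code isn't shown in this thread): for any law-abiding `Monad`, `a *> b` and `a >>= \_ -> b` agree semantically, so the rewrite is mechanical; what differs is only which instance's machinery runs, which is what the `parser-combinators` caveat is about:

```haskell
module Main where

-- The Applicative sequencing the app used, and the bind it was
-- replaced with. These agree for any law-abiding Monad; only the
-- instance machinery that runs differs.
seqA :: Monad m => m a -> m b -> m b
seqA a b = a *> b

seqM :: Monad m => m a -> m b -> m b
seqM a b = a >>= \_ -> b

main :: IO ()
main = print (seqA (Just 1) (Just 'x') == seqM (Just 1) (Just 'x'))
```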
added that ghc option, appears to have no impact
Torsten Schmits said:
so that was a lie, it just looked fine in htop. measuring inside the program shows that it's growing at 5MB/s with 10 recursions/s
your current minimal repro is the "loop" thing from above?
not really, I'm struggling to produce a sensible test case. The leak is proportional to the number of interpreters in the stack, so it's very visible in a larger program, but for the thing I posted even the version with `IO` grows, about 100 times slower than with `Sem` (I removed `Input` and just recursed). I would speculate that it's the calls to `getRTSStats`, but if I run only those it stays constant.
Maybe it's ghci, I only ran a slightly larger test case as native code. I'm going to do more analysis of my dev environment
finally I have a reliable case where the same semi-dummy recursion with 200k loops runs in a constant 500kB with a small interpreter stack, while with the large stack memory grows up to 190MB. But only when compiled to native code; in ghci both just grow at different rates :joy: ridiculous
And in ghci, the larger stack is about 10 times faster than in native, and grows to 150M :dizzy:
Now to find out which interpreter causes all that drama :upside_down:
could be the machinery itself, there was another space leak issue reported - https://github.com/polysemy-research/polysemy/issues/340
damn
closing in!
got the fucker!!
in this interpreter, I used a lens to update a field, and apparently that update is lazy in the existing value of the field. So I replaced the lens with a function with a strictness annotation and the problem disappeared :big_smile: Both the old value and `items` are empty lists btw.
now when running the entire app, the space still grows, but slower, so I assume that there's another one of those rascals lurking somewhere in there.
but at least I'm on the right track now.
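A sketch of the laziness described above, with made-up state and field names (the actual interpreter state and lens aren't shown in this thread): a plain record update stores the new field value as a thunk over the old one, while a strict update forces it first:

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Made-up stand-in for the interpreter state.
data S = S { items :: [Int], counter :: Int }

-- Lazy field update: each call stores `counter s + 1` as a thunk
-- referencing the previous state, so repeated calls build a chain.
bumpLazy :: S -> S
bumpLazy s = s { counter = counter s + 1 }

-- The fix described above: force the new value before storing it.
bumpStrict :: S -> S
bumpStrict s = let !c = counter s + 1 in s { counter = c }

main :: IO ()
main = do
  let run f = counter (iterate f (S [] 0) !! 10000)
  print (run bumpLazy == run bumpStrict)
```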
thanks for coming to my show! :mic:
nice
but wait, so it wasn't actually polysemy
was this the biggest memory leak?
not sure yet, but pretty likely
found another one, this time more obvious: using `modify` instead of `modify'`
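The difference in isolation, sketched with `Control.Monad.State.Strict` from mtl (whether the message means Polysemy's or mtl's `modify`, the semantics are analogous): `modify` stores the update as an unevaluated thunk over the old state, while `modify'` forces the new state at each step:

```haskell
module Main where

import Control.Monad.State.Strict (execState, modify, modify')

-- `modify (+1)` leaves a chain of unevaluated (+1) thunks in the state;
-- `modify'` forces the new state after each update. Both produce the
-- same result once the final state is demanded.
countLazy :: Int -> Int
countLazy n = execState (mapM_ (\_ -> modify (+ 1)) [1 .. n]) 0

countStrict :: Int -> Int
countStrict n = execState (mapM_ (\_ -> modify' (+ 1)) [1 .. n]) 0

main :: IO ()
main = print (countLazy 10000 == countStrict 10000)
```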
so, interestingly, when I run this program as native code with `-O2`, it uses constant memory, as expected:
But, if I replace `IO` with `Sem` and use `Embed IO` or `Final IO`, the space grows linearly.
also when running the `IO` version in ghci, as said before, there's also growth.
so I'm wondering, is this related to the issue you linked, @Georgi Lyubenov // googleson78 ?
I unfortunately have no clue or time to investigate right now :( but it is definitely worth posting about this in that issue, or a new one, because it's a huge effort to track these things down!
(thanks!!)
I'll do that right away!
small update: I found the worst culprit in my memory problems story, an unbounded queue.
Now I still have a bit of leaking space, but it's relatively small, about 200kB/s with ten threads.
seems I didn't predict that even one item per second, with a no-op as consumer, would be too much for the queue. though maybe the stream feeding the queue is somehow being consumed eagerly…
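One way to rule the queue out as an unbounded buffer (a sketch, assuming the queue lives in `IO`/STM; the actual queue type used isn't shown here): replace it with a bounded `TBQueue` from the `stm` package, so a producer that outpaces the consumer blocks instead of growing the heap. The capacity of 16 is arbitrary:

```haskell
module Main where

import Control.Concurrent.STM
  (TBQueue, atomically, flushTBQueue, newTBQueueIO, writeTBQueue)

main :: IO ()
main = do
  -- Bounded to 16 elements: writeTBQueue blocks once the queue is
  -- full, applying backpressure to the producer instead of letting
  -- the heap grow.
  q <- newTBQueueIO 16 :: IO (TBQueue Int)
  mapM_ (atomically . writeTBQueue q) [1 .. 10]
  xs <- atomically (flushTBQueue q)
  print (sum xs)
```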