Amazonka/Conduit, running out of file descriptors - Haskell

Welcome to the Functional Programming Zulip Chat Archive.

Magnus Therning

I've run into a problem with running out of file descriptors. The following snippet is a trimmed-down version of what I'm doing:

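{-# LANGUAGE LambdaCase #-}

-- Imports assume the amazonka-1.6 / amazonka-sqs-1.6 module layout.
import           Conduit ((.|))
import qualified Conduit as C
import           Control.Lens ((&), (?~), (^.), (%~), traversed)
import           Control.Monad (void)
import           Control.Monad.Trans.Resource (runResourceT)
import           Data.Maybe (catMaybes)
import qualified Data.Text as T
import           Network.AWS
import           Network.AWS.SQS

main :: IO ()
main = do
  awsEnv <- newEnv Discover
  runAWSCond awsEnv $
    sqsSource queueUrl
      .| C.mapC snd
      .| sqsDeleteSink queueUrl
  where
    -- Run the whole pipeline in a single ResourceT/AWS context in eu-central-1.
    runAWSCond awsEnv = runResourceT . runAWS awsEnv . within Frankfurt . C.runConduit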

sqsSource :: MonadAWS m => T.Text -> C.ConduitT () (T.Text, T.Text) m ()
sqsSource queueUrl = do
  (_, msgs) <- C.lift $ recvSQS queueUrl
  C.yieldMany msgs
  sqsSource queueUrl

sqsDeleteSink :: MonadAWS m => T.Text -> C.ConduitT T.Text o m ()
sqsDeleteSink queueUrl = do
  C.await >>= \case
    Nothing -> pure ()
    Just receiptHandle -> do
      void $ C.lift $ delSQS queueUrl receiptHandle
      sqsDeleteSink queueUrl

recvSQS :: MonadAWS m => T.Text -> m (Int, [(T.Text, T.Text)])
recvSQS queueUrl = do
  let rm = receiveMessage queueUrl & rmMaxNumberOfMessages ?~ 10
  rmrs <- send rm
  let status = rmrs ^. rmrsResponseStatus
      msgs = rmrs ^. rmrsMessages & traversed %~ extract
  pure (status, catMaybes msgs)
  where
    -- Keep only messages that have both a body and a receipt handle.
    extract msg = do
      body <- msg ^. mBody
      rh <- msg ^. mReceiptHandle
      pure (body, rh)

delSQS queueUrl receiptHandle = do
  let dm = deleteMessage queueUrl receiptHandle
  send dm

This works fine for a while, but given a queue with enough messages it will fail with something like

TransportError (HttpExceptionRequest Request {
  host                 = ""
  port                 = 443
  secure               = True
  requestHeaders       = [("Host",""),("X-Amz-Date","20201126T101659Z"),("X-Amz-Content-SHA256","2e4bdf20a857a1416f218b1218670cf019ff53268d0adb34fe06402a62f3271d"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","<REDACTED>")]
  path                 = "/"
  queryString          = ""
  method               = "POST"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 0
  responseTimeout      = ResponseTimeoutMicro 70000000
  requestVersion       = HTTP/1.1
 (ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just "", service name: Just "443"): does not exist (System error)))

After some detours I found out that it's actually not a network issue; the process is running out of file descriptors. Using lsof I can see that it doesn't seem to close _any_ sockets at all; instead, they get stuck in the CLOSE_WAIT state:

wd-stats 88674 magnus   23u  IPv4 815196      0t0  TCP> (CLOSE_WAIT)
wd-stats 88674 magnus   24u  IPv4 811362      0t0  TCP> (CLOSE_WAIT)
wd-stats 88674 magnus   25u  IPv4 811386      0t0  TCP> (CLOSE_WAIT)
wd-stats 88674 magnus   26u  IPv4 813527      0t0  TCP> (CLOSE_WAIT)
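lsof inspects the process from the outside. A rough in-process alternative, assuming Linux (where `/proc/self/fd` has one entry per open descriptor), is to count descriptors before and after an action. The helper names `countOpenFds` and `withFdDelta` are hypothetical, and this sketch does not work on systems without `/proc`.

```haskell
import System.Directory (listDirectory)

-- | Count this process's open file descriptors.
-- Assumption: Linux, where /proc/self/fd has one entry per open descriptor.
-- The count includes the descriptor used to read the directory itself,
-- so compare counts with each other rather than reading them as absolute.
countOpenFds :: IO Int
countOpenFds = length <$> listDirectory "/proc/self/fd"

-- | Run an action and report how many more descriptors are open afterwards.
-- A delta that keeps growing across many requests suggests that sockets
-- are not being closed (for example, left in CLOSE_WAIT).
withFdDelta :: IO a -> IO (a, Int)
withFdDelta act = do
  before <- countOpenFds
  result <- act
  after <- countOpenFds
  pure (result, after - before)
```

Logging `countOpenFds` periodically (via `liftIO`) while the pipeline runs should show whether the count climbs steadily with the number of messages processed, which would match the lsof output.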

Am I using Amazonka and/or Conduit in a way that results in this? How should I use them?

Or is it an issue somewhere "below" my code? If so, what can I do to address that?

Magnus Therning

I just remembered reading something that could be related to this: a blog post about a race condition on Linux. IIRC the author used BPF to instrument the kernel in order to find the cause. Maybe someone here has read the same post?

Magnus Therning

After another night's sleep, something occurred to me: I actually have some Amazonka/Conduit code that _doesn't_ leak FDs that way. So now there's the PR brendanhay/amazonka#608.

I've run into a problem with running out of file descriptors. I suspect that use of Network.AWS.Response.receiveNull results in the program not closing sockets properly. The following snippet i...
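If `receiveNull` is the culprit, the general rule at stake is http-client's: a response must be closed, ideally after its body has been read to the end, or the socket stays open after the server hangs up, which lsof reports as CLOSE_WAIT. The sketch below shows that rule with plain http-client. It is not Amazonka's `receiveNull` and not the change in the PR; the function name `sendIgnoringBody` is hypothetical.

```haskell
import Network.HTTP.Client
  (Manager, Request, brConsume, responseBody, responseStatus, withResponse)
import Network.HTTP.Types.Status (Status)

-- | Send a request whose response body we don't need, but drain it anyway.
-- withResponse closes the response when the callback returns (or throws).
-- In http-client, a fully read body lets the connection go back to the
-- Manager's pool for reuse; an unread one is closed instead of reused.
sendIgnoringBody :: Manager -> Request -> IO Status
sendIgnoringBody mgr req =
  withResponse req mgr $ \resp -> do
    _ <- brConsume (responseBody resp)  -- read the body to EOF
    pure (responseStatus resp)
```

The key design choice is scoping: `withResponse` ties the connection's lifetime to the callback, so nothing depends on an outer scope ending before the socket is released.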