I've run into a problem with running out of file descriptors. The following snippet is a trimmed-down version of what I'm doing:
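```haskell
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}

import Conduit ((.|))
import qualified Conduit as C
import Control.Lens ((%~), (&), (?~), (^.), traversed)
import Control.Monad (void)
import Data.Maybe (catMaybes)
import qualified Data.Text as T
import Network.AWS
import Network.AWS.SQS

-- Placeholder: the real queue URL is not part of this snippet.
queueUrl :: T.Text
queueUrl = "https://sqs.eu-central-1.amazonaws.com/..."

main :: IO ()
main = do
  awsEnv <- newEnv Discover
  runAWSCond awsEnv $
    sqsSource queueUrl
      .| C.mapC snd
      .| sqsDeleteSink queueUrl
  where
    runAWSCond awsEnv = runResourceT . runAWS awsEnv . within Frankfurt . C.runConduit

-- Poll the queue forever, yielding (body, receipt handle) pairs.
sqsSource :: MonadAWS m => T.Text -> C.ConduitT () (T.Text, T.Text) m ()
sqsSource queueUrl = do
  (_, msgs) <- C.lift $ recvSQS queueUrl
  C.yieldMany msgs
  sqsSource queueUrl

-- Delete every receipt handle that arrives from upstream.
sqsDeleteSink :: MonadAWS m => T.Text -> C.ConduitT T.Text o m ()
sqsDeleteSink queueUrl =
  C.await >>= \case
    Nothing -> pure ()
    Just receiptHandle -> do
      void $ C.lift $ delSQS queueUrl receiptHandle
      sqsDeleteSink queueUrl

-- Receive up to 10 messages; drop any that lack a body or receipt handle.
recvSQS :: MonadAWS m => T.Text -> m (Int, [(T.Text, T.Text)])
recvSQS queueUrl = do
  let rm = receiveMessage queueUrl & rmMaxNumberOfMessages ?~ 10
  rmrs <- send rm
  let status = rmrs ^. rmrsResponseStatus
      msgs = rmrs ^. rmrsMessages & traversed %~ extract
  pure (status, catMaybes msgs)
  where
    extract msg = do
      body <- msg ^. mBody
      rh <- msg ^. mReceiptHandle
      pure (body, rh)

delSQS :: MonadAWS m => T.Text -> T.Text -> m DeleteMessageResponse
delSQS queueUrl receiptHandle = do
  let dm = deleteMessage queueUrl receiptHandle
  send dm
```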
This works fine for a while, but given a queue with enough messages it fails with something like:
```
TransportError (HttpExceptionRequest Request {
  host                 = "sqs.eu-central-1.amazonaws.com"
  port                 = 443
  secure               = True
  requestHeaders       = [("Host","sqs.eu-central-1.amazonaws.com"),("X-Amz-Date","20201126T101659Z"),("X-Amz-Content-SHA256","2e4bdf20a857a1416f218b1218670cf019ff53268d0adb34fe06402a62f3271d"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","<REDACTED>")]
  path                 = "/"
  queryString          = ""
  method               = "POST"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 0
  responseTimeout      = ResponseTimeoutMicro 70000000
  requestVersion       = HTTP/1.1
}
 (ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just "sqs.eu-central-1.amazonaws.com", service name: Just "443"): does not exist (System error)))
```
After some detours I found out that it's actually not a network issue; the process simply runs out of file descriptors. Using `lsof` I can see that it doesn't seem to close _any_ sockets at all; instead they get stuck in the `CLOSE_WAIT` state, i.e. the server has closed its end of the connection but the process never closes its own:
```
COMMAND  PID   USER   FD  TYPE DEVICE SIZE/OFF NODE NAME
wd-stats 88674 magnus 23u IPv4 815196 0t0      TCP  ip-192-168-0-9.eu-central-1.compute.internal:60624->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus 24u IPv4 811362 0t0      TCP  ip-192-168-0-9.eu-central-1.compute.internal:43482->52.119.189.184:https (CLOSE_WAIT)
wd-stats 88674 magnus 25u IPv4 811386 0t0      TCP  ip-192-168-0-9.eu-central-1.compute.internal:60628->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus 26u IPv4 813527 0t0      TCP  ip-192-168-0-9.eu-central-1.compute.internal:43486->52.119.189.184:https (CLOSE_WAIT)
...
```
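One way to watch the descriptor count from inside the process, and to see how close it is to the limit, is a small probe like the following. This is a minimal sketch, assuming Linux (it reads `/proc/self/fd`) and the `conduit`, `directory` and `unix` packages; `countOpenFds`, `openFileSoftLimit` and `logOpenFdsEvery` are hypothetical helpers, not part of Amazonka or Conduit.

```haskell
-- Linux-only sketch (assumes /proc/self/fd exists); needs the conduit,
-- directory and unix packages. All three helpers are hypothetical.
import Conduit (ConduitT, await, runConduit, sinkList, yield, yieldMany, (.|))
import Control.Monad (when)
import Control.Monad.IO.Class (MonadIO, liftIO)
import System.Directory (listDirectory)
import System.Posix.Resource
  (Resource (ResourceOpenFiles), ResourceLimit (..), ResourceLimits (..), getResourceLimit)

-- Number of entries in /proc/self/fd. The directory stream opened to list
-- it may itself be counted, so treat the result as approximate.
countOpenFds :: IO Int
countOpenFds = length <$> listDirectory "/proc/self/fd"

-- Soft RLIMIT_NOFILE (what `ulimit -n` shows), or Nothing if unlimited/unknown.
openFileSoftLimit :: IO (Maybe Integer)
openFileSoftLimit = do
  limits <- getResourceLimit ResourceOpenFiles
  pure $ case softLimit limits of
    ResourceLimit n -> Just n
    _ -> Nothing

-- Pass-through stage: forwards every element unchanged and prints the
-- current fd count after every n-th element (n must be positive).
logOpenFdsEvery :: MonadIO m => Int -> ConduitT a a m ()
logOpenFdsEvery n = go (0 :: Int)
  where
    go i = do
      next <- await
      case next of
        Nothing -> pure ()
        Just x -> do
          let i' = i + 1
          when (i' `mod` n == 0) $ liftIO $ do
            fds <- countOpenFds
            putStrLn ("after " ++ show i' ++ " messages: " ++ show fds ++ " open fds")
          yield x
          go i'

main :: IO ()
main = do
  fds <- countOpenFds
  limit <- openFileSoftLimit
  putStrLn ("open fds: " ++ show fds ++ ", soft limit: " ++ show limit)
  xs <- runConduit (yieldMany [1 .. 10 :: Int] .| logOpenFdsEvery 5 .| sinkList)
  print xs
```

Spliced into the original pipeline, e.g. `sqsSource queueUrl .| C.mapC snd .| logOpenFdsEvery 100 .| sqsDeleteSink queueUrl`, a count that keeps rising in step with the handled messages would be consistent with the `lsof` output, while a count that levels off would suggest the leak lies elsewhere.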
Am I using Amazonka and/or Conduit in a way that results in this? How should I use them?
Or is it an issue somewhere "below" my code? What can I do to address that?
I just remembered reading something that could be related: a blog post about a race condition on Linux. IIRC the author used BPF to instrument the kernel in order to find the cause. Maybe someone here has read the same post?
After another night's sleep something occurred to me: I actually have some Amazonka/Conduit code that _doesn't_ leak FDs this way. So now there's a PR, brendanhay/amazonka#608.
I suspect that the use of `Network.AWS.Response.receiveNull` results in the program not closing sockets properly.