Socket Disconnections

Hello, I’m using Godot with the TS runtime. Some of my players are getting disconnected randomly (via the _on_socket_close event). Is there any way for me to debug this? This is my config:

session:
 single_socket: true
 single_match: true
 single_session: true
 token_expiry_sec: 7200 # 2 hours
 refresh_token_expiry_sec: 86400 # 1 day
 max_request_size_bytes: 6291456
socket:
 max_message_size_bytes: 4096 # reserved buffer
 max_request_size_bytes: 6291456
 write_timeout_ms: 20000
 read_timeout_ms: 20000
 write_wait_ms: 5000

The server is behind a Cloudflare proxy if that helps.

Thanks

The server logs may provide more information on why the clients have been disconnected. That being said, there’s a plethora of reasons that can lead to disconnects, including network issues outside the control of the server.

What kind of logs am I looking for? I don’t see anything out of the ordinary other than “Client connection closed”.

There “may” be some logs around disconnections forced by the server (e.g. the client buffer being full), but if there are no errors, then it’s likely that these are client-driven or network disconnects.

Ok, got it. The issue is that it’s happening with multiple players at random. Is there no parameter I can set on the client to make it wait a bit longer before closing the socket?

You can have a look at the socket ping/pong values: Configuration - Heroic Labs Documentation.

Mind that changing these configs has tradeoffs (see: How to handle MatchLeave Trigger Time - #4 by sesposito) and other considerations.

Ideally the client should also attempt to gracefully handle reconnects if possible.
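To illustrate the reconnect handling, here is a minimal sketch using the JavaScript/TypeScript client (nakama-js), since the thread already involves the TS runtime; the same pattern applies from the Godot client’s socket-closed callback (the _on_socket_close mentioned above). The server key, host, and backoff values are placeholders, and session refresh/re-authentication is omitted for brevity.

import { Client, Session, Socket } from "@heroiclabs/nakama-js";

// Placeholder connection details; substitute your own server key/host/port.
const client = new Client("defaultkey", "127.0.0.1", "7350", false);

// Connect the realtime socket and re-register the disconnect handler each time.
async function connectSocket(session: Session): Promise<Socket> {
  const socket = client.createSocket(false, true); // useSSL, verbose logging
  socket.ondisconnect = (evt) => {
    console.warn("Socket closed, scheduling reconnect:", evt);
    void reconnect(session);
  };
  await socket.connect(session, true); // true = create a status (appear online)
  return socket;
}

// Retry with capped exponential backoff so brief network blips recover quickly
// without hammering the server during longer outages. Any match/chat joins
// need to be re-established after a successful reconnect (not shown here).
async function reconnect(session: Session, attempt = 0): Promise<void> {
  const delayMs = Math.min(1000 * 2 ** attempt, 30000);
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  try {
    await connectSocket(session);
  } catch (err) {
    console.warn("Reconnect attempt failed:", err);
    await reconnect(session, attempt + 1);
  }
}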

Thanks, but as far as I understand this is a server-side setting, right? Would setting a higher value mean the server waits longer before disconnecting them?
From talking to the players, it seems like the disconnect and reconnect happen really quickly, so maybe it’s caused by a latency spike? I even removed the WSS proxy and am sending the traffic straight to Nakama. That seems to have made it better for some players.

It is, but the socket handling by the underlying OS can also have an impact on socket disconnection detection in some scenarios.
I cannot comment on custom infrastructure deployments.

Hello again @sesposito
I noticed these errors in my logs:

{"level":"error","ts":"2025-03-18T03:13:49.407Z","caller":"server/tracker.go:1164","msg":"Failed to deliver presence event","sid":"e10b8a38-0393-11f0-81cd-7d6d9d4af57d","error":"session outgoing queue full"}

What would be causing this? Is it related to the Global Stream I used to track newly joined users? Any way to avoid it?

Around 3000 such logs were printed for 7-8 user IDs within a span of about 2 seconds, and then it never happened again.

It means the clients couldn’t keep up with the number of messages being sent by the server. In this case the server will disconnect the client, because otherwise it would start dropping messages, which can cause issues with the client logic.
This can happen because too many messages are being sent, and/or because the receiver doesn’t have enough bandwidth or a stable enough connection to keep up, or because it isn’t processing the messages fast enough.
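To illustrate the “processing fast enough” part, here is a hedged sketch using the JavaScript/TypeScript client (nakama-js): keep the socket event handlers cheap and buffer incoming events locally, then do the heavy work (UI updates, sorting, etc.) on your own schedule, so the client keeps draining what the server sends. The batchPresenceUpdates helper name and the 250 ms flush interval are hypothetical, not part of the client API.

import type { Socket, StreamPresenceEvent } from "@heroiclabs/nakama-js";

// Hypothetical helper: wire a cheap presence handler onto an existing socket
// and batch the expensive work, so the client keeps reading incoming messages.
export function batchPresenceUpdates(
  socket: Socket,
  apply: (events: StreamPresenceEvent[]) => void,
  flushMs = 250,
): void {
  const pending: StreamPresenceEvent[] = [];

  // Keep the handler trivial: enqueue and return immediately, so the next
  // incoming message can be read without waiting on UI or game logic.
  socket.onstreampresence = (evt) => {
    pending.push(evt);
  };

  // Periodically apply all accumulated joins/leaves in one pass
  // (or drain once per frame in a game loop instead of on a timer).
  setInterval(() => {
    const batch = pending.splice(0, pending.length);
    if (batch.length > 0) {
      apply(batch);
    }
  }, flushMs);
}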

Thank you, I checked my codebase and there’s no place where the outgoing message volume should exceed the buffer. The only thing I can think of is the global stream I use to display the people who join the global chat. Could that be it?

They should be unrelated. If you’ve reduced the buffer size, I recommend you revert to the default. Otherwise, you may have an issue with how the client is handling the messages or how fast it’s going through them.

Hello, I’ve increased the outgoing queue size to 32. I still see errors like these:

{"level":"error","ts":"2025-03-25T01:38:39.241Z","caller":"server/tracker.go:1298","msg":"Failed to deliver presence event","sid":"3bf39f46-0919-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}
{"level":"error","ts":"2025-03-25T01:38:39.262Z","caller":"server/tracker.go:1298","msg":"Failed to deliver presence event","sid":"3bf39f46-0919-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}
{"level":"error","ts":"2025-03-25T01:38:39.278Z","caller":"server/tracker.go:1298","msg":"Failed to deliver presence event","sid":"3bf39f46-0919-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}
{"level":"error","ts":"2025-03-25T01:38:39.281Z","caller":"server/tracker.go:1298","msg":"Failed to deliver presence event","sid":"3bf39f46-0919-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}
{"level":"error","ts":"2025-03-25T01:38:39.299Z","caller":"server/tracker.go:1298","msg":"Failed to deliver presence event","sid":"3bf39f46-0919-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}
{"level":"error","ts":"2025-03-25T01:38:39.342Z","caller":"server/tracker.go:1298","msg":"Failed to deliver presence event","sid":"6d9d7198-090f-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}
{"level":"error","ts":"2025-03-25T01:38:39.342Z","caller":"server/tracker.go:1298","msg":"Failed to deliver presence event","sid":"3bf39f46-0919-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}
{"level":"error","ts":"2025-03-25T01:38:39.497Z","caller":"server/tracker.go:1164","msg":"Failed to deliver presence event","sid":"6d9d7198-090f-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}
{"level":"error","ts":"2025-03-25T01:38:39.497Z","caller":"server/tracker.go:1164","msg":"Failed to deliver presence event","sid":"3bf39f46-0919-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}
{"level":"error","ts":"2025-03-25T01:38:39.673Z","caller":"server/tracker.go:1164","msg":"Failed to deliver presence event","sid":"4b0042c3-0908-11f0-b12c-7d6d9d4af57d","error":"session outgoing queue full"}

The weird thing is that they appear in a burst for a couple of seconds for a handful of clients and then stop. Is this device-related?

Previous response still applies:

This can happen because too many messages are being sent, and/or because the receiver doesn’t have enough bandwidth or a stable enough connection to keep up, or because it isn’t processing the messages fast enough.