Lots of context canceled error after the server run a while

We are encountering a strange issue with our AWS instance connected to an RDS PostgreSQL database. Since yesterday, our server has been reporting numerous database operation errors, such as:

{"level":"error","ts":"2024-11-20T04:07:44.624Z","caller":"server/core_storage.go:780","msg":"Error deleting storage objects.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:44.626Z","caller":"server/runtime_javascript_logger.go:94","msg":"[PlayerDataUpdateRpc (uid: xxx)] deleteAttributeRequestList get error GoError: failed to remove storage: timeout: context already done: context canceled","rpc_id":"protorpc"}

{"level":"info","ts":"2024-11-20T04:07:45.098Z","caller":"server/session_ws.go:81","msg":"New WebSocket session connected","uid":"xxx","sid":"feed8e5e-a6f4-11ef-a2e2-c8f4650eefa3","format":0}

{"level":"info","ts":"2024-11-20T04:07:45.814Z","caller":"server/session_ws.go:81","msg":"New WebSocket session connected","uid":"222","sid":"ff5abf02-a6f4-11ef-a2e2-c8f4650eefa3","format":0}

{"level":"error","ts":"2024-11-20T04:07:46.861Z","caller":"server/core_storage.go:601","msg":"Error writing storage objects.","error":"context canceled"}

{"level":"info","ts":"2024-11-20T04:07:47.175Z","caller":"server/session_ws.go:520","msg":"Closed client connection","uid":"xxxxx","sid":"53044847-a6f4-11ef-a2e2-c8f4650eefa3"}

{"level":"error","ts":"2024-11-20T04:07:47.647Z","caller":"server/runtime_javascript.go:575","msg":"JavaScript runtime function raised an uncaught exception","mode":"rpc","id":"protorpc","error":"Error: nakama runtime error: GoError: failed to write storage objects: context canceled at call (native)"}

{"level":"info","ts":"2024-11-20T04:07:51.255Z","caller":"server/session_ws.go:520","msg":"Closed client connection","uid":"xxxx","sid":"1bd9c704-a6ea-11ef-a2e2-c8f4650eefa3"}

{"level":"info","ts":"2024-11-20T04:07:51.816Z","caller":"server/session_ws.go:520","msg":"Closed client connection","uid":"xxx","sid":"7db077ba-a6f3-11ef-a2e2-c8f4650eefa3"}

{"level":"error","ts":"2024-11-20T04:07:51.836Z","caller":"server/core_account.go:91","msg":"Error retrieving user account.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:51.836Z","caller":"server/runtime_javascript.go:575","msg":"JavaScript runtime function raised an uncaught exception","mode":"rpc","id":"protorpc","error":"GoError: error getting account: timeout: context already done: context canceled at github.com/heroiclabs/nakama/v3/server.(*runtimeJavascriptNakamaModule).mappings.(*runtimeJavascriptNakamaModule).accountGetId.func38 (native)"}

{"level":"info","ts":"2024-11-20T04:07:52.366Z","caller":"server/session_ws.go:520","msg":"Closed client connection","uid":"1264e9f8","sid":"0340529b-a6f4-11ef-a2e2-c8f4650eefa3"}

{"level":"error","ts":"2024-11-20T04:07:52.926Z","caller":"server/core_storage.go:780","msg":"Error deleting storage objects.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:52.926Z","caller":"server/runtime_javascript_logger.go:94","msg":"[PlayerDataUpdateRpc (uid: 5a4e5bd0)] deleteAttributeRequestList get error GoError: failed to remove storage: timeout: context already done: context canceled","rpc_id":"protorpc"}

{"level":"info","ts":"2024-11-20T04:07:53.906Z","caller":"server/core_authenticate.go:212","msg":"User account is disabled.","deviceID":"642AA140","username":"xKTmoUXVli","create":true}

{"level":"error","ts":"2024-11-20T04:07:56.052Z","caller":"server/core_storage.go:780","msg":"Error deleting storage objects.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:56.054Z","caller":"server/runtime_javascript_logger.go:94","msg":"[PlayerDataUpdateRpc (uid: fa290022)] deleteAttributeRequestList get error GoError: failed to remove storage: timeout: context already done: context canceled","rpc_id":"protorpc"}

{"level":"error","ts":"2024-11-20T04:07:57.901Z","caller":"server/core_storage.go:601","msg":"Error writing storage objects.","error":"context canceled"}

{"level":"error","ts":"2024-11-20T04:07:58.213Z","caller":"server/core_storage.go:591","msg":"Error writing storage objects.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:58.380Z","caller":"server/core_storage.go:601","msg":"Error writing storage objects.","error":"failed to connect to `user=xxx_admin database=xxx_db`:\n\txxx:5432 (xxx.com): dial error: dial tcp 172.1.1.1:5432: operation was canceled\n\t172.1.1.1:5432 (xxx.com): dial error: dial tcp 172.1.1.1:5432: operation was canceled"}

{"level":"error","ts":"2024-11-20T04:07:58.463Z","caller":"server/core_storage.go:601","msg":"Error writing storage objects.","error":"failed to connect to `user=xxx_admin database=xxx_db`:\n\txxx:5432 (xxx.com): dial error: dial tcp 172.1.1.1:5432: operation was canceled\n\t172.1.1.1:5432 (xxx.com): dial error: dial tcp 172.1.1.1:5432: operation was canceled"}

{"level":"error","ts":"2024-11-20T04:07:58.927Z","caller":"server/core_storage.go:591","msg":"Error writing storage objects.","error":"timeout: context already done: context canceled"}

We tried rebuilding the instance, which temporarily resolved the issue, but the errors reappeared after 1-2 hours.

According to this forum thread, the context canceled error is related to client connections being terminated. However, the frequency and scale of the errors are concerning, as almost all players are unable to authenticate with the server.

Could there be any server-side causes for this issue? We would appreciate any insights or guidance to help address the problem.

It seems we’ve identified a potential issue:

1.	Performing extensive database operations within a single RPC, such as using nakamaRuntime.nk.storageList to page through multiple records.
2.	All players call this RPC during login.
3.	If many players log in simultaneously, the RPC will be invoked repeatedly, triggering these database operations.
4.	If the RPC times out, the client will terminate the request, causing the server to report a “context canceled” error (is this correct?).

Could someone help confirm if this scenario could lead to the above-mentioned errors?

That seems plausible, however I wouldn’t expect to see failed to connect to (...), so I’d have a look at DB stats and internal logs to see if something happened on the database side. Unfortunately there’s not much else we can help with given that this is a custom deployment.

1 Like

Thank you! If that’s the case, is there a way to make the server more robust without disrupting the entire system? For instance, the server could implement a timeout mechanism that aligns with the client’s timeout. If a database operation exceeds the timeout, it could be terminated, and the corresponding queue could be cleared to allow other RPC requests to proceed.

This is already the behavior - if the http request times out the underlying db query is cancelled.

if so the issue we encountered is not normal, right? even if the case happens, the Nakama server should still work or recover for further requests and DB operation correct?

As I mentioned before, I wouldn’t expect to see db connection errors in the logs, this looks like an issue with the database setup, not Nakama. We’ve haven’t seen such issues on Heroic Cloud to my knowledge, unless there’s a database scheduled maintenance process ongoing.