Lots of context canceled error after the server run a while

mengxin · November 20, 2024, 10:45am

We are encountering a strange issue with our AWS instance connected to an RDS PostgreSQL database. Since yesterday, our server has been reporting numerous database operation errors, such as:

{"level":"error","ts":"2024-11-20T04:07:44.624Z","caller":"server/core_storage.go:780","msg":"Error deleting storage objects.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:44.626Z","caller":"server/runtime_javascript_logger.go:94","msg":"[PlayerDataUpdateRpc (uid: xxx)] deleteAttributeRequestList get error GoError: failed to remove storage: timeout: context already done: context canceled","rpc_id":"protorpc"}

{"level":"info","ts":"2024-11-20T04:07:45.098Z","caller":"server/session_ws.go:81","msg":"New WebSocket session connected","uid":"xxx","sid":"feed8e5e-a6f4-11ef-a2e2-c8f4650eefa3","format":0}

{"level":"info","ts":"2024-11-20T04:07:45.814Z","caller":"server/session_ws.go:81","msg":"New WebSocket session connected","uid":"222","sid":"ff5abf02-a6f4-11ef-a2e2-c8f4650eefa3","format":0}

{"level":"error","ts":"2024-11-20T04:07:46.861Z","caller":"server/core_storage.go:601","msg":"Error writing storage objects.","error":"context canceled"}

{"level":"info","ts":"2024-11-20T04:07:47.175Z","caller":"server/session_ws.go:520","msg":"Closed client connection","uid":"xxxxx","sid":"53044847-a6f4-11ef-a2e2-c8f4650eefa3"}

{"level":"error","ts":"2024-11-20T04:07:47.647Z","caller":"server/runtime_javascript.go:575","msg":"JavaScript runtime function raised an uncaught exception","mode":"rpc","id":"protorpc","error":"Error: nakama runtime error: GoError: failed to write storage objects: context canceled at call (native)"}

{"level":"info","ts":"2024-11-20T04:07:51.255Z","caller":"server/session_ws.go:520","msg":"Closed client connection","uid":"xxxx","sid":"1bd9c704-a6ea-11ef-a2e2-c8f4650eefa3"}

{"level":"info","ts":"2024-11-20T04:07:51.816Z","caller":"server/session_ws.go:520","msg":"Closed client connection","uid":"xxx","sid":"7db077ba-a6f3-11ef-a2e2-c8f4650eefa3"}

{"level":"error","ts":"2024-11-20T04:07:51.836Z","caller":"server/core_account.go:91","msg":"Error retrieving user account.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:51.836Z","caller":"server/runtime_javascript.go:575","msg":"JavaScript runtime function raised an uncaught exception","mode":"rpc","id":"protorpc","error":"GoError: error getting account: timeout: context already done: context canceled at github.com/heroiclabs/nakama/v3/server.(*runtimeJavascriptNakamaModule).mappings.(*runtimeJavascriptNakamaModule).accountGetId.func38 (native)"}

{"level":"info","ts":"2024-11-20T04:07:52.366Z","caller":"server/session_ws.go:520","msg":"Closed client connection","uid":"1264e9f8","sid":"0340529b-a6f4-11ef-a2e2-c8f4650eefa3"}

{"level":"error","ts":"2024-11-20T04:07:52.926Z","caller":"server/core_storage.go:780","msg":"Error deleting storage objects.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:52.926Z","caller":"server/runtime_javascript_logger.go:94","msg":"[PlayerDataUpdateRpc (uid: 5a4e5bd0)] deleteAttributeRequestList get error GoError: failed to remove storage: timeout: context already done: context canceled","rpc_id":"protorpc"}

{"level":"info","ts":"2024-11-20T04:07:53.906Z","caller":"server/core_authenticate.go:212","msg":"User account is disabled.","deviceID":"642AA140","username":"xKTmoUXVli","create":true}

{"level":"error","ts":"2024-11-20T04:07:56.052Z","caller":"server/core_storage.go:780","msg":"Error deleting storage objects.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:56.054Z","caller":"server/runtime_javascript_logger.go:94","msg":"[PlayerDataUpdateRpc (uid: fa290022)] deleteAttributeRequestList get error GoError: failed to remove storage: timeout: context already done: context canceled","rpc_id":"protorpc"}

{"level":"error","ts":"2024-11-20T04:07:57.901Z","caller":"server/core_storage.go:601","msg":"Error writing storage objects.","error":"context canceled"}

{"level":"error","ts":"2024-11-20T04:07:58.213Z","caller":"server/core_storage.go:591","msg":"Error writing storage objects.","error":"timeout: context already done: context canceled"}

{"level":"error","ts":"2024-11-20T04:07:58.380Z","caller":"server/core_storage.go:601","msg":"Error writing storage objects.","error":"failed to connect to `user=xxx_admin database=xxx_db`:\n\txxx:5432 (xxx.com): dial error: dial tcp 172.1.1.1:5432: operation was canceled\n\t172.1.1.1:5432 (xxx.com): dial error: dial tcp 172.1.1.1:5432: operation was canceled"}

{"level":"error","ts":"2024-11-20T04:07:58.463Z","caller":"server/core_storage.go:601","msg":"Error writing storage objects.","error":"failed to connect to `user=xxx_admin database=xxx_db`:\n\txxx:5432 (xxx.com): dial error: dial tcp 172.1.1.1:5432: operation was canceled\n\t172.1.1.1:5432 (xxx.com): dial error: dial tcp 172.1.1.1:5432: operation was canceled"}

{"level":"error","ts":"2024-11-20T04:07:58.927Z","caller":"server/core_storage.go:591","msg":"Error writing storage objects.","error":"timeout: context already done: context canceled"}

We tried rebuilding the instance, which temporarily resolved the issue, but the errors reappeared after 1-2 hours.

According to this forum thread, the context canceled error is related to client connections being terminated. However, the frequency and scale of the errors are concerning, as almost all players are unable to authenticate with the server.

Could there be any server-side causes for this issue? We would appreciate any insights or guidance to help address the problem.

mengxin · November 20, 2024, 2:13pm

It seems we’ve identified a potential issue:

1.	Performing extensive database operations within a single RPC, such as using nakamaRuntime.nk.storageList to page through multiple records.
2.	All players call this RPC during login.
3.	If many players log in simultaneously, the RPC will be invoked repeatedly, triggering these database operations.
4.	If the RPC times out, the client will terminate the request, causing the server to report a “context canceled” error (is this correct?).

Could someone help confirm if this scenario could lead to the above-mentioned errors?

sesposito · November 20, 2024, 2:21pm

That seems plausible, however I wouldn’t expect to see failed to connect to (...), so I’d have a look at DB stats and internal logs to see if something happened on the database side. Unfortunately there’s not much else we can help with given that this is a custom deployment.

mengxin · November 21, 2024, 9:48am

Thank you! If that’s the case, is there a way to make the server more robust without disrupting the entire system? For instance, the server could implement a timeout mechanism that aligns with the client’s timeout. If a database operation exceeds the timeout, it could be terminated, and the corresponding queue could be cleared to allow other RPC requests to proceed.

sesposito · November 21, 2024, 10:31am

This is already the behavior - if the http request times out the underlying db query is cancelled.

mengxin · November 21, 2024, 3:35pm

if so the issue we encountered is not normal, right? even if the case happens, the Nakama server should still work or recover for further requests and DB operation correct?

sesposito · November 21, 2024, 4:31pm

As I mentioned before, I wouldn’t expect to see db connection errors in the logs, this looks like an issue with the database setup, not Nakama. We’ve haven’t seen such issues on Heroic Cloud to my knowledge, unless there’s a database scheduled maintenance process ongoing.

Topic		Replies	Views
What to do about "timeout: context canceled" in critical after-hook Runtime Framework server-framework , lua , storage	1	1410	April 30, 2022
How to increase time-out of default context for the StorageReadObjects method? Local Setup	2	459	August 13, 2020
Problem With Nakama queries Local Setup	1	502	May 24, 2023
"Error pinging database" on nakama start up but migrate up completes successfully Local Setup getting-started	7	3045	August 29, 2019
Nakama says "fatal" but it's still running Local Setup	2	456	November 6, 2020

Lots of context canceled error after the server run a while

Related topics