Nakama Docker works, but after a while the CockroachDB container stops

Nakama in Docker on a DigitalOcean server works well, but after a while the CockroachDB container stops. The Nakama and Prometheus containers keep running.

How can I determine the problem and solve it?

Details: Nakama 3.5, running in Docker.

Hello @ennteresan,

You can start by checking the logs of the container running CockroachDB. The command is docker logs -t <container-name>.
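When the log is long, a small script can pull out just the interesting lines. This is a minimal Python sketch that runs docker logs -t and keeps lines containing a few keywords. The keyword list is an assumption (adapt it to what your CockroachDB version actually prints), and the container name is whatever docker ps -a shows for the database container.

```python
import subprocess

# Assumed keywords; adapt them to the messages your CockroachDB version prints.
DEFAULT_KEYWORDS = ("ERROR", "WARNING", "FATAL")


def filter_log_lines(lines, keywords=DEFAULT_KEYWORDS):
    """Keep only the log lines that contain at least one keyword."""
    return [line for line in lines if any(k in line for k in keywords)]


def container_logs(name):
    """Return the output of `docker logs -t <name>` as a list of lines.

    docker logs replays the container's stdout and stderr on the matching
    streams; stderr is merged into stdout here so both are captured.
    """
    result = subprocess.run(
        ["docker", "logs", "-t", name],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

For example, filter_log_lines(container_logs("<container-name>")) returns only the matching lines, each still prefixed with the timestamp that -t adds.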

I’ve been running a local CockroachDB for my local development server for a while. I’ve noticed that it’s not really that stable and periodically just crashes. Maybe once per week or so. It’s no big deal for me when developing, but I would not deploy a live game on CockroachDB.

I’m running Nakama on Digital Ocean, the default setup using Docker and CockroachDB. I’ve noticed the service stopping too, usually after 24h or so. A restart brings everything back to normal, i.e., connecting to Nakama works again and the permanent authoritative match resumes.

On startup, the cockroach init message says the logs are stored in /var/lib/cockroach/logs, but the “cockroach” directory doesn’t exist on the droplet.

Has anyone been able to solve this, or is there something else that could help figure out why cockroach stops unexpectedly?

P.S. I’m using Cockroach 20.2.19 and Nakama 3.15.0.

Checking the logs doesn’t show anything apart from the startup messages from a day or so ago. No errors.

I’m assuming this happens because Cockroach runs out of resources, but I’d like to know that’s the case before upgrading the server.
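One way to test that assumption before upgrading is to look at how the container exited. If the kernel kills the process for lack of memory, it gets no chance to write anything, which would fit logs that simply stop. The sketch below reads the State fields from docker inspect (Running, OOMKilled, ExitCode, Error) and gives a rough interpretation. The message wording is my own, and exit code 137 only says the process received SIGKILL, not why.

```python
import json
import subprocess


def inspect_state(name):
    """Return the State object that `docker inspect <name>` reports."""
    out = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    # docker inspect prints a JSON array with one object per container.
    return json.loads(out)[0]["State"]


def explain_exit(state):
    """Rough, non-authoritative reading of a container's State fields."""
    if state.get("Running"):
        return "still running"
    if state.get("OOMKilled"):
        return "killed by the kernel OOM killer"
    code = state.get("ExitCode", 0)
    if code == 137:
        # 137 = 128 + 9 (SIGKILL). Memory pressure is one cause, but
        # `docker kill` or a `docker stop` timeout produce the same code.
        return "received SIGKILL (exit 137)"
    return f"exited with code {code}: {state.get('Error') or 'no error recorded'}"
```

Run explain_exit(inspect_state("<container-name>")) right after the container stops and before restarting it, since starting the container again overwrites these fields.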

Update: I upgraded the server to meet the minimum spec of CockroachDB, and it’s stayed up for almost a week with no issues. So it looks like it ran out of resources before, fingers crossed. :crossed_fingers:


I run the Nakama server on a Beelink mini PC (Atom x5-Z8350 CPU, 4 GB RAM) with Ubuntu Server.
Now it’s Prometheus that stops, though not exactly: docker ps shows Nakama and CockroachDB still running with the “healthy” tag, but Prometheus is listed without that tag.
So if it’s still a resource problem, why can’t I see anything in the logs?
After killing the processes and running docker compose up from the YAML file again, it works again.

In my case (see above) I also didn’t get anything in the logs. In case it helps, this is the option I use on Digital Ocean. It’s been up for months without any problems.

I think an Atom x5-Z8350 CPU and 4 GB of RAM should be enough. Maybe my Beelink mini PC has a hardware problem 🤔.
I was testing on a minimal DigitalOcean droplet when I got this problem. If I find a PSU, I will try running the server on a 4th-generation i3 desktop with 2 GB of RAM. Hopefully it will work without problems.

I’m also experiencing a stability issue when deployed on Digital Ocean; however, I’m using the Regular Intel / 2 vCPU / 4 GB configuration, so I doubt it’s a resource issue.

Memory peaks at ~60% usage and goes down to ~20% once cockroach fails. Nakama and prometheus are still running.
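To see whether memory actually climbs toward the limit right before CockroachDB stops, you could sample docker stats periodically and keep the output in a file. A rough sketch, assuming the Docker CLI is on the PATH. MemPerc is relative to each container’s memory limit, which is the host’s total memory when no limit is set.

```python
import subprocess
import time


def parse_mem_perc(output):
    """Turn lines of "<name><TAB><percent>%" into {name: percent}."""
    usage = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, perc = line.split("\t")
        usage[name] = float(perc.strip().rstrip("%"))
    return usage


def sample_memory():
    """One snapshot of memory use (percent of limit) per running container."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.MemPerc}}"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_mem_perc(out)


def watch_memory(interval_s=60):
    """Print a timestamped snapshot every interval; stop with Ctrl+C."""
    while True:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        print(stamp, sample_memory(), flush=True)
        time.sleep(interval_s)
```

Starting the script that calls watch_memory() under nohup keeps it running after you log out; the last lines before the failure show the memory picture at that moment.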

I see a single error message when checking the logs with docker logs -t <container-name>, but there aren’t really any details to dive into.

2023-12-09T19:44:51.310542051Z * ERROR: Queued as error eb0b8525aca94f93894f7b028c951c18
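With so little detail in the error itself, it may help to look at what was logged in the minutes before it. A minimal sketch, assuming lines in the docker logs -t format shown above (an RFC 3339 timestamp, a space, then the message) in chronological order: for each line containing ERROR, it collects the earlier lines containing WARNING within a time window. Both keywords and the 10-minute window are placeholders to adjust.

```python
from datetime import datetime, timedelta


def parse_ts(line):
    """Parse the RFC 3339 timestamp that `docker logs -t` puts before each line.

    Docker prints nanoseconds, but datetime stores microseconds, so the
    fraction is truncated to 6 digits. The result is a naive UTC datetime.
    """
    stamp = line.split(" ", 1)[0].rstrip("Z")
    whole, _, frac = stamp.partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=int((frac + "000000")[:6]))


def warnings_before_errors(lines, window=timedelta(minutes=10),
                           warning="WARNING", error="ERROR"):
    """Pair each error line with the warning lines logged within `window` before it.

    Assumes `lines` are in chronological order, as docker logs prints them.
    """
    seen = []  # (timestamp, line) for every warning so far
    result = []
    for line in lines:
        if warning in line:
            seen.append((parse_ts(line), line))
        elif error in line:
            at = parse_ts(line)
            result.append((line, [w for t, w in seen if at - window <= t <= at]))
    return result
```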

Do you have any configurations out of the ordinary?

I can’t see any error message. Two days ago I ran docker compose up with the YAML file. Today I tried to access Nakama and it didn’t work. Then I connected to the server over SSH and ran docker ps: the containers were still running.
Then I tried to connect to the Nakama web console on port 7351, and now it works. The game works again too.

So what now? How can I determine the problem?

1- Ran docker compose up for Nakama.
2- After a while (I don’t know exactly how long), it stopped working.
3- Just connecting over SSH made it work again.
4- I tried again: after exiting the SSH session, it stopped again.
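Since it’s unclear how long it takes to stop, a small poller on a different machine can record the exact times Nakama goes down and comes back, for example right after an SSH session closes. Running it on the server itself over SSH would keep a session open and could hide the problem. This sketch assumes Nakama’s HTTP API answers GET /healthcheck on the default port 7350; pass the server’s address, e.g. http://<server-ip>:7350/healthcheck.

```python
import http.client
import time
import urllib.request


def is_up(url, timeout_s=5.0):
    """True if `url` answers HTTP 200 within the timeout, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses.
        return False


def watch_health(url, interval_s=30):
    """Print a timestamp only when the server goes DOWN or comes back UP."""
    last = None
    while True:
        up = is_up(url)
        if up != last:
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            print(stamp, "UP" if up else "DOWN", flush=True)
            last = up
        time.sleep(interval_s)
```

Comparing the DOWN timestamps with when you logged out of SSH would show whether the two are linked.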

I used docker compose up with the -d flag so Nakama would run on boot, but it only works while I’m logged in over SSH.

:thinking:

My last problem turned out to be a user/permissions problem. When I was logged in as my user, Nakama kept running, but after the SSH connection closed, Nakama stopped, even though docker ps still showed the containers 🤔.

I ran docker compose up for Nakama again, but this time as root, and now it works without problems.

Just an update from me; this turned out to be an issue with DigitalOcean’s server.

Sometimes in the Docker log I saw a warning about disk slowness detected prior to the error message, so I reached out to their support, who responded with the following:

Upon reviewing the Droplet, we observed some CPU steal occurring at specific timeframes.

I have gone ahead and live-migrated (no downtime) your Droplet to another hypervisor to alleviate any hypervisor-related issues.
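For anyone who wants to check for this themselves: steal time is time the virtual CPU was ready to run while the hypervisor was serving other guests. top and vmstat show it in the st column. The sketch below computes the same figure from two samples of the aggregate cpu line in /proc/stat (Linux only). It does not say how much steal is too much; that threshold is a judgment call.

```python
import time


def read_cpu_times(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat as a list of ints."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")


def steal_percent(before, after):
    """Share of CPU time stolen by the hypervisor between two samples.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    guest/guest_nice are already counted in user/nice, so only the first
    8 fields make up the total.
    """
    delta = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(delta)
    return 100.0 * delta[7] / total if total else 0.0


def sample_steal(interval_s=5.0, path="/proc/stat"):
    """Measure steal over one interval on a Linux host."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return steal_percent(before, after)
```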

Since the migration I have not experienced any issues :tada: