Increase in CPU usage over time

Hi,
I host my Nakama server on GCP for a relatively small multiplayer game. It usually hosts around 150 concurrent users. My problem is that CPU usage keeps increasing every day even though the number of players stays almost the same.

When I checked the Nakama admin console, I noticed that once CPU usage has increased, the number of goroutines fluctuates wildly: it jumps from, e.g., 700 to 45000. Please see the screenshots, which were taken 2-3 seconds apart.

Because of the increasing CPU usage over time, I need to restart the server every 4-5 days. After a restart, CPU usage drops from 100% to 7-8% while hosting the same number of players, and then it increases every day until I do another manual restart. You can see the screenshot of CPU usage for the last 5-6 days (there was a manual restart when CPU usage reached 100%).

Screenshots:


I don’t use any plugins or custom code. The server version is: 2.11.1+5a218d6a

Do you know why it is happening?

@briar Can you please upgrade to the 2.12.0 release of the server and report back if you still have issues? Also, please note which features of the game server you use at the moment, e.g. leaderboards, chat, friends, matchmaker, etc.

@novabyte, we only use the matchmaker. We don’t use leaderboards, chat or friends.

@briar I was hoping you’d give a list of the server features you do use, not a list of the ones you don’t use from those I mentioned. Please share that list.

Also share an export of the full server configuration YML from the Dev Console UI.

@novabyte We only use the matchmaker and multiplayer. To be more precise, here are all the APIs I call from my clients:

  • nakamaClient.authenticateDevice()

  • nakamaClient.createSocket()

  • clientSocket.connect()

  • clientSocket.addMatchmaker()

  • clientSocket.joinMatchToken()

  • clientSocket.sendMatchData()

  • clientSocket.leaveMatch()

  • clientSocket.disconnect()

Please see the config.yaml export in the attachment: config.txt (1.8 KB)

Please let me know if you need anything else.

@briar What CPU and RAM do you have allocated to the Nakama server, and do you run the database on a separate server? Do you run a load balancer in front of the server which handles SSL termination?

@novabyte Both the DB and the Nakama server run on the same VM, which has 1 vCPU and 1.7GB RAM. We don’t run a load balancer in front of the server; SSL is terminated on the Nakama server.

Do you have any idea why we are getting this behaviour @novabyte?

@briar Did you upgrade to the latest server release?

@novabyte Sure, I can try. Is there any known performance issue with 2.11? I checked the CHANGELOG and could not see any change related to that.

@briar There are a number of internal dependencies which were updated between releases, but no one particular change that I can be sure will make a difference.

The goroutine pattern you see with the server is likely caused by doing SSL termination in the game server, which is strongly discouraged for a production game. Usually you’d use a dedicated load balancer to manage the SSL negotiation, which also separates the CPU overhead of that work from the game server itself.

Is there a reason you’ve opted to avoid a dedicated load balancer with your game project?
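For readers who want to see what moving SSL termination out of the game server looks like, here is a minimal sketch of a TLS-terminating reverse proxy in Go. It is an illustration only, not a recommended production setup and not part of Nakama. The backend address `http://127.0.0.1:7350`, the listen address, and the `cert.pem`/`key.pem` paths are assumptions, and `newBackendProxy` is a hypothetical helper name. Recent Go versions of `httputil.ReverseProxy` also forward WebSocket upgrade requests.

```go
package main

import (
	"flag"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newBackendProxy builds a reverse proxy that forwards every request to the
// given plain-HTTP backend URL. (Hypothetical helper for this sketch.)
func newBackendProxy(backend string) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(backend)
	if err != nil {
		return nil, err
	}
	return httputil.NewSingleHostReverseProxy(target), nil
}

func main() {
	// All defaults below are assumptions; adjust them to your deployment.
	listen := flag.String("listen", ":443", "address to accept TLS connections on")
	backend := flag.String("backend", "http://127.0.0.1:7350", "plain-HTTP address of the game server")
	cert := flag.String("cert", "cert.pem", "path to the TLS certificate")
	key := flag.String("key", "key.pem", "path to the TLS private key")
	flag.Parse()

	proxy, err := newBackendProxy(*backend)
	if err != nil {
		log.Printf("invalid backend URL: %v", err)
		return
	}

	// TLS handshakes happen in this process; the backend only sees plain HTTP.
	if err := http.ListenAndServeTLS(*listen, *cert, *key, proxy); err != nil {
		log.Printf("proxy stopped: %v", err)
	}
}
```

Note that running a proxy like this on the same 1 vCPU VM moves the handshake work into a separate process but does not add CPU capacity. The separation described above only pays off fully when the terminating component runs on its own machine or is a managed load balancer.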

@novabyte Thank you for the information.

We did not put a load balancer in front of our servers purely to save cost.

If it is the SSL termination, the weird thing is that it gets worse every day, so it feels like something is leaking over time as clients connect and disconnect. I guess something is not being cleaned up properly.
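One way to tell a leak apart from ordinary spikes is to look at the floor of the goroutine count rather than its peaks: if the lowest value in each observation window keeps rising, something is probably not being released. The Go sketch below illustrates that idea under stated assumptions. `windowMinimums` and `baselineRising` are hypothetical helper names, and the demo samples its own process with `runtime.NumGoroutine()`; for a server you do not build yourself, the samples would have to come from whatever figure its console or metrics expose.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// windowMinimums splits samples into consecutive windows of the given size
// and returns the smallest value in each full window. Trailing samples that
// do not fill a whole window are ignored.
func windowMinimums(samples []int, window int) []int {
	if window <= 0 {
		return nil
	}
	var mins []int
	for start := 0; start+window <= len(samples); start += window {
		m := samples[start]
		for _, v := range samples[start+1 : start+window] {
			if v < m {
				m = v
			}
		}
		mins = append(mins, m)
	}
	return mins
}

// baselineRising reports whether each window minimum is strictly higher
// than the one before it, which hints at a leak rather than transient spikes.
func baselineRising(mins []int) bool {
	if len(mins) < 2 {
		return false
	}
	for i := 1; i < len(mins); i++ {
		if mins[i] <= mins[i-1] {
			return false
		}
	}
	return true
}

func main() {
	// Demo only: sample this process's own goroutine count a few times.
	samples := make([]int, 0, 6)
	for i := 0; i < 6; i++ {
		samples = append(samples, runtime.NumGoroutine())
		time.Sleep(10 * time.Millisecond)
	}
	mins := windowMinimums(samples, 3)
	fmt.Println("window minimums:", mins, "baseline rising:", baselineRising(mins))
}
```

In practice the window would span hours or a day rather than milliseconds, so that a single burst of connections does not dominate the result.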

@briar We released 2.13.0 of Nakama this week. Please update to that release and report back whether the issue with SSL is resolved.

@briar Can you please inform us about the current state of your issue and if you managed to solve it by upgrading to 2.13.0?

@briar I wanted to check in on whether you’ve had the chance to update the game server yet?

@novabyte Unfortunately we have not had a chance to upgrade. I will let you know as soon as we upgrade to the latest version.

@briar We’ve made another release of the server since this thread was opened.

I believe it will solve the issue you may have encountered with a memory leak around the SSL negotiation in the gRPC runtime we use in the server. In our latest release we’ve updated that library and the libraries which depend on it. Have a look at the release notes for more information:

https://github.com/heroiclabs/nakama/releases/v2.14.0

Please report back on whether this problem is solved. @Mahdad-Baghani You seemed interested in the issue as well, so please test and let us know.

@novabyte Thank you so much for the information. Hopefully we can test that version soon.

The server is now updated to v2.14.0. We will wait and see…
We will update this thread with our findings.