Hi,
I host my Nakama server on GCP for a relatively small multiplayer game. It usually hosts around 150 concurrent users. My problem is that CPU usage keeps increasing every day even though the number of players stays almost the same.
When I checked the Nakama admin console, I noticed that when CPU usage is high, the number of goroutines fluctuates wildly, jumping from e.g. 700 to 45000. Please see the screenshots, which were taken 2-3 seconds apart.
Because CPU usage increases over time, I need to restart the server every 4-5 days. After a restart it drops from 100% to 7-8% with the same number of players, and then it climbs every day until I do another manual restart. You can see the screenshot of CPU usage for the last 5-6 days (there was a manual restart when CPU usage reached 100%).
@briar Can you please upgrade to the 2.12.0 release of the server and report back if you still have issues? Also, please leave a note on which features of the game server you use at the moment, e.g. leaderboards, chat, friends, matchmaker, etc.
@briar I was hoping you’d list the features of the server you do use, not the ones you don’t use from those I mentioned. Please share that list.
Also share an export of the full server configuration YML from the Dev Console UI.
@briar How much CPU and RAM have you allocated to the Nakama server, and do you run the database on a separate server? Do you run a load balancer in front of the server to handle SSL termination?
@novabyte Both the DB and the Nakama server run on the same VM, which has 1 vCPU and 1.7GB RAM. We don’t run a load balancer in front of the server; SSL is terminated on the Nakama server.
@briar A number of internal dependencies were updated between releases, but there is no one particular change that I can be sure will make a difference.
The goroutine pattern you see is likely caused by doing SSL termination in the game server, which is strongly discouraged for a production game. Usually you’d use a dedicated load balancer to manage the SSL negotiation, which also separates the CPU overhead of that work from the game server itself.
Is there a reason you’ve opted to avoid a dedicated load balancer with your game project?
We did not put a load balancer in front of our servers purely to save costs.
If it is the SSL termination, the weird thing is that it gets worse every day, so it feels like something is leaking over time as clients connect and disconnect. Something is not being cleaned up properly, I guess.
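One rough way to tell a leak apart from ordinary bursts is to compare the low points of the goroutine count over time rather than the peaks: spikes to 45000 can come and go, but if the quiet-period floor keeps rising, something is probably not being released. The sketch below is only an illustration of that idea. `baselineGrowing`, `minOf`, the tolerance, and the sample numbers are made up for this post and are not part of Nakama; the counts would have to be read from the console or from whatever metrics you export.

```go
package main

import "fmt"

// minOf returns the smallest value in xs. It assumes xs is non-empty.
func minOf(xs []int) int {
	m := xs[0]
	for _, x := range xs[1:] {
		if x < m {
			m = x
		}
	}
	return m
}

// baselineGrowing reports whether the lowest goroutine count seen in the
// second half of samples exceeds the lowest count in the first half by more
// than tolerance. Peaks are ignored on purpose: only the floor is compared.
// With fewer than four samples it returns false, since there is too little
// data to compare two halves meaningfully.
func baselineGrowing(samples []int, tolerance int) bool {
	if len(samples) < 4 {
		return false
	}
	mid := len(samples) / 2
	return minOf(samples[mid:])-minOf(samples[:mid]) > tolerance
}

func main() {
	// Hypothetical readings, oldest first. Both series spike heavily,
	// but only the first one has a rising floor.
	risingFloor := []int{700, 45000, 720, 900, 44000, 950}
	stableFloor := []int{700, 45000, 720, 710, 44000, 705}

	fmt.Println("rising floor, leak suspected:", baselineGrowing(risingFloor, 100))
	fmt.Println("stable floor, leak suspected:", baselineGrowing(stableFloor, 100))
}
```

Comparing minimums instead of averages keeps the burst traffic from hiding a slow climb, at the cost of being sensitive to a single unusually quiet sample.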
@briar We’ve made another release of the server since this thread was opened.
I believe it will solve the issue you may have encountered, a memory leak around the SSL negotiation in the gRPC runtime we use in the server. We’ve updated that library, and the libraries which depend on it, in our latest release. Have a look at the release notes for more information: