Scaling Authoritative Matches for 200 Players with High Tick Rate: Best Practices and Configuration Tips

Hi everyone,

I’m working on a multiplayer game using Nakama where I aim to host authoritative matches with around 200 clients, all sharing real-time data like player positions and rotations. In addition to players, there are multiple objects in the game world that also need to share their positions and rotations with all clients.

Current Setup:

  • Authoritative Matches: I’m using authoritative matches that hold a game state. The game state includes information about all players and objects, structured like this:
```js
{
  label: {},
  transforms: {
    object1: {
      position: { x, y, z },
      rotation: { x, y, z }
    },
    object2: {
      position: { x, y, z },
      rotation: { x, y, z }
    },
    // ...more objects
  },
  players: {
    player1: {
      position: { x, y, z },
      rotation: { x, y, z }
    },
    // ...more players
  }
}
```
  • Message Handling: Player and object updates each have their own opcode, with a JSON payload like this:

```json
{
  "timeStamp": "2024-11-08T15:24:05.226497+01:00",
  "UserId": "1746c806-c306-4e81-a728-a4fda6a7aead",
  "position": { "x": 0, "y": 0, "z": 0 },
  "rotation": { "x": 0, "y": 0, "z": 0 }
}
```
  • Tick Rate: For my game design, I need a high refresh rate and I’m aiming for a tick rate of 20, updating player and object positions 20 times per second.
  • Match Loop: My match loop is based on a tutorial I watched. I loop through the incoming messages, switch on the opcode, and only parse the JSON payload when necessary. After processing all incoming messages, I compute the latest transforms, broadcast the new state to all clients with dispatcher.broadcastMessageDeferred(), and return the updated state (a sketch follows below).
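For reference, here is a minimal sketch of that loop shape using the Nakama JS runtime's matchLoop signature. The opcode values, the state fields, and the exact decode step for message.data are placeholders/assumptions, not the real values from my project:

```typescript
// Sketch of the match loop described above (registered via
// initializer.registerMatch in InitModule, not shown). Opcodes and state
// fields are placeholders.
const OP_PLAYER_TRANSFORM = 1; // hypothetical opcode for client transform updates
const OP_STATE_UPDATE = 2;     // hypothetical opcode for the per-tick state broadcast

function matchLoop(
  ctx: nkruntime.Context,
  logger: nkruntime.Logger,
  nk: nkruntime.Nakama,
  dispatcher: nkruntime.MatchDispatcher,
  tick: number,
  state: nkruntime.MatchState,
  messages: nkruntime.MatchMessage[]
): { state: nkruntime.MatchState } | null {
  // Fold every incoming message into the shared state, parsing JSON only
  // for opcodes that carry a transform payload.
  for (const message of messages) {
    switch (message.opCode) {
      case OP_PLAYER_TRANSFORM: {
        // In recent runtime versions message.data is an ArrayBuffer; older
        // versions deliver a string, so adjust the decode step to your version.
        const payload = JSON.parse(nk.binaryToString(message.data));
        state.players[message.sender.userId] = {
          position: payload.position,
          rotation: payload.rotation,
        };
        break;
      }
      default:
        break;
    }
  }

  // One deferred broadcast per tick with the freshly merged transforms.
  dispatcher.broadcastMessageDeferred(
    OP_STATE_UPDATE,
    JSON.stringify({ transforms: state.transforms, players: state.players })
  );

  return { state };
}
```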

Problem:

Despite my efforts, I can currently have a maximum of about 15 players in a session who can send and receive game state at a rate of 20 times per second. Beyond that, connections drop, messages can’t be delivered, and socket clients get disconnected.

Current Configuration:

Here is my Nakama configuration:

```yaml
# name: nakama_de
data_dir: "./data/"

api:
  http_port: 7350
  address: "0.0.0.0"

logger:
  file: "/nakama/logs/logfile.log"
  max_size: 100
  max_age: 30
  rotation: true

runtime:
  js_entrypoint: "build/index.js"
  env:
    - NAKAMA_CORS_ALLOW_ORIGINS=*
    - NAKAMA_CORS_ALLOW_METHODS=GET,POST,PUT,DELETE,OPTIONS
    - NAKAMA_CORS_ALLOW_HEADERS=Authorization,Content-Type

session:
  token_expiry_sec: 7200 # 2 hours

match:
  input_queue_size: 8192
  call_queue_size: 8192
  deferred_queue_size: 8192

socket:
  address: "0.0.0.0"
  max_message_size_bytes: 262144 # Increased buffer for larger messages
  max_request_size_bytes: 262144 # Larger buffer to handle bigger requests
  read_buffer_size_bytes: 262144 # Larger read buffer
  write_buffer_size_bytes: 262144 # Larger write buffer
  outgoing_queue_size: 8192 # Higher queue size to handle more messages per session

  # Connection lifecycle settings
  idle_timeout_ms: 120000 # Default: 60000
  ping_period_ms: 100000 # Default: 15000
  pong_wait_ms: 120000 # Default: 25000
  write_timeout_ms: 120000 # Default: 10000
  read_timeout_ms: 120000 # Default: 10000
  write_wait_ms: 120000 # Default: 5000

console:
  port: 7351
  address: "0.0.0.0"
```

Questions:

  1. Configuration Parameters: Are there specific configuration parameters I could adjust further to support a higher number of concurrent players at a high tick rate?
  2. Best Practices for High Player Count: What are your experiences with accommodating a high number of players in an authoritative match with a complex shared state? Are there patterns or optimizations I should consider?
  3. Optimizing Message Handling: Is my approach to handling incoming messages and broadcasting updates efficient for this scale? Would it be better to batch updates differently or use a different message format to reduce overhead?
  4. Scaling Strategies: Should I consider sharding the game state or distributing the load across multiple matches or servers? If so, how can that be achieved with Nakama?

Additional Context:

  • The server is running on a machine with sufficient resources (CPU, RAM, bandwidth). Please let me know if hardware specs would be helpful.
  • I’m using the JavaScript runtime for Nakama.
  • Clients are connected via WebSockets.
  • I’m open to using other data formats (e.g., Protobuf) if it would improve performance.

Any insights, experiences, or suggestions would be greatly appreciated!

Hello @exot,

What errors are you observing on the server logs when connections are dropped?

Hello @sesposito

When clients are disconnected, I primarily see:

2024-11-19 17:39:42 {"level":"warn","ts":"2024-11-19T16:39:42.382Z","caller":"server/match_handler.go:277","msg":"Match handler data processing too slow, dropping data message","mid":"ead8effc-90e5-4a92-9c7e-7e6c3857ba4a","m":{"UserID":"778f1b22-df86-48e9-b1c0-43f512906605","SessionID":"d9ef957f-a694-11ef-95f8-5fac628c8b0d","Username":"K6_r0rq74ro","Node":"nakama_dev","OpCode":21,"Data":"eyJ0aW1lU3RhbXAiOjE3MzIwMzQzODIwMDAsIlVzZXJJZCI6Iks2X3IwcnE3NHJvIn0=","Reliable":false,"ReceiveTime":1732034382379}}

2024-11-19 17:39:41 {"level":"warn","ts":"2024-11-19T16:39:41.128Z","caller":"server/session_ws.go:334","msg":"Could not write message","uid":"6687b636-f966-4583-a969-df3969aca3a1","sid":"023d5a07-a694-11ef-95f8-5fac628c8b0d","error":"websocket: close sent"}

I am running in Docker on my MacBook Pro M3, with 12 CPU cores and 10 GB of RAM allocated to the Docker instance.

@exot the volume of messages is too big; there are two problems here. First, the match handler can't work through the received messages fast enough, so messages pile up in the input queue until it reaches maximum capacity, and incoming messages are then dropped until the buffer has capacity again. Second, the client cannot handle the volume of messages being sent by the server; in this scenario the server drops the connection, as otherwise it would have to drop outgoing messages.

I think the first step you need to take is to reduce the number of positional updates sent by the clients and by the server; coalescing the updates before sending them out may help with this.
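For illustration, a minimal client-side coalescing sketch using the @heroiclabs/nakama-js Socket API: the game loop only records the latest transform, and a timer flushes it to the match at a fixed rate. The opcode and the 100 ms interval are made-up values.

```typescript
import { Socket } from "@heroiclabs/nakama-js";

// Hypothetical opcode for client transform updates; must match the server.
const OP_PLAYER_TRANSFORM = 1;

interface Transform {
  position: { x: number; y: number; z: number };
  rotation: { x: number; y: number; z: number };
}

// Only the most recent transform is kept; intermediate updates are discarded.
let pendingTransform: Transform | null = null;

// Call this from the game loop as often as you like; it only records state.
function queueTransform(t: Transform): void {
  pendingTransform = t;
}

// Flush at most 10 messages per second to the match, regardless of frame rate.
function startSending(socket: Socket, matchId: string): void {
  setInterval(() => {
    if (!pendingTransform) {
      return;
    }
    socket.sendMatchState(matchId, OP_PLAYER_TRANSFORM, JSON.stringify(pendingTransform));
    pendingTransform = null;
  }, 100);
}
```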

Using protobuf may further reduce the overhead of message encoding/decoding as well as the message size.
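To illustrate the size difference (this is not protobuf, just a hand-rolled fixed layout that both sides would have to agree on), a single transform can be packed into 24 bytes, versus well over 100 bytes for the JSON payload shown earlier:

```typescript
// Fixed binary layout for a single transform: 6 x 32-bit floats = 24 bytes.
// Not protobuf; it only illustrates how much a binary encoding can save
// compared to JSON. The field order is an arbitrary convention that client
// and server would need to share.
interface Vec3 { x: number; y: number; z: number; }

function encodeTransform(position: Vec3, rotation: Vec3): ArrayBuffer {
  const buffer = new ArrayBuffer(24);
  const view = new DataView(buffer);
  [position.x, position.y, position.z, rotation.x, rotation.y, rotation.z]
    .forEach((value, i) => view.setFloat32(i * 4, value, true)); // little-endian
  return buffer;
}

function decodeTransform(buffer: ArrayBuffer): { position: Vec3; rotation: Vec3 } {
  const view = new DataView(buffer);
  const f = (i: number) => view.getFloat32(i * 4, true);
  return {
    position: { x: f(0), y: f(1), z: f(2) },
    rotation: { x: f(3), y: f(4), z: f(5) },
  };
}
```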

Another optimization could be to move to the Go runtime, as it removes the overhead of the JS interpreter and speeds things up further.

Hope this helps.

@sesposito Is there a way for us devs to know, when we see a log message like that, that this is what it comes down to?
Would it be possible to add extra logs to the console explaining that the queue is full or that match processing is too slow? That could be useful, as many folks will probably hit this error at some point.

We’ll see if we can make the error messages a bit more explicit or easier to understand.

Hello @sesposito thank you for those explanations.

I understand that the server can't keep up with the volume of messages. Would more CPU power help in this case? More CPU cores? I could get my hands on powerful servers with the latest Ryzen CPUs. Would that help? Does Nakama make use of multiple CPU cores/threads for the same match?

You say the clients are overwhelmed by the amount of messages, but I am sending only one message from the server per tick. I collect all the incoming transforms, calculate the latest state of the transforms, and then send out just one message, so I don't understand how the receiving end (the clients) gets overwhelmed. Is there any configuration parameter that I could increase so the clients don't break?

I will also look into reducing the number of messages sent. And thank you for the hint to switch to the Go runtime! I'll definitely do that!

Each match runs in a dedicated goroutine. More CPUs may help with more matches, but within a single match I don't expect a significant difference, unless the server is currently unable to keep the tick rate stable due to the amount of work being done in the match loop.

The clients may not have enough network bandwidth to receive that many messages, or they aren't processing them fast enough, causing messages to pile up and leading the server to close the connection once no more space is available in the buffer. You could increase the buffer further, but that will likely only delay the disconnection. Using protobuf to encode the messages may help with this, but I'd still try to reduce the number of messages as much as possible first.
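One way to cut server-side message volume, in line with the advice above, is to broadcast only the transforms that changed since the previous tick instead of the full state. A rough sketch, reusing the hypothetical state fields and OP_STATE_UPDATE opcode from the earlier match loop sketch (state.lastSent is an extra bookkeeping field, and the string comparison is intentionally simplistic):

```typescript
// Collect only the player transforms that changed since the last broadcast.
// state.lastSent is a hypothetical bookkeeping map added to the match state.
function collectChangedTransforms(state: nkruntime.MatchState): { [id: string]: unknown } {
  const changed: { [id: string]: unknown } = {};
  const lastSent: { [id: string]: string } = state.lastSent || {};
  for (const id of Object.keys(state.players)) {
    const serialized = JSON.stringify(state.players[id]);
    if (lastSent[id] !== serialized) {
      changed[id] = state.players[id];
      lastSent[id] = serialized;
    }
  }
  state.lastSent = lastSent;
  return changed;
}

// In matchLoop, after processing incoming messages:
//   const changed = collectChangedTransforms(state);
//   if (Object.keys(changed).length > 0) {
//     dispatcher.broadcastMessageDeferred(OP_STATE_UPDATE, JSON.stringify(changed));
//   }
```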