Javascript runtime match performance

Hi! I’m using nakama for a project, and I’m trying to get some stress testing done and see what I can improve in terms of performance.

I’m currently using the Javascript runtime.

I built a quick stress-test tool that creates a number of “bots” and sends match data to the server at a certain tick rate. Since what I’m optimizing is the syncing of 3D player positions/movement, each client sends these updates every 100ms. The tool separates bots into buckets so as not to send all updates at once: every 10ms one bucket’s worth of updates (totalBots / numBuckets) is sent, and those bots then sit idle for the rest of the tick. This effectively means that each bot sends an update every 100ms, and the server always has data incoming every 10ms.
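The bucketed schedule described above can be sketched roughly like this (the function name and bucket assignment by bot index are my own illustration, not the actual tool's code):

```typescript
// Hypothetical sketch of the bucketed send schedule: bots are spread
// across `numBuckets` buckets by index, and on each 10ms sub-tick only
// one bucket's bots send an update, so every bot still sends once per
// full 100ms tick.
function botsForSubTick(totalBots: number, numBuckets: number, subTick: number): number[] {
  const bucket = subTick % numBuckets;
  const bots: number[] = [];
  for (let bot = 0; bot < totalBots; ++bot) {
    if (bot % numBuckets === bucket) {
      bots.push(bot);
    }
  }
  return bots;
}
```

With 100 bots and 10 buckets, each sub-tick sends 10 updates, and after 10 sub-ticks every bot has sent exactly once.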

The payload contains compressed/quantized values for Location, Rotation, LinearVelocity and AngularVelocity.
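For reference, here is a minimal sketch of the kind of quantization mentioned above (the function names, the 16-bit width, and the world bounds are assumptions for illustration, not the actual wire format):

```typescript
// Map a float within known world bounds onto an unsigned 16-bit
// integer for the wire, and expand it back on read. Precision is
// (max - min) / 65535 per component.
function quantize(value: number, min: number, max: number): number {
  const t = (value - min) / (max - min); // normalize to [0, 1]
  return Math.round(t * 0xffff) & 0xffff; // 16-bit integer
}

function dequantize(q: number, min: number, max: number): number {
  return min + (q / 0xffff) * (max - min);
}
```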

Now if I make the most basic match handler possible for these messages:

for (let k = 0; k < messages.length; ++k) {
    let message = messages[k];
    
    // Get the player representing the sender
    let senderPlayer: PlayerState = gameState.players[message.sender.sessionId];
    
    switch (message.opCode) {
        case MatchOpcode.MovementReplication:
            // Read our compressed message
            let bitReader = new BitReader(new Uint8Array(message.data));
            let actorReplicationMessage: ActorReplicationMessage = new ActorReplicationMessage(bitReader);
            
            // Update the state with the location
            senderPlayer.location = actorReplicationMessage.location;

            // Broadcast the original message to other presences
            dispatcher.broadcastMessage(message.opCode, message.data, null, message.sender, false);

            break;
    }
}

Using some very non-scientific timing with Date.now, I get:

nakama-1    | {"level":"warn","ts":"2024-07-21T05:07:50.148Z","caller":"server/runtime_javascript_logger.go:84","msg":"92 message in Execution time: 4 ms","mid":"b72a5a09-1e80-4993-96a9-1e369165e822"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:07:50.250Z","caller":"server/runtime_javascript_logger.go:84","msg":"100 message in Execution time: 6 ms","mid":"b72a5a09-1e80-4993-96a9-1e369165e822"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:07:50.350Z","caller":"server/runtime_javascript_logger.go:84","msg":"100 message in Execution time: 6 ms","mid":"b72a5a09-1e80-4993-96a9-1e369165e822"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:07:50.450Z","caller":"server/runtime_javascript_logger.go:84","msg":"100 message in Execution time: 6 ms","mid":"b72a5a09-1e80-4993-96a9-1e369165e822"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:07:50.551Z","caller":"server/runtime_javascript_logger.go:84","msg":"99 message in Execution time: 6 ms","mid":"b72a5a09-1e80-4993-96a9-1e369165e822"}

So let’s call it 5ms to process 100 messages.

Now the previous code is quite simple and sends the message to everyone. Eventually I want to use a quadtree and an area/interest system to segment what users to send the message to, but for purposes of stress testing let’s settle for sending it to everyone except the sender!

PS: It would be nice if the broadcastMessage functions took a list of presences NOT to send the message to, instead of/in addition to a list of presences to send it to.

... 

// dispatch to each user other than the sender
const playerList: string[] = Object.keys(gameState.players);
for (let i = 0; i < playerList.length; ++i) {
    let player: PlayerState = gameState.players[playerList[i]];

    if (player.presence.sessionId === message.sender.sessionId) {
        continue;
    }

    dispatcher.broadcastMessage(message.opCode, message.data, [ player.presence ], message.sender, false);
}

break;
...

nakama-1    | {"level":"warn","ts":"2024-07-21T05:15:44.415Z","caller":"server/runtime_javascript_logger.go:84","msg":"111 message in Execution time: 100 ms","mid":"282d5ed8-ee30-45eb-bb91-748e5e02c3ee"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:15:44.512Z","caller":"server/runtime_javascript_logger.go:84","msg":"100 message in Execution time: 97 ms","mid":"282d5ed8-ee30-45eb-bb91-748e5e02c3ee"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:15:44.609Z","caller":"server/runtime_javascript_logger.go:84","msg":"100 message in Execution time: 97 ms","mid":"282d5ed8-ee30-45eb-bb91-748e5e02c3ee"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:15:44.697Z","caller":"server/runtime_javascript_logger.go:84","msg":"90 message in Execution time: 87 ms","mid":"282d5ed8-ee30-45eb-bb91-748e5e02c3ee"}

Ok, so that’s really bad. There’s a lot of overhead in calling broadcastMessage on each iteration. So let’s try collecting the list of presences first and passing that to a single broadcastMessage call. Let’s also try broadcastMessageDeferred (I had to increase the queue size quite a bit).
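The "collect first, broadcast once" change can be sketched like this; Presence and PlayerState here are simplified stand-ins for the real Nakama types, and the helper name is my own:

```typescript
// Simplified stand-ins for the Nakama types used in the handler above.
interface Presence { sessionId: string; }
interface PlayerState { presence: Presence; }

// Build the recipient list once, excluding the sender, so the match
// handler can make a single broadcastMessage (or
// broadcastMessageDeferred) call instead of one per player.
function collectRecipients(
  players: { [sessionId: string]: PlayerState },
  senderSessionId: string
): Presence[] {
  const recipients: Presence[] = [];
  const sessionIds = Object.keys(players);
  for (let i = 0; i < sessionIds.length; ++i) {
    if (sessionIds[i] !== senderSessionId) {
      recipients.push(players[sessionIds[i]].presence);
    }
  }
  return recipients;
}

// In the real handler this would be used roughly as:
// dispatcher.broadcastMessageDeferred(message.opCode, message.data,
//     collectRecipients(gameState.players, message.sender.sessionId),
//     message.sender, false);
```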

nakama-1    | {"level":"warn","ts":"2024-07-21T05:42:47.268Z","caller":"server/runtime_javascript_logger.go:84","msg":"100 message in Execution time: 18 ms","mid":"1e850662-ea1b-4759-9613-6eb87aaae60b"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:42:47.364Z","caller":"server/runtime_javascript_logger.go:84","msg":"100 message in Execution time: 13 ms","mid":"1e850662-ea1b-4759-9613-6eb87aaae60b"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:42:47.464Z","caller":"server/runtime_javascript_logger.go:84","msg":"90 message in Execution time: 14 ms","mid":"1e850662-ea1b-4759-9613-6eb87aaae60b"}
nakama-1    | {"level":"warn","ts":"2024-07-21T05:42:47.566Z","caller":"server/runtime_javascript_logger.go:84","msg":"110 message in Execution time: 16 ms","mid":"1e850662-ea1b-4759-9613-6eb87aaae60b"}

Ok, not great, not terrible. Compared to our initial 5ms, this is about 3x worse, just by collecting the list of users.

So let’s increase users! 200 users: 55ms. 300 users: match handler too slow. (Running at a tick rate of 10 I get a target time of 100ms; anything over that and we lag messages behind and eventually drop them.)


So the question here is: should I just focus on doing these things natively in Go (I haven’t even checked out how to use Go modules yet)? Is there a considerable performance difference?

Looking at the CPU usage of the Docker container, I can see it peaking at around 200% on a 24-core system (2400% max). So there’s definitely room to process a lot more! Couldn’t messages be processed in buckets and handed to different threads? Is everything match-related single-threaded?

Is there a way to implement such a task system if I move to the Go runtime? Messages sound like an ideal candidate to split across a bunch of worker threads.

Implementing spatial subdivision to more easily get the list of presences a message should be sent to would be ideal, but this is a stress test, so around 200 users standing on top of each other would be my cap. I feel we should be able to do more!
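For what it’s worth, an interest system doesn’t have to be a full quadtree; a uniform grid is often enough to cut the recipient list down. Here is a hedged sketch of that idea (all names are illustrative, nothing here is part of Nakama’s API):

```typescript
// Bucket players into fixed-size grid cells keyed by "cx,cy", so a
// movement update only needs to be broadcast to presences in the 3x3
// block of cells around the sender.
class SpatialGrid {
  private cells = new Map<string, Set<string>>();

  constructor(private cellSize: number) {}

  private key(cx: number, cy: number): string {
    return cx + "," + cy;
  }

  insert(id: string, x: number, y: number): void {
    const k = this.key(Math.floor(x / this.cellSize), Math.floor(y / this.cellSize));
    let cell = this.cells.get(k);
    if (!cell) {
      cell = new Set<string>();
      this.cells.set(k, cell);
    }
    cell.add(id);
  }

  // Ids in the 3x3 block of cells around (x, y): a cheap stand-in for
  // "everyone within interest range".
  nearby(x: number, y: number): string[] {
    const cx = Math.floor(x / this.cellSize);
    const cy = Math.floor(y / this.cellSize);
    const result: string[] = [];
    for (let dx = -1; dx <= 1; ++dx) {
      for (let dy = -1; dy <= 1; ++dy) {
        const cell = this.cells.get(this.key(cx + dx, cy + dy));
        if (cell) {
          cell.forEach(id => result.push(id));
        }
      }
    }
    return result;
  }
}
```

The grid would be rebuilt (or incrementally updated) from player locations each tick, and `nearby` would feed the presence list passed to the broadcast call.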

I’m open to suggestions :smiley:

Btw, it’s not lost on me that the common use case for Nakama is many matches with few users, and that’s probably properly multithreaded and occupies the CPU better than a single match with many users. So I’m not complaining, I’m just looking for options to adapt Nakama to my needs :slight_smile:

Hello @xamado :wave:

There’s certainly some overhead to the JS runtime, if you write the module in Go you’ll have more flexibility and you can also optimize by using Goroutines for worker threads if needed. In JS each match runs its own JS runtime in a Goroutine but there’s no multi-threading from within the runtime.

It would be interesting to see the difference between the two implementations. Simple match handlers should be quick to port; you can have a look at our project-template for the implementation of tic-tac-toe in both TS and Go.

We do have customers using the JS runtime at scale, and there are likely things we can tweak and tune, but since your use-case differs from the common one of many smaller matches, it may be interesting to see the difference out-of-the-box under the same configs and hardware.

Best.