Sockets are highly unstable

Hi there!
I’m getting a lot of socket disconnections in realtime authoritative matches and I don’t have any clue why this is happening. I’m just syncing position and rotation 10 times per second, and players are disconnected very easily for no apparent reason (the server is up and running fine).

Something else I noticed is the message overhead Nakama adds: my messages are a little less than 60 bytes, but after being encapsulated they are almost 200.

Can anybody shed some light on this? Any help is appreciated.

“A lot of sockets disconnection” and “disconnected very easily” are hard to work with. Can you share concrete numbers for connected user count, disconnect frequency, message rate at the time of disconnect, etc.? The server also always logs a reason why a socket closes if it’s an unexpected disconnect. What do the server logs show? What server version and client library platform/version are you on?

Messages have a constant-size envelope on top of your own content - the envelope size doesn’t grow with your payload. This won’t be the reason your sockets close.

Sorry if it sounded harsh, I didn’t mean to and it was far from my intention.

Basically the sockets are closed in the middle of a match (two players). The disconnections happen with anything from one match (two players) to three simultaneous matches (six players, the maximum we have tested).
The match is a fast-paced game and I’m synchronizing a transform 10 times a second. The transform includes position, velocity and rotation (36 bytes), and after enveloping it’s more than 150 bytes.
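For scale, here is a minimal sketch of how the numbers above add up. It assumes the 36 bytes are three 3-component float vectors (so rotation as Euler angles rather than a quaternion); `TransformPacket` and its members are hypothetical names for illustration, not part of the Nakama client.

```csharp
using System;

public static class TransformPacket
{
    // Assumption: position, velocity and rotation are each 3 floats (3 * 3 * 4 = 36 bytes).
    public const int SizeBytes = 9 * sizeof(float);

    // Copies the three vectors back to back into one 36-byte buffer.
    public static byte[] Pack(float[] position, float[] velocity, float[] rotation)
    {
        var buffer = new byte[SizeBytes];
        // Buffer.BlockCopy counts offsets and lengths in bytes, not elements.
        Buffer.BlockCopy(position, 0, buffer, 0, 12);
        Buffer.BlockCopy(velocity, 0, buffer, 12, 12);
        Buffer.BlockCopy(rotation, 0, buffer, 24, 12);
        return buffer;
    }

    // Bytes per second produced by one sender at a fixed send rate.
    public static int BytesPerSecond(int bytesOnWire, int sendsPerSecond)
    {
        return bytesOnWire * sendsPerSecond;
    }
}
```

With the figures from this post, `TransformPacket.BytesPerSecond(150, 10)` gives 1500 bytes per second per sending player.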

The only log I’ve got from the server is

{"level":"info","ts":"2020-02-04T17:33:07.449Z","msg":"Closed client connection","uid":"3146afa7-21a9-434f-a6cf-b1d1c6dd7adc","sid":"e4d9867f-a872-4ca3-8dc1-54755e164f5d"}

I’m using Nakama 2.8.0 as a Docker image and the .NET client taken from GitHub.

Kind regards and thanks in advance

Hi! We are developing a fast-paced action game but we haven’t moved on to stress tests yet. I am interested in the conclusion of this issue. Thx!

@danim Nakama has a lot of different tuning parameters that can be applied to help fit specific types of games.

For example the server has socket send buffers and socket receive buffers - both of these are configured by default to prevent bad clients that try to perform a denial of service attack with what’s called a “slow clients overload”. These can be tweaked to better fit the specifics of the game. We do this often to help the game studios that support our team.

There’s also no real information here about your hardware setup which should also be chosen to appropriately fit your game type. Remember that Nakama is a highly scalable game server we’ve deployed with customers that handle fast paced gameplay on mobile. But like any game server it needs to be tuned on the hardware to achieve the most optimal setup.

We’ll probably need to speak on a call to be able to help further.

Well, I’ve coded my own WebSocketAdapter that encapsulates the .NET ClientWebSocket (instead of using the Ninja.WebSockets one provided). The idea was to see if I could get more details about the errors.
Right now the errors I’m getting when the socket is closed are:
The remote party closed the WebSocket connection without completing the close handshake
and the other one is… nothing, just socket closed.

Is there any way to get extra info on the server side about the socket disconnection?

Run the server with --logger.level debug, it should print more output about socket behaviour.

@danim If you use a non-official client SDK with your own socket driver to connect to Nakama, don’t you think it’s unreasonable to describe the socket connectivity as “highly unstable”?

But that’s not what I said. I had to implement my own socket because the default one wasn’t doing things properly (no disconnect info, disconnection timeout wasn’t handled correctly), and using a third-party WebSocket when .NET has its own seemed unreasonable to me (not to mention the Ninja.WebSockets GitHub page says the implementation dates from before 2017, before .NET provided one).

On Sat, 8 Feb 2020 at 13:15, Chris Molozian via Heroic Labs heroiclabs@discoursemail.com wrote:

@danim The official client SDK is in use in production games that rely on a vast number of realtime features from the game server. It’s an extremely stable piece of code that we’ve designed and maintained for over 2 years. There are always improvements we can make, but we have had no issues with the customers who run it in their production games on mobile, desktop, and web (i.e. WebGL exports).

default one wasn’t doing things properly (no disconnect info, disconnection time out wasn’t handled correctly)

What exactly do you mean that disconnection wasn’t handled properly? Please be specific and open a pull request for any improvements you think would be good to see. This helps all developers and studios in the community and is a nice way to give back.

using a third party websocket when .net has its own seemed reasonable to me

What you probably don’t know is that Unity uses a very old System.Net.WebSockets implementation from Mono that has a known issue where sockets will be disconnected after ~300 seconds. It also has a second issue where DNS resolution fails if the IPv6 address is tried before IPv4. These both continue to be problems on the latest Unity 2018 LTS releases.

There are very specific reasons we manage the socket code the way we do. It is not a limitation of how the team thinks about Unity or .NET libraries. We strive to use the latest and greatest technologies and are always open to new suggestions. Nevertheless we also have to work within the limitations of Unity engine as a platform.

not to mention ninjasockets GitHub tells the implementation is prior 2017, before .net had provided one

You should also look at other popular projects that integrate with Unity engine, like the Mirror project, which also uses Ninja.WebSockets to work around bugs in the game engine. The reason Ninja.WebSockets is so infrequently updated is that it’s feature complete and stable, not that it is unmaintained.

Hope this helps.

You are right, sorry, I’ll post on another thread

@danim No need to open another thread when it can be discussed here.

Okay, I thought it would derail the topic, but here I go.

Use the default WebSocketAdapter (that encapsulates NinjaSockets) and connect to host like this:

m_client = new Client(connectionType, host, port, serverkey);
m_socket = Socket.From(m_client);
string user = "someid";
m_session = await m_client.AuthenticateDeviceAsync(user, user);
await m_socket.ConnectAsync(m_session, true, 10);

Then let’s say I had a heartbeat for keeping the socket alive:

async void heartBeat()
{
    while(alive)
    {
        try
        {
            if (m_socket.IsConnected)
            {
                Debug.Log("[heartBeat] sending ping");
                IApiRpc apiResponse = await m_socket.RpcAsync( "my_ping", "{}" );
            }
        }
        catch (Exception ex)
        {
            Debug.LogErrorFormat("[heartBeat] Error: {0}", ex);
            await Connection.CloseSocket();
        }

        await Task.Delay( 5000 );
    }
}

If I turn off the internet connection, the timeout is not raised until… two minutes, more or less? If we check the source code at https://github.com/heroiclabs/nakama-dotnet/blob/master/src/Nakama/Socket.cs, the default timeout is 30 seconds. Maybe this is not the parameter I’m looking for, but it is the only one accessible from the client (by client I mean the game, not the Nakama library). And we can’t let users wait two minutes for a disconnection notification.
Another thing that happens: if I turn the internet connection back on before this error, I receive one of the following two errors (depending on the send timeout, I think).
The quick one is a task canceled error (with a socket closed event), which is weird given that I have to wait until I have internet again to receive it.

The other one:

Unable to read data from the transport connection: Connection reset by peer

which is ok and I can try to reconnect it without many problems, but an auto reconnect could be nicer.

I can provide a simple test project if you like
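One way to avoid waiting on the transport to notice the drop is to race the heartbeat call against a timer. The following is only a sketch using standard .NET tasks; `TaskTimeout` and `CompletesWithin` are hypothetical names, not part of the Nakama client, and the 10 second limit in the usage comment is an arbitrary choice.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskTimeout
{
    // Returns true if 'operation' completed successfully before 'timeout' elapsed,
    // false if the timer won the race. If 'operation' failed in time, its exception
    // is rethrown. A timed-out operation is not cancelled; it is simply abandoned.
    public static async Task<bool> CompletesWithin(Task operation, TimeSpan timeout)
    {
        Task timer = Task.Delay(timeout);
        Task first = await Task.WhenAny(operation, timer);
        if (first != operation)
        {
            return false;
        }
        await operation; // surfaces any exception from the operation
        return true;
    }
}

// Possible use inside the heartBeat() loop above:
//   bool ok = await TaskTimeout.CompletesWithin(
//       m_socket.RpcAsync("my_ping", "{}"), TimeSpan.FromSeconds(10));
//   if (!ok) await Connection.CloseSocket();
```

Because the abandoned call keeps running until the socket itself gives up, the game would still need to close the socket on a timeout so the pending task is released.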

Thanks! Now I can get more info on the server-side

which is ok and I can try to reconnect it without many problems, but an auto reconnect could be nicer.

@danim I agree. It’d definitely be nice to add auto-reconnect logic to the client sdk. Unfortunately what you’ve discovered above is why we’ve not implemented it yet.

If I turn off the internet connection, the timeout is not raised until… two minutes, more or less?

This is due to how the native sockets are implemented by Unity engine as an abstraction on the .NET cross-compiled code. A disconnect is not always observed quickly. It varies quite a bit from OS to OS and even between versions of Android.

We’ve not found a good way to work around this issue, so in most cases a small client-side counter can be checked to see whether a message that was expected to have been sent hasn’t been sent over some period of time (this is all game-specific logic because it varies greatly). That way the game client code can force close the connection, open a new one and reconnect programmatically in the background.
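A client-side check of that kind could look roughly like the sketch below. `ActivityWatchdog` is a hypothetical helper, not part of the client SDK, and what counts as "activity" and how long a silence is acceptable are assumptions each game would have to choose for itself.

```csharp
using System;

public sealed class ActivityWatchdog
{
    private readonly TimeSpan _maxSilence;
    private DateTime _lastActivityUtc;

    public ActivityWatchdog(TimeSpan maxSilence, DateTime nowUtc)
    {
        _maxSilence = maxSilence;
        _lastActivityUtc = nowUtc;
    }

    // Call whenever the expected traffic goes through, e.g. a match state message.
    public void NoteActivity(DateTime nowUtc)
    {
        _lastActivityUtc = nowUtc;
    }

    // True when nothing has been noted for longer than the allowed silence.
    public bool IsStale(DateTime nowUtc)
    {
        return nowUtc - _lastActivityUtc > _maxSilence;
    }
}
```

The game loop would call `NoteActivity(DateTime.UtcNow)` on each expected message and poll `IsStale(DateTime.UtcNow)`; when it returns true, the game force closes the socket and connects again in the background. With a 10 Hz update rate, a silence threshold of a few seconds would already represent dozens of missed updates.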

I’m open to suggestions on areas where the code could be improved, or on details we’ve missed about how to detect the disconnect quicker. We already implement both client-side and server-side heartbeats on the sockets. Have a look at the PingPongManager code.

You can shorten the time or add more time to be more graceful to slower connections on mobile:

var keepAliveIntervalSec = 5;
var socket = Socket.From(client, new WebSocketAdapter(keepAliveIntervalSec));

Okay, with debug logs enabled on the server side, I’m getting the following error:

{"level":"debug","ts":"2020-02-12T08:44:01.978Z","msg":"Error reading message from client","uid":"b015bb71-ca45-4c66-9a45-02f247e77eaa","sid":"13d8cc9e-6060-4a68-ab4a-55583e92e077","error":"read tcp 172.18.0.2:7350->62.174.75.20:44222: i/o timeout"}

Any hints for this problem?