"Error pinging database" on nakama start up but migrate up completes successfully

Heya, I’ve been trying to set up nakama to run from source / binary on Windows, following the steps on this page: https://heroiclabs.com/docs/install-start-server/

I’m hoping there’s some detail I’m missing, like a version mismatch.

I’ve set up cockroach db and run migrate up, which completed successfully, and I can see the database named nakama in the cockroach db webui.

But when I try and start nakama itself it comes out with the following error / crash.

{
“level”:“fatal”,“ts”:“2019-08-28T16:13:16.982+0100”,“msg”:“Error pinging database”,“error”:“context deadline exceeded”,“stacktrace”:“main.dbConnect
C:/Users/xxxx/Documents/repos/git_nakama/nakama/main.go:244\nmain.main
C:/Users/xxxx/Documents/repos/git_nakama/nakama/main.go:109\nruntime.main
c:/go/src/runtime/proc.go:200”
}

I’ve been googling around and looking through forums for the past couple of hours but there does not appear to be anything close to what I am seeing.

Environment:
version 19.1.4 of cockroach as I did not spot any requirement for an older version.
And I am using the latest commit to master of nakama as well (hash: 904961fa81ea4181bf01460e2269516f54ea6490).
And go version go1.12.9 windows/amd64

Hey @igeorgievsd. Welcome. Let’s get this sorted so you can dive into Nakama on Windows for your development. What do the start command args you use with the binary look like?

Heya, here is a dump of the migrate up and start up commands I am running for nakama, as well as the output that follows them. I do start the DB before starting nakama itself, of course :slight_smile:.

MIGRATE UP
C:\Users\xxxx\Documents\repos\git_nakama\nakama>nakama.exe migrate up
{“level”:“info”,“ts”:“2019-08-28T16:11:29.218+0100”,“msg”:“Database connection”,“dsn”:“root@localhost:26257”}
{“level”:“info”,“ts”:“2019-08-28T16:11:50.240+0100”,“msg”:“Database information”,“version”:“CockroachDB CCL v19.1.4 (x86_64-w64-mingw32, built 2019/08/06 15:43:11, go1.11.6)”}
{“level”:“info”,“ts”:“2019-08-28T16:11:50.241+0100”,“msg”:“Using existing database”,“name”:“nakama”}
{“level”:“info”,“ts”:“2019-08-28T16:11:50.255+0100”,“msg”:“Successfully applied migration”,“count”:0}

START NAKAMA
C:\Users\xxxx\Documents\repos\git_nakama\nakama>nakama.exe
{“level”:“warn”,“ts”:“2019-08-28T16:12:55.964+0100”,“msg”:“WARNING: insecure default parameter value, change this for production!”,“param”:“console.username”}
{“level”:“warn”,“ts”:“2019-08-28T16:12:55.964+0100”,“msg”:“WARNING: insecure default parameter value, change this for production!”,“param”:“console.password”}
{“level”:“warn”,“ts”:“2019-08-28T16:12:55.965+0100”,“msg”:“WARNING: insecure default parameter value, change this for production!”,“param”:“console.signing_key”}
{“level”:“warn”,“ts”:“2019-08-28T16:12:55.965+0100”,“msg”:“WARNING: insecure default parameter value, change this for production!”,“param”:“socket.server_key”}
{“level”:“warn”,“ts”:“2019-08-28T16:12:55.966+0100”,“msg”:“WARNING: insecure default parameter value, change this for production!”,“param”:“session.encryption_key”}
{“level”:“warn”,“ts”:“2019-08-28T16:12:55.966+0100”,“msg”:“WARNING: insecure default parameter value, change this for production!”,“param”:“runtime.http_key”}
{“level”:“info”,“ts”:“2019-08-28T16:12:55.966+0100”,“msg”:“Nakama starting”}
{“level”:“info”,“ts”:“2019-08-28T16:12:55.967+0100”,“msg”:“Node”,“name”:“nakama”,“version”:“2.0.0+dev”,“runtime”:“go1.12.9”,“cpu”:16,“proc”:16}
{“level”:“info”,“ts”:“2019-08-28T16:12:55.967+0100”,“msg”:“Data directory”,“path”:“C:\Users\xxxx\Documents\repos\git_nakama\nakama\data”}
{“level”:“info”,“ts”:“2019-08-28T16:12:55.968+0100”,“msg”:“Database connections”,“dsns”:[“root@127.0.0.1:26257”]}
{“level”:“fatal”,“ts”:“2019-08-28T16:13:16.982+0100”,“msg”:“Error pinging database”,“error”:“context deadline exceeded”,“stacktrace”:“main.dbConnect\n\tC:/Users/xxxx/Documents/repos/git_nakama/nakama/main.go:244\nmain.main\n\tC:/Users/xxxx/Documents/repos/git_nakama/nakama/main.go:109\nruntime.main\n\tc:/go/src/runtime/proc.go:200”}

START COCKROACH
C:\Users\xxx\Documents\cockroach-v19.1.4.windows-6.2-amd64>cockroach.exe start --insecure
*

  • WARNING: RUNNING IN INSECURE MODE!
    • Your cluster is open for any client that can access .

  • WARNING: neither --listen-addr nor --advertise-addr was specified.
  • The server will advertise “xxxx” to other nodes, is this routable?
  • Consider using:
    • for local-only servers: --listen-addr=localhost
    • for multi-node clusters: --advertise-addr=<host/IP addr>

CockroachDB node starting at 2019-08-28 15:10:15.5488464 +0000 UTC (took 1.1s)
build: CCL v19.1.4 @ 2019/08/06 15:43:11 (go1.11.6)
webui: http://xxxx:8080
sql: postgresql://root@xxxx:26257?sslmode=disable
client flags: cockroach.exe --host=xxxx:26257 --insecure
logs: C:\Users\xxxx\Documents\cockroach-v19.1.4.windows-6.2-amd64\cockroach-data\logs
temp dir: C:\Users\xxxx\Documents\cockroach-v19.1.4.windows-6.2-amd64\cockroach-data\cockroach-temp032257399
external I/O path: C:\Users\xxxx\Documents\cockroach-v19.1.4.windows-6.2-amd64\cockroach-data\extern
store[0]: path=C:\Users\xxxx\Documents\cockroach-v19.1.4.windows-6.2-amd64\cockroach-data
status: restarted pre-existing node
clusterID: 630f2dc5-b045-47a7-9c82-be3d8a777c98
nodeID: 1

I suspect this is a default database address issue. In the migration command we attempt to connect to a database at localhost, but in main server startup we use 127.0.0.1.

Can you start the server with nakama.exe --database.address "root@localhost:26257" to confirm?

I did spot that difference when I was doing my initial troubleshooting, but giving it the address did not seem to have an effect.

When I tried it again just now, as a sanity check, it worked fine.
Unfortunately I do not have the cmd output from last night to verify, but chances are I had a typo.

Regardless that seems to have resolved itself. Thanks :slight_smile:

As it may be relevant: I was mostly using powershell yesterday, and some operations seemed to execute slower overall. Starting cockroach would take several seconds; migrate up would take 10-20 seconds or so but would succeed. And, obviously, starting nakama would fail after the 15s timeout.
I did do several restarts and eventually swapped to cmd later in the day and observed similar behaviour.
But this morning everything worked in less than a second, as I’d expect it to, and I did leave my computer on overnight so there should be no difference with my last attempts last night.

A quick update, as the issue does not seem to be quite resolved, only mitigated. I just restarted my computer for another reason, and when I went to start nakama again I got the same error as before.

I ran the command a second time, on the off chance it succeeded, and it just worked and started up no problem.

As a test, I then ran the command around 10 times back to back starting and stopping nakama and doing nothing else in between. Nakama failed to start with that error two of the 10 times. Cockroach DB was running the entire time.

Command I ran: nakama.exe --database.address "root@localhost:26257" and I do not see anything different in the output.

There are a couple of things on my computer that I can think of that may cause this, but I would also expect that behaviour to be more consistent and deterministic. There are externally managed windows firewall / security settings, as well as Cisco AnyConnect.

For the behaviour I just described, where the server failed to start 2 / 10 times, Cisco was NOT connected the entire time. With Cisco connected, things seem to work fine and connect successfully.

I will keep this thread updated if I find any solid reproduction steps.

I’m not familiar with Cisco AnyConnect but it sounds like it does have some effect. I know some VPN applications can block some/all traffic on various network interfaces when they’re not connected as a security measure. Any extra info you can provide would be helpful.

In any case we’ll see what we can do to improve at least the feedback you see in the logs of what the server is doing. :+1:

From what I have seen in our case Cisco creates a new adapter but does not limit other ones. Localhost and LAN still work as expected.

Most of our team uses Windows but we do have a couple who are on Macs and it will be interesting to see the behaviour on there.

I will update the thread once / if I find out anything more concrete.

That would be good, thanks. I did have a look through the source and tried to add some logs to get more information out of it, but it did seem to be just a straight up time out, as it is a direct call to the sql module which errors out. Admittedly I did not spend too long trying to debug it, as it seemed like one of those things that should not break like this.