Frequent short downtimes lately?

doctortofu@reddthat.com · 1 year ago

Frequent short downtimes lately?

Tiff@reddthat.com · 1 year ago

Updates hiding in the comments again!

We are now using v0.18.3!

There was extended downtime because docker wouldn’t cooperate AT ALL.

The nginx proxy container would not resolve the DNS. So after rebuilding the containers twice and investigating the docker network settings, a “simple” reboot of the server fixed it!

1. Our database on the filesystem went from 33GB to 5GB! They were not kidding about the 80% reduction!
2. The compressed database backups went from 4GB to ~0.7GB! Even bigger space savings.
3. The changes to the backend/frontend have resulted in less downtime when performing big queries on the database so far.
4. The “proxy” container is nginx, and its configuration uses upstream lemmy-ui & upstream lemmy. These are DNS entries, which are cached for a period of time. So if a new container comes online, the proxy doesn’t actually find it, because it has cached all the IPs that lemmy-ui resolves to. (In this example there would have been only 1, and when we add more containers the proxy would never find them.)
   4.1 You can read more here: http://forum.nginx.org/read.php?2,215830,215832#msg-215832
5. The good news is that https://serverfault.com/a/593003 is the answer to the question. I’ll look at implementing this over the next day(s).

I get notified whenever reddthat goes down. Most of the time it coincided with me banning users and removing content, so I didn’t look into it much, but honestly the uptime isn’t great. (Red is <95% uptime, which means we were down for 1 hour!)

Actually, it is terrible.

With the changes we’ve made, I’ll be monitoring it over the next 48 hours to confirm that we no longer have any real issues. Then I’ll make a real announcement.

Thanks all for joining our little adventure!
Tiff

Stimmed@reddthat.com · 1 year ago

For number 4, can you set a cron job to constantly flush DNS cache?

Tiff@reddthat.com · 1 year ago

It’s the internal nginx cache. It /shouldn’t/ be a problem once I update the configuration to handle it.

We can add a resolver line with valid=5s so it will recheck every 5 seconds instead of whatever the internal docker TTL cache is.
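
A rough sketch of what that could look like, for anyone curious. This is not the actual reddthat config: the server/location layout, the variable name and the port 1234 are assumptions for illustration. 127.0.0.11 is Docker's embedded DNS server on user-defined networks, and lemmy-ui is the service name mentioned above.

```nginx
# Hypothetical sketch only, not the real reddthat proxy config.
server {
    listen 80;

    # Ask Docker's embedded DNS server, and treat answers as valid for
    # 5 seconds instead of relying on the TTL returned by the DNS reply.
    resolver 127.0.0.11 valid=5s;

    location / {
        # Putting the hostname in a variable makes nginx resolve it at
        # request time through the resolver above, rather than once when
        # the configuration is loaded.
        set $lemmy_ui_host "lemmy-ui";         # service name from the thread
        proxy_pass http://$lemmy_ui_host:1234; # port 1234 is an assumption
    }
}
```

One thing to watch with this approach: if an upstream block with the same name still exists, nginx uses that block instead of the resolver, so the upstream lemmy-ui entry would likely need to be removed or renamed for the re-resolution to take effect.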