With forewarning about a huge influx of users, you know Lemmy.ml will go down. Even if people go to https://join-lemmy.org/instances and disperse among the great instances there, the servers will go down.

Ruqqus had this issue too. Every time there was a mass exodus from Reddit, Ruqqus would go down, and hardly reap the rewards.

Even if it’s not sustainable, just for one month, I’d like to see Lemmy.ml drastically boost their server power. If we can raise money as a community, what kind of server could we get for 100$? 500$? 1,000$?

  • Leigh@lemmy.ml
    link
    fedilink
    arrow-up
    1
    ·
    1 year ago

    You should use this relatively quiet time to migrate to a larger server, because when the time comes where you need to do it, you’re going to be in for a world of hurt. This is the calm before the storm–take advantage of it.

    Ultimately, you need to scale horizontally. You need to shard your database and separate out your different functions (database, front end, whatever back end applications you use, etc) onto different servers, all fronted by load balancers. That’s going to be the only way to even begin to handle increasing load. If you don’t have a small team of experienced engineers with a deep understanding of how to build for scale, and you get a sudden mass exodus of users from Reddit, you’re fucked. So if I were you, here’s what I’d do:

    1. Scale up to the largest instance type you can. If possible, switch (at least temporarily) to AWS and use something in the c6i instance family, such as the c6id.32xlarge. Billing for AWS instances is done by the hour, so you wouldn’t need to pay for an entire month up front if you only need that extra horsepower for a few days (such as when the blackouts are planned from the 12th through 14th).

    2. Because the above will do nothing but buy you time until you crash–and if you get a huge spike of users, without horizontal scaling, you WILL crash–migrate your DNS to something like Cloudflare. From there, configure workers to respond when health checks to your site fail, so that users attempting to access the site can be shown a static page directing them to something like http://join-lemmy.org or someplace, instead of simply getting 5xx errors.

    3. Once the hug of death is over, evaluate where you stand. Reduce your instance size, if you can, and start investigating what it’s going to take to scale horizontally.

    I’m not a SQL expert, but I am a principal network architect, and my day job for the last 15 years has been working on scale and automation for the world’s largest companies, including 7 years spent at AWS. In my world, websites like Reddit, as large as they are, are still considered to be of ‘average’ size. I can’t help you with database, but I’m happy to provide guidance around networking, DNS, scale, automation, security, etc.