I already get rate-limited like crazy on Lemmy, and there are only like 60,000 users on my instance. Is each instance really just one server, or are there multiple containers running across several hosts? I’m concerned that federation will mean an inconsistent user experience. Some instances may be beefy, others will be under-resourced… so the average person might think Lemmy overall is slow or error-prone.

Reddit has millions of users. How the hell is this going to scale? Does anyone have any information about Lemmy’s DB and architecture?

I found this post about Reddit’s DB from 2012. Not sure if Lemmy has a similar approach to ensure speed and reliability as the user base and traffic grows.

https://kevin.burke.dev/kevin/reddits-database-has-two-tables/

  • Netto Hikari@social.fossware.space
    2 years ago

    Well, I run an instance, too. It’s not big at all, but I was thinking about the issue of scaling, too. You can only scale up a single server so much…

    But on the other hand, Lemmy is still young. We’ll find solutions to that problem.

    Also, interesting article. I only took a glance at it, but having only two tables kind of suggests that Reddit is using a relational database. So, if they’re not normalizing everything, why not use a completely different paradigm, like the document model that MongoDB and similar databases use?
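The two-table idea from the linked article can be sketched roughly as an entity-attribute-value layout. This is a minimal Python/sqlite3 illustration, assuming simplified table and column names (`thing`, `data`, `kind`, `attr_name`, `attr_value`) rather than Reddit's real schema. It shows that the data can still live in a relational database; the schema just stores attributes as key/value rows instead of one normalized column per field.

```python
import sqlite3

# Illustrative two-table layout: one narrow table of entities ("things")
# and one key/value table holding their attributes.
# All table and column names are assumptions, not Reddit's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE thing (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL            -- e.g. 'link', 'comment', 'account'
    );
    CREATE TABLE data (
        thing_id INTEGER NOT NULL REFERENCES thing(id),
        attr_name TEXT NOT NULL,
        attr_value TEXT,
        PRIMARY KEY (thing_id, attr_name)
    );
    """
)


def add_thing(kind, **attrs):
    """Insert one entity, then store each attribute as its own row."""
    cur = conn.execute("INSERT INTO thing (kind) VALUES (?)", (kind,))
    thing_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO data (thing_id, attr_name, attr_value) VALUES (?, ?, ?)",
        [(thing_id, name, str(value)) for name, value in attrs.items()],
    )
    return thing_id


def get_attrs(thing_id):
    """Reassemble an entity's attributes into a dict."""
    rows = conn.execute(
        "SELECT attr_name, attr_value FROM data WHERE thing_id = ?", (thing_id,)
    )
    return dict(rows.fetchall())


post = add_thing("link", title="Hello", url="https://example.com")
comment = add_thing("comment", body="First!", parent=post)
print(get_attrs(comment))
```

The trade-off: adding a new attribute needs no schema migration, but any query that filters or sorts on an attribute has to go through the key/value table, which is harder to index and optimize than a dedicated column.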

    • Irisos@lemmy.umainfo.live
      2 years ago

      The database isn’t really the problem in the current state of things. The server is, because:

      • Until 0.18, there was no caching (for the UI), and the websockets were poorly implemented
      • The developers have admitted that they aren’t proficient in SQL, in which case, why not use an ORM instead? Sure, ORMs aren’t perfect, but they will do better than the average developer at scale.
      • There is no queue system for ActivityPub requests
      • Because there is no queue, user requests and federation have the same priority when they shouldn’t, so one can bottleneck the other
      • Live inserts are used, meaning that regardless of the DB, performance is going to be killed, since inserting rows one at a time several times a second is a major waste of resources
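The priority point in the list above can be illustrated with a small sketch: a single in-process queue where user-facing work is always dequeued before federation work. The priority constants, class name, and task strings are hypothetical, not Lemmy code.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is served first.
USER_REQUEST = 0
FEDERATION = 1


class PriorityWorkQueue:
    """Queue where interactive user work always jumps ahead of background
    federation work; items with equal priority keep arrival (FIFO) order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves FIFO order

    def push(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop(self):
        _priority, _seq, task = heapq.heappop(self._heap)
        return task

    def __len__(self):
        return len(self._heap)


queue = PriorityWorkQueue()
queue.push(FEDERATION, "deliver vote to instance A")
queue.push(FEDERATION, "deliver comment to instance B")
queue.push(USER_REQUEST, "render front page for user")

order = [queue.pop() for _ in range(len(queue))]
print(order)
# ['render front page for user', 'deliver vote to instance A', 'deliver comment to instance B']
```

Strict priority like this can starve federation under sustained user load, which is one reason to give federation its own queue and workers rather than only reordering a shared one.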

      TL;DR: It’s trying to do everything, and not doing it well. So users suffer because they have to share resources with non-UI tasks.

      The database suffers because it has to insert one object 50 times in a second when it could insert all 50 items at once.
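The batching argument can be sketched as a write buffer that collects rows and writes them in one transaction per batch. This is a minimal Python/sqlite3 example, assuming a hypothetical `vote` table and a batch size of 50 to mirror the example above; it is not how Lemmy actually stores votes.

```python
import sqlite3


class BatchedWriter:
    """Buffers rows in memory and writes them in one transaction per batch,
    instead of one INSERT + commit per incoming event (hypothetical helper)."""

    def __init__(self, conn, batch_size=50):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []
        self.flushes = 0

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with self.conn:  # one transaction, committed when the block exits
            self.conn.executemany(
                "INSERT INTO vote (post_id, user_id, score) VALUES (?, ?, ?)",
                self.buffer,
            )
        self.buffer.clear()
        self.flushes += 1


conn = sqlite3.connect(":memory:")
# Hypothetical table layout, for illustration only.
conn.execute("CREATE TABLE vote (post_id INTEGER, user_id INTEGER, score INTEGER)")

writer = BatchedWriter(conn, batch_size=50)
for user_id in range(120):
    writer.add((1, user_id, 1))
writer.flush()  # write the remaining partial batch

total = conn.execute("SELECT COUNT(*) FROM vote").fetchone()[0]
print(total, writer.flushes)  # 120 3
```

The trade-off is latency and durability: buffered rows are not visible, and would be lost on a crash, until the next flush, so real systems usually also flush on a timer.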

      Federation suffers because you can’t offload it to a separate machine farm whose job would be to receive and send ActivityPub requests, reading from and writing to the correct queues to do so.
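The offloading idea can be sketched in-process with Python's standard `queue` and `threading` modules: the request handler only enqueues outgoing activities, and a separate pool of delivery workers drains the queue. Everything here (`outbox`, `deliver`, `handle_user_vote`, the inbox URLs) is hypothetical. In the setup described above, the queue would be an external broker and the workers would run on their own machines, which this single-process sketch does not show.

```python
import queue
import threading

outbox = queue.Queue()  # outgoing ActivityPub deliveries (hypothetical)
delivered = []          # stands in for successful HTTP POSTs
delivered_lock = threading.Lock()


def deliver(activity):
    """Hypothetical stand-in for signing and POSTing an activity to a
    remote inbox; a real worker would also handle retries and backoff."""
    with delivered_lock:
        delivered.append(activity)


def delivery_worker():
    while True:
        activity = outbox.get()
        if activity is None:  # sentinel: shut this worker down
            outbox.task_done()
            break
        try:
            deliver(activity)
        finally:
            outbox.task_done()


def handle_user_vote(post_id, inboxes):
    """Request path: record the vote locally (omitted), then enqueue one
    delivery per remote inbox and return without waiting on the network."""
    for inbox in inboxes:
        outbox.put({"type": "Like", "object": post_id, "inbox": inbox})


workers = [threading.Thread(target=delivery_worker) for _ in range(4)]
for w in workers:
    w.start()

handle_user_vote(42, ["https://a.example/inbox", "https://b.example/inbox"])
outbox.join()  # wait until every queued delivery has been processed

for _ in workers:
    outbox.put(None)
for w in workers:
    w.join()

print(len(delivered))  # 2
```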

      • BitOneZero @ .world@lemmy.world
        2 years ago

        Federation also makes a lot of live HTTP connections to other peers. It looks up users for messages. The whole design is very resource-intensive, handling one vote, comment, or post at a time. There is also a lot of boilerplate JSON overhead in sending something as simple as a single vote.
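To put the JSON overhead in perspective, here is a rough Python comparison of a vote's essential data in a compact binary form versus a minimal ActivityStreams `Like` activity. The IDs, URLs, and field set are assumptions for illustration. Lemmy's real activities contain additional fields, and each delivery also carries HTTP headers (typically including a signature), so the actual per-vote cost is higher than this sketch shows.

```python
import json
import struct

# Hypothetical vote: only three values actually matter.
user_id, post_id, score = 1001, 42, 1

# Compact binary encoding of just those values:
# two unsigned 64-bit ints and one signed byte, no padding.
raw = struct.pack("<QQb", user_id, post_id, score)

# A minimal ActivityStreams "Like" carrying the same information.
# The URLs are made-up examples; real implementations send more fields.
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://lemmy.example/activities/like/7f3c9a2e",
    "type": "Like",
    "actor": "https://lemmy.example/u/user1001",
    "object": "https://remote.example/post/42",
}
payload = json.dumps(activity).encode("utf-8")

print(len(raw), len(payload), round(len(payload) / len(raw)))
```

Even this stripped-down activity is roughly an order of magnitude larger than the raw values, before any HTTP or signature overhead.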