The lemmy.ml front page has been full of nginx errors (500, 502, etc.), as well as 404 errors coming from Lemmy itself.

Every new Lemmy install begins with no votes, comments, posts, or users to test against. So problems related to performance, scaling, error handling, and stability under user load cannot easily be reproduced, given that we cannot download the established content of communities.

Either the developers believe the logs are of low quality and not useful for identifying problems in the code and design, or getting these logs in front of the technical community to identify the underlying patterns of faults is being given too low a priority.

It’s also important to make each failure log identifiable to the place in the code where that specific timeout, crash, exception, or resource limit was encountered. Users reporting generic, non-unique messages only slow down server operators, programmers, database experts, etc.
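A minimal sketch of what "identifiable" could mean, written in Python for brevity (Lemmy itself is written in Rust): every error carries a unique code plus the file, line, and function that raised it, so two reports of the same failure can be matched. The `TaggedError` class, the `FED-0001` code, and `fetch_remote_actor` are hypothetical names for illustration, not part of Lemmy.

```python
import inspect


class TaggedError(Exception):
    """An error that carries a unique code plus the code location that raised it."""

    def __init__(self, code: str, message: str) -> None:
        # stack()[0] is this __init__; stack()[1] is the frame that built the error.
        # context=0 skips reading source lines, which keeps this cheaper.
        caller = inspect.stack(0)[1]
        self.code = code
        self.location = f"{caller.filename}:{caller.lineno} in {caller.function}"
        super().__init__(f"[{code}] {message} ({self.location})")


def fetch_remote_actor(url: str) -> None:
    # Hypothetical federation step; "FED-0001" is a made-up code for illustration.
    raise TaggedError("FED-0001", f"timed out fetching actor {url}")


try:
    fetch_remote_actor("https://example.com/u/alice")
except TaggedError as err:
    # A user can now report "FED-0001" instead of a generic "500 error".
    print(err)
```

In Rust the same idea can be expressed with the `#[track_caller]` attribute and `std::panic::Location::caller()`, which report the caller's file and line without walking the stack.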

There are also a number of problems with testing federation, given that multiple servers are involved and that we are trying not to bring down servers in front of end users. It’s absolutely critical that federation failures be taken seriously: logging should be enhanced, and the reasons why peer instances have missing data should be triangulated and tracked down to protocol design issues, code failures, network failures, etc. Major Lemmy sites doing large amounts of federation are an extremely valuable source of data about errors and performance. Please, for the love of god, share these logs and let us look for the underlying causes of hard-to-reproduce crashes and failures!

I really hope the internal logs and details of the inner workings of the biggest Lemmy instances are shared more openly, with more eyes on how to keep scaling the application as the number of posts, messages, likes, and votes continues to grow each and every day. Thank you.

Three recently created communities: [email protected], [email protected], [email protected]

    • RoundSparrow@lemmy.ml (OP) · 1 year ago

      > not in a way you mention sql cache, but will get better.

      Avoiding the SQL database and using caching is webapp programming 101; it is fundamental to all the crashes Lemmy is showing. We are talking about the MOST BASIC thing in creating webapps. I really can’t overemphasize this point.

      You don’t query the site table every single time an incoming federated comment arrives, which is what this logged query shows happening:

      SELECT "local_site"."id", "local_site"."site_id", "local_site"."site_setup", "local_site"."enable_downvotes", "local_site"."enable_nsfw", "local_site"."community_creation_admin_only", "local_site"."require_email_verification", "local_site"."application_question", "local_site"."private_instance", "local_site"."default_theme", "local_site"."default_post_listing_type", "local_site"."legal_information", "local_site"."hide_modlog_mod_names", "local_site"."application_email_admins", "local_site"."slur_filter_regex", "local_site"."actor_name_max_length", "local_site"."federation_enabled", "local_site"."federation_worker_count", "local_site"."captcha_enabled", "local_site"."captcha_difficulty", "local_site"."published", "local_site"."updated", "local_site"."registration_mode", "local_site"."reports_email_admins" FROM "local_site" LIMIT $1

      And back to the very subject line of this post: you SHARE YOUR CRASH LOGS when your server is crashing. Why is lemmy.ml not posting its crash logs to GitHub issues when for 30 days I’ve seen 500 errors on the front page?