Hi,

I just set up an instance on my little NUC here at home, but “All communities” is so empty. Is anyone aware of a script that, e.g., extracts all community HTTPS links from, let’s say, lemmy.world and enters them one by one into my search field so that my instance starts federating with them? Thanks

  • lackthought@lemmy.sdf.org
    2 years ago

    lemmy.directory and browse.feddit.de both seem to index all lemmy communities

    I’m not sure on the technical details of how they do it but you could try asking around on those instances for some advice

    • Admiral Patrick@dubvee.org
      2 years ago

      I’m still learning the API, but if you wanted to build your own community crawler, you could do something like this:

      • Get a list of instances. You can call /api/v3/site against your own instance, look at the federated_instances.linked key, and query each of the domains listed there. If your federation list is small, you could call that API endpoint on another instance, such as lemmy.ml or beehaw.org, to get theirs.
      • For each of the domains returned, call {domain}/api/v3/community/list?type_=Local to get a JSON list of its local communities
      • Store the results however you want and then create a UI to work with them as you need.
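The steps above can be sketched roughly as follows. This is a minimal sketch, not a tested crawler: the function names are made up for illustration, the shape of the JSON (plain domain strings or objects with a "domain" key under federated_instances.linked, and a "communities" list whose items carry community.actor_id) is an assumption that may differ between Lemmy versions, and it only reads whatever a single call to the list endpoint returns.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def linked_domains(site_response):
    """Pull the domains out of federated_instances.linked.

    Assumption: entries are either plain domain strings or objects
    with a "domain" key, depending on the Lemmy version.
    """
    federated = site_response.get("federated_instances") or {}
    linked = federated.get("linked") or []
    return [e["domain"] if isinstance(e, dict) else e for e in linked]


def local_communities(domain, fetch=fetch_json):
    """Return the URLs of one instance's local communities.

    Assumption: the response looks like
    {"communities": [{"community": {"actor_id": "https://..."}}]}.
    Only the results of a single call are read; paging is left out.
    """
    data = fetch(f"https://{domain}/api/v3/community/list?type_=Local")
    return [c["community"]["actor_id"] for c in data.get("communities", [])]


def crawl(seed_domain, fetch=fetch_json):
    """Map each linked domain of seed_domain to its local community URLs."""
    site = fetch(f"https://{seed_domain}/api/v3/site")
    results = {}
    for domain in linked_domains(site):
        try:
            results[domain] = local_communities(domain, fetch)
        except (OSError, ValueError, KeyError):
            # Unreachable host, non-JSON reply, or unexpected shape: skip it.
            continue
    return results
```

Calling crawl("lemmy.ml") would return a dict of domain to community URLs that you can store however you like. The fetch parameter is only there so the network call can be swapped out when trying the logic offline.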

      You probably don’t want to crawl and subscribe to every community on every instance. That would cause a lot of unnecessary traffic for both you and the instances you’re subscribed to.

      To quickly make a remote community “known” to your server, you can massage the Lemmy-UI’s search endpoint to make it work a little better.

      https://{your-instance-url}/search/q/!community@domain.tld/type/All/sort/TopAll/listing_type/undefined/community_id/0/creator_id/0/page/1

      Replace !community@domain.tld with the remote community address you want. It can be in either the !community@domain.tld or the https://domain.tld/c/community format; both will work. Calling the search page that way waits until there is a result before rendering, rather than showing “no results” and making you click search two or three times until it appears. It’s the same process on the backend, just cleaner on the front end.
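If you want to generate those search-page URLs for a whole list of communities, something like this could work. It is only a sketch: search_url is a made-up helper name, and percent-encoding the community address (so that "!", "@" and "/" are safe inside a path segment) is my assumption about what the UI will accept.

```python
from urllib.parse import quote


def search_url(instance, community):
    """Build the Lemmy-UI search URL for one remote community.

    `community` may be "!name@domain.tld" or "https://domain.tld/c/name".
    Assumption: the UI decodes a percent-encoded address in the path.
    """
    q = quote(community, safe="")  # encode everything, including "/"
    return (
        f"https://{instance}/search/q/{q}/type/All/sort/TopAll"
        "/listing_type/undefined/community_id/0/creator_id/0/page/1"
    )


if __name__ == "__main__":
    for address in ["!community@domain.tld", "https://domain.tld/c/community"]:
        print(search_url("your-instance-url", address))
```

You would still have to open each URL while logged in, ideally with a pause between them so you don’t hammer the remote instances.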

      You can do the same thing through the backend API, but you’d have to put your JWT in an auth URL parameter, which I wouldn’t recommend doing against the instance’s public URL.
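For the API route, a rough sketch might look like the one below. Everything specific here is an assumption on my part: the /api/v3/resolve_object path with q and auth parameters, the local backend address http://localhost:8536, and the helper name resolve_url. It talks to the backend locally so the token doesn’t travel over the public URL.

```python
from urllib.parse import urlencode


def resolve_url(base, community, jwt):
    """Build an API URL asking the backend to look up a remote community.

    Assumptions: the endpoint is /api/v3/resolve_object and it takes
    `q` (the community address) and `auth` (your JWT) as query params.
    Avoid logging the returned URL, since it contains the token.
    """
    query = urlencode({"q": community, "auth": jwt})
    return f"{base}/api/v3/resolve_object?{query}"


if __name__ == "__main__":
    # Hypothetical local backend address and placeholder token.
    print(resolve_url("http://localhost:8536", "!community@domain.tld", "YOUR_JWT"))
```

You could then fetch that URL with urllib.request.urlopen from the machine running the instance.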

      Dove into the API last night to see how difficult it would be to use the REST API to create something resembling RES for old.reddit.com. That’s about as far as I got before life got in the way, so hopefully that’s useful for someone.