Join the ArchiveTeam project to help index Reddit and other sites

PoorlyShavedApe@beehaw.org · 1 year ago

ram@lemmy.ca · edit-2 1 year ago

I’ve been contributing as much as I can since I learned about this on the 9th.

[Image: ArchiveTeam Reddit leaderboard entry for "rammyramram" listing 40.06 GiB or 425.96k items uploaded]

Is there a reason I’m limited to only 6 items running concurrently though? I’d like to actually use some of my resources to help with this.

PoorlyShavedApe@beehaw.org · 1 year ago

I believe it is to prevent getting your IP blocked. At least that is what I got from the FAQ.

rektifier · edit-2 1 year ago

This is true. If you run the reddit-grab project directly without using the warrior (sudo docker run -d --name reddit --label=com.centurylinklabs.watchtower.enable=true --restart=unless-stopped atdr.meo.ws/archiveteam/reddit-grab --concurrent 6 yourname), you can set up to --concurrent 20. Some projects do work well with higher concurrency, but not Reddit: 6 is already pushing the limit.

I’m running reddit-grab on 25 VMs on Azure (trying to burn my $200 free credit that expires in 10 days) and I can only run --concurrent 4 safely on most of them. The only VMs that can run --concurrent 6 are the ones in India, which seem to be soft rate-limited by their higher latency anyway.
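
For anyone who wants to try different --concurrent values per machine, here is a rough sketch of a wrapper around the command quoted above. The function names (clamp_concurrency, build_grab_command) are made up for illustration and are not part of reddit-grab. The cap of 20 comes from the limit mentioned above, and the script only prints the docker command so you can review it before running it yourself.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of reddit-grab.
# MAX_CONCURRENT reflects the "up to --concurrent 20" limit quoted in the thread.
MAX_CONCURRENT=20

clamp_concurrency() {
    # Clamp the requested value to the range 1..MAX_CONCURRENT.
    n="$1"
    if [ "$n" -lt 1 ]; then
        n=1
    elif [ "$n" -gt "$MAX_CONCURRENT" ]; then
        n="$MAX_CONCURRENT"
    fi
    echo "$n"
}

build_grab_command() {
    # $1 = requested concurrency, $2 = nickname shown on the leaderboard
    c="$(clamp_concurrency "$1")"
    echo "docker run -d --name reddit --label=com.centurylinklabs.watchtower.enable=true --restart=unless-stopped atdr.meo.ws/archiveteam/reddit-grab --concurrent $c $2"
}

# Print the command for review instead of executing it.
build_grab_command 4 yourname
```

Starting at 4 matches what was safe on most of the VMs described above. Prefix the printed command with sudo if your docker setup needs it.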

ArchiveTeam Warrior