I noticed that a lot of the posts from Lemmy.World are showing with few or no votes, and often no comments. Going to the actual post shows votes and comments.
Did something change with how we/they sync up?
Reddthat: https://reddthat.com/post/26198974
We enabled the CloudFlare AI bots and Crawlers mode around 0:00 UTC (20/Sept).
This was because we had a huge number of AI scrapers that were attempting to scan the whole lemmyverse.
It successfully blocked them… While also blocking federation 😴
I’ve disabled the block. Within the next hour we should see federation traffic come through.
Sorry for the unfortunate delay in new posts!
Tiff
It happens. Appreciate the effort! I noticed a marked uptick in the lemmit bot mirroring Reddit, so I wonder if it was a coincidence or a sibling effort.
Might be too much work, but you can allow a subset of traffic to bypass a CF WAF rule if the federated traffic is identifiable vs. the scrapers.
Edit: I’m reading up. What I said above may not apply to the one click thing: https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/
I do support turning it on after what I read at that link.
Edit 2: From here: https://developers.cloudflare.com/bots/get-started/free/#limitations
Limitations: You cannot bypass or skip Bot Fight Mode using the Skip action in WAF custom rules or using Page Rules. Skip, Bypass, and Allow actions apply to rules or rulesets running on the Ruleset Engine. While Super Bot Fight Mode rules are implemented in the Ruleset Engine, Bot Fight Mode checks are not. This is why you can skip Super Bot Fight Mode, but not Bot Fight Mode. If you need to skip Bot Fight Mode, consider using Super Bot Fight Mode.
It’s like they tried to make that confusing to read.
Possibly, as it’s one generic endpoint, but it also blocked a few other things people in the fediverse created, which are mighty helpful in diagnosis of these and other issues.
So using some AI model or whatever CF uses is probably not going to be the best thing for us as it classified a POST request as a crawler?? 🤷
I’d have to whitelist every regular endpoint as well and then it gets messy as CF only gives you so much control as a free user.
So, for the moment I’ve blocked the most annoying ones based on UserAgent.
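For anyone curious what blocking on UserAgent boils down to, here is a rough sketch in Python. It is only an illustration of the idea, not the actual rule in use: the token list is made up of publicly documented AI crawler names chosen as examples, and `is_blocked` is a hypothetical helper.

```python
# Illustrative tokens only: publicly documented AI crawler names,
# NOT necessarily the ones blocked on Reddthat.
BLOCKED_UA_TOKENS = ("GPTBot", "CCBot", "Bytespider")


def is_blocked(user_agent, tokens=BLOCKED_UA_TOKENS):
    """Return True if the User-Agent contains any blocked token.

    The match is a case-insensitive substring check. A missing or empty
    User-Agent is treated as not blocked.
    """
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in tokens)
```

The obvious weakness is that a scraper can send any UserAgent it likes, so this only stops the ones that identify themselves honestly.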
I’d have to whitelist every regular endpoint
That’s why I started with “this might be too much work” 😆. Seems like there would be a way to do it without the automated bot blocking, just using allow and deny (or challenge, I guess it is here). The list would be a bitch to create by hand, but shouldn’t it exist already somewhere in the federation configs? If so, you could broadly allow those while blocking or challenging everything else. I guess it comes down to how you identify bot traffic on the free plan without the tool on.
Full disclosure: I have CF Enterprise experience but I’m just guessing in the Lemmy/federation part and haven’t messed with CF free.
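To make the allow-or-challenge idea a bit more concrete, here is a minimal sketch of how federation deliveries might be told apart from everything else. It is plain Python, not a Cloudflare rule, and it rests on assumptions: that deliveries are POST requests to a path ending in `/inbox`, that they carry one of the ActivityPub content types, and that they include an HTTP `Signature` header. The function names are hypothetical.

```python
# Content types defined for ActivityPub server-to-server delivery.
ACTIVITYPUB_TYPES = ("application/activity+json", "application/ld+json")


def looks_like_federation(method, path, headers):
    """Heuristic check for an ActivityPub delivery.

    Assumptions (not verified against any instance's config):
    POST to a path ending in /inbox, an ActivityPub content type,
    and a Signature header being present.
    """
    h = {k.lower(): v for k, v in headers.items()}
    # Drop parameters such as '; profile=...' or '; charset=...'.
    content_type = h.get("content-type", "").split(";")[0].strip().lower()
    return (
        method.upper() == "POST"
        and path.rstrip("/").endswith("/inbox")
        and content_type in ACTIVITYPUB_TYPES
        and "signature" in h
    )


def decide(method, path, headers):
    """Allow likely federation traffic, challenge everything else."""
    return "allow" if looks_like_federation(method, path, headers) else "challenge"
```

This only checks that a request has the shape of a federation delivery; it does not verify the signature, so a scraper could imitate it. It also ignores the GET requests other instances make when fetching posts and profiles, which would need their own allowance.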
Thank you