LLM training bots are a plague

mapto@feddit.bg · 7 days ago

LLM training bots are a plague

pcouy@lemmy.pierre-couy.fr · 5 days ago

CIDR ranges (a.b.c.d/subnet_mask) contain 2^(32-subnet_mask) IP addresses. The 1.5 I’m using controls the filter’s sensitivity and can be tuned to anything between 1 and 2

Using 1 or smaller would mean that the filter gets triggered earlier for larger ranges (we want to avoid this so that a single IP can’t trick you into banning a /16).

Using 2 or more would mean you tolerate more failures per IP for larger ranges, so you would ban all the smaller subranges before the filter gets a chance to trigger on a larger range.
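To make the effect of that factor concrete, here is a rough Python sketch. The exact formula is not shown in this comment, so the sketch assumes the failure threshold for a range is `per_ip_limit * base ** (32 - prefix_len)`. The function names and the `per_ip_limit` parameter are hypothetical, picked only for illustration, and are not part of fail2ban.

```python
import ipaddress


def range_size(cidr: str) -> int:
    """Number of addresses in an IPv4 CIDR range, i.e. 2 ** (32 - prefix_len)."""
    return ipaddress.ip_network(cidr, strict=False).num_addresses


def fail_threshold(prefix_len: int, base: float = 1.5, per_ip_limit: float = 1.0) -> float:
    """ASSUMED formula: failures needed before a whole range gets banned.

    base is the sensitivity factor discussed above (1.5 in the comment);
    per_ip_limit is a hypothetical threshold for a single /32.
    """
    return per_ip_limit * base ** (32 - prefix_len)


def tolerated_fails_per_ip(prefix_len: int, base: float = 1.5, per_ip_limit: float = 1.0) -> float:
    """Average failures per address the range can absorb before being banned."""
    return fail_threshold(prefix_len, base, per_ip_limit) / 2 ** (32 - prefix_len)


if __name__ == "__main__":
    for prefix_len in (32, 24, 16):
        for base in (1.0, 1.5, 2.0):
            print(
                f"/{prefix_len} base={base}: "
                f"threshold={fail_threshold(prefix_len, base):.1f}, "
                f"per IP={tolerated_fails_per_ip(prefix_len, base):.4f}"
            )
```

Under this assumed formula, a base of 1 gives a /16 the same threshold as a single IP, so one noisy address could get the whole range banned. A base of 2 keeps the tolerated failures per IP constant across range sizes, so the larger range only trips once its addresses are, on average, already at the single-IP limit. Anything in between makes larger ranges need more total failures but fewer per address.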

This is running locally on a single f2b instance, but it should work pretty much the same with aggregated logs from multiple instances.

froztbyte@awful.systems · 5 days ago

I’m aware of the construction of a CIDR prefix. I meant: what are you using to categorise IPs from requests to look up the mask size? whois? Published NIC/RIR data? What’s in BGP/route dumps? Something else?