Evidence for the DDoS attack that bigtech LLM scrapers actually are.

  • pcouy@lemmy.pierre-couy.fr
    5 days ago

    A CIDR range (a.b.c.d/subnet_mask) contains 2^(32-subnet_mask) IP addresses. The 1.5 I’m using controls the filter’s sensitivity and can be tuned to anything between 1 and 2.

    Using 1 or smaller would mean the filter triggers as early (or earlier) for larger ranges as it does for a single address. We want to avoid this, so that a single IP can’t trick you into banning a /16.

    Using 2 or more would mean you tolerate as many (or more) failures per IP in larger ranges, so you would end up banning all the smaller subranges before the filter gets a chance to trigger on a larger range.
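
    To make the effect of that factor concrete, here is a minimal Python sketch. It assumes the ban threshold for a range grows as `base * sensitivity ** (32 - prefix_len)`; the exact formula of the filter is not shown in this comment, so the formula, `BASE_THRESHOLD`, and the function names are all hypothetical and only illustrate the scaling argument.

```python
import ipaddress

# Hypothetical values, not taken from the actual fail2ban filter.
BASE_THRESHOLD = 5   # failures tolerated from a single address (/32)
SENSITIVITY = 1.5    # the tunable factor, between 1 and 2


def range_size(prefix_len: int) -> int:
    """Number of IPv4 addresses in a range with the given prefix length."""
    return 2 ** (32 - prefix_len)


def ban_threshold(prefix_len: int, base: float = BASE_THRESHOLD,
                  sensitivity: float = SENSITIVITY) -> float:
    """Assumed threshold: grows by `sensitivity` for each bit the range widens."""
    return base * sensitivity ** (32 - prefix_len)


def should_ban(network: str, failures: int,
               sensitivity: float = SENSITIVITY) -> bool:
    """True if the failure count seen for `network` reaches its threshold."""
    net = ipaddress.ip_network(network, strict=False)
    return failures >= ban_threshold(net.prefixlen, sensitivity=sensitivity)


if __name__ == "__main__":
    for prefix in (32, 24, 16):
        total = ban_threshold(prefix)
        per_ip = total / range_size(prefix)
        print(f"/{prefix}: threshold={total:.1f}, failures per IP={per_ip:.4f}")
```

    Under this assumption, a factor of 1 gives every range the same threshold as a single address, so 5 failures from one IP would be enough to ban its whole /16. A factor of 2 keeps the tolerated failures per IP constant (5 here), so every /32 in a range would already be banned by the time the range itself qualifies. Anything in between makes the per-IP tolerance shrink as the range grows while the absolute threshold still rises.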

    This runs locally on a single fail2ban (f2b) instance, but it should work pretty much the same with aggregated logs from multiple instances.

    • froztbyte@awful.systems
      5 days ago

      I’m aware of how a CIDR prefix is constructed. What I meant was: what are you using to categorise the IPs from requests and look up the mask size? whois? Published NIC/RIR data? What’s in BGP/route dumps? Something else?