Is there a simple way to severely impede webscraping and LLM data collection of my website?

Maroon@lemmy.world · edited 8 months ago

Is there a simple way to severely impede webscraping and LLM data collection of my website?

IphtashuFitz@lemmy.world · 9 months ago

Try using “curl -A” to specify a User-Agent string that matches Chrome or Firefox.
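For anyone not on the curl command line, here is a rough Python equivalent of the same idea, using only the standard library. The User-Agent string and the URL are placeholders picked for illustration, not values from this thread, and the request is only built here, not sent.

```python
import urllib.request

# Assumption: an example Firefox-style User-Agent string, for illustration only.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a request that presents a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL; pass the request to urllib.request.urlopen() to actually send it.
req = build_request("https://example.com/")
```

Note that urllib stores header names capitalized, so the header is read back as `req.get_header("User-agent")`.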

corroded@lemmy.world · 9 months ago

I probably should have specified that I’m using libcurl, but I did try the equivalent of what you suggested. I even tried setting up a list of user agents and cycling through them. None of them worked. A lot of anti-scraping methods use much more complex schemes than just validating the user agent. In some cases, even a headless browser will be blocked.
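To make the "cycle through a list of user agents" approach concrete, here is a minimal sketch in Python rather than libcurl. The agent strings and the `make_rotator` helper are hypothetical, made up for this example. As the comment above says, rotating the header alone often does not get past checks that look at more than the User-Agent.

```python
import itertools
import urllib.request

# Assumption: example User-Agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:128.0) Gecko/20100101 Firefox/128.0",
]


def make_rotator(agents):
    """Return a function that builds requests, using the next agent on each call."""
    pool = itertools.cycle(agents)  # endlessly repeats the list in order

    def next_request(url: str) -> urllib.request.Request:
        return urllib.request.Request(url, headers={"User-Agent": next(pool)})

    return next_request


# Each call picks the next User-Agent and wraps around at the end of the list.
next_request = make_rotator(USER_AGENTS)
```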