• 8 Posts
  • 607 Comments
Joined 9 months ago
Cake day: May 11th, 2024

  • I’d be surprised if anything crawled from a site using iocaine actually made it into an LLM training set. GPT-3’s initial crawl of roughly 45 terabytes of compressed plaintext was filtered down to the 570 GB it was actually trained on. So yeah, there’s a lot of filtering/processing that takes place between crawl and train (rough sketch of that step below). Then again, they seem to have failed entirely to clean the Reddit data they fed into Gemini, so /shrug
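
    For a sense of what that crawl-to-train step involves, here’s a minimal sketch of a dedup-plus-quality filter. The thresholds and helper names are made up for illustration; the actual GPT-3 pipeline used fuzzy deduplication and a trained quality classifier scored against a WebText-like reference corpus, not these heuristics:

    ```python
    import hashlib

    def normalize(text: str) -> str:
        # Collapse whitespace and lowercase so near-identical pages hash the same.
        return " ".join(text.lower().split())

    def quality_ok(text: str, min_words: int = 50, max_symbol_ratio: float = 0.1) -> bool:
        # Hypothetical heuristics: drop very short pages and pages that are
        # mostly punctuation/markup debris (which is where iocaine-style
        # garbage would likely get caught).
        words = text.split()
        if len(words) < min_words:
            return False
        symbols = sum(1 for c in text if not (c.isalnum() or c.isspace()))
        return symbols / max(len(text), 1) <= max_symbol_ratio

    def filter_crawl(pages):
        # Exact dedup via content hashing, then heuristic quality filtering.
        seen = set()
        for page in pages:
            digest = hashlib.sha256(normalize(page).encode()).hexdigest()
            if digest in seen:
                continue
            seen.add(digest)
            if quality_ok(page):
                yield page

    if __name__ == "__main__":
        crawl = ["Some real article text " * 20,
                 "Some real article text " * 20,  # exact duplicate, dropped
                 "<!@#$%>"]                       # debris, fails quality check
        kept = list(filter_crawl(crawl))
        print(f"kept {len(kept)} of {len(crawl)} pages")
    ```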

  • We must stop this science-on-science violence.

    -> You mean peer review?

    Lol, don’t the publications farm that out and review none of it?

    -> We must stop this science-on-science violence!

    I think that’s just the corrupting influence of money and power.

    -> We use good methodology to show methodology has been systemically compromised.

    [citation needed]

    This one-scene play brought to you by: God, is it only Wednesday?!