• umbrella@lemmy.ml
    1 day ago

    That's precisely my point. If you have to break the law to be able to compile it yourself, it's not FOSS, even if regular Joes like you or me had the means to mass-collect the data they did.

    • archomrade [he/him]@midwest.social
      1 day ago

      This might be controversial, but you and I both have the means to mass-collect data, or to find illicit datasets that have already been collected. The kind of data collection that we don’t have access to (the kind that’s taken from your phone without your consent) isn’t really helpful for training LLMs. But, again, if you have the means to replicate their methodology to begin with, then you likely already have all of the material. You’re not going to recreate their model on consumer hardware anyway.

      They’re just not advertising where that data is (and neither should anyone here).

      > if you have to break the law to be able to compile it yourself, its not foss.

      Not if you consider apps like Jellyfin or Plex to be FOSS, but even that comparison is apples and oranges, because training a model that big isn’t something you can do on your own hardware. Just because they haven’t given you the data to alter the model doesn’t mean they haven’t given you everything you need to use it with your own data and your own hardware. I get that people inherently distrust AI companies (and Chinese companies especially, but I won’t get into that here), but I think that distrust is misplaced here.