Office Space meme:

“If y’all could stop calling an LLM “open source” just because they published the weights… that would be great.”

  • ricecake
    2 days ago

    Eh, it seems like it fits to me. We casually refer to all manner of data as “open source” even if we lack the ability to specifically recreate it. It might be technically more accurate to say “open data” but we usually don’t, so I can’t be too mad at these folks for also not.

    There’s huge depths of USGS data that’s shared as open data that I absolutely cannot ever replicate.

    If we’re specifically saying that open source means you can recreate the binaries, then data is fundamentally not able to be open source, since it distinctly lacks any form of executable content.

    • Prunebutt@slrpnk.netOP
      2 days ago

      If we’re specifically saying that open source means you can recreate the binaries, then data is fundamentally not able to be open source

      lol, are you claiming data isn’t reproducible? XD

      • ricecake
        2 days ago

        … Did you not read the literal next phrase in the sentence?

        since it distinctly lacks any form of executable content.

        Your definition of open source specified reproducible binaries. From context it’s clear that I took issue with your definition, not with the notion of reproducing data.

        • Prunebutt@slrpnk.netOP
          1 day ago

          Ok, then my definition given was too narrow, when I said “reproducible binaries”. If data claims to be “open source”, then it needs to supply information on how to reproduce it.

          Open data has other criteria, I’m sure.

          • ricecake
            1 day ago

            my definition given was too narrow

            Yes, that’s what I said when you opted to take the first half of a sentence out of context.

            https://en.wikipedia.org/wiki/Open_data

            The common usage of open data is just that it’s freely shareable.
            Like I said in my initial comment, people frequently use “open source” to refer to it, but it’s such a pervasive error that it’s hardly worth getting too caught up on and practically doesn’t count as an error anymore.

            Some open data can’t be reproduced by anyone who has access to the data.

            • Prunebutt@slrpnk.netOP
              1 day ago

              I was specifically addressing the use of the phrase “open source”. And the term “open data” doesn’t apply either, since it’s not a dataset that’s distributed, but rather weights of an LLM with data baked into it. That’s neither open source nor open data.