• reddig33@lemmy.world
    arrow-up
    19
    ·
    5 months ago

    If you read the article, you’ll find this was a dataset from a nonprofit, available to anyone. The nonprofit used captions from a set of YouTube videos.

    “Most of the Pile’s datasets are accessible and open for anyone on the internet with enough space and computing power to access them.”

    That “anyone” included a lot of other big names in tech, not just Apple.

    Also, I wasn’t aware that Apple had its own AI. I thought they were licensing stuff from others like OpenAI. I guess maybe this is research for an unannounced project?

    • Dudewitbow@lemmy.zip
      arrow-up
      2
      ·
      4 months ago

      IIRC some of Apple’s AI meant to be run on-device had open source models. That was probably done to get more users wanting to build on it.

  • tomkatt@lemmy.world
    English
    arrow-up
    25
    arrow-down
    6
    ·
    5 months ago

    And…?

    Not defending Apple here, but everyone with a vested interest in AI is doing it. Nobody is asking permission or respecting copyright in this race to the bottom.

      • adam_y@lemmy.world
        English
        arrow-up
        6
        arrow-down
        1
        ·
        4 months ago

        I know a lot of people are downvoting your comment, but I want you to know they are downvoting the idea that companies treat public content like public property.

        You shouldn’t be downvoted for pointing that out.

        It’s a problem with how we categorise content as either private or public without regard to copyright.

        It seems copyright is for big companies like Disney, but a YouTube creator isn’t afforded the same protection for their creation. They are merely providing “content”, not intellectual property.

        Anyway, I get what you were saying.

    • Savaran@lemmy.world
      arrow-up
      2
      ·
      5 months ago

      Right? I think people may be surprised by what the contracts they agreed to actually say, and by whose consent is needed on these platforms. Sad but true.

  • mindbleach
    arrow-up
    3
    arrow-down
    1
    ·
    4 months ago

    Yeah?

    AI models have been trained on every comment and JPG on the internet… and commercial movies on DVD… and every book in the library.

    Shoveling all of the content through a sluice of linear algebra is pretty dang transformative. The more they use, the less any piece matters.

    • best_username_ever
      arrow-up
      15
      arrow-down
      2
      ·
      5 months ago

      Children don’t make millions by selling copies of all the books they skimmed.

      • thefartographer@lemm.ee
        arrow-up
        4
        arrow-down
        2
        ·
        edit-2
        5 months ago

        Most children don’t (sick burn against the Grimm Brothers). I mean, fuck Apple and all of these companies, but they’re hoovering data from a publicly available resource using totally legal means.

        I know I’m snowballing here, but overreacting to this headline could end up supporting those who argue against web crawlers, plane-tracking bots, and the completely legal actions of Aaron Swartz that the Feds tried to use to crucify him.

        Once again, fuck Apple, but the real villain in this scenario is either Google for allowing companies to train their AI models on their content, or the content creators who are still using YouTube.

        Since I can’t fault anyone who is trying to make a living by exploiting Google, I guess I’ll just add “fuck Google” to the pile.