• @[email protected]
    39 points · 16 days ago

    Duh, it’s an ML algorithm that requires an enormous amount of feedback. It can’t get smarter than humans, because then there’s no one (and no data) that can tell whether what it’s spewing is really clever or just nonsense.

    I hate what happened to the common perception of “AI”. The whole amazing field of machine learning has been reduced to overhyped chatbots, with so many misconceptions repeated even by experts who should know better.

    • @[email protected]
      4 points · 16 days ago

      It can get smarter than every individual human because individuals are always less smart than a large collective and the LLMs train on the collective data of the internet.

      • @[email protected]
        12 points · 16 days ago

        “Smarter” is the wrong way to look at it. LLMs don’t reason. They have limited ability to contextualize. They have no long term memory (in the sense of forming conclusions based on prior events).

        They potentially have access to more data than any individual human and are able to respond to requests for that data quicker.

        Which is a long way of saying that they can arguably be more knowledgeable about random topics, but that’s a separate measure from “smart,” which encompasses much, much more.

      • @[email protected]
        1 point · 16 days ago

        Except it’s dragged down by the average and below-average humans whose data it’s trained on.

        So it’s maybe smarter than the average, MAYBE.

  • @[email protected]
    11 points · 16 days ago

    Researchers are ringing the alarm bells, warning that companies like OpenAI and Google are rapidly running out of human-written training data for their AI models.

    There is so much more to this than the raw amount of data; this is not at all the bottleneck it seems to be. There’s a lot of room for progress in how we clean the data, how we train, and the actual structures of the models.

    • 🔍🦘🛎
      4 points · 16 days ago

      Yeah if AI can’t pinpoint something when it has ALL OF HUMAN KNOWLEDGE to draw from, it’s not the fault of the data set

    • @[email protected]
      4 points · 15 days ago

      Right? What happened to that whole “there are millions of pages of text being generated by all internet users every minute” thing that people used to say? Look at lemmy alone. Look how much text we are putting into the ether every day. They’re not ever going to run out of text unless people stop typing. Is this not a fake problem?

  • @mindbleach
    9 points · 16 days ago

    One, the subject is LLMs, and I point this out because I’ve suffered multiple cycles of opinionated meatbags saying computers can never ever become intelligent. I fully expect some dingus to shove headlines like “AI can’t get smarter!” in people’s faces, like it’s divine writ.

    Two, more training is what makes these things smarter. Data was only a major obstacle when there was next to nothing. And they didn’t just pour the new data into the old setup; every major iteration reconfigures the network. Deeper tends to be better but is slow to train. Wider is a cheap path to novel results but requires obscene amounts of memory. Naturally the companies dumping money into this (because they’ve gambled their reputation on an unproven new thingamajig) are only trying to scale up up up - and that’s why this limit appears. A lot more neat shit is going to arise from small networks. They’ll be organized with better human insight (partly derived from the experience of these big dumb money sinks) and they’ll train much more quickly on much more affordable machines.
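    A rough back-of-the-envelope sketch of the depth-versus-width point, assuming a plain stack of square fully connected layers. This is a deliberate simplification (real LLMs are transformers with attention, embeddings and other parts), and `dense_stack_params` is a hypothetical helper. Doubling depth doubles the parameter count, while doubling width roughly quadruples it, because each weight matrix is width × width.

```python
def dense_stack_params(depth: int, width: int) -> int:
    """Weights plus biases for `depth` fully connected width -> width layers."""
    return depth * (width * width + width)


base = dense_stack_params(depth=12, width=1024)
deeper = dense_stack_params(depth=24, width=1024)  # twice as deep
wider = dense_stack_params(depth=12, width=2048)   # twice as wide

print(deeper / base)           # → 2.0
print(round(wider / base, 2))  # → 4.0
```

    Deeper stacks are also slower to train in wall-clock terms because layers run one after another, which a parameter count alone does not capture.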

    Three, tech is not why these idiot corporations are struggling. The tech works as engineers promised. It’s the marketing and executives who promised the moon and the stars as soon as this could almost hold a conversation. We the dorks were cautiously optimistic about the emergent properties. GPT-3 could sorta do math. Yeah yeah yeah, computers doing math doesn’t sound surprising, but the network would have to do math the way you do math.

    We the dorks also pointed out that GPT was set up so it was incapable of holding an opinion. It’d finish your side of the conversation if you left that open. And sometimes it’d do a really good job. This approach may get a lot closer to intelligent than critics are comfortable with. Every advancement in AI demonstrates how little we understand ourselves, via endless failed predictions that ‘only a sentient mind could do [blank].’
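    A toy illustration of the "finish your side of the conversation" behaviour: a hypothetical bigram counter that repeatedly appends the most frequent next word. This is not how GPT works internally (GPT uses a transformer over subword tokens, and the corpus and helper names here are made up), but the outer loop is the same idea: predict one token, append it, repeat.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count which word follows which in a whitespace-split corpus."""
    words = text.split()
    table: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table


def complete(table: dict[str, Counter], prompt: str, max_new: int = 5) -> str:
    """Greedily append the most frequent follower of the last word, one word at a time."""
    words = prompt.split()
    for _ in range(max_new):
        followers = table.get(words[-1])  # .get() does not create empty entries
        if not followers:
            break  # unseen word: nothing to predict
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical toy corpus.
corpus = "the model finishes the sentence the model was given"
table = train_bigrams(corpus)
print(complete(table, "the", max_new=1))  # → the model
```

    Real models sample from a probability distribution rather than always taking the top choice, which is why the same prompt can produce different completions.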

  • @[email protected]
    4 points · 16 days ago

    It’s a bit bewildering, because I see about 50% of these articles and 50% of people saying AGI is possible within a few years. Does anyone actually have a bead on the reality of the situation here?

    • @[email protected]
      5 points · 16 days ago

      Seems this is just talking about LLMs, which, frankly, are a glorified autocorrect. An actual AI that can think for itself, learn, and adjust its own programming would be a whole different beast.

  • AutoTL;DR (bot)
    1 point · 16 days ago

    This is the best summary I could come up with:


    Researchers are ringing the alarm bells, warning that companies like OpenAI and Google are rapidly running out of human-written training data for their AI models.

    It’s an existential threat for AI tools that rely on feasting on copious amounts of data, which has often indiscriminately been pulled from publicly available archives online.

    The controversial trend has already led to publishers, including the New York Times, suing OpenAI over copyright infringement for using their material to train AI models.

    The latest paper, authored by researchers at San Francisco-based think tank Epoch, suggests that the sheer amount of text data AI models are being trained on is growing roughly 2.5 times a year.

    Extrapolated on a graph, that means large language models like Meta’s Llama 3 or OpenAI’s GPT-4 could entirely run out of fresh data as soon as 2026, the researchers argue.

    In a paper last year, scientists at Rice University and Stanford University found that feeding their models AI-generated content causes their output quality to erode.


    The original article contains 476 words, the summary contains 164 words. Saved 66%. I’m a bot and I’m open source!
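    To see why a 2.5× yearly growth rate hits a wall so quickly, here is a minimal sketch of the extrapolation. The token counts are hypothetical placeholders, not Epoch's figures, and the sketch assumes "running out" means a single training set reaching a fixed stock of text (which ignores newly written text, the objection raised in the Lemmy comment above). `years_until_exhausted` is a hypothetical helper.

```python
def years_until_exhausted(current_tokens: float, stock_tokens: float, growth: float = 2.5) -> int:
    """Years until a training set growing `growth`x per year reaches a fixed stock of text."""
    if current_tokens <= 0 or growth <= 1:
        raise ValueError("need a positive starting size and growth > 1")
    years = 0
    size = current_tokens
    while size < stock_tokens:
        size *= growth  # one year of growth
        years += 1
    return years


# Hypothetical: the stock is 10x or 100x the size of today's largest training set.
print(years_until_exhausted(1.0, 10.0))   # → 3
print(years_until_exhausted(1.0, 100.0))  # → 6
```

    Under these assumptions a stock 10× larger than today's training set lasts 3 years and one 100× larger lasts 6: because growth is exponential, each extra order of magnitude of data buys only about 2.5 years.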

  • @[email protected]
    -1 points · 16 days ago

    This should have been pretty obvious from the outset. The annoying thing is that it’s still being touted by people who should know better.

      • @[email protected]
        1 point · 14 days ago

        Just to be clear, my issue with it is how commercialised it is, not the tech itself, which is actually kind of interesting. This ‘wall’, though, should always have been obvious, and the solution isn’t just more LLMs or more training data; for actual advancements, people need to be putting money into other areas of research.