Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds. Researchers found wild fluctuations, called drift, in the technology’s abi…::ChatGPT went from answering a simple math problem correctly 98% of the time to just 2% over the course of a few months.

    • WhatAmLemmy@lemmy.world
      1 year ago

      You wildly overestimate the competency of management and the capital owners they answer to.

      I guarantee a significant % of entities will grow dependent on AI well before it’s dependable. The profit motive will be too high (source: the frequent failure that is outsourcing).

      • unconfirmedsourcesDOTgov@lemmy.sdf.org
        1 year ago

        This is spot on. Source: 10+ years at F500 companies.

        Senior management and/or board members read one article in Forbes, or some other “business” publication, and think that they know everything they need to know about an emerging technology. Risk management is either a ☑ exercise or extremely limited in scope, usually only including threats that have already been observed and addressed in the past.

        Not enough people understand the limitations of this kind of tech. They contextualize it in the same frame as outsourcing because, as long as the output mostly looks correct, the decision makers can push the blame for any issues down to the middle managers and below.

        Gonna be a wild time!

        • TheDarkKnight@lemmy.world
          1 year ago

          Definitely not my experience at F100; they are cautious as fuck about everything. They are definitely having the right discussions and exploring all sorts of technology, but risk management remains a huge calculation in making these kinds of decisions.

    • Ultraviolet@lemmy.world
      1 year ago

      I don’t understand why anyone even considers that. It’s a toy. A novelty, a thing you mess with when you’re bored and want to see how Hank Hill would explain the plot of Fullmetal Alchemist, not something you would entrust anything significant to.

      • coolin@lemmy.ml
        1 year ago

        These models are black boxes right now, but presumably we could open them up and look inside to see each and every function the model is running to produce the output. If we are then able to see what it is actually doing and fix things up so we can mathematically verify that what it does is correct, I think we would be able to use it for mission-critical applications. I think a more advanced LLM like this would be great for automatically managing systems and doing science and math research.

        But yeah. For right now these things are mainly just toys for SUSSY roleplays, basic customer service, and generating boilerplate code. A verifiable LLM is still probably 2-4 years away.