Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds. Researchers found wild fluctuations, called drift, in the technology’s abi…

  • @[email protected]
    English
    11 · 11 months ago

    This. It is able to tap into plugins and call functions though, which is what it really should be doing. For math, the Wolfram Alpha plugin will always be more capable than ChatGPT alone, so we should be benchmarking how often it can correctly reformat your query, call Wolfram Alpha, and correctly format the result, not whether the statistical model behind ChatGPT happens to predict the right token.
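A rough sketch of what that kind of benchmark could look like, scoring the query-reformatting step and the final answer separately. Everything here is a stand-in, not a real API: `toy_solver` plays the role of the external math tool, `fake_model_reformat` plays the role of the language model, and `Case`, `benchmark`, and the test questions are made up for illustration.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable

# Arithmetic operators the toy solver accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def toy_solver(expr: str) -> float:
    """Hypothetical stand-in for the external tool: + - * / on numbers only."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)


def fake_model_reformat(question: str) -> str:
    """Hypothetical stand-in for the model: only understands 'What is ...?'."""
    return question.removeprefix("What is ").removesuffix("?")


@dataclass
class Case:
    question: str         # natural-language prompt
    expected_query: str   # the tool query a correct reformatting would produce
    expected_answer: str  # the final answer as shown to the user


def benchmark(cases: list[Case],
              reformat: Callable[[str], str],
              solver: Callable[[str], float],
              present: Callable[[float], str]) -> dict[str, float]:
    """Return the fraction of cases that pass each stage of the pipeline."""
    counts = {"query_ok": 0, "answer_ok": 0}
    for case in cases:
        query = reformat(case.question)
        # Stage 1: did the model produce the right tool query (ignoring spaces)?
        if query.replace(" ", "") != case.expected_query.replace(" ", ""):
            continue
        counts["query_ok"] += 1
        # Stage 2: call the tool; a tool error counts as a failed answer.
        try:
            raw = solver(query)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue
        # Stage 3: did the formatted result match the expected answer?
        if present(raw) == case.expected_answer:
            counts["answer_ok"] += 1
    total = len(cases) or 1  # avoid dividing by zero on an empty suite
    return {name: hits / total for name, hits in counts.items()}


CASES = [
    Case("What is 12 * 7?", "12 * 7", "84"),
    Case("What is 10 / 4?", "10 / 4", "2.5"),
    Case("Compute 3 + 5", "3 + 5", "8"),  # the fake model fails to reformat this one
]

rates = benchmark(CASES, fake_model_reformat, toy_solver, lambda x: f"{x:g}")
print(rates)
```

Splitting the score by stage shows where a regression comes from: a drop in `query_ok` points at the model's reformatting, while a gap between `query_ok` and `answer_ok` points at the tool call or the result formatting.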

    • @[email protected]
      English
      2 · 11 months ago

      It sounds like it’s time to merge Wolfram Alpha’s and ChatGPT’s capabilities together to create the ultimate calculator.