I wanted to extract some crime statistics broken down by type of crime and by different populations, all of course normalized by population size. I got a nice set of tables summarizing the data for each year that I requested.
When I shared these summaries, I was told they are entirely unreliable due to hallucinations. So my question to you is: how common a problem is this?
I compared ChatGPT-4, Copilot, and Grok, and the results were the same (Gemini says the data is unavailable, btw :)
So are LLMs reliable for research like that?
Tried the example, got two names of people who did die in the attacks, but they sure as hell weren't developers or anywhere near the open source sphere. Also love the classic "that's not correct" followed by the AI's "ah yes, of course". Shit has absolutely zero reflection. It makes sense, though: people usually have doubts in their heads BEFORE they write something down. The training data completely skips the thought process, so LLMs can't learn to doubt.