Differences in link hallucination and source comprehension across different large language models

Mike Caulfield explores the problem of hallucinated links:

If I am being harsh here it’s because we constantly hear — based on ridiculously dumb benchmarks — that all these models are performing at “graduate level” of one sort or another. They are not, at least out of the box like this. Imagine giving a medical school student this question, and they say — yes the thing that says in the actual conclusion that the lack of sustained differences is probably due to people stopping their medication is proof that medication doesn’t work (scroll to bottom of this screenshot to see). Never mind that in the results it states quite clearly that all groups saw improvement over baseline.

https://mikecaulfield.substack.com/p/differences-in-link-hallucination
