IEEE Spectrum: Large Language Models Are Improving Exponentially

A recent report predicts a bright future for LLMs:

That was a key motivation behind work at Model Evaluation & Threat Research (METR). The organization, based in Berkeley, Calif., “researches, develops, and runs evaluations of frontier AI systems’ ability to complete complex tasks without human input.” In March, the group released a paper called Measuring AI Ability to Complete Long Tasks, which reached a startling conclusion: According to a metric it devised, the capabilities of key LLMs are doubling every seven months. This realization leads to a second conclusion, equally stunning: By 2030, the most advanced LLMs should be able to complete, with 50 percent reliability, a software-based task that takes humans a full month of 40-hour workweeks. And the LLMs would likely be able to do many of these tasks much more quickly than humans, taking only days, or even just hours.
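The arithmetic behind that extrapolation is easy to check. The sketch below is only an illustration, not METR's method: it takes the seven-month doubling time from the quote and assumes a starting horizon of about one hour in March 2025, a figure that does not appear in the quoted passage. The function name and the 160-hour work month (four 40-hour weeks) are likewise my own choices.

```python
import math


def months_until_horizon(start_hours: float, target_hours: float,
                         doubling_months: float = 7.0) -> float:
    """Months for a task horizon to grow from start_hours to target_hours,
    assuming it doubles steadily every doubling_months."""
    if start_hours <= 0 or target_hours <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    doublings = math.log2(target_hours / start_hours)
    return doublings * doubling_months


# ASSUMPTION (not in the quoted passage): a 1-hour horizon as of March 2025.
START_HOURS = 1.0
# A work month approximated as 4 weeks x 40 hours.
TARGET_HOURS = 4 * 40.0

months = months_until_horizon(START_HOURS, TARGET_HOURS)
print(f"{months:.1f} months, about {months / 12:.1f} years")
```

Under those assumptions the answer is a little over four years, which would put the month-long task in mid-2029 and squares with the "by 2030" claim. A shorter starting horizon or a slower doubling time would push the date later.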

As a caveat, I’m not sure how many companies would be satisfied with a 50% success rate for key software. Having an AI tool complete a job that would take a human a full month would be a good thing. But let’s face it: a person still has to determine whether the work was done satisfactorily, and 50% isn’t a passing grade in any subject.
