Blog

  • CyberGym: Evaluating AI Agents’ Cybersecurity Capabilities with Real-World Vulnerabilities at Scale

    UC Berkeley researchers release CyberGym as a benchmark for evaluating AI agents’ cybersecurity capabilities. The rate at which agents reproduced known vulnerabilities was low (only 11.9%), but it serves as a baseline for measuring improvements in AI agent performance over time.

    More interestingly, the evaluation process also uncovered 15 previously unknown vulnerabilities that pose security risks, a useful side benefit. Because this is a new technique, I’d expect teams to find these tools increasingly helpful over the next few years.

  • Harvard’s Library Releases Dataset from Old Books

    Using scanned public-domain material, the Harvard Library team releases a new LLM-focused dataset with over 1 million volumes (and nearly 250 billion tokens).

    Harvard has been in the news lately, much of it for reasons I assume it would rather avoid. Amid all that, Harvard’s librarians demonstrate why we’ve long admired university work. As custodians of a vast wealth of history and information, they’re looking for ways to share it with the world.

  • Simon Willison on Multi-Agent Systems

    High praise from Willison on Anthropic’s new multi-agent research system:

    I’ve been pretty skeptical of these until recently: why make your life more complicated by running multiple different prompts in parallel when you can usually get something useful done with a single, carefully-crafted prompt against a frontier model?

    By splitting the research into segments and running the work in parallel across multiple agents, Anthropic’s team improved the quality of the finished research but burned through a lot of tokens. Like much frontier AI work, it’s fascinating … and expensive.

  • Nvidia CEO Criticizes Anthropic’s leader over recent comments

    Nvidia CEO Jensen Huang criticized Anthropic head Dario Amodei over his recent claims that 50% of all entry-level white-collar jobs could be wiped out by artificial intelligence, causing unemployment to jump to 20% within the next five years. Huang disagreed with Amodei’s predictions when he was asked about it during VivaTech in Paris, where he said that he “pretty much disagree[s] with almost everything” the Anthropic CEO said, according to Fortune.

    Amodei (and much of the rhetoric coming from Anthropic’s team) is far more pessimistic about a future with AI; Huang is clearly more optimistic. We need to be aware of these disputes within the industry and stay sober about the possibilities. Amodei and Huang may both be right.

    Nvidia CEO slams Anthropic’s chief over his claims of AI taking half of jobs and being unsafe — ‘Don’t do it in a dark room and tell me it’s safe’

  • OpenAI hits $10 billion in annual recurring revenue fueled by ChatGPT growth

    CNBC reports that OpenAI announced new revenue figures on June 9, representing nearly 100% growth over the past year. But that’s only a fraction of what the company projects for 2029:

    OpenAI is also targeting $125 billion in revenue by 2029, according to a person familiar with the matter who asked not to be named because the details are confidential. The Information first reported on OpenAI’s revenue ambitions.

    That’s a lot of $20/mo. subscriptions.
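    For a sense of scale, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that every dollar came from $20/month subscriptions; OpenAI’s revenue also includes business products and API usage, so treat the result as a rough yardstick, not a subscriber count.

```python
# Back-of-the-envelope: how many year-long $20/month subscriptions would it
# take to match a given annual revenue figure? ILLUSTRATIVE ONLY: assumes all
# revenue comes from $20/month plans, which is not how OpenAI's revenue breaks down.
MONTHLY_PRICE_USD = 20
ANNUAL_PRICE_USD = MONTHLY_PRICE_USD * 12  # $240 per subscriber per year


def subscriptions_needed(annual_revenue_usd: float) -> float:
    """Return how many year-long $20/month subscriptions equal the revenue."""
    return annual_revenue_usd / ANNUAL_PRICE_USD


# Figures from the item above: $10B ARR now, $125B targeted for 2029.
for label, revenue in [("Current ARR", 10e9), ("2029 target", 125e9)]:
    millions = subscriptions_needed(revenue) / 1e6
    print(f"{label}: ~{millions:.1f} million subscriptions")
```

    That works out to roughly 42 million year-long subscriptions today and about 521 million for the 2029 target.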

  • News Sites Are Getting Crushed by Google’s New AI Tools

    Business Insider cut about 21% of its staff last month, a move CEO Barbara Peng said was aimed at helping the publication “endure extreme traffic drops outside of our control.” Organic search traffic to its websites declined by 55% between April 2022 and April 2025, according to data from Similarweb.

    Aside from the spurious, clickbaity nature of BI content, I’ve noticed how much Google’s AI tools reduce my reliance on source content. Why click a link when the information is already there?

    Well-documented hallucinations aside, fewer clicks are ultimately helpful for searchers looking for a specific piece of information.

  • Sam Altman: The Gentle Singularity

    Altman takes a philosophical, if not mystically reverent, tone as he considers the future of AI. Opening with “We are past the event horizon; the takeoff has started” has a certain rhetorical flair, although it feels overly exuberant.

    Quibbles aside, there are some really interesting nuggets in the post:

    • “we have recently built systems that are smarter than people in many ways, and are able to significantly amplify the output of people using them”
    • “2025 has seen the arrival of agents that can do real cognitive work; writing computer code will never be the same. 2026 will likely see the arrival of systems that can figure out novel insights. 2027 may see the arrival of robots that can do tasks in the real world.”
    • “We already hear from scientists that they are two or three times more productive than they were before AI.”
    • “The rate of new wonders being achieved will be immense. It’s hard to even imagine today what we will have discovered by 2035;”
    • “OpenAI is a lot of things now, but before anything else, we are a superintelligence research company”

    And perhaps the piece many of us were wondering about, electricity consumption:

    People are often curious about how much energy a ChatGPT query uses; the average query uses about 0.34 watt-hours, about what an oven would use in a little over one second, or a high-efficiency lightbulb would use in a couple of minutes. It also uses about 0.000085 gallons of water; roughly one fifteenth of a teaspoon.
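    Altman doesn’t say which wattages sit behind his comparisons, so here is a quick Python sanity check under my own assumptions: a roughly 1,000 W oven element and a roughly 10 W high-efficiency LED bulb (both assumed, not from the post), plus the exact conversion of 768 US teaspoons per US gallon.

```python
# Sanity-check the comparisons in Altman's per-query energy and water figures.
# ASSUMPTIONS (not stated in the post): a ~1,000 W oven and a ~10 W LED bulb.
QUERY_WH = 0.34            # watt-hours per average query (from the post)
QUERY_GALLONS = 0.000085   # gallons of water per query (from the post)
OVEN_WATTS = 1_000         # assumed oven power draw
LED_WATTS = 10             # assumed high-efficiency bulb power draw
TSP_PER_GALLON = 768       # exact: 1 US gallon = 768 US teaspoons


def seconds_at(power_watts: float, energy_wh: float = QUERY_WH) -> float:
    """Seconds a device drawing power_watts takes to consume energy_wh."""
    return energy_wh * 3600 / power_watts  # 1 Wh = 3,600 joules


oven_s = seconds_at(OVEN_WATTS)
led_min = seconds_at(LED_WATTS) / 60
tsp = QUERY_GALLONS * TSP_PER_GALLON

print(f"Oven: {oven_s:.2f} s")
print(f"LED bulb: {led_min:.1f} min")
print(f"Water: {tsp:.4f} tsp (about 1/{1 / tsp:.0f} of a teaspoon)")
```

    Under those assumptions the comparisons hold up: about 1.2 seconds of oven time, about 2 minutes of LED light, and about 1/15 of a teaspoon of water. A more powerful oven would shorten the oven figure proportionally.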

  • Google offers buyouts to more workers

    MOUNTAIN VIEW, Calif. (AP) — Google has offered buyouts to another swath of its workforce across several key divisions in a fresh round of cost cutting coming ahead of a court decision that could order a breakup of its internet empire. The Mountain View, California, company confirmed the streamlining that was reported by several news outlets.

    Source: AP

    The Verge also reports that Google is offering employee buyouts in Search and other orgs:

    Google is starting to offer buyouts to US-based employees in its sprawling Search organization, along with other divisions like marketing, research, and core engineering, according to multiple employees familiar with the matter.

    Per Bloomberg last month, on Apple services chief Eddy Cue’s testimony in Google’s antitrust case:

    Beyond that upheaval, AI is already making gains with consumers. Cue noted that searches on Safari dipped for the first time last month, which he attributed to people using AI. Cue said he believes that AI search providers, including OpenAI, Perplexity AI Inc. and Anthropic PBC, will eventually replace standard search engines like Alphabet’s Google. He said he believes Apple will bring those options to Safari in the future.

  • Anthropic fires its AI blogger

    A week after TechCrunch profiled Anthropic’s experiment to task the company’s Claude AI models with writing blog posts, Anthropic wound down the blog and redirected the address to its homepage. Sometime over the weekend, the Claude Explains blog disappeared — along with its initial few posts.

    I read the announcement of the AI-blogging experiment last week, but by the time I went to look, the blog had already disappeared. This strikes me as another example that AI tools make useful co-workers, but an experienced programmer/writer/editor is still needed.

    https://techcrunch.com/2025/06/09/anthropics-ai-generated-blog-dies-an-early-death/

  • OpenAI drops prices on o3

    It’s not a frontier model, but an 80% cut (to $2/$8 per million input/output tokens) is a sizable drop for a tool that is effective in many contexts.

    Update from Simon Willison on the o3 price drop:

    This is a pretty huge shake-up in LLM pricing. o3 is now priced the same as GPT 4.1, and slightly less than GPT-4o ($2.50/$10). It’s also less than Anthropic’s Claude Sonnet 4 ($3/$15) and Opus 4 ($15/$75) and sits in between Google’s Gemini 2.5 Pro for >200,000 tokens ($2.50/$15) and 2.5 Pro for <200,000 ($1.25/$10).
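    To see what those per-million-token prices mean for an actual request, here is a small Python sketch. The prices come from the quote above, plus OpenAI’s post-cut $2/$8 rate for o3 (the same as GPT-4.1); the two workload sizes are hypothetical, and I’m assuming Gemini’s tier is chosen by prompt length with the boundary at 200,000 tokens.

```python
# Prices quoted above, in USD per million tokens: (input, output).
FLAT_PRICES = {
    "o3": (2.00, 8.00),
    "GPT-4.1": (2.00, 8.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
}


def prices_for(model: str, input_tokens: int) -> tuple[float, float]:
    """Return (input, output) USD per million tokens for this request."""
    if model == "Gemini 2.5 Pro":
        # Tiered by prompt length (boundary handling is an assumption).
        return (1.25, 10.00) if input_tokens <= 200_000 else (2.50, 15.00)
    return FLAT_PRICES[model]


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request with the given token counts."""
    in_price, out_price = prices_for(model, input_tokens)
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


MODELS = [*FLAT_PRICES, "Gemini 2.5 Pro"]

# Hypothetical workloads: a short prompt and a long (>200k-token) prompt.
for input_tokens, output_tokens in [(10_000, 2_000), (300_000, 2_000)]:
    print(f"\n{input_tokens:,} input / {output_tokens:,} output tokens")
    ranked = sorted(MODELS, key=lambda m: request_cost(m, input_tokens, output_tokens))
    for model in ranked:
        cost = request_cost(model, input_tokens, output_tokens)
        print(f"  {model:16s} ${cost:.4f}")
```

    For the short prompt, Gemini 2.5 Pro’s lower tier edges out o3; once the prompt crosses 200,000 tokens, Gemini’s higher tier puts o3 ahead, which is the “in between” position Willison describes.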