Blog

  • Bad Questions & Answers

    Ethan Mollick recently cited a paper that tripped up DeepSeek:

    Garbage in, garbage out. AI tools are still in their relative infancy, and it’s not surprising that confusing queries would lead to useless or misleading results.

    Simon Willison posted a similar idea but with a decided historical bent:

    On two occasions I have been asked, — “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?” In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

    — Charles Babbage, Passages from the Life of a Philosopher, 1864

    For personal use, I don’t find discoveries like this troubling. But they do open countless avenues for scammers and hackers to trick systems into doing things we would very much like to avoid.

  • Douthat: Conservatives Are Prisoners of Their Own Tax Cuts

    As a parent of three, I find that point number 2 in Douthat’s opinion piece resonates with me:

    Second (in the voice of a social conservative), the law doesn’t do enough for family and fertility. No problem shadows the world right now like demographic collapse, and while the United States is better off than many countries, the birthrate has fallen well below replacement levels here as well. Family policy can’t reverse these trends, but public support for parents can make an important difference. Yet the law’s extension of the child tax credit leaves it below the inflation-adjusted level established in Trump’s first term.

    One of the odd parts of political haggling is how loud certain voices can be, particularly those pushing tax deductions for high earners in high-tax states (yes, the SALT deduction). It’s a small group of high earners in a small number of states, yet they’ve managed to be squeaky enough to raise the deduction cap from $10k to $40k. Well done, lobbyists!

    From Claude:

    Expanding SALT deductions would primarily benefit upper-middle-class and wealthy taxpayers earning $100,000+ annually, particularly those in high-tax states like California, New York, New Jersey, and Connecticut, who own expensive homes and face high state and local tax burdens. The benefits become increasingly concentrated among the highest earners, with the top 1% receiving disproportionate benefits from any expansion.

    Back to the child tax credit itself. At $2,200, it represents a nominal expansion but still falls below the original credit in inflation-adjusted dollars. So it seems that our Congress cares more about a handful of high-income earners than it does about a large (and important) swath of the country: parents.
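
    A rough back-of-the-envelope check of that inflation claim (the $2,000 starting point is the credit set by the 2017 law; the roughly 30% cumulative inflation factor is my own approximation, not a figure from Douthat’s piece):

      # Back-of-the-envelope sketch: is $2,200 above or below the 2017 credit in today's dollars?
      # The ~1.30 inflation factor (2017 -> 2025) is an approximation, not an official figure.
      original_credit_2017 = 2_000   # child tax credit set by the 2017 tax law
      new_credit = 2_200             # credit in the current bill
      inflation_factor = 1.30        # assumed cumulative CPI growth since 2017

      inflation_adjusted = original_credit_2017 * inflation_factor
      shortfall = inflation_adjusted - new_credit

      print(f"2017 credit in today's dollars: ~${inflation_adjusted:,.0f}")   # ~$2,600
      print(f"Shortfall of the new credit:    ~${shortfall:,.0f}")            # ~$400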

  • AI Free Agency

    From the Wall Street Journal: Mark Zuckerberg Announces New Meta ‘Superintelligence Labs’ Unit and a partial reorganization of Meta.

    Mark Zuckerberg announced a new “Superintelligence” division within Meta Platforms, officially organizing an effort that has been the subject of an intense recruiting blitz in recent months.

    Former Scale CEO Alexandr Wang will lead the team as chief AI officer, and former GitHub CEO Nat Friedman will lead the company’s work on AI products, according to an internal memo Zuckerberg sent to employees that was viewed by The Wall Street Journal. 

    This comes after another WSJ article last week about “The List,” an effort meant to remedy Meta’s recent disappointing Llama work.

    All over Silicon Valley, the brightest minds in AI are buzzing about “The List,” a compilation of the most talented engineers and researchers in artificial intelligence that Mark Zuckerberg has spent months putting together. 

    Facebook’s pivot from virtual reality / metaverse (Facebook -> Meta) to AI suggests that the metaverse was the wrong bet. I suspect Zuckerberg knows it too, and this huge spending spree aligns with his ethos: move fast and break things.

    In a world where a really good basketball player (Shai Gilgeous-Alexander) can command $285 million over four years, spending upwards of $100 million per transformative engineer seems like a relative bargain.

  • Maginative: Microsoft’s MAI-DxO Crushes Doctors at Medical Diagnosis while Cutting Costs

    Maginative reports on Microsoft’s new AI Diagnostic Orchestrator and how it outperformed doctors in a recent study. (As an aside, I always wonder about reports that use words like “crush” in the title. Beware of hyperbole!)

    From the report’s abstract, you’ll find exciting results:

    When paired with OpenAI’s o3 model, MAI-DxO achieves 80% diagnostic accuracy—four times higher than the 20% average of generalist physicians. MAI-DxO also reduces diagnostic costs by 20% compared to physicians, and 70% compared to off-the-shelf o3. 

    A 4x improvement in diagnostic accuracy. This is transformative stuff.

    But consider the experimental setup:

    Physicians were explicitly instructed not to use external resources, including search engines (e.g., Google, Bing), language models (e.g., ChatGPT, Gemini, Copilot, etc), or other online sources of medical information.

    Now the results don’t seem quite so impressive. In fact, I have a hard time seeing how the report has much utility, given extreme restrictions that don’t align with real-world practice.

  • If AI Lets Us Do More in Less Time—Why Not Shorten the Workweek?

    It’s a good question, particularly for white-collar roles: if workers are more productive because of AI, should the workweek be shorter?

    This question is increasingly central to debates about the future of work and closely tied to the growing interest in the four-day workweek. According to Convictional CEO Roger Kirkness, his team was able to shift to a 32-hour schedule without any pay cuts—thanks to AI. As he told his staff, “Fridays are now considered days off.” The reaction was enthusiastic. “Oh my God, I was so happy,” said engineer Nick Wechner, who noted how much more quickly he could work using AI tools.

    Aside from putting himself in contention for boss of the year, Kirkness recognizes the key criterion for success: getting your work done. If the work can be done faster, companies can choose to (1) reduce the total number of hours worked per employee (without reducing headcount); (2) reduce headcount by a commensurate amount (in Convictional’s case, 20%); or (3) grow the company to do more work with a similar number of employees.
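
    A quick sketch of the arithmetic behind those options, assuming a standard 40-hour baseline and an illustrative 25% productivity gain (the gain figure is mine, not something Convictional reported):

      # Illustrative only: how a productivity gain maps onto the three options above.
      # The 40-hour baseline and 25% gain are assumptions for the sketch, not reported figures.
      baseline_hours = 40
      productivity_gain = 0.25   # output per hour rises 25%

      # Option 1: same output, fewer hours per employee
      reduced_hours = baseline_hours / (1 + productivity_gain)   # -> 32-hour week
      # Option 2: same output and hours, fewer employees
      headcount_cut = 1 - 1 / (1 + productivity_gain)            # -> 20% smaller headcount
      # Option 3: same hours and headcount, more output
      output_growth = productivity_gain                          # -> 25% more output

      print(f"Option 1: {reduced_hours:.0f}-hour week")
      print(f"Option 2: {headcount_cut:.0%} headcount reduction")
      print(f"Option 3: {output_growth:.0%} output growth")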

    As a worker, I’m sympathetic to the idea of a shorter workweek, but I suspect that growth is the more realistic option: employees continue to work similar hours, and increased productivity leads to company growth (though not headcount growth).

  • Microsoft Releases Copilot Extension for VS Code

    From Microsoft:

    GitHub Copilot is an AI peer programming tool that helps you write code faster and smarter.

    GitHub Copilot adapts to your unique needs allowing you to select the best model for your project, customize chat responses with custom instructions, and utilize agent mode for AI-powered, seamlessly integrated peer programming sessions.

    Simon Willison reports, “So far this is just the extension that provides the chat component of Copilot, but the launch announcement promises that Copilot autocomplete will be coming in the near future.”

    I’ve been pessimistic about Copilot, including in a post earlier today about its misleading advertising. But we’ve seen Anthropic make impressive strides with its programming tools, so perhaps Microsoft is taking steps toward a more useful agent.

  • Bloomberg: Apple Weighs Using Anthropic or OpenAI to Power Siri in Major Reversal

    Mark Gurman reports (paywall) that Apple is considering using OpenAI or Anthropic to power Siri.

    Maginative has a little more on Apple’s failures with AI:

    This isn’t just about technology. It’s about Apple essentially admitting it can’t keep up in the most important tech race in decades.

    The backstory makes this even more dramatic. Apple originally promised enhanced Siri capabilities in 2024, then delayed them to 2025, and finally pushed them indefinitely to 2026. Some within Apple’s AI division believe the features could be scrapped altogether and rebuilt from scratch.

    I have a lot of Apple products, and I find Siri’s utility limited to things like “play the song Back in Black” or “call my wife.” The Apple Intelligence presentation from WWDC 2024 remains a black eye for the company, so I welcome this news as a frank recognition of Apple’s position in the AI race and as a way to make its products more useful for end users.

  • TechCrunch: Congress might block state AI laws for five years

    Senators Ted Cruz and Marsha Blackburn have included a measure in the “Big Beautiful Bill” currently in the works that would pause (most) state regulation of AI. Critics (and the Senate Parliamentarian) have pushed back, narrowing the provision’s scope and shortening the pause to the five years now proposed.

    However, over the weekend, Cruz and Sen. Marsha Blackburn (R-TN), who has also criticized the bill, agreed to shorten the pause on state-based AI regulation to five years. The new language also attempts to exempt laws addressing child sexual abuse materials, children’s online safety, and an individual’s rights to their name, likeness, voice, and image. However, the amendment says the laws must not place an “undue or disproportionate burden” on AI systems — legal experts are unsure how this would impact state AI laws.

    The provision is supported by some in the tech industry, including OpenAI CEO Sam Altman, while Anthropic’s leadership is opposed.

    I’m sympathetic to the aims of the provision, as a patchwork of 50 state laws regulating AI would make it more difficult to innovate in this space. But I’m also aware of real-life harm (as a recent NY Times story profiled), so I’d be much more sanguine if we had federal-level regulation, a prospect that seems very unlikely given the current political makeup.

  • The Verge: Microsoft should change its Copilot advertising, says watchdog

    BBB National Programs’ National Advertising Division (NAD) critiqued Microsoft’s recent advertising for Clippy, I mean Copilot, and found quite a bit of puffery.

    From The Verge:

    Microsoft has been claiming that Copilot has productivity and return on investment (ROI) benefits for businesses that adopt the AI assistant, including that “67%, 70%, and 75% of users say they are more productive” after a certain amount of Copilot usage. “NAD found that although the study demonstrates a perception of productivity, it does not provide a good fit for the objective claim at issue,” says the watchdog in its review. “As a result, NAD recommended the claim be discontinued or modified to disclose the basis for the claim.”

    Puffery aside, this aligns with my observations of Copilot: the branding is confusing, the product integrations are suspect, and the tool lags far behind other AI/LLM agents like Gemini, ChatGPT, and Claude.

  • Checking In on AI and the Big Five

    Ben Thompson writes on the Big 5 (Amazon, Apple, Google, Meta/Facebook, Microsoft) and where they stand in the AI field today.

    … [is] AI complementary to existing business models (i.e. Apple devices are better with AI) or disruptive to them (i.e. AI might be better than Search but monetize worse). A higher level question, however, is if AI simply obsoletes everything, from tech business models to all white collar work to work generally or even to life itself.

    Perhaps it is the smallness of my imagination or my appreciation of the human condition that makes me more optimistic than many about the probability of the most dire of predictions: I think they are quite low. At the same time, I think that those dismissing AI as nothing but hype are missing the boat as well. This is a big deal, even if the changes may end up fitting into the Bill Gates maxim that “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten.”

    I tend to agree with Thompson’s predictions — change over the next decade will be significant (and hard to imagine now) and the likelihood of the dire predictions coming true is astonishingly low in the near term.

    Like Thompson, I assumed that Microsoft’s partnership with OpenAI would position it to lap the other companies listed here, but the Copilot product is persistently disappointing, especially considering ChatGPT’s rising utility. Google Gemini, as a tool, is gaining capabilities, particularly around Veo and programming, although I think the Gemini-infused Google search results still make too many embarrassing mistakes to be useful today.