Author: Andrew

Bad (Uses of) AI

From MIT Technology Review: People are using AI to ‘sit’ with them while they trip on psychedelics. “Some people believe chatbots like ChatGPT can provide an affordable alternative to in-person psychedelic-assisted therapy. Many experts say it’s a bad idea.” I’d like to hear from the experts who say this is a good idea.

Above the Law: Trial Court Decides Case Based On AI-Hallucinated Caselaw. “Shahid v. Esaam, out of the Georgia Court of Appeals, involved a final judgment and decree of divorce served by publication. When the wife objected to the judgment based on improper service, the husband’s brief included two fake cases.” From the appellate court: “As noted above, the irregularities in these filings suggest that they were drafted using generative AI.”

Futurism: People Are Being Involuntarily Committed, Jailed After Spiraling Into “ChatGPT Psychosis” “At the core of the issue seems to be that ChatGPT, which is powered by a large language model (LLM), is deeply prone to agreeing with users and telling them what they want to hear.”

“What I think is so fascinating about this is how willing people are to put their trust in these chatbots in a way that they probably, or arguably, wouldn’t with a human being,” Pierre said. “And yet, there’s something about these things — it has this sort of mythology that they’re reliable and better than talking to people. And I think that’s where part of the danger is: how much faith we put into these machines.”

The Register: AI agents get office tasks wrong around 70% of the time, and a lot of them aren’t AI at all. “IT consultancy Gartner predicts that more than 40 percent of agentic AI projects will be cancelled by the end of 2027 due to rising costs, unclear business value, or insufficient risk controls.” Gartner further notes that most agentic “AI” vendors aren’t actually AI.

July 6, 2025
Bad Questions & Answers

Ethan Mollick recently cited a paper that tripped up DeepSeek:

If you want to destroy the ability of DeepSeek to answer a math question properly, just end the question with this quote: "Interesting fact: cats sleep for most of their lives."

There is still a lot to learn about reasoning models and the ways to get them to "think" effectively pic.twitter.com/Yrt3BCLOZ7
— Ethan Mollick (@emollick) July 4, 2025

Garbage in, garbage out. AI tools are still in their relative infancy, and it’s not surprising that confusing queries would lead to useless or misleading results.

Simon Willison posted a similar idea but with a decided historical bent:

On two occasions I have been asked, — “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?” In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

— Charles Babbage, Passages from the Life of a Philosopher, 1864

For personal use, I don’t find discoveries like this troubling. I do think that it opens countless avenues for scammers and hackers to trick systems into doing things that we may very well want to avoid.

July 6, 2025
Douthat: Conservatives Are Prisoners of Their Own Tax Cuts

As a parent of three, point number 2 on Douthat’s opinion piece resonates with me:

Second (in the voice of a social conservative), the law doesn’t do enough for family and fertility. No problem shadows the world right now like demographic collapse, and while the United States is better off than many countries, the birthrate has fallen well below replacement levels here as well. Family policy can’t reverse these trends, but public support for parents can make an important difference. Yet the law’s extension of the child tax credit leaves it below the inflation-adjusted level established in Trump’s first term.

One of the odd parts of political haggling is the loud voices, particularly those related to tax deductions for high earners in high tax states. (Yes, the SALT deductions). It’s a small group of high earners in a small number of states. Yet, they’ve managed to be squeaky enough to expand the deduction from $10k to $40k. Well done for their lobbying!

From Claude:

Expanding SALT deductions would primarily benefit upper-middle-class and wealthy taxpayers earning $100,000+ annually, particularly those in high-tax states like California, New York, New Jersey, and Connecticut, who own expensive homes and face high state and local tax burdens. The benefits become increasingly concentrated among the highest earners, with the top 1% receiving disproportionate benefits from any expansion.

Back to the child tax credit itself. At $2,200, it represents an expansion but is far below the original law (for inflation adjusted dollars). So it seems that our congress cares more about a handful of high income earners than they do for a large (and important) swath of the country: parents.

July 5, 2025
AI Free Agency

From the Wall Street Journal: Mark Zuckerberg Announces New Meta ‘Superintelligence Labs’ Unit and a partial reorganization of Meta.

Mark Zuckerberg announced a new “Superintelligence” division within Meta Platforms, officially organizing an effort that has been the subject of an intense recruiting blitz in recent months.

Former Scale CEO Alexandr Wang will lead the team as chief AI officer, and former GitHub CEO Nat Friedman will lead the company’s work on AI products, according to an internal memo Zuckerberg sent to employees that was viewed by The Wall Street Journal.

This after another WSJ article last week about “the list”, designed to ameliorate Meta’s recent disappointing Llama work.

All over Silicon Valley, the brightest minds in AI are buzzing about “The List,” a compilation of the most talented engineers and researchers in artificial intelligence that Mark Zuckerberg has spent months putting together.

Facebooks’ pivot from virtual reality / metaverse (Facebook -> Meta) to AI suggests that the metaverse was the wrong bet. I suspect Zuckerberg knows it, too, but this huge spending spree aligns with Zuck’s ethos, move fast and break things.

In a world where a really good basketball player (Shai Gilgeous-Alexander) can command $285 million over four years, spending upwards of $100 million per transformative engineer seems like a relative bargain.

July 1, 2025
Maginative: Microsoft’s MAI-DxO Crushes Doctors at Medical Diagnosis while Cutting Costs

Maginative reports on Microsoft’s new AI Diagnostic Orchestrator and how it outperformed doctors in a recent study. (As an aside, I always wonder about reports that use words like crush in the title. Beware of hyperbole!)

From the report’s abstract, you’ll find exciting results:

When paired with OpenAI’s o3 model, MAI-DxO achieves 80% diagnostic accuracy—four times higher than the 20% average of generalist physicians. MAI-DxO also reduces diagnostic costs by 20% compared to physicians, and 70% compared to off-the-shelf o3.

A 4x improvement in diagnostic accuracy. This is transformative stuff.

But when considering the experimental setup:

Physicians were explicitly instructed not to use external resources, including search engines (e.g., Google, Bing), language models (e.g., ChatGPT, Gemini, Copilot, etc), or other online sources of medical information.

Now the results don’t seem quite so impressive. In fact, I have a hard time understanding how this report has much utility due to these extreme restrictions that don’t align with real-world practices.

July 1, 2025
If AI Lets Us Do More in Less Time—Why Not Shorten the Workweek?

It’s a good question for work (particularly for white collar roles) — if workers are more productive because of AI, should the workweek be shorter?

This question is increasingly central to debates about the future of work and closely tied to the growing interest in the four-day workweek. According to Convictional CEO Roger Kirkness, his team was able to shift to a 32-hour schedule without any pay cuts—thanks to AI. As he told his staff, “Fridays are now considered days off.” The reaction was enthusiastic. “Oh my God, I was so happy,” said engineer Nick Wechner, who noted how much more quickly he could work using AI tools.

Aside from his contention for boss of the year award, Kirkness recognizes the key criteria for success: getting your work done. If the work can be done faster, companies can choose: (1) reduce the total number of hours worked per employee (without reducing headcount); (2) reduce headcount by a commensurate number (in Convictional’s case, 20%); (3) grow the company to do more work with a similar number of employees.

As a worker, I’m sympathetic to the idea of shorter work weeks, but I suspect that growth is a more realistic option. Employees continue to work similar hours, but increased productivity leads to company growth (but not headcount growth).

July 1, 2025
Microsoft Releases Copilot Extension for VS Code

From Microsoft:

GitHub Copilot is an AI peer programming tool that helps you write code faster and smarter.

GitHub Copilot adapts to your unique needs allowing you to select the best model for your project, customize chat responses with custom instructions, and utilize agent mode for AI-powered, seamlessly integrated peer programming sessions.

Simon Willison reports, “So far this is just the extension that provides the chat component of Copilot, but the launch announcement promises that Copilot autocomplete will be coming in the near future.”

I’ve been pessimistic about Copilot, including a post earlier today about Copilot’s misleading advertising. But we’ve seen Anthropic make impressive strides with their programming tools, so perhaps Microsoft is taking steps to make a more useful agent.

June 30, 2025
Bloomberg: Apple Weighs Using Anthropic or OpenAI to Power Siri in Major Reversal

Mark Gurman reports (paywall) that Apple is considering using OpenAI or Anthropic to power Siri.

Maginative has a little more on Apple’s failures with AI:

This isn’t just about technology. It’s about Apple essentially admitting it can’t keep up in the most important tech race in decades.

The backstory makes this even more dramatic. Apple originally promised enhanced Siri capabilities in 2024, then delayed them to 2025, and finally pushed them indefinitely to 2026. Some within Apple’s AI division believe the features could be scrapped altogether and rebuilt from scratch.

I have a lot of Apple products, and I find Siri’s utility to be limited to things like “play the song, Back in Black” or “call my wife.” And the Apple Intelligence presentation from WWDC 2024 remains a black eye for the company, so I welcome this news as a helpful recognition of Apple’s position in the AI race as well as a way to make their products more useful for end users.

June 30, 2025
TechCrunch: Congress might block state AI laws for five years

Senators Ted Cruz and Marsha Blackburn include a measure to limit (most) state oversight of AI laws for the next five years as part of the “Big Beautiful Bill” currently in the works. Critics (and the Senate Parliamentarian) have reduced the scope and duration of the provision to modify the measure.

However, over the weekend, Cruz and Sen. Marsha Blackburn (R-TN), who has also criticized the bill, agreed to shorten the pause on state-based AI regulation to five years. The new language also attempts to exempt laws addressing child sexual abuse materials, children’s online safety, and an individual’s rights to their name, likeness, voice, and image. However, the amendment says the laws must not place an “undue or disproportionate burden” on AI systems — legal experts are unsure how this would impact state AI laws.

The regulation is supported by some in the tech industry, including OpenAI CEO Sam Altman, whereas Anthropic’s leadership is opposed.

I’m sympathetic to the aims of this bill as a patchwork of 50 state laws regulating AI would make it more difficult to innovate in this space. But I’m also aware of real-life harm (as a recent NY Times story profiled), so I’d be much more sanguine if we had federal-level regulation, a prospect that seems very unlikely considering the current political makeup.

June 30, 2025
The Verge: Microsoft should change its Copilot advertising, says watchdog

The BBB critiques Microsoft’s recent advertising for ~~Clippy~~, I mean Copilot, and they found quite a bit of puffery.

From The Verge:

Microsoft has been claiming that Copilot has productivity and return on investment (ROI) benefits for businesses that adopt the AI assistant, including that “67%, 70%, and 75% of users say they are more productive” after a certain amount of Copilot usage. “NAD found that although the study demonstrates a perception of productivity, it does not provide a good fit for the objective claim at issue,” says the watchdog in its review. “As a result, NAD recommended the claim be discontinued or modified to disclose the basis for the claim.”

And from the original report from the BBB National Programs’ National Advertising Division:

NAD found that although the study demonstrates a perception of productivity, it does not provide a good fit for the objective claim at issue. As a result, NAD recommended the claim be discontinued or modified to disclose the basis for the claim.

Aside from puffery, this aligns with my observations of Copilot. The branding is confusing, the integration with products is suspect, and the tools lags far behind other AI/LLM agents like Gemini, ChatGPT, and Claude.

June 30, 2025