Apple’s AI Update: Steady, Safe and Not Fooling Anyone

This Week
Apple packed WWDC with its usual polish (and a fun F1 theme), but the showcase couldn’t mask how far frontier AI has sprinted ahead. We got a few updates worth noting: Spotlight actions, commute predictions, and live translation. Still, Apple is playing catch-up against rivals shipping features faster than you can update your iOS.
Also in this issue:
WWDC highlights from The Verge
CNBC’s dive into Apple’s AI missteps
How Apple’s own research exposes a collapse in large models
Read on.
Apple’s AI Update: Steady, Safe, and Not Fooling Anyone
Apple was the last major player to hold its developer event this spring. Compared to Google’s machine-gun fire of 100+ AI updates and Microsoft’s agent-first pitch, WWDC felt… restrained.
Apple even said as much, calling this year “incremental.” But if you’re among the users of the 2.35 billion active devices in the Apple ecosystem, a few of these refinements are worth your attention.
They won’t land until September, but here’s what’s coming:
Spotlight Actions
Spotlight, Apple’s search tool on iOS and iPadOS, has always been the fastest way to find apps and files. With the new update, it does more than search. You can now run quick actions like playing music, launching podcasts, setting reminders, or adding calendar events.
Apple Intelligence adds another layer, letting you summarize documents or draft email replies without opening separate apps. It’s a small update on paper, but the one I’m most excited about. I no longer need to toggle over to ChatGPT to run AI tasks—I can do it straight from my Mac’s home screen.
Apple Wallet
As a frequent traveler, I’m especially excited about the updates to Apple Wallet. Boarding passes now integrate with Maps for airport navigation and live baggage tracking. Instead of watching the boarding gate screen cycle through updates, you can check standby lists or upgrade seats directly from your pass. Digital ID lets you store passport info for domestic TSA checkpoints, though you’ll still need the physical passport for international travel.
Apple Maps
The most interesting Maps update is its machine learning upgrade, which learns your daily routines. As an example, your iPhone will predict your daily commute and suggest alternate routes as soon as it detects a delay.
AI Under the Hood
Under the hood, AI is everywhere: live translations in messages and calls, personalized fitness coaching on Apple Watch, and Visual Intelligence for image-based search. Though on this last feature, Google’s Circle to Search still feels a step ahead.
The Apple Intelligence features are helpful and tightly integrated, but modest next to Google’s and Microsoft’s more aggressive AI pushes. Many (myself included) expected a full debut of Apple’s long-teased Siri overhaul.
Instead, they delivered small improvements: text rewriting, basic summarization, and smart replies, but nothing close to a full AI agent or Copilot rival. It seems we’ll have to make do with updates that are practical, but not groundbreaking.
My advice? Don’t cancel your ChatGPT subscription just yet.
Is Apple Losing the AI Race?
I remember the 2011 demo that promised Siri would replace all our typing with voice commands. Fourteen years on and the assistant still bungles alarms, trips over slang and asks me to repeat the simplest requests.
Apple tried to reboot the story last year with Apple Intelligence, but WWDC 2025 landed without the promised Siri overhaul, and executives asked for patience. That absence has set off renewed debate over whether Apple is sliding out of the AI race.
While Microsoft and Google built advanced large language models, AI agents and cloud infrastructure, Apple has taken a slower, privacy-first approach. For the security conscious, it’s an admirable quality, but it might cost them in the long run.
CNBC’s fifteen-minute breakdown of how the lag unfolded and whether Apple can claw back momentum is worth watching.
👉 Find it on YouTube.
New Research Exposes AI’s Collapse Point
Apple’s new “Illusion of Thinking” study shows that AI quits the race the moment the hills get steep. Instead of working harder to solve problems, Large Reasoning Models (the “thinking” versions of LLMs, called LRMs) conserve tokens and fold early.
According to the paper published right before Apple's WWDC event, LRMs — like OpenAI o3, DeepSeek R1, Claude 3.7 Sonnet Thinking, and Google Gemini Flash Thinking — completely collapse when they're faced with increasingly complex problems.
Or, as the Apple researchers put it, while AI models perform well at math and coding, when it comes to more complex or creative problems, they only provide the illusion of thinking.
While the much-hyped LRMs performed better than LLMs on medium-difficulty puzzles, they performed worse on simple ones. And when faced with harder puzzles, like a 5-disc Tower of Hanoi (which I figured out in only a few minutes), they collapsed completely, giving up on the problem prematurely.
Worried about what this means for your workflows and use cases? Here are five guardrails to maintain the quality of your outputs:
Stick to clear-cut tasks: for simple tasks like writing help, translations, and survey analysis, use an LLM like ChatGPT’s GPT-4.5. For medium-level problems, like math and coding, pick an LRM like ChatGPT’s o3 or Claude 3.7 Sonnet Thinking. For hard problems, skip AI altogether.
Audit any multi-step output: when the model quits mid-reasoning it still outputs something that might look good at first glance. Check any output that depends on strict logic (ad spend calculations, complex schedules, step-by-step tutorials) against a trusted source.
Break big problems into smaller prompts: If you must tackle a complex question—say a multi-channel product launch—guide the model through a series of sub-prompts (buyer personas first, message pillars next, channel mix and budget, then creative briefs and the rollout calendar). Keeping each hop simple skirts the collapse zone.
Favor tools with retrieval-augmented generation (RAG): Pairing a model with a live search index or database improves accuracy and curbs hallucinations. When you vet vendors, confirm that RAG is built in, that its underlying index stays fresh, and that outputs include citations you can audit.
Learn a quick “complexity sniff test”: Ask yourself: Does the question require reasoning across five-plus steps or unfamiliar concepts? If yes, plan to verify. Over time you will sense when an answer is probably past the model’s comfort zone.
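For the more technically inclined, the “break big problems into smaller prompts” guardrail can be sketched in a few lines of Python. This is a hypothetical illustration, not a specific vendor’s API: the `ask` function is a stand-in for whatever chat-completion call you actually use (OpenAI, Anthropic, etc.), and the step names and prompts are made up for the product-launch example above. The point is the structure—each hop is a small, self-contained prompt, and each answer is fed into the next prompt as context:

```python
# Sketch: chaining simple sub-prompts instead of one giant prompt.
# `ask` is a placeholder for a real chat-completion API call; here it
# just echoes the request so the chaining logic itself is runnable.

def ask(prompt: str) -> str:
    """Stand-in for a real model call (swap in your provider's SDK)."""
    return f"[model answer to: {prompt.splitlines()[0]}]"

def plan_launch(product: str) -> dict:
    """Walk a product launch through four small hops, feeding each
    answer into the next prompt so no single hop gets complex."""
    answers: dict[str, str] = {}
    steps = [
        ("personas", f"List three buyer personas for {product}."),
        ("pillars", "Given these personas:\n{personas}\nDraft three message pillars."),
        ("channels", "Given these pillars:\n{pillars}\nPropose a channel mix and budget split."),
        ("calendar", "Given this channel mix:\n{channels}\nOutline a four-week rollout calendar."),
    ]
    for key, template in steps:
        prompt = template.format(**answers)  # earlier answers become context
        answers[key] = ask(prompt)
    return answers

plan = plan_launch("a smart water bottle")
for step, answer in plan.items():
    print(step, "->", answer)
```

Because each sub-prompt stays well inside the “simple task” zone, you also get a natural checkpoint after every hop: audit the personas before you let the model draft pillars, and so on down the chain.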
Bottom line: LLMs and LRMs are still great personal helpers, as long as you keep their work bounded, verify anything critical and resist the temptation to hand them highly complex tasks.
What did you think of today's email?
Your feedback helps me create better emails for you! Send your thoughts to [email protected].