Tiny, Local AI Models

By Space Cadet

Video attached of Karpathy discussing cognitive cores of 1B parameter models with knowledge outsourced to RAG.

As long as the model is hooked up to an efficient database or knowledge base, 1B params could be large enough for competent reasoning.
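A minimal sketch of what "knowledge outsourced to RAG" could look like, assuming a toy in-memory knowledge base and plain word-overlap ranking in place of a real vector store. The facts, `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are all made up for illustration; none of this comes from the video.

```python
import re

# Hypothetical household facts; a real setup would use a database or vector store.
KNOWLEDGE_BASE = [
    "The car insurance policy renews on 1 March.",
    "The boiler was last serviced in October.",
    "Council tax is paid by direct debit on the 5th of each month.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Return up to k documents ranked by word overlap with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop documents with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query, documents):
    """Put the retrieved facts in front of the question for the small model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return f"Use only these facts:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("When does the car insurance renew?", KNOWLEDGE_BASE))
```

The point is that the facts live outside the weights: the small model only has to read the retrieved lines and reason over them.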

Playing with Gemma-4-E4B has nano-pilled me. Within a few hardware generations, household agents running on affordable hardware might reasonably handle most of a household's administrative work. Tough problems could be piped off to hosted SOTA models via APIs, but most functions do not require Nobel-laureate reasoning to solve.

Gemma-4-E4B can probably handle 30% of my API calls. I suspect Gemma-4-31B might be able to handle 60-80%, but it has been arduous testing anything on 12GB of VRAM.
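A rough sketch of the local-first idea, assuming the local model can flag when it is out of its depth. Everything here is a stand-in: `fake_local` and `fake_hosted` are hypothetical placeholders for a real local runtime and a real hosted API, and the prompt-length heuristic is only there to make the example run.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    text: str
    confident: bool  # assumption: the local model can signal low confidence


def route(prompt, local_model, hosted_model):
    """Try the local model first; escalate only when it is not confident."""
    reply = local_model(prompt)
    if reply.confident:
        return "local", reply.text
    return "hosted", hosted_model(prompt)


def fake_local(prompt):
    # Hypothetical heuristic: treat short prompts as easy enough for the small model.
    return Reply(text=f"[local] {prompt}", confident=len(prompt.split()) <= 12)


def fake_hosted(prompt):
    # Placeholder for a call to a hosted SOTA model.
    return f"[hosted] {prompt}"


def local_share(prompts):
    """Fraction of prompts that never left the house."""
    routes = [route(p, fake_local, fake_hosted)[0] for p in prompts]
    return routes.count("local") / len(routes)


if __name__ == "__main__":
    prompts = [
        "Remind me to pay the water bill on Friday",
        "Compare these three mortgage offers across a ten year horizon and "
        "explain the tax implications of each one in detail",
    ]
    print(local_share(prompts))
```

Logging `local_share` over a few weeks of real traffic would be one way to check guesses like 30% or 60-80%.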

https://www.youtube.com/watch?v=UldqWmyUap4

Signature

"Who moves my hand, who moves my mind if the mighty Sun himself does not control his movement?"

Comments

Space Cadet

Apple would also be the ideal candidate to kick off a local AI renaissance.

It could develop an AI model repository to compete with LM Studio, Ollama, or Hugging Face. It wouldn't need to train models itself (nor can it right now); it would simply sell the inference hardware and host the platform.

Picture a simple little server that's essentially a repackaged and rebranded Mac Mini, with ~64GB of unified memory (UM) and an M5 or M6 chip. You could use your Apple devices with screens (Mac, iPhone) to sync with your server through a UI similar to AirDrop. Just bring your device up to your AIPod and press the 'sync' button that appears.

Apple is already integrated into your finances through Apple Pay, your medical records through Health, and your communications through Mail and iMessage. A little agent could take on the administrative duties of taxes, health monitoring, and home admin.

For now, the frontier AI labs are dumping capex into infrastructure buildout with revenues trailing behind (Anthropic notwithstanding at this hour). Their business models depend on API calls from power users and enterprises to amortise their tens of billions of dollars in compute investments. If these API revenues fail to catch up and/or cheap, easy capital dries up, these labs are going to face hard choices. They're all racing towards IPOs because private equity is saturated.

Apple is in a different position. By selling consumer inference products, it can make customers pay for the hardware upfront. No great amortisation responsibilities, no revenue guesswork, minimal risk. If small, local models can supplant 50, 80, or 90% of the tokens that would have been bought from frontier labs, then two things come into play:

1. Frontier labs will find their API revenues severely threatened. As easy capital dries up, frontier labs will be forced into difficult capital allocation trade-offs between debt servicing, training, and product development. If revenues are sufficiently hampered for some labs, they might be forced into IP sales or acquisitions (potentially to Apple).

2. A platform that monetises small models opens the door for boutique labs to enter the market. For now, large labs (Google especially) can afford to train small open-weights models like Gemma 4 to crowd out the local space and funnel adoption towards their own cloud services. Apple could compete directly with cloud platform providers through its consumer hardware products, and it could trust at least NVIDIA to keep releasing open-weights nano models that drive GPU sales. If frontier labs try to kill small labs by releasing small models of their own, they may succeed temporarily, but they would have subsidised Apple's new product by spending millions, if not hundreds of millions, of dollars on training models that drive consumer uptake of local inference hardware. Even if frontier labs do manage to crowd the small model space temporarily, they won't be able to fund these training runs indefinitely and will eventually be forced to sell their small models through Apple's platform. This would set the conditions for new entrants to begin training small models and selling them through the platform too. Either way, Apple and its customers win.

The current business model of frontier labs makes sense if you assume that one of them is going to achieve ASI with a Decisive Strategic Advantage (DSA), which would make amortisation moot. If that fails, then these guys are at serious risk of a consumer inference hardware provider sweeping their legs. Apple does not sell compute cluster hardware, so it does not need to balance the capabilities of consumer hardware against enterprise hardware like NVIDIA does. It can just sell affordable devices with 64 or 128GB of unified memory and call it a day.

Most of the tokens that consumers need do not relate to millennium math problems or protein folding. Most can be produced by comparatively simple models wrapped inside an agentic harness. Households don't need philosophers, they need 'gardeners' that tend to their home, business, and admin. There will be a market for extremely intelligent models that can solve hard enterprise problems. 10T or 100T parameter models will presumably exist, and those labs will be able to charge $100 or $1,000 per million tokens. But 90-95% of household tokens will go towards tasks that far simpler, local models can handle.
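To put rough numbers on that split, here is a back-of-envelope sketch. The 20 million tokens per month is a made-up household volume, and `monthly_frontier_spend` is a hypothetical helper; only the $100 per million price and the 90-95% local shares come from the paragraph above.

```python
def monthly_frontier_spend(tokens_per_month, local_share, price_per_million):
    """Dollars paid to a hosted frontier model for the tokens not handled locally."""
    hosted_tokens = tokens_per_month * (1 - local_share)
    return hosted_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    TOKENS_PER_MONTH = 20_000_000  # assumption: hypothetical household usage
    PRICE_PER_MILLION = 100        # dollars per million tokens, from the post
    for share in (0.0, 0.90, 0.95):
        spend = monthly_frontier_spend(TOKENS_PER_MONTH, share, PRICE_PER_MILLION)
        print(f"local share {share:.0%}: ${spend:,.0f} per month to the frontier lab")
```

Under those assumptions the hosted bill falls from $2,000 a month with no local model to about $200 at a 90% local share and about $100 at 95%, which is the revenue the frontier labs would be losing.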

This hasn't even touched on the privacy angle.
