Why On-device Agentic AI Can't Keep Up


Posted on Mar 1

• Originally published at martinalderson.com

Indeed, the pace of improvement in open-weight models has been spectacular - if you've got (tens of) thousands to drop on a Mac Studio cluster or a high-end GPU setup, local models are genuinely useful. But for the other 99% of devices people actually carry around, every time I open llama.cpp to do some local, on-device work, it feels - if anything - like progress is going backwards relative to what I can do with frontier models.

There are some hard physical limits to what consumer hardware can do - and they're not going away any time soon.

For the purposes of this article, I'm referring to agentic capabilities in a personal-admin capacity: think searching emails, composing a reply, and sending a calendar invite. More advanced capabilities, like those we see in software engineering agents, are even harder to run on-device.

While the models themselves are getting hugely more capable, there's an intrinsic problem: they require a lot of RAM, and ideally fast RAM at that.

Right now, 16GB is the most common configuration for new laptops - but 8GB is still very common.

On phones, the situation is (understandably) even more constrained. Apple is still shipping phones with 8GB for the most part - the iPhone 16e and 17 ship with 8GB of RAM, and only the Pro models have 12GB. Google is more generous on its Pixel lineup, shipping 12GB on the 'standard' models and 16GB on the Pro models.

The issue is that this RAM isn't just for on-device AI models - it's also needed by the OS and running apps. Realistically you want at least 4GB for these, and that's cutting it fine with web browsers and other RAM-heavy apps on your phone. On laptops I'd suggest you want at least 8GB for the OS and apps.

This leaves very little space for the AI capabilities themselves - perhaps 4GB on non-"Pro" phones and 8GB on the Pro models. Equally, even a new MacBook Air is only going to have around 8GB of RAM spare for AI. And these are brand-new devices; the majority of people are running multi-year-old hardware.
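The budget above is simple subtraction, but it's worth making explicit. A minimal sketch, using the device figures and OS/app reserves from this article (these are my rough estimates, not measurements):

```python
# Rough AI headroom per device: total RAM minus what the OS and apps need.
# All numbers are the article's ballpark figures, not benchmarked values.
DEVICES = {
    # name: (total RAM in GB, OS + apps reserve in GB)
    "iPhone 17 (8GB)": (8, 4),
    "iPhone 17 Pro (12GB)": (12, 4),
    "Pixel standard (12GB)": (12, 4),
    "16GB laptop": (16, 8),
}

def ai_headroom_gb(total_gb: int, reserve_gb: int) -> int:
    """RAM left over for an on-device model once the OS and apps are served."""
    return total_gb - reserve_gb

for name, (total, reserve) in DEVICES.items():
    print(f"{name}: ~{ai_headroom_gb(total, reserve)}GB free for AI")
```

The takeaway: even on a Pro phone or a well-specced laptop, the model and its working memory have to fit in roughly 4-8GB.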

The models themselves present the first space issue. A 3B-parameter model (tiny in comparison to frontier models) requires on the order of 2GB in a highly quantised (think compressed) variant. A 7B-parameter model - which in my experience is vastly more capable - requires more like 5GB. Full-scale frontier models, in comparison, are around the 1TB mark - 200-500x larger.
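Those figures fall out of simple arithmetic: parameters times bits per weight, plus some overhead. A rough sketch - the ~4.5 and ~5 bits-per-weight values and the 15% overhead factor are my assumptions for typical llama.cpp-style quantisations, not exact formats:

```python
def quantised_size_gb(params_billions: float,
                      bits_per_weight: float,
                      overhead: float = 1.15) -> float:
    """Estimate on-disk/in-RAM footprint of a quantised model.

    overhead (~15%, assumed) covers quantisation scales, embeddings kept
    at higher precision, and similar bookkeeping.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# 3B at ~4.5 bits/weight -> ~1.9GB, matching the "order of 2GB" above
print(round(quantised_size_gb(3, 4.5), 1))
# 7B at ~5 bits/weight -> ~5GB
print(round(quantised_size_gb(7, 5), 1))
```

Note this covers weights only - the KV cache for a long context adds more on top, which is part of why even a "2GB" model struggles in a 4GB budget.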

Source: Dev.to