Why I built a local coding agent

It started with a pretty frustrating work session. I had a private project with sensitive code, and I wanted an AI assistant to help me refactor an entire module. The problem: to use any of the "good" tools I had to paste my code into someone else's cloud service, trust they wouldn't use it to train future models, and pay per token.

It wasn't dramatic. But it also wasn't what I wanted. So I started looking for alternatives.

Why not just use X?

Short answer: I tried them all. Ollama is excellent for running models locally, but it's not a coding agent. LM Studio has a pretty interface but neither is it. The agents that existed were Python, required 47 dependencies, and none had the user experience I'd consider reasonable for someone who just wants to open an app and get to work.

I wanted something that behaved more like a tool and less like a lab experiment.

So I decided to build it. And here's where people usually ask "but isn't that too much work?" Yes. And also no. Depends on how you approach it.

Choosing the stack

First thing was deciding how to build it. I had some clear constraints:

Native desktop, not a web app. I wanted it to work without a server, without deploying anything.
Real performance, especially for the part that launches and talks to the model.
Modern UI, without spending all my development time learning to build a frontend in Rust.

The combination that convinced me was Tauri 2 + Rust for the backend and Svelte 5 for the frontend. Tauri gives you a webview where your Svelte app runs, and all the "system" logic lives in Rust: launching processes, handling files, talking to the model. Best of both worlds, more or less.

For the model engine I chose llama.cpp, specifically its HTTP server llama-server. It's a compiled binary, starts fast, exposes an OpenAI-compatible API, and supports GPU offload without touching any code. Perfect.

The first version that actually worked

The first version that truly worked end-to-end was fairly spartan. You had a text field, you sent a message, the model responded with streaming (token by token, just like cloud services), done. No tools, no agent, nothing fancy.

But it worked. The model ran on my machine, the code never left my local network, and the latency was reasonable even on CPU. That was already enough to validate that the direction was right.

The first message Aleph generated end-to-end was "Hello. I'm Agent Aleph. How can I help you?" And it was ridiculously satisfying to watch it appear token by token.

What came next

From that simple base, we started adding things: model manager, download from Hugging Face, hardware detection, tool system, agent mode... Each of those parts has its own story, and we'll tell them here.

Agent Aleph isn't perfect yet. It only works on Linux x64, has some MVP limitations, and there's a lot on the list. But it's completely local, completely yours, and every week it's a little better.

If you want to follow along, you're in the right place.