No API keys. No rate limits. No data leaving your laptop. Here's how to run powerful LLMs locally with Ollama, and when it actually makes sense.
Three words: privacy, speed, cost.
I've been running local LLMs alongside cloud AI for months. Here's the honest breakdown.
That's it. You now have an AI running on your machine. No API key. No account. No internet needed.
| Model | Size | RAM Needed | Best For | Speed |
|-------|------|-----------|----------|-------|
| Llama 3.1 8B | 4.7GB | 8GB | General coding, quick answers | ⚡⚡⚡ |
| CodeLlama 13B | 7.4GB | 16GB | Code generation, debugging | ⚡⚡ |
| Mistral 7B | 4.1GB | 8GB | Fast reasoning, chat | ⚡⚡⚡ |
| DeepSeek Coder V2 | 8.9GB | 16GB | Code-specific tasks | ⚡⚡ |
| Qwen 2.5 Coder 7B | 4.7GB | 8GB | Multi-language coding | ⚡⚡⚡ |
| Llama 3.1 70B | 40GB | 48GB+ | Claude-like quality (needs GPU) | ⚡ |
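The RAM column in the table tracks a simple rule of thumb: at the 4-bit quantization Ollama ships by default, the weights take roughly half a byte per parameter, and you want headroom on top for the KV cache, runtime, and OS. A rough sketch (the overhead multiplier is my own guess, not an Ollama figure):

```python
def approx_model_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the quantized weights in GB.

    1B parameters at 4 bits/weight ~= 0.5 GB.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight


def approx_ram_gb(params_billion: float, bits_per_weight: int = 4,
                  overhead: float = 1.5) -> float:
    """Weights plus headroom for KV cache and runtime (overhead is a guess)."""
    return approx_model_gb(params_billion, bits_per_weight) * overhead


# Sanity check against the table: an 8B model at 4-bit is ~4 GB of
# weights (table says 4.7GB -- real quant formats keep some layers at
# higher precision), and a 70B model is ~35 GB (table says 40GB).
print(approx_model_gb(8))   # → 4.0
print(approx_model_gb(70))  # → 35.0
```

The estimates run a little under the table's numbers for exactly that reason: GGUF quantizations mix in some higher-precision tensors, so treat this as a floor, not a promise.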
Ollama also exposes a REST API compatible with OpenAI's format, so existing OpenAI client code can point at localhost instead.
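Here's a minimal stdlib-only sketch against that endpoint. It assumes Ollama is running locally on its default port (11434) and that you've already pulled `llama3.1`; swap in whatever model tag you actually have.

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint (default local port).
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload, which Ollama accepts as-is."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def ask(model: str, prompt: str) -> str:
    """POST to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


# Usage (requires `ollama serve` running and the model pulled):
# print(ask("llama3.1", "Explain mutexes in one sentence."))
```

Because the payload shape matches OpenAI's, you can also point the official `openai` client at `http://localhost:11434/v1` and keep your existing code.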
| Factor | Local (Ollama) | Cloud (Claude/GPT) |
|--------|---------------|-------------------|
| Quality (8B model) | 6/10 | 9.5/10 |
| Quality (70B model) | 8/10 | 9.5/10 |
| Speed (first token) | Near-instant (no network round trip) | 500ms-2s |
| Speed (generation) | Depends on hardware | Fast and consistent |
| Privacy | 100% local | Data sent to provider |
| Cost | Free (after hardware) | $20-200/month |
| Context window | Typically 4K-32K as configured | Up to 1M tokens |
| Tool use | Limited | Full support |
| Internet needed | No | Yes |
💡 **Pro tip:** Use local models as a "first pass" and cloud models as the "expert review." Process 100 files locally, then send the 5 interesting ones to Claude for deep analysis. Saves money and time.
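That first-pass/expert-review split is just a tiny routing loop: score everything with the cheap local model, then escalate only the top few. A sketch, where `score_with_local_model` is a hypothetical stand-in for a call to your local Ollama model:

```python
from typing import Callable


def triage(files: list[str],
           score_with_local_model: Callable[[str], float],
           top_n: int = 5) -> list[str]:
    """First pass: score every file locally, return the top_n worth
    sending to a (paid) cloud model for deep review."""
    ranked = sorted(files, key=score_with_local_model, reverse=True)
    return ranked[:top_n]


# Usage with a stub scorer (a real one would prompt the local model
# for an "interestingness" rating):
interesting = triage(["a.py", "b.py", "c.py"],
                     score_with_local_model=len, top_n=2)
```

The local pass is free and private, so it's fine if its scores are noisy; the expensive model only ever sees the shortlist.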
Local AI isn't a replacement for Claude or GPT. It's a complement. Use both. Ship faster. Keep your secrets safe. 🚀