Why I Run Language Models on My Own Machine
Cloud AI is convenient — until you can't use it. Student data, unpublished research, industry IP, and institutional compliance all push toward local models. Here's the case for running LLMs on your own hardware.
I work across academia and industry — research projects, teaching, and applied AI development with partners who handle sensitive data and intellectual property. In every one of those contexts, there are things I simply cannot send to a cloud AI service. Student data with ethics restrictions. Unpublished research. Industry IP under NDA. Proprietary datasets. Information that has to stay on approved infrastructure, full stop.
Local language models solve this. I run them on my own hardware, nothing leaves my machine, and the output is good enough to be genuinely useful for most tasks. What started as a workaround for privacy constraints has become a core part of how I work.
Here’s the thing, though: for those of us who work with LLMs and AI systems daily, running models locally is old news. Yet I’m constantly struck by how many colleagues, students, and collaborators don’t even know it’s possible. They assume you need a server farm or a cloud subscription. You don’t. Any modern Mac with Apple Silicon, most gaming PCs, and even the new generation of mini PCs with unified CPU/GPU memory can run genuinely capable models. The barrier is awareness, not hardware.
The Privacy and Compliance Case
If you’re an academic, you deal with sensitive data constantly — student submissions, unpublished manuscripts, ethics applications, grant reviews, participant data. Cloud AI services process your inputs on remote servers. Their privacy policies have improved, and many now say they won’t train on your data. But “we promise not to” is different from “we can’t.” The data still leaves your machine, traverses the internet, and sits on someone else’s infrastructure.
I use cloud AI tools every day for work that doesn’t involve sensitive material. But when I’m working with student data, reviewing confidential documents, or handling unpublished research, local models are the responsible choice. The data never leaves my computer. There’s nothing to trust, because there’s nothing to send.
This applies at every level. University ethics boards increasingly have explicit policies about what can and can’t be sent to cloud AI services. Ethics approvals for human participant research specify how data must be stored and processed — consent agreements rarely anticipated cloud AI. Data-sharing agreements restrict where shared datasets can be processed. And some organisations don’t just prefer local — they require it. Government contracts, industry NDAs, and classified projects all mandate that processing stays on approved infrastructure. Local models turn all of these blockers into solved problems.
Working on the Go
I travel a lot — long-haul flights, conference venues with overloaded wifi, hotels with patchy connections. If your AI workflow depends entirely on cloud services, you’re stuck the moment connectivity disappears. With local models, I keep working regardless. Some of my most productive sessions have been on 14-hour flights with a capable local model running on my laptop, no internet required. If you travel regularly for conferences or fieldwork, your AI tools travel with you.
The Industry and Applied Case
Part of my work involves developing AI-powered systems — agent-based and LLM-based tools — for industry partners, government stakeholders, and applied research projects spanning defence, training systems, and operational support. The common thread: constraints that make cloud AI difficult or impossible.
- IP and data protection. Industry partners need assurance that their proprietary data won’t leave the network. When you’re working with someone’s competitive advantage, “we’ll use a cloud API” isn’t a conversation that goes well.
- Security. Some projects involve classified or restricted information that cannot leave approved infrastructure.
- Robustness. Systems that stop working when the internet drops aren’t systems you can rely on.
- Edge deployment. Some applications need to run on-device, in the field, without any internet connection.
These are real engineering requirements that shape how you architect AI systems from day one. You design for local capability from the start — choosing models that run efficiently on constrained hardware and building pipelines that don’t assume cloud access.
The Development and Cost Case
Even without privacy constraints, there’s a purely practical reason to run models locally: cost. If you’re building tools that embed LLMs — agents, RAG pipelines, custom workflows — you need a model you can hit thousands of times without worrying about API credits. A local model costs nothing per request.
I use local models for prototyping agent workflows, testing integrations before pointing them at production APIs, and letting students experiment freely in the lab without usage limits. Once a pipeline works locally, I swap in a cloud model for production if I need the extra capability. But the development loop happens on my machine, at zero marginal cost.
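That swap-in-for-production loop can be sketched as a tiny client that speaks the OpenAI-style chat API Ollama exposes locally; moving to a cloud provider is then just a matter of changing the base URL. This is a minimal illustration, not code from my actual pipelines — the `LLM_BASE_URL` environment variable and the `gemma3` model name are assumptions for the example:

```python
import json
import os
import urllib.request

# Local Ollama endpoint by default; point LLM_BASE_URL at a cloud
# provider's OpenAI-compatible endpoint when you need more capability.
BASE_URL = os.environ.get("LLM_BASE_URL", "http://localhost:11434/v1")


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request.

    Ollama serves this API locally, so the same code targets either a
    local or a cloud model depending only on BASE_URL.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

With Ollama running, `chat("gemma3", "Summarise this abstract in two sentences.")` costs nothing per call, so you can hammer it in a loop while prototyping — then set `LLM_BASE_URL` for production without touching the pipeline code.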
What’s Good Enough Locally Now
A year ago, local models were a compromise. You’d sacrifice a lot of quality for the privacy and cost benefits. That’s changed dramatically.
Google’s Gemma 3 models are a standout — the 4B model punches well above its weight for something that runs on 8GB of RAM, and the 12B version is genuinely impressive for everyday tasks. OpenAI released GPT-OSS, their first open-weight model family since GPT-2, with a 20B variant that runs well on a 32GB machine. Qwen 3, Llama 3.2, Phi-4, and Mistral round out a field with a strong option at every size point. The quality gap between local and cloud has narrowed significantly — for summarisation, drafting, code generation, data analysis, and question answering, these models handle the majority of everyday tasks well.
Ollama makes running them trivial — install it, pull a model, and you’re running in under five minutes. Just `ollama pull gemma3` and go. Ollama now includes a built-in UI as well, so you don’t need the command line if you prefer not to use it. LM Studio is another excellent option — a desktop app where you can browse, download, and chat with models. It’s a great starting point for students and colleagues who aren’t comfortable with the terminal. See our Local LLM Setup Guide for the full model comparison table and installation walkthrough for both.
For tasks that demand more capability — complex reasoning, long-form writing, nuanced analysis — I still use cloud models. The point isn’t that local replaces cloud. It’s that local covers a surprisingly large portion of your daily needs, and for the use cases where privacy, cost, or connectivity matter, it’s the only option that works.
Where to Start
If you want to try running models locally, I’ve written a step-by-step guide that covers installation, model selection, hardware requirements, and how to connect local models to your development tools.
Read the full guide: Local LLM Setup Guide
The barrier to entry is lower than you think. If you have a reasonably modern laptop and five minutes, you can have a local model running today. Start with Ollama or LM Studio and a small model. See how it fits into your workflow. You might be surprised at how often “good enough on my machine” beats “better but in the cloud.”
The best AI setup isn’t all-cloud or all-local. It’s knowing when each one matters — and having both options available. Local models give you privacy, independence, and zero marginal cost. Cloud models give you maximum capability. Use both, and you’re covered for everything.
— Michael Richardson
Professor, School of Psychological Sciences
Faculty of Medicine, Health and Human Sciences
Macquarie University
AI Disclosure: This article was written with the assistance of AI tools, including Claude. The ideas, opinions, experiences, and workflow described are entirely my own — the AI helped with drafting, editing, and structuring the text. I use AI tools extensively and openly in my research, teaching, and writing, and I encourage others to do the same. Using AI well is a skill worth developing, not something to hide or be ashamed of.
It’s also worth acknowledging that the AI models used here — and all current LLMs — were trained on vast quantities of text written by others, largely without explicit consent. The ideas and language of countless researchers, educators, and writers are embedded in every output these models produce. Their collective intellectual labour makes tools like this possible, and that contribution deserves recognition even when it can’t be individually attributed.