Written by Martin Ågren - February 14, 2026
"Inference" = Using a trained AI model to produce outputs on new data. Example: ChatGPT answering your question.
"Local inference" = An organization or person runs an AI model on its own device or its own server.
Run AI model on your own computer.
No input data leaves the computer.
No internet connection is needed to use the AI model.
Organization runs AI model on its own server.
No input data leaves the organization.
Data not leaving the organization is very good from a data-security and GDPR perspective.
Note: If you connect your local LLM to external tools via MCP (or an API), security decreases if data is sent outside the computer/server.
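One practical habit when wiring up MCP or API tools: check where each tool endpoint actually points before connecting it. A minimal sketch (the `LOCAL_HOSTS` set and function name are illustrative assumptions; extend the set with your own internal server names):

```python
from urllib.parse import urlparse

# Hosts considered "local" here are an assumption; add your own
# internal server names for an on-premises setup.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def stays_local(url: str) -> bool:
    """Return True if a tool/API endpoint targets the local machine,
    i.e. no input data would leave the computer."""
    host = urlparse(url).hostname or ""
    return host in LOCAL_HOSTS

print(stays_local("http://localhost:1234/v1/chat/completions"))  # local
print(stays_local("https://api.example.com/v1/tool"))            # external
```

A check like this does not make external calls safe; it only helps you notice when a tool would send data off the machine.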
Complement with:
- Strong passwords
- Encrypted drives
- Backups (to avoid data loss), themselves also protected with passwords and encryption
You need powerful GPUs (graphics cards) and other capable hardware to run AI locally on a computer or server. Keep in mind: other apps share the computer's resources if you run them simultaneously with AI inference.
NVIDIA GPUs - well suited for AI inference.
VRAM (Video Random Access Memory) = Installed video memory. The more, the better (for AI inference).
Example: "Qwen 30B Q4 A3B 2507" seems good at English, but bad at Swedish.
So for Swedish, you should try:
- Another 30B model, and compare...
- A 70B model, which would of course require more powerful hardware
You need to download the AI model(s) so you can run inference locally.
You can download via HuggingFace.co or via LM Studio (or other methods).
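For reference, Hugging Face serves repository files as direct downloads under a `/resolve/<revision>/<path>` URL. A small sketch of building such a URL (the repo and file names below are illustrative placeholders, not real releases; in practice LM Studio or the `huggingface-cli download` command handles this for you):

```python
def hf_gguf_url(repo_id: str, filename: str) -> str:
    """Direct-download URL for a file in a Hugging Face repo
    (files are served under /resolve/<revision>/<path>)."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

# Placeholder names, only to show the URL shape:
url = hf_gguf_url("someorg/some-model-GGUF", "some-model-Q4_K_M.gguf")
print(url)
```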
Examples of open weights AI models:
Deepseek, GPT-oss, Llama, Mistral, Mixtral, Qwen
"B" = Number of billion parameters weights. Example: "30B". More parameters generally improve capability, but require stronger hardware.
"GGUF" = Popular file format for AI models
"Hybrid model" = Different modes: Reasoning / Standard / Mini
"Instruct" = Fine-tuned model, prepared for conversations.
"MoE" = Mixture of Experts = Efficient way of only activating some parts of the model during inference.
"Q4" = Level of quantization. Lower number: more quantized, faster, a bit less capable.
Q4 could be a good starting point when you try a new model for local inference.
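To see why Q4 is a reasonable starting point, compare approximate file sizes of one 30B model at different quantization levels. A sketch (the bits-per-weight figures are rough averages; actual GGUF K-quants vary per tensor):

```python
# Approximate bits per weight for common GGUF quantization levels
# (rough averages, an assumption for this estimate):
BITS = {"F16": 16.0, "Q8": 8.5, "Q4": 4.5, "Q2": 2.6}

def approx_size_gb(params_billion: float, quant: str) -> float:
    # billions of parameters * bits per weight / 8 bits per byte = GB
    return params_billion * BITS[quant] / 8

for quant in ("F16", "Q8", "Q4", "Q2"):
    print(f"30B at {quant}: ~{approx_size_gb(30, quant):.0f} GB")
```

Q4 cuts the download and VRAM footprint to roughly a quarter of full 16-bit precision with only a modest quality loss, which is why it is a common default for local inference.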
On computer or on server?
What model do you like best for local inference?
What application did you use? LM Studio?
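If you use LM Studio, it can expose the loaded model through an OpenAI-compatible local server (port 1234 by default). A minimal sketch that only builds the request payload; the model name is an illustrative placeholder, and the actual send is left commented out since it requires a running server:

```python
import json

# LM Studio's default local server address; adjust if you changed the port.
BASE_URL = "http://localhost:1234/v1"

def chat_request(model: str, question: str) -> dict:
    """Build an OpenAI-style chat-completions payload for a local server.
    Sending it to localhost keeps all input data on your machine."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

payload = chat_request("qwen-30b-q4", "What is local inference?")
print(json.dumps(payload, indent=2))
# To actually send (requires a running local server):
# import requests
# r = requests.post(f"{BASE_URL}/chat/completions", json=payload)
```

Because the endpoint is localhost, this kind of call works without an internet connection and no input data leaves the computer.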
I know several Swedish organizations that have chosen local inference on a server to protect their data. Developers are needed for this kind of AI implementation.
Some companies instruct their developers to use local inference when generating code with an LLM, to protect sensitive data.