Written by Martin Ågren - February 14, 2026
"Inference" = Using a trained AI model to produce outputs on new data. Example: ChatGPT answering your question.
"Local inference" = An organization or person runs an AI model on its own device or its own server.
Run AI model on your own computer.
No input data leaves the computer.
No internet connection is needed to use the AI model.
Organization runs AI model on its own server.
No input data leaves the organization.
Data not leaving the organization is very good from a data-security and GDPR perspective.
Note: If you connect your local LLM to external tools via MCP (or an API), security decreases if data is sent outside the computer/server.
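One practical habit when wiring up MCP or API tools: check where each tool endpoint actually points before connecting it. A minimal sketch (the `LOCAL_HOSTS` set and function name are illustrative assumptions; extend the set with your own internal server names):

```python
from urllib.parse import urlparse

# Hosts considered "local" here are an assumption; add your own
# internal server names for an on-premises setup.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def stays_local(url: str) -> bool:
    """Return True if a tool/API endpoint targets the local machine,
    i.e. no input data would leave the computer."""
    host = urlparse(url).hostname or ""
    return host in LOCAL_HOSTS

print(stays_local("http://localhost:1234/v1/chat/completions"))  # local
print(stays_local("https://api.example.com/v1/tool"))            # external
```

A check like this does not make external calls safe; it only helps you notice when a tool would send data off the machine.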
Complement with:
- Strong passwords
- Encrypted drives
- Backups (to avoid data loss), themselves also protected with passwords and encryption
You need powerful GPUs (graphics cards) and other capable hardware to run AI locally on a computer or server. Keep in mind: other apps share the computer's resources if you run them simultaneously with AI inference.
NVIDIA GPUs - well suited for AI inference.
VRAM (Video Random Access Memory) = Installed video memory. The more, the better (for AI inference).
Example: "Qwen 30B Q4 A3B 2507" seems good at English, but bad at Swedish.
So for Swedish, you should try:
- Another 30B model, and compare...
- A 70B model, which would of course require more powerful hardware
You need to download the AI model(s) so you can run inference locally.
You can download via HuggingFace.co or via LM Studio (or other methods).
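For reference, Hugging Face serves repository files as direct downloads under a `/resolve/<revision>/<path>` URL. A small sketch of building such a URL (the repo and file names below are illustrative placeholders, not real releases; in practice LM Studio or the `huggingface-cli download` command handles this for you):

```python
def hf_gguf_url(repo_id: str, filename: str) -> str:
    """Direct-download URL for a file in a Hugging Face repo
    (files are served under /resolve/<revision>/<path>)."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

# Placeholder names, only to show the URL shape:
url = hf_gguf_url("someorg/some-model-GGUF", "some-model-Q4_K_M.gguf")
print(url)
```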
Examples of open weights AI models:
Deepseek, GPT-oss, Llama, Mistral, Mixtral, Qwen
"B" = Number of billion parameters weights. Example: "30B". More parameters generally improve capability, but require stronger hardware.
"GGUF" = Popular file format for AI models
"Hybrid model" = Different modes: Reasoning / Standard / Mini
"Instruct" = Fine-tuned model, prepared for conversations.
"MoE" = Mixture of Experts = Efficient way of only activating some parts of the model during inference.
"Q4" = Level of quantization. Lower number: more quantized, faster, a bit less capable.
Q4 could be a good starting point when you try a new model for local inference.
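To see why Q4 is a reasonable starting point, compare approximate file sizes of one 30B model at different quantization levels. A sketch (the bits-per-weight figures are rough averages; actual GGUF K-quants vary per tensor):

```python
# Approximate bits per weight for common GGUF quantization levels
# (rough averages, an assumption for this estimate):
BITS = {"F16": 16.0, "Q8": 8.5, "Q4": 4.5, "Q2": 2.6}

def approx_size_gb(params_billion: float, quant: str) -> float:
    # billions of parameters * bits per weight / 8 bits per byte = GB
    return params_billion * BITS[quant] / 8

for quant in ("F16", "Q8", "Q4", "Q2"):
    print(f"30B at {quant}: ~{approx_size_gb(30, quant):.0f} GB")
```

Q4 cuts the download and VRAM footprint to roughly a quarter of full 16-bit precision with only a modest quality loss, which is why it is a common default for local inference.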
On computer or on server?
What model do you like best for local inference?
What application did you use? LM Studio?
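If you use LM Studio, it can expose the loaded model through an OpenAI-compatible local server (port 1234 by default). A minimal sketch that only builds the request payload; the model name is an illustrative placeholder, and the actual send is left commented out since it requires a running server:

```python
import json

# LM Studio's default local server address; adjust if you changed the port.
BASE_URL = "http://localhost:1234/v1"

def chat_request(model: str, question: str) -> dict:
    """Build an OpenAI-style chat-completions payload for a local server.
    Sending it to localhost keeps all input data on your machine."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

payload = chat_request("qwen-30b-q4", "What is local inference?")
print(json.dumps(payload, indent=2))
# To actually send (requires a running local server):
# import requests
# r = requests.post(f"{BASE_URL}/chat/completions", json=payload)
```

Because the endpoint is localhost, this kind of call works without an internet connection and no input data leaves the computer.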
I know several Swedish organizations that have chosen local inference on a server to protect their data. Developers are needed for this kind of AI implementation.
Some companies instruct their developers to use local inference when generating code with an LLM, to protect sensitive data.