Application for running local inference on your computer.
Company: Element Labs
No need to build your own UI for local inference
Free
Simple to use
Simple to download AI models from within the application
Downloading AI models is sometimes buggy:
See the solution under "Download..." below.
Visit LMstudio.AI >> Click button: "Download..."
Run installer file
Tip: Place a shortcut file in the Autostart folder.
Tip: If a model download is slow or has stalled:
Try cancel/retry to trigger a faster download.
You don't need to restart the entire download (it resumes where it stopped).
Perform these 4 steps:
1: Click "Discover"
2: Search for a preferred AI model
3: Read the info
4: Download your preferred quantization
Choose GPU offload value
Aim for about 1 GB free GPU VRAM
Toggle "Remember settings"
Tip: Screenshot these settings
First time loading the model
Searching for "Qwen 30B" gives three different staff picks:
"Qwen3 Coder 30B"
- Coding model, instruct.
- Download Q4 / Q6 / Q8
"Qwen3 30B A3B 2507"
- General model, instruct.
- "2507" is newer than "A3B" + improved
- Download Q4 / Q6 / Q8
("Qwen3 30B A3B")
- (Older than "2507" and not as good)
If you place both Q4 and Q6 in the same folder:
LM Studio can get confused and not list the Q6 model.
Solution: Move the Q6 file to a separate folder. Example for Windows 11:
C:\Users\Name\.lmstudio\models\lmstudio-community\Qwen3-CODER-Q6-30B
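One way to keep quantizations apart is to give each one its own subfolder. A tiny Python sketch of that idea (the `quant_folder` helper and its naming scheme are my own illustration, not an LM Studio convention; the base path is the Windows 11 example above):

```python
from pathlib import PureWindowsPath

def quant_folder(base: str, model: str, quant: str) -> str:
    """Build a per-quantization folder path, e.g. <base>\\<model>-<quant>,
    so Q4 and Q6 GGUF files never end up in the same folder."""
    return str(PureWindowsPath(base) / f"{model}-{quant}")

base = r"C:\Users\Name\.lmstudio\models\lmstudio-community"
print(quant_folder(base, "Qwen3-Coder-30B", "Q6"))
# C:\Users\Name\.lmstudio\models\lmstudio-community\Qwen3-Coder-30B-Q6
```

After moving the file, restart LM Studio so it rescans the models directory.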
Inference can sometimes be slower...
Reason 1:
Other apps are taking resources from the GPU.
Reason 2:
A buggy LM Studio session? Try restarting LM Studio (this has worked for me).
New computer (NVIDIA GPU, 16 GB) + LM Studio + Qwen Coder:
Desktop computer, components: 20,000 SEK
GPU: NVIDIA GeForce RTX 5060 Ti X3, 16 GB
CPU: AMD Ryzen 9 5900XT - 3.3 (4.8) GHz - 16 cores - Cache: 72 MB
RAM: 64 GB (2x32 GB) DDR4 3200 MHz
SSD (C): 1 TB, R/W: 7300/6800 MB/s
Win 11.
LM Studio for local inference.
Fast inference speed 👍
English = Good 👍
Swedish = NOT good ❌ - Solution...
Coding = Good ✅ (Tried some simple coding tasks...)
How to roughly calculate GPU offload, if needed:
Aim for 1 GB of unused VRAM on the GPU.
Example for a GPU with 16 GB VRAM + Qwen3 Coder 30B Q6:
About 15 GB on the GPU makes sense - the rest goes to the CPU.
"My models" >> Size of AI model: 25,1 GB
"My models" >> Gear icon >> "GPU offload": 48 layers in total: Choose how many to load in GPU.Β
Calculate: 29 of 48 layers = 0,604 of total size handled by GPU.Β
0,604 of size 25,1 GB = 15,16 GB loaded in GPU.
Offload the rest to CPU.Β
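The offload arithmetic above can be sketched as a small helper (Python; the function name and the assumption that all layers are roughly the same size are mine, not part of LM Studio):

```python
def layers_for_gpu(model_size_gb: float, total_layers: int,
                   vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Return how many layers fit on the GPU while keeping
    reserve_gb of VRAM free, assuming equal-sized layers."""
    budget = vram_gb - reserve_gb                 # usable VRAM
    gb_per_layer = model_size_gb / total_layers   # ~0.52 GB in the example
    return min(total_layers, int(budget / gb_per_layer))

# The example from the notes: 25.1 GB model, 48 layers, 16 GB VRAM.
print(layers_for_gpu(25.1, 48, 16.0))  # 28 layers ≈ 14.6 GB on the GPU
```

This truncates to stay under the budget, so it gives 28 rather than the 29 used above; either is close enough as a starting point, then adjust in the gear-icon slider.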
Speed 👍👍
- Really fast: 35 tok/s + 0.2 s to first token
Capability, prompting in English 👍
- ChatGPT level (it feels like), when I compared.
Capability, prompting in Swedish ❌
- Rather BAD output (not so good at Swedish)
Speed 👍
- Fast: 20 tok/s + 0.2 s to first token
- Faster than all ChatGPT models (it seems).
Capability, prompting in English 👍
- ChatGPT level, when I compared.
Capability, prompting in Swedish 👍✅
- Seems pretty good...
Note: I have only had a little time to test this model, since I've only had this computer for a couple of days.
Speed 👍
- Fast: 27 tok/s + 0.1 s to first token
- Faster than all ChatGPT models (it seems).
Coding Capability 👍✅
Ok so far, but I have not tried any advanced coding.
Note: I have only had a little time to test this model, since I've only had this computer for a couple of days.
Speed 👍
- Fast: 22 tok/s + 0.2 s to first token
- Faster than all ChatGPT models (it seems).
Coding Capability 👍✅
Ok so far, but I have not tried any advanced coding.
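The tok/s figures above translate into total response time with simple arithmetic: time ≈ time to first token + tokens / tok/s. A quick Python sketch (the 500-token answer length is an assumed example; 35 tok/s and 0.2 s are the measured figures from the notes):

```python
def response_time(tokens: int, tok_per_s: float, ttft_s: float) -> float:
    """Estimated total response time: first-token latency plus
    generation time at a steady tokens-per-second rate."""
    return ttft_s + tokens / tok_per_s

# 500-token answer at 35 tok/s with 0.2 s to first token:
print(round(response_time(500, 35.0, 0.2), 1))  # 14.5 seconds
```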