Application for running local inference on your computer.
Company: Element Labs
No need to build your own UI for local inference
Free
Simple to use
Simple to download AI models from within the application
Downloading AI models is sometimes buggy:
See the solution under "Download..." below.
Visit LMstudio.AI >> Click button: "Download..."
Run installer file
Tip: Place a shortcut file in the Autostart folder.
Tip: If a model download is slow or has stalled:
Try cancel/retry to trigger a faster download.
You don't need to restart the entire download (it resumes where it stopped).
Perform these 4 steps:
1: Click "Discover"
2: Search for a preferred AI model
3: Read the info
4: Download your preferred quantization
Choose GPU offload value
Aim for about 1 GB free GPU VRAM
Toggle "Remember settings"
Tip: Screenshot these settings
First time loading the model
Searching for "Qwen 30B" gives three different staff picks:
"Qwen3 Coder 30B"
- Coding model, instruct.
- Download Q4 / Q6 / Q8
"Qwen3 30B A3B 2507"
- General model, instruct.
- "2507" is newer than "A3B" + improved
- Download Q4 / Q6 / Q8
("Qwen3 30B A3B")
- (Older than "2507" and not as good)
If you place both Q4 and Q6 in the same folder:
LM Studio can get confused and not list the Q6 model.
Solution: Move the Q6 file to a separate folder. Example for Windows 11:
C:\Users\Name\.lmstudio\models\lmstudio-community\Qwen3-CODER-Q6-30B
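One way to keep quantizations apart is to give each one its own subfolder. A tiny Python sketch of that idea (the `quant_folder` helper and its naming scheme are my own illustration, not an LM Studio convention; the base path is the Windows 11 example above):

```python
from pathlib import PureWindowsPath

def quant_folder(base: str, model: str, quant: str) -> str:
    """Build a per-quantization folder path, e.g. <base>\\<model>-<quant>,
    so Q4 and Q6 GGUF files never end up in the same folder."""
    return str(PureWindowsPath(base) / f"{model}-{quant}")

base = r"C:\Users\Name\.lmstudio\models\lmstudio-community"
print(quant_folder(base, "Qwen3-Coder-30B", "Q6"))
# C:\Users\Name\.lmstudio\models\lmstudio-community\Qwen3-Coder-30B-Q6
```

After moving the file, restart LM Studio so it rescans the models directory.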
Inference can sometimes be slower...
Reason 1:
Other apps are taking resources from the GPU.
Reason 2:
A buggy LM Studio session? Try restarting LM Studio (this has worked for me).
New computer (NVIDIA GPU, 16 GB) + LM Studio + Qwen Coder:
Desktop computer, components: 20,000 SEK
GPU: NVIDIA GeForce RTX 5060 Ti X3, 16 GB
CPU: AMD Ryzen 9 5900XT - 3.3 (4.8) GHz - 16 cores - Cache: 72 MB
RAM: 64 GB (2x32 GB) DDR4 3200 MHz
SSD (C): 1 TB, R/W: 7300/6800 MB/s
Win 11.
LM Studio for local inference.
Fast inference speed 👍
English = Good 👍
Swedish = NOT good ❌ - Solution...
Coding = Good ✅ (Tried some simple coding tasks...)
How to roughly calculate GPU offload, if needed:
Aim for 1 GB of unused VRAM on the GPU.
Example for a GPU with 16 GB VRAM + Qwen3 Coder 30B Q6:
About 15 GB on the GPU makes sense - the rest goes to the CPU.
"My models" >> Size of AI model: 25,1 GB
"My models" >> Gear icon >> "GPU offload": 48 layers in total: Choose how many to load in GPU.Β
Calculate: 29 of 48 layers = 0,604 of total size handled by GPU.Β
0,604 of size 25,1 GB = 15,16 GB loaded in GPU.
Offload the rest to CPU.Β
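The offload arithmetic above can be sketched as a small helper (Python; the function name and the assumption that all layers are roughly the same size are mine, not part of LM Studio):

```python
def layers_for_gpu(model_size_gb: float, total_layers: int,
                   vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Return how many layers fit on the GPU while keeping
    reserve_gb of VRAM free, assuming equal-sized layers."""
    budget = vram_gb - reserve_gb                 # usable VRAM
    gb_per_layer = model_size_gb / total_layers   # ~0.52 GB in the example
    return min(total_layers, int(budget / gb_per_layer))

# The example from the notes: 25.1 GB model, 48 layers, 16 GB VRAM.
print(layers_for_gpu(25.1, 48, 16.0))  # 28 layers ≈ 14.6 GB on the GPU
```

This truncates to stay under the budget, so it gives 28 rather than the 29 used above; either is close enough as a starting point, then adjust in the gear-icon slider.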
Speed 👍👍
- Really fast: 35 tok/s + 0.2 s to first token
Capability, prompting in English 👍
- ChatGPT level (it feels like), when I compared.
Capability, prompting in Swedish ❌
- Rather BAD output (not so good at Swedish)
Speed 👍
- Fast: 20 tok/s + 0.2 s to first token
- Faster than all ChatGPT models (it seems).
Capability, prompting in English 👍
- ChatGPT level, when I compared.
Capability, prompting in Swedish 👍✅
- Seems pretty good...
Note: I have only had a little time to test this model, since I've only had this computer for a couple of days.
Speed 👍
- Fast: 27 tok/s + 0.1 s to first token
- Faster than all ChatGPT models (it seems).
Coding Capability 👍✅
Ok so far, but I have not tried any advanced coding.
Note: I have only had a little time to test this model, since I've only had this computer for a couple of days.
Speed 👍
- Fast: 22 tok/s + 0.2 s to first token
- Faster than all ChatGPT models (it seems).
Coding Capability 👍✅
Ok so far, but I have not tried any advanced coding.
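The tok/s figures above translate into total response time with simple arithmetic: time ≈ time to first token + tokens / tok/s. A quick Python sketch (the 500-token answer length is an assumed example; 35 tok/s and 0.2 s are the measured figures from the notes):

```python
def response_time(tokens: int, tok_per_s: float, ttft_s: float) -> float:
    """Estimated total response time: first-token latency plus
    generation time at a steady tokens-per-second rate."""
    return ttft_s + tokens / tok_per_s

# 500-token answer at 35 tok/s with 0.2 s to first token:
print(round(response_time(500, 35.0, 0.2), 1))  # 14.5 seconds
```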