Opencode
By Mathux
Find a suitable model that fits in VRAM
https://ollama.com/search lists model sizes on disk, not VRAM usage.
https://www.canirun.ai/model/llama3.1-8b may help, but Ollama may not give access to every quantization.
Also pick a model that supports tool calling (and, possibly, thinking); https://www.canirun.ai/ may help here too.
The model weights take VRAM, but the context does too. As stated in the Opencode docs, we may have to increase the context size to 16k-32k:
https://opencode.ai/docs/fr/providers/#ollama
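A rough way to see why context costs VRAM on top of the weights: the KV cache grows linearly with context length. This is a back-of-the-envelope sketch with illustrative hyperparameters (Llama-3.1-8B-like, GQA with 8 KV heads, fp16 cache), not values measured from Ollama.

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
# * context_length * bytes_per_value. Hyperparameters below are
# illustrative assumptions, not read from any specific model file.
def kv_cache_bytes(layers, kv_heads, head_dim, ctx, bytes_per_value=2):
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_value

gib = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, ctx=32768) / 2**30
print(f"{gib:.1f} GiB")  # → 4.0 GiB of cache, on top of the model weights
```

Doubling `num_ctx` doubles this figure, which is why a model that "fits" at 4k may spill to CPU at 32k.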
Increase model's context size
```
ollama run qwen3.5:2b
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save qwen3.5:2b-32k
Created new model 'qwen3.5:2b-32k'
```
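The same result can be scripted with a Modelfile instead of the interactive session (a sketch; the base model and new name match the ones used above):

```
# Modelfile: bump the context window to 32k
FROM qwen3.5:2b
PARAMETER num_ctx 32768
```

Then build it with `ollama create qwen3.5:2b-32k -f Modelfile`.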
Check if it fits in VRAM
Run the model:
```
ollama run qwen3.5:2b-32k
```
Then check CPU vs. GPU usage: it should show 100% GPU to stay fast.
```
ollama ps
NAME              ID              SIZE      PROCESSOR    CONTEXT    UNTIL
qwen3.5:2b-32k    094e78c5fe51    5.1 GB    100% GPU     32768      4 minutes from now
```
Configure Opencode
In `~/.config/opencode/config.json`:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "name": "Ollama (spacemarine)",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://192.168.0.24:11434/v1"
      },
      "models": {
        "qwen3.5:2b-32k": {
          "tools": true
        }
      }
    }
  }
}
```
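For reference, this is roughly what the `@ai-sdk/openai-compatible` provider sends to the Ollama endpoint once configured (a sketch of the OpenAI-compatible request shape; the model name and baseURL match the config above):

```python
# Shape of an OpenAI-compatible chat request to the Ollama /v1 endpoint.
# No network call is made here; this only shows the request that
# opencode's provider would POST.
import json

base_url = "http://192.168.0.24:11434/v1"
payload = {
    "model": "qwen3.5:2b-32k",
    "messages": [{"role": "user", "content": "Hello"}],
}
print("POST", base_url + "/chat/completions")
print(json.dumps(payload, indent=2))
```

If the model answers a plain request like this but tool calls fail, double-check that `"tools": true` is set for the model in the config.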
More
Quantization:
From https://smcleod.net/2024/07/understanding-ai/llm-quantisation-through-interactive-visualisations/
| Quant Type | Size | Quality | Performance (CUDA) | Performance (Metal) | Notes |
|---|---|---|---|---|---|
| IQ1_XS | Smallest | Unusable | Excellent | OK | Basically a jabbering idiot |
| Q2_K_S | Smallest | Unusable | Excellent | Excellent | Likely generates lots of errors, not very useful |
| Q2_K_M | Smallest | Very-Very-Low | Excellent | Excellent | Likely generates lots of errors, not very useful |
| IQ2_XXS | Very Small | Very-Low | Excellent | OK | Surprisingly usable for the GPU poor if you have CUDA |
| IQ2_XS | Very Small | Low | Very Good | Not Great | Surprisingly usable for the GPU poor if you have CUDA |
| Q3_K_S | Small | Low | Excellent | Excellent | Usable and quick but has had a few head injuries |
| Q4_0 | Small | Medium-Low | Excellent | Excellent | Legacy Quant Type - Not recommended |
| IQ3_XXS | Small | Medium-Low | Very Good | Poor | As good as Q4_K_S but smaller |
| Q4_K_S | Medium-Small | Medium-Low | Excellent | Excellent | You may as well use Q4_K_M, or IQ3_X(X)S if you have CUDA |
| Q5_1 | Medium | Medium-Low | Excellent | Excellent | Legacy Quant Type - Not recommended |
| Q4_K_M | Medium | Medium | Excellent | Excellent | Balanced mid range quant |
| Q5_K_S | Medium-Large | Medium | Excellent | Excellent | Slightly better than Q4_K_M |
| Q5_K_M | Medium-Large | Medium-High | Excellent | Excellent | A nice little upgrade from Q4_K_M |
| Q6_K | Large | Very-High | Very Good | Very Good | Best all-rounder, quality-to-size ratio for systems with enough VRAM |
| Q8_0 | Very Large | Overkill | Good | Good | Large file size, usually overkill and practically indistinguishable from full precision for inference |
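The "Size" column above can be made concrete with a back-of-the-envelope formula: file size ≈ parameter count × bits per weight / 8. The bits-per-weight figures below are rough assumptions (quant overheads vary by model), not exact llama.cpp values.

```python
# Approximate on-disk size of a quantised model.
# Bits-per-weight values are rough assumptions for illustration only.
BITS_PER_WEIGHT = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5}

def approx_size_gib(params_billion, quant):
    """Estimated file size in GiB, ignoring embedding/metadata overhead."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 2**30

for q in BITS_PER_WEIGHT:
    print(f"8B @ {q}: {approx_size_gib(8, q):.1f} GiB")
```

Remember to add the KV cache for your chosen `num_ctx` on top of this when checking whether everything fits in VRAM.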