Opencode
Revision as of 18 March 2026, 11:00
Find a suitable model that fits in RAM
https://ollama.com/search lists model sizes on disk, not VRAM usage.
https://www.canirun.ai/model/llama3.1-8b may help, but Ollama may not give access to every quantization.
Also look for a model that can think and run tools (https://www.canirun.ai/ may help with this).
A model's VRAM usage comes not only from its weights but also from its context. As stated in the Opencode documentation below, the context size may have to be increased to 16k-32k.
https://opencode.ai/docs/fr/providers/#ollama
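To get a feel for how weights and context each contribute to VRAM, here is a back-of-the-envelope sketch. All figures are assumptions for a hypothetical model (8B parameters at roughly 4.5 bits/weight for a Q4_K_M-style quant; 32 layers, 8 KV heads, head dim 128, fp16 KV cache, 32768-token context), not measurements of any specific model:

```shell
# Rough VRAM estimate: quantised weights + KV cache at full context.
# Every number below is an assumed example figure, not a measured value.
awk 'BEGIN {
  weights = 8e9 * 4.5 / 8                # bytes: params x bits-per-weight / 8
  kv = 2 * 32 * 8 * 128 * 32768 * 2      # bytes: K&V x layers x kv-heads x head-dim x ctx x fp16
  printf "weights %.1f GiB + kv-cache %.1f GiB = %.1f GiB\n", \
         weights / 2^30, kv / 2^30, (weights + kv) / 2^30
}'
```

This is why a model whose weights comfortably fit in VRAM can still spill to CPU once the context is raised to 32k.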
Increase model's context size
ollama run qwen3.5:2b
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save qwen3.5:2b-32k
Created new model 'qwen3.5:2b-32k'
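The interactive /set + /save flow above can also be scripted; a sketch, assuming the standard Ollama Modelfile syntax (FROM plus PARAMETER) and the same model names as above:

```shell
# Bake the larger context into a derived model non-interactively.
cat > Modelfile <<'EOF'
FROM qwen3.5:2b
PARAMETER num_ctx 32768
EOF
ollama create qwen3.5:2b-32k -f Modelfile
```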
Check if it fits in VRAM
Run the model:
ollama run qwen3.5:2b-32k
Then check CPU vs GPU usage. It should show 100% GPU to stay fast:
ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
qwen3.5:2b-32k 094e78c5fe51 5.1 GB 100% GPU 32768 4 minutes from now
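The check can be automated by parsing the PROCESSOR column; a sketch, assuming the `ollama ps` column layout shown above (fed a captured sample here so it runs standalone):

```shell
# Report whether each running model is fully offloaded to the GPU.
# Assumes the whitespace-separated column layout printed by `ollama ps`.
check_gpu() { awk 'NR > 1 { if ($5 == "100%" && $6 == "GPU") print $1 ": fully offloaded"; else print $1 ": partial offload, will be slow" }'; }

# Demo on a captured sample; in practice:  ollama ps | check_gpu
printf '%s\n' \
  'NAME              ID              SIZE      PROCESSOR    CONTEXT    UNTIL' \
  'qwen3.5:2b-32k    094e78c5fe51    5.1 GB    100% GPU     32768      4 minutes from now' \
  | check_gpu
```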
Configure Opencode
In ~/.config/opencode/config.json:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"ollama": {
"name": "Ollama (spacemarine)",
"npm": "@ai-sdk/openai-compatible",
"options": {
"baseURL": "http://192.168.0.24:11434/v1"
},
"models": {
"qwen3.5:2b-32k": {
"tools": true
}
}
}
}
}
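Before restarting Opencode, it is worth checking that the file parses as valid JSON and points at the right endpoint. A quick sketch with python3, run here on an inline copy of the config above so it is self-contained; in practice, pipe in ~/.config/opencode/config.json instead:

```shell
# Parse the provider config and print the baseURL Opencode will use.
python3 -c '
import json, sys
cfg = json.load(sys.stdin)
print(cfg["provider"]["ollama"]["options"]["baseURL"])
' <<'EOF'
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "name": "Ollama (spacemarine)",
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://192.168.0.24:11434/v1" },
      "models": { "qwen3.5:2b-32k": { "tools": true } }
    }
  }
}
EOF
```

A JSON syntax error here (trailing comma, missing brace) is the most common reason the provider silently fails to appear in Opencode.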
More
Quantization:
From https://smcleod.net/2024/07/understanding-ai/llm-quantisation-through-interactive-visualisations/
| Quant Type | Size | Quality | Performance (CUDA) | Performance (Metal) | Notes |
|---|---|---|---|---|---|
| IQ1_XS | Smallest | Unusable | Excellent | OK | Basically a jabbering idiot |
| Q2_K_S | Smallest | Unusable | Excellent | Excellent | Likely generates lots of errors, not very useful |
| Q2_K_M | Smallest | Very-Very-Low | Excellent | Excellent | Likely generates lots of errors, not very useful |
| IQ2_XXS | Very Small | Very-Low | Excellent | OK | Surprisingly usable for the GPU poor if you have CUDA |
| IQ2_XS | Very Small | Low | Very Good | Not Great | Surprisingly usable for the GPU poor if you have CUDA |
| Q3_K_S | Small | Low | Excellent | Excellent | Usable and quick but has had a few head injuries |
| Q4_0 | Small | Medium-Low | Excellent | Excellent | Legacy Quant Type - Not recommended |
| IQ3_XXS | Small | Medium-Low | Very Good | Poor | As good as Q4_K_S but smaller |
| Q4_K_S | Medium-Small | Medium-Low | Excellent | Excellent | You may as well use Q4_K_M, or IQ3_X(X)S if you have CUDA |
| Q5_1 | Medium | Medium-Low | Excellent | Excellent | Legacy Quant Type - Not recommended |
| Q4_K_M | Medium | Medium | Excellent | Excellent | Balanced mid range quant |
| Q5_K_S | Medium-Large | Medium | Excellent | Excellent | Slightly better than Q4_K_M |
| Q5_K_M | Medium-Large | Medium-High | Excellent | Excellent | A nice little upgrade from Q4_K_M |
| Q6_K | Large | Very-High | Very Good | Very Good | Best all-rounder, quality-to-size ratio for systems with enough VRAM |
| Q8_0 | Very Large | Overkill | Good | Good | Large file size, usually overkill and practically indistinguishable from full precision for inference |
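As a rule of thumb, weight size scales with bits per weight, which makes the table's Size column easy to turn into numbers. A sketch for a hypothetical 8B-parameter model, using approximate average bits-per-weight figures for each quant (assumed ballpark values, not exact):

```shell
# Approximate weight sizes for an assumed 8B-parameter model.
# Bits-per-weight values are rough averages for each quant family.
awk 'BEGIN {
  split("Q2_K:2.6 Q4_K_M:4.8 Q5_K_M:5.7 Q6_K:6.6 Q8_0:8.5", q, " ")
  for (i = 1; i <= 5; i++) {
    split(q[i], a, ":")
    printf "%-7s ~%.1f GiB\n", a[1], 8e9 * a[2] / 8 / 2^30
  }
}'
```

Comparing these figures against available VRAM (minus the KV cache for the chosen context size) is a quick way to shortlist quants before downloading anything.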