Opencode: difference between revisions

From Mathux
Mathieu (discussion | contributions)
Page créée avec « === Find a suitable model that fit in RAM === https://ollama.com/search list model size on disk, not on VRAM. Also find a model that can think and run tools ( this may help https://www.canirun.ai/ ) Model take VRAM size, but context size also. As stated here, we may have to increase context size to 16k-32k. https://opencode.ai/docs/fr/providers/#ollama ==== Increase model's context size ==== <syntaxhighlight lang="bash"> ollama run qwen3.5:2b >>> /set pa... »
 

Revision as of 18 March 2026, 11:00

=== Find a suitable model that fits in RAM ===

https://ollama.com/search lists model sizes on disk, not VRAM usage.

https://www.canirun.ai/model/llama3.1-8b may help, but Ollama may not offer every quantization.

Also pick a model that can think and run tools (https://www.canirun.ai/ may help).


The model's weights consume VRAM, but the context window consumes VRAM too. As stated in the docs below, the context size may need to be increased to 16k-32k.

https://opencode.ai/docs/fr/providers/#ollama
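To get a feel for why the context window matters, here is a back-of-the-envelope sketch of KV-cache growth using the standard fp16 KV-cache formula. The layer/head numbers are hypothetical values for an 8B-class model, not taken from the Ollama docs; check your model's actual architecture before relying on these figures.

```shell
# Rough KV-cache sizing (fp16 cache, hypothetical 8B-class model dimensions):
#   bytes = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_element * context
layers=36; kv_heads=8; head_dim=128; bytes_per_elem=2
for ctx in 8192 16384 32768; do
  mib=$((2 * layers * kv_heads * head_dim * bytes_per_elem * ctx / 1024 / 1024))
  echo "num_ctx=$ctx -> ~${mib} MiB of KV cache"
done
```

With these assumed dimensions, doubling the context from 16k to 32k roughly doubles the cache from ~2.3 GiB to ~4.6 GiB on top of the weights, which is why a model that "fits" at the default context can spill to CPU at 32k.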

==== Increase model's context size ====

<syntaxhighlight lang="bash">
ollama run qwen3.5:2b
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save qwen3.5:2b-32k
Created new model 'qwen3.5:2b-32k'
</syntaxhighlight>
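The same result can be scripted with a Modelfile instead of the interactive /set + /save session, which is handier for provisioning. This is a sketch assuming the same base model and context size as above:

```shell
# Declare the larger context in a Modelfile and build a derived model from it.
cat > Modelfile <<'EOF'
FROM qwen3.5:2b
PARAMETER num_ctx 32768
EOF
# Building the derived model requires a running Ollama daemon:
#   ollama create qwen3.5:2b-32k -f Modelfile
grep 'num_ctx' Modelfile   # sanity check: the parameter landed in the file
```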

==== Check if it fits in VRAM ====

Run the model:

<syntaxhighlight lang="bash">
ollama run qwen3.5:2b-32k
</syntaxhighlight>

Then check CPU vs GPU usage; it should be 100% GPU to stay fast.

<syntaxhighlight lang="bash">
ollama ps
NAME              ID              SIZE      PROCESSOR    CONTEXT    UNTIL
qwen3.5:2b-32k    094e78c5fe51    5.1 GB    100% GPU     32768      4 minutes from now
</syntaxhighlight>
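The eyeball check can be scripted. This sketch runs against a captured `ollama ps` line (column layout assumed from the sample output above); in practice, feed in `ollama ps | grep qwen3.5:2b-32k` instead of the hard-coded string:

```shell
# Warn when the model has spilled out of VRAM (e.g. "48%/52% CPU/GPU").
ps_line='qwen3.5:2b-32k    094e78c5fe51    5.1 GB    100% GPU     32768      4 minutes from now'
case "$ps_line" in
  *'100% GPU'*) echo "fully on GPU" ;;
  *)            echo "WARNING: model spilled to CPU/RAM, generation will be slow" ;;
esac
```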

=== Configure Opencode ===

In ~/.config/opencode/config.json:

<syntaxhighlight lang="json">
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "name": "Ollama (spacemarine)",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://192.168.0.24:11434/v1"
      },
      "models": {
        "qwen3.5:2b-32k": {
          "tools": true
        }
      }
    }
  }
}
</syntaxhighlight>

=== More ===

Quantization:

From https://smcleod.net/2024/07/understanding-ai/llm-quantisation-through-interactive-visualisations/

{| class="wikitable"
!Quant Type
!Size
!Quality
!Performance (CUDA)
!Performance (Metal)
!Notes
|-
|IQ1_XS
|Smallest
|Unusable
|Excellent
|OK
|Basically a jabbering idiot
|-
|Q2_K_S
|Smallest
|Unusable
|Excellent
|Excellent
|Likely generates lots of errors, not very useful
|-
|Q2_K_M
|Smallest
|Very-Very-Low
|Excellent
|Excellent
|Likely generates lots of errors, not very useful
|-
|IQ2_XXS
|Very Small
|Very-Low
|Excellent
|OK
|Surprisingly usable for the GPU poor if you have CUDA
|-
|IQ2_XS
|Very Small
|Low
|Very Good
|Not Great
|Surprisingly usable for the GPU poor if you have CUDA
|-
|Q3_K_S
|Small
|Low
|Excellent
|Excellent
|Usable and quick but has had a few head injuries
|-
|Q4_0
|Small
|Medium-Low
|Excellent
|Excellent
|Legacy quant type - not recommended
|-
|IQ3_XXS
|Small
|Medium-Low
|Very Good
|Poor
|As good as Q4_K_S but smaller
|-
|Q4_K_S
|Medium-Small
|Medium-Low
|Excellent
|Excellent
|You may as well use Q4_K_M, or IQ3_X(X)S if you have CUDA
|-
|Q5_1
|Medium
|Medium-Low
|Excellent
|Excellent
|Legacy quant type - not recommended
|-
|Q4_K_M
|Medium
|Medium
|Excellent
|Excellent
|Balanced mid-range quant
|-
|Q5_K_S
|Medium-Large
|Medium
|Excellent
|Excellent
|Slightly better than Q4_K_M
|-
|Q5_K_M
|Medium-Large
|Medium-High
|Excellent
|Excellent
|A nice little upgrade from Q4_K_M
|-
|Q6_K
|Large
|Very-High
|Very Good
|Very Good
|Best all-rounder, quality-to-size ratio for systems with enough VRAM
|-
|Q8_0
|Very Large
|Overkill
|Good
|Good
|Large file size, usually overkill and practically indistinguishable from full precision for inference
|}