Opencode: difference between revisions

From Mathux
Mathieu (discussion | contributions)
Page créée avec « === Find a suitable model that fit in RAM === https://ollama.com/search list model size on disk, not on VRAM. Also find a model that can think and run tools ( this may help https://www.canirun.ai/ ) Model take VRAM size, but context size also. As stated here, we may have to increase context size to 16k-32k. https://opencode.ai/docs/fr/providers/#ollama ==== Increase model's context size ==== <syntaxhighlight lang="bash"> ollama run qwen3.5:2b >>> /set pa... »
 

Revision as of 18 March 2026, 11:00

=== Find a suitable model that fits in RAM ===

https://ollama.com/search lists model sizes on disk, not VRAM usage.

https://www.canirun.ai/model/llama3.1-8b may help, but Ollama may not offer every quantization.

Also pick a model that can think and run tools (https://www.canirun.ai/ may help).


The model's weights consume VRAM, but the context window consumes VRAM too. As stated in the docs below, the context size may need to be increased to 16k-32k.

https://opencode.ai/docs/fr/providers/#ollama
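To get a feel for why the context window matters, here is a back-of-the-envelope sketch of KV-cache growth using the standard fp16 KV-cache formula. The layer/head numbers are hypothetical values for an 8B-class model, not taken from the Ollama docs; check your model's actual architecture before relying on these figures.

```shell
# Rough KV-cache sizing (fp16 cache, hypothetical 8B-class model dimensions):
#   bytes = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_element * context
layers=36; kv_heads=8; head_dim=128; bytes_per_elem=2
for ctx in 8192 16384 32768; do
  mib=$((2 * layers * kv_heads * head_dim * bytes_per_elem * ctx / 1024 / 1024))
  echo "num_ctx=$ctx -> ~${mib} MiB of KV cache"
done
```

With these assumed dimensions, doubling the context from 16k to 32k roughly doubles the cache from ~2.3 GiB to ~4.6 GiB on top of the weights, which is why a model that "fits" at the default context can spill to CPU at 32k.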

==== Increase model's context size ====

<syntaxhighlight lang="bash">
ollama run qwen3.5:2b
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save qwen3.5:2b-32k
Created new model 'qwen3.5:2b-32k'
</syntaxhighlight>
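The same result can be scripted with a Modelfile instead of the interactive /set + /save session, which is handier for provisioning. This is a sketch assuming the same base model and context size as above:

```shell
# Declare the larger context in a Modelfile and build a derived model from it.
cat > Modelfile <<'EOF'
FROM qwen3.5:2b
PARAMETER num_ctx 32768
EOF
# Building the derived model requires a running Ollama daemon:
#   ollama create qwen3.5:2b-32k -f Modelfile
grep 'num_ctx' Modelfile   # sanity check: the parameter landed in the file
```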

==== Check if it fits in VRAM ====

Run the model:

<syntaxhighlight lang="bash">
ollama run qwen3.5:2b-32k
</syntaxhighlight>

Then check CPU vs GPU usage; it should be 100% GPU to stay fast.

<syntaxhighlight lang="bash">
ollama ps
NAME              ID              SIZE      PROCESSOR    CONTEXT    UNTIL
qwen3.5:2b-32k    094e78c5fe51    5.1 GB    100% GPU     32768      4 minutes from now
</syntaxhighlight>
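The eyeball check can be scripted. This sketch runs against a captured `ollama ps` line (column layout assumed from the sample output above); in practice, feed in `ollama ps | grep qwen3.5:2b-32k` instead of the hard-coded string:

```shell
# Warn when the model has spilled out of VRAM (e.g. "48%/52% CPU/GPU").
ps_line='qwen3.5:2b-32k    094e78c5fe51    5.1 GB    100% GPU     32768      4 minutes from now'
case "$ps_line" in
  *'100% GPU'*) echo "fully on GPU" ;;
  *)            echo "WARNING: model spilled to CPU/RAM, generation will be slow" ;;
esac
```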

=== Configure Opencode ===

In ~/.config/opencode/config.json:

<syntaxhighlight lang="json">
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "name": "Ollama (spacemarine)",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://192.168.0.24:11434/v1"
      },
      "models": {
        "qwen3.5:2b-32k": {
          "tools": true
        }
      }
    }
  }
}
</syntaxhighlight>

=== More ===

Quantization:

From https://smcleod.net/2024/07/understanding-ai/llm-quantisation-through-interactive-visualisations/

{| class="wikitable"
!Quant Type
!Size
!Quality
!Performance (CUDA)
!Performance (Metal)
!Notes
|-
|IQ1_XS
|Smallest
|Unusable
|Excellent
|OK
|Basically a jabbering idiot
|-
|Q2_K_S
|Smallest
|Unusable
|Excellent
|Excellent
|Likely generates lots of errors, not very useful
|-
|Q2_K_M
|Smallest
|Very-Very-Low
|Excellent
|Excellent
|Likely generates lots of errors, not very useful
|-
|IQ2_XXS
|Very Small
|Very-Low
|Excellent
|OK
|Surprisingly usable for the GPU poor if you have CUDA
|-
|IQ2_XS
|Very Small
|Low
|Very Good
|Not Great
|Surprisingly usable for the GPU poor if you have CUDA
|-
|Q3_K_S
|Small
|Low
|Excellent
|Excellent
|Usable and quick but has had a few head injuries
|-
|Q4_0
|Small
|Medium-Low
|Excellent
|Excellent
|Legacy quant type - not recommended
|-
|IQ3_XXS
|Small
|Medium-Low
|Very Good
|Poor
|As good as Q4_K_S but smaller
|-
|Q4_K_S
|Medium-Small
|Medium-Low
|Excellent
|Excellent
|You may as well use Q4_K_M, or IQ3_X(X)S if you have CUDA
|-
|Q5_1
|Medium
|Medium-Low
|Excellent
|Excellent
|Legacy quant type - not recommended
|-
|Q4_K_M
|Medium
|Medium
|Excellent
|Excellent
|Balanced mid-range quant
|-
|Q5_K_S
|Medium-Large
|Medium
|Excellent
|Excellent
|Slightly better than Q4_K_M
|-
|Q5_K_M
|Medium-Large
|Medium-High
|Excellent
|Excellent
|A nice little upgrade from Q4_K_M
|-
|Q6_K
|Large
|Very-High
|Very Good
|Very Good
|Best all-rounder, quality-to-size ratio for systems with enough VRAM
|-
|Q8_0
|Very Large
|Overkill
|Good
|Good
|Large file size, usually overkill and practically indistinguishable from full precision for inference
|}