Ollama makes it easy to run large language models (LLMs) locally. It supports Llama 3, Mistral, Gemma, and many other models.
Getting Started
Enable the Ollama service in your process-compose module:
# In `perSystem.process-compose.<name>`
{
services.ollama."ollama1".enable = true;
}
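With the defaults, the service starts a local Ollama server (Ollama listens on 127.0.0.1:11434 by default). If you want to pin where models are stored or pre-pull models when the service starts, a sketch like the following may work; `dataDir` and `models` are assumed option names here, so check the module's `services.ollama` options in your services-flake version:
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    # Assumed option: directory where Ollama keeps downloaded models and state.
    dataDir = "./data/ollama1";
    # Assumed option: models to pull automatically on startup.
    models = [ "llama3" ];
  };
}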
Acceleration
By default, Ollama uses the CPU for inference. To enable GPU acceleration, pick the backend that matches your hardware:
CUDA
For NVIDIA GPUs.
First, allow unfree packages (the CUDA libraries in nixpkgs are unfree):
# Inside perSystem = { system, ... }: { ...
{
imports = [
"${inputs.nixpkgs}/nixos/modules/misc/nixpkgs.nix"
];
nixpkgs = {
hostPlatform = system;
# Required for CUDA
config.allowUnfree = true;
};
}
Then enable CUDA acceleration:
# In `perSystem.process-compose.<name>`
{
services.ollama."ollama1" = {
enable = true;
acceleration = "cuda";
};
}
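Putting the two steps together, a single perSystem module looks roughly like this; the process-compose group name "ollama-cuda" is just an example, use whatever name your flake already defines:
# Inside perSystem = { system, ... }: { ...
{
  imports = [
    "${inputs.nixpkgs}/nixos/modules/misc/nixpkgs.nix"
  ];
  nixpkgs = {
    hostPlatform = system;
    # Required for CUDA
    config.allowUnfree = true;
  };
  # Example process-compose group name; adjust to your flake.
  process-compose."ollama-cuda" = {
    services.ollama."ollama1" = {
      enable = true;
      acceleration = "cuda";
    };
  };
}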
ROCm
For AMD Radeon GPUs.
# In `perSystem.process-compose.<name>`
{
services.ollama."ollama1" = {
enable = true;
acceleration = "rocm";
};
}
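ROCm does not officially support every Radeon card; a common workaround is to set the HSA_OVERRIDE_GFX_VERSION environment variable so ROCm treats the GPU as a supported gfx target. The sketch below assumes the module exposes an `environment` option for passing variables to the Ollama server process; verify against the module's options before relying on it:
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
    # Assumed option: extra environment variables for the Ollama server.
    # "10.3.0" targets gfx1030 (RDNA 2); pick the value matching your card.
    environment.HSA_OVERRIDE_GFX_VERSION = "10.3.0";
  };
}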