Ollama

Ollama lets you run large language models (LLMs) locally with minimal setup. It supports Llama 3, Mistral, Gemma, and many other models.

Getting Started

# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1".enable = true;
}
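
The service definition above assumes a flake that already wires together flake-parts, process-compose-flake, and services-flake. If you are starting from scratch, a minimal sketch could look like the following; the input URLs and module attribute paths (`process-compose-flake.flakeModule`, `services-flake.processComposeModules.default`) follow the upstream READMEs, so verify them against the versions you actually pin.

# Hypothetical minimal `flake.nix` wiring; check attribute paths against your pinned inputs
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    flake-parts.url = "github:hercules-ci/flake-parts";
    process-compose-flake.url = "github:Platonic-Systems/process-compose-flake";
    services-flake.url = "github:juspay/services-flake";
  };
  outputs = inputs:
    inputs.flake-parts.lib.mkFlake { inherit inputs; } {
      systems = [ "x86_64-linux" "aarch64-darwin" ];
      imports = [ inputs.process-compose-flake.flakeModule ];
      perSystem = { ... }: {
        process-compose."ollama" = {
          # Pull in the services-flake modules, including `services.ollama`
          imports = [ inputs.services-flake.processComposeModules.default ];
          services.ollama."ollama1".enable = true;
        };
      };
    };
}

`nix run .#ollama` then starts process-compose with the Ollama server as one of its processes; unless configured otherwise, it listens on Ollama's standard port (11434), so the regular `ollama` CLI and HTTP API can talk to it.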

Acceleration

By default, Ollama runs inference on the CPU. To enable GPU acceleration, use one of the backends below:

Note

NixOS provides documentation for configuring both NVIDIA and AMD GPU drivers. If you are on another distribution, refer to its documentation instead.

CUDA

For NVIDIA GPUs.

First, allow unfree packages, since the CUDA libraries in nixpkgs are unfree:

# Inside perSystem = { system, ... }: { ...
{
  imports = [
    # Provides the `nixpkgs.*` options used to build `pkgs` for this system
    "${inputs.nixpkgs}/nixos/modules/misc/nixpkgs.nix"
  ];
  nixpkgs = {
    hostPlatform = system;
    # Required for CUDA
    config.allowUnfree = true;
  };
}

Then enable CUDA acceleration:

# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "cuda";
  };
}

ROCm

For AMD Radeon GPUs.

# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
  };
}
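
Beyond `enable` and `acceleration`, the services-flake module exposes further per-instance options. As a sketch, something like the following pre-downloads a model and pins the state directory; the option names `dataDir` and `models` are assumptions here, so check the services-flake option reference for the version you pin.

# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
    # Assumed option names -- verify against the services-flake option docs
    dataDir = "./data/ollama1"; # where models and state are stored
    models = [ "llama3" ];      # pulled automatically once the server is up
  };
}

If a model-preloading option like this is available, expect the first startup to take noticeably longer while the listed models are downloaded.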