Ollama makes it easy to run large language models (LLMs) locally. It supports Llama 3, Mistral, Gemma, and many other models.
Getting Started
Enable the Ollama service in your process-compose module:
# In `perSystem.process-compose.<name>`
{
services.ollama."ollama1".enable = true;
}
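With the defaults, the service starts a local Ollama server (Ollama listens on 127.0.0.1:11434 by default). If you want to pin where models are stored or pre-pull models when the service starts, a sketch like the following may work; `dataDir` and `models` are assumed option names here, so check the module's `services.ollama` options in your services-flake version:
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    # Assumed option: directory where Ollama keeps downloaded models and state.
    dataDir = "./data/ollama1";
    # Assumed option: models to pull automatically on startup.
    models = [ "llama3" ];
  };
}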
Acceleration
By default, Ollama uses the CPU for inference. To enable GPU acceleration, pick the backend that matches your hardware:
CUDA
For NVIDIA GPUs.
First, allow unfree packages (the CUDA libraries in nixpkgs are unfree):
# Inside perSystem = { system, ... }: { ...
{
imports = [
"${inputs.nixpkgs}/nixos/modules/misc/nixpkgs.nix"
];
nixpkgs = {
hostPlatform = system;
# Required for CUDA
config.allowUnfree = true;
};
}
Then enable CUDA acceleration:
# In `perSystem.process-compose.<name>`
{
services.ollama."ollama1" = {
enable = true;
acceleration = "cuda";
};
}
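Putting the two steps together, a single perSystem module looks roughly like this; the process-compose group name "ollama-cuda" is just an example, use whatever name your flake already defines:
# Inside perSystem = { system, ... }: { ...
{
  imports = [
    "${inputs.nixpkgs}/nixos/modules/misc/nixpkgs.nix"
  ];
  nixpkgs = {
    hostPlatform = system;
    # Required for CUDA
    config.allowUnfree = true;
  };
  # Example process-compose group name; adjust to your flake.
  process-compose."ollama-cuda" = {
    services.ollama."ollama1" = {
      enable = true;
      acceleration = "cuda";
    };
  };
}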
ROCm
For AMD Radeon GPUs.
# In `perSystem.process-compose.<name>`
{
services.ollama."ollama1" = {
enable = true;
acceleration = "rocm";
};
}
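ROCm does not officially support every Radeon card; a common workaround is to set the HSA_OVERRIDE_GFX_VERSION environment variable so ROCm treats the GPU as a supported gfx target. The sketch below assumes the module exposes an `environment` option for passing variables to the Ollama server process; verify against the module's options before relying on it:
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
    # Assumed option: extra environment variables for the Ollama server.
    # "10.3.0" targets gfx1030 (RDNA 2); pick the value matching your card.
    environment.HSA_OVERRIDE_GFX_VERSION = "10.3.0";
  };
}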