About
The Ollama skill provides a streamlined interface for deploying and managing local Large Language Model (LLM) inference servers on Bazzite. It uses Podman Quadlet for containerization and follows a single-instance design so that loaded models share GPU memory efficiently. Whether you need to configure hardware acceleration for NVIDIA, AMD, or Intel GPUs, pull open-source models such as Llama 3 and Mistral, or integrate the local API into your development workflow, this skill automates the complex setup and management tasks involved in running local AI. A minimal sketch of that API integration appears below.
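
As an illustration of the API integration mentioned above, the following sketch queries Ollama's standard generate endpoint (`http://localhost:11434/api/generate`) using only Python's standard library. The model name `llama3` is an assumption for this example; substitute any model you have already pulled.

```python
import json
import urllib.request

# Build a POST request against the local Ollama server (default port 11434).
# "llama3" is a placeholder model name -- use one you have pulled locally.
request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",
        "prompt": "Explain what a Podman Quadlet is in one sentence.",
        "stream": False,  # ask for a single JSON object instead of a stream
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Send the request and print the generated text from the JSON response.
with urllib.request.urlopen(request) as response:
    result = json.load(response)
    print(result["response"])
```

Because the server listens on localhost by default, the same call works from any language or tool that can issue HTTP requests, which is what makes the local API straightforward to wire into an existing development workflow.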