Installation

Step 1 — Core package

pip install urban-worm

Step 2 — Choose an inference backend

For the Unsloth backend, install the GPU-specific PyTorch build before the unsloth extra; otherwise pip falls back to a slow CPU-only build.

# NVIDIA GPU (CUDA 12.4 wheels)
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install "urban-worm[unsloth]"

# macOS
pip install torch          # MPS is enabled by default on macOS
pip install "urban-worm[unsloth]"
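
After installing, it can be worth confirming that PyTorch actually sees the accelerator, since a CPU-only build installs without any warning. The following is a minimal sketch that uses only PyTorch's own API; the helper name detect_torch_device is hypothetical and not part of urban-worm.

```python
import importlib.util


def detect_torch_device():
    """Return "cuda", "mps", or "cpu"; return None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists in recent PyTorch releases; guard for older ones.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("PyTorch device:", detect_torch_device())
```

If this reports "cpu" on a machine with an NVIDIA GPU, the CPU-only wheel was probably installed, and reinstalling torch from the CUDA index before the unsloth extra is the likely fix.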

Supported checkpoints

unsloth/Qwen3-VL-3B-Instruct, unsloth/Qwen3-VL-8B-Instruct, unsloth/gemma-3-4b-it, unsloth/Qwen2-VL-2B-Instruct, unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit. Any vision model that unsloth.FastVisionModel can load should work.
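
To check whether a checkpoint outside this list is loadable, one option is to try it with unsloth directly. This sketch calls unsloth's FastVisionModel.from_pretrained and does not go through urban-worm; the helper name load_checkpoint and its defaults are assumptions made for illustration.

```python
def load_checkpoint(name="unsloth/Qwen2-VL-2B-Instruct", load_in_4bit=True):
    """Load a vision checkpoint directly through unsloth (not urban-worm's API)."""
    # Imported lazily because unsloth is heavy to import.
    from unsloth import FastVisionModel

    # from_pretrained returns the model together with its tokenizer/processor.
    model, tokenizer = FastVisionModel.from_pretrained(name, load_in_4bit=load_in_4bit)
    return model, tokenizer
```

If the call succeeds for a given checkpoint name, that checkpoint is a reasonable candidate for the Unsloth backend.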


Ollama — lightweight (no GPU required)

Install the Ollama application first:

# Linux
curl -fsSL https://ollama.com/install.sh | sh
pip install "urban-worm[ollama]"

# macOS
brew install ollama
pip install "urban-worm[ollama]"

# Windows
Download the installer from ollama.com, then:

pip install "urban-worm[ollama]"
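
The Ollama extra only installs the Python client; the Ollama application still has to be running. A rough way to verify that is to query the server's /api/tags endpoint, which lists locally pulled models. The sketch below assumes the default address http://localhost:11434, and the helper name ollama_models is hypothetical.

```python
import json
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None
    return [m["name"] for m in payload.get("models", [])]


if __name__ == "__main__":
    print(ollama_models())
```

A result of None suggests the server is not running; an empty list means it is running but no model has been pulled yet.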

llama.cpp — CLI-based

The llama-mtmd-cli binary must be installed separately:

# macOS / Linux
brew install llama.cpp

# Windows
winget install llama.cpp

Then install the Python binding:

# CPU only
pip install "urban-worm[llamacpp]"

# NVIDIA GPU (CUDA)
CMAKE_ARGS="-DGGML_CUDA=on" pip install "urban-worm[llamacpp]"

# macOS (Metal)
CMAKE_ARGS="-DGGML_METAL=on" pip install "urban-worm[llamacpp]"
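
Because the llama-mtmd-cli binary is installed separately from the Python package, a quick PATH lookup can confirm it is discoverable. This is a small sketch using only the standard library; find_llama_cli is a hypothetical helper, and how urban-worm itself locates the binary is not covered here.

```python
import shutil


def find_llama_cli(name="llama-mtmd-cli"):
    """Return the full path of the llama.cpp multimodal CLI, or None if not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    print(find_llama_cli() or "llama-mtmd-cli not found on PATH")
```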

Cloud APIs (Claude / GPT-4o / Gemini)

pip install "urban-worm[api]"

Optional extras

Extra      What it adds
audio      pydub, needed for audio slicing (get_sound_from_location)
all        All inference backends + API providers (no audio)
all,audio  Everything
dev        pytest, ruff, build tools

pip install "urban-worm[all]"
pip install "urban-worm[all,audio]"
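
To see which optional dependencies ended up in the current environment, one approach is to probe for their import names. In the sketch below only pydub is confirmed by the extras table; the other module names (torch, unsloth, ollama, llama_cpp) are assumptions about what each extra pulls in, and installed_backends is a hypothetical helper.

```python
import importlib.util

# Assumed import names per feature; only pydub is confirmed by the extras table.
OPTIONAL_MODULES = {
    "torch": "torch",
    "unsloth": "unsloth",
    "ollama": "ollama",
    "llama.cpp binding": "llama_cpp",
    "audio": "pydub",
}


def installed_backends(modules=OPTIONAL_MODULES):
    """Map each label to True if its top-level module can be found, else False."""
    return {
        label: importlib.util.find_spec(mod) is not None
        for label, mod in modules.items()
    }


if __name__ == "__main__":
    for label, present in installed_backends().items():
        print(f"{label}: {'installed' if present else 'missing'}")
```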

GPU torch + [all]

Install the CUDA torch wheel before running pip install "urban-worm[all]"; otherwise pip pulls in the CPU-only build. Use the pip install torch --index-url command shown under Step 2.


Dev install from source

pip install -e git+https://github.com/billbillbilly/urbanworm.git#egg=urban-worm
pip install "urban-worm[dev]"