Self-hosted AI coding assistant. An open-source / on-prem alternative to GitHub Copilot.
> **Warning**
> Tabby is still in the alpha phase.
## Features
- Self-contained, with no need for a DBMS or cloud service
- Web UI for visualizing and configuring models and MLOps.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
- Consumer-grade GPU support (FP-16 weight loading with various optimizations).
## Demo
## Get started
### Docker
The easiest way to get started is with the Docker image:
```bash
# Create the data directory and grant ownership to uid 1000
# (Tabby runs as uid 1000 inside the container).
mkdir -p data/hf_cache && chown -R 1000 data

docker run -it --rm \
  -v ./data:/data \
  -v ./data/hf_cache:/home/app/.cache/huggingface \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby
```
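For a longer-running setup you may prefer detached mode; below is a minimal sketch under the same image and environment variables (the container name `tabby` is an illustrative choice, not required):

```bash
# Same container as above, but detached and with a restart policy.
# The container name "tabby" is illustrative.
docker run -d --restart unless-stopped --name tabby \
  -v ./data:/data \
  -v ./data/hf_cache:/home/app/.cache/huggingface \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby

# Follow the server logs (the model weights are downloaded on first start).
docker logs -f tabby
```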
To use the GPU backend (Triton) for faster inference:
```bash
docker run --gpus all -it --rm \
  -v ./data:/data \
  -v ./data/hf_cache:/home/app/.cache/huggingface \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  -e MODEL_BACKEND=triton \
  tabbyml/tabby
```
Note: To use GPUs, you need to install the NVIDIA Container Toolkit. We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
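A quick way to confirm the toolkit is working before starting Tabby is to run `nvidia-smi` inside a CUDA base container (the image tag below is one common choice, not something Tabby requires):

```bash
# If the NVIDIA Container Toolkit is installed correctly,
# this prints your GPU table.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```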
You can then query the server using the `/v1/completions` endpoint:
```bash
curl -X POST http://localhost:5000/v1/completions \
  -H 'Content-Type: application/json' \
  --data '{
    "prompt": "def binarySearch(arr, left, right, x):\n    mid = (left +"
  }'
```
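In practice the prompt usually comes from a source file rather than being typed inline; below is a sketch that completes the beginning of a local file (`main.py` is a hypothetical example file, and `python3` is used only to JSON-escape the text safely):

```bash
# Use the first lines of a local file as the prompt.
# main.py is a hypothetical example file.
PROMPT=$(head -n 5 main.py | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))')

curl -s -X POST http://localhost:5000/v1/completions \
  -H 'Content-Type: application/json' \
  --data "{\"prompt\": $PROMPT}"
```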
We also provide an interactive playground in the admin panel at localhost:5000/_admin.
### SkyPilot
See `deployment/skypilot/README.md`.
## API documentation
Tabby serves a FastAPI server at localhost:5000, which embeds OpenAPI documentation for the HTTP API.
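If you want the machine-readable spec for codegen or scripting, FastAPI exposes the raw document at `/openapi.json` by default; the sketch below assumes Tabby does not override that path:

```bash
# Fetch and pretty-print the OpenAPI spec.
# /openapi.json is FastAPI's default path (assumption: not overridden).
curl -s http://localhost:5000/openapi.json | python3 -m json.tool
```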
## Development
Go to the `development` directory, then run:

```bash
make dev
```

or

```bash
make dev-triton  # Turn on the Triton backend (for CUDA env developers)
```