llamaproxy.net

Proxy requests to publicly accessible Ollama instances discovered across the internet

We scan the internet for open Ollama deployments and automatically proxy your requests to available models. No need to run your own inference hardware: just use our distributed network of public nodes.

API Endpoints

POST /api/generate

Generate text completions using distributed Ollama models

{ "model": "llama2:latest", "prompt": "Explain quantum computing", "stream": false }
POST /api/chat

Chat completions with conversation context

{ "model": "llama2:latest", "messages": [ {"role": "user", "content": "Hello!"} ] }
GET /api/tags

List all available models across the network

{ "models": [ {"name": "llama2:latest"}, {"name": "codellama:7b"}, {"name": "mistral:7b"} ] }
GET /api/ps

Show currently running models with resource usage

{ "models": [ { "name": "llama2:latest", "size": 3826793677, "size_vram": 0, "servers_running": 2 } ] }
terminal — bash — 80x24
user@machine:~$ curl -X POST https://api.llamaproxy.net/api/generate \
    -H "Content-Type: application/json" \
    -d '{"model": "llama2:latest", "prompt": "What is the meaning of life?"}'
{"model":"llama2:latest","response":"The meaning of life is a profound philosophical question that has been pondered by humans throughout history..."}

user@machine:~$ curl https://api.llamaproxy.net/api/tags
{"models":[{"name":"llama2:latest"},{"name":"codellama:7b"},{"name":"mistral:7b"},{"name":"neural-chat:7b"}]}

user@machine:~$ curl -X POST https://api.llamaproxy.net/api/chat \
    -H "Content-Type: application/json" \
    -d '{"model": "llama2:latest", "messages": [{"role": "user", "content": "Hello!"}]}'
{"model":"llama2:latest","message":{"role":"assistant","content":"Hello! How can I help you today?"}}

user@machine:~$ curl https://api.llamaproxy.net/api/ps
{"models":[{"name":"llama2:latest","size":3826793677,"size_vram":0,"servers_running":2},{"name":"qwen2.5:1.5b","size":986061892,"servers_running":1}]}

user@machine:~$