llamaproxy.net

Proxy requests to publicly accessible Ollama instances discovered across the internet

We scan the internet for open Ollama deployments and automatically proxy your requests to available models. No need to run your own inference hardware: just use our distributed network of public nodes.

API Endpoints

POST /api/generate

Generate text completions using distributed Ollama models

{ "model": "llama2:latest", "prompt": "Explain quantum computing", "stream": false }
POST /api/chat

Chat completions with conversation context

{ "model": "llama2:latest", "messages": [ {"role": "user", "content": "Hello!"} ] }
GET /api/tags

List all available models across the network

{ "models": [ {"name": "llama2:latest"}, {"name": "codellama:7b"}, {"name": "mistral:7b"} ] }
GET /api/ps

Show currently running models with resource usage

{ "models": [ { "name": "llama2:latest", "size": 3826793677, "size_vram": 0, "servers_running": 2 } ] }
terminal — bash — 80x24
user@machine:~$ curl -X POST https://api.llamaproxy.net/api/generate \
    -H "Content-Type: application/json" \
    -d '{"model": "llama2:latest", "prompt": "What is the meaning of life?"}'
{"model":"llama2:latest","response":"The meaning of life is a profound philosophical question that has been pondered by humans throughout history..."}

user@machine:~$ curl https://api.llamaproxy.net/api/tags
{"models":[{"name":"llama2:latest"},{"name":"codellama:7b"},{"name":"mistral:7b"},{"name":"neural-chat:7b"}]}

user@machine:~$ curl -X POST https://api.llamaproxy.net/api/chat \
    -H "Content-Type: application/json" \
    -d '{"model": "llama2:latest", "messages": [{"role": "user", "content": "Hello!"}]}'
{"model":"llama2:latest","message":{"role":"assistant","content":"Hello! How can I help you today?"}}

user@machine:~$ curl https://api.llamaproxy.net/api/ps
{"models":[{"name":"llama2:latest","size":3826793677,"size_vram":0,"servers_running":2},{"name":"qwen2.5:1.5b","size":986061892,"servers_running":1}]}

user@machine:~$