API Documentation

For the most up-to-date API reference, check the System API Documentation page inside Herdsman.

OpenAI-Compatible API

OpenAI-compatible API service. No authentication is required.

Endpoint

Base URL

http://localhost:8080/v1

Endpoint Examples

// OpenAI-compatible API
POST http://localhost:8080/v1/chat/completions

// Anthropic API
POST http://localhost:8080/v1/anthropic/messages

List Models

Get a list of all available models.

Method: GET
Endpoint: /v1/models

Request Example

GET /v1/models

Response Example

{
  "object": "list",
  "data": [
    {
      "id": "llama3-8b",
      "object": "model",
      "created": 1677858242,
      "owned_by": "Herdsman",
      "status": "running"
    }
  ]
}
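
As a rough illustration, the response above can be filtered on the client side. The sketch below uses only the Python standard library and assumes the base URL shown on this page. The helper names are made up for this example, and treating "running" as the only usable status is an assumption, since other status values are not listed here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # base URL from this page


def running_model_ids(payload: dict) -> list:
    """Return the ids of models whose status is "running".

    Assumption: models in any other state are not ready to serve requests.
    """
    return [
        model["id"]
        for model in payload.get("data", [])
        if model.get("status") == "running"
    ]


def fetch_models(base_url: str = BASE_URL) -> dict:
    """Send GET /v1/models and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return json.load(resp)


# Offline check against the response example above (no server needed).
sample = {
    "object": "list",
    "data": [
        {
            "id": "llama3-8b",
            "object": "model",
            "created": 1677858242,
            "owned_by": "Herdsman",
            "status": "running",
        }
    ],
}
ready = running_model_ids(sample)
```

With a server running, `running_model_ids(fetch_models())` would return the ids that can be passed as `model` to the other endpoints.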

Chat Completion

Send a chat request and receive an AI response.

Method: POST
Endpoint: /v1/chat/completions

Parameters

Parameter	Type	Required	Description
`model`	string	✓	Model name
`messages`	array	✓	Chat messages list
`temperature`	number	✗	Sampling temperature
`max_tokens`	number	✗	Maximum number of tokens to generate
`top_p`	number	✗	Nucleus sampling probability
`stream`	boolean	✗	Whether to use streaming response
`reasoning_effort`	string	✗	OpenAI Chat Completions-compatible reasoning level. Allowed values: `low`, `medium`, `high`. Maps to llama.cpp template args.
`thinking_enabled`	boolean	✗	Herdsman compatibility extension: enables thinking mode for supported models. Maps to llama.cpp `enable_thinking`.
`thinking_tokens`	number	✗	Herdsman compatibility extension: thinking token budget. Maps to llama.cpp `reasoning_budget`.

Request Example

POST /v1/chat/completions
Content-Type: application/json

{
  "model": "llama3-8b",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "reasoning_effort": "high"
}

Response Example

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "llama3-8b",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I help you today?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 17,
    "total_tokens": 30
  }
}
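
The request and response above could be driven from Python roughly as follows. This is a minimal sketch using only the standard library. `build_chat_request`, `first_reply`, and `chat` are illustrative names, not part of Herdsman, and the sketch assumes a non-streaming request (`stream` left unset).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # base URL from this page


def build_chat_request(model: str, user_text: str, **options) -> urllib.request.Request:
    """Build a POST /v1/chat/completions request without sending it.

    Extra keyword arguments (temperature, max_tokens, reasoning_effort, ...)
    are copied into the JSON body unchanged.
    """
    body = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    body.update(options)
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(response: dict) -> str:
    """Extract the assistant text from the first choice."""
    return response["choices"][0]["message"]["content"]


def chat(model: str, user_text: str, **options) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, user_text, **options)
    with urllib.request.urlopen(request) as resp:
        return first_reply(json.load(resp))
```

For example, `chat("llama3-8b", "Hello, how are you?", temperature=0.7, reasoning_effort="high")` mirrors the request example above.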

Embedding API

Convert text into a vector representation.

Method: POST
Endpoint: /v1/embeddings

Parameters

Parameter	Type	Required	Description
`model`	string	✓	Model name
`input`	string/array	✓	Input text or array of texts
`encoding_format`	string	✗	Embedding encoding format

Request Example

POST /v1/embeddings
Content-Type: application/json

{
  "model": "llama3-8b",
  "input": "Hello world"
}

Response Example

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0023064255,
        -0.009327664,
        0.015790065
      ]
    }
  ],
  "model": "llama3-8b",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}
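
A common use of the returned vectors is comparing two texts by cosine similarity. The sketch below shows one way to do that from a response shaped like the example above. The function names are illustrative, and cosine similarity is a general technique rather than something this API prescribes.

```python
import math


def embedding_vectors(response: dict) -> list:
    """Return the embedding vectors ordered by their "index" field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Passing an array as `input` yields one entry per text in `data`, so `cosine_similarity(*embedding_vectors(response))` would compare a two-text request.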

Rerank API

Reorder documents by their relevance to a query.

Method: POST
Endpoint: /v1/rerank

Parameters

Parameter	Type	Required	Description
`model`	string	✓	Model name
`query`	string	✓	Query text
`documents`	array	✓	List of documents to rerank
`top_n`	number	✗	Maximum number of results to return
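
This page gives no request example for reranking, so the following is only a sketch of how a body could be assembled from the parameters above. The model name `my-reranker` and the helper name are placeholders, and the response shape is not documented here, so the sketch stops at building the request body.

```python
import json
from typing import Optional, Sequence


def build_rerank_body(model: str, query: str, documents: Sequence[str],
                      top_n: Optional[int] = None) -> str:
    """Serialize a POST /v1/rerank body from the documented parameters."""
    if not documents:
        raise ValueError("documents must contain at least one entry")
    body = {"model": model, "query": query, "documents": list(documents)}
    if top_n is not None:
        body["top_n"] = top_n  # optional cap on the number of results
    return json.dumps(body)


# "my-reranker" is a hypothetical model name used only for illustration.
example_body = build_rerank_body(
    "my-reranker",
    "What is the capital of France?",
    ["Paris is the capital of France.", "Berlin is in Germany."],
    top_n=1,
)
```

The resulting string would be sent with `Content-Type: application/json`, as in the other examples on this page.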

Anthropic API

Anthropic Messages

Chat with Anthropic-compatible models.

Method: POST
Endpoint: /v1/anthropic/messages

Note: This endpoint implements a subset of the standard Anthropic Messages API. Parameters such as system, stop_sequences, top_p, top_k, stream, tools, and metadata are not currently supported.

Parameters

Parameter	Type	Required	Description
`model`	string	✓	Model name
`messages`	array	✓	Chat messages list
`temperature`	number	✗	Sampling temperature
`max_tokens`	number	✗	Maximum number of tokens to generate

Request Example

POST /v1/anthropic/messages
Content-Type: application/json

{
  "model": "claude-3-opus-20240229",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000
}
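
Because only a subset of the Anthropic Messages parameters is accepted, a client that reuses existing Anthropic payloads may want to trim them first. The sketch below keeps only the four parameters listed in the table above. Whether the server ignores or rejects other fields is not stated on this page, so the client-side filtering is an assumption, and the helper name is illustrative.

```python
# Parameters listed in the table above for /v1/anthropic/messages.
SUPPORTED_FIELDS = ("model", "messages", "temperature", "max_tokens")


def to_supported_body(body: dict) -> tuple:
    """Split a Messages payload into (kept fields, names of dropped fields)."""
    kept = {key: value for key, value in body.items() if key in SUPPORTED_FIELDS}
    dropped = sorted(key for key in body if key not in SUPPORTED_FIELDS)
    return kept, dropped
```

Returning the dropped names lets the caller decide what to do about them. Silently losing a `system` prompt, for instance, can change the model's behavior.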

AI Model API

Image Generation API

Generate images from text prompts.

Method: POST
Endpoint: /v1/images/generations

Parameters

Parameter	Type	Required	Description
`prompt`	string	✓	Image generation prompt
`model`	string	✗	Model name
`n`	number	✗	Number of images to generate
`size`	string	✗	Image size

Image Edits API

Edit an existing image.

Method: POST
Endpoint: /v1/images/edits

Parameters

Parameter	Type	Required	Description
`image`	file	✓	Image file or image data
`prompt`	string	✓	Prompt describing the edit
`mask`	file	✗	Mask image file
`model`	string	✗	Model name
`n`	number	✗	Number of images to generate
`size`	string	✗	Image size

Image-to-Image API

Generate new images derived from an existing image.

Method: POST
Endpoint: /v1/images/img2img

Parameters

Parameter	Type	Required	Description
`image`	file	✓	Image file or image data
`prompt`	string	✓	Image generation prompt
`model`	string	✗	Model name
`n`	number	✗	Number of images to generate
`size`	string	✗	Image size

OCR API

Recognize text in an image. Returns full-page text, per-line results, confidence scores, and bounding box coordinates.

Method: POST
Endpoint: /v1/ocr

Supported Models

Model Name	Description
`paddleocr-ppocrv5-server`	PaddleOCR PP-OCRv5 Server text detection and recognition model.

Parameters

Parameter	Type	Required	Description
`model`	string	✓	OCR model name. Currently supported: `paddleocr-ppocrv5-server`.
`image_base64`	string	✓	Base64-encoded image data, either raw base64 or a `data:image/...;base64,` data URI.

Request Example

POST /v1/ocr
Content-Type: application/json

{
  "model": "paddleocr-ppocrv5-server",
  "image_base64": "data:image/png;base64,iVBORw0KGgo..."
}

Response Example

{
  "text": "Full recognized page text",
  "lines": [
    {
      "text": "Single recognized text line",
      "score": 0.98,
      "box": [[12, 20], [180, 20], [180, 42], [12, 42]]
    }
  ],
  "image_width": 640,
  "image_height": 360,
  "elapsed_ms": 1327
}
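
The helpers below sketch the two client-side steps around this endpoint: encoding an image as a data URI for `image_base64`, and reading the per-line results. The function names and the 0.9 score threshold are assumptions for illustration. The box is treated as four `[x, y]` corner points, as in the response example.

```python
import base64


def to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI for the image_base64 field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def confident_lines(result: dict, min_score: float = 0.9) -> list:
    """Keep the text of lines whose confidence is at least min_score."""
    return [line["text"] for line in result["lines"] if line["score"] >= min_score]


def bounding_rect(box: list) -> tuple:
    """Collapse four corner points into (left, top, right, bottom)."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)
```

An axis-aligned rectangle is convenient for cropping, but it loses the rotation information carried by the four corner points.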

Image Cache API

Retrieve cached image files.

Method: GET
Endpoint: /v1/images/cache/:filename

Audio Transcriptions API

Convert speech into text.

Method: POST
Endpoint: /v1/audio/transcriptions

Parameters

Parameter	Type	Required	Description
`model`	string	✓	Model name
`audio`	string	JSON required	Audio input as a local path, URL, or `data:audio/...;base64,` data URI
`file`	file	Multipart required	Audio file uploaded via `multipart/form-data`
`language`	string	✗	Language code

Request Example

POST /v1/audio/transcriptions
Content-Type: multipart/form-data

model=whisper-base&file=@audio.wav&language=zh

POST /v1/audio/transcriptions
Content-Type: application/json

{
  "model": "whisper-base",
  "audio": "data:audio/wav;base64,UklGRi...",
  "language": "auto"
}

Response Example

{
  "text": "This is a speech recognition result",
  "language": "zh",
  "duration": 3.42
}
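
The multipart request example above is shorthand. On the wire, `multipart/form-data` uses boundary-delimited parts. The sketch below builds such a body with the standard library only. `encode_multipart` is an illustrative helper, and labelling the file part `application/octet-stream` is an assumption, since this page does not say which part content types the server expects.

```python
import uuid
from typing import Dict, Optional, Tuple


def encode_multipart(fields: Dict[str, str], file_field: str, filename: str,
                     file_bytes: bytes, boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart/form-data upload."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # One part per plain text field, e.g. model and language.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The file part carries the raw audio bytes.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned header value goes into `Content-Type`, and the body is sent with POST to `/v1/audio/transcriptions`, for example with `fields={"model": "whisper-base", "language": "zh"}` and `file_field="file"`.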

Streaming Audio Transcriptions API

Perform real-time speech recognition over WebSocket.

Method: GET
Endpoint: /v1/audio/transcriptions/stream?model={model}

Parameters

Parameter	Type	Required	Description
`model`	query string	✓	Name of a model that supports real-time ASR

Request Example

GET /v1/audio/transcriptions/stream?model=sherpa-onnx-streaming-zipformer-zh-14m
Upgrade: websocket

// client -> server: PCM16 / 16k mono binary frames
// server -> client: {"text":"Realtime recognition result","is_final":false}

GET /v1/audio/transcriptions/stream?model=funasr
Upgrade: websocket

// FunASR uses its native WebSocket audio protocol and is intended for real-time Chinese ASR.
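
The Python standard library has no WebSocket client, so the sketch below covers only the parts around the connection: turning float samples into PCM16 binary frames and decoding a server message. Little-endian byte order and the 100 ms frame size are assumptions, since this page specifies only "PCM16 / 16k mono". The function names are illustrative.

```python
import json
import struct

SAMPLE_RATE = 16000   # 16 kHz mono, per the request example above
BYTES_PER_SAMPLE = 2  # PCM16


def floats_to_pcm16(samples: list) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit PCM (assumed little-endian)."""
    clipped = [max(-1.0, min(1.0, sample)) for sample in samples]
    ints = [int(round(sample * 32767)) for sample in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def split_frames(pcm: bytes, frame_ms: int = 100) -> list:
    """Split PCM bytes into frames of frame_ms milliseconds (the last may be shorter)."""
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]


def parse_result(message: str) -> tuple:
    """Decode a server message into (text, is_final)."""
    data = json.loads(message)
    return data["text"], bool(data["is_final"])
```

Each element returned by `split_frames` would be sent as one binary WebSocket message using a WebSocket client library of your choice. This framing applies to the PCM16 example only, not to models that use FunASR's native protocol.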

Audio Speech API

Convert text into speech.

Method: POST
Endpoint: /v1/audio/speech

Supported Models

Model Name	Description
`qwen3-tts-customvoice`	Preset-speaker mode. Use `voice` or `speaker` to select a voice.
`qwen3-tts-voicedesign`	Voice design mode. Use `voice_description` to describe the voice style.
`qwen3-tts-voiceclone`	Voice clone mode. Use `ref_audio` (and optional `ref_text`) as reference.

Parameters

Parameter	Type	Required	Description
`model`	string	✓	Model name
`input`	string	✓	Input text
`voice`	string	✗	Voice type
`speaker`	string	✗	Speaker ID; falls back to `voice` when omitted
`voice_description`	string	✗	Natural-language voice description for VoiceDesign mode
`ref_audio`	string	✗	Reference audio for VoiceClone mode, as a path, URL, or base64 payload
`ref_text`	string	✗	Reference audio transcript for VoiceClone mode
`language`	string	✗	Language
`speed`	number	✗	Speech rate
`stream`	boolean	✗	When `true`, returns a `stream_url` instead of a one-shot synthesis result
`frames`	number	✗	Optional maximum audio frame count for Qwen-TTS

Request Example

POST /v1/audio/speech
Content-Type: application/json

{
  "model": "qwen3-tts-customvoice",
  "input": "This is a text-to-speech test",
  "voice": "Cherry",
  "language": "Chinese",
  "speed": 1.0
}

{
  "model": "qwen3-tts-voicedesign",
  "input": "This is a text-to-speech test",
  "voice_description": "Warm, natural, medium-paced voice for Chinese podcast narration",
  "language": "Chinese"
}

{
  "model": "qwen3-tts-voiceclone",
  "input": "This is a text-to-speech test",
  "ref_audio": "data:audio/wav;base64,UklGRi...",
  "ref_text": "Transcript of the reference audio",
  "language": "Chinese"
}

Response Example

{
  "audio_url": "/audio/20260516_abc123.wav",
  "sample_rate": 24000,
  "duration": 2.38
}
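
Each of the three Qwen3-TTS models expects a different voice-selection field, which is easy to get wrong. The sketch below checks that pairing on the client before a request is sent. The mapping is taken from the Supported Models table above. Whether the server itself enforces it is not stated here, so the check is an assumption, and `build_speech_body` is an illustrative name.

```python
# Voice-selection fields per model, from the Supported Models table.
MODE_FIELDS = {
    "qwen3-tts-customvoice": ("voice", "speaker"),
    "qwen3-tts-voicedesign": ("voice_description",),
    "qwen3-tts-voiceclone": ("ref_audio",),
}


def build_speech_body(model: str, text: str, **options) -> dict:
    """Build a POST /v1/audio/speech body, checking the mode-specific field.

    Models not listed in MODE_FIELDS are passed through without checks.
    """
    expected = MODE_FIELDS.get(model, ())
    if expected and not any(name in options for name in expected):
        raise ValueError(f"{model} expects one of: {', '.join(expected)}")
    return {"model": model, "input": text, **options}
```

For example, `build_speech_body("qwen3-tts-customvoice", "This is a text-to-speech test", voice="Cherry", language="Chinese")` reproduces the first request example, while omitting `voice_description` for `qwen3-tts-voicedesign` raises a `ValueError`.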

Streaming Audio Speech API

Create a streaming speech job by sending POST /v1/audio/speech with `stream` set to `true`, then fetch the audio with a GET request to the returned `stream_url`.

Method: GET
Endpoint: /v1/audio/speech/stream/:token

Request Example

POST /v1/audio/speech
Content-Type: application/json

{
  "model": "edge-tts",
  "input": "This is a text-to-speech test",
  "voice": "zh-CN-YunxiNeural",
  "stream": true
}

// Response
{
  "stream_url": "/v1/audio/speech/stream/550e8400-e29b-41d4-a716-446655440000"
}

GET /v1/audio/speech/stream/550e8400-e29b-41d4-a716-446655440000

Response Example

// Binary audio stream response
// Content-Type: audio/mpeg | audio/wav | application/octet-stream
// Transfer-Encoding: chunked
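
The two-step flow above could be scripted roughly as follows, using only the standard library. The sketch assumes `stream_url` is a path relative to the server root, as in the example, and that the server address is `http://localhost:8080`. The function names and the chunk size are illustrative.

```python
import json
import urllib.request
from urllib.parse import urljoin

SERVER = "http://localhost:8080"  # host part of the base URL on this page


def resolve(stream_url: str) -> str:
    """Turn the relative stream_url from the response into an absolute URL."""
    return urljoin(SERVER, stream_url)


def stream_speech_to_file(body: dict, out_path: str, chunk_size: int = 4096) -> int:
    """Create a streaming job, save the audio to out_path, return the byte count."""
    request = urllib.request.Request(
        f"{SERVER}/v1/audio/speech",
        data=json.dumps({**body, "stream": True}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        stream_url = json.load(resp)["stream_url"]

    total = 0
    # Read the chunked response incrementally instead of buffering it all.
    with urllib.request.urlopen(resolve(stream_url)) as audio, open(out_path, "wb") as out:
        while chunk := audio.read(chunk_size):
            out.write(chunk)
            total += len(chunk)
    return total
```

Since the response content type can vary between `audio/mpeg`, `audio/wav`, and `application/octet-stream`, a real client would likely pick the output file extension from the `Content-Type` header rather than hard-coding it.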

Audio Service Info

Get audio capability information for a model.

Method: GET
Endpoint: /v1/audio/info?model={model}

Parameters

Parameter	Type	Required	Description
`model`	query string	✓	Audio model name (e.g., `qwen3-tts-customvoice`, `whisper-base`, `edge-tts`)

Request Example

GET /v1/audio/info?model=qwen3-tts-customvoice

GET /v1/audio/info?model=whisper-base

GET /v1/audio/info?model=funasr

Response Example

{
  "tts_supported_languages": [
    "Chinese",
    "English"
  ],
  "supported_speakers": [
    "Cherry",
    "Ethan"
  ]
}

{
  "asr_supported_languages": [
    "zh",
    "en",
    "ja"
  ]
}

{
  "asr_supported_languages": [
    "zh",
    "zh-CN"
  ]
}
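
The response fields differ between TTS and ASR models, so a client may want to normalize them. The sketch below infers capabilities from which keys are present. That inference is an assumption based on the three response examples above rather than a documented contract, and `describe_capabilities` is an illustrative name.

```python
def describe_capabilities(info: dict) -> dict:
    """Summarize an /v1/audio/info response in a uniform shape."""
    return {
        # Assumption: a model supports TTS/ASR when the matching key is present.
        "tts": "tts_supported_languages" in info,
        "asr": "asr_supported_languages" in info,
        "languages": info.get("tts_supported_languages")
        or info.get("asr_supported_languages")
        or [],
        "speakers": info.get("supported_speakers", []),
    }
```

A caller could use the result to validate a `language` or `voice` value before sending a speech or transcription request.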
