HTTP Server¶
The midigpt[http] extra adds a stateless REST API for generation. Every request carries the full score and generation parameters — the server holds no session state between calls. The only persistent state is the loaded model and device.
Setup¶
Local checkpoint¶
# Local .pt bundle or checkpoint directory (PATH is a placeholder)
midigpt-http --ckpt PATH --port 8000
Pretrained from HuggingFace Hub¶
# By short name (downloads once, cached in ~/.cache/huggingface/)
midigpt-http --pretrained yellow --port 8000
# By repo ID + filename
midigpt-http --pretrained Metacreation/MIDI-GPT --hf-filename yellow.pt --port 8000
Device selection¶
midigpt-http --pretrained yellow --device cuda # explicit GPU
midigpt-http --pretrained yellow --device mps # Apple Silicon
midigpt-http --pretrained yellow --device auto # auto-detect (default)
midigpt-http --pretrained yellow --device cpu # force CPU
CLI reference¶
| Flag | Default | Description |
|---|---|---|
| --ckpt PATH | (none) | Local .pt bundle or checkpoint directory (mutually exclusive with --pretrained) |
| --pretrained NAME_OR_REPO | (none) | Short name (yellow) or HuggingFace repo ID |
| --hf-filename FILE | (none) | Filename within the HF repo (required when --pretrained is a full repo ID) |
| --device DEVICE | auto | cpu, cuda, mps, or auto |
| --host HOST | 0.0.0.0 | Bind address |
| --port PORT | 8000 | TCP port |
| --log-level | INFO | DEBUG, INFO, WARNING, ERROR |
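When the server is launched from a script or test harness, the flags in the table can be assembled programmatically. The following is a minimal sketch, not part of midigpt: `build_command` is a hypothetical helper, and the rule that exactly one of `ckpt` or `pretrained` must be given is an assumption (the table only states that the two are mutually exclusive).

```python
from typing import List, Optional

# Device names taken from the CLI reference table.
VALID_DEVICES = ("cpu", "cuda", "mps", "auto")


def build_command(
    ckpt: Optional[str] = None,
    pretrained: Optional[str] = None,
    hf_filename: Optional[str] = None,
    device: str = "auto",
    host: str = "0.0.0.0",
    port: int = 8000,
    log_level: str = "INFO",
) -> List[str]:
    """Return an argv list for midigpt-http using only the documented flags."""
    # Documented: --ckpt and --pretrained are mutually exclusive.
    # Assumption: one of them is needed to select a model.
    if (ckpt is None) == (pretrained is None):
        raise ValueError("pass exactly one of ckpt or pretrained")
    # --hf-filename names a file inside a HuggingFace repo, so it only
    # makes sense alongside --pretrained.
    if hf_filename is not None and pretrained is None:
        raise ValueError("hf_filename only applies together with pretrained")
    if device not in VALID_DEVICES:
        raise ValueError(f"device must be one of {VALID_DEVICES}")

    argv = ["midigpt-http"]
    if ckpt is not None:
        argv += ["--ckpt", ckpt]
    else:
        argv += ["--pretrained", pretrained]
        if hf_filename is not None:
            argv += ["--hf-filename", hf_filename]
    argv += [
        "--device", device,
        "--host", host,
        "--port", str(port),
        "--log-level", log_level,
    ]
    return argv
```

The returned list can be handed to `subprocess.Popen(build_command(pretrained="yellow"))` to start the server in the background.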
Endpoints¶
Interactive docs are available at http://localhost:8000/docs once the server is running.
GET /health¶
Liveness probe.
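A client that starts the server itself may want to wait until the probe answers before sending work. The sketch below polls `/health` with the standard library only. It assumes the probe answers with HTTP 200 when the server is live (the response body is not documented here, so it is ignored); `wait_until_live` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request


def wait_until_live(base_url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll GET /health until it returns 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2.0) as resp:
                # Assumption: a 200 status means the server is ready.
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx status:
            # the server is not ready yet, so try again.
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_until_live("http://localhost:8000")` returns `False` if nothing answers within 30 seconds.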
GET /info¶
Returns the loaded checkpoint label, model capabilities, and available attribute controls.
{
  "checkpoint": "yellow",
  "capabilities": {
    "note_density": true,
    "min_polyphony": true,
    "max_polyphony": true,
    "min_note_duration": true,
    "max_note_duration": true,
    "tension": false,
    "supports_token_mask": true,
    "supports_attention_mask": true,
    "supports_attention_approx": true,
    "supports_attention_skip": true,
    "supports_remove": true
  },
  "attributes": {
    "note_density": 10,
    "min_polyphony": 10,
    "max_polyphony": 10,
    "min_note_duration": 10,
    "max_note_duration": 10
  }
}
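A client can use this payload to decide which attribute controls to offer. The sketch below keeps only the attributes whose capability flag is true. It assumes attribute names match capability names, as they do in the sample above, and it passes the attribute values through unchanged because their meaning is not specified in this section. `usable_controls` is a hypothetical helper.

```python
from typing import Any, Dict


def usable_controls(info: Dict[str, Any]) -> Dict[str, Any]:
    """Return the /info attributes whose matching capability flag is true."""
    capabilities = info.get("capabilities", {})
    attributes = info.get("attributes", {})
    return {
        name: value
        for name, value in attributes.items()
        # Assumption: an attribute is usable only if the capability
        # of the same name is explicitly true.
        if capabilities.get(name) is True
    }
```

Applied to the sample response, this returns all five listed attributes; `tension` is absent because it has no entry under `attributes` and its capability flag is false.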
POST /generate¶
Generate or infill music. Pass the full score and generation parameters; receive the result score back.
Request body
- score: a Score serialised with Score.to_dict() (see Inference API)
- request: a GenerationRequest dict (see below)
Response
{
  "score": { ... },
  "timing": {
    "model_forward_s": 0.42,
    "encode_s": 0.01,
    "decode_s": 0.01,
    "gen_count": 4
  }
}
Error codes
| Status | Meaning |
|---|---|
| 400 | Malformed score or request dict |
| 422 | RequestValidationError: structurally invalid generation request |
| 500 | Inference failure |
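On the client side, these statuses surface as `urllib.error.HTTPError` when using the Python standard library. The sketch below turns such an error into a readable message using the meanings from the table. The format of the error body is not documented here, so it is included verbatim; `explain_failure` is a hypothetical helper.

```python
import urllib.error

# Meanings copied from the error-code table.
STATUS_MEANING = {
    400: "malformed score or request dict",
    422: "structurally invalid generation request",
    500: "inference failure",
}


def explain_failure(exc: urllib.error.HTTPError) -> str:
    """Describe a failed /generate call using the documented status codes."""
    meaning = STATUS_MEANING.get(exc.code, "undocumented status")
    # The body format is an unknown here, so it is shown as raw text.
    body = exc.read().decode("utf-8", errors="replace")
    return f"HTTP {exc.code} ({meaning}): {body}"


# Typical use around a request:
#
#     try:
#         urllib.request.urlopen(req)
#     except urllib.error.HTTPError as exc:
#         print(explain_failure(exc))
```

Statuses 400 and 422 point at the payload, so retrying the same request unchanged is unlikely to help.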
Client example¶
The server accepts plain JSON, so the client does not need midigpt installed.
curl -s -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "score": {
      "resolution": 480, "tempo": 500000,
      "tracks": [{
        "instrument": 0, "track_type": "melodic",
        "bars": [
          {"ts_numerator": 4, "ts_denominator": 4, "notes": []},
          {"ts_numerator": 4, "ts_denominator": 4, "notes": []},
          {"ts_numerator": 4, "ts_denominator": 4, "notes": []},
          {"ts_numerator": 4, "ts_denominator": 4, "notes": []}
        ]
      }]
    },
    "request": {
      "tracks": [{"id": 0, "bars": [0, 1, 2, 3]}],
      "config": {"model_dim": 4}
    }
  }'
The response score field contains the filled-in notes in the same JSON shape.
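The same call can be made from Python with only the standard library. The sketch below rebuilds the request body from the curl example and posts it. `build_payload` and `generate` are hypothetical helper names, the 300-second timeout is an arbitrary choice, and `model_dim` is kept at 4 as in the example because its relation to the bar count is not specified here.

```python
import json
import urllib.request
from typing import Any, Dict


def build_payload(n_bars: int = 4) -> Dict[str, Any]:
    """Build the body from the curl example: one empty melodic track to fill."""
    bars = [
        {"ts_numerator": 4, "ts_denominator": 4, "notes": []}
        for _ in range(n_bars)
    ]
    score = {
        "resolution": 480,
        "tempo": 500000,
        "tracks": [{"instrument": 0, "track_type": "melodic", "bars": bars}],
    }
    request = {
        # Ask the model to generate every bar of track 0.
        "tracks": [{"id": 0, "bars": list(range(n_bars))}],
        # Value copied from the curl example.
        "config": {"model_dim": 4},
    }
    return {"score": score, "request": request}


def generate(base_url: str, payload: Dict[str, Any], timeout_s: float = 300.0) -> Dict[str, Any]:
    """POST the payload to /generate and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `generate("http://localhost:8000", build_payload())["score"]` should hold the filled-in score, and the `timing` key the timing figures shown under Response.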
Concurrency¶
Only one inference call runs at a time; the server does not run generations in parallel on the loaded model. Additional /generate requests queue behind the active one, while /health and /info remain responsive throughout.
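One common way to get this behaviour in an asyncio-based server is to guard the model with a lock and run the blocking inference call in a worker thread, which leaves the event loop free to answer lightweight requests. The sketch below illustrates that pattern only. It is not taken from the midigpt source; `SerialisedModel` and the `infer` callable are hypothetical.

```python
import asyncio
from typing import Any, Callable, Optional


class SerialisedModel:
    """Run at most one blocking inference call at a time."""

    def __init__(self, infer: Callable[[Any], Any]) -> None:
        self._infer = infer
        # Created lazily so the lock belongs to the running event loop.
        self._lock: Optional[asyncio.Lock] = None

    async def generate(self, payload: Any) -> Any:
        if self._lock is None:
            self._lock = asyncio.Lock()
        # Later callers wait here until the active call finishes.
        async with self._lock:
            loop = asyncio.get_running_loop()
            # The blocking call runs in the default thread pool, so the
            # event loop stays free for other handlers in the meantime.
            return await loop.run_in_executor(None, self._infer, payload)
```

Because waiting requests are parked on the lock rather than on a thread, a long generation delays other generations but not unrelated coroutines such as a health handler.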