Developer protocol

Receive room audio with plain HTTP.

Hummic records meeting audio locally and delivers it to a user-owned endpoint. File mode uploads complete recordings over HTTP multipart and is what the current firmware ships. Agent mode, currently in preview, streams live audio over WebSocket. The planned resumable upload path is designed around tus.

Current firmware

File mode: one recording, one upload.

Implement one endpoint that accepts multipart/form-data. If the response status is 2xx, the device marks the recording as uploaded. Authentication can be a bearer token configured in the setup portal.

Method: POST
Body: multipart/form-data
Audio: Ogg Opus, 16 kHz, 2 channels
Auth: Authorization: Bearer <token>
Retry: 2xx succeeds; 400/401/403/422 fail permanently; 408/429/5xx retry
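
As an illustration of the Auth row, here is a minimal sketch of a bearer-token check a receiver might run before accepting an upload. The function name `bearer_ok` is hypothetical, and the constant-time comparison is a general hardening choice rather than something the firmware contract requires.

```python
import hmac


def bearer_ok(authorization_header, expected_token):
    """Return True only for 'Bearer <token>' carrying the expected token."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A receiver would typically answer 401 when this returns False, which the device treats as a permanent failure.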

Status

What ships today, what is planned.

File upload: Shipping. Current firmware contract (P0). The Quickstart below runs against it.
Resumable (tus): Planned. P1 direction for long recordings and unstable WiFi.
Agent mode (WS): Preview. Designed, not in the current firmware.

File mode contract

Minimum compatible receiver.

POST /upload
Authorization: Bearer hummic-dev-token
X-Hummic-Device-Id: meeting-room-01
X-Hummic-Recording-Id: R00000001
Content-Type: multipart/form-data

file=@R00000001.ogg
device_id=meeting-room-01
recording_id=R00000001
started_at=boot+12s
ended_at=boot+42s
format=opus
sample_rate=16000
channels=2
segment_index=0
segment_count=1

| Field | Type · sample | Description |
|---|---|---|
| file | binary · R00000001.ogg | Required. Ogg Opus file payload (16 kHz, 2 channels). |
| device_id | string · meeting-room-01 | User-configured device name. Also sent as X-Hummic-Device-Id. |
| recording_id | string · R00000001 | Unique local recording id. Also sent as X-Hummic-Recording-Id. |
| started_at | string · boot+12s | Recording start marker from device metadata. |
| ended_at | string · boot+42s | Recording end marker from device metadata. |
| format | string · opus | Audio format. Currently opus (Opus in an Ogg container) for new firmware. |
| sample_rate | int · 16000 | Samples per second. Currently 16000. |
| channels | int · 2 | Channel count. Currently 2. |
| segment_index | int · 0 | Zero-based index within a segmented recording. Currently 0. |
| segment_count | int · 1 | Total segments for the recording. Currently 1. |
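
To make the field list concrete, the following is a rough sketch of how a stdlib-only receiver could split the request body into text fields and the file payload. It is not the reference server's code: `parse_upload` is a hypothetical name, the whole body is held in memory, and a production receiver would more likely rely on its web framework's multipart parser.

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_upload(content_type, body):
    """Split a multipart/form-data body into (text fields, file bytes)."""
    # Prepend the request's Content-Type so the email parser sees a full
    # MIME message and can find the boundary.
    head = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=HTTP).parsebytes(head + body)

    fields, file_bytes = {}, None
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        payload = part.get_payload(decode=True)
        if name == "file":
            file_bytes = payload          # raw Ogg Opus bytes
        elif name:
            fields[name] = payload.decode("utf-8")
    return fields, file_bytes
```

A receiver could then answer 422 with a JSON body naming the missing field when `file_bytes` is None or `recording_id` is absent.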

Agent mode contract

Stream live audio over WebSocket.

Preview  Agent mode is designed but is not in the current firmware — File mode above is what ships today, and the frames below are the target contract. Agent mode opens a single WebSocket and exchanges JSON control frames plus binary audio frames within one session. The device sends audio and button events; the agent runtime returns short text replies shown on the device screen. Audio frames are raw binary; all other messages are JSON with a type discriminator.

GET /agent/live
Upgrade: websocket
Authorization: Bearer hummic-dev-token
X-Hummic-Device-Id: meeting-room-01

# 1. device opens session
client -> {"type":"session_start","device_id":"meeting-room-01",
           "format":"pcm_s16le","sample_rate":16000,"channels":1}
agent  -> {"type":"session_state","state":"ready","session_id":"s-8f1c"}

# 2. device streams audio (binary frames) + control
client => <binary audio_chunk: 20ms PCM>
client -> {"type":"button_event","action":"press"}

# 3. agent replies (shown on screen)
agent  -> {"type":"text_reply","text":"Logged the action item.","final":true}
agent  -> {"type":"display_hint","line1":"Listening","line2":"Room 01"}

# 4. close
client -> {"type":"session_end"}
agent  -> {"type":"session_state","state":"closed"}

| Message | Direction | Description |
|---|---|---|
| session_start | client → agent | Opens a session. Declares format, sample_rate, channels. |
| audio_chunk | client → agent | Binary frame, ~20 ms of PCM. Sent continuously while recording. |
| button_event | client → agent | Physical button action: press / long_press / release. |
| status | client → agent | Periodic device heartbeat: battery, WiFi RSSI, buffer state. |
| text_reply | agent → client | Short reply rendered on screen. final:false for streamed partials. |
| session_state | agent → client | Lifecycle: ready / busy / closed (+ session_id). |
| display_hint | agent → client | Up to two short lines to show on the device screen. |
| session_end | client → agent | Closes the session cleanly; agent confirms with session_state: closed. |
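
The split between binary audio frames and JSON control frames can be sketched as below. This assumes the WebSocket library hands binary frames over as `bytes` and text frames as `str`; the helper names `classify_frame` and `pcm_chunk_bytes` are illustrative and not part of the contract.

```python
import json


def pcm_chunk_bytes(sample_rate=16000, channels=1, chunk_ms=20, bytes_per_sample=2):
    """Size in bytes of one audio_chunk of pcm_s16le audio."""
    samples = sample_rate * chunk_ms // 1000
    return samples * channels * bytes_per_sample


def classify_frame(frame):
    """Return ('audio', bytes) for binary frames, else (type, message dict)."""
    if isinstance(frame, (bytes, bytearray)):
        return "audio", bytes(frame)
    message = json.loads(frame)
    # Every JSON frame carries a "type" discriminator.
    return message["type"], message
```

With the session_start values shown above (16 kHz, 1 channel, 16-bit samples), a 20 ms chunk works out to 640 bytes.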

Quickstart

Receive your first recording in five minutes.

# 1. get the zero-dependency reference receiver (stdlib only, Python 3.8+)
curl -O https://hummic.ai/examples/upload-server/server.py

# 2. run it (pick any token — this becomes the device's bearer token)
python3 server.py --host 0.0.0.0 --port 8789 --token hummic-dev-token

# 3. confirm the endpoint is alive
curl -fsS http://127.0.0.1:8789/health
# -> {"ok":true}

Now prove the upload contract end to end with a throwaway file. The device sends this same request, plus the remaining metadata fields from the File mode contract:

# make a test file (any bytes work — the receiver does not validate audio)
head -c 65536 /dev/urandom > R00000001.ogg
# or a real Opus file: ffmpeg -f lavfi -i sine=frequency=440:duration=5 -c:a libopus R00000001.ogg

curl --fail --max-time 30 --retry 3 --retry-delay 2 \
  -H "Authorization: Bearer hummic-dev-token" \
  -H "X-Hummic-Device-Id: meeting-room-01" \
  -H "X-Hummic-Recording-Id: R00000001" \
  -F "file=@R00000001.ogg" \
  -F "device_id=meeting-room-01" \
  -F "recording_id=R00000001" \
  -F "format=opus" -F "sample_rate=16000" -F "channels=2" \
  http://127.0.0.1:8789/upload
# -> {"ok":true,"upload_id":"R00000001"}

On the device, set Upload URL to http://<computer-ip>:8789/upload, Authorization Token to hummic-dev-token, and Device Name to something stable like meeting-room-01. The --max-time and --retry flags mirror the device's own timeout and backoff.
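
For test harnesses that prefer Python over curl, a stdlib-only equivalent of the request above might look like the following. It is a sketch under assumptions: `build_multipart` and `upload` are hypothetical helper names, and the timeout simply mirrors the `--max-time 30` flag.

```python
import urllib.request
import uuid


def build_multipart(fields, file_name, file_bytes):
    """Return (content_type, body) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", str(value)]
    lines += [f"--{boundary}",
              f'Content-Disposition: form-data; name="file"; filename="{file_name}"',
              "Content-Type: application/octet-stream", "", ""]
    head = "\r\n".join(lines).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + file_bytes + tail


def upload(url, token, fields, file_name, file_bytes, timeout=30):
    """POST one recording; returns the HTTP status on success."""
    content_type, body = build_multipart(fields, file_name, file_bytes)
    request = urllib.request.Request(url, data=body, method="POST", headers={
        "Authorization": f"Bearer {token}",
        "X-Hummic-Device-Id": fields["device_id"],
        "X-Hummic-Recording-Id": fields["recording_id"],
        "Content-Type": content_type,
    })
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `upload("http://127.0.0.1:8789/upload", "hummic-dev-token", {...}, "R00000001.ogg", data)` against the running reference receiver should behave like the curl command, minus the retries.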

Prefer to generate from the contract? openapi.yaml imports into Postman, or scaffold a client: npx @openapitools/openapi-generator-cli generate -i https://hummic.ai/openapi.yaml -g python

Errors & retry

Status codes, response bodies, backoff.

The device classifies the HTTP status into three outcomes: success, permanent failure, and retryable. Return a JSON body so failures are diagnosable, but the device only requires the status code.


| Status | Outcome | Device behavior |
|---|---|---|
| 200 / 201 / 204 | Success | Recording marked uploaded; removed from queue. |
| 400 / 422 | Permanent failure | Marked failed; not retried. Fix payload/endpoint. |
| 401 / 403 | Permanent failure | Marked failed; not retried. Check bearer token. |
| 408 / 429 | Retryable | Backoff and retry; honors Retry-After if present. |
| 500 / 502 / 503 / 504 | Retryable | Backoff and retry until queue limit reached. |
| timeout / no route | Retryable | Cache locally; retry when WiFi recovers. |

# success response body (optional; the device reads only the status code)
HTTP/1.1 200 OK
Content-Type: application/json

{"ok": true, "recording_id": "R00000001", "stored_as": "s3://bucket/R00000001.ogg"}

# failure response body (recommended for diagnosis)
HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json

{"ok": false, "error": "missing_field", "field": "recording_id"}
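
The three outcomes can be mirrored in a receiver-side test with a small classifier like the one below. This is a sketch, not firmware code: `classify_status` is a hypothetical name, and treating status codes the table does not list (3xx or 404, for example) as permanent failures is an assumption.

```python
def classify_status(status):
    """Map an HTTP status to 'success', 'retry', or 'permanent_failure'."""
    if 200 <= status < 300:
        return "success"
    if status in (408, 429) or 500 <= status < 600:
        return "retry"
    # Assumption: anything else, including 400/401/403/422, is not retried.
    return "permanent_failure"
```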

Backoff: retryable failures use exponential backoff starting at 2s, doubling to a 60s cap, with jitter. A Retry-After header (seconds or HTTP-date) overrides the computed delay. Recordings stay cached on device until a 2xx or a permanent failure clears them.
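
The delay schedule described above could be computed roughly as follows. The jitter strategy is not specified here, so the sketch assumes a delay drawn between half of the capped value and the capped value itself; the function names are hypothetical.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def backoff_delay(attempt, base=2.0, cap=60.0, rng=random.random):
    """Delay before retry `attempt` (0-based): 2s, 4s, 8s ... capped at 60s."""
    ceiling = min(cap, base * (2 ** attempt))
    # Assumed jitter: uniform in [ceiling / 2, ceiling].
    return ceiling / 2 + rng() * ceiling / 2


def retry_after_seconds(value, now=None):
    """Parse Retry-After as delta-seconds or an HTTP-date; None if invalid."""
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_delay(attempt, retry_after=None):
    """Retry-After, when present and valid, overrides the computed delay."""
    if retry_after is not None:
        parsed = retry_after_seconds(retry_after)
        if parsed is not None:
            return parsed
    return backoff_delay(attempt)
```

A server that wants to slow the device down during an outage can therefore send 503 with `Retry-After: 120` instead of relying on the 60s cap.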

Standards direction

Use the boring standard first. Upgrade when uploads need it.

P0 file upload

multipart/form-data is the current firmware contract and is specified by RFC 7578. It is the simplest way to accept a file plus metadata in every common web framework.

P1 resumable upload

tus is the preferred open protocol for resumable uploads over HTTP. It is the natural next step for long recordings, unstable WiFi, and large files.
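
For receivers that want to prepare for this direction, the core of a tus 1.0.0 exchange is small: create the upload with its total length, ask the server for its current offset with HEAD, then PATCH the remaining bytes from that offset. The sketch below only builds the request headers; how Hummic firmware will profile tus is not specified, and the helper names and 256 KiB chunk size are assumptions.

```python
TUS_VERSION = "1.0.0"


def creation_headers(total_length):
    """Headers for the POST that creates an upload (tus creation extension)."""
    return {"Tus-Resumable": TUS_VERSION, "Upload-Length": str(total_length)}


def next_patch(data, server_offset, chunk_size=256 * 1024):
    """Headers and body for the next PATCH, given the offset reported by HEAD."""
    chunk = data[server_offset:server_offset + chunk_size]
    headers = {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Offset": str(server_offset),
        "Content-Type": "application/offset+octet-stream",
        "Content-Length": str(len(chunk)),
    }
    return headers, chunk
```

After a dropped connection the client repeats the HEAD request and resumes from whatever offset the server reports, so no bytes already stored are sent again.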

Object storage

S3-compatible multipart upload or presigned URLs are useful when recordings should go directly into object storage instead of through an app server.

Live agents

WebSocket is the transport for the Agent mode design; WebRTC is reserved for lower-latency live audio. Neither is needed for the first reliable file-upload path.