Broadcast Interface
Step 2: Server config - broadcast interface
1) Deploying the broadcast channel backbone service (optional)
When scaling the OPAL Server to multiple workers and/or multiple containers, we use a broadcast channel to sync between all the instances of OPAL Server. In other words, communication on the broadcast channel is communication between OPAL servers, and is not related to the OPAL client.
Under the hood, our interface to the broadcast channel backbone service is implemented by encode/broadcaster.
At the moment, the supported broadcast channel backbones are:
- Postgres LISTEN/NOTIFY
- Redis
- Kafka
Deploying the actual service used for broadcast (e.g., Redis) is outside the scope of this tutorial. The easiest option is a managed service (e.g., AWS RDS or AWS ElastiCache), but you can also run the service yourself in Docker containers.
In production, you should run multiple workers per server instance (i.e., per container/node), and possibly multiple containers as well, so deploying the backbone service is mandatory for production environments.
2) Declaring the broadcast URI environment variable
Declaring the broadcast URI is only required if you deployed a broadcast backbone service and run more than one OPAL server instance (multiple workers or multiple nodes). If you run multiple server instances (as you should in production), declaring the broadcast URI is mandatory.
| Env Var Name | Function |
|---|---|
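| OPAL_BROADCAST_URI | The URI of the broadcast backbone service (Postgres, Redis, or Kafka) that OPAL servers use to sync with each other |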
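As a rough illustration of what a valid value looks like, the sketch below checks that a broadcast URI uses one of the three backbones listed above. The scheme names (`postgres`, `redis`, `kafka`), the helper name `broadcast_backend`, and the example host are assumptions made for this example; they are not part of OPAL's API.

```python
from urllib.parse import urlsplit

# Assumed scheme names, one per backbone listed in this tutorial.
SUPPORTED_SCHEMES = {"postgres", "redis", "kafka"}


def broadcast_backend(uri: str) -> str:
    """Return the backbone scheme of a broadcast URI, or raise ValueError."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported broadcast backbone scheme: {scheme!r}")
    return scheme


# Hypothetical value for OPAL_BROADCAST_URI pointing at a local Redis.
backend = broadcast_backend("redis://localhost:6379")
```

Running a check like this before starting the server can catch a mistyped URI early, instead of at the first failed connection.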
3) Declaring the number of uvicorn workers
As we mentioned in the previous section, each container can run multiple workers, and if you use more than one, you need a broadcast channel.
This is how you define the number of workers (note that this env var is not prefixed with OPAL_):
| Env Var Name | Function |
|---|---|
| UVICORN_NUM_WORKERS | the number of workers in a single container (example value: 4) |
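The rule that more than one worker requires a broadcast channel can be expressed as a small pre-flight check. This is only a sketch: the function name `check_broadcast_config` is hypothetical, and treating an unset `UVICORN_NUM_WORKERS` as 1 is an assumption made for the example.

```python
from typing import Mapping


def check_broadcast_config(env: Mapping[str, str]) -> None:
    """Raise if several workers are configured without a broadcast URI."""
    # Assumption for this sketch: an unset variable means a single worker.
    workers = int(env.get("UVICORN_NUM_WORKERS", "1"))
    uri = env.get("OPAL_BROADCAST_URI", "").strip()
    if workers > 1 and not uri:
        raise RuntimeError(
            f"UVICORN_NUM_WORKERS={workers} requires OPAL_BROADCAST_URI to be set"
        )


# Passes: four workers and a (hypothetical) Redis backbone.
check_broadcast_config(
    {"UVICORN_NUM_WORKERS": "4", "OPAL_BROADCAST_URI": "redis://localhost:6379"}
)
```

In a real deployment you could pass `os.environ` instead of a literal dict. Note that a check like this only sees one container; it cannot detect that several single-worker containers are running, which also requires a broadcast channel.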
4) Broadcaster reconnection (resilience)
If the broadcast backbone (Postgres/Redis/Kafka) briefly drops — for example during a managed-database failover or restart — OPAL servers reconnect to it automatically with bounded exponential backoff, instead of dropping their connected clients. This is enabled by default; the following server-side env vars (all prefixed with OPAL_) tune it:
| Env Var Name | Default | Function |
|---|---|---|
| OPAL_BROADCAST_RECONNECT_ENABLED | true | Reconnect the broadcaster reader on a backbone disconnect instead of dropping all client connections. Set to false for the legacy behavior. |
| OPAL_BROADCAST_RECONNECT_MAX_RETRIES | 0 | Maximum consecutive reconnect attempts before giving up and letting the worker restart. 0 means retry forever. |
| OPAL_BROADCAST_RECONNECT_BACKOFF_MIN_SECONDS | 0.5 | Minimum backoff (seconds) between reconnect attempts. |
| OPAL_BROADCAST_RECONNECT_BACKOFF_MAX_SECONDS | 30 | Maximum backoff (seconds) between reconnect attempts. |
| OPAL_BROADCAST_REPLAY_BUFFER_SIZE | 10000 | Max number of outbound broadcasts buffered while the backbone is down and replayed on reconnect (0 disables buffering). On overflow the oldest are dropped. |
| OPAL_BROADCAST_RESYNC_ON_RECONNECT | true | After a backbone gap, force this worker's clients to reconnect so they re-fetch full policy + data state. Set to false to rely only on best-effort replay. |
| OPAL_BROADCAST_RESYNC_SETTLE_SECONDS | 2 | Grace period after a reconnect before replaying buffered broadcasts and resyncing clients, to let peer servers re-subscribe. |
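To make the interaction between the backoff and retry settings concrete, here is a minimal sketch of a bounded exponential backoff schedule. It assumes the delay doubles after every failed attempt, which the table above does not specify, and the generator name `reconnect_delays` is hypothetical; OPAL's actual schedule may differ (for example, by adding jitter).

```python
from itertools import islice
from typing import Iterator


def reconnect_delays(
    min_seconds: float = 0.5,
    max_seconds: float = 30.0,
    max_retries: int = 0,
) -> Iterator[float]:
    """Yield the wait before each reconnect attempt.

    Assumes doubling per attempt, capped at max_seconds.
    max_retries == 0 means the schedule never ends.
    """
    delay = min_seconds
    attempt = 0
    while max_retries == 0 or attempt < max_retries:
        yield min(delay, max_seconds)
        delay = min(delay * 2, max_seconds)
        attempt += 1


# First eight waits with the default values from the table above.
first_waits = list(islice(reconnect_delays(), 8))
# → [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Under this assumption the wait reaches the 30-second ceiling on the seventh attempt and stays there, so a long outage costs at most one reconnect attempt every 30 seconds per worker.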
Consistency across the outage is handled in two layers. While the backbone is unreachable, client websocket connections are kept alive but cross-server fan-out is paused. On reconnect:
- Replay buffer — broadcasts that failed to reach the backbone during the outage are replayed, so peer servers that have re-subscribed catch up without a client refetch. This is best-effort: the backbone keeps no replay of its own, so a peer that is slow to re-subscribe may miss a replayed message.
- Resync (the guarantee) — each server forces its own clients to reconnect and re-fetch the full policy/data state. Because every server experienced the same gap, every server reconciles its own clients and the fleet converges to current truth. Updates missed during the gap are therefore reconciled even if the replay did not reach a peer in time.
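The drop-oldest behavior of the replay buffer can be sketched with a bounded deque. This is an illustration of the buffering policy described above, not OPAL's implementation; the class `ReplayBuffer` and its method names are hypothetical.

```python
from collections import deque
from typing import Any, List


class ReplayBuffer:
    """Bounded FIFO of outbound broadcasts; the oldest are dropped on overflow."""

    def __init__(self, size: int = 10000) -> None:
        # A deque with maxlen=0 stores nothing, which models "0 disables buffering".
        self._items: deque = deque(maxlen=size)

    def add(self, message: Any) -> None:
        """Buffer a broadcast that could not reach the backbone."""
        self._items.append(message)

    def drain(self) -> List[Any]:
        """Return buffered messages oldest-first and empty the buffer."""
        messages = list(self._items)
        self._items.clear()
        return messages


buffer = ReplayBuffer(size=3)
for update in ["u1", "u2", "u3", "u4", "u5"]:
    buffer.add(update)  # the backbone is down, so every publish is buffered

replayed = buffer.drain()  # on reconnect: ["u3", "u4", "u5"]
```

Because `u1` and `u2` were dropped on overflow, replay alone cannot restore them. This is the kind of gap the resync step covers, since clients re-fetch the full state rather than relying on individual messages.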