
Broadcast Interface

Step 2: Server config - broadcast interface

1) Deploying the broadcast channel backbone service (optional)

When scaling the OPAL Server to multiple workers and/or multiple containers, we use a broadcast channel to sync between all the instances of OPAL Server. In other words, communication on the broadcast channel is communication between OPAL servers, and is not related to the OPAL client.

Under the hood, our interface to the broadcast channel backbone service is implemented by encode/broadcaster.

At the moment, the supported broadcast channel backbones are:

  • Postgres LISTEN/NOTIFY
  • Redis
  • Kafka

Deploying the actual service used for broadcast (e.g., Redis) is outside the scope of this tutorial. The easiest way is to use a managed service (e.g., AWS RDS, AWS ElastiCache, etc.), but you can also run the service yourself in Docker containers.

In production, you should run multiple workers per server instance (i.e., per container/node), and possibly multiple containers as well, so deploying the backbone service is effectively mandatory for production environments.

2) Declaring the broadcast URI environment variable

Declaring the broadcast URI is optional only when you run a single OPAL server instance. If you run more than one instance (multiple workers or multiple nodes), which you should in production, you must deploy a broadcast backbone service and declare the broadcast URI.

| Env Var Name | Function |
| --- | --- |
| OPAL_BROADCAST_URI | Broadcast channel backend. The format of the broadcaster URI string is specified here. Example value: OPAL_BROADCAST_URI=postgres://localhost/mydb |
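
As a rough illustration of what a well-formed broadcast URI looks like, the sketch below checks that the URI scheme matches one of the three backbones listed earlier. The function name `check_broadcast_uri` and the exact set of accepted schemes are assumptions made for this example; OPAL and encode/broadcaster may accept additional scheme spellings.

```python
from urllib.parse import urlparse

# Schemes for the backbones named in this tutorial. This set is an
# illustrative assumption, not a list read from OPAL itself.
SUPPORTED_SCHEMES = {"postgres", "redis", "kafka"}


def check_broadcast_uri(uri: str) -> str:
    """Return the backbone scheme of a broadcast URI, or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported broadcast backbone: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("broadcast URI has no host")
    return parsed.scheme


if __name__ == "__main__":
    # The example value from the table above.
    print(check_broadcast_uri("postgres://localhost/mydb"))
```

A check like this could run in a deployment script before the server starts, so that a typo in the URI fails early rather than at connection time.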

3) Declaring the number of uvicorn workers

As we mentioned in the previous section, each container can run multiple workers, and if you use more than one, you need a broadcast channel.

This is how you define the number of workers (pay attention: this env var is not prefixed with OPAL_):

| Env Var Name | Function |
| --- | --- |
| UVICORN_NUM_WORKERS | The number of workers in a single container (example value: 4) |
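
The relationship between the two variables can be expressed as a small pre-flight check. This is a minimal sketch, not part of OPAL: the helper name `missing_broadcast_uri` is hypothetical, and treating an unset UVICORN_NUM_WORKERS as a single worker is an assumption. It also only sees one container, so it cannot detect a multi-container deployment.

```python
import os
from typing import Mapping


def missing_broadcast_uri(env: Mapping[str, str]) -> bool:
    """True when more than one worker is requested but no broadcast URI is set."""
    # Assumption: an unset UVICORN_NUM_WORKERS is treated as one worker.
    workers = int(env.get("UVICORN_NUM_WORKERS", "1"))
    has_uri = bool(env.get("OPAL_BROADCAST_URI"))
    return workers > 1 and not has_uri


if __name__ == "__main__":
    if missing_broadcast_uri(os.environ):
        print("UVICORN_NUM_WORKERS > 1 but OPAL_BROADCAST_URI is not set")
    else:
        print("worker / broadcast configuration looks consistent")
```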

4) Broadcaster reconnection (resilience)

If the broadcast backbone (Postgres/Redis/Kafka) briefly drops — for example during a managed-database failover or restart — OPAL servers reconnect to it automatically with bounded exponential backoff, instead of dropping their connected clients. This is enabled by default; the following server-side env vars (all prefixed with OPAL_) tune it:

| Env Var Name | Default | Function |
| --- | --- | --- |
| OPAL_BROADCAST_RECONNECT_ENABLED | true | Reconnect the broadcaster reader on a backbone disconnect instead of dropping all client connections. Set to false for the legacy behavior. |
| OPAL_BROADCAST_RECONNECT_MAX_RETRIES | 0 | Maximum consecutive reconnect attempts before giving up and letting the worker restart. 0 means retry forever. |
| OPAL_BROADCAST_RECONNECT_BACKOFF_MIN_SECONDS | 0.5 | Minimum backoff (seconds) between reconnect attempts. |
| OPAL_BROADCAST_RECONNECT_BACKOFF_MAX_SECONDS | 30 | Maximum backoff (seconds) between reconnect attempts. |
| OPAL_BROADCAST_REPLAY_BUFFER_SIZE | 10000 | Max number of outbound broadcasts buffered while the backbone is down and replayed on reconnect (0 disables buffering). On overflow the oldest are dropped. |
| OPAL_BROADCAST_RESYNC_ON_RECONNECT | true | After a backbone gap, force this worker's clients to reconnect so they re-fetch full policy + data state. Set to false to rely only on best-effort replay. |
| OPAL_BROADCAST_RESYNC_SETTLE_SECONDS | 2 | Grace period after a reconnect before replaying buffered broadcasts and resyncing clients, to let peer servers re-subscribe. |
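
To get a feel for how the backoff bounds interact, the sketch below computes a delay schedule from the default minimum and maximum. The exact formula OPAL uses is not stated here, so this assumes plain doubling from the minimum, capped at the maximum, with no jitter. The names `backoff_schedule` and `should_retry` are hypothetical, and the exact point at which retries stop is also an assumption.

```python
def backoff_schedule(attempts: int,
                     min_seconds: float = 0.5,
                     max_seconds: float = 30.0) -> list[float]:
    """Delays before each reconnect attempt: doubling, capped at max_seconds."""
    # Assumption: delay doubles on every consecutive failure, without jitter.
    return [min(max_seconds, min_seconds * 2 ** n) for n in range(attempts)]


def should_retry(failed_attempts: int, max_retries: int = 0) -> bool:
    """Mirror of the documented rule that 0 means retry forever."""
    return max_retries == 0 or failed_attempts < max_retries


if __name__ == "__main__":
    # With the defaults, the delay reaches the 30-second cap on the 7th attempt.
    print(backoff_schedule(8))
    # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Under these assumptions, a short failover costs only a few quick retries, while a long outage settles into one attempt every 30 seconds.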

Consistency across the outage is handled in two layers. While the backbone is unreachable, client websocket connections are kept alive but cross-server fan-out is paused. On reconnect:

  • Replay buffer — broadcasts that failed to reach the backbone during the outage are replayed, so peer servers that have re-subscribed catch up without a client refetch. This is best-effort: the backbone keeps no replay of its own, so a peer that is slow to re-subscribe may miss a replayed message.
  • Resync (the guarantee) — each server forces its own clients to reconnect and re-fetch the full policy/data state. Because every server experienced the same gap, every server reconciles its own clients and the fleet converges to current truth. Updates missed during the gap are therefore reconciled even if the replay did not reach a peer in time.
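
The replay layer described above behaves like a bounded queue that drops its oldest entries on overflow. The following is a minimal sketch of that idea using a standard-library deque; the class name `ReplayBuffer` and its methods are hypothetical and do not reflect OPAL's internal implementation.

```python
from collections import deque
from typing import Any


class ReplayBuffer:
    """Bounded buffer of outbound broadcasts; oldest entries drop on overflow."""

    def __init__(self, size: int) -> None:
        # A size of 0 disables buffering, as with OPAL_BROADCAST_REPLAY_BUFFER_SIZE.
        self._items = deque(maxlen=size) if size > 0 else None

    def add(self, message: Any) -> None:
        """Record a broadcast that could not reach the backbone."""
        if self._items is not None:
            # deque(maxlen=...) silently discards the oldest item when full.
            self._items.append(message)

    def drain(self) -> list:
        """Return buffered broadcasts in order and empty the buffer."""
        if self._items is None:
            return []
        items = list(self._items)
        self._items.clear()
        return items


if __name__ == "__main__":
    buffer = ReplayBuffer(size=3)
    for update in ["u1", "u2", "u3", "u4", "u5"]:
        buffer.add(update)  # simulated publishes during an outage
    print(buffer.drain())   # ['u3', 'u4', 'u5']
```

Because `u1` and `u2` are lost once the buffer overflows, replay alone cannot guarantee consistency, which is why the resync step exists as the second layer.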