monarch.config#

The monarch.config module provides utilities for managing Monarch’s runtime configuration.

Configuration values can be set programmatically via configure() or configured(), or through environment variables (HYPERACTOR_*, MONARCH_*). Programmatic configuration takes precedence over environment variables and defaults.

Configuration API#

monarch.config exposes a small, process-wide API. All helpers talk to the same layered configuration store, so changes are immediately visible to every thread in the process.

configure: Apply overrides to the Runtime layer. Values are validated eagerly; a ValueError is raised for unknown keys and TypeError for wrong types. configure is additive, so you typically pair it with clear_runtime_config() in long-running processes.
configured: Context manager sugar that snapshots the current Runtime layer, applies overrides, yields the merged config, then restores the snapshot. Because the Runtime layer is global, the overrides apply to every thread until the context exits. This makes configured ideal for tests or short-lived blocks where you can guarantee single-threaded execution.
get_global_config: Return the fully merged configuration (defaults + environment + file + runtime). Useful for introspection or for passing a frozen view to other components.
get_runtime_config: Return only the currently active Runtime layer. This is what configure manipulates and what configured snapshots.
clear_runtime_config: Reset the Runtime layer to an empty mapping. Environment and file values remain untouched.
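The layering described above can be modeled in a few lines of plain Python. This is only a sketch of the lookup order, not Monarch's implementation: the dictionaries, the `lookup` helper, and the sample values are hypothetical, and the file layer is left out because its precedence relative to the environment is not stated here.

```python
from collections import ChainMap

# Hypothetical stand-ins for three of the layers; key names come from this
# page, the values are illustrative only.
defaults = {"tail_log_lines": 0, "message_delivery_timeout": "30s"}
environment = {"message_delivery_timeout": "5m"}  # e.g. from HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT
runtime = {}  # the layer that configure() would populate

# ChainMap searches its maps left to right, so earlier maps take precedence.
merged = ChainMap(runtime, environment, defaults)


def lookup(key):
    """Resolve a key through runtime, then environment, then defaults."""
    return merged[key]


before = lookup("message_delivery_timeout")  # environment beats the default
runtime["message_delivery_timeout"] = "1m"   # models configure(...)
during = lookup("message_delivery_timeout")  # runtime beats the environment
runtime.clear()                              # models clear_runtime_config()
after = lookup("message_delivery_timeout")   # environment value is visible again
```

Clearing the runtime mapping leaves the other layers untouched, which is the behavior attributed to clear_runtime_config() above.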

monarch.config.configure(**kwargs)#

Configure Hyperactor runtime defaults for this process.

This updates the Runtime configuration layer from Python, setting transports, logging behavior, timeouts, and other runtime parameters.

All duration parameters accept humantime strings like "30s", "5m", "2h", or "1h 30m".

Parameters:

Transport configuration:
default_transport: Default channel transport for actor communication. Can be a ChannelTransport enum or explicit address string.

Basic logging behavior:
enable_log_forwarding: Forward child stdout/stderr through the mesh.
enable_file_capture: Persist child stdout/stderr to per-host files.
tail_log_lines: Number of log lines to retain in memory.

Message encoding and delivery:
codec_max_frame_length: Maximum serialized message size in bytes.
message_delivery_timeout: Max delivery time (humantime).

Host mesh timeouts:
host_spawn_ready_timeout: Max host bootstrapping time (humantime).
mesh_proc_spawn_max_idle: Max idle time while spawning procs (humantime).

Hyperactor timeouts and message handling:
process_exit_timeout: Timeout for process exit (humantime).
message_ack_time_interval: Time interval for message acknowledgments (humantime).
message_ack_every_n_messages: Acknowledge every N messages.
message_ttl_default: Default message time-to-live (number of hops).
split_max_buffer_size: Maximum buffer size for message splitting (number of fragments).
split_max_buffer_age: Maximum age for split message buffers (humantime).
stop_actor_timeout: Timeout for stopping actors (humantime).
cleanup_timeout: Timeout for cleanup operations (humantime).
remote_allocator_heartbeat_interval: Heartbeat interval for remote allocator (humantime).
default_encoding: Default message encoding (Encoding.Bincode, Encoding.Json, or Encoding.Multipart).
channel_net_rx_buffer_full_check_interval: Network receive buffer check interval (humantime).
message_latency_sampling_rate: Sampling rate for message latency tracking (0.0 to 1.0).
enable_dest_actor_reordering_buffer: Enable reordering buffer in dest actor.

Mesh bootstrap configuration:
mesh_bootstrap_enable_pdeathsig: Enable parent-death signal for spawned processes.
mesh_terminate_concurrency: Maximum concurrent terminations during shutdown.
mesh_terminate_timeout: Timeout per child during graceful termination (humantime).

Runtime and buffering:
shared_asyncio_runtime: Share asyncio runtime across actors.
small_write_threshold: Threshold below which writes are copied (bytes).

Mesh configuration:
max_cast_dimension_size: Maximum dimension size for cast operations.

Remote allocation:
remote_alloc_bind_to_inaddr_any: Bind remote allocators to INADDR_ANY.
remote_alloc_bootstrap_addr: Bootstrap address for remote allocators.
remote_alloc_allowed_port_range: Allowed port range as slice(start, stop).

Logging configuration:
read_log_buffer: Buffer size for reading logs (bytes).
force_file_log: Force file-based logging regardless of environment.
prefix_with_rank: Prefix log lines with rank information.

Actor mesh timeouts:
actor_spawn_max_idle: Maximum idle time while spawning actors (humantime).
get_actor_state_max_idle: Maximum idle time for actor state queries (humantime).
supervision_watchdog_timeout: Watchdog timeout for the actor-mesh supervision stream; prolonged silence is interpreted as the controller being unreachable (humantime).

Proc mesh timeouts:
proc_stop_max_idle: Maximum idle time while stopping procs (humantime).
get_proc_state_max_idle: Maximum idle time for proc state queries (humantime).

Mesh admin:
mesh_admin_addr: Default socket address for the mesh admin HTTP server (e.g. "[::]:1729", "0.0.0.0:8080").

Mesh attach:
mesh_attach_config_timeout: Timeout for the config-push barrier during attach_to_workers() (humantime, default "10s"). Best-effort: if exceeded, a warning is logged and attach continues.

**kwargs (ConfigureKwargsType): Reserved for future configuration keys exposed by Rust bindings.

monarch.config.configured(**overrides)#

Temporarily apply Python-side config overrides for this process.

This context manager:

snapshots the current Runtime configuration layer (get_runtime_config()),
applies the given overrides via configure(**overrides), and
yields the merged view of config (get_global_config()), including defaults, env, file, and Runtime.

On exit it restores the previous Runtime layer by:

clearing all Runtime entries, and
re-applying the saved snapshot.

configured alters the global configuration; thus other threads will be subject to the overridden configuration while the context manager is active.

It is therefore intended for tests, which typically run on a single thread. Because the previous Runtime layer is restored on exit, per-test overrides do not leak into other tests.
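The snapshot, apply, and restore sequence can be sketched with a plain dictionary standing in for the Runtime layer. The names `_runtime` and `configured_model` are hypothetical. Unlike the real configured(), this model yields only a copy of the runtime mapping rather than the merged global view.

```python
from contextlib import contextmanager

_runtime = {}  # hypothetical stand-in for the process-wide Runtime layer


@contextmanager
def configured_model(**overrides):
    """Model of the steps above: snapshot, apply overrides, yield, restore."""
    snapshot = dict(_runtime)        # snapshot the current Runtime layer
    _runtime.update(overrides)       # apply the overrides additively
    try:
        yield dict(_runtime)         # hand back a copy of the active layer
    finally:
        _runtime.clear()             # clear all Runtime entries...
        _runtime.update(snapshot)    # ...and re-apply the saved snapshot


_runtime["tail_log_lines"] = 10
with configured_model(tail_log_lines=100):
    with configured_model(enable_log_forwarding=True) as inner_view:
        pass
    after_inner = dict(_runtime)     # the outer override is still in effect
after_outer = dict(_runtime)         # back to the pre-existing setting
```

Restoring in a finally block means the previous layer comes back even if the body raises, which matters for tests that fail midway.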

Parameters:

**overrides (ConfigureKwargsType) – Configuration key-value pairs to override for the duration of the context.

Yields:

Dict[str, Any] – The merged global configuration, including all layers (defaults, environment, file, and runtime).

Example

>>> from monarch.config import configured, get_global_config
>>> with configured(enable_log_forwarding=True, tail_log_lines=100):
...     # Configuration is temporarily overridden
...     assert get_global_config()["enable_log_forwarding"] is True
>>> # Configuration is automatically restored after the context

monarch.config.get_global_config()#

Return a merged view of all configuration layers.

The resulting dict includes defaults, environment overrides, file-based settings, and the current Runtime layer. Mutating the returned dict does not change the active configuration; use configure() instead.

monarch.config.get_runtime_config()#

Return a snapshot of just the Runtime layer configuration.

Useful for snapshot/restore flows (see configured()) or for inspecting which keys were last set via Python.

monarch.config.clear_runtime_config()#

Remove every key from the Runtime configuration layer.

Environment variables, config files, and defaults are untouched. This is typically paired with configure() to reset overrides in long-lived processes.

Configuration Keys#

The following configuration keys are available for use with configure() and configured():

Performance and Transport#

codec_max_frame_length

Maximum frame length for message codec (in bytes).

Type: int
Default: 10 * 1024 * 1024 * 1024 (10 GiB)
Environment: HYPERACTOR_CODEC_MAX_FRAME_LENGTH

Controls the maximum size of serialized messages. Exceeding this limit will cause supervision errors.

from monarch.config import configured

# Allow larger messages for bulk data transfer
one_hundred_gib = 100 * 1024 * 1024 * 1024
with configured(codec_max_frame_length=one_hundred_gib):
    # Send large chunks (`actor` and `large_data` are defined elsewhere)
    result = actor.process_chunks.call_one(large_data).get()

default_transport

Default channel transport mechanism for inter-actor communication.

Type: ChannelTransport enum
Default: ChannelTransport.Unix
Environment: HYPERACTOR_DEFAULT_TRANSPORT

Available transports:

ChannelTransport.Unix - Unix domain sockets (local only)
ChannelTransport.TcpWithLocalhost - TCP over localhost
ChannelTransport.TcpWithHostname - TCP with hostname resolution
ChannelTransport.MetaTlsWithHostname - Meta TLS (Meta internal only)

from monarch._rust_bindings.monarch_hyperactor.channel import (
    ChannelTransport,
)
from monarch.actor import this_host
from monarch.config import configured

with configured(default_transport=ChannelTransport.TcpWithLocalhost):
    # Actors will communicate via TCP
    mesh = this_host().spawn_procs(per_host={"workers": 4})

Timeouts#

message_delivery_timeout

Maximum time to wait for message delivery before timing out.

Type: str (duration format, e.g., "30s", "5m")
Default: "30s"
Environment: HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT

Uses humantime format. Examples: "30s", "5m", "1h 30m".

from monarch.config import configured

# Increase timeout for slow operations
with configured(message_delivery_timeout="5m"):
    result = slow_actor.heavy_computation.call_one().get()

host_spawn_ready_timeout

Maximum time to wait for spawned hosts to become ready.

Type: str (duration format)
Default: "30s"
Environment: HYPERACTOR_HOST_SPAWN_READY_TIMEOUT

from monarch.config import configured

# Allow more time for remote host allocation
with configured(host_spawn_ready_timeout="5m"):
    hosts = HostMesh.allocate(...)

mesh_proc_spawn_max_idle

Maximum idle time between status updates while spawning processes in a mesh.

Type: str (duration format)
Default: "30s"
Environment: HYPERACTOR_MESH_PROC_SPAWN_MAX_IDLE

During proc mesh spawning, each process being created sends status updates to the controller. If no update arrives within this timeout, the spawn operation fails. This prevents hung or stuck process creation from waiting indefinitely.

process_exit_timeout

Timeout for waiting on process exit during shutdown.

Type: str (duration format)
Default: "10s"
Environment: HYPERACTOR_PROCESS_EXIT_TIMEOUT

stop_actor_timeout

Timeout for gracefully stopping actors.

Type: str (duration format)
Default: "10s"
Environment: HYPERACTOR_STOP_ACTOR_TIMEOUT

cleanup_timeout

Timeout for cleanup operations during shutdown.

Type: str (duration format)
Default: "3s"
Environment: HYPERACTOR_CLEANUP_TIMEOUT

actor_spawn_max_idle

Maximum idle time between updates while spawning actors in a proc mesh.

Type: str (duration format)
Default: "30s"
Environment: HYPERACTOR_MESH_ACTOR_SPAWN_MAX_IDLE

get_actor_state_max_idle

Maximum idle time for actor state queries.

Type: str (duration format)
Default: "1m"
Environment: HYPERACTOR_MESH_GET_ACTOR_STATE_MAX_IDLE

supervision_watchdog_timeout

Liveness timeout for the actor-mesh supervision stream.

Type: str (duration format)
Default: "2m"
Environment: HYPERACTOR_MESH_SUPERVISION_WATCHDOG_TIMEOUT

During actor-mesh supervision, the controller is expected to periodically publish on the subscription stream (including benign updates). If no supervision message is observed within this timeout, the controller is assumed to be unreachable and the mesh transitions to an unhealthy state.

This timeout is a watchdog against indefinite silence rather than a message-delivery guarantee, and may conservatively treat a quiet but healthy controller as failed. Increase this value in environments with long startup times or extended periods of inactivity (e.g., opt mode with PAR extraction).
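A watchdog of this kind can be sketched as a timestamp comparison. The `SupervisionWatchdog` class below is hypothetical and is not Monarch's supervision code. Timestamps are passed in explicitly so the behavior is easy to follow, and treating silence of exactly the timeout as still healthy is an assumption.

```python
class SupervisionWatchdog:
    """Hypothetical model: track the last supervision message and flag silence."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self.last_seen = now

    def observe(self, now):
        # Any message counts, including benign updates.
        self.last_seen = now

    def is_healthy(self, now):
        # Unhealthy once the silence exceeds the timeout (boundary is an assumption).
        return (now - self.last_seen) <= self.timeout_s


watchdog = SupervisionWatchdog(timeout_s=120.0, now=0.0)  # "2m" default
quiet_but_ok = watchdog.is_healthy(now=119.0)
watchdog.observe(now=100.0)                # a message arrives at t=100s
still_ok = watchdog.is_healthy(now=219.0)  # 119s of silence since t=100s
timed_out = not watchdog.is_healthy(now=221.0)
```

The sketch also shows why a quiet but healthy controller can be misclassified: the check only sees elapsed silence, not the reason for it.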

proc_stop_max_idle

Maximum idle time between updates while stopping procs.

Type: str (duration format)
Default: "30s"
Environment: HYPERACTOR_MESH_PROC_STOP_MAX_IDLE

get_proc_state_max_idle

Maximum idle time for proc state queries.

Type: str (duration format)
Default: "1m"
Environment: HYPERACTOR_MESH_GET_PROC_STATE_MAX_IDLE

mesh_terminate_timeout

Timeout per child during graceful mesh termination.

Type: str (duration format)
Default: "10s"
Environment: HYPERACTOR_MESH_TERMINATE_TIMEOUT

Logging#

enable_log_forwarding

Enable forwarding child process stdout/stderr over the mesh log channel.

Type: bool
Default: False
Environment: HYPERACTOR_MESH_ENABLE_LOG_FORWARDING

When True, child process output is forwarded to LogForwardActor for centralized logging. When False, child processes inherit parent stdio.

from monarch.actor import this_host
from monarch.config import configured

with configured(enable_log_forwarding=True):
    # Child process logs will be forwarded
    mesh = this_host().spawn_procs(per_host={"workers": 4})

enable_file_capture

Enable capturing child process output to log files on disk.

Type: bool
Default: False
Environment: HYPERACTOR_MESH_ENABLE_FILE_CAPTURE

When True, child process output is written to host-scoped log files. Can be combined with enable_log_forwarding for both streaming and persistent logs.

tail_log_lines

Number of recent log lines to retain in memory per process.

Type: int
Default: 0
Environment: HYPERACTOR_MESH_TAIL_LOG_LINES

Maintains a rotating in-memory buffer of the most recent log lines for debugging. Independent of file capture.

from monarch.actor import this_host
from monarch.config import configured

# Keep last 100 lines for debugging
with configured(tail_log_lines=100):
    mesh = this_host().spawn_procs(per_host={"workers": 4})

read_log_buffer

Buffer size for reading logs (in bytes).

Type: int
Default: 100
Environment: HYPERACTOR_READ_LOG_BUFFER

force_file_log

Force file-based logging regardless of environment.

Type: bool
Default: False
Environment: HYPERACTOR_FORCE_FILE_LOG

prefix_with_rank

Prefix log lines with rank information.

Type: bool
Default: True
Environment: HYPERACTOR_PREFIX_WITH_RANK

Message Handling#

message_ack_time_interval

Time interval for message acknowledgments.

Type: str (duration format)
Default: "500ms"
Environment: HYPERACTOR_MESSAGE_ACK_TIME_INTERVAL

message_ack_every_n_messages

Acknowledge every N messages.

Type: int
Default: 1000
Environment: HYPERACTOR_MESSAGE_ACK_EVERY_N_MESSAGES

message_ttl_default

Default message time-to-live (number of hops).

Type: int
Default: 64
Environment: HYPERACTOR_MESSAGE_TTL_DEFAULT

split_max_buffer_size

Maximum buffer size for message splitting (number of fragments).

Type: int
Default: 5
Environment: HYPERACTOR_SPLIT_MAX_BUFFER_SIZE

split_max_buffer_age

Maximum age for split message buffers.

Type: str (duration format)
Default: "50ms"
Environment: HYPERACTOR_SPLIT_MAX_BUFFER_AGE

channel_net_rx_buffer_full_check_interval

Network receive buffer check interval.

Type: str (duration format)
Default: "5s"
Environment: HYPERACTOR_CHANNEL_NET_RX_BUFFER_FULL_CHECK_INTERVAL

message_latency_sampling_rate

Sampling rate for message latency tracking (0.0 to 1.0).

Type: float
Default: 0.01
Environment: HYPERACTOR_MESSAGE_LATENCY_SAMPLING_RATE

A value of 0.01 means 1% of messages are sampled. Use 1.0 for 100% sampling (all messages) or 0.0 to disable sampling.
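One common way to implement rate-based sampling is to compare a uniform random draw against the rate. The sketch below illustrates that technique only. `should_sample` is a hypothetical helper, and nothing here claims to match how Hyperactor selects messages.

```python
import random


def should_sample(rate, rng):
    """Return True for roughly `rate` of calls; rng.random() lies in [0.0, 1.0)."""
    return rng.random() < rate


rng = random.Random(0)  # seeded for repeatability
sampled = sum(should_sample(0.01, rng) for _ in range(100_000))
fraction = sampled / 100_000  # close to 0.01, but not exactly

always = all(should_sample(1.0, rng) for _ in range(1_000))  # 1.0 samples everything
never = any(should_sample(0.0, rng) for _ in range(1_000))   # 0.0 samples nothing
```

Because the draw is strictly less than 1.0 and never less than 0.0, the two endpoint values behave as hard "all" and "none" switches.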

enable_dest_actor_reordering_buffer

Enable reordering buffer in dest actor.

Type: bool
Default: False
Environment: HYPERACTOR_ENABLE_DEST_ACTOR_REORDERING_BUFFER

Message Encoding#

default_encoding

Default message encoding format.

Type: Encoding enum
Default: Encoding.Multipart
Environment: HYPERACTOR_DEFAULT_ENCODING (accepts "bincode", "serde_json", or "serde_multipart")

Supported values:

Encoding.Bincode - Bincode serialization (compact binary format via the bincode crate)
Encoding.Json - JSON serialization (via serde_json)
Encoding.Multipart - Zero-copy multipart encoding that separates large binary fields from the message body, enabling efficient transmission via vectored I/O (default)

Example usage:

from monarch.config import Encoding, configure
configure(default_encoding=Encoding.Bincode)

Mesh Bootstrap#

mesh_bootstrap_enable_pdeathsig

Enable parent-death signal for spawned processes.

Type: bool
Default: True
Environment: HYPERACTOR_MESH_BOOTSTRAP_ENABLE_PDEATHSIG

When True, child processes receive SIGTERM if their parent dies, preventing orphaned processes.

mesh_terminate_concurrency

Maximum concurrent terminations during mesh shutdown.

Type: int
Default: 16
Environment: HYPERACTOR_MESH_TERMINATE_CONCURRENCY
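The interaction between a concurrency cap and a per-child timeout (see mesh_terminate_timeout) can be sketched with asyncio. This is an illustrative model, not Monarch's shutdown path. `terminate_all`, the child names, and the stop coroutines are hypothetical, and the sketch only records the outcome of a timeout because the follow-up action is not described here.

```python
import asyncio


async def terminate_all(children, concurrency, timeout_s):
    """Stop children with at most `concurrency` in flight and a per-child timeout."""
    gate = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def terminate_one(name, stop):
        nonlocal in_flight, peak
        async with gate:  # blocks while `concurrency` terminations are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await asyncio.wait_for(stop(), timeout=timeout_s)
                outcome = "stopped"
            except asyncio.TimeoutError:
                outcome = "timed out"
            finally:
                in_flight -= 1
            return name, outcome

    pairs = await asyncio.gather(
        *(terminate_one(name, stop) for name, stop in children.items())
    )
    return dict(pairs), peak


async def quick_stop():
    await asyncio.sleep(0)


async def stuck_stop():
    await asyncio.sleep(5)  # never finishes within the timeout below


children = {"proc-0": quick_stop, "proc-1": stuck_stop, "proc-2": quick_stop}
outcomes, peak = asyncio.run(terminate_all(children, concurrency=2, timeout_s=0.05))
```

A lower cap lengthens total shutdown time when children are slow, while a higher cap increases the number of simultaneous terminations.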

Runtime and Buffering#

shared_asyncio_runtime

Share asyncio runtime across actors.

Type: bool
Default: False
Environment: MONARCH_HYPERACTOR_SHARED_ASYNCIO_RUNTIME

small_write_threshold

Threshold below which writes are copied (in bytes).

Type: int
Default: 256
Environment: MONARCH_HYPERACTOR_SMALL_WRITE_THRESHOLD

Writes smaller than this threshold are copied into a contiguous buffer. Writes at or above this size are stored as zero-copy references.
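The copy-versus-reference decision can be illustrated in Python with a bytearray and memoryview. This is only an analogy for the rule stated above. `stage_write` and its buffers are hypothetical and say nothing about the actual Rust data structures.

```python
SMALL_WRITE_THRESHOLD = 256  # bytes, the documented default


def stage_write(payload, contiguous, references):
    """Copy small payloads; keep a zero-copy view of payloads at or above the threshold."""
    if len(payload) < SMALL_WRITE_THRESHOLD:
        contiguous.extend(payload)              # bytes are copied into one buffer
        return "copied"
    references.append(memoryview(payload))      # no copy; the view shares the memory
    return "referenced"


contiguous = bytearray()
references = []
small = b"a" * 255
large = b"b" * 256

small_result = stage_write(small, contiguous, references)
large_result = stage_write(large, contiguous, references)
shares_memory = references[0].obj is large  # the view points at the original object
```

Copying small writes reduces the number of separate buffers handed to the I/O layer, while large writes avoid a copy whose cost would grow with their size.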

Actor Configuration#

actor_queue_dispatch

Enable queue-based dispatch for actor message handling.

Type: bool
Default: False
Environment: HYPERACTOR_ACTOR_QUEUE_DISPATCH

When True, actor messages are dispatched through a queue rather than directly. This can improve throughput in high-message-volume scenarios.

Mesh Configuration#

max_cast_dimension_size

Maximum dimension size for cast operations.

Type: int
Default: usize::MAX (platform-dependent)
Environment: HYPERACTOR_MESH_MAX_CAST_DIMENSION_SIZE

Mesh Admin#

mesh_admin_addr

Default socket address for the mesh admin HTTP server.

Type: str
Default: "[::]:1729"
Environment: HYPERACTOR_MESH_ADMIN_ADDR

Parsed as a SocketAddr (e.g. "[::]:1729", "0.0.0.0:8080"). Used as the bind address when no explicit address is provided to MeshAdminAgent, and as the default address assumed by admin clients connecting via mast_conda:///.

Mesh Attach#

mesh_attach_config_timeout

Timeout for the config-push barrier during attach_to_workers().

Type: str (duration format)
Default: "10s"
Environment: HYPERACTOR_MESH_ATTACH_CONFIG_TIMEOUT

When attaching to pre-existing workers (simple bootstrap), the client pushes its propagatable config to each host agent and waits for confirmation. If the barrier does not complete within this duration, a warning is logged and attach continues without blocking.

Remote Allocation#

remote_allocator_heartbeat_interval

Heartbeat interval for remote allocator.

Type: str (duration format)
Default: "5m"
Environment: HYPERACTOR_REMOTE_ALLOCATOR_HEARTBEAT_INTERVAL

Validation and Error Handling#

configure and configured validate input immediately:

Unknown keys raise ValueError.
Type mismatches raise TypeError (for example, passing a string instead of ChannelTransport for default_transport, a non-bool to logging flags, or an integer instead of a string for duration parameters).
Invalid values raise TypeError (for example, invalid encoding names, invalid port ranges, or malformed duration strings).
Duration strings must follow humantime syntax; the error message for an invalid string highlights the bad value.

Normalization#

Duration values are normalized when read from get_global_config(). For instance, setting host_spawn_ready_timeout="300s" yields "5m" when you read it back. This matches the behavior exercised in monarch/python/tests/test_config.py and helps keep logs and telemetry consistent.
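The round trip can be approximated with a small parser and formatter. This sketch handles only the ms, s, m, and h units and is not the humantime implementation, which accepts more units and spellings. The helper names are hypothetical.

```python
import re

_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}
_TOKEN = re.compile(r"(\d+)(ms|h|m|s)")            # "ms" is tried before "m" and "s"
_WHOLE = re.compile(r"\s*(?:\d+(?:ms|h|m|s)\s*)+")


def parse_ms(text):
    """Parse a duration such as "1h 30m" into milliseconds."""
    if not _WHOLE.fullmatch(text):
        # This sketch raises ValueError; the validation rules on this page
        # describe TypeError for the real API.
        raise ValueError(f"not a supported duration: {text!r}")
    return sum(int(count) * _UNIT_MS[unit] for count, unit in _TOKEN.findall(text))


def format_ms(total_ms):
    """Format milliseconds using the largest units first, skipping zero parts."""
    if total_ms == 0:
        return "0s"
    parts = []
    for unit in ("h", "m", "s", "ms"):
        count, total_ms = divmod(total_ms, _UNIT_MS[unit])
        if count:
            parts.append(f"{count}{unit}")
    return " ".join(parts)


def normalize(text):
    return format_ms(parse_ms(text))


five_minutes = normalize("300s")   # "5m"
ninety_seconds = normalize("90s")  # "1m 30s"
```

When comparing a value you set against the value you read back, compare the parsed durations rather than the raw strings.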

Examples#

Basic Configuration#

from monarch.config import configure, get_global_config

# Set configuration values
configure(enable_log_forwarding=True, tail_log_lines=100)

# Read current configuration
config = get_global_config()
print(config["enable_log_forwarding"])  # True
print(config["tail_log_lines"])  # 100

Temporary Configuration (Testing)#

from monarch.config import configured

def test_with_custom_config():
    # Configuration is scoped to this context
    with configured(
        enable_log_forwarding=True,
        message_delivery_timeout="1m"
    ) as config:
        # Config is active here
        assert config["enable_log_forwarding"] is True

    # Config is automatically restored after the context

Nested Overrides#

from monarch._rust_bindings.monarch_hyperactor.channel import (
    ChannelTransport,
)
from monarch.config import configured

with configured(default_transport=ChannelTransport.TcpWithLocalhost):
    # Inner config overrides logging knobs only; default_transport
    # stays put.
    with configured(
        enable_log_forwarding=True,
        tail_log_lines=50,
    ) as config:
        assert (
            config["default_transport"]
            == ChannelTransport.TcpWithLocalhost
        )
        assert config["enable_log_forwarding"]

# After both contexts exit the process is back to the previous settings.

Duration Formats#

from monarch.config import configured

# Various duration formats are supported
with configured(
    message_delivery_timeout="90s",        # 1m 30s
    host_spawn_ready_timeout="5m",         # 5 minutes
    mesh_proc_spawn_max_idle="1h 30m",     # 1 hour 30 minutes
):
    # Timeouts are active
    pass

Environment Variable Override#

Configuration can also be set via environment variables:

# Set codec max frame length to 100 GiB
export HYPERACTOR_CODEC_MAX_FRAME_LENGTH=107374182400

# Enable log forwarding
export HYPERACTOR_MESH_ENABLE_LOG_FORWARDING=true

# Set message delivery timeout to 5 minutes
export HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT=5m

Environment variables are read during initialization and can be overridden programmatically.
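Because the variables are read during initialization, one way to scope them to a single run is to set them only in a child process's environment. The sketch below uses the standard library only. The child here just echoes one variable back, standing in for a script that would import Monarch.

```python
import os
import subprocess
import sys

# Copy the current environment and add overrides for the child only.
child_env = dict(
    os.environ,
    HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT="5m",
    HYPERACTOR_MESH_ENABLE_LOG_FORWARDING="true",
)

# Stand-in for a Monarch program: print what the child process sees.
child_code = "import os; print(os.environ['HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT'])"

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = completed.stdout.strip()
```

The parent's own environment is left unchanged, so other child processes started without `env=child_env` do not inherit the overrides.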