monarch.config
The monarch.config module provides utilities for managing Monarch’s
runtime configuration.
Configuration values can be set programmatically via configure()
or configured(), or through environment variables
(HYPERACTOR_*, MONARCH_*). Programmatic configuration takes
precedence over environment variables and defaults.
Configuration API
monarch.config exposes a small, process-wide API. All helpers talk to
the same layered configuration store, so changes are immediately visible to
every thread in the process.
- configure: Apply overrides to the Runtime layer. Values are validated eagerly; a ValueError is raised for unknown keys and a TypeError for wrong types. configure is additive, so you typically pair it with clear_runtime_config() in long-running processes.
- configured: Context-manager sugar that snapshots the current Runtime layer, applies overrides, yields the merged config, then restores the snapshot. Because the Runtime layer is global, the overrides apply to every thread until the context exits. This makes configured ideal for tests or short-lived blocks where you can guarantee single-threaded execution.
- get_global_config: Return the fully merged configuration (defaults + environment + file + runtime). Useful for introspection or for passing a frozen view to other components.
- get_runtime_config: Return only the currently active Runtime layer. This is what configure manipulates and what configured snapshots.
- clear_runtime_config: Reset the Runtime layer to an empty mapping. Environment and file values remain untouched.
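To make the layering concrete, here is a minimal pure-Python model of a layered store. It is a sketch, not Monarch's implementation (the real store sits behind the Rust bindings): merge_layers, LAYER_ORDER, and the dict-based layers are hypothetical, and placing the file layer above the environment layer is an assumption taken from the order in which get_global_config() lists the layers. It shows higher layers winning on conflicts, a merged view that is a fresh dict, and a clear_runtime_config() analogue that empties only the Runtime layer.

```python
from typing import Any, Dict

# Hypothetical model of a layered store; not Monarch's implementation.
# Layers are listed from lowest to highest precedence (assumed order).
LAYER_ORDER = ("defaults", "environment", "file", "runtime")


def merge_layers(layers: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a new dict in which higher-precedence layers override lower ones."""
    merged: Dict[str, Any] = {}
    for name in LAYER_ORDER:
        merged.update(layers.get(name, {}))
    return merged


layers: Dict[str, Dict[str, Any]] = {
    "defaults": {"enable_log_forwarding": False, "tail_log_lines": 0},
    "environment": {"tail_log_lines": 50},
    "runtime": {"enable_log_forwarding": True},
}

merged = merge_layers(layers)
# Mutating the merged view leaves every layer untouched.
merged["tail_log_lines"] = 999

# Analogue of clear_runtime_config(): empty only the Runtime layer.
layers["runtime"].clear()
after_clear = merge_layers(layers)
```

After the Runtime layer is cleared, the environment value for tail_log_lines (50) still wins over the default, which mirrors the statement that environment and file values remain untouched.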
- monarch.config.configure(*, default_transport=None, enable_log_forwarding=None, enable_file_capture=None, tail_log_lines=None, codec_max_frame_length=None, message_delivery_timeout=None, host_spawn_ready_timeout=None, mesh_proc_spawn_max_idle=None, process_exit_timeout=None, message_ack_time_interval=None, message_ack_every_n_messages=None, message_ttl_default=None, split_max_buffer_size=None, split_max_buffer_age=None, stop_actor_timeout=None, cleanup_timeout=None, remote_allocator_heartbeat_interval=None, default_encoding=None, channel_net_rx_buffer_full_check_interval=None, message_latency_sampling_rate=None, enable_client_seq_assignment=None, mesh_bootstrap_enable_pdeathsig=None, mesh_terminate_concurrency=None, mesh_terminate_timeout=None, shared_asyncio_runtime=None, small_write_threshold=None, max_cast_dimension_size=None, remote_alloc_bind_to_inaddr_any=None, remote_alloc_bootstrap_addr=None, remote_alloc_allowed_port_range=None, read_log_buffer=None, force_file_log=None, prefix_with_rank=None, actor_spawn_max_idle=None, get_actor_state_max_idle=None, supervision_liveness_timeout=None, proc_stop_max_idle=None, get_proc_state_max_idle=None, **kwargs)
Configure Hyperactor runtime defaults for this process.
This updates the Runtime configuration layer from Python, setting transports, logging behavior, timeouts, and other runtime parameters.
All duration parameters accept humantime strings like "30s", "5m", "2h", or "1h 30m".
Parameters:
Transport:
- default_transport: Default channel transport for actor communication. Can be a ChannelTransport enum or an explicit address string.
Basic logging behavior:
- enable_log_forwarding: Forward child stdout/stderr through the mesh.
- enable_file_capture: Persist child stdout/stderr to per-host files.
- tail_log_lines: Number of log lines to retain in memory.
Message encoding and delivery:
- codec_max_frame_length: Maximum serialized message size in bytes.
- message_delivery_timeout: Maximum delivery time (humantime).
Host mesh timeouts:
- host_spawn_ready_timeout: Maximum host bootstrapping time (humantime).
- mesh_proc_spawn_max_idle: Maximum idle time while spawning procs (humantime).
Hyperactor timeouts and message handling:
- process_exit_timeout: Timeout for process exit (humantime).
- message_ack_time_interval: Time interval for message acknowledgments (humantime).
- message_ack_every_n_messages: Acknowledge every N messages.
- message_ttl_default: Default message time-to-live (number of hops).
- split_max_buffer_size: Maximum buffer size for message splitting (number of fragments).
- split_max_buffer_age: Maximum age for split message buffers (humantime).
- stop_actor_timeout: Timeout for stopping actors (humantime).
- cleanup_timeout: Timeout for cleanup operations (humantime).
- remote_allocator_heartbeat_interval: Heartbeat interval for the remote allocator (humantime).
- default_encoding: Default message encoding (Encoding.Bincode, Encoding.Json, or Encoding.Multipart).
- channel_net_rx_buffer_full_check_interval: Network receive buffer check interval (humantime).
- message_latency_sampling_rate: Sampling rate for message latency tracking (0.0 to 1.0).
- enable_client_seq_assignment: Enable client-side sequence assignment.
Mesh bootstrap configuration:
- mesh_bootstrap_enable_pdeathsig: Enable parent-death signal for spawned processes.
- mesh_terminate_concurrency: Maximum concurrent terminations during shutdown.
- mesh_terminate_timeout: Timeout per child during graceful termination (humantime).
Runtime and buffering:
- shared_asyncio_runtime: Share asyncio runtime across actors.
- small_write_threshold: Threshold below which writes are copied (bytes).
Mesh configuration:
- max_cast_dimension_size: Maximum dimension size for cast operations.
Remote allocation:
- remote_alloc_bind_to_inaddr_any: Bind remote allocators to INADDR_ANY.
- remote_alloc_bootstrap_addr: Bootstrap address for remote allocators.
- remote_alloc_allowed_port_range: Allowed port range as slice(start, stop).
Logging configuration:
- read_log_buffer: Buffer size for reading logs (bytes).
- force_file_log: Force file-based logging regardless of environment.
- prefix_with_rank: Prefix log lines with rank information.
Actor timeouts:
- actor_spawn_max_idle: Maximum idle time while spawning actors (humantime).
- get_actor_state_max_idle: Maximum idle time for actor state queries (humantime).
- supervision_liveness_timeout: Liveness timeout for the actor-mesh supervision stream; prolonged silence is interpreted as the controller being unreachable (humantime).
Proc timeouts:
- proc_stop_max_idle: Maximum idle time while stopping procs (humantime).
- get_proc_state_max_idle: Maximum idle time for proc state queries (humantime).
**kwargs (object) – Reserved for future configuration keys exposed by Rust bindings.
- monarch.config.configured(**overrides)
Temporarily apply Python-side config overrides for this process.
This context manager:
- snapshots the current Runtime configuration layer (get_runtime_config()),
- applies the given overrides via configure(**overrides), and
- yields the merged view of the config (get_global_config()), including defaults, env, file, and Runtime.
On exit it restores the previous Runtime layer by:
- clearing all Runtime entries, and
- re-applying the saved snapshot.
configured alters the global configuration, so other threads are subject to the overridden configuration while the context manager is active.
It is therefore intended for tests that run in a single thread, where per-test overrides do not leak into other tests.
Parameters:
**overrides – Configuration key-value pairs to override for the duration of the context.
Yields:
Dict[str, Any] – The merged global configuration including all layers (defaults, environment, file, and runtime).
Example
>>> from monarch.config import configured, get_global_config
>>> with configured(enable_log_forwarding=True, tail_log_lines=100):
...     # Configuration is temporarily overridden
...     assert get_global_config()["enable_log_forwarding"] is True
...
>>> # Configuration is automatically restored after the context
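The snapshot/restore sequence listed above can be sketched in plain Python. Everything here is hypothetical (DEFAULTS, RUNTIME, configure_sketch, merged_view, configured_sketch) and only two layers are modeled; it mirrors the documented steps but is not the library's code. Running the restore inside finally, so it also happens when the body raises, is a design choice of this sketch.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator

# Hypothetical stand-ins for the real layers; Monarch keeps these in Rust.
DEFAULTS: Dict[str, Any] = {"enable_log_forwarding": False, "tail_log_lines": 0}
RUNTIME: Dict[str, Any] = {}


def configure_sketch(**overrides: Any) -> None:
    # Validate every key first, then write: configure is additive.
    for key in overrides:
        if key not in DEFAULTS:
            raise ValueError(f"unknown config key: {key}")
    RUNTIME.update(overrides)


def merged_view() -> Dict[str, Any]:
    # Runtime entries override defaults.
    return {**DEFAULTS, **RUNTIME}


@contextmanager
def configured_sketch(**overrides: Any) -> Iterator[Dict[str, Any]]:
    snapshot = dict(RUNTIME)       # 1. snapshot the Runtime layer
    configure_sketch(**overrides)  # 2. apply the overrides
    try:
        yield merged_view()        # 3. yield the merged config
    finally:
        RUNTIME.clear()            # 4. clear all Runtime entries...
        RUNTIME.update(snapshot)   # 5. ...and re-apply the snapshot


configure_sketch(tail_log_lines=10)
with configured_sketch(enable_log_forwarding=True, tail_log_lines=100) as cfg:
    inside = dict(cfg)
outside = merged_view()
```

The earlier configure_sketch(tail_log_lines=10) survives the context, while both overrides made inside it are rolled back.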
- monarch.config.get_global_config()
Return a merged view of all configuration layers.
The resulting dict includes defaults, environment overrides, file-based settings, and the current Runtime layer. Mutating the returned dict does not change the active configuration; use configure() instead.
- monarch.config.get_runtime_config()
Return a snapshot of just the Runtime layer configuration.
Useful for snapshot/restore flows (see
configured()) or for inspecting which keys were last set via Python.
- monarch.config.clear_runtime_config()
Remove every key from the Runtime configuration layer.
Environment variables, config files, and defaults are untouched. This is typically paired with configure() to reset overrides in long-lived processes.
Configuration Keys
The following configuration keys are available for use with
configure() and configured():
Performance and Transport
codec_max_frame_length
Maximum frame length for message codec (in bytes).
Type: int
Default: 10 * 1024 * 1024 * 1024 (10 GiB)
Environment: HYPERACTOR_CODEC_MAX_FRAME_LENGTH
Controls the maximum size of serialized messages. Exceeding this limit will cause supervision errors.
from monarch.config import configured
# Allow larger messages for bulk data transfer
one_hundred_gib = 100 * 1024 * 1024 * 1024
with configured(codec_max_frame_length=one_hundred_gib):
    # Send large chunks (`actor` and `large_data` are defined elsewhere)
    result = actor.process_chunks.call_one(large_data).get()
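The byte counts used for codec_max_frame_length are plain powers-of-two arithmetic; this snippet just checks them. The 100 GiB figure is the same number that appears in the HYPERACTOR_CODEC_MAX_FRAME_LENGTH environment-variable example later on this page.

```python
# Byte arithmetic behind the codec_max_frame_length values on this page.
GIB = 1024 ** 3  # one gibibyte in bytes

default_limit = 10 * GIB      # documented default (10 GiB)
one_hundred_gib = 100 * GIB   # the larger limit used in the example

print(default_limit)    # 10737418240
print(one_hundred_gib)  # 107374182400
```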
default_transport
Default channel transport mechanism for inter-actor communication.
Type: ChannelTransport enum
Default: ChannelTransport.Unix
Environment: HYPERACTOR_DEFAULT_TRANSPORT
Available transports:
- ChannelTransport.Unix: Unix domain sockets (local only)
- ChannelTransport.TcpWithLocalhost: TCP over localhost
- ChannelTransport.TcpWithHostname: TCP with hostname resolution
- ChannelTransport.MetaTlsWithHostname: Meta TLS (Meta internal only)
from monarch._rust_bindings.monarch_hyperactor.channel import (
    ChannelTransport,
)
from monarch.actor import this_host
from monarch.config import configured
with configured(default_transport=ChannelTransport.TcpWithLocalhost):
    # Actors will communicate via TCP
    mesh = this_host().spawn_procs(per_host={"workers": 4})
Timeouts
message_delivery_timeout
Maximum time to wait for message delivery before timing out.
Type: str (duration format, e.g., "30s", "5m")
Default: "30s"
Environment: HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT
Uses humantime format. Examples: "30s", "5m", "1h 30m".
from monarch.config import configured
# Increase timeout for slow operations (`slow_actor` is defined elsewhere)
with configured(message_delivery_timeout="5m"):
    result = slow_actor.heavy_computation.call_one().get()
host_spawn_ready_timeout
Maximum time to wait for spawned hosts to become ready.
Type: str (duration format)
Default: "30s"
Environment: HYPERACTOR_HOST_SPAWN_READY_TIMEOUT
from monarch.config import configured
# Allow more time for remote host allocation (HostMesh import not shown)
with configured(host_spawn_ready_timeout="5m"):
    hosts = HostMesh.allocate(...)
mesh_proc_spawn_max_idle
Maximum idle time between status updates while spawning processes in a mesh.
Type: str (duration format)
Default: "30s"
Environment: HYPERACTOR_MESH_PROC_SPAWN_MAX_IDLE
During proc mesh spawning, each process being created sends status updates to the controller. If no update arrives within this timeout, the spawn operation fails. This prevents hung or stuck process creation from waiting indefinitely.
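The difference between a max-idle timeout and an overall deadline is easy to miss. The hypothetical helper below models the behavior described above in seconds: a spawn fails only if the gap between consecutive events (start, each status update, completion) exceeds the limit, so a long but steadily reporting spawn still succeeds. Treating completion as the final event is an assumption of this sketch.

```python
from typing import Sequence


def spawn_times_out(start: float, updates: Sequence[float], finish: float, max_idle: float) -> bool:
    """Return True if the gap between any two consecutive events exceeds max_idle.

    Events are the spawn start, every status update, and the moment spawning completes.
    """
    events = [start, *sorted(updates), finish]
    return any(later - earlier > max_idle for earlier, later in zip(events, events[1:]))


# 50 s in total, but an update every 10 s: never idle longer than 30 s.
steady = spawn_times_out(0, [10, 20, 30, 40], 50, max_idle=30)
# Only one update, then 40 s of silence before completion.
stalled = spawn_times_out(0, [10], 50, max_idle=30)
```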
process_exit_timeout
Timeout for waiting on process exit during shutdown.
Type: str (duration format)
Default: "10s"
Environment: HYPERACTOR_PROCESS_EXIT_TIMEOUT
stop_actor_timeout
Timeout for gracefully stopping actors.
Type: str (duration format)
Default: "10s"
Environment: HYPERACTOR_STOP_ACTOR_TIMEOUT
cleanup_timeout
Timeout for cleanup operations during shutdown.
Type: str (duration format)
Default: "3s"
Environment: HYPERACTOR_CLEANUP_TIMEOUT
actor_spawn_max_idle
Maximum idle time between updates while spawning actors in a proc mesh.
Type: str (duration format)
Default: "30s"
Environment: HYPERACTOR_MESH_ACTOR_SPAWN_MAX_IDLE
get_actor_state_max_idle
Maximum idle time for actor state queries.
Type: str (duration format)
Default: "1m"
Environment: HYPERACTOR_MESH_GET_ACTOR_STATE_MAX_IDLE
supervision_liveness_timeout
Liveness timeout for the actor-mesh supervision stream.
Type: str (duration format)
Default: "30s"
Environment: HYPERACTOR_MESH_SUPERVISION_LIVENESS_TIMEOUT
During actor-mesh supervision, the controller is expected to periodically publish on the subscription stream (including benign updates). If no supervision message is observed within this timeout, the controller is assumed to be unreachable and the mesh transitions to an unhealthy state.
This timeout is a watchdog against indefinite silence rather than a message-delivery guarantee, and may conservatively treat a quiet but healthy controller as failed. Increase this value in environments with long startup times or extended periods of inactivity (e.g., opt mode with PAR extraction).
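A liveness watchdog of this kind can be sketched with asyncio. This is an illustration under assumptions, not Monarch's supervision code: the queue stands in for the subscription stream, the "stop" sentinel and watch_supervision are hypothetical, and the timeout is shrunk to 50 ms so the example runs quickly. Every received message, benign or not, restarts the wait; only uninterrupted silence yields "unhealthy".

```python
import asyncio


async def watch_supervision(stream: "asyncio.Queue[str]", liveness_timeout: float) -> str:
    """Consume supervision messages; report 'unhealthy' after prolonged silence."""
    while True:
        try:
            message = await asyncio.wait_for(stream.get(), timeout=liveness_timeout)
        except asyncio.TimeoutError:
            # Silence longer than the timeout: assume the controller is unreachable.
            return "unhealthy"
        if message == "stop":  # hypothetical end-of-stream sentinel
            return "stopped"
        # Any other message, including a benign update, restarts the wait.


async def demo() -> tuple:
    quiet: "asyncio.Queue[str]" = asyncio.Queue()   # controller never publishes
    chatty: "asyncio.Queue[str]" = asyncio.Queue()  # controller keeps publishing
    for msg in ("heartbeat", "heartbeat", "stop"):
        chatty.put_nowait(msg)
    return (
        await watch_supervision(quiet, liveness_timeout=0.05),
        await watch_supervision(chatty, liveness_timeout=0.05),
    )


results = asyncio.run(demo())
```

As the text notes, a healthy controller that simply stays quiet longer than the timeout would be classified the same way as the quiet queue here.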
proc_stop_max_idle
Maximum idle time between updates while stopping procs.
Type: str (duration format)
Default: "30s"
Environment: HYPERACTOR_MESH_PROC_STOP_MAX_IDLE
get_proc_state_max_idle
Maximum idle time for proc state queries.
Type: str (duration format)
Default: "1m"
Environment: HYPERACTOR_MESH_GET_PROC_STATE_MAX_IDLE
mesh_terminate_timeout
Timeout per child during graceful mesh termination.
Type: str (duration format)
Default: "10s"
Environment: HYPERACTOR_MESH_TERMINATE_TIMEOUT
Logging
enable_log_forwarding
Enable forwarding child process stdout/stderr over the mesh log channel.
Type: bool
Default: False
Environment: HYPERACTOR_MESH_ENABLE_LOG_FORWARDING
When True, child process output is forwarded to LogForwardActor for centralized logging. When False, child processes inherit parent stdio.
from monarch.actor import this_host
from monarch.config import configured
with configured(enable_log_forwarding=True):
    # Child process logs will be forwarded
    mesh = this_host().spawn_procs(per_host={"workers": 4})
enable_file_capture
Enable capturing child process output to log files on disk.
Type: bool
Default: False
Environment: HYPERACTOR_MESH_ENABLE_FILE_CAPTURE
When True, child process output is written to host-scoped log files. Can be combined with enable_log_forwarding for both streaming and persistent logs.
tail_log_lines
Number of recent log lines to retain in memory per process.
Type: int
Default: 0
Environment: HYPERACTOR_MESH_TAIL_LOG_LINES
Maintains a rotating in-memory buffer of the most recent log lines for debugging. Independent of file capture.
from monarch.actor import this_host
from monarch.config import configured
# Keep last 100 lines for debugging
with configured(tail_log_lines=100):
    mesh = this_host().spawn_procs(per_host={"workers": 4})
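The rotating buffer that tail_log_lines describes behaves like a bounded deque. TailBuffer below is a hypothetical illustration, not the runtime's implementation, and it assumes the default of 0 means nothing is retained.

```python
from collections import deque
from typing import Deque, List


class TailBuffer:
    """Keep only the most recent `max_lines` lines (hypothetical illustration)."""

    def __init__(self, max_lines: int) -> None:
        # A deque with maxlen drops the oldest entry once it is full;
        # maxlen=0 keeps nothing at all.
        self._lines: Deque[str] = deque(maxlen=max_lines)

    def append(self, line: str) -> None:
        self._lines.append(line)

    def snapshot(self) -> List[str]:
        return list(self._lines)


buf = TailBuffer(max_lines=3)
for i in range(5):
    buf.append(f"line {i}")
print(buf.snapshot())  # ['line 2', 'line 3', 'line 4']

disabled = TailBuffer(max_lines=0)
disabled.append("ignored")
```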
read_log_buffer
Buffer size for reading logs (in bytes).
Type: int
Default: 100
Environment: HYPERACTOR_READ_LOG_BUFFER
force_file_log
Force file-based logging regardless of environment.
Type: bool
Default: False
Environment: HYPERACTOR_FORCE_FILE_LOG
prefix_with_rank
Prefix log lines with rank information.
Type: bool
Default: True
Environment: HYPERACTOR_PREFIX_WITH_RANK
Message Handling
message_ack_time_interval
Time interval for message acknowledgments.
Type: str (duration format)
Default: "500ms"
Environment: HYPERACTOR_MESSAGE_ACK_TIME_INTERVAL
message_ack_every_n_messages
Acknowledge every N messages.
Type: int
Default: 1000
Environment: HYPERACTOR_MESSAGE_ACK_EVERY_N_MESSAGES
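This page does not say how message_ack_time_interval and message_ack_every_n_messages interact. One plausible reading, assumed here and not confirmed by the source, is that an acknowledgment goes out when either threshold is reached first. AckPolicy is a hypothetical sketch of that reading, with times in seconds.

```python
class AckPolicy:
    """Hypothetical ack trigger: after `every_n` messages or `interval` seconds, whichever comes first."""

    def __init__(self, every_n: int = 1000, interval: float = 0.5) -> None:
        self.every_n = every_n
        self.interval = interval
        self._unacked = 0
        self._last_ack = 0.0

    def on_message(self, now: float) -> bool:
        """Record one received message at time `now`; return True if an ack should be sent."""
        self._unacked += 1
        if self._unacked >= self.every_n or now - self._last_ack >= self.interval:
            self._unacked = 0
            self._last_ack = now
            return True
        return False


policy = AckPolicy(every_n=3, interval=0.5)
# The third message hits the count threshold; the last one hits the time threshold.
decisions = [policy.on_message(t) for t in (0.1, 0.2, 0.3, 0.4, 1.0)]
```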
message_ttl_default
Default message time-to-live (number of hops).
Type: int
Default: 64
Environment: HYPERACTOR_MESSAGE_TTL_DEFAULT
split_max_buffer_size
Maximum buffer size for message splitting (number of fragments).
Type: int
Default: 5
Environment: HYPERACTOR_SPLIT_MAX_BUFFER_SIZE
split_max_buffer_age
Maximum age for split message buffers.
Type: str (duration format)
Default: "50ms"
Environment: HYPERACTOR_SPLIT_MAX_BUFFER_AGE
channel_net_rx_buffer_full_check_interval
Network receive buffer check interval.
Type: str (duration format)
Default: "5s"
Environment: HYPERACTOR_CHANNEL_NET_RX_BUFFER_FULL_CHECK_INTERVAL
message_latency_sampling_rate
Sampling rate for message latency tracking (0.0 to 1.0).
Type: float
Default: 0.01
Environment: HYPERACTOR_MESSAGE_LATENCY_SAMPLING_RATE
A value of 0.01 means 1% of messages are sampled. Use 1.0 for 100% sampling (all messages) or 0.0 to disable sampling.
enable_client_seq_assignment
Enable client-side sequence assignment for messages.
Type: bool
Default: False
Environment: HYPERACTOR_ENABLE_CLIENT_SEQ_ASSIGNMENT
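One way to picture message_latency_sampling_rate is an independent coin flip per message. Per-message Bernoulli sampling is an assumption about how the rate is applied, and should_sample is a hypothetical helper; a seeded random.Random keeps the result reproducible.

```python
import random


def should_sample(rate: float, rng: random.Random) -> bool:
    """Sample one message with probability `rate` (0.0 disables, 1.0 samples all)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # rng.random() is in [0.0, 1.0), so rate=1.0 always samples and rate=0.0 never does.
    return rng.random() < rate


rng = random.Random(0)
# At the default rate, expect roughly 1% of 100,000 messages (about 1,000).
sampled = sum(should_sample(0.01, rng) for _ in range(100_000))
```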
Message Encoding
default_encoding
Default message encoding format.
Type: Encoding enum
Default: Encoding.Multipart
Environment: HYPERACTOR_DEFAULT_ENCODING (accepts "bincode", "serde_json", or "serde_multipart")
Supported values:
- Encoding.Bincode: Bincode serialization (compact binary format via the bincode crate)
- Encoding.Json: JSON serialization (via serde_json)
- Encoding.Multipart: Zero-copy multipart encoding that separates large binary fields from the message body, enabling efficient transmission via vectored I/O (default)
Example usage:
from monarch.config import Encoding, configure
configure(default_encoding=Encoding.Bincode)
Mesh Bootstrap
mesh_bootstrap_enable_pdeathsig
Enable parent-death signal for spawned processes.
Type: bool
Default: True
Environment: HYPERACTOR_MESH_BOOTSTRAP_ENABLE_PDEATHSIG
When True, child processes receive SIGTERM if their parent dies, preventing orphaned processes.
mesh_terminate_concurrency
Maximum concurrent terminations during mesh shutdown.
Type: int
Default: 16
Environment: HYPERACTOR_MESH_TERMINATE_CONCURRENCY
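The interaction between mesh_terminate_concurrency and mesh_terminate_timeout can be pictured as an asyncio semaphore plus a per-child timeout. This sketch assumes that pairing; terminate_child and terminate_all are hypothetical, each "child" is just a sleep, and times are scaled down to fractions of a second. What Monarch does after a child times out is not described here, so the sketch only records it.

```python
import asyncio
from typing import Dict, List


async def terminate_child(name: str, shutdown_seconds: float) -> str:
    # Stand-in for asking one child process to exit gracefully.
    await asyncio.sleep(shutdown_seconds)
    return name


async def terminate_all(
    children: Dict[str, float], concurrency: int, per_child_timeout: float
) -> Dict[str, List[str]]:
    limiter = asyncio.Semaphore(concurrency)
    result: Dict[str, List[str]] = {"graceful": [], "timed_out": []}

    async def one(name: str, shutdown_seconds: float) -> None:
        async with limiter:  # at most `concurrency` terminations in flight
            try:
                await asyncio.wait_for(terminate_child(name, shutdown_seconds), per_child_timeout)
                result["graceful"].append(name)
            except asyncio.TimeoutError:
                result["timed_out"].append(name)

    await asyncio.gather(*(one(n, s) for n, s in children.items()))
    return result


outcome = asyncio.run(
    terminate_all({"a": 0.01, "b": 0.01, "slow": 1.0}, concurrency=2, per_child_timeout=0.1)
)
```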
Runtime and Buffering
shared_asyncio_runtime
Share asyncio runtime across actors.
Type: bool
Default: False
Environment: MONARCH_HYPERACTOR_SHARED_ASYNCIO_RUNTIME
small_write_threshold
Threshold below which writes are copied (in bytes).
Type: int
Default: 256
Environment: MONARCH_HYPERACTOR_SMALL_WRITE_THRESHOLD
Writes smaller than this threshold are copied into a contiguous buffer. Writes at or above this size are stored as zero-copy references.
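The copy-versus-reference rule for small_write_threshold can be sketched as a small writer class. WriteBuffer is hypothetical, and coalescing consecutive small writes into one bytearray is an assumption; the point is only that writes under the threshold are copied while larger ones are kept as memoryview references to the caller's bytes.

```python
from typing import List, Union


class WriteBuffer:
    """Hypothetical writer: copy small writes, keep large ones as zero-copy views."""

    def __init__(self, small_write_threshold: int = 256) -> None:
        self.threshold = small_write_threshold
        self.parts: List[Union[bytearray, memoryview]] = []

    def write(self, data: bytes) -> None:
        if len(data) < self.threshold:
            # Small write: copy into the trailing contiguous buffer.
            if not self.parts or not isinstance(self.parts[-1], bytearray):
                self.parts.append(bytearray())
            self.parts[-1].extend(data)
        else:
            # Large write: keep a reference to the caller's bytes, no copy.
            self.parts.append(memoryview(data))


buf = WriteBuffer(small_write_threshold=256)
big = bytes(1024)
buf.write(b"header")
buf.write(b"-meta")
buf.write(big)
buf.write(b"trailer")
```

The resulting parts are a copied b"header-meta", a view over big, and a copied b"trailer"; a vectored write could then send them without re-copying the large payload.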
Mesh Configuration
max_cast_dimension_size
Maximum dimension size for cast operations.
Type: int
Default: usize::MAX (platform-dependent)
Environment: HYPERACTOR_MESH_MAX_CAST_DIMENSION_SIZE
Remote Allocation
remote_allocator_heartbeat_interval
Heartbeat interval for the remote allocator.
Type: str (duration format)
Default: "5m"
Environment: HYPERACTOR_REMOTE_ALLOCATOR_HEARTBEAT_INTERVAL
Validation and Error Handling
configure and configured validate input immediately:
- Unknown keys raise ValueError.
- Type mismatches raise TypeError (for example, passing a string instead of ChannelTransport for default_transport, a non-bool to logging flags, or an integer instead of a string for duration parameters).
- Invalid values raise TypeError (for example, invalid encoding names, invalid port ranges, or malformed duration strings).
- Duration strings must follow humantime syntax; invalid strings trigger TypeError with a message that highlights the bad value.
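The validation rules can be approximated with a type-checked keyword API. This is a hypothetical sketch covering three keys, not monarch.config's validator: SCHEMA, validate, and raises are illustrative names, and real duration checks also parse the string, which this sketch does not.

```python
from typing import Any, Dict

# Hypothetical schema: a few keys from this page and the Python type each accepts.
SCHEMA: Dict[str, type] = {
    "enable_log_forwarding": bool,
    "tail_log_lines": int,
    "message_delivery_timeout": str,  # humantime duration string
}


def validate(**overrides: Any) -> Dict[str, Any]:
    for key, value in overrides.items():
        if key not in SCHEMA:
            raise ValueError(f"unknown configuration key: {key!r}")
        expected = SCHEMA[key]
        # bool is a subclass of int, so reject it explicitly for int-typed keys.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise TypeError(f"{key} expects {expected.__name__}, got {type(value).__name__}")
    return overrides


def raises(exc: type, **overrides: Any) -> bool:
    """Helper: True if validate(**overrides) raises `exc`."""
    try:
        validate(**overrides)
    except exc:
        return True
    return False
```

For example, validate(message_delivery_timeout=30) raises TypeError because durations are passed as strings, matching the integer-instead-of-string case listed above.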
Normalization
Duration values are normalized when read from get_global_config(). For
instance, setting host_spawn_ready_timeout="300s" yields "5m" when you
read it back. This matches the behavior exercised in
monarch/python/tests/test_config.py and helps keep logs and telemetry
consistent.
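Normalization of this kind amounts to re-rendering a duration with the largest units first. format_humantime below is a hypothetical helper limited to hours, minutes, and seconds; the humantime crate's own formatter also handles days and sub-second units, so output can differ in those ranges (for example, this sketch prints 86400 seconds as "24h").

```python
def format_humantime(total_seconds: int) -> str:
    """Render whole seconds largest unit first, e.g. 300 -> '5m', 5400 -> '1h 30m'."""
    if total_seconds < 0:
        raise ValueError("durations cannot be negative")
    if total_seconds == 0:
        return "0s"
    parts = []
    remaining = total_seconds
    for suffix, size in (("h", 3600), ("m", 60), ("s", 1)):
        count, remaining = divmod(remaining, size)
        if count:
            parts.append(f"{count}{suffix}")
    return " ".join(parts)


print(format_humantime(300))  # 5m
```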
Examples
Basic Configuration
from monarch.config import configure, get_global_config
# Set configuration values
configure(enable_log_forwarding=True, tail_log_lines=100)
# Read current configuration
config = get_global_config()
print(config["enable_log_forwarding"]) # True
print(config["tail_log_lines"]) # 100
Temporary Configuration (Testing)
from monarch.config import configured
def test_with_custom_config():
# Configuration is scoped to this context
with configured(
enable_log_forwarding=True,
message_delivery_timeout="1m"
) as config:
# Config is active here
assert config["enable_log_forwarding"] is True
# Config is automatically restored after the context
Nested Overrides
from monarch._rust_bindings.monarch_hyperactor.channel import ChannelTransport
from monarch.config import configured
with configured(default_transport=ChannelTransport.TcpWithLocalhost):
# Inner config overrides logging knobs only; default_transport
# stays put.
with configured(
enable_log_forwarding=True,
tail_log_lines=50,
) as config:
assert (
config["default_transport"]
== ChannelTransport.TcpWithLocalhost
)
assert config["enable_log_forwarding"]
# After both contexts exit the process is back to the previous settings.
Duration Formats
from monarch.config import configured
# Various duration formats are supported
with configured(
message_delivery_timeout="90s", # 1m 30s
host_spawn_ready_timeout="5m", # 5 minutes
mesh_proc_spawn_max_idle="1h 30m", # 1 hour 30 minutes
):
# Timeouts are active
pass
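For intuition about what these strings mean, here is a hypothetical parser for a small subset of humantime syntax: integer amounts with ms, s, m, h, or d units, with optional whitespace between tokens. It returns milliseconds to stay exact. It is not the parser Monarch uses, accepts far fewer spellings than humantime, and raises ValueError, whereas configure() reports malformed durations as TypeError.

```python
import re

# Hypothetical subset of humantime: integer amounts with ms/s/m/h/d units.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
# "ms" is tried before "m" and "s" so "500ms" is not read as "500m" followed by "s".
_TOKEN = re.compile(r"(\d+)(ms|s|m|h|d)")


def parse_duration_ms(text: str) -> int:
    """Parse strings such as '90s', '5m', or '1h 30m' into whole milliseconds."""
    text = text.strip()
    total = 0
    pos = 0
    for match in _TOKEN.finditer(text):
        # Only whitespace may appear between tokens.
        if text[pos:match.start()].strip():
            raise ValueError(f"malformed duration: {text!r}")
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
        pos = match.end()
    # Reject input with no tokens or with trailing garbage.
    if pos == 0 or text[pos:].strip():
        raise ValueError(f"malformed duration: {text!r}")
    return total


print(parse_duration_ms("1h 30m"))  # 5400000
```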
Environment Variable Override
Configuration can also be set via environment variables:
# Set codec max frame length to 100 GiB
export HYPERACTOR_CODEC_MAX_FRAME_LENGTH=107374182400
# Enable log forwarding
export HYPERACTOR_MESH_ENABLE_LOG_FORWARDING=true
# Set message delivery timeout to 5 minutes
export HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT=5m
Environment variables are read during initialization and can be overridden programmatically.
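Because these variables are read during initialization, they have to be in place before Monarch starts up. From Python, one way to guarantee that is to launch a child process with a prepared environment, sketched below with the standard subprocess module. The child here only echoes a variable, standing in for a real Monarch program.

```python
import os
import subprocess
import sys

# Start from the current environment and add Monarch settings for the child.
child_env = dict(os.environ)
child_env.update(
    {
        "HYPERACTOR_MESH_ENABLE_LOG_FORWARDING": "true",
        "HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT": "5m",
    }
)

# The child only echoes one variable; a real launcher would start a Monarch program.
completed = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
print(completed.stdout.strip())  # 5m
```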
See Also
Getting Started - Getting started guide
monarch.actor - Actor API documentation