monarch.config
==============

.. currentmodule:: monarch.config

The ``monarch.config`` module provides utilities for managing Monarch's
runtime configuration. Configuration values can be set programmatically via
:func:`configure` or :func:`configured`, or through environment variables
(``HYPERACTOR_*``, ``MONARCH_*``). Programmatic configuration takes precedence
over environment variables and defaults.


Configuration API
=================

``monarch.config`` exposes a small, process-wide API. All helpers talk to the
same layered configuration store, so changes are immediately visible to every
thread in the process.

``configure``
    Apply overrides to the Runtime layer. Values are validated eagerly; a
    ``ValueError`` is raised for unknown keys and ``TypeError`` for wrong
    types. ``configure`` is additive, so you typically pair it with
    :func:`clear_runtime_config` in long-running processes.

``configured``
    Context manager sugar that snapshots the current Runtime layer, applies
    overrides, yields the merged config, then restores the snapshot. Because
    the Runtime layer is global, the overrides apply to every thread until the
    context exits. This makes ``configured`` ideal for tests or short-lived
    blocks where you can guarantee single-threaded execution.

``get_global_config``
    Return the fully merged configuration (defaults + environment + file +
    runtime). Useful for introspection or for passing a frozen view to other
    components.

``get_runtime_config``
    Return only the currently active Runtime layer. This is what
    ``configure`` manipulates and what ``configured`` snapshots.

``clear_runtime_config``
    Reset the Runtime layer to an empty mapping. Environment and file values
    remain untouched.

.. autofunction:: configure

.. autofunction:: configured

.. autofunction:: get_global_config

.. autofunction:: get_runtime_config

.. autofunction:: clear_runtime_config


Configuration Keys
==================

The following configuration keys are available for use with
:func:`configure` and :func:`configured`:

Performance and Transport
--------------------------

``codec_max_frame_length``
    Maximum frame length for message codec (in bytes).

    - **Type**: ``int``
    - **Default**: ``10 * 1024 * 1024 * 1024`` (10 GiB)
    - **Environment**: ``HYPERACTOR_CODEC_MAX_FRAME_LENGTH``

    Controls the maximum size of serialized messages. Exceeding this limit
    will cause supervision errors.

    .. code-block:: python

        from monarch.config import configured

        # Allow larger messages for bulk data transfer
        one_hundred_gib = 100 * 1024 * 1024 * 1024
        with configured(codec_max_frame_length=one_hundred_gib):
            # Send large chunks
            result = actor.process_chunks.call_one(large_data).get()

``default_transport``
    Default channel transport mechanism for inter-actor communication.

    - **Type**: ``ChannelTransport`` enum
    - **Default**: ``ChannelTransport.Unix``
    - **Environment**: ``HYPERACTOR_DEFAULT_TRANSPORT``

    Available transports:

    - ``ChannelTransport.Unix`` - Unix domain sockets (local only)
    - ``ChannelTransport.TcpWithLocalhost`` - TCP over localhost
    - ``ChannelTransport.TcpWithHostname`` - TCP with hostname resolution
    - ``ChannelTransport.MetaTlsWithHostname`` - Meta TLS (Meta internal only)

    .. code-block:: python

        from monarch._rust_bindings.monarch_hyperactor.channel import (
            ChannelTransport,
        )
        from monarch.actor import this_host
        from monarch.config import configured

        with configured(default_transport=ChannelTransport.TcpWithLocalhost):
            # Actors will communicate via TCP
            mesh = this_host().spawn_procs(per_host={"workers": 4})

Timeouts
--------

``message_delivery_timeout``
    Maximum time to wait for message delivery before timing out.

    - **Type**: ``str`` (duration format, e.g., ``"30s"``, ``"5m"``)
    - **Default**: ``"30s"``
    - **Environment**: ``HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT``

    Uses the ``humantime`` duration format. Examples: ``"30s"``, ``"5m"``,
    ``"1h 30m"``.

    .. code-block:: python

        from monarch.config import configured

        # Increase timeout for slow operations
        with configured(message_delivery_timeout="5m"):
            result = slow_actor.heavy_computation.call_one().get()

``host_spawn_ready_timeout``
    Maximum time to wait for spawned hosts to become ready.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"30s"``
    - **Environment**: ``HYPERACTOR_HOST_SPAWN_READY_TIMEOUT``

    .. code-block:: python

        from monarch.config import configured

        # Allow more time for remote host allocation
        with configured(host_spawn_ready_timeout="5m"):
            hosts = HostMesh.allocate(...)

``mesh_proc_spawn_max_idle``
    Maximum idle time between status updates while spawning processes in a
    mesh.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"30s"``
    - **Environment**: ``HYPERACTOR_MESH_PROC_SPAWN_MAX_IDLE``

    During proc mesh spawning, each process being created sends status
    updates to the controller. If no update arrives within this timeout, the
    spawn operation fails. This prevents hung or stuck process creation from
    waiting indefinitely.

``process_exit_timeout``
    Timeout for waiting on process exit during shutdown.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"10s"``
    - **Environment**: ``HYPERACTOR_PROCESS_EXIT_TIMEOUT``

``stop_actor_timeout``
    Timeout for gracefully stopping actors.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"10s"``
    - **Environment**: ``HYPERACTOR_STOP_ACTOR_TIMEOUT``

``cleanup_timeout``
    Timeout for cleanup operations during shutdown.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"3s"``
    - **Environment**: ``HYPERACTOR_CLEANUP_TIMEOUT``

``actor_spawn_max_idle``
    Maximum idle time between updates while spawning actors in a proc mesh.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"30s"``
    - **Environment**: ``HYPERACTOR_MESH_ACTOR_SPAWN_MAX_IDLE``

``get_actor_state_max_idle``
    Maximum idle time for actor state queries.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"1m"``
    - **Environment**: ``HYPERACTOR_MESH_GET_ACTOR_STATE_MAX_IDLE``

``supervision_watchdog_timeout``
    Liveness timeout for the actor-mesh supervision stream.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"2m"``
    - **Environment**: ``HYPERACTOR_MESH_SUPERVISION_WATCHDOG_TIMEOUT``

    During actor-mesh supervision, the controller is expected to periodically
    publish on the subscription stream (including benign updates). If no
    supervision message is observed within this timeout, the controller is
    assumed to be unreachable and the mesh transitions to an unhealthy state.

    This timeout is a watchdog against indefinite silence rather than a
    message-delivery guarantee, and may conservatively treat a quiet but
    healthy controller as failed. Increase this value in environments with
    long startup times or extended periods of inactivity (e.g., opt mode with
    PAR extraction).

``proc_stop_max_idle``
    Maximum idle time between updates while stopping procs.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"30s"``
    - **Environment**: ``HYPERACTOR_MESH_PROC_STOP_MAX_IDLE``

``get_proc_state_max_idle``
    Maximum idle time for proc state queries.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"1m"``
    - **Environment**: ``HYPERACTOR_MESH_GET_PROC_STATE_MAX_IDLE``

``mesh_terminate_timeout``
    Timeout per child during graceful mesh termination.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"10s"``
    - **Environment**: ``HYPERACTOR_MESH_TERMINATE_TIMEOUT``

Logging
-------

``enable_log_forwarding``
    Enable forwarding child process stdout/stderr over the mesh log channel.

    - **Type**: ``bool``
    - **Default**: ``False``
    - **Environment**: ``HYPERACTOR_MESH_ENABLE_LOG_FORWARDING``

    When ``True``, child process output is forwarded to ``LogForwardActor``
    for centralized logging. When ``False``, child processes inherit parent
    stdio.

    .. code-block:: python

        from monarch.actor import this_host
        from monarch.config import configured

        with configured(enable_log_forwarding=True):
            # Child process logs will be forwarded
            mesh = this_host().spawn_procs(per_host={"workers": 4})

``enable_file_capture``
    Enable capturing child process output to log files on disk.

    - **Type**: ``bool``
    - **Default**: ``False``
    - **Environment**: ``HYPERACTOR_MESH_ENABLE_FILE_CAPTURE``

    When ``True``, child process output is written to host-scoped log files.
    Can be combined with ``enable_log_forwarding`` for both streaming and
    persistent logs.

``tail_log_lines``
    Number of recent log lines to retain in memory per process.

    - **Type**: ``int``
    - **Default**: ``0``
    - **Environment**: ``HYPERACTOR_MESH_TAIL_LOG_LINES``

    Maintains a rotating in-memory buffer of the most recent log lines for
    debugging. Independent of file capture.

    .. code-block:: python

        from monarch.actor import this_host
        from monarch.config import configured

        # Keep last 100 lines for debugging
        with configured(tail_log_lines=100):
            mesh = this_host().spawn_procs(per_host={"workers": 4})

``read_log_buffer``
    Buffer size for reading logs (in bytes).

    - **Type**: ``int``
    - **Default**: ``100``
    - **Environment**: ``HYPERACTOR_READ_LOG_BUFFER``

``force_file_log``
    Force file-based logging regardless of environment.

    - **Type**: ``bool``
    - **Default**: ``False``
    - **Environment**: ``HYPERACTOR_FORCE_FILE_LOG``

``prefix_with_rank``
    Prefix log lines with rank information.

    - **Type**: ``bool``
    - **Default**: ``True``
    - **Environment**: ``HYPERACTOR_PREFIX_WITH_RANK``

Message Handling
----------------

``message_ack_time_interval``
    Time interval for message acknowledgments.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"500ms"``
    - **Environment**: ``HYPERACTOR_MESSAGE_ACK_TIME_INTERVAL``

``message_ack_every_n_messages``
    Acknowledge every N messages.

    - **Type**: ``int``
    - **Default**: ``1000``
    - **Environment**: ``HYPERACTOR_MESSAGE_ACK_EVERY_N_MESSAGES``

``message_ttl_default``
    Default message time-to-live (number of hops).

    - **Type**: ``int``
    - **Default**: ``64``
    - **Environment**: ``HYPERACTOR_MESSAGE_TTL_DEFAULT``

``split_max_buffer_size``
    Maximum buffer size for message splitting (number of fragments).

    - **Type**: ``int``
    - **Default**: ``5``
    - **Environment**: ``HYPERACTOR_SPLIT_MAX_BUFFER_SIZE``

``split_max_buffer_age``
    Maximum age for split message buffers.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"50ms"``
    - **Environment**: ``HYPERACTOR_SPLIT_MAX_BUFFER_AGE``

``channel_net_rx_buffer_full_check_interval``
    Network receive buffer check interval.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"5s"``
    - **Environment**: ``HYPERACTOR_CHANNEL_NET_RX_BUFFER_FULL_CHECK_INTERVAL``

``message_latency_sampling_rate``
    Sampling rate for message latency tracking (0.0 to 1.0).

    - **Type**: ``float``
    - **Default**: ``0.01``
    - **Environment**: ``HYPERACTOR_MESSAGE_LATENCY_SAMPLING_RATE``

    A value of ``0.01`` means 1% of messages are sampled. Use ``1.0`` for
    100% sampling (all messages) or ``0.0`` to disable sampling.

``enable_dest_actor_reordering_buffer``
    Enable reordering buffer in dest actor.

    - **Type**: ``bool``
    - **Default**: ``False``
    - **Environment**: ``HYPERACTOR_ENABLE_DEST_ACTOR_REORDERING_BUFFER``

Message Encoding
----------------

``default_encoding``
    Default message encoding format.

    - **Type**: ``Encoding`` enum
    - **Default**: ``Encoding.Multipart``
    - **Environment**: ``HYPERACTOR_DEFAULT_ENCODING`` (accepts
      ``"bincode"``, ``"serde_json"``, or ``"serde_multipart"``)

    Supported values:

    - ``Encoding.Bincode`` - Bincode serialization (compact binary format via
      the ``bincode`` crate)
    - ``Encoding.Json`` - JSON serialization (via ``serde_json``)
    - ``Encoding.Multipart`` - Zero-copy multipart encoding that separates
      large binary fields from the message body, enabling efficient
      transmission via vectored I/O (default)

    Example usage::

        from monarch.config import Encoding, configure

        configure(default_encoding=Encoding.Bincode)

Mesh Bootstrap
--------------

``mesh_bootstrap_enable_pdeathsig``
    Enable parent-death signal for spawned processes.

    - **Type**: ``bool``
    - **Default**: ``True``
    - **Environment**: ``HYPERACTOR_MESH_BOOTSTRAP_ENABLE_PDEATHSIG``

    When ``True``, child processes receive SIGTERM if their parent dies,
    preventing orphaned processes.

``mesh_terminate_concurrency``
    Maximum concurrent terminations during mesh shutdown.

    - **Type**: ``int``
    - **Default**: ``16``
    - **Environment**: ``HYPERACTOR_MESH_TERMINATE_CONCURRENCY``

Runtime and Buffering
----------------------

``shared_asyncio_runtime``
    Share asyncio runtime across actors.

    - **Type**: ``bool``
    - **Default**: ``False``
    - **Environment**: ``MONARCH_HYPERACTOR_SHARED_ASYNCIO_RUNTIME``

``small_write_threshold``
    Threshold below which writes are copied (in bytes).

    - **Type**: ``int``
    - **Default**: ``256``
    - **Environment**: ``MONARCH_HYPERACTOR_SMALL_WRITE_THRESHOLD``

    Writes smaller than this threshold are copied into a contiguous buffer.
    Writes at or above this size are stored as zero-copy references.

Actor Configuration
-------------------

``actor_queue_dispatch``
    Enable queue-based dispatch for actor message handling.

    - **Type**: ``bool``
    - **Default**: ``False``
    - **Environment**: ``HYPERACTOR_ACTOR_QUEUE_DISPATCH``

    When ``True``, actor messages are dispatched through a queue rather than
    directly. This can improve throughput in high-message-volume scenarios.

Mesh Configuration
------------------

``max_cast_dimension_size``
    Maximum dimension size for cast operations.

    - **Type**: ``int``
    - **Default**: ``16``
    - **Environment**: ``HYPERACTOR_MESH_MAX_CAST_DIMENSION_SIZE``

Mesh Admin
----------

``mesh_admin_addr``
    Default socket address for the mesh admin HTTP server.

    - **Type**: ``str``
    - **Default**: ``"[::]:1729"``
    - **Environment**: ``HYPERACTOR_MESH_ADMIN_ADDR``

    Parsed as a ``SocketAddr`` (e.g. ``"[::]:1729"``, ``"0.0.0.0:8080"``).
    Used as the bind address when no explicit address is provided to
    ``MeshAdminAgent``, and as the default address assumed by admin clients
    connecting via ``mast_conda:///``.

Mesh Attach
-----------

``mesh_attach_config_timeout``
    Timeout for the config-push barrier during ``attach_to_workers()``.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"10s"``
    - **Environment**: ``HYPERACTOR_MESH_ATTACH_CONFIG_TIMEOUT``

    When attaching to pre-existing workers (simple bootstrap), the client
    pushes its propagatable config to each host agent and waits for
    confirmation. If the barrier does not complete within this duration, a
    warning is logged and attach continues without blocking.

Remote Allocation
-----------------

``remote_allocator_heartbeat_interval``
    Heartbeat interval for remote allocator.

    - **Type**: ``str`` (duration format)
    - **Default**: ``"5m"``
    - **Environment**: ``HYPERACTOR_REMOTE_ALLOCATOR_HEARTBEAT_INTERVAL``


Validation and Error Handling
-----------------------------

``configure`` and ``configured`` validate input immediately:

* Unknown keys raise ``ValueError``.
* Type mismatches raise ``TypeError`` (for example, passing a string instead
  of ``ChannelTransport`` for ``default_transport``, a non-bool to logging
  flags, or an integer instead of a string for duration parameters).
* Invalid values raise ``TypeError`` (for example, invalid encoding names,
  invalid port ranges, or malformed duration strings).
* Duration strings must follow ``humantime`` syntax; invalid strings trigger
  ``TypeError`` with a message that highlights the bad value.

Normalization
~~~~~~~~~~~~~

Duration values are normalized when read from :func:`get_global_config`. For
instance, setting ``host_spawn_ready_timeout="300s"`` yields ``"5m"`` when you
read it back. This matches the behavior exercised in
``monarch/python/tests/test_config.py`` and helps keep logs and telemetry
consistent.


Examples
========

Basic Configuration
-------------------

.. code-block:: python

    from monarch.config import configure, get_global_config

    # Set configuration values
    configure(enable_log_forwarding=True, tail_log_lines=100)

    # Read current configuration
    config = get_global_config()
    print(config["enable_log_forwarding"])  # True
    print(config["tail_log_lines"])  # 100

Temporary Configuration (Testing)
----------------------------------

.. code-block:: python

    from monarch.config import configured

    def test_with_custom_config():
        # Configuration is scoped to this context
        with configured(
            enable_log_forwarding=True,
            message_delivery_timeout="1m"
        ) as config:
            # Config is active here
            assert config["enable_log_forwarding"] is True

        # Config is automatically restored after the context

Nested Overrides
----------------

.. code-block:: python

    from monarch._rust_bindings.monarch_hyperactor.channel import (
        ChannelTransport,
    )
    from monarch.config import configured

    with configured(default_transport=ChannelTransport.TcpWithLocalhost):
        # Inner config overrides logging knobs only; default_transport
        # stays put.
        with configured(
            enable_log_forwarding=True,
            tail_log_lines=50,
        ) as config:
            assert (
                config["default_transport"]
                == ChannelTransport.TcpWithLocalhost
            )
            assert config["enable_log_forwarding"]

    # After both contexts exit the process is back to the previous settings.

Duration Formats
----------------

.. code-block:: python

    from monarch.config import configured

    # Various duration formats are supported
    with configured(
        message_delivery_timeout="90s",        # 1m 30s
        host_spawn_ready_timeout="5m",         # 5 minutes
        mesh_proc_spawn_max_idle="1h 30m",     # 1 hour 30 minutes
    ):
        # Timeouts are active
        pass

Environment Variable Override
------------------------------

Configuration can also be set via environment variables:

.. code-block:: bash

    # Set codec max frame length to 100 GiB
    export HYPERACTOR_CODEC_MAX_FRAME_LENGTH=107374182400

    # Enable log forwarding
    export HYPERACTOR_MESH_ENABLE_LOG_FORWARDING=true

    # Set message delivery timeout to 5 minutes
    export HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT=5m

Environment variables are read during initialization and can be overridden
programmatically.


See Also
========

- :doc:`../generated/examples/getting_started` - Getting started guide
- :doc:`monarch.actor` - Actor API documentation