Logging#

Monarch v1’s logging subsystem streams stdout/stderr from remote procs back to the client and lets Python control log delivery and levels. This section is written top-down: start with the big picture, then dive into each component.

What’s in this section#

Overview — Python kickoff → Rust actors: how ProcMesh boots logging, what LoggingManager does, and what LoggingMeshClient.spawn(...) creates.
Forwarder internals — LogForwardActor, BOOTSTRAP_LOG_CHANNEL, streaming vs. silent mode, and the versioned sync-flush path.
Stream forwarders — StreamFwder, tee, FileAppender, RotatingLineBuffer; how raw bytes become lines sent to forwarders/files.
Client actor — LogClientActor aggregation windows, similarity bucketing, flush barriers, and teardown.
Python control surface — logging_option(...), flush(), IPython cell-end flushers, FD capture.
Config & env — Tunables like HYPERACTOR_READ_LOG_BUFFER, HYPERACTOR_FORCE_FILE_LOG, HYPERACTOR_PREFIX_WITH_RANK, defaults.
Ordering — what is guaranteed (and what isn’t)
Teardown — barrier-before-stop, EOF handling, drop paths
File aggregation — per-proc files on bootstrap hosts

Quick mental model#

Three moving parts: a client-side coordinator (LogClientActor) and two per-proc meshes (LogForwardActor (optional), LoggerRuntimeActor).
Two planes: raw FD streams (stdout/stderr) → forwarders (if enabled); and Python logging (levels/handlers) → logger runtime.
Barriers: versioned sync flush guarantees all logs up to a point have been delivered.
Conditional forwarding: The LogForwardActor mesh is only spawned if MESH_ENABLE_LOG_FORWARDING is true; otherwise logs stay local.

Quickstart (Python)#

pm = host_mesh.spawn_procs(per_host={"gpus": 1})
await pm.logging_option(
    stream_to_client=True,
    aggregate_window_sec=3,
    level=logging.INFO,
)
# …run workload; logs stream back…

await pm.stop()  # does a blocking flush before teardown