§5 Bootstrapping from Python#
So far we described the Rust side: there is a host, the host has a HostAgent, and we send CreateOrUpdate<ProcSpec> etc. That’s the control plane.
Most users won’t do that by hand — they’ll write Python like this:
import asyncio
from monarch._src.actor.host_mesh import this_host
from monarch._src.actor.proc_mesh import ProcMesh # Optional, for typing
from monarch._src.actor.actor import Actor
from monarch._src.actor.endpoint import endpoint

class Counter(Actor):
    ...

def train_with_mesh():
    mesh = this_host().spawn_procs(per_host={"gpus": 2})
    counter = mesh.spawn("counter", Counter, 1)
    ...
Getting a host in Python (this_host() → this_proc() → context())#
When you write code like:
from monarch._src.actor.host_mesh import this_host
host = this_host()
a bootstrap sequence runs underneath it. Here’s what actually happens.
1. this_host() reads the host mesh off the current proc#
From monarch/_src/actor/host_mesh.py:
def this_host() -> "HostMesh":
    """
    The current machine.
    This is just shorthand for looking it up via the context
    """
    return this_proc().host_mesh
So: this_host() doesn’t build a host. That means we have to look at this_proc().
2. this_proc() pulls the proc mesh off the current context#
From the same file:
def this_proc() -> "ProcMesh":
    """
    The current singleton process that this specific actor is
    running on
    """
    return context().actor_instance.proc
So now we’re down to the real root: context(). Everything hangs off of that.
3. context() — create (once) or return (later) the runtime context#
From monarch/_src/actor/actor_mesh.py:
_context: contextvars.ContextVar[Context] = contextvars.ContextVar(
    "monarch.actor_mesh._context"
)
and:
def context() -> Context:
    c = _context.get(None)
    if c is None:
        c = Context._root_client_context()  # (1) ask Rust for a bare context
        _context.set(c)
        from monarch._src.actor.host_mesh import create_local_host_mesh
        from monarch._src.actor.proc_mesh import _get_controller_controller
        c.actor_instance.proc_mesh = _root_proc_mesh.get()  # (2) give it a proc mesh
        _this_host_for_fake_in_process_host.get()  # (3) make sure a host exists
        c.actor_instance._controller_controller = _get_controller_controller()[1]  # (4) wire control plane
    return c
So the logic is:
First call: no context yet → build one.
Later calls: return the same one from the ContextVar.
The interesting part is step (1) above — Context._root_client_context() — because that’s where Python hands off to Rust.
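The create-once, return-later behaviour can be sketched with the standard library alone. This is a minimal illustration, not Monarch's code: the `Context` class here is a hypothetical stand-in for the Rust-backed object, and its `created` counter exists only to show that the expensive path runs once.

```python
import contextvars


class Context:
    """Hypothetical stand-in for the Rust-backed runtime context."""

    created = 0  # how many times the "expensive" bootstrap has run

    def __init__(self) -> None:
        Context.created += 1


# No default is given, so get(None) returns None until something is set.
_context = contextvars.ContextVar("sketch._context")


def context() -> Context:
    c = _context.get(None)
    if c is None:
        # First call in this context: build the object and remember it.
        c = Context()
        _context.set(c)
    # Later calls: return the stored object unchanged.
    return c


first = context()
second = context()
assert first is second
assert Context.created == 1
```

Because the value lives in a `ContextVar`, "the same one" means the same object within the current execution context.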
4. What Context._root_client_context() does (Rust side)#
The Rust in context.rs:
#[staticmethod]
fn _root_client_context(py: Python<'_>) -> PyResult<PyContext> {
    let _guard = runtime::get_tokio_runtime().enter();
    let instance: PyInstance = global_root_client().into();
    Ok(PyContext {
        instance: instance.into_pyobject(py)?.into(),
        rank: Extent::unity().point_of_rank(0).unwrap(),
    })
}
What matters is the call to global_root_client(). That function, on the Rust side, basically does this:
pub fn global_root_client() -> &'static Instance<()> {
    static GLOBAL_INSTANCE: OnceLock<(Instance<()>, ActorHandle<()>)> = OnceLock::new();
    &GLOBAL_INSTANCE.get_or_init(|| {
        // 1. Make a direct proc for the client to live in.
        let client_proc = Proc::direct_with_default(
            ChannelAddr::any(default_transport()),
            "mesh_root_client_proc".into(),
            router::global().clone().boxed(),
        ).unwrap();
        // 2. Register that proc in the *global* router so messages can reach it.
        router::global().bind(
            client_proc.proc_id().clone().into(),
            client_proc.clone(),
        );
        // 3. Start an actual actor instance in that proc, called "client".
        let (client, handle) = client_proc.instance("client").expect("root instance create");
        (client, handle)
    }).0
}
So when _root_client_context() runs, it is really:
Ensuring there is a single, global, direct-addressed proc called “mesh_root_client_proc”.
Putting that proc in the global router.
Spawning a “client” actor in it.
Wrapping that actor as a Python PyContext and giving it rank 0.
Notice what it doesn’t do: it does not attach a proc mesh or a host mesh. Those Python-only fields are still None at this point.
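For readers more at home in Python, the initialize-once shape of global_root_client() can be approximated as follows. This is only an analogy for the `OnceLock::get_or_init` pattern: `ToyRouter`, `ToyProc`, and the lock-guarded global are hypothetical stand-ins, not Monarch or hyperactor APIs.

```python
import threading


class ToyRouter:
    """Hypothetical stand-in for the global router: proc id -> proc."""

    def __init__(self):
        self.routes = {}

    def bind(self, proc_id, proc):
        self.routes[proc_id] = proc


class ToyProc:
    """Hypothetical stand-in for a proc that can host named actors."""

    def __init__(self, proc_id, router):
        self.proc_id = proc_id
        self.router = router
        self.actors = {}

    def instance(self, name):
        actor = (self.proc_id, name)
        self.actors[name] = actor
        return actor


GLOBAL_ROUTER = ToyRouter()
_lock = threading.Lock()
_root_client = None


def global_root_client():
    """Build the root client on the first call; return the same one afterwards."""
    global _root_client
    with _lock:  # plays the role of OnceLock: only one initializer ever runs
        if _root_client is None:
            # 1. Make a proc for the client to live in.
            proc = ToyProc("mesh_root_client_proc", GLOBAL_ROUTER)
            # 2. Register it in the global router so messages can reach it.
            GLOBAL_ROUTER.bind(proc.proc_id, proc)
            # 3. Start the "client" actor in that proc.
            _root_client = proc.instance("client")
        return _root_client


a = global_root_client()
b = global_root_client()
assert a is b
assert list(GLOBAL_ROUTER.routes) == ["mesh_root_client_proc"]
```

Calling the function twice yields the same client and leaves exactly one route in the router, which is the property the Rust code gets from `OnceLock`.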
5. Python fills in the missing pieces#
That’s why, back in Python, right after calling the Rust function, we do three extra things:
c.actor_instance.proc_mesh = _root_proc_mesh.get()
_this_host_for_fake_in_process_host.get()
c.actor_instance._controller_controller = _get_controller_controller()[1]
Here’s what each does:
_root_proc_mesh: _Lazy["ProcMesh"] = _Lazy(_init_root_proc_mesh)
Defined as:
def _init_root_proc_mesh() -> "ProcMesh":
    from monarch._src.actor.host_mesh import fake_in_process_host
    return fake_in_process_host()._spawn_nonblocking(
        name="root_client_proc_mesh",
        per_host=Extent([], []),
        setup=None,
        _attach_controller_controller=False,
    )
So this:
makes a fake in-process host,
spawns one proc on it,
stores that proc mesh as context().actor_instance.proc_mesh.
Later, when you call this_proc() (which reads context().actor_instance.proc), you’re really just getting a slice of that stored proc_mesh.
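The lazy wrapper used for `_root_proc_mesh` can be pictured roughly like this. The `Lazy` class below is an assumption about the general shape of such a helper (run the initializer on the first `get()`, cache the result); Monarch's actual `_Lazy` may differ, for example in how it handles locking. The initializer and its return value are illustrative only.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Hypothetical stand-in for _Lazy: initialize on first get(), then cache."""

    def __init__(self, init: Callable[[], T]) -> None:
        self._init = init
        self._value: Optional[T] = None
        self._ready = False

    def get(self) -> T:
        if not self._ready:
            self._value = self._init()
            self._ready = True
        return self._value


calls = []


def _init_root_proc_mesh_sketch() -> str:
    # Stands in for "make a fake in-process host and spawn one proc on it".
    calls.append("init")
    return "root_client_proc_mesh"


root = Lazy(_init_root_proc_mesh_sketch)
assert calls == []  # nothing runs at definition time
assert root.get() == "root_client_proc_mesh"
assert root.get() == "root_client_proc_mesh"
assert calls == ["init"]  # the initializer ran exactly once
```

Deferring the work to the first `get()` is what lets the module define `_root_proc_mesh` at import time without spawning anything until `context()` is first called.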
_this_host_for_fake_in_process_host: _Lazy["HostMesh"] = _Lazy(_init_this_host_for_fake_in_process_host)
Defined as:
def _init_this_host_for_fake_in_process_host() -> "HostMesh":
    from monarch._src.actor.host_mesh import create_local_host_mesh
    return create_local_host_mesh()
This is the lazy “make me a host mesh” step. It just calls create_local_host_mesh(...) from the v1 Python bindings.
We get into what that does in detail in “Python create_local_host_mesh and Rust bootstrap” (§ below), so here we just say:
this line is what actually spins up the local v1 host mesh using the same Rust path as the canonical bootstrap.
_get_controller_controller()[1]
And we stash the control-plane actor into c.actor_instance._controller_controller so later spawns have somewhere to go. We aren’t going to unpack that here.
Now this_proc() / this_host() work
After that first context() run:
context().actor_instance.proc is set, so this_proc() returns a real ProcMesh.
The proc mesh you get (context().actor_instance.proc) was created from a host mesh, so it already carries a host_mesh reference. That’s why this_host() can just do this_proc().host_mesh.
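The back-reference that makes `this_host()` a one-liner can be modelled with two toy classes. Everything here is hypothetical (`ToyHostMesh`, `ToyProcMesh`, and the simplified `spawn_procs(name)` signature); the real classes carry far more state, and the real `spawn_procs` takes a `per_host` argument.

```python
class ToyProcMesh:
    """Hypothetical proc mesh that remembers which host mesh created it."""

    def __init__(self, name, host_mesh):
        self.name = name
        self.host_mesh = host_mesh


class ToyHostMesh:
    """Hypothetical host mesh; every proc mesh it spawns points back at it."""

    def __init__(self, name):
        self.name = name

    def spawn_procs(self, name):
        return ToyProcMesh(name, host_mesh=self)


# Stands in for the proc mesh that the context() bootstrap stores.
_root_proc = ToyHostMesh("local_host").spawn_procs("root_client_proc_mesh")


def this_proc():
    return _root_proc


def this_host():
    # No host is built here; it is read off the current proc.
    return this_proc().host_mesh


assert this_host().name == "local_host"
workers = this_host().spawn_procs("workers")
assert workers.host_mesh is this_host()
```

Any proc mesh spawned from `this_host()` ends up pointing at the same host mesh object, so the lookup chain stays consistent.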
So the original Python snippet:
mesh = this_host().spawn_procs(per_host={"gpus": 2})
counter = mesh.spawn("counter", Counter, 1)
works because:
this_host() → returns the HostMesh that Python created during the context() bootstrap
spawn_procs(...) → asks that host mesh (which is powered by the Rust v1 host mesh) to create procs
mesh.spawn(...) → now that you have a ProcMesh, you can put actors on it
Python create_local_host_mesh and Rust bootstrap#
This note shows that calling create_local_host_mesh(...) in Python ends up driving the same Rust v1 host/agent/bootstrap path we described for the canonical Rust example.
1. Python entry point#
def create_local_host_mesh(
    extent: Optional[Extent] = None, env: Optional[Dict[str, str]] = None
) -> "HostMesh":
    cmd, args, bootstrap_env = _get_bootstrap_args()
    if env is not None:
        bootstrap_env.update(env)
    return HostMesh.allocate_nonblocking(
        "local_host",
        extent if extent is not None else Extent([], []),
        ProcessAllocator(cmd, args, bootstrap_env),
        bootstrap_cmd=_bootstrap_cmd(),
    )
_get_bootstrap_args() = “what command/env do we use to start a hyperactor proc?”
we wrap that in a ProcessAllocator(...)
we tell the Rust side to allocate_nonblocking(...) a v1 HostMesh using that allocator.
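The environment handling in the entry point comes down to a dictionary merge in which caller-supplied values win. The sketch below isolates that step. `get_bootstrap_args_sketch` and the values it returns are made up for illustration and say nothing about what `_get_bootstrap_args()` really returns.

```python
from typing import Dict, List, Optional, Tuple


def get_bootstrap_args_sketch() -> Tuple[str, List[str], Dict[str, str]]:
    # Hypothetical command, arguments, and default environment.
    return "python", ["-m", "sketch_bootstrap"], {"LOG_LEVEL": "info", "MODE": "local"}


def build_launch_spec(
    env: Optional[Dict[str, str]] = None,
) -> Tuple[str, List[str], Dict[str, str]]:
    cmd, args, bootstrap_env = get_bootstrap_args_sketch()
    if env is not None:
        # Caller-supplied entries override defaults and add new keys.
        bootstrap_env.update(env)
    return cmd, args, bootstrap_env


_, _, default_env = build_launch_spec()
_, _, custom_env = build_launch_spec({"LOG_LEVEL": "debug", "EXTRA": "1"})

assert default_env == {"LOG_LEVEL": "info", "MODE": "local"}
assert custom_env == {"LOG_LEVEL": "debug", "MODE": "local", "EXTRA": "1"}
```

Passing `env=None` leaves the defaults untouched, which mirrors the `if env is not None` guard in the entry point.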
2. Hand-off to Rust#
The Python classmethod does:
await HyHostMesh.allocate_nonblocking(
    context().actor_instance._as_rust(),
    await alloc._hy_alloc,
    name,
    bootstrap_cmd,
)
It passes the allocation and (optionally) the bootstrap command straight to the Rust v1 HostMesh::allocate(...), via the PyHostMesh::allocate_nonblocking(...) binding. That’s the same Rust entry point the canonical bootstrap uses — just exposed to Python.
#[pymethods]
impl PyHostMesh {
    #[classmethod]
    fn allocate_nonblocking(
        _cls: &Bound<'_, PyType>,
        instance: &PyInstance,
        alloc: &mut PyAlloc,
        name: String,
        bootstrap_params: Option<PyBootstrapCommand>,
    ) -> PyResult<PyPythonTask> {
        let bootstrap_params =
            bootstrap_params.map_or_else(|| alloc.bootstrap_command.clone(), |b| Some(b.to_rust()));
        let alloc = match alloc.take() {
            Some(alloc) => alloc,
            None => {
                return Err(PyException::new_err(
                    "Alloc object already used".to_string(),
                ));
            }
        };
        let instance = instance.clone();
        PyPythonTask::new(async move {
            let mesh = instance_dispatch!(instance, async move |cx_instance| {
                HostMesh::allocate(cx_instance, alloc, &name, bootstrap_params).await
            })
            .map_err(|err| PyException::new_err(err.to_string()))?;
            Ok(Self::new_owned(mesh))
        })
    }
}
(This returns a Python task because all v1 Python bindings wrap Rust async in a small bridge. See Appendix: Python async bridge (pytokio).)
HostMesh::allocate(...) is the entry point that stands up the host, creates its system proc, spawns the HostAgent, and makes it reachable — it’s the same path we used in the Rust canonical example.
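Two behaviours of the binding above are easy to miss. The alloc is consumed synchronously, before any async work starts, and the call hands back something awaitable rather than a finished mesh. A rough pure-Python model of both follows, using `asyncio` in place of the pytokio bridge. `OneShotAlloc` and this `allocate_nonblocking` are hypothetical stand-ins, not the real bindings.

```python
import asyncio


class OneShotAlloc:
    """Hypothetical alloc wrapper whose payload can be taken exactly once."""

    def __init__(self, resource):
        self._resource = resource

    def take(self):
        # Hand out the payload and leave None behind, like Option::take in Rust.
        resource, self._resource = self._resource, None
        return resource


def allocate_nonblocking(alloc, name):
    # The take happens eagerly, so reuse fails at call time, not at await time.
    resource = alloc.take()
    if resource is None:
        raise RuntimeError("Alloc object already used")

    async def _allocate():
        await asyncio.sleep(0)  # stands in for the real async allocation
        return f"{name}:{resource}"

    return _allocate()  # an awaitable, playing the role of the Python task


alloc = OneShotAlloc("local-procs")
mesh = asyncio.run(allocate_nonblocking(alloc, "local_host"))
assert mesh == "local_host:local-procs"

reused_error = None
try:
    allocate_nonblocking(alloc, "again")
except RuntimeError as err:
    reused_error = str(err)
assert reused_error == "Alloc object already used"
```

Taking the alloc up front means a second allocation attempt with the same object is rejected immediately instead of surfacing later inside the awaited task.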