# §5 Bootstrapping from Python So far we described the Rust side: there is a host, the host has a `HostAgent`, and we send `CreateOrUpdate` etc. That's the control plane. Most users won't do that by hand — they'll write Python like this: ```python import asyncio from monarch._src.actor.host_mesh import this_host from monarch._src.actor.proc_mesh import ProcMesh # Optional, for typing from monarch._src.actor.actor import Actor from monarch._src.actor.endpoint import endpoint class Counter(Actor): ... def train_with_mesh(): mesh = this_host().spawn_procs(per_host={"gpus": 2}) counter = mesh.spawn("counter", Counter, 1) ... ``` ## Getting a host in Python (`this_host()` → `this_proc()` → `context()`) When you write code like: ```python from monarch._src.actor.host_mesh import this_host host = this_host() ``` there's a bootstrap under it. Here's what actually happens. ### 1. `this_host()` reads the host mesh off the current proc. From monarch/\_src/actor/host_mesh.py: ```python def this_host() -> "HostMesh": """ The current machine. This is just shorthand for looking it up via the context """ return this_proc().host_mesh ``` So: `this_host()` doesn't build a host. That means we have to look at `this_proc()`. ## 2. `this_proc()` pulls the proc mesh off the current context From the same file: ```python def this_proc() -> "ProcMesh": """ The current singleton process that this specific actor is running on """ return context().actor_instance.proc ``` So now we're down to the real root: `context()`. Everything hangs off of that. ### 3. `context()` — create (once) or return (later) the runtime context From monarch/\_src/actor/actor_mesh.py: ```python _context: contextvars.ContextVar[Context] = contextvars.ContextVar( "monarch.actor_mesh._context" ) ``` and: ```python def context() -> Context: c = _context.get(None) if c is None: c = Context._root_client_context() # (1) ask Rust for a bare context _context.set(c) from monarch._src.actor.host_mesh import create_local_host_mesh from monarch._src.actor.proc_mesh import _get_controller_controller c.actor_instance.proc_mesh = _root_proc_mesh.get() # (2) give it a proc mesh _this_host_for_fake_in_process_host.get() # (3) make sure a host exists c.actor_instance._controller_controller = _get_controller_controller()[1] # (4) wire control plane return c ``` So the logic is: 1. First call: no context yet → build one. 2. Later calls: return the same one from the ContextVar. The interesting part is step (1) above — `Context._root_client_context()` — because that's where Python hands off to Rust. ### 4. What `Context._root_client_context()` does (Rust side) The Rust in context.rs: ```rust #[staticmethod] fn _root_client_context(py: Python<'_>) -> PyResult { let _guard = runtime::get_tokio_runtime().enter(); let instance: PyInstance = global_root_client().into(); Ok(PyContext { instance: instance.into_pyobject(py)?.into(), rank: Extent::unity().point_of_rank(0).unwrap(), }) } ``` What matters is the call to `global_root_client()`. That function, on the Rust side, basically does this: ```rust pub fn global_root_client() -> &'static Instance<()> { static GLOBAL_INSTANCE: OnceLock<(Instance<()>, ActorHandle<()>)> = OnceLock::new(); &GLOBAL_INSTANCE.get_or_init(|| { // 1. Make a direct proc for the client to live in. let client_proc = Proc::direct_with_default( ChannelAddr::any(default_transport()), "mesh_root_client_proc".into(), router::global().clone().boxed(), ).unwrap(); // 2. Register that proc in the *global* router so messages can reach it. router::global().bind( client_proc.proc_id().clone().into(), client_proc.clone(), ); // 3. Start an actual actor instance in that proc, called "client". let (client, handle) = client_proc.instance("client").expect("root instance create"); (client, handle) }).0 } ``` So when `_root_client_context()` runs, it is really: 1. Ensuring there is a single, global, direct-addressed proc called "`mesh_root_client_proc`". 2. Putting that proc in the global router. 3. Spawning a "client" actor in it. 4. Wrapping that actor as a Python `PyContext` and giving it rank 0. Notice what it doesn't do: it does not attach a proc mesh or a host mesh. Those Python-only fields are still `None` at this point. ### 5. Python fills in the missing pieces That's why, back in Python, right after calling the Rust function, we do three extra things: ```python c.actor_instance.proc_mesh = _root_proc_mesh.get() _this_host_for_fake_in_process_host.get() c.actor_instance._controller_controller = _get_controller_controller()[1] ``` Here's what each does: 1. `_root_proc_mesh: _Lazy["ProcMesh"] = _Lazy(_init_root_proc_mesh)` Defined as: ```python def _init_root_proc_mesh() -> "ProcMesh": from monarch._src.actor.host_mesh import fake_in_process_host return fake_in_process_host()._spawn_nonblocking( name="root_client_proc_mesh", per_host=Extent([], []), setup=None, _attach_controller_controller=False, ) ``` So this: - makes a fake in-process host, - spawns one proc on it, - that proc mesh is stored as `context().actor_instance.proc_mesh`. Later, when you call `this_proc()` (which reads `context().actor_instance.proc`), you're really just getting a slice of that stored `proc_mesh`. 2. `_this_host_for_fake_in_process_host: _Lazy["HostMesh"] = _Lazy(_init_this_host_for_fake_in_process_host)` Defined as: ```python def _init_this_host_for_fake_in_process_host() -> "HostMesh": from monarch._src.actor.host_mesh import create_local_host_mesh return create_local_host_mesh() ``` This is the lazy "make me a host mesh" step. It just calls `create_local_host_mesh(...)` from the v1 Python bindings. We get into what that does in detail in **"Python `create_local_host_mesh` and Rust bootstrap"** (§ below), so here we just say: this line is what actually spins up the local v1 host mesh using the same Rust path as the canonical bootstrap. 3. `_get_controller_controller()[1]` And we stash the control-plane actor into `c.actor_instance._controller_controller` so later spawns have somewhere to go. We aren't going to unpack that here. 6. Now `this_proc()` / `this_host()` work After that first `context()` run: - `context().actor_instance.proc` is set → so `this_proc()` returns a real `ProcMesh` - after the first `context()` run, the proc mesh you get (`context().actor_instance.proc`) was created from a host mesh, so it already carries a `host_mesh` reference — that’s why `this_host()` can just do `this_proc().host_mesh`. So the original Python snippet: ```python mesh = this_host().spawn_procs(per_host={"gpus": 2}) counter = mesh.spawn("counter", Counter, 1) ``` works because: 1. `this_host()` → got a `HostMesh` that Python created during `context()` bootstrap 2. `spawn_procs(...)` → asks that host mesh (which is powered by the Rust v1 host mesh) to create procs 3. `mesh.spawn(...)` → now that you have a `ProcMesh`, you can put actors on it # Python `create_local_host_mesh` and Rust bootstrap This note shows that calling `create_local_host_mesh(...)` in Python ends up driving the same Rust v1 host/agent/bootstrap path we described for the canonical Rust example. ## 1. Python entry point ```python def create_local_host_mesh( extent: Optional[Extent] = None, env: Optional[Dict[str, str]] = None ) -> "HostMesh": cmd, args, bootstrap_env = _get_bootstrap_args() if env is not None: bootstrap_env.update(env) return HostMesh.allocate_nonblocking( "local_host", extent if extent is not None else Extent([], []), ProcessAllocator(cmd, args, bootstrap_env), bootstrap_cmd=_bootstrap_cmd(), ) ``` - `_get_bootstrap_args()` = "what command/env do we use to start a hyperactor proc?" - we wrap that in a `ProcessAllocator(...)` - we tell the Rust side to `allocate_nonblocking(...)` a v1 HostMesh using that allocator. ## 2. Hand-off to Rust The Python classmethod does: ```python await HyHostMesh.allocate_nonblocking( context().actor_instance._as_rust(), await alloc._hy_alloc, name, bootstrap_cmd, ) ``` It passes the allocation and (optionally) the bootstrap command straight to the Rust v1 `HostMesh::allocate(...)`, via the `PyHostMesh::allocate_nonblocking(...)` binding. That's the same Rust entry point the canonical bootstrap uses — just exposed to Python. ```rust #[pymethods] impl PyHostMesh { #[classmethod] fn allocate_nonblocking( _cls: &Bound<'_, PyType>, instance: &PyInstance, alloc: &mut PyAlloc, name: String, bootstrap_params: Option, ) -> PyResult { let bootstrap_params = bootstrap_params.map_or_else(|| alloc.bootstrap_command.clone(), |b| Some(b.to_rust())); let alloc = match alloc.take() { Some(alloc) => alloc, None => { return Err(PyException::new_err( "Alloc object already used".to_string(), )); } }; let instance = instance.clone(); PyPythonTask::new(async move { let mesh = instance_dispatch!(instance, async move |cx_instance| { HostMesh::allocate(cx_instance, alloc, &name, bootstrap_params).await }) .map_err(|err| PyException::new_err(err.to_string()))?; Ok(Self::new_owned(mesh)) }) } } ``` (This returns a Python task because all v1 Python bindings wrap Rust async in a small bridge. See Appendix: **Python async bridge (pytokio)**.) `HostMesh::allocate(...)` is the entry point that stands up the host, creates its system proc, spawns the `HostAgent`, and makes it reachable — it's the same path we used in the Rust canonical example.