Rate this Page

★ ★ ★ ★ ★

Proc meshes & ProcAgent#

What the `ProcAgent` Is#

Every proc in a mesh runs a ProcAgent. It plays the same role on the proc side that the HostAgent plays on the host side: it implements the control-plane interface for “managing this proc as part of a mesh”.

The agent has several responsibilities, all of which will be documented on this page:

wiring the proc into the mesh router,
handling resource-style requests (CreateOrUpdate, Stop, GetState, GetRankStatus, …),
forwarding or recording supervision events,
tracking the lifecycle of actors created on the proc,
and supporting both the legacy v0 and the current v1 spawn APIs.

This chapter begins with the v1 “resource-style” spawn path:: how a request of the form

CreateOrUpdate<ActorSpec>

results in an actual actor being constructed inside the proc using the Remote registry.

To anchor that discussion, here is the essential shape of the agent:

pub struct ProcAgent {
    proc: Proc, // local actor runtime
    remote: Remote,  // registry of SpawnableActor entries (built from RemoteSpawn + remote!(...))
    state: State, // v0/v1 bootstrapping mode
    actor_states: HashMap<Name, ActorInstanceState>, // per-actor spawn results & metadata
    record_supervision_events: bool,
    supervision_events: HashMap<ActorId, Vec<ActorSupervisionEvent>>,
}

proc: Proc The proc-local runtime into which new actors will be installed
remote: Remote A snapshot of the process-local registry of SpawnableActor entries (populated from RemoteSpawn impls via remote!(A)). This is the bridge between global type names and the actual constructors used by Remote::gspawn.
actor_states The agent’s bookkeeping: for each actor name in the mesh, what happened when this proc tried to spawn it.

The sections that follow walk the spawn flow end-to-end. Additional responsibilities (status, supervision, teardown) will be documented after the spawn discussion.

The V1 Spawn Flow#

At a high level, the v1 path for creating an actor on every proc looks like this:

ProcMeshRef ──(CreateOrUpdate<ActorSpec>)──▶ ProcAgent mesh
                      ProcAgent ──(Remote::gspawn)──▶ Proc / Remote registry

(”ProcMeshRef turns spawn::<A> into a broadcast CreateOrUpdate<ActorSpec> to the ProcAgent mesh; each ProcAgent then calls Remote::gspawn into its local Proc using the Remote registry.”)

From the caller’s point of view it starts as:

proc_mesh.spawn::<A>(cx, "name", &params).await

which is just a thin wrapper over:

proc_mesh
  .spawn_with_name::<A>(cx, Name::new("name"), &params)
  .await

The rest of this section unpacks what that call actually does.

From `spawn` to `ActorSpec`#

The real work happens in spawn_with_name_inner:

impl ProcMeshRef {
  async fn spawn_with_name_inner<A: Actor + Referable>(
      &self,
      cx: &impl context::Actor,
      name: Name,
      params: &A::Params,
  ) -> v1::Result<ActorMesh<A>>
  where
      A::Params: RemoteMessage,
  {
      let remote = Remote::collect();
        // `RemoteSpawn` + `remote!(A)` ensure that `A` has a
        // `SpawnableActor` entry in this registry, so
        // `name_of::<A>()` can resolve its global type name.
      let actor_type = remote
          .name_of::<A>()
          .ok_or(Error::ActorTypeNotRegistered(type_name::<A>().to_string()))?
          .to_string();

      let serialized_params = bincode::serialize(params)?;
      let agent_mesh = self.agent_mesh();

      agent_mesh.cast(
          cx,
          resource::CreateOrUpdate::<mesh_agent::ActorSpec> {
              name: name.clone(),
              rank: Default::default(),
              spec: mesh_agent::ActorSpec {
                  actor_type: actor_type.clone(),
                  params_data: serialized_params.clone(),
              },
          },
      )?;

      // ... wait on GetRankStatus and build ActorMesh<A> ...
  }
}

What this does, step by step:

Resolve the Rust type A to a global type name
```
let remote = Remote::collect();
let actor_type = remote
    .name_of::<A>()
    .ok_or(Error::ActorTypeNotRegistered(type_name::<A>().to_string()))?
    .to_string();
```
This is the point where the type-level contract kicks in:
- elsewhere, the user has written remote!(MyActor) for each A: RemoteSpawn,
- that registration adds a SpawnableActor entry to the Remote registry,
- Remote::name_of::<A>() looks up that entry and reads its global type name.
If A was never registered with remote!(A), this call fails with ActorTypeNotRegistered, and the spawn never leaves the caller’s process.
Serialize the spawn parameters
```
let serialized_params = bincode::serialize(params)?;
```
Spawn parameters travel as opaque bytes. The API only enforces that A::Params: RemoteMessage, meaning the caller’s side can serialize them. On the remote side there is no trait bound — the generated RemoteSpawn::gspawn simply attempts to deserialize the incoming byte payload into A::Params and will return an error if it cannot.
Broadcast a resource-style CreateOrUpdate<ActorSpec>
```
let agent_mesh = self.agent_mesh();

agent_mesh.cast(
    cx,
    resource::CreateOrUpdate::<mesh_agent::ActorSpec> {
        name: name.clone(),
        rank: Default::default(),
        spec: mesh_agent::ActorSpec {
            actor_type: actor_type.clone(),
            params_data: serialized_params.clone(),
        },
    },
)?;
```
This is where the proc mesh turns a local method call into a distributed control-plane request:
- agent_mesh is an ActorMeshRef<ProcAgent> – one ProcAgent per proc,
- cast sends the same CreateOrUpdate<ActorSpec> to every ProcAgent,
- the name field is the mesh-level actor name (“this actor, on this mesh”),
- actor_type is the global type name resolved via Remote,
- params_data is the serialized A::Params.

At this point the proc mesh has done its part: it has told every proc in the mesh: “For mesh actor name, please ensure you have one local actor of type actor_type, constructed from params_data.”

How `ProcAgent` handles `CreateOrUpdate<ActorSpec>`#

Once the ProcMeshRef has broadcast a CreateOrUpdate<ActorSpec> to every proc, each proc’s ProcAgent receives that message and attempts to construct the actor locally.

The entry point is:

#[async_trait]
impl Handler<resource::CreateOrUpdate<ActorSpec>> for ProcAgent {
    async fn handle(
        &mut self,
        _cx: &Context<Self>,
        create_or_update: resource::CreateOrUpdate<ActorSpec>,
    ) -> anyhow::Result<()> {
         ...
    }
}

This handler performs four steps:

Idempotence: only the first CreateOrUpdate matters

if self.actor_states.contains_key(&create_or_update.name) {
    // There is no update.
    return Ok(());
}

The CreateOrUpdate resource verb supports “update” in principle, but actor meshes never update an existing actor by name. They only create a fresh actor mesh.

So the agent simply ignores subsequent requests for the same name.

Safety check: reject spawn if the proc has supervision errors

if !self.supervision_events.is_empty() {
    self.actor_states.insert(
        create_or_update.name.clone(),
        ActorInstanceState {
            spawn: Err(anyhow::anyhow!(
                "Cannot spawn new actors on mesh with supervision events"
            )),
            create_rank,
            stopped: false,
        },
    );
    return Ok(());
}

If this proc previously recorded any supervision events for any actor, the proc is considered “poisoned”: it may be in a bad state, and spawning new actors would be unsafe.

The agent records the failure in actor_states and stops.

Later, when the ProcMesh::spawn_with_name_inner calls GetRankStatus::wait to aggregate per-rank results, this proc will contribute a Failed status for that actor name instead of ever reporting it as Running.

Unpack ActorSpec and call remote.gspawn

let ActorSpec {
    actor_type,
    params_data,
} = create_or_update.spec;

self.actor_states.insert(
    create_or_update.name.clone(),
    ActorInstanceState {
        create_rank,
        spawn: self
            .remote
            .gspawn(
                &self.proc,
                &actor_type,
                &create_or_update.name.to_string(),
                params_data,
            )
            .await,
        stopped: false,
    },
);

This is the core of v1 spawning. The agent:

unpacks the ActorSpec (type name + parameter bytes), and
passes those pieces into remote.gspawn(...) to construct the local actor.
actor_type: String – the logical type name registered by remote!(A), computed in ProcMeshRef::spawn_with_name_inner via remote.name_of::<A>(), and used by Remote::gspawn on each proc to find the right constructor.
params_data: Data A raw byte buffer containing serialized A::Params (via bincode::serialize).
self.remote.gspawn(...) This method looks up the SpawnableActor entry for actor_type in the local Remote registry then invoks:

SpawnableActor::spawn(proc, name, params_data)

Internally this calls the actor’s RemoteSpawn::new(params).await construtor registers it under the given name in the proc’s runtime, and returns an ActorId.

The result – success or failure – is recorded in:

ActorInstanceState {
    create_rank,     // this proc's rank in the mesh
    spawn: Result<ActorId, anyhow::Error>,
    stopped: false,
}

The actor_states map is later queried by GetRankStatus and GetState.

Return success locally (no direct reply)

Once the agent has updated actor_states, the handler simply returns:

Ok(())

There is no direct reply back to the caller for CreateOrUpdate<ActorSpec>.

From the agent’s point of view, the work for the message is:

decide whether to attempt a spawn (idempotence + supervision gate),
call remote.gspawn(...) into the local Proc,
record the outcome in actor_states[name] as ActorInstanceState.

That’s it. The handler does not try to decide whether the mesh-level spawn “succeeded” or “failed” - it just persists the per-proc result.

Those per-proc results are later read by the resource query handlers (Handler<GetRankStatus>, Handler<GetState<ActorState>> on ProcAgent).

Completing the Spawn: How `GetRankStatus` Decides Success#

Once every ProcAgent has received the CreateOrUpdate<ActorSpec> message and updated its local actor_states, the caller still does not know:

Did every proc spawn the actor successfully?
Did any proc report a supervision failure?
Are all actors running, or did one terminate immediately?

To answer these questions, the ProcMeshRef performs a second distributed query using the resource verb:

resource::GetRankStatus{ name, reply }

This message is broadcast to the same ProcAgent mesh. Each agent replies with a small “overlay” describing its result for that actor name:

no entry yet -> NotExist
spawn failed -> Failed(error)
spawned and running -> Running
terminated -> Stopped/Failed,
supervision events present -> Failed.

The reply port used to collect all GetRankStatus responses is opened via:

let (port, rx) = cx.mailbox().open_accum_port_opts(
    StatusMesh::from_single(region.clone(), Status::NotExist),
    Some(ReducerOpts { max_update_interval: Some(Duration::from_millis(50)) }),
);

Here, cx is the callers context. In tests this is typically testing::instance(), a tiny driver actor (Instance<>()), so the accumulation port (port/rx)-and thus all collected replies-live in that test instance’s mailbox.

An accumulation port is just a mailbox port that keeps a running aggregate value. Each GetRankStatus reply is an overlay, and the mailbox’s reducer merges those overlays into a single StatusMesh, with one final status per proc/rank.

Proc meshes & ProcAgent#

What the ProcAgent Is#

The V1 Spawn Flow#

From spawn to ActorSpec#

How ProcAgent handles CreateOrUpdate<ActorSpec>#

Completing the Spawn: How GetRankStatus Decides Success#

What the `ProcAgent` Is#

From `spawn` to `ActorSpec`#

How `ProcAgent` handles `CreateOrUpdate<ActorSpec>`#

Completing the Spawn: How `GetRankStatus` Decides Success#