
Proc meshes & ProcAgent#

What the ProcAgent Is#

Every proc in a mesh runs a ProcAgent. It plays the same role on the proc side that the HostAgent plays on the host side: it implements the control-plane interface for “managing this proc as part of a mesh”.

The agent has several responsibilities, all of which will be documented on this page:

  • wiring the proc into the mesh router,

  • handling resource-style requests (CreateOrUpdate, Stop, GetState, GetRankStatus, …),

  • forwarding or recording supervision events,

  • tracking the lifecycle of actors created on the proc,

  • and supporting both the legacy v0 and the current v1 spawn APIs.

This chapter begins with the v1 “resource-style” spawn path: how a request of the form

CreateOrUpdate<ActorSpec>

results in an actual actor being constructed inside the proc using the Remote registry.

To anchor that discussion, here is the essential shape of the agent:

pub struct ProcAgent {
    proc: Proc, // local actor runtime
    remote: Remote,  // registry of SpawnableActor entries (built from RemoteSpawn + remote!(...))
    state: State, // v0/v1 bootstrapping mode
    actor_states: HashMap<Name, ActorInstanceState>, // per-actor spawn results & metadata
    record_supervision_events: bool,
    supervision_events: HashMap<ActorId, Vec<ActorSupervisionEvent>>,
}

  • proc: Proc. The proc-local runtime into which new actors are installed.

  • remote: Remote. A snapshot of the process-local registry of SpawnableActor entries (populated from RemoteSpawn impls via remote!(A)). This is the bridge between global type names and the actual constructors used by Remote::gspawn.

  • actor_states. The agent’s bookkeeping: for each actor name in the mesh, what happened when this proc tried to spawn it.

The sections that follow walk the spawn flow end-to-end. Additional responsibilities (status, supervision, teardown) will be documented after the spawn discussion.
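The bridging role of the remote field can be pictured with a toy registry. This is a minimal sketch, not the actual Remote implementation: ToyRemote, ToyActor, Ctor, Counter and the name "toy::Counter" are all hypothetical, and the real registry is populated by remote!(...) rather than by an explicit register call.

```rust
use std::any::TypeId;
use std::collections::HashMap;

/// Hypothetical minimal actor interface, used only in this sketch.
trait ToyActor {
    fn describe(&self) -> String;
}

/// Hypothetical constructor signature: parameter bytes in, actor or error out.
type Ctor = fn(&[u8]) -> Result<Box<dyn ToyActor>, String>;

/// Toy stand-in for `Remote` (assumption: the real type is richer).
#[derive(Default)]
struct ToyRemote {
    names: HashMap<TypeId, &'static str>, // Rust type -> global type name
    ctors: HashMap<&'static str, Ctor>,   // global type name -> constructor
}

impl ToyRemote {
    /// Plays the role that `remote!(A)` plays in the text.
    fn register<A: 'static>(&mut self, global_name: &'static str, ctor: Ctor) {
        self.names.insert(TypeId::of::<A>(), global_name);
        self.ctors.insert(global_name, ctor);
    }

    /// Type -> global name: the direction used on the calling side.
    fn name_of<A: 'static>(&self) -> Option<&'static str> {
        self.names.get(&TypeId::of::<A>()).copied()
    }

    /// Global name -> constructor call: the direction used on the receiving side.
    fn construct(&self, actor_type: &str, params: &[u8]) -> Result<Box<dyn ToyActor>, String> {
        let ctor = self
            .ctors
            .get(actor_type)
            .ok_or_else(|| format!("actor type not registered: {actor_type}"))?;
        ctor(params)
    }
}

/// Illustrative actor whose only parameter is a single byte.
struct Counter {
    start: u8,
}

impl ToyActor for Counter {
    fn describe(&self) -> String {
        format!("Counter(start={})", self.start)
    }
}

/// Constructor for `Counter`; fails when the parameter bytes are empty.
fn new_counter(params: &[u8]) -> Result<Box<dyn ToyActor>, String> {
    let start = *params.first().ok_or("empty params")?;
    Ok(Box::new(Counter { start }))
}

/// A type that is deliberately never registered.
struct NeverRegistered;
```

After `remote.register::<Counter>("toy::Counter", new_counter)`, `name_of::<Counter>()` yields the global name, while `name_of::<NeverRegistered>()` yields `None`, which corresponds to the ActorTypeNotRegistered case discussed later on this page.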

The V1 Spawn Flow#

At a high level, the v1 path for creating an actor on every proc looks like this:

ProcMeshRef ──(CreateOrUpdate<ActorSpec>)──▶ ProcAgent mesh
                      ProcAgent ──(Remote::gspawn)──▶ Proc / Remote registry

(“ProcMeshRef turns spawn::<A> into a broadcast CreateOrUpdate<ActorSpec> to the ProcAgent mesh; each ProcAgent then calls Remote::gspawn into its local Proc using the Remote registry.”)

From the caller’s point of view it starts as:

proc_mesh.spawn::<A>(cx, "name", &params).await

which is just a thin wrapper over:

proc_mesh
  .spawn_with_name::<A>(cx, Name::new("name"), &params)
  .await

The rest of this section unpacks what that call actually does.

From spawn to ActorSpec#

The real work happens in spawn_with_name_inner:

impl ProcMeshRef {
  async fn spawn_with_name_inner<A: Actor + Referable>(
      &self,
      cx: &impl context::Actor,
      name: Name,
      params: &A::Params,
  ) -> v1::Result<ActorMesh<A>>
  where
      A::Params: RemoteMessage,
  {
      let remote = Remote::collect();
        // `RemoteSpawn` + `remote!(A)` ensure that `A` has a
        // `SpawnableActor` entry in this registry, so
        // `name_of::<A>()` can resolve its global type name.
      let actor_type = remote
          .name_of::<A>()
          .ok_or(Error::ActorTypeNotRegistered(type_name::<A>().to_string()))?
          .to_string();

      let serialized_params = bincode::serialize(params)?;
      let agent_mesh = self.agent_mesh();

      agent_mesh.cast(
          cx,
          resource::CreateOrUpdate::<mesh_agent::ActorSpec> {
              name: name.clone(),
              rank: Default::default(),
              spec: mesh_agent::ActorSpec {
                  actor_type: actor_type.clone(),
                  params_data: serialized_params.clone(),
              },
          },
      )?;

      // ... wait on GetRankStatus and build ActorMesh<A> ...
  }
}

What this does, step by step:

  1. Resolve the Rust type A to a global type name

    let remote = Remote::collect();
    let actor_type = remote
        .name_of::<A>()
        .ok_or(Error::ActorTypeNotRegistered(type_name::<A>().to_string()))?
        .to_string();
    

    This is the point where the type-level contract kicks in:

    • elsewhere, the user has written remote!(MyActor) for each A: RemoteSpawn,

    • that registration adds a SpawnableActor entry to the Remote registry,

    • Remote::name_of::<A>() looks up that entry and reads its global type name.

    If A was never registered with remote!(A), this call fails with ActorTypeNotRegistered, and the spawn never leaves the caller’s process.

  2. Serialize the spawn parameters

    let serialized_params = bincode::serialize(params)?;
    

    Spawn parameters travel as opaque bytes. The API only enforces that A::Params: RemoteMessage, meaning the caller’s side can serialize them. On the remote side there is no trait bound — the generated RemoteSpawn::gspawn simply attempts to deserialize the incoming byte payload into A::Params and will return an error if it cannot.

  3. Broadcast a resource-style CreateOrUpdate<ActorSpec>

    let agent_mesh = self.agent_mesh();
    
    agent_mesh.cast(
        cx,
        resource::CreateOrUpdate::<mesh_agent::ActorSpec> {
            name: name.clone(),
            rank: Default::default(),
            spec: mesh_agent::ActorSpec {
                actor_type: actor_type.clone(),
                params_data: serialized_params.clone(),
            },
        },
    )?;
    

    This is where the proc mesh turns a local method call into a distributed control-plane request:

    • agent_mesh is an ActorMeshRef<ProcAgent> – one ProcAgent per proc,

    • cast sends the same CreateOrUpdate<ActorSpec> to every ProcAgent,

    • the name field is the mesh-level actor name (“this actor, on this mesh”),

    • actor_type is the global type name resolved via Remote,

    • params_data is the serialized A::Params.

At this point the proc mesh has done its part: it has told every proc in the mesh: “For mesh actor name, please ensure you have one local actor of type actor_type, constructed from params_data.”
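The caller-side steps above (opaque parameter bytes, then one identical message per proc) can be sketched in isolation. This is an illustrative model only: CounterParams, ToySpec, ToyCreateOrUpdate, encode, decode and cast are hypothetical names, the hand-rolled byte layout stands in for bincode::serialize, and a plain Vec per agent stands in for the real mailboxes.

```rust
/// Hypothetical spawn parameters for an illustrative actor.
#[derive(Debug, Clone, PartialEq)]
struct CounterParams {
    start: u32,
    label: String,
}

/// Hand-rolled encoding standing in for `bincode::serialize`:
/// 4 little-endian bytes of `start`, followed by the UTF-8 bytes of `label`.
fn encode(params: &CounterParams) -> Vec<u8> {
    let mut out = params.start.to_le_bytes().to_vec();
    out.extend_from_slice(params.label.as_bytes());
    out
}

/// The receiving side only sees bytes, so decoding is fallible.
fn decode(bytes: &[u8]) -> Result<CounterParams, String> {
    if bytes.len() < 4 {
        return Err(format!("expected at least 4 bytes, got {}", bytes.len()));
    }
    let (head, tail) = bytes.split_at(4);
    let start = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    let label = String::from_utf8(tail.to_vec()).map_err(|e| e.to_string())?;
    Ok(CounterParams { start, label })
}

/// Simplified mirror of `ActorSpec`: a type name plus opaque parameter bytes.
#[derive(Debug, Clone, PartialEq)]
struct ToySpec {
    actor_type: String,
    params_data: Vec<u8>,
}

/// Simplified mirror of `CreateOrUpdate<ActorSpec>` (the `rank` field is omitted).
#[derive(Debug, Clone, PartialEq)]
struct ToyCreateOrUpdate {
    name: String,
    spec: ToySpec,
}

/// Stand-in for `cast`: every agent inbox receives its own copy of the message.
fn cast(inboxes: &mut [Vec<ToyCreateOrUpdate>], msg: &ToyCreateOrUpdate) {
    for inbox in inboxes.iter_mut() {
        inbox.push(msg.clone());
    }
}
```

The point of the sketch is the asymmetry: encode cannot fail for a well-formed value, whereas decode returns a Result because the receiver has no static guarantee about the bytes it was handed.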

How ProcAgent handles CreateOrUpdate<ActorSpec>#

Once the ProcMeshRef has broadcast a CreateOrUpdate<ActorSpec> to every proc, each proc’s ProcAgent receives that message and attempts to construct the actor locally.

The entry point is:

#[async_trait]
impl Handler<resource::CreateOrUpdate<ActorSpec>> for ProcAgent {
    async fn handle(
        &mut self,
        _cx: &Context<Self>,
        create_or_update: resource::CreateOrUpdate<ActorSpec>,
    ) -> anyhow::Result<()> {
         ...
    }
}

This handler performs four steps:


  1. Idempotence: only the first CreateOrUpdate matters

if self.actor_states.contains_key(&create_or_update.name) {
    // There is no update.
    return Ok(());
}

The CreateOrUpdate resource verb supports “update” in principle, but actor meshes never update an existing actor by name. They only create a fresh actor mesh.

So the agent simply ignores subsequent requests for the same name.


  2. Safety check: reject spawn if the proc has supervision errors

if !self.supervision_events.is_empty() {
    self.actor_states.insert(
        create_or_update.name.clone(),
        ActorInstanceState {
            spawn: Err(anyhow::anyhow!(
                "Cannot spawn new actors on mesh with supervision events"
            )),
            create_rank,
            stopped: false,
        },
    );
    return Ok(());
}

If this proc previously recorded any supervision events for any actor, the proc is considered “poisoned”: it may be in a bad state, and spawning new actors would be unsafe.

The agent records the failure in actor_states and stops.

Later, when ProcMeshRef::spawn_with_name_inner calls GetRankStatus::wait to aggregate per-rank results, this proc will contribute a Failed status for that actor name instead of ever reporting it as Running.


  3. Unpack ActorSpec and call remote.gspawn

let ActorSpec {
    actor_type,
    params_data,
} = create_or_update.spec;

self.actor_states.insert(
    create_or_update.name.clone(),
    ActorInstanceState {
        create_rank,
        spawn: self
            .remote
            .gspawn(
                &self.proc,
                &actor_type,
                &create_or_update.name.to_string(),
                params_data,
            )
            .await,
        stopped: false,
    },
);

This is the core of v1 spawning. The agent:

  • unpacks the ActorSpec (type name + parameter bytes), and

  • passes those pieces into remote.gspawn(...) to construct the local actor.

The pieces involved are:

  • actor_type: String. The logical type name registered by remote!(A), computed in ProcMeshRef::spawn_with_name_inner via remote.name_of::<A>(), and used by Remote::gspawn on each proc to find the right constructor.

  • params_data: Data. A raw byte buffer containing the serialized A::Params (via bincode::serialize).

  • self.remote.gspawn(...). This method looks up the SpawnableActor entry for actor_type in the local Remote registry, then invokes:

SpawnableActor::spawn(proc, name, params_data)

Internally this calls the actor’s RemoteSpawn::new(params).await constructor, registers the actor under the given name in the proc’s runtime, and returns an ActorId.

The result – success or failure – is recorded in:

ActorInstanceState {
    create_rank,     // this proc's rank in the mesh
    spawn: Result<ActorId, anyhow::Error>,
    stopped: false,
}

The actor_states map is later queried by GetRankStatus and GetState.


  4. Return success locally (no direct reply)

Once the agent has updated actor_states, the handler simply returns:

Ok(())

There is no direct reply back to the caller for CreateOrUpdate<ActorSpec>.

From the agent’s point of view, the work for the message is:

  • decide whether to attempt a spawn (idempotence + supervision gate),

  • call remote.gspawn(...) into the local Proc,

  • record the outcome in actor_states[name] as ActorInstanceState.

That’s it. The handler does not try to decide whether the mesh-level spawn “succeeded” or “failed”; it only records the per-proc result.

Those per-proc results are later read by the resource query handlers (Handler<GetRankStatus>, Handler<GetState<ActorState>> on ProcAgent).
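The four steps of the handler can be condensed into a small synchronous model. This is a sketch under stated assumptions, not the real handler: ToyAgent, ToyInstanceState, toy_gspawn and the spawn_attempts counter are hypothetical, actor ids are plain strings in a made-up format, and supervision events are reduced to a flat list.

```rust
use std::collections::HashMap;

/// Simplified per-actor record; the real `ActorInstanceState` stores an `ActorId`.
#[derive(Debug)]
struct ToyInstanceState {
    create_rank: usize,
    spawn: Result<String, String>, // Ok(actor id) or Err(reason)
    stopped: bool,
}

/// Toy agent holding only the fields the handler touches.
struct ToyAgent {
    actor_states: HashMap<String, ToyInstanceState>,
    supervision_events: Vec<String>,
    spawn_attempts: usize, // bookkeeping for this sketch only
}

impl ToyAgent {
    fn new() -> Self {
        ToyAgent {
            actor_states: HashMap::new(),
            supervision_events: Vec::new(),
            spawn_attempts: 0,
        }
    }

    /// Stand-in for `remote.gspawn(...)`: only one hypothetical type name is known.
    fn toy_gspawn(&mut self, actor_type: &str, name: &str) -> Result<String, String> {
        self.spawn_attempts += 1;
        if actor_type == "toy::Counter" {
            Ok(format!("proc[0].{name}")) // made-up id format
        } else {
            Err(format!("actor type not registered: {actor_type}"))
        }
    }

    /// Mirrors the shape of the `CreateOrUpdate<ActorSpec>` handler.
    fn handle_create_or_update(&mut self, name: &str, create_rank: usize, actor_type: &str) {
        // Step 1: idempotence. A name that was already seen is ignored.
        if self.actor_states.contains_key(name) {
            return;
        }
        let spawn = if !self.supervision_events.is_empty() {
            // Step 2: supervision gate. Record a failure without attempting a spawn.
            Err("Cannot spawn new actors on mesh with supervision events".to_string())
        } else {
            // Step 3: attempt the local spawn.
            self.toy_gspawn(actor_type, name)
        };
        // Step 4: record the outcome; nothing is sent back to the caller.
        self.actor_states.insert(
            name.to_string(),
            ToyInstanceState { create_rank, spawn, stopped: false },
        );
    }
}
```

Note that both the gated case and a failed spawn end up as an Err in actor_states, so later status queries can treat them uniformly.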

Completing the Spawn: How GetRankStatus Decides Success#

Once every ProcAgent has received the CreateOrUpdate<ActorSpec> message and updated its local actor_states, the caller still does not know:

  • Did every proc spawn the actor successfully?

  • Did any proc report a supervision failure?

  • Are all actors running, or did one terminate immediately?

To answer these questions, the ProcMeshRef performs a second distributed query using the resource verb:

resource::GetRankStatus{ name, reply }

This message is broadcast to the same ProcAgent mesh. Each agent replies with a small “overlay” describing its result for that actor name:

  • no entry yet -> NotExist

  • spawn failed -> Failed(error)

  • spawned and running -> Running

  • terminated -> Stopped/Failed

  • supervision events present -> Failed
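The mapping from an agent’s local bookkeeping to a per-rank status can be sketched as a pure function. This is a simplified model: ToyStatus, ToyInstanceState and rank_status are hypothetical, actor ids are plain strings, and the order in which the cases are checked is an assumption rather than something the text specifies.

```rust
use std::collections::HashMap;

/// Simplified status values named after the cases listed above.
#[derive(Debug, Clone, PartialEq)]
enum ToyStatus {
    NotExist,
    Running,
    Stopped,
    Failed(String),
}

/// Simplified per-actor record (actor ids are plain strings here).
struct ToyInstanceState {
    spawn: Result<String, String>,
    stopped: bool,
}

/// Derives one rank's status for `name`.
/// Assumed precedence: missing entry, then spawn failure, then supervision
/// events, then the stopped flag, and finally running.
fn rank_status(
    actor_states: &HashMap<String, ToyInstanceState>,
    supervision_events: &HashMap<String, Vec<String>>,
    name: &str,
) -> ToyStatus {
    let Some(state) = actor_states.get(name) else {
        return ToyStatus::NotExist;
    };
    let actor_id = match &state.spawn {
        Err(reason) => return ToyStatus::Failed(reason.clone()),
        Ok(id) => id,
    };
    if let Some(events) = supervision_events.get(actor_id) {
        if !events.is_empty() {
            return ToyStatus::Failed(events.join("; "));
        }
    }
    if state.stopped {
        ToyStatus::Stopped
    } else {
        ToyStatus::Running
    }
}
```

Keeping the derivation a function of recorded state matches the split described earlier: the CreateOrUpdate handler only records outcomes, and interpretation happens when a query arrives.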

The reply port used to collect all GetRankStatus responses is opened via:

let (port, rx) = cx.mailbox().open_accum_port_opts(
    StatusMesh::from_single(region.clone(), Status::NotExist),
    Some(ReducerOpts { max_update_interval: Some(Duration::from_millis(50)) }),
);

Here, cx is the caller’s context. In tests this is typically testing::instance(), a tiny driver actor (Instance<()>), so the accumulation port (port/rx), and thus all collected replies, live in that test instance’s mailbox.

An accumulation port is just a mailbox port that keeps a running aggregate value. Each GetRankStatus reply is an overlay, and the mailbox’s reducer merges those overlays into a single StatusMesh, with one final status per proc/rank.
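The accumulate-and-merge behaviour can be illustrated without any mailbox machinery. This is a minimal sketch assuming a last-write-wins merge per rank; ToyStatusMesh, Overlay, merge and all_running are hypothetical names, and the real reducer and StatusMesh may combine overlays differently.

```rust
/// Simplified status values for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum ToyStatus {
    NotExist,
    Running,
    Failed(String),
}

/// One rank's reply: a sparse update covering a single rank.
struct Overlay {
    rank: usize,
    status: ToyStatus,
}

/// Toy stand-in for `StatusMesh`: exactly one status per rank.
#[derive(Debug, PartialEq)]
struct ToyStatusMesh {
    per_rank: Vec<ToyStatus>,
}

impl ToyStatusMesh {
    /// Mirrors `StatusMesh::from_single(region, Status::NotExist)`:
    /// every rank starts from the same initial status.
    fn from_single(num_ranks: usize, status: ToyStatus) -> Self {
        ToyStatusMesh { per_rank: vec![status; num_ranks] }
    }

    /// Reducer step: fold one overlay into the running aggregate.
    /// Assumption: the newest overlay for a rank replaces the previous value,
    /// and overlays for unknown ranks are ignored.
    fn merge(&mut self, overlay: Overlay) {
        if let Some(slot) = self.per_rank.get_mut(overlay.rank) {
            *slot = overlay.status;
        }
    }

    /// True once every rank has reported `Running`.
    fn all_running(&self) -> bool {
        self.per_rank.iter().all(|s| *s == ToyStatus::Running)
    }
}
```

Starting from `from_single(3, ToyStatus::NotExist)`, the aggregate only reports all_running once an overlay for each of the three ranks has been merged, which is the kind of completion condition a caller would wait on.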