Mesh-level admin surface for topology introspection and reference walking.
This module defines MeshAdminAgent, an actor that exposes a
uniform, reference-based HTTP API over an entire host mesh. Every
addressable entity in the mesh is represented as a NodePayload
and resolved via an opaque reference string.
Incoming HTTP requests are bridged into the actor message loop
using ResolveReferenceMessage, ensuring that all topology
resolution and data collection happens through actor messaging.
The agent fans out to HostAgent instances to fetch host,
proc, and actor details, then normalizes them into a single
tree-shaped model (NodeProperties + children references)
suitable for topology-agnostic clients such as the admin TUI.
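The bridging described above can be sketched with standard-library channels standing in for the actor runtime. This is a minimal illustration, not the module's code: `Payload`, `Resolve`, `spawn_agent`, and `resolve_via_agent` are hypothetical names, and a thread with an `mpsc` receiver plays the role of the actor message loop.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical stand-in for NodePayload; only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
struct Payload {
    identity: String,
    children: Vec<String>,
}

// Hypothetical stand-in for ResolveReferenceMessage:
// a reference string plus a one-shot reply channel.
struct Resolve {
    reference: String,
    reply: Sender<Result<Payload, String>>,
}

// Spawn the "actor": a thread that owns the topology and
// answers one message at a time from its mailbox.
fn spawn_agent(topology: HashMap<String, Vec<String>>) -> Sender<Resolve> {
    let (tx, rx) = channel::<Resolve>();
    thread::spawn(move || {
        for msg in rx {
            let result = topology
                .get(&msg.reference)
                .map(|children| Payload {
                    identity: msg.reference.clone(),
                    children: children.clone(),
                })
                .ok_or_else(|| format!("unknown reference: {}", msg.reference));
            // The caller may have gone away; ignore a failed reply send.
            let _ = msg.reply.send(result);
        }
    });
    tx
}

// What an HTTP handler would do: send a message, then wait for the reply.
fn resolve_via_agent(agent: &Sender<Resolve>, reference: &str) -> Result<Payload, String> {
    let (reply, response) = channel();
    agent
        .send(Resolve { reference: reference.to_string(), reply })
        .map_err(|_| "agent stopped".to_string())?;
    response.recv().map_err(|_| "agent dropped reply".to_string())?
}
```

Because every lookup goes through the mailbox, the topology map is only ever touched by the agent thread, which is the property the real design obtains from actor messaging.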
§Schema strategy
The external API contract is schema-first: the JSON Schema
(Draft 2020-12) served at GET /v1/schema is the
authoritative definition of the response shape, derived
directly from the Rust types (NodePayload,
NodeProperties, FailureInfo) via schemars::JsonSchema.
The error envelope schema is at GET /v1/schema/error.
This follows the “Admin Gateway Pattern” RFC (doc): schema is the product; transports and tooling are projections.
§Schema generation pipeline
- #[derive(JsonSchema)] on NodePayload, NodeProperties, FailureInfo, ApiError, ApiErrorEnvelope.
- schemars::schema_for!(T) produces a Schema value at runtime (Draft 2020-12).
- The serve_schema / serve_error_schema handlers inject a $id field (SC-4) and serve the result as JSON.
- Snapshot tests in introspect::tests compare the raw schemars output (without $id) against checked-in golden files to detect drift (SC-2).
- Validation tests confirm that real NodePayload samples pass schema validation (SC-3).
§Regenerating snapshots
After intentional type changes to NodePayload,
NodeProperties, FailureInfo, ApiError, or
ApiErrorEnvelope, regenerate the golden files:
    buck run fbcode//monarch/hyperactor_mesh:generate_api_artifacts \
      @fbcode//mode/dev-nosan -- \
      fbcode/monarch/hyperactor_mesh/src/testdata
Or via cargo:
    cargo run -p hyperactor_mesh --bin generate_api_artifacts -- \
      hyperactor_mesh/src/testdata
Then re-run tests to confirm the new snapshot passes.
§Schema invariants (SC-*)
- SC-1 (schema-derived): Schema is derived from Rust types via schemars::JsonSchema, not hand-written.
- SC-2 (schema-snapshot-stability): Schema changes must be explicit: a snapshot test catches unintentional drift.
- SC-3 (schema-payload-conformance): Real NodePayload instances validate against the generated schema.
- SC-4 (schema-version-identity): Served schemas carry a $id tied to the API version (e.g. https://monarch.meta.com/schemas/v1/node_payload).
- SC-5 (route-precedence): Literal schema routes are matched by specificity before the {*reference} wildcard (axum 0.8 specificity-based routing).
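The effect of SC-5 can be illustrated without axum. The sketch below hand-rolls the precedence that the router provides: the two literal schema paths win, and everything else falls through to the wildcard. `Route` and `match_route` are hypothetical, and mounting the wildcard under `/v1/` is an assumption made for the example.

```rust
// Hypothetical route table: what a request path resolves to.
#[derive(Debug, PartialEq)]
enum Route {
    Schema,
    ErrorSchema,
    Reference(String),
}

// Literal routes are checked first; any other non-empty path under
// /v1/ (an assumed mount point) is treated as an opaque reference.
fn match_route(path: &str) -> Option<Route> {
    match path {
        "/v1/schema" => Some(Route::Schema),
        "/v1/schema/error" => Some(Route::ErrorSchema),
        _ => path
            .strip_prefix("/v1/")
            .filter(|rest| !rest.is_empty())
            .map(|rest| Route::Reference(rest.to_string())),
    }
}
```

Without this precedence, a request for the schema would be resolved as a reference named `schema`, which is why the invariant is called out explicitly.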
Note on ApiError.details: the derived schema is maximally
permissive for details (any valid JSON). This is intentional
for v1 — details is a domain-specific escape hatch.
Consumers must not assume a fixed shape.
§Introspection visibility policy
Admin tooling only displays introspectable nodes: entities
that are reachable via actor messaging and respond to
IntrospectMessage. Infrastructure procs that are
non-routable are intentionally opaque to introspection and
are omitted from the navigation graph.
§Definitions
Routable — an entity is routable if the system can address it
via the routing layer and successfully deliver a message to it
using a Reference / ActorId (i.e., there exists a live mailbox
sender reachable through normal routing). Practical test: “can I
send IntrospectMessage::Query to it and get a reply?”
Non-routable — an entity is non-routable if it has no
externally reachable mailbox sender in the routing layer, so
message delivery is impossible by construction (even if you know
its name). Examples: hyperactor_runtime[0], mailbox_server[N],
local[N] — these use PanickingMailboxSender and are never
bound to the router.
Introspectable — tooling can obtain a NodePayload for this
node by sending IntrospectMessage to a routable actor.
Opaque — the node exists but is not introspectable via messaging; tooling cannot observe it through the introspection protocol.
§Proc visibility
A proc is not directly introspected; actors are. Tooling
synthesizes proc-level nodes by grouping introspectable actors by
ProcId.
A proc is visible iff there exists at least one actor on that proc
whose ActorId is deliverable via the routing layer (i.e., the
actor has a bound mailbox sender reachable through normal routing)
and responds to IntrospectMessage.
The rule is: if an entity is routable via the mesh routing layer
(i.e., tooling can deliver IntrospectMessage::Query to one of its
actors), then it is introspectable and appears in the admin graph.
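The proc visibility rule amounts to a group-by over introspectable actors. A minimal sketch, assuming a simplified `ActorInfo` record (hypothetical) whose `introspectable` flag already encodes "routable and responds to IntrospectMessage":

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified view of an actor as tooling sees it.
struct ActorInfo {
    proc_id: String,
    actor_id: String,
    // True if the actor is routable and answers IntrospectMessage.
    introspectable: bool,
}

// Group introspectable actors by proc. A proc with no introspectable
// actor never gets an entry, so it is absent from the admin graph.
fn visible_procs(actors: &[ActorInfo]) -> BTreeMap<String, Vec<String>> {
    let mut procs: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for actor in actors.iter().filter(|a| a.introspectable) {
        procs
            .entry(actor.proc_id.clone())
            .or_default()
            .push(actor.actor_id.clone());
    }
    procs
}
```

An infrastructure proc such as `local[0]` drops out of the result simply because none of its actors passes the filter; no explicit deny-list is needed.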
§Navigation identity invariants (NI-*)
Every NodePayload in the topology tree satisfies:
- NI-1 (identity = reference): A node’s identity field must equal the reference string used to resolve it. If the TUI asks for reference R, payload.identity == R.
- NI-2 (parent coherence): A node’s parent field must equal the identity of the node it appears under. If node P lists R in its children, then R.parent == Some(P.identity).
Together these ensure that the TUI can correlate responses to tree nodes, and that upward/downward navigation is consistent.
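Both invariants can be checked mechanically over a set of resolved payloads. The following is a sketch under assumptions: `Node` is a hypothetical reduction of NodePayload to the three fields involved, and `resolved` maps each requested reference to the payload that came back for it.

```rust
use std::collections::HashMap;

// Hypothetical reduction of NodePayload to the navigation fields.
struct Node {
    identity: String,
    parent: Option<String>,
    children: Vec<String>,
}

// Check NI-1 and NI-2 over every payload resolved so far.
fn check_navigation(resolved: &HashMap<String, Node>) -> Result<(), String> {
    for (reference, node) in resolved {
        // NI-1: the payload's identity equals the reference used to fetch it.
        if node.identity != *reference {
            return Err(format!("NI-1 violated for {}", reference));
        }
        // NI-2: every child that has been resolved points back at this node.
        for child_ref in &node.children {
            if let Some(child) = resolved.get(child_ref) {
                if child.parent.as_deref() != Some(node.identity.as_str()) {
                    return Err(format!("NI-2 violated for {}", child_ref));
                }
            }
        }
    }
    Ok(())
}
```

Children that have not been resolved yet are skipped, which matches how a client expands the tree lazily.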
§Proc-resolution invariants (SP-*)
When a proc reference is resolved, the returned NodePayload
satisfies:
- SP-1 (identity): The identity matches the ProcId reference from the parent’s children list.
- SP-2 (properties): The properties are NodeProperties::Proc.
- SP-3 (parent): The parent is set to the HostId format ("host:<actor_id>").
- SP-4 (as_of): The as_of field is present and non-empty.
Enforced by test_system_proc_identity.
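The four SP-* conditions can be written as one validation function. This is an illustrative sketch, not the body of test_system_proc_identity: `Props` and `Payload` are hypothetical simplifications, and `as_of` is modeled as a plain string.

```rust
// Hypothetical simplification of NodeProperties.
#[derive(Debug, PartialEq)]
enum Props {
    Host,
    Proc,
    Actor,
}

// Hypothetical reduction of NodePayload to the fields the SP-* rules mention.
struct Payload {
    identity: String,
    properties: Props,
    parent: Option<String>,
    as_of: String,
}

// Returns the name of the first violated invariant, if any.
fn check_proc_payload(requested: &str, p: &Payload) -> Result<(), &'static str> {
    if p.identity != requested {
        return Err("SP-1");
    }
    if p.properties != Props::Proc {
        return Err("SP-2");
    }
    match p.parent.as_deref() {
        // The parent must look like "host:<actor_id>" with a non-empty id.
        Some(parent) if parent.starts_with("host:") && parent.len() > "host:".len() => {}
        _ => return Err("SP-3"),
    }
    if p.as_of.is_empty() {
        return Err("SP-4");
    }
    Ok(())
}
```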
§Proc-agent invariants (PA-*)
- PA-1 (live children): Proc-node children used by admin/TUI must be derived from live proc state at query time. No additional publish event is required for a newly spawned actor to appear.
Enforced by test_proc_children_reflect_directly_spawned_actors.
§Robustness invariant (MA-R1)
- MA-R1 (no-crash): MeshAdminAgent must never crash the OS process it resides in. Every handler catches errors and converts them into structured error payloads (ResolveReferenceResponse(Err(..)), NodeProperties::Error, etc.) rather than propagating panics or unwinding. Failed reply sends (the caller went away) are silently swallowed.
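The handler shape that MA-R1 implies can be sketched as follows. `Response`, `resolve`, and `handle` are hypothetical, and an `mpsc` sender stands in for the reply port; the point is only that errors travel as values and a failed send is deliberately ignored.

```rust
use std::sync::mpsc::Sender;

// Hypothetical reply type, mirroring the shape of ResolveReferenceResponse.
#[derive(Debug, PartialEq)]
struct Response(Result<String, String>);

// Fallible work returns an error value instead of panicking.
fn resolve(reference: &str) -> Result<String, String> {
    if reference.is_empty() {
        Err("empty reference".to_string())
    } else {
        Ok(format!("payload for {}", reference))
    }
}

// The handler never propagates: an error becomes the reply itself,
// and a failed send (the caller went away) is swallowed.
fn handle(reference: &str, reply: &Sender<Response>) {
    let _ = reply.send(Response(resolve(reference)));
}
```

Calling `handle` after the receiving end has been dropped returns normally, which is the behavior the last sentence of MA-R1 describes.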
§TLS transport invariants (MA-T*)
- MA-T1 (tls): At Meta (fbcode_build), the admin HTTP server requires mutual TLS. At startup it probes for certificates via try_tls_acceptor with client cert enforcement enabled. If no usable certificate bundle is found, init() returns an error, with no plain HTTP fallback. In OSS, TLS is best-effort with plain HTTP fallback.
- MA-T2 (scheme-in-url): The URL returned by GetAdminAddr is always https://host:port or http://host:port, never a bare host:port. All callers receive and use this full URL directly.
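The decision table behind these two invariants is small enough to write out. The sketch below is an assumption-laden model: `Transport`, `choose_transport`, and `admin_url` are hypothetical, `require_tls` stands for the fbcode_build case, and client-certificate enforcement is not modeled.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Transport {
    Https,
    Http,
}

// MA-T1: when TLS is required and no certificate is found, startup fails;
// otherwise TLS is used when available and plain HTTP is the fallback.
fn choose_transport(require_tls: bool, cert_found: bool) -> Result<Transport, String> {
    match (cert_found, require_tls) {
        (true, _) => Ok(Transport::Https),
        (false, true) => Err("no usable TLS certificate bundle".to_string()),
        (false, false) => Ok(Transport::Http),
    }
}

// MA-T2: the advertised address always carries its scheme.
fn admin_url(transport: Transport, host: &str, port: u16) -> String {
    let scheme = match transport {
        Transport::Https => "https",
        Transport::Http => "http",
    };
    format!("{}://{}:{}", scheme, host, port)
}
```

Returning the full URL means callers never have to guess the scheme, which matters because the same binary may serve either transport in OSS builds.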
§Client host invariants (CH-*)
Let A denote the observed host mesh (the host mesh for which
this MeshAdminAgent was spawned), and let C denote the
process-global singleton client host mesh in the caller process
(whose local proc hosts the root client actor).
- CH-1 (deduplication): When C ∈ A, the client host appears exactly once in the admin host list (deduplicated by HostAgent ActorId identity). When C ∉ A, spawn_admin includes C alongside A’s hosts so the admin introspects C as a normal host subtree, not as a standalone proc.
- CH-2 (reachability): In both cases, the root client actor is reachable through the standard host → proc → actor walk.
- CH-3 (ordering): spawn_admin requires cx: &impl context::Actor (the caller’s root client instance). Constructing that instance initializes C. Therefore C is available when spawn_admin executes. Any refactor must preserve this ordering.
Mechanism: [HostMeshRef::spawn_admin] reads C from the
caller process (via try_this_host()), merges it with A’s host
list, deduplicates by HostAgent ActorId, and sends the merged
list in SpawnMeshAdmin. This works for same-process and
cross-process setups because merge+dedup happens in the caller
process before sending the spawn request.
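The merge-and-deduplicate step can be sketched in a few lines. `HostEntry` and `merge_hosts` are hypothetical; `agent_id` stands in for the HostAgent ActorId used as the dedup key, and `client_host` is `None` when the caller process has no client host to contribute.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct HostEntry {
    // Stand-in for the HostAgent ActorId used as the dedup key.
    agent_id: String,
    addr: String,
}

// Append the client host C to the mesh hosts A, keeping the first
// occurrence of each agent id so C appears exactly once (CH-1).
fn merge_hosts(mesh_hosts: &[HostEntry], client_host: Option<&HostEntry>) -> Vec<HostEntry> {
    let mut seen = HashSet::new();
    mesh_hosts
        .iter()
        .chain(client_host)
        .filter(|h| seen.insert(h.agent_id.clone()))
        .cloned()
        .collect()
}
```

When C is already in A the result has the same length as A; when it is not, C is appended as one more ordinary host.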
§MAST resolution invariants (MC-*)
CLI-based mast_conda:/// resolution (OSS-compatible fallback):
- MC-1 (cli-contract): mast get-status --json <job> must exit 0 and produce valid JSON. Missing binary → distinct error. Non-zero exit → includes exit code and stderr. Malformed JSON → parse error.
- MC-2 (head-hostname): head_hostname extracts the first hostname by ascending task index from the last attempt of each task group.
- MC-3 (fqdn-idempotent): qualify_fqdn passes through hostnames containing a dot. Short hostnames are qualified via getaddrinfo(AI_CANONNAME). Failure falls back to the raw hostname.
- MC-4 (fqdn-nonblocking): qualify_fqdn runs the blocking getaddrinfo syscall via spawn_blocking.
- MC-5 (admin-port): resolve_admin_port uses the explicit override when provided, otherwise reads the port from MESH_ADMIN_ADDR config.
Enforced by test_head_hostname_*, test_qualify_fqdn_*,
test_resolve_mast_*, test_resolve_admin_port_*.
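MC-3 and MC-5 are pure decision logic and can be sketched without the mast CLI or a resolver. The signatures below are illustrative and not the crate's: the configured address is assumed to be a `host:port` string, and the canonical-name lookup is injected as a closure (`qualify_fqdn_with`, hypothetical) in place of the real getaddrinfo call.

```rust
// MC-5: an explicit override wins; otherwise take the port from the
// configured address (assumed here to be a "host:port" string).
fn resolve_admin_port(port_override: Option<u16>, configured_addr: &str) -> Result<u16, String> {
    if let Some(port) = port_override {
        return Ok(port);
    }
    let (_, port) = configured_addr
        .rsplit_once(':')
        .ok_or_else(|| format!("no port in {:?}", configured_addr))?;
    port.parse::<u16>()
        .map_err(|e| format!("bad port in {:?}: {}", configured_addr, e))
}

// MC-3: names that already contain a dot pass through unchanged; short
// names are qualified by `lookup`, and a failed lookup falls back to
// the raw hostname.
fn qualify_fqdn_with(hostname: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    if hostname.contains('.') {
        return hostname.to_string();
    }
    lookup(hostname).unwrap_or_else(|| hostname.to_string())
}
```

Injecting the lookup keeps the passthrough and fallback rules testable without touching the network; the production path would additionally run the lookup off the async executor, as MC-4 requires.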
Structs§
- ApiError: Structured error response following the gateway RFC envelope pattern.
- ApiErrorEnvelope: Wrapper for the structured error envelope.
- MeshAdminAddrResponse: Response payload for MeshAdminMessage::GetAdminAddr.
- MeshAdminAgent: Actor that serves a mesh-level admin HTTP endpoint.
- ResolveReferenceResponse: Newtype wrapper around Result<NodePayload, String> for the resolve reply port (OncePortRef requires Named).
Enums§
- MeshAdminMessage: Messages handled by the MeshAdminAgent.
- ResolveReferenceMessage: Message for resolving an opaque reference string into a NodePayload.
Constants§
- MESH_ADMIN_ACTOR_NAME: Actor name used when spawning the mesh admin agent.
- MESH_ADMIN_BRIDGE_NAME: Actor name for the HTTP bridge client mailbox on the service proc.
Traits§
- MeshAdminMessageClient: The custom client trait for this message type.
- MeshAdminMessageHandler: The custom handler trait for this message type.
- ResolveReferenceMessageClient: The custom client trait for this message type.
- ResolveReferenceMessageHandler: The custom handler trait for this message type.
Functions§
- build_openapi_spec: Build the OpenAPI 3.1 spec, embedding schemars-derived JSON Schemas into components/schemas.
- resolve_mast_handle: Resolve a mast_conda:///<job-name> handle into an https://<fqdn>:<port> base URL using the mast CLI.