Module introspect


Introspection protocol for hyperactor actors.

Every actor has a dedicated introspect task that handles IntrospectMessage by reading InstanceCell state directly, without going through the actor’s message loop. This means:

Stuck actors can be introspected (the task runs independently).
Introspection does not perturb observed state (no observer effect).
Live status is reported accurately.
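The pattern behind these three properties can be sketched with the standard library alone. This is an illustration, not the hyperactor implementation: `CellState`, `Snapshot`, `Query`, and `spawn_introspect` are hypothetical names, and a plain OS thread stands in for the dedicated introspect task. The point is that the query path only reads shared state and replies on a channel carried by the query, so it needs no cooperation from the actor's message loop.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the shared per-actor state
/// (the role InstanceCell plays in the text above).
#[derive(Default)]
struct CellState {
    messages_processed: AtomicU64,
    last_handler: Mutex<Option<String>>,
}

/// Hypothetical snapshot handed back to the caller.
#[derive(Debug, PartialEq)]
struct Snapshot {
    messages_processed: u64,
    last_handler: Option<String>,
}

/// A query carries its own reply channel, so answering it never
/// touches the actor's work queue.
struct Query {
    reply: Sender<Snapshot>,
}

/// Spawn a dedicated introspect worker. It only reads `cell`; it never
/// writes to it, so observing the actor does not change what is observed.
fn spawn_introspect(cell: Arc<CellState>) -> Sender<Query> {
    let (tx, rx) = channel::<Query>();
    thread::spawn(move || {
        // The loop ends when every Sender<Query> has been dropped.
        for query in rx {
            let snapshot = Snapshot {
                messages_processed: cell.messages_processed.load(Ordering::Acquire),
                last_handler: cell.last_handler.lock().unwrap().clone(),
            };
            // The caller may have gone away; ignoring the error is fine here.
            let _ = query.reply.send(snapshot);
        }
    });
    tx
}
```

Because no actor loop appears anywhere in the query path, a handler that never returns would not prevent `spawn_introspect` from answering.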

Infrastructure actors publish domain-specific metadata via publish_attrs(), which the introspect task reads for Entity-view queries. Non-addressable children (e.g., system procs) are resolved via a callback registered on InstanceCell.

Callers navigate topology by fetching an IntrospectResult and following its children references.

§Design Invariants

The introspection subsystem maintains twelve invariants (S1–S12). Each is documented at the code site that enforces it.

S1. Introspection must not depend on actor responsiveness – a wedged actor can still be introspected (runtime task, not actor loop).
S2. Introspection must not perturb observed state – reading InstanceCell never sets last_message_handler to IntrospectMessage.
S3. Sender routing is unchanged – senders target the same PortId (IntrospectMessage::port()) across processes.
S4. IntrospectMessage never produces a WorkCell – pre-registration via open_message_port gives the introspect port its own channel, independent of the actor’s work queue.
S5. Replies never use PanickingMailboxSender – the introspect task replies via Mailbox::serialize_and_send_once.
S6. View semantics are stable – Actor view uses live structural state + supervision children; Entity view uses published properties + domain children.
S7. QueryChild must work without actor handlers – system procs are resolved via a per-actor callback on InstanceCell.
S8. Published properties are constrained – actors cannot publish Root or Error payloads (only Host and Proc variants).
S9. Port binding is single source of truth – the introspect port is bound exactly once via bind_actor_port() in Instance::new().
S10. Introspect receiver lifecycle – created in Instance::new(), spawned in start(), dropped in child_instance().
S11. Terminated snapshots do not keep actors resolvable – store_terminated_snapshot writes to the proc’s snapshot map, not the instances map. resolve_actor_ref checks terminal status independently and is unaffected by snapshot storage.
S12. Introspection must not impair actor liveness – introspection queries (including DashMap reads for actor enumeration) must not cause convoy starvation or scheduling delays that stall concurrent actor spawn/stop operations.
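As one illustration, the constraint in S8 amounts to a check at the publish boundary. The sketch below is hypothetical: the `Payload` enum, its fields, and `validate_publish` are invented for the example and only borrow the variant names that S8 mentions (Root, Host, Proc, Error); the real types live in the crate and may look different.

```rust
/// Hypothetical payload kinds, named after the variants S8 mentions.
/// The fields are placeholders for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Payload {
    Root,
    Host { addr: String },
    Proc { name: String },
    Error { code: String },
}

/// Returned when an actor tries to publish a payload kind it may not publish.
#[derive(Debug, PartialEq)]
struct PublishRejected(&'static str);

/// Accept only the payload kinds that S8 allows actors to publish.
fn validate_publish(payload: &Payload) -> Result<(), PublishRejected> {
    match payload {
        Payload::Host { .. } | Payload::Proc { .. } => Ok(()),
        Payload::Root => Err(PublishRejected("Root")),
        Payload::Error { .. } => Err(PublishRejected("Error")),
    }
}
```

Matching exhaustively, without a wildcard arm, means that adding a new variant forces a decision about whether actors may publish it.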

§Introspection key invariants (IK-*)

IK-1 (metadata completeness): Every actor-runtime introspection key must carry @meta(INTROSPECT = ...) with non-empty name and desc.
IK-2 (short-name uniqueness): No two introspection keys may share the same IntrospectAttr.name. Duplicates would break the FQ-to-short HTTP remap and schema output.
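Both IK invariants can be checked mechanically over the set of declared keys. The following is a minimal sketch under assumptions: `KeyMeta` and `check_keys` are hypothetical and only mirror the `name` and `desc` fields named above, and the crate's own enforcement may be structured differently.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the per-key introspection metadata.
struct KeyMeta {
    /// Fully-qualified key name.
    fq_name: &'static str,
    /// Short name (IntrospectAttr.name in the text above).
    name: &'static str,
    /// Human-readable description.
    desc: &'static str,
}

/// Check IK-1 (non-empty name and desc) and IK-2 (unique short names).
/// Returns a message describing the first violation found.
fn check_keys(keys: &[KeyMeta]) -> Result<(), String> {
    // Maps each short name to the fully-qualified key that claimed it first.
    let mut seen: HashMap<&str, &str> = HashMap::new();
    for key in keys {
        if key.name.is_empty() || key.desc.is_empty() {
            return Err(format!("IK-1: {} has an empty name or desc", key.fq_name));
        }
        if let Some(previous) = seen.insert(key.name, key.fq_name) {
            return Err(format!(
                "IK-2: {} and {} share short name {}",
                previous, key.fq_name, key.name
            ));
        }
    }
    Ok(())
}
```

A check of this shape fits naturally in a unit test that iterates over every registered key.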

§Failure introspection invariants (FI-*)

The FailureInfo presentation type lives in hyperactor_mesh::introspect; these invariants are documented here because the enforcement sites are in hyperactor (proc.rs serve(), live_actor_payload).

FI-1 (event-before-status): All InstanceCell state that live_actor_payload reads must be written BEFORE change_status() transitions to terminal.
FI-2 (write-once): InstanceCellState::supervision_event is written at most once per actor lifetime.
FI-3 (failure attrs <-> status): Failure attrs are present iff status is "failed".
FI-4 (is_propagated <-> root_cause_actor): failure_is_propagated == true iff failure_root_cause_actor != this_actor_id.
FI-5 (is_poisoned <-> failed_actor_count): is_poisoned == true iff failed_actor_count > 0.
FI-6 (clean stop = no artifacts): When an actor stops cleanly, supervision_event is None, failure attrs are absent, and the actor does not contribute to failed_actor_count.
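The biconditionals FI-3 through FI-5 are straightforward to express as predicates over a decoded node. The sketch below uses hypothetical types (`ActorNode`, `FailureFields`) and hypothetical functions (`check_actor`, `check_proc`) that carry only the fields the invariants mention; it is not the crate's `FailureAttrs` or `FailureInfo`.

```rust
/// Hypothetical subset of the failure fields referenced by FI-3 and FI-4.
#[derive(Debug, Clone)]
struct FailureFields {
    root_cause_actor: String,
    is_propagated: bool,
}

/// Hypothetical decoded actor node.
struct ActorNode {
    actor_id: String,
    status: String,
    failure: Option<FailureFields>,
}

/// Check FI-3 and FI-4 for one actor; returns the violated invariant's label.
fn check_actor(node: &ActorNode) -> Result<(), &'static str> {
    // FI-3: failure attrs are present iff status is "failed".
    let failed = node.status == "failed";
    if failed != node.failure.is_some() {
        return Err("FI-3");
    }
    // FI-4: is_propagated iff the root cause is a different actor.
    if let Some(failure) = &node.failure {
        let expected = failure.root_cause_actor != node.actor_id;
        if failure.is_propagated != expected {
            return Err("FI-4");
        }
    }
    Ok(())
}

/// Check FI-5 for one proc: is_poisoned iff failed_actor_count > 0.
fn check_proc(is_poisoned: bool, failed_actor_count: u64) -> Result<(), &'static str> {
    if is_poisoned != (failed_actor_count > 0) {
        Err("FI-5")
    } else {
        Ok(())
    }
}
```

A cleanly stopped actor (FI-6) passes `check_actor` with `failure: None` and adds nothing to `failed_actor_count`.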

§Attrs view invariants (AV-*)

These govern the typed view layer (ActorAttrsView). The full AV-* / DP-* family is documented in hyperactor_mesh::introspect; the subset relevant to this crate:

AV-1 (view-roundtrip): For each view V, V::from_attrs(&v.to_attrs()) == Ok(v).
AV-2 (required-key-strictness): from_attrs fails iff required keys for that view are missing.
AV-3 (unknown-key-tolerance): Unknown attrs keys must not affect successful decode outcome.
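The three AV invariants can be seen together on a toy view. Everything in this sketch is an assumption made for illustration: `Attrs` is modeled as a string map, and `ToyView` and `ToyViewError` are invented stand-ins for `ActorAttrsView` and `AttrsViewError`, whose real definitions are not shown here.

```rust
use std::collections::BTreeMap;

/// Hypothetical attrs bag: a plain string-to-string map.
type Attrs = BTreeMap<String, String>;

/// Toy typed view with two required keys and one optional key.
#[derive(Debug, Clone, PartialEq)]
struct ToyView {
    status: String,
    actor_type: String,
    status_reason: Option<String>,
}

#[derive(Debug, PartialEq)]
enum ToyViewError {
    MissingKey(&'static str),
}

impl ToyView {
    /// Encode the view; optional fields are omitted when absent.
    fn to_attrs(&self) -> Attrs {
        let mut attrs = Attrs::new();
        attrs.insert("status".to_string(), self.status.clone());
        attrs.insert("actor_type".to_string(), self.actor_type.clone());
        if let Some(reason) = &self.status_reason {
            attrs.insert("status_reason".to_string(), reason.clone());
        }
        attrs
    }

    /// Decode the view. Fails only when a required key is missing (AV-2);
    /// keys the view does not know about are never consulted (AV-3).
    fn from_attrs(attrs: &Attrs) -> Result<Self, ToyViewError> {
        let required = |key: &'static str| {
            attrs.get(key).cloned().ok_or(ToyViewError::MissingKey(key))
        };
        Ok(ToyView {
            status: required("status")?,
            actor_type: required("actor_type")?,
            status_reason: attrs.get("status_reason").cloned(),
        })
    }
}
```

AV-1 then reads as `ToyView::from_attrs(&v.to_attrs()) == Ok(v)` for every value `v` of the view.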

Structs§

ActorAttrsView: Typed view over attrs for an actor node.
FailureAttrs: Structured failure fields decoded from FAILURE_* attrs.
IntrospectResult: Internal introspection result. Carries attrs as a JSON string. The mesh layer constructs the API-facing NodePayload (with properties) from this via derive_properties.
RecordedEvent: Structured tracing event from the actor-local flight recorder.

Enums§

AttrsViewError: Error from decoding an Attrs bag into a typed view.
IntrospectMessage: Introspection query sent to any actor.
IntrospectView: Context for an introspection query: which aspect of the actor to describe.

Statics§

ACTOR_TYPE: Fully-qualified actor type name.
CHILDREN: Child reference strings for tree navigation. Published by infrastructure actors (HostMeshAgent, ProcAgent) so the Entity view can return children without parsing mesh-layer keys.
CREATED_AT: Timestamp when this actor was created.
ERROR_CODE: Machine-readable error code for error nodes.
ERROR_MESSAGE: Human-readable error message for error nodes.
FAILURE_ERROR_MESSAGE: Failure error message.
FAILURE_IS_PROPAGATED: Whether the failure was propagated from a child.
FAILURE_OCCURRED_AT: Timestamp when failure occurred.
FAILURE_ROOT_CAUSE_ACTOR: Actor that caused the failure (root cause).
FAILURE_ROOT_CAUSE_NAME: Name of root cause actor.
FLIGHT_RECORDER: Flight recorder JSON (recent trace events).
IS_SYSTEM: Whether this actor is infrastructure/system.
LAST_HANDLER: Name of the last message handler invoked.
MESSAGES_PROCESSED: Number of messages processed by this actor.
STATUS: Actor lifecycle status: “running”, “stopped”, “failed”.
STATUS_REASON: Reason for stop/failure (absent when running).
TOTAL_PROCESSING_TIME_US: Total CPU time in message handlers (microseconds).

Functions§

format_timestamp: Format a SystemTime as an ISO 8601 timestamp with millisecond precision.
live_actor_payload: Build an IntrospectResult from live InstanceCell state.
serve_introspect: Introspect task: runs on a dedicated tokio task per actor, handling IntrospectMessage by reading InstanceCell directly and replying via the actor’s Mailbox.
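
For readers who want to see what "ISO 8601 with millisecond precision" looks like in practice, here is a dependency-free sketch of a formatter in the spirit of format_timestamp. It is not the crate's implementation: the name `format_timestamp_sketch`, the UTC `Z` suffix, and the choice to clamp pre-epoch times to the epoch are all assumptions. The date arithmetic is the widely used days-to-civil conversion for the proleptic Gregorian calendar.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Convert days since 1970-01-01 into (year, month, day) in the
/// proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    // Shift the epoch to 0000-03-01 so leap days fall at the end of a year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // 0..=365
    let mp = (5 * doy + 2) / 153; // month index starting at March, 0..=11
    let day = doy - (153 * mp + 2) / 5 + 1; // 1..=31
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // 1..=12
    // January and February belong to the following civil year.
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Format a SystemTime as `YYYY-MM-DDTHH:MM:SS.mmmZ` (UTC).
/// Assumption: times before the Unix epoch are clamped to the epoch.
fn format_timestamp_sketch(t: SystemTime) -> String {
    let since_epoch = t.duration_since(UNIX_EPOCH).unwrap_or(Duration::ZERO);
    let secs = since_epoch.as_secs() as i64;
    let millis = since_epoch.subsec_millis();
    let (year, month, day) = civil_from_days(secs / 86_400);
    let rem = secs % 86_400; // seconds into the current day
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}.{:03}Z",
        year,
        month,
        day,
        rem / 3_600,
        (rem % 3_600) / 60,
        rem % 60,
        millis
    )
}
```

Production code would more likely delegate to a date-time crate; the manual arithmetic is shown only to keep the example self-contained.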