Module introspect

Introspection protocol for hyperactor actors.

Every actor has a dedicated introspect task that handles IntrospectMessage by reading InstanceCell state directly, without going through the actor’s message loop. This means:

  • Stuck actors can be introspected (the task runs independently).
  • Introspection does not perturb observed state (no observer effect).
  • Live status is reported accurately.
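The decoupling described above can be sketched with standard-library threads standing in for tokio tasks. `CellState`, `Query`, `spawn_introspect`, `spawn_wedged_actor`, and `introspect` are hypothetical names, not hyperactor APIs. The sketch only shows why a wedged message loop cannot block a reader that holds its own handle to the shared state, and why reading that state leaves it unchanged.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the shared part of InstanceCell.
#[derive(Default)]
struct CellState {
    messages_processed: AtomicU64,
    last_handler: Mutex<Option<String>>,
}

/// Hypothetical query: the reply travels back on its own one-shot channel.
struct Query {
    reply: mpsc::Sender<(u64, Option<String>)>,
}

/// Spawn the introspect worker (a std thread here instead of a tokio task).
/// It only reads CellState; it never writes last_handler.
fn spawn_introspect(cell: Arc<CellState>) -> mpsc::Sender<Query> {
    let (tx, rx) = mpsc::channel::<Query>();
    thread::spawn(move || {
        for q in rx {
            let count = cell.messages_processed.load(Ordering::SeqCst);
            let last = cell.last_handler.lock().unwrap().clone();
            // The caller may have stopped waiting; a failed reply is ignored.
            let _ = q.reply.send((count, last));
        }
    });
    tx
}

/// Simulate an actor loop that handles one message and then wedges forever.
/// The returned receiver fires once the state update has been written.
fn spawn_wedged_actor(cell: Arc<CellState>) -> mpsc::Receiver<()> {
    let (ready_tx, ready_rx) = mpsc::channel();
    thread::spawn(move || {
        *cell.last_handler.lock().unwrap() = Some("Ping".to_string());
        cell.messages_processed.fetch_add(1, Ordering::SeqCst);
        ready_tx.send(()).unwrap();
        loop {
            thread::park(); // never unparked: this message loop is stuck
        }
    });
    ready_rx
}

/// Send one query and block until the introspect worker replies.
fn introspect(handle: &mpsc::Sender<Query>) -> (u64, Option<String>) {
    let (reply_tx, reply_rx) = mpsc::channel();
    handle.send(Query { reply: reply_tx }).unwrap();
    reply_rx.recv().unwrap()
}
```

Querying twice returns the same snapshot: the wedged loop is never consulted, and the reader does not record itself as the last handler.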

Infrastructure actors publish domain-specific metadata via publish_attrs(), which the introspect task reads for Entity-view queries. Non-addressable children (e.g., system procs) are resolved via a callback registered on InstanceCell.
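A minimal sketch of the two hooks in this paragraph, assuming a simplified cell type that stores published attrs as key/value pairs and an optional resolver closure consulted for child queries. `Cell`, `ChildPayload`, `ChildResolver`, `set_child_resolver`, `entity_attrs`, and `query_child` are hypothetical; only the name `publish_attrs` comes from the text, and its real signature may differ.

```rust
use std::sync::RwLock;

/// Hypothetical payload describing a non-addressable child.
#[derive(Debug, Clone, PartialEq)]
struct ChildPayload {
    name: String,
    is_system: bool,
}

/// Hypothetical resolver type: maps a child reference to its payload.
type ChildResolver = Box<dyn Fn(&str) -> Option<ChildPayload> + Send + Sync>;

/// Hypothetical slice of InstanceCell holding published attrs and a resolver.
#[derive(Default)]
struct Cell {
    published_attrs: RwLock<Vec<(String, String)>>,
    child_resolver: RwLock<Option<ChildResolver>>,
}

impl Cell {
    /// Replace the published attrs (read later for Entity-view queries).
    fn publish_attrs(&self, attrs: Vec<(String, String)>) {
        *self.published_attrs.write().unwrap() = attrs;
    }

    /// Read the published attrs, as the introspect task would.
    fn entity_attrs(&self) -> Vec<(String, String)> {
        self.published_attrs.read().unwrap().clone()
    }

    /// Register the callback used to resolve non-addressable children.
    fn set_child_resolver(&self, f: ChildResolver) {
        *self.child_resolver.write().unwrap() = Some(f);
    }

    /// Answer a child query without running any actor handler.
    fn query_child(&self, child: &str) -> Option<ChildPayload> {
        self.child_resolver.read().unwrap().as_ref().and_then(|f| f(child))
    }
}
```

Because the resolver lives on the cell rather than in a message handler, it can be invoked from the introspect task even when the actor's own loop is busy.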

Callers navigate topology by fetching an IntrospectResult and following its children references.
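Navigation of this kind can be sketched as a breadth-first walk over child references. `NodeResult` is a hypothetical, trimmed-down stand-in for IntrospectResult, and the `fetch` closure stands in for sending an introspection query and awaiting the reply; neither reflects the real API.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical, simplified stand-in for IntrospectResult:
/// only the fields needed for navigation.
#[derive(Clone, Debug)]
struct NodeResult {
    identity: String,
    children: Vec<String>,
}

/// Breadth-first walk from `root`, calling `fetch` once per distinct reference.
/// `fetch` returns None when a reference cannot be resolved.
fn walk<F>(root: &str, mut fetch: F) -> Vec<String>
where
    F: FnMut(&str) -> Option<NodeResult>,
{
    let mut order = Vec::new();
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([root.to_string()]);
    while let Some(r) = queue.pop_front() {
        if !seen.insert(r.clone()) {
            continue; // already visited; guards against cycles and shared children
        }
        if let Some(node) = fetch(&r) {
            order.push(node.identity.clone());
            for child in node.children {
                queue.push_back(child);
            }
        }
    }
    order
}
```

Unresolvable references are skipped rather than aborting the walk, which suits a topology where children can stop between queries.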

Design Invariants

The introspection subsystem maintains twelve invariants (S1–S12). Each is documented at the code site that enforces it.

  • S1. Introspection must not depend on actor responsiveness – a wedged actor can still be introspected (runtime task, not actor loop).
  • S2. Introspection must not perturb observed state – reading InstanceCell never sets last_message_handler to IntrospectMessage.
  • S3. Sender routing is unchanged – senders target the same PortId (IntrospectMessage::port()) across processes.
  • S4. IntrospectMessage never produces a WorkCell – pre-registration via open_message_port gives the introspect port its own channel, independent of the actor’s work queue.
  • S5. Replies never use PanickingMailboxSender – the introspect task replies via Mailbox::serialize_and_send_once.
  • S6. View semantics are stable – Actor view uses live structural state + supervision children; Entity view uses published properties + domain children.
  • S7. QueryChild must work without actor handlers – system procs are resolved via a per-actor callback on InstanceCell.
  • S8. Published properties are constrained – actors cannot publish Root or Error payloads (only Host and Proc variants).
  • S9. Port binding is single source of truth – the introspect port is bound exactly once via bind_actor_port() in Instance::new().
  • S10. Introspect receiver lifecycle – created in Instance::new(), spawned in start(), dropped in child_instance().
  • S11. Terminated snapshots do not keep actors resolvable – store_terminated_snapshot writes to the proc’s snapshot map, not the instances map. resolve_actor_ref checks terminal status independently and is unaffected by snapshot storage.
  • S12. Introspection must not impair actor liveness – introspection queries (including DashMap reads for actor enumeration) must not cause convoy starvation or scheduling delays that stall concurrent actor spawn/stop operations.
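Invariant S8 can be illustrated with a runtime check over a hypothetical payload enum. The variant names Root, Host, Proc, and Error follow S8; `Properties`, `PublishError`, `PublishedSlot`, and all fields are assumptions made for the sketch.

```rust
/// Hypothetical payload enum; variant names follow S8, fields are illustrative.
#[derive(Clone, Debug, PartialEq)]
enum Properties {
    Root,
    Host { num_procs: usize },
    Proc { num_actors: usize },
    Error { code: String, message: String },
}

#[derive(Debug, PartialEq)]
enum PublishError {
    /// The named variant is reserved for the introspection layer itself.
    Forbidden(&'static str),
}

/// Hypothetical holder for an actor's published properties.
#[derive(Default)]
struct PublishedSlot {
    current: Option<Properties>,
}

impl PublishedSlot {
    /// Accept only Host and Proc payloads; reject Root and Error (S8).
    /// A rejected publish leaves the previously published value in place.
    fn publish(&mut self, props: Properties) -> Result<(), PublishError> {
        match props {
            Properties::Host { .. } | Properties::Proc { .. } => {
                self.current = Some(props);
                Ok(())
            }
            Properties::Root => Err(PublishError::Forbidden("Root")),
            Properties::Error { .. } => Err(PublishError::Forbidden("Error")),
        }
    }
}
```

A type-level alternative is a separate enum containing only Host and Proc, which makes Root and Error unrepresentable at the call site; which approach hyperactor uses is not stated here.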

Introspection key invariants (IK-*)

  • IK-1 (metadata completeness): Every actor-runtime introspection key must carry @meta(INTROSPECT = ...) with non-empty name and desc.
  • IK-2 (short-name uniqueness): No two introspection keys may share the same IntrospectAttr.name. Duplicates would break the FQ-to-short HTTP remap and schema output.
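Both IK invariants reduce to a single pass over the key registry. The sketch below models each key as a fully-qualified name plus optional (name, desc) metadata; `IntrospectAttr` here is a hypothetical mirror, not the crate's type, and `KeyInfo`, `KeyError`, and `check_keys` are likewise assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the INTROSPECT metadata attached to a key.
#[derive(Debug, Clone)]
struct IntrospectAttr {
    name: &'static str,
    desc: &'static str,
}

/// Hypothetical registry entry: fully-qualified key name plus its metadata.
struct KeyInfo {
    fq_name: &'static str,
    meta: Option<IntrospectAttr>,
}

#[derive(Debug, PartialEq)]
enum KeyError {
    /// IK-1: key has no INTROSPECT metadata.
    MissingMeta(&'static str),
    /// IK-1: name or desc is empty.
    EmptyField(&'static str),
    /// IK-2: (short name, first FQ name, second FQ name).
    DuplicateShortName(&'static str, &'static str, &'static str),
}

/// Validate IK-1 and IK-2, returning the short-name -> FQ-name table
/// (the inverse of the FQ-to-short remap) when every key passes.
fn check_keys(keys: &[KeyInfo]) -> Result<HashMap<&'static str, &'static str>, KeyError> {
    let mut by_short: HashMap<&'static str, &'static str> = HashMap::new();
    for k in keys {
        let meta = k.meta.as_ref().ok_or(KeyError::MissingMeta(k.fq_name))?;
        if meta.name.is_empty() || meta.desc.is_empty() {
            return Err(KeyError::EmptyField(k.fq_name));
        }
        if let Some(prev) = by_short.insert(meta.name, k.fq_name) {
            return Err(KeyError::DuplicateShortName(meta.name, prev, k.fq_name));
        }
    }
    Ok(by_short)
}
```

A check like this fits naturally in a unit test over the full key list, so a duplicate short name fails the build before it can corrupt the HTTP remap.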

Failure introspection invariants (FI-*)

The FailureInfo presentation type lives in hyperactor_mesh::introspect; these invariants are documented here because the enforcement sites are in hyperactor (proc.rs serve(), live_actor_payload).

  • FI-1 (event-before-status): All InstanceCell state that live_actor_payload reads must be written BEFORE change_status() transitions to terminal.
  • FI-2 (write-once): InstanceCellState::supervision_event is written at most once per actor lifetime.
  • FI-3 (failure attrs <-> status): Failure attrs are present iff status is "failed".
  • FI-4 (is_propagated <-> root_cause_actor): failure_is_propagated == true iff failure_root_cause_actor != this_actor_id.
  • FI-5 (is_poisoned <-> failed_actor_count): is_poisoned == true iff failed_actor_count > 0.
  • FI-6 (clean stop = no artifacts): When an actor stops cleanly, supervision_event is None, failure attrs are absent, and the actor does not contribute to failed_actor_count.
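FI-3 through FI-6 can be checked mechanically against a snapshot. The sketch below uses hypothetical flattened types (`Failure`, `ActorSnapshot`, `ProcSnapshot`) that loosely mirror the FAILURE_* and STATUS attrs; the checker functions are illustrations, not crate APIs. FI-1 and FI-2 concern write ordering and are not visible in a single snapshot.

```rust
/// Hypothetical failure fields, loosely mirroring the FAILURE_* attrs.
#[derive(Debug, Clone)]
struct Failure {
    is_propagated: bool,
    root_cause_actor: String,
}

/// Hypothetical per-actor snapshot.
#[derive(Debug, Clone)]
struct ActorSnapshot {
    actor_id: String,
    status: String, // "running", "stopped", or "failed"
    failure: Option<Failure>,
}

/// Hypothetical per-proc summary.
#[derive(Debug, Clone)]
struct ProcSnapshot {
    is_poisoned: bool,
    failed_actor_count: usize,
}

/// Check FI-3 and FI-4 for one actor; returns the first violated invariant.
fn check_actor(s: &ActorSnapshot) -> Result<(), &'static str> {
    // FI-3: failure attrs are present iff status is "failed".
    // For a clean stop this also covers the "no failure attrs" part of FI-6.
    if (s.status == "failed") != s.failure.is_some() {
        return Err("FI-3");
    }
    // FI-4: propagated iff the root cause is some other actor.
    if let Some(f) = &s.failure {
        if f.is_propagated != (f.root_cause_actor != s.actor_id) {
            return Err("FI-4");
        }
    }
    Ok(())
}

/// Count failed actors; cleanly stopped actors never contribute (FI-6).
fn failed_actor_count(actors: &[ActorSnapshot]) -> usize {
    actors.iter().filter(|a| a.status == "failed").count()
}

/// Check FI-5 for one proc summary.
fn check_proc(p: &ProcSnapshot) -> Result<(), &'static str> {
    if p.is_poisoned != (p.failed_actor_count > 0) {
        return Err("FI-5");
    }
    Ok(())
}
```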

Attrs view invariants (AV-*)

These govern the typed view layer (ActorAttrsView). The full AV-* / DP-* family is documented in hyperactor_mesh::introspect; the subset relevant to this crate:

  • AV-1 (view-roundtrip): For each view V, V::from_attrs(&v.to_attrs()) == Ok(v).
  • AV-2 (required-key-strictness): from_attrs fails iff required keys for that view are missing.
  • AV-3 (unknown-key-tolerance): Unknown attrs keys must not affect successful decode outcome.
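The three AV properties can be demonstrated on a toy view. The sketch assumes an Attrs bag modeled as a string-to-string map and uses plain string keys; the real crate uses typed keys such as STATUS and ACTOR_TYPE, and `ActorView` and `ViewError` are hypothetical stand-ins for ActorAttrsView and AttrsViewError.

```rust
use std::collections::BTreeMap;

/// Stand-in for an Attrs bag: key -> serialized value.
type Attrs = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
enum ViewError {
    MissingKey(&'static str),
}

/// Hypothetical view with two required keys and one optional key.
#[derive(Debug, Clone, PartialEq)]
struct ActorView {
    status: String,               // required
    actor_type: String,           // required
    last_handler: Option<String>, // optional
}

impl ActorView {
    /// Decode; fails only when a required key is missing (AV-2).
    /// Keys the view does not know about are never consulted (AV-3).
    fn from_attrs(a: &Attrs) -> Result<Self, ViewError> {
        let req = |k: &'static str| a.get(k).cloned().ok_or(ViewError::MissingKey(k));
        Ok(Self {
            status: req("status")?,
            actor_type: req("actor_type")?,
            last_handler: a.get("last_handler").cloned(),
        })
    }

    /// Encode; writes the optional key only when present, so decoding
    /// the result reproduces the same value (AV-1).
    fn to_attrs(&self) -> Attrs {
        let mut a = Attrs::new();
        a.insert("status".into(), self.status.clone());
        a.insert("actor_type".into(), self.actor_type.clone());
        if let Some(h) = &self.last_handler {
            a.insert("last_handler".into(), h.clone());
        }
        a
    }
}
```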

Structs

ActorAttrsView
Typed view over attrs for an actor node.
FailureAttrs
Structured failure fields decoded from FAILURE_* attrs.
IntrospectResult
Internal introspection result. Carries attrs as a JSON string. The mesh layer constructs the API-facing NodePayload (with properties) from this via derive_properties.
RecordedEvent
Structured tracing event from the actor-local flight recorder.

Enums

AttrsViewError
Error from decoding an Attrs bag into a typed view.
IntrospectMessage
Introspection query sent to any actor.
IntrospectView
Context for an introspection query: which aspect of the actor to describe.

Statics

ACTOR_TYPE
Fully-qualified actor type name.
CHILDREN
Child reference strings for tree navigation. Published by infrastructure actors (HostMeshAgent, ProcAgent) so the Entity view can return children without parsing mesh-layer keys.
CREATED_AT
Timestamp when this actor was created.
ERROR_CODE
Machine-readable error code for error nodes.
ERROR_MESSAGE
Human-readable error message for error nodes.
FAILURE_ERROR_MESSAGE
Failure error message.
FAILURE_IS_PROPAGATED
Whether the failure was propagated from a child.
FAILURE_OCCURRED_AT
Timestamp when failure occurred.
FAILURE_ROOT_CAUSE_ACTOR
Actor that caused the failure (root cause).
FAILURE_ROOT_CAUSE_NAME
Name of root cause actor.
FLIGHT_RECORDER
Flight recorder JSON (recent trace events).
IS_SYSTEM
Whether this actor is infrastructure/system.
LAST_HANDLER
Name of the last message handler invoked.
MESSAGES_PROCESSED
Number of messages processed by this actor.
STATUS
Actor lifecycle status: “running”, “stopped”, “failed”.
STATUS_REASON
Reason for stop/failure (absent when running).
TOTAL_PROCESSING_TIME_US
Total CPU time in message handlers (microseconds).

Functions

format_timestamp
Format a SystemTime as an ISO 8601 timestamp with millisecond precision.
live_actor_payload
Build an IntrospectResult from live InstanceCell state.
serve_introspect
Introspect task: runs on a dedicated tokio task per actor, handling IntrospectMessage by reading InstanceCell directly and replying via the actor's Mailbox.
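The format_timestamp contract (ISO 8601, millisecond precision) can be met with the standard library alone. This sketch is not the crate's implementation: it assumes UTC output with a trailing `Z`, which the description above does not specify, and converts days to a calendar date with Howard Hinnant's `civil_from_days` algorithm. The helper `civil_from_days` is hypothetical.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Convert days since 1970-01-01 into a proleptic Gregorian (year, month, day),
/// following Howard Hinnant's civil_from_days algorithm.
fn civil_from_days(z: i64) -> (i64, u32, u32) {
    let z = z + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let y = yoe + era * 400;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year from March 1
    let mp = (5 * doy + 2) / 153; // month index with March = 0
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    (if m <= 2 { y + 1 } else { y }, m, d)
}

/// Format a SystemTime as "YYYY-MM-DDTHH:MM:SS.mmmZ" (UTC, truncated to ms).
fn format_timestamp(t: SystemTime) -> String {
    // Signed milliseconds since the epoch, floored so pre-1970 times round down.
    let ms: i64 = match t.duration_since(UNIX_EPOCH) {
        Ok(d) => d.as_millis() as i64,
        Err(e) => {
            let d = e.duration();
            let whole = d.as_millis() as i64;
            if d.subsec_nanos() % 1_000_000 == 0 { -whole } else { -whole - 1 }
        }
    };
    let days = ms.div_euclid(86_400_000);
    let rem = ms.rem_euclid(86_400_000);
    let (y, m, d) = civil_from_days(days);
    let (h, min, s, milli) = (rem / 3_600_000, rem / 60_000 % 60, rem / 1000 % 60, rem % 1000);
    format!("{y:04}-{m:02}-{d:02}T{h:02}:{min:02}:{s:02}.{milli:03}Z")
}
```

In practice a crate such as chrono or humantime would usually handle this; the hand-rolled version only shows what "millisecond precision" means for the output shape.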