
Testing

The Rust induscode agent is tested by inline #[cfg(test)] modules that live next to the code, a publish = false induscode-testkit harness that carries the agent-layer fakes, and 15 insta snapshot baselines that pin the ratatui TUI cell-for-cell against a TestBackend. Every test is network-free — a scripted ModelInvoker, a localhost wiremock transcript server, and tempfile sandboxes stand in for the model, the wire, and the home directory. Run them with cargo test -p induscode (or cargo nextest run).

Test philosophy
Running the suite
The induscode-testkit harness
Vendored framework seams
Agent-layer fakes
Snapshot tests for the TUI
Unit and integration layout
Hermetic discipline
The behavior-trace parity unit
Green test count
Relationship to the framework suite

Test philosophy

Tests live inside the crate in #[cfg(test)] mod tests blocks at the bottom of each source file — the idiomatic Rust placement, not a parallel tests/ tree. There is no conftest.py analogue and no shared-fixtures module compiled into the product: each test module declares the doubles it needs, and the cross-cutting doubles come from the dev-only induscode-testkit crate, which is wired in as a path [dev-dependencies] entry.
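
As a minimal illustration of that placement, a source file can end with its own test module. The `clamp_width` helper below is hypothetical and is not taken from induscode; only the `#[cfg(test)] mod tests` layout reflects the convention described above.

```rust
/// Hypothetical helper, used only to illustrate test placement.
pub fn clamp_width(requested: u16, max: u16) -> u16 {
    requested.min(max)
}

// The test module sits at the bottom of the same file and is compiled
// only under `cargo test`, so it never reaches the shipped binary.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn clamps_to_the_maximum() {
        assert_eq!(clamp_width(120, 80), 80);
    }

    #[test]
    fn keeps_values_under_the_maximum() {
        assert_eq!(clamp_width(40, 80), 40);
    }
}
```

Because the module uses `use super::*`, the tests can reach private items in the file, which is the main practical reason to keep them inline.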

The suite is built on three rules that make it deterministic and CI-portable across all three operating systems:

Rule	How it is enforced
No network	A scripted `ModelInvoker` (`ScriptedModel`) replays recorded emission turns; the 11 wire connectors replay recorded SSE/NDJSON bodies through a localhost `TranscriptServer` (`wiremock`), never a real provider.
No real home	Real-fs tests run under a `TempWorkspace` (`tempfile::TempDir`, removed on drop) — never `~/.indusagi/`.
No PTY	TUI tests draw into a ratatui `TestBackend` in-memory `Buffer` and assert against the cell grid — there is no terminal device, so the snapshot suite runs headless in CI.

The product crate is induscode (the merged crate at crates/induscode), and its two [[bin]] targets are indusr and indusagir — see induscode (Rust) Overview for the binary surface.

Running the suite

cd indus-code-rust

# the whole agent suite — inline #[cfg(test)] modules + snapshot baselines
cargo test -p induscode

# faster, parallel, with a nicer summary
cargo nextest run -p induscode

# the testkit's own self-tests (the fakes and the fixture loader)
cargo test -p induscode-testkit

default-members = ["crates/induscode"] in the virtual workspace manifest means a bare cargo test targets the product crate. The workspace has exactly three members — crates/induscode, crates/induscode-testkit, and xtask.

Run one module or one test by path filter:

# every test in the conductor subsystem module
cargo test -p induscode conductor::

# one snapshot test
cargo test -p induscode draw_models_snapshot

Reviewing snapshot diffs

The TUI snapshots use insta. When a render changes, insta fails the test and writes a .snap.new next to the baseline. Review and accept:

cargo install cargo-insta     # one-time

cargo insta test -p induscode # run + collect pending snapshots
cargo insta review            # interactive accept/reject of diffs
cargo insta accept            # accept all pending (non-interactive)

The induscode-testkit harness

induscode-testkit (crates/induscode-testkit) is the dev-only test scaffold:

# crates/induscode-testkit/Cargo.toml
[package]
name    = "induscode-testkit"
publish = false   # dev-only support crate; never published

[dependencies]
induscode = { path = "../induscode" }
indusagi  = { workspace = true }
ratatui   = { workspace = true }
wiremock  = "0.6"
tempfile  = "3"

It does two things, declared in src/lib.rs:

Vendors the framework's test seams — the scripted ModelInvoker, the ratatui TestBackend render harness, and the wiremock transcript server — built directly on the published indusagi umbrella crate.
Declares the agent-layer fakes the framework testkit does not carry — FakeAgent, CaptureView, BehaviorTrace, the agent-local golden loader, and TempWorkspace.

The public surface is the flat re-export from lib.rs:

// crates/induscode-testkit/src/lib.rs
pub use render::{buffer_to_string, render_frame, render_lines};
pub use script::ScriptedModel;
pub use wire::TranscriptServer;

pub use capture_view::CaptureView;
pub use fake_agent::FakeAgent;
pub use trace::BehaviorTrace;
pub use workspace::TempWorkspace;

Module	Type / fn exported	Role
`render`	`render_frame`, `render_lines`, `buffer_to_string`, `TestBackend`	The ratatui in-memory render harness for snapshot tests.
`script`	`ScriptedModel`	A `ModelInvoker` that replays recorded turns and panics past the script.
`wire`	`TranscriptServer`	A localhost `wiremock` server that serves recorded provider bodies.
`fake_agent`	`FakeAgent`, `RecordedTurn`	A network-free `Agent`-trait fake that records prompts and appends `ack:{prompt}`.
`capture_view`	`CaptureView`, `EndOfInput`	A headless `InteractiveView` capture seam for boot/runner tests with no TTY.
`trace`	`BehaviorTrace`, `ToolCall`	The normalized parity-trace unit compared across the TS and Rust agents.
`fixture`	`fixtures_dir`, `fixture_path`, `load_text`, `load_json`	The agent-local golden-corpus loader.
`workspace`	`TempWorkspace`	A `tempfile`-backed throwaway workspace, cleaned up on drop.

Why the seams are vendored, not re-imported

The framework ships its own indusagi-testkit path crate, but that crate depends on indusagi as a path dependency. Importing it would put a second, type-incompatible copy of indusagi in the graph alongside the crates.io indusagi 0.1.0 that the agent builds against. So induscode-testkit vendors ScriptedModel, the render_* helpers, and TranscriptServer verbatim into src/{script,render,wire}.rs, built on the published indusagi public contract types. This keeps the whole agent workspace resolving a single indusagi crate (enforced by cargo deny check bans). See the framework's testing strategy for the upstream originals.

Vendored framework seams

ScriptedModel — the scripted ModelInvoker

ScriptedModel (src/script.rs) is the Rust port of the TS scriptModel + channelOf(emissions). It implements the framework's indusagi::runtime::contract::ModelInvoker trait, replaying one recorded turn per invoke call and panicking past the script so a runaway drive loop surfaces as a failed assertion rather than a wedged suite:

// crates/induscode-testkit/src/script.rs
pub struct ScriptedModel {
    turns: Vec<Arc<Vec<Emission>>>,
    cursor: AtomicUsize,
}

impl ScriptedModel {
    pub fn new(turns: Vec<Vec<Emission>>) -> Self { /* … */ }
    pub fn single(turn: Vec<Emission>) -> Self { Self::new(vec![turn]) }
    pub fn calls(&self) -> usize { /* turn count issued so far */ }
}

impl ModelInvoker for ScriptedModel {
    fn invoke(&self, _conversation: Conversation, _options: StreamOptions) -> Channel {
        // returns a re-iterable Channel::of(..) for turns[cursor]; panics past the end
    }
}

The returned Channel is re-iterable (iterating it twice replays the same turn without advancing the cursor), mirroring the production gateway stream contract. model.calls() is asserted to pin the exact number of drive-loop turns. The self-tests cover ordered replay, re-iterability, and the #[should_panic(expected = "only 1 turn(s) recorded")] runaway guard.
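
The replay-and-guard behavior can be sketched with the standard library alone. This is a simplified model under stated assumptions, not the vendored source: `Emission` here is a hypothetical stand-in enum, `invoke` takes no conversation or options, and an `Arc<Vec<Emission>>` stands in for the framework's `Channel`. The panic wording is also assumed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the framework's `Emission` type.
#[derive(Clone, Debug, PartialEq)]
pub enum Emission {
    Text(String),
    Done,
}

pub struct ScriptedModel {
    turns: Vec<Arc<Vec<Emission>>>,
    cursor: AtomicUsize,
}

impl ScriptedModel {
    pub fn new(turns: Vec<Vec<Emission>>) -> Self {
        Self {
            turns: turns.into_iter().map(Arc::new).collect(),
            cursor: AtomicUsize::new(0),
        }
    }

    /// Number of `invoke` calls made so far.
    pub fn calls(&self) -> usize {
        self.cursor.load(Ordering::SeqCst)
    }

    /// Hands out the next recorded turn and advances the cursor.
    /// Panics once the script is exhausted, so a runaway loop fails fast.
    pub fn invoke(&self) -> Arc<Vec<Emission>> {
        let index = self.cursor.fetch_add(1, Ordering::SeqCst);
        match self.turns.get(index) {
            Some(turn) => Arc::clone(turn),
            None => panic!(
                "invoked {} time(s) but only {} turn(s) recorded",
                index + 1,
                self.turns.len()
            ),
        }
    }
}
```

Iterating the returned turn any number of times never touches the cursor; only `invoke` advances it, which is what lets a test pin the exact turn count with `calls()`.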

TranscriptServer — the wiremock transcript replay

TranscriptServer (src/wire.rs) wraps a wiremock::MockServer so connector tests replay a recorded provider body through the real framer + decoder + fold rather than mocking the connector:

// crates/induscode-testkit/src/wire.rs
pub const SSE_CONTENT_TYPE: &str = "text/event-stream";
pub const NDJSON_CONTENT_TYPE: &str = "application/x-ndjson";

impl TranscriptServer {
    pub async fn start() -> Self { /* ephemeral localhost port */ }
    pub fn uri(&self) -> String { /* http://127.0.0.1:<port> */ }
    pub async fn serve_sse(&self, method: &str, path: &str, body: impl Into<String>);
    pub async fn serve_ndjson(&self, method: &str, path: &str, body: impl Into<String>);
    pub async fn serve_status(&self, method: &str, path: &str, status: u16);
}

serve_status drives the HTTP-status → GatewayError-kind table tests. The self-tests use a dependency-free TcpStream GET (no HTTP client of their own) to prove the SSE content-type and status codes round-trip.
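
The dependency-free round-trip idea can be sketched without wiremock at all. The two helpers below, `serve_sse_once` and `raw_get`, are hypothetical names invented for this illustration; they use only `std::net` and show a recorded body served once from an ephemeral localhost port and fetched with a bare `TcpStream` GET.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Serves `body` once as an SSE response on an ephemeral localhost port.
pub fn serve_sse_once(body: String) -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0 = pick a free port
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        if let Ok((mut stream, _)) = listener.accept() {
            // Drain the request head (up to the blank line) before answering.
            let mut head: Vec<u8> = Vec::new();
            let mut chunk = [0u8; 512];
            while !head.windows(4).any(|w| w == &b"\r\n\r\n"[..]) {
                match stream.read(&mut chunk) {
                    Ok(0) | Err(_) => break,
                    Ok(n) => head.extend_from_slice(&chunk[..n]),
                }
            }
            let response = format!(
                "HTTP/1.1 200 OK\r\nContent-Type: text/event-stream\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
                body.len(),
                body
            );
            let _ = stream.write_all(response.as_bytes());
        }
    });
    Ok(addr)
}

/// Sends a bare HTTP/1.1 GET over a `TcpStream` and returns the raw response.
pub fn raw_get(addr: SocketAddr, path: &str) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    let request = format!(
        "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        path, addr
    );
    stream.write_all(request.as_bytes())?; // one write, so the head arrives whole
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

A test would call `serve_sse_once` with a recorded body, point the code under test at the returned address, and check the content type and payload in what comes back.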

The render harness

render.rs is the ratatui snapshot engine. Three functions cover the two render shapes view code takes:

// crates/induscode-testkit/src/render.rs
pub fn render_frame<F>(w: u16, h: u16, draw: F) -> String
where F: FnOnce(&mut ratatui::Frame);

pub fn render_lines(w: u16, lines: &[ratatui::text::Line<'_>]) -> String;

pub fn buffer_to_string(buffer: &Buffer) -> String;

render_frame draws into a fixed w×h TestBackend and stringifies the buffer; render_lines lays out a slice of Line values (view functions such as the diff renderer return a Vec<Line>) one per row; buffer_to_string flattens a Buffer into snapshot text, keeping glyph layout only and dropping style. Color and modifiers are asserted separately by the unit tests that read buf[(x, y)].bg and similar fields. Per the doc comments, the snapshot pins the flicker / wrapping / clamping contract, not the palette.
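
The glyph-only flattening can be sketched against a plain cell grid. `Cell` and `Grid` below are hypothetical stand-ins for ratatui's buffer types, and the sketch assumes no trailing-whitespace trimming; it only illustrates why two renders that differ in style alone produce the same snapshot text.

```rust
/// Hypothetical stand-in for a buffer cell: one glyph plus a style flag.
#[derive(Clone, Debug)]
pub struct Cell {
    pub symbol: String,
    pub bold: bool,
}

/// Row-major grid of cells, `width` columns per row.
pub struct Grid {
    pub width: usize,
    pub cells: Vec<Cell>,
}

/// Flattens the grid to snapshot text: glyphs only, one line per row.
/// The `bold` flag is never read, so style cannot affect the output.
pub fn grid_to_string(grid: &Grid) -> String {
    let mut out = String::new();
    // `max(1)` avoids the panic `chunks(0)` would raise on an empty grid.
    for row in grid.cells.chunks(grid.width.max(1)) {
        for cell in row {
            out.push_str(&cell.symbol);
        }
        out.push('\n');
    }
    out
}
```

Keeping style out of the flattened text is what makes a palette change leave the baselines untouched, while a wrapping or clamping change still shows up as a diff.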

Agent-layer fakes

FakeAgent — the Agent-trait fake

FakeAgent (src/fake_agent.rs) is the Rust port of the TS fakeAgent() from conductor.test.ts / submit.test.ts / session-persist.test.ts. For each prompt it appends a RecordedTurn { prompt, ack: "ack:{prompt}" }, and it counts aborts so boot/console tests can assert on a count rather than wiring up a callback spy:

// crates/induscode-testkit/src/fake_agent.rs
pub struct RecordedTurn { pub prompt: String, pub ack: String }

impl FakeAgent {
    pub fn record_prompt(&self, input: &str);  // appends ack:{input}
    pub fn record_abort(&self);
    pub fn prompts(&self) -> Vec<String>;
    pub fn turns(&self) -> Vec<RecordedTurn>;
    pub fn abort_count(&self) -> usize;
}
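
One way the recording surface above could be backed is with a `Mutex` and an atomic counter, so every method works through `&self`. This is a sketch under that assumption; the field names and the choice of interior mutability are illustrative and not taken from the crate, and the `Agent` trait implementation is left out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

#[derive(Clone, Debug, PartialEq)]
pub struct RecordedTurn {
    pub prompt: String,
    pub ack: String,
}

#[derive(Default)]
pub struct FakeAgent {
    turns: Mutex<Vec<RecordedTurn>>, // assumed storage, not from the source
    aborts: AtomicUsize,
}

impl FakeAgent {
    /// Records the prompt together with its canned `ack:{input}` reply.
    pub fn record_prompt(&self, input: &str) {
        let turn = RecordedTurn {
            prompt: input.to_string(),
            ack: format!("ack:{}", input),
        };
        self.turns.lock().unwrap().push(turn);
    }

    pub fn record_abort(&self) {
        self.aborts.fetch_add(1, Ordering::SeqCst);
    }

    pub fn prompts(&self) -> Vec<String> {
        self.turns.lock().unwrap().iter().map(|t| t.prompt.clone()).collect()
    }

    pub fn turns(&self) -> Vec<RecordedTurn> {
        self.turns.lock().unwrap().clone()
    }

    pub fn abort_count(&self) -> usize {
        self.aborts.load(Ordering::SeqCst)
    }
}
```

A test then drives the code under test and asserts on `prompts()` and `abort_count()` afterwards, with no callbacks to register.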

CaptureView — the headless InteractiveView

CaptureView (src/capture_view.rs) is the headless InteractiveView capture seam: render pushes events into a Vec, prompt pops the next scripted input line and returns EndOfInput once the queue drains (the EOF-not-sentinel discipline). Boot/runner tests then assert "this invocation drove the view with these events and asked for input N times":

// crates/induscode-testkit/src/capture_view.rs
pub struct EndOfInput;

impl CaptureView {
    pub fn with_inputs<I, S>(inputs: I) -> Self where I: IntoIterator<Item = S>, S: Into<String>;
    pub fn next_input(&self) -> Result<String, EndOfInput>;
    pub fn rendered(&self) -> Vec<String>;
    pub fn prompt_count(&self) -> usize;
    pub fn is_closed(&self) -> bool;
}
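
The scripted-input queue can be sketched as follows. Several details are assumptions made for the illustration: the `render(&self, event: &str)` signature, the private `State` struct, and the choice to count the final end-of-input request in `prompt_count`. `is_closed` is omitted because the source does not say when the view counts as closed.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
pub struct EndOfInput;

#[derive(Default)]
struct State {
    inputs: VecDeque<String>,
    rendered: Vec<String>,
    prompts: usize,
}

pub struct CaptureView {
    state: Mutex<State>,
}

impl CaptureView {
    pub fn with_inputs<I, S>(inputs: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        let state = State {
            inputs: inputs.into_iter().map(Into::into).collect(),
            ..State::default()
        };
        Self { state: Mutex::new(state) }
    }

    /// Captures one rendered event (hypothetical signature).
    pub fn render(&self, event: &str) {
        self.state.lock().unwrap().rendered.push(event.to_string());
    }

    /// Pops the next scripted line, or reports end of input once drained.
    pub fn next_input(&self) -> Result<String, EndOfInput> {
        let mut state = self.state.lock().unwrap();
        state.prompts += 1; // assumption: every request counts, including the last
        state.inputs.pop_front().ok_or(EndOfInput)
    }

    pub fn rendered(&self) -> Vec<String> {
        self.state.lock().unwrap().rendered.clone()
    }

    pub fn prompt_count(&self) -> usize {
        self.state.lock().unwrap().prompts
    }
}
```

Returning a distinct `EndOfInput` error, rather than a sentinel string such as an empty line, keeps a scripted blank input distinguishable from a drained queue.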

TempWorkspace — the temp-dir sandbox

TempWorkspace (src/workspace.rs) is the tempfile::TempDir-backed throwaway directory the deck (checkpoint, read-edit-gate), session-persist, settings, and @file-attachment suites seed files into. Each test gets its own, removed on drop, so runs are isolated and parallel-safe:

// crates/induscode-testkit/src/workspace.rs
impl TempWorkspace {
    pub fn new() -> Self;
    pub fn root(&self) -> &Path;
    pub fn path(&self, rel: impl AsRef<Path>) -> PathBuf;
    pub fn write_file(&self, rel: impl AsRef<Path>, contents: impl AsRef<[u8]>) -> PathBuf;
    pub fn read_file(&self, rel: impl AsRef<Path>) -> String;
}
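
The removed-on-drop behavior comes from `tempfile::TempDir`. To show the mechanism without that dependency, the sketch below uses a hypothetical `ScratchDir` built on `std::fs`; its naming scheme (process id plus a counter) is an assumption for the example and is weaker than the randomized names `tempfile` generates.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};

// Per-process counter so parallel tests get distinct directories.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical std-only analogue of a throwaway workspace.
pub struct ScratchDir {
    root: PathBuf,
}

impl ScratchDir {
    pub fn new() -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::SeqCst);
        let name = format!("scratch-{}-{}", std::process::id(), id);
        let root = std::env::temp_dir().join(name);
        fs::create_dir_all(&root)?;
        Ok(Self { root })
    }

    pub fn root(&self) -> &Path {
        &self.root
    }

    /// Writes `contents` at `rel` under the root, creating parent directories.
    pub fn write_file(
        &self,
        rel: impl AsRef<Path>,
        contents: impl AsRef<[u8]>,
    ) -> io::Result<PathBuf> {
        let path = self.root.join(rel);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(&path, contents)?;
        Ok(path)
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; an error here must not mask a test failure.
        let _ = fs::remove_dir_all(&self.root);
    }
}
```

Because cleanup lives in `Drop`, the directory disappears when the value goes out of scope, including when a test panics and unwinds.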

The agent-local golden loader

fixture.rs resolves tests/fixtures/ under this workspace root (the agent goldens — export_html.json, sgr_corpus.json, system_prompt.json, wire_ndjson.json, slash_catalog.json, … — dumped from the TS agent, never hand-edited). It is anchored to CARGO_MANIFEST_DIR walked up two levels, so it resolves identically no matter which test invoked it. The framework's own loader resolves under the framework root and cannot reach the agent corpus, so this thin mirror exists:

// crates/induscode-testkit/src/fixture.rs
pub fn fixtures_dir() -> PathBuf;          // <agent-workspace>/tests/fixtures
pub fn fixture_path(name: &str) -> PathBuf;
pub fn load_text(name: &str) -> String;    // panics loudly on a missing fixture
pub fn load_json<T: DeserializeOwned>(name: &str) -> T;
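
The two-levels-up anchoring can be sketched as a pure path computation. The `_from` function names are hypothetical: they take the manifest directory as a parameter so the logic can be exercised without Cargo, whereas the real loader is described as reading `CARGO_MANIFEST_DIR`.

```rust
use std::path::{Path, PathBuf};

/// Walks two levels up from a crate manifest directory
/// (`<workspace>/crates/<crate>`) and appends `tests/fixtures`.
/// Returns `None` when the path is too shallow to have two parents.
pub fn fixtures_dir_from(manifest_dir: &Path) -> Option<PathBuf> {
    let workspace_root = manifest_dir.parent()?.parent()?;
    Some(workspace_root.join("tests").join("fixtures"))
}

/// Resolves one fixture by file name under the fixtures directory.
pub fn fixture_path_from(manifest_dir: &Path, name: &str) -> Option<PathBuf> {
    fixtures_dir_from(manifest_dir).map(|dir| dir.join(name))
}

// Inside a crate built by Cargo, the argument would typically be
// `Path::new(env!("CARGO_MANIFEST_DIR"))`, which is fixed at compile time.
```

Anchoring to a compile-time path rather than the current working directory is what makes the result the same regardless of where the test binary is launched from.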

Snapshot tests for the TUI

The console renderer is pinned by 15 insta snapshot baselines under src/console/**/snapshots/. Each snapshot test draws a component into a fixed-size TestBackend, flattens the buffer to a string, and calls insta::assert_snapshot!. insta is a [dev-dependencies] entry of induscode, alongside induscode-testkit, proptest, tempfile, assert_cmd, predicates, and ratatui.

A representative test — the model picker overlay:

// crates/induscode/src/console/overlays/models.rs
#[test]
fn draw_models_snapshot() {
    let theme = theme();
    let mut terminal = Terminal::new(TestBackend::new(64, 20)).expect("terminal");
    terminal
        .draw(|f| draw_models(f, f.area(), &theme, sample(), 0))
        .expect("draw");
    let buf = terminal.backend().buffer().clone();
    let mut out = String::new();
    for y in 0..20u16 {
        for x in 0..64u16 {
            out.push_str(buf[(x, y)].symbol());
        }
        out.push('\n');
    }
    insta::assert_snapshot!(out);
}

A .snap baseline is a YAML header (source: + expression:) followed by the literal rendered cells. For example the approval-dialog baseline (induscode__console__overlays__approval__tests__render_snapshot.snap) records the full bordered box:

---
source: crates/induscode-console/src/overlays/approval.rs
expression: "render_to_string(&dialog(), 70, 16)"
---
╭────────────────────────────────────────────────────────────────────╮
│ Permission                                                         │
│ Allow Bash to run?                                                 │
│ npm run test                                                       │
│ Allow-always remembers: Bash(npm run test:*)                       │
│ > Allow once  —  run this one call                                 │
│   Allow always  —  run it and remember the tool for this session   │
│   Deny  —  block this call                                         │
│ Up/Down to move, Enter selects, Esc denies                         │
╰────────────────────────────────────────────────────────────────────╯

The 15 baselines, by area:

Snapshot file (stem)	Area	What it pins
`overlays__approval__tests__render_snapshot`	Dialogs	The permission approval dialog box.
`overlays__models__tests__draw_models_snapshot`	Dialogs	The model picker list.
`overlays__scoped_models__tests__draw_scoped_models_snapshot`	Dialogs	The per-scope model routing editor.
`overlays__oauth__tests__draw_oauth_snapshot`	Dialogs	The OAuth device-flow prompt.
`overlays__signin__tests__draw_signin_snapshot`	Dialogs	The provider sign-in overlay.
`overlays__signout__tests__draw_signout_snapshot`	Dialogs	The sign-out overlay.
`overlays__sessions__tests__draw_sessions_snapshot`	Dialogs	The session resume picker.
`overlays__settings__tests__draw_settings_snapshot`	Dialogs	The settings overlay.
`overlays__theme__tests__draw_theme_snapshot`	Dialogs	The theme picker.
`overlays__tree__tests__draw_tree_snapshot`	Dialogs	The transcript-tree navigator.
`overlays__user_turns__tests__draw_user_turns_snapshot`	Dialogs	The user-turns history view.
`overlays__plugin__tests__render_snapshot`	Dialogs	The plugin/MCP overlay.
`console__slash__tests__catalog_listing_snapshot`	Slash commands	The full slash catalog listing (static + dynamic splice + collision guard).
`view__chrome__tests__footer_stats_threshold_snapshot`	Console	The footer stats line at the visibility threshold.
`view__chrome__tests__live_theme_switch_snapshot`	Theming	The same chrome rendered under `midnight` then `daylight`.

Snapshot tests assert glyph layout; cell-level style (the live theme switch, gutter highlight) is checked by sibling assertion tests that read buf[(x, y)] directly — e.g. approval.rs::render_shows_header_args_suggestion_and_choices asserts the > Allow once selection gutter on the highlighted row.

Unit and integration layout

There is no parallel tests/ integration tree in the product crate: every test is an inline #[cfg(test)] mod tests block, so unit and integration tests sit in the file that owns the code. The module tree under crates/induscode/src/ mirrors the subsystem layout (see Architecture). Inline tests (#[test] / #[tokio::test], some paired with #[should_panic]) break down roughly as follows:

Module path	Tests (≈)	Covers
`console/`	355	Reducer, input/intents/chords, theme, banner + chrome, the overlay router, the slash handler — plus the 15 snapshot tests. See Console overview.
`conductor/`	237	Catalog gate, model matcher, transcript-tree append/branch/fork, NDJSON serialize round-trip, submit/resume, queue + fork, contract + signal hub. See Conductor.
`launch/`	111	The flag table, invocation parser, usage renderer, package command. See Launch.
`deck/`	98	Card catalog, builtin cards, provisioning, the contract + bridge ledger. See Capability Deck.
`briefing/`	83	System-prompt composition. See Briefing.
`boot/`	82	Tokenize → mode mapping, workspace resolution, `run_stages`, runner selection. See Boot.
`core/`	71	Brand identity, guardrails, workspace layout, `sessions`, `settings`, the kit helpers. See Settings / Sessions.
`addons/`	71	The addon host pipeline. See Addons.
`insight/`	51	The tracing wrapper over the framework. See Insight.
`window_budget/`	48	Context-window budgeting + condense scope. See Window Budget.
`transcript_export/`	36	Markdown/HTML transcript export. See Transcript Export.
`channels/`	30	The non-interactive print/JSON channels. See Channels.
`runtime_bridge/`	27	External-runtime provider routing. See Runtime Bridge.

The three layers of fidelity:

Pure-data unit tests over dependency-free reducers, folds, and matchers (e.g. the console reducer, the conductor model matcher, the diff renderer).
Seam / protocol tests over injected doubles — ScriptedModel for the model, FakeAgent for the Agent trait, TranscriptServer for the wire, CaptureView for the interactive view, TempWorkspace for real-fs tools.
Snapshot tests drawing real components into a TestBackend.

Hermetic discipline

The whole suite is sealed against the outside world:

No model call. ScriptedModel replays recorded Vec<Emission> turns; its turn cursor, read back through calls(), doubles as a runaway-loop guard because invoking past the script panics.
No real provider. The 11 wire connectors replay recorded SSE/NDJSON bodies through the real framer/decoder over a localhost TranscriptServer. The HTTP-status → error-kind table is driven by serve_status.
No real home. Real-fs tests run under TempWorkspace; the goldens load from the agent-local tests/fixtures/ via the fixture loader.
No PTY. TUI tests draw into a ratatui TestBackend and assert against the in-memory Buffer — the snapshot suite runs in CI on all three OSes with no terminal device.

The behavior-trace parity unit

BehaviorTrace (src/trace.rs) is the comparison unit for the behavioral-parity harness, which runs the real TS agent and the Rust agent on the same scripted scenario and asserts the observable behavior is identical. It is a normalized, order-preserving record of what the agent decided — not how it painted (that is covered by snapshots and goldens):

// crates/induscode-testkit/src/trace.rs
pub struct ToolCall { pub name: String, pub args: String, pub is_error: bool }

pub struct BehaviorTrace {
    pub tool_calls: Vec<ToolCall>,
    pub final_text: String,
    pub settle_phase: String,
    pub session_node_count: usize,
}

impl BehaviorTrace {
    pub fn normalize(self) -> Self;   // strips timestamps/ULIDs/abs temp paths → placeholders
}

It derives PartialEq + Eq + Serialize + Deserialize, so a parity scenario reduces to assert_eq!(rust_trace.normalize(), ts_trace.normalize()), and a trace round-trips through JSON. For the milestone-by-milestone parity ledger, see the crate's PARITY_REPORT.md and the TS / Python editions.
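
To make the normalization idea concrete, here is a simplified sketch. It departs from the real API in labeled ways: `normalize` takes the temp root as an argument, the serde derives are left out to stay dependency-free, timestamps are not handled, and the `<ulid>` and `<tmp>` placeholder spellings plus the space-separated token scan are assumptions.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToolCall {
    pub name: String,
    pub args: String,
    pub is_error: bool,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct BehaviorTrace {
    pub tool_calls: Vec<ToolCall>,
    pub final_text: String,
    pub settle_phase: String,
    pub session_node_count: usize,
}

/// True for a 26-character Crockford base32 token (the ULID text form).
fn looks_like_ulid(token: &str) -> bool {
    token.len() == 26
        && token
            .chars()
            .all(|c| c.is_ascii_digit() || (c.is_ascii_uppercase() && !"ILOU".contains(c)))
}

/// Replaces run-specific tokens with stable placeholders.
fn scrub(text: &str, temp_root: &str) -> String {
    text.split(' ')
        .map(|token| {
            if looks_like_ulid(token) {
                "<ulid>".to_string()
            } else if !temp_root.is_empty() && token.starts_with(temp_root) {
                // Keep the path relative to the sandbox, drop the sandbox itself.
                format!("<tmp>{}", &token[temp_root.len()..])
            } else {
                token.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

impl BehaviorTrace {
    /// Simplified normalize: scrubs tool arguments and the final text.
    pub fn normalize(mut self, temp_root: &str) -> Self {
        for call in &mut self.tool_calls {
            call.args = scrub(&call.args, temp_root);
        }
        self.final_text = scrub(&self.final_text, temp_root);
        self
    }
}
```

Two traces that differ only in their sandbox path and generated identifiers compare unequal raw and equal once normalized, which is the property the parity assertion relies on.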

Green test count

The inline test attributes discoverable in source total 1,344: ~1,324 #[test] / #[tokio::test] attributes in the induscode product crate (including the 15 TUI snapshot tests) plus 20 self-tests in induscode-testkit. A #[should_panic] attribute only ever accompanies a #[test], so it does not add to the count. The framework underneath (indusagi-rust) carries its own separately counted suite; see the framework testing page. All milestones (M0 → M-final) are declared GREEN with zero failures, flake-free across repeated runs, and clippy -D warnings clean.

# reproduce the inline-test attribute count yourself
grep -rE '^\s*#\[(tokio::)?test' crates/induscode/src     | wc -l   # ≈ 1324
grep -rE '^\s*#\[(tokio::)?test' crates/induscode-testkit/src | wc -l   # 20

Relationship to the framework suite

The agent suite imports both from the crate under test and from the published indusagi framework it builds on. ScriptedModel implements the framework's indusagi::runtime::contract::ModelInvoker; the chrome snapshot tests use indusagi::tui_render::bridge::AgentMessage, indusagi::tui_render::theme_adapter::create_theme_adapter, and indusagi::tui_render::types::{SessionStats, StatsTokens, StatusKind}. That makes the suite an integration boundary: a framework change can move these tests even with no induscode change. The framework has its own test suite; this one sits on top of it, resolving a single published indusagi crate (the reason the framework testkit seams are vendored rather than re-imported).

See also: Architecture for the module map, Console overview for the rendered surface the snapshots pin, and Conductor for the drive loop ScriptedModel and FakeAgent exercise.