Referencereference/testing

Testing

The tests/ tree is the framework's full pytest suite — 1,349 collected tests, strictly network-free and credential-free. Run it from the package root with pytest; configuration lives in pyproject.toml (asyncio_mode = "auto", testpaths = ["tests"]).

The suite layout deliberately mirrors the package layout under indusagi/: one test subdirectory per subsystem (agent, ai, capabilities, connectors, interop, llmgateway, mcp, react_ink, runtime, shell_app, smithy, swarm, tracing, tui, ui_bridge), plus two root-level guards. Because of this, the test tree doubles as a map of the package's public surface — each directory imports only its sibling subpackage's public API.

Running the tests
Collection model
Coverage map
Test-double strategy
Root-level guards
Key concepts
Why 1,349?
Relationship to neighbors

Running the tests

Install the package with its dev extras, then invoke pytest from the package root. asyncio_mode = "auto" means every async def test_* runs without an explicit @pytest.mark.asyncio decorator.

# from the package root (pyproject.toml supplies testpaths + asyncio mode)
# pytest                                     # all 1,349 tests
# pytest tests/llmgateway                    # one subsystem (101 collected)
# pytest tests/tui/test_keys.py              # one file
# pytest --collect-only -q tests | tail -1   # -> "1349 tests collected"
# pytest tests/ui_bridge -q                  # the Textual Pilot integration tests

No environment variables or API keys are required. Tests that exercise key resolution monkeypatch os.environ and never read real credentials; tests that exercise sessions open on-disk fixtures strictly read-only.

Collection model

A few structural facts shape how collection works:

No top-level conftest.py and no __init__.py anywhere under tests/. pytest uses rootdir-relative sys.path insertion, which is why tests/agent/ imports the sibling module agent_suite_helpers by bare name (from agent_suite_helpers import make_agent).
asyncio_mode = "auto" — async tests need no marker.
The only per-directory conftest.py is tests/mcp/conftest.py, which installs the in-memory MCP transport fixture.

The physical test-function count is only ~470 (def test_* / async def test_*), but pytest collects 1,349 because of heavy parametrization (see Why 1,349?).

Coverage map

Per-directory collected counts (verified via pytest --collect-only):

Directory	Collected	Holds
`tests/tui`	444	Pure-logic TUI modules: fuzzy scorer, terminal-key decoder, width/wrap/slice. Heavily parametrized over escape-sequence and grapheme corpora.
`tests/llmgateway`	101	Catalog `get_card` + `estimate_cost`, full-catalog fallback + third-party routing, gateway dispatch over the mock, SSE/NDJSON streaming framer corpus (respx).
`tests/ai`	68	`complete()`, stream event framing, `EventStream` multi-cursor replay, `ModelRegistry`, env API-key resolution.
`tests/mcp`	43	`MCPClient` and `MCPClientPool` over in-memory transport + `memory` phantom-module parity. Owns the only `conftest.py`.
`tests/agent`	34	`Agent.prompt`/`steer`, abort-salvage, stream framing, `create_*_tool` factories, `SessionManager` JSONL. Shares `agent_suite_helpers.py`.
`tests/capabilities`	28	Tools over a real `tmp_path` sandbox + `make_local_context`, offline web-tool `is_error` contract, coding-collection membership, runner dispatch.
`tests/shell_app`	27	Auth CLI (API-key-first), boot-pipeline flag consumption (`--system`/`--no-tools`/`--mcp`), assembled shell (tokenize, `Locator`, settings, `OneShotRunner`, `WireRunner`).
`tests/smithy`	23	Agent-builder: `FlagReader`/`SmithyConfig`, `define_agent`, `build_interactive` over scripted `ask()`, `build_from_config`, `load_knowledge`.
`tests/runtime`	15	Pure cadence FSM + `create_agent` with a scripted `ModelInvoker` (tool loop, abort, branch/resume, compaction).
`tests/react_ink`	12	Diff renderer: line classification, gutter, context window, word-span pairing, theme-role paints (`theme_adapter` via `importorskip`).
`tests/ui_bridge`	10	Textual TUI integration: `exit_transcript` renderer + `mount_interactive`, and end-to-end `InteractiveApp` Pilot tests over real `create_agent` + mock connector.
`tests/connectors`	8	SaaS connector bridge over an in-memory mock `SaasBackend` (no vendor SDK/network).
`tests/swarm`	8	Coordination kernel over `tmp_path`: `JsonCell` concurrency, `TicketBoard` deps/cycles, `Channel` cursors; scripted fake teammate `Agent`s.
`tests/tracing`	7	Recorder -> channel -> sink flow, `RatioStrategy` sampling, `SecretScrubber` redaction, `trace_agent_run`; `FileSink` under `tmp_path`.
`tests/interop`	5	Protocol bridge in-memory round-trip: `create_provider_host` + `create_server_endpoint` + `mount_protocol_bridge`.
`tests/test_public_api.py`	507	Import-audit guard (see below).
`tests/test_scaffold.py`	9	M0 smoke checks.
`tests/fixtures/`	—	Single data fixture `ts_session_v3.jsonl` (no Python code).

Each subsystem directory maps to a documented page: AI, Agent, MCP, Memory, LLM Gateway, Runtime, Capabilities, Interop, Connectors, Swarm, Smithy, Tracing, Shell App, TUI, react-ink, UI Bridge.

Test-double strategy

The double strategy is uniform and network-free across the whole suite. Three seam patterns recur.

The deterministic `mock` connector

agent, ai, llmgateway, and ui_bridge bind to the catalog card mock-1 (indusagi.llmgateway.connectors.mock.MockConnector) so that gateway_stream routes locally. The mock streams fixed text ("Hello world", usage 8 in / 5 out) and a scripted echo tool call, making those runs fully deterministic without any network.

Scripted gateway / `ModelInvoker`

When a test needs to drive the model's output, it replaces the live model with a pre-recorded list of Emissions replayed as a Channel:

tests/agent uses agent_suite_helpers.GatewayScript, which monkeypatches the module-level indusagi.agent.agent.gateway_stream seam.
tests/runtime and tests/shell_app inject a scripted ModelInvoker replayed as a Channel of Emissions.

# pattern used across tests/agent via agent_suite_helpers.GatewayScript
from agent_suite_helpers import make_agent, GatewayScript, done_reply, events_named


async def test_prompt_runs_clean(monkeypatch):
    script = GatewayScript([done_reply("pong")])
    monkeypatch.setattr("indusagi.agent.agent.gateway_stream", script)
    agent = make_agent()
    await agent.prompt("ping")
    assert events_named(agent, "message_end")

In-memory MCP transport

tests/mcp and tests/interop stand up a create_provider_host server half over the MCP SDK's create_client_server_memory_streams and reach it through create_server_endpoint. tests/mcp/conftest.py monkeypatches indusagi.mcp.client.create_server_endpoint to inject an in-memory session factory per server name.

# tests/mcp use the conftest fixture to route a facade client onto an in-memory host
from indusagi.mcp import MCPClient
# conftest provides: install_transport, ping_tool, echo_tool, boom_tool, sleep_tool


async def test_call_tool(install_transport):
    install_transport({"srv": memory_session_factory([ping_tool()])})  # noqa: F821
    client = MCPClient(...)
    await client.connect()
    result = await client.call_tool("ping", {})
    assert any(b.get("text") == "pong" for b in result.content)

HTTP, filesystem, and Textual

HTTP wire tests (llmgateway streaming/gateway/catalog-fallback) use respx to mock httpx, exercising real SSE/NDJSON framing across chunk boundaries, CRLF/LF/CR, and multibyte UTF-8 without a server.
Filesystem suites (capabilities, swarm, smithy, tracing) write under pytest's tmp_path against the real local fs/shell backends rather than stubs — genuine I/O paths, no network.
Textual TUI tests (ui_bridge) drive the real InteractiveApp headless via App.run_test() / Pilot.
react_ink/test_diff.py guards an optional sibling with pytest.importorskip("indusagi.react_ink.theme_adapter").

Root-level guards

Name	Kind	Source	Purpose
`test_public_api.py`	guard	`tests/test_public_api.py`	Import-audit: parametrized `test_module_imports` over ~279 `pkgutil.walk_packages` modules, `test_every_all_name_resolves` over every `__all__` barrel, `test_public_barrels_declare_all` over a fixed `PUBLIC_BARRELS` list, `test_root_lazy_subsystem_aliases` (PEP 562 `indusagi._SUBSYSTEMS`), `test_connectors_import_without_composio`, and `test_user_journey_import` over 14 README/example import lines.
`test_scaffold.py`	guard	`tests/test_scaffold.py`	M0 smoke: `indusagi.__version__`/`VERSION` exposed, `CancelToken`/`CancelledByToken` conventions, `_internal.env` helpers (`env_name`, `indusagi_home`).
`agent_suite_helpers.py`	module	`tests/agent/agent_suite_helpers.py`	Shared agent harness: `make_agent`, `GatewayScript`, `done_reply`, `events_named`, `event_types`, `message_text`, `now_ms`.
`conftest.py`	module	`tests/mcp/conftest.py`	MCP in-memory plumbing fixture `install_transport`; tool builders `ping_tool`/`echo_tool`/`boom_tool`/`sleep_tool`, `box_of`, `inert_context`, `texts()`.
`ts_session_v3.jsonl`	fixture	`tests/fixtures/ts_session_v3.jsonl`	The only data fixture: an anonymized minimal session record set consumed by `agent/test_session_manager.py`.

test_public_api.py is the single guard that the whole tree imports cleanly and every barrel's __all__ is honest. Its PUBLIC_BARRELS list and USER_JOURNEY_IMPORTS lines enumerate the canonical entry points for each subsystem, so it doubles as a living checklist of the package exports.

import importlib
import pkgutil
import indusagi

modules = {"indusagi"} | {
    info.name
    for info in pkgutil.walk_packages(indusagi.__path__, prefix="indusagi.")
}
for name in sorted(modules):
    importlib.import_module(name)  # every module must import

from indusagi.agent import Agent, create_coding_tools  # a USER_JOURNEY_IMPORTS line
from indusagi.mcp import MCPClient, MCPClientPool       # another

Key concepts

Term	Meaning
mock connector / `mock-1` card	A deterministic in-process connector (`indusagi.llmgateway.connectors.mock`) bound to catalog card `mock-1`; streams fixed text and a scripted echo tool call. Makes `agent`/`ai`/`llmgateway`/`ui_bridge` runs network-free.
`gateway_stream` seam	The module-level `from indusagi.llmgateway import stream as gateway_stream` in `indusagi.agent.agent` that agent tests monkeypatch (via `GatewayScript`) or route to the mock — the single injection point for scripting the model in the facade.
Scripted `ModelInvoker` / Channel of Emissions	`runtime`/`shell_app`/`ui_bridge` replace the live model with a pre-recorded list of `Emission`s replayed as a `Channel`, so the conductor loop is fully deterministic.
In-memory MCP transport	`create_client_server_memory_streams` links a `create_provider_host` server half to a `create_server_endpoint` client half in-process; `mcp/conftest.py` injects this per server name.
Import-audit guard	`test_public_api.py`'s three guarantees: every `walk_packages` module imports, every `__all__` name resolves via `getattr`, and the exact README/example import lines keep working.
`tmp_path` golden tests	`capabilities`/`swarm`/`smithy`/`tracing` run against a real on-disk sandbox and the genuine local fs/shell backends instead of stubs.

Why 1,349?

File size does not equal test count — equating the two is a trap. Two parametrization sources account for nearly the entire gap between ~470 physical functions and 1,349 collected:

test_public_api.py (507 collected) — test_module_imports parametrizes over every module pkgutil.walk_packages discovers under indusagi (~279 modules; the file's own tripwire asserts >= 200), plus test_every_all_name_resolves over every barrel declaring __all__ and 14 USER_JOURNEY_IMPORTS.
tests/tui (444 collected) — test_keys.py (14 parametrize blocks) and test_text_width.py (9 blocks) fan a small number of functions into hundreds of cases over terminal escape-sequence and grapheme-width corpora.

The suite is strictly offline / no-credential: env-key tests monkeypatch os.environ (never read real keys), and session tests open the smallest real on-disk session strictly read-only (never via SessionManager.open, which would rewrite a pre-v3 file).

Relationship to neighbors

Cross-subsystem coupling shows up entirely in the harnesses:

agent, ai, and llmgateway share the deterministic mock/mock-1 connector card.
mcp and interop share the same MCP-SDK in-memory transport pattern; mcp/conftest.py explicitly reuses interop's create_provider_host approach.
runtime, shell_app, agent, and ui_bridge all script the model via a Channel of Emissions / ModelInvoker, so the conductor loop runs deterministically.
ui_bridge composes runtime.create_agent + the llmgateway mock + the Textual app, making its tests the broadest integration tests in the suite.

Several suites pin deliberate Python-side behaviors that are documented as intentional, not bugs: shell_app/test_boot_flags.py asserts that --system/--no-tools/--mcp actually take effect and that faulted MCP servers print to stderr; shell_app/test_auth_cli.py pins the API-key-first login path. For the full TS-vs-Python coverage story and the per-file "ported 1:1" vs "authored" provenance, see Parity. For the entry points enumerated by PUBLIC_BARRELS, see Package exports; for the high-level layering, see Architecture.

On This Page

Table of Contents Running the tests Collection model Coverage map Test-double strategy The deterministic `mock` connector Scripted gateway / `ModelInvoker`In-memory MCP transport HTTP, filesystem, and Textual Root-level guards Key concepts Why 1,349?Relationship to neighbors