Facadesfacades/memory

Memory

indusagi.memory is a deliberately empty facade module — import indusagi.memory succeeds but exports nothing. It exists only for import-path parity. The real conversational-memory work lives in the runtime compaction engine (indusagi.runtime.memory), and the framework's private cancellation and environment plumbing lives in indusagi._internal. This page documents all three.

The "memory" label sits on three unrelated things in this codebase, and conflating them is the most common mistake. indusagi.memory (the facade) is a stub with zero symbols. indusagi._internal is the foundational shared-utility layer (cooperative cancellation, env-var registry, state directory). indusagi.runtime.memory is the actual context-condensation engine that keeps a conversation under the model's window. Each section below is honest about what it is and is not.

The memory facade is a stub
Internal cancellation
Internal environment registry
The runtime memory engine
Token estimation
Compaction
Relationship to neighbors

The memory facade is a stub

src/indusagi/memory.py is a phantom module. Its entire body is:

from __future__ import annotations

__all__: list[str] = []

import indusagi.memory resolves and the module loads, but it has no exports. It is not re-exported by the package's top-level __init__.py. It exists purely so code written against the package layout keeps importing — the corresponding upstream facade was likewise an empty value-level module (only a type was declared on it).

Name	Kind	Source	Purpose
`__all__`	const	`memory.py`	An empty `list[str]`. The module exports nothing.

The module docstring is explicit: do not add symbols to this module. Agent memory is the runtime compaction engine described below, in a different package. If you are looking for where conversations get summarized, skip to The runtime memory engine.

Internal cancellation

indusagi._internal.cancel provides cooperative cancellation for the whole framework. It is a queryable, multi-listener, chainable token — used in place of a raw OS/asyncio task abort. Cancellation is cooperative: holders poll a property or register a callback; nothing is forcibly killed. This is deliberate, so a tool can observe cancellation and still return a typed is_error result rather than having its coroutine torn down mid-flight.

Name	Kind	Source	Purpose
`CancelToken`	class	`_internal/cancel.py`	One-shot, thread-safe, slots-based cancellation token shared across subsystems.
`CancelledByToken`	class	`_internal/cancel.py`	The exception `raise_if_cancelled()` raises; carries `.reason`.
`Callback`	type	`_internal/cancel.py`	`Callable[[], None]` — the listener signature.

CancelToken surface:

Member	Semantics
`.cancelled`	`bool` property — has this token been cancelled.
`.reason`	`str \| None` property — the reason passed to `cancel()`, if any.
`cancel(reason=None)`	Flip the token once (idempotent), then fan out to listeners.
`add_callback(cb) -> unsubscribe`	Register a listener; returns a disposer. Registering on an already-cancelled token fires `cb` immediately and returns a no-op disposer.
`child() -> CancelToken`	A new token cancelled when this one is, but not vice versa.
`raise_if_cancelled()`	Raise `CancelledByToken(reason)` if cancelled, else return.

from indusagi._internal.cancel import CancelToken, CancelledByToken

token = CancelToken()
stop = token.add_callback(lambda: print("aborting work"))

# ... later, from any thread:
token.cancel("user pressed esc")   # prints "aborting work"
print(token.cancelled, token.reason)   # True 'user pressed esc'

# Cooperative check inside a long loop:
try:
    token.raise_if_cancelled()
except CancelledByToken as e:
    print("stopped:", e.reason)

child = token.child()   # propagates cancellation downward only
stop()                  # disposer is safe to call any time

Two design points matter. First, CancelledByToken subclasses plain Exception, not asyncio.CancelledError — framework code never swallows real task cancellation, so the two are kept distinct. Second, exceptions raised by listeners during the cancel() fan-out are swallowed by design: one bad listener can never break the cancellation broadcast to the others.

Internal environment registry

indusagi._internal.env is the single sanctioned place to read environment variables. Subsystems do not touch os.environ directly — they go through these helpers so every branded variable is resolved one way.

Name	Kind	Source	Purpose
`env_name`	function	`_internal/env.py`	Brand a suffix into an `INDUSAGI_`-prefixed name: `env_name("home") -> "INDUSAGI_HOME"`.
`read_env`	function	`_internal/env.py`	Read the branded var for a suffix: `os.environ.get(env_name(suffix))`.
`read_raw`	function	`_internal/env.py`	Escape hatch for non-branded names (e.g. `AWS_REGION`).
`indusagi_home`	function	`_internal/env.py`	The framework state directory as a `Path`.
`ENV_PREFIX`	const	`_internal/env.py`	`"INDUSAGI"` — the brand prefix `env_name` applies.

env_name uppercases the suffix and replaces - with _ before prefixing. indusagi_home() returns $INDUSAGI_HOME (expanded) when set, otherwise ~/.pindusagi:

from indusagi._internal.env import env_name, read_env, read_raw, indusagi_home

print(env_name("agent-dir"))   # "INDUSAGI_AGENT_DIR"
log_level = read_env("log")    # reads INDUSAGI_LOG, or None
region = read_raw("AWS_REGION")  # non-branded, literal name
state_dir = indusagi_home()    # Path("~/.pindusagi") or $INDUSAGI_HOME

The Python build's state directory is ~/.pindusagi, intentionally separate from any other install's state directory — the Python build owns it and never reads or writes a different layout's store. The auth.json format remains byte-compatible, but the store itself is no longer shared.

The runtime memory engine

indusagi.runtime.memory is where conversational memory actually happens. When a history's estimated footprint approaches the model's context window, the engine replaces the oldest stretch of turns with a single distilled summary turn while keeping a verbatim tail of recent turns. The barrel exposes a cheap token estimator plus the four-part condensation flow: decide, locate, distill, stitch.

Name	Kind	Source	Purpose
`estimate_context_tokens`	function	`runtime/memory/estimate.py`	Coarse token estimate of a turn sequence.
`should_compact`	function	`runtime/memory/compactor.py`	Decide whether history warrants condensation.
`find_cut_point`	function	`runtime/memory/compactor.py`	Locate the tool-safe prefix/tail boundary.
`summarize`	async function	`runtime/memory/compactor.py`	Distill a stretch of turns into one synthetic turn.
`compact`	async function	`runtime/memory/compactor.py`	Top-level condensation: returns `(summary, *tail)`.

Import directly from the subpackage — the runtime/__init__.py barrel does not re-export these:

from indusagi.runtime.memory import (
    estimate_context_tokens,
    should_compact,
    find_cut_point,
    summarize,
    compact,
)

Token estimation

estimate_context_tokens(messages) returns a non-negative int approximating the token footprint of a Sequence[Turn]. It is a heuristic, not a real tokenizer: a provider-specific tokenizer is costly to run on every turn, so this sums the character counts of every block, divides by CHARS_PER_TOKEN (4) via math.ceil, and adds TURN_OVERHEAD_TOKENS (4) per turn for role/message framing.

Per-block character counts: TextBlock/ThinkingBlock contribute their prose length; ToolCallBlock contributes its name plus serialized arguments; ToolResultBlock contributes its serialized output; ImageBlock contributes its base64 payload length; unrecognized blocks contribute nothing. Non-string values are serialized with compact separators and ensure_ascii=False so the character count stays consistent.

from indusagi.runtime.memory import estimate_context_tokens

tokens = estimate_context_tokens(conversation_turns)
print(tokens)   # e.g. 5120

The estimate intentionally leans toward over-triggering condensation slightly rather than overflowing the window.

Compaction

should_compact(messages, model, cfg=None) returns True when the estimate reaches model.context_window * policy.trigger_ratio. It returns False for a non-positive context window (nothing meaningful to compare against), and falls back to DEFAULT_POLICY when cfg is None.

DEFAULT_POLICY is CompactionPolicy(trigger_ratio=0.8, keep_recent=8) — condense at 80% of the window, keep the 8 most recent turns verbatim. CompactionPolicy is a frozen dataclass from the runtime contract.

find_cut_point(messages, keep_recent) returns the index of the first turn that survives verbatim. The tail nominally starts at len - keep_recent, but the boundary is then nudged forward (shrinking the tail, enlarging the condensed prefix) while the candidate first-kept turn carries a ToolResultBlock — so a tool_call and its tool_result are never split across the boundary. 0 means condense nothing; len means condense everything.

summarize(messages, invoke) is the distillation step. It renders the turns into a role-prefixed transcript, builds a Conversation whose system preamble is the distillation instruction and whose single user turn is that transcript, invokes the injected ModelInvoker, drains the streamed Channel, and returns one synthetic turn. The summary is filed under the user role (not assistant or system) so it stays valid input regardless of which provider runs next, and is prefixed with the SUMMARY_HEADING [condensed earlier context].

compact(messages, model, cfg, invoke) is the top-level entry point:

from indusagi.runtime.memory import compact

# `invoke` is a ModelInvoker: Callable[[Conversation, StreamOptions], Channel]
new_history = await compact(messages, model, cfg=None, invoke=invoke)
# Returns (summary, *tail) when there is a prefix worth condensing,
# otherwise the original history unchanged.

compact resolves the policy (falling back to DEFAULT_POLICY), finds the tool-safe cut, summarizes the prefix, and returns (summary, *tail). When the cut point is <= 0 it returns the input unchanged, so it is safe to call unconditionally. The model argument is accepted for signature parity and is immediately discarded (del model); only the policy and invoker drive the work.

A note on the live caller: the conductor in Runtime guards against an infinite compaction loop by only invoking the costly summarizer when find_cut_point > 1. Once a history is already minimal there is nothing left to condense. The condensation gate reads the live conversation snapshot, not a captured payload — a deliberately preserved behavior.

Relationship to neighbors

The three pieces have very different fan-out.

indusagi.memory (the stub) is a leaf with no dependencies and no dependents.

indusagi._internal is foundational and consumed broadly. CancelToken threads through the Agent Facade, MCP clients, Connectors, gateway streaming, the Capabilities tools, and Runtime dispatch. The env helpers feed Shell App discovery, agent sessions, API-key resolution, credentials, and the cloud connectors.

indusagi.runtime.memory depends upward on the LLM Gateway contract (Block/Turn/Conversation/ModelCard/StreamOptions/Channel and the emission types) and on the runtime contract (CompactionPolicy and the ModelInvoker type alias Callable[[Conversation, StreamOptions], Channel]). Its only consumers are the Runtime conductor — the live compaction gate — and the Interop protocol bridge.

Back to the Architecture overview.

On This Page

Table of Contents The memory facade is a stub Internal cancellation Internal environment registry The runtime memory engine Token estimation Compaction Relationship to neighbors