Memory
indusagi.memoryis a deliberately empty facade module —import indusagi.memorysucceeds but exports nothing. It exists only for import-path parity. The real conversational-memory work lives in the runtime compaction engine (indusagi.runtime.memory), and the framework's private cancellation and environment plumbing lives inindusagi._internal. This page documents all three.
The "memory" label sits on three unrelated things in this codebase, and conflating
them is the most common mistake. indusagi.memory (the facade) is a stub with zero
symbols. indusagi._internal is the foundational shared-utility layer (cooperative
cancellation, env-var registry, state directory). indusagi.runtime.memory is the
actual context-condensation engine that keeps a conversation under the model's
window. Each section below is honest about what it is and is not.
Table of Contents
- The memory facade is a stub
- Internal cancellation
- Internal environment registry
- The runtime memory engine
- Token estimation
- Compaction
- Relationship to neighbors
The memory facade is a stub
src/indusagi/memory.py is a phantom module. Its entire body is:
from __future__ import annotations
__all__: list[str] = []
import indusagi.memory resolves and the module loads, but it has no exports. It is
not re-exported by the package's top-level __init__.py. It exists purely so code
written against the package layout keeps importing — the corresponding upstream
facade was likewise an empty value-level module (only a type was declared on it).
| Name | Kind | Source | Purpose |
|---|---|---|---|
__all__ |
const | memory.py |
An empty list[str]. The module exports nothing. |
The module docstring is explicit: do not add symbols to this module. Agent memory is the runtime compaction engine described below, in a different package. If you are looking for where conversations get summarized, skip to The runtime memory engine.
Internal cancellation
indusagi._internal.cancel provides cooperative cancellation for the whole
framework. It is a queryable, multi-listener, chainable token — used in place of a
raw OS/asyncio task abort. Cancellation is cooperative: holders poll a property or
register a callback; nothing is forcibly killed. This is deliberate, so a tool can
observe cancellation and still return a typed is_error result rather than having
its coroutine torn down mid-flight.
| Name | Kind | Source | Purpose |
|---|---|---|---|
CancelToken |
class | _internal/cancel.py |
One-shot, thread-safe, slots-based cancellation token shared across subsystems. |
CancelledByToken |
class | _internal/cancel.py |
The exception raise_if_cancelled() raises; carries .reason. |
Callback |
type | _internal/cancel.py |
Callable[[], None] — the listener signature. |
CancelToken surface:
| Member | Semantics |
|---|---|
.cancelled |
bool property — has this token been cancelled. |
.reason |
str | None property — the reason passed to cancel(), if any. |
cancel(reason=None) |
Flip the token once (idempotent), then fan out to listeners. |
add_callback(cb) -> unsubscribe |
Register a listener; returns a disposer. Registering on an already-cancelled token fires cb immediately and returns a no-op disposer. |
child() -> CancelToken |
A new token cancelled when this one is, but not vice versa. |
raise_if_cancelled() |
Raise CancelledByToken(reason) if cancelled, else return. |
from indusagi._internal.cancel import CancelToken, CancelledByToken
token = CancelToken()
stop = token.add_callback(lambda: print("aborting work"))
# ... later, from any thread:
token.cancel("user pressed esc") # prints "aborting work"
print(token.cancelled, token.reason) # True 'user pressed esc'
# Cooperative check inside a long loop:
try:
token.raise_if_cancelled()
except CancelledByToken as e:
print("stopped:", e.reason)
child = token.child() # propagates cancellation downward only
stop() # disposer is safe to call any time
Two design points matter. First, CancelledByToken subclasses plain Exception,
not asyncio.CancelledError — framework code never swallows real task
cancellation, so the two are kept distinct. Second, exceptions raised by listeners
during the cancel() fan-out are swallowed by design: one bad listener can never
break the cancellation broadcast to the others.
Internal environment registry
indusagi._internal.env is the single sanctioned place to read environment
variables. Subsystems do not touch os.environ directly — they go through these
helpers so every branded variable is resolved one way.
| Name | Kind | Source | Purpose |
|---|---|---|---|
env_name |
function | _internal/env.py |
Brand a suffix into an INDUSAGI_-prefixed name: env_name("home") -> "INDUSAGI_HOME". |
read_env |
function | _internal/env.py |
Read the branded var for a suffix: os.environ.get(env_name(suffix)). |
read_raw |
function | _internal/env.py |
Escape hatch for non-branded names (e.g. AWS_REGION). |
indusagi_home |
function | _internal/env.py |
The framework state directory as a Path. |
ENV_PREFIX |
const | _internal/env.py |
"INDUSAGI" — the brand prefix env_name applies. |
env_name uppercases the suffix and replaces - with _ before prefixing.
indusagi_home() returns $INDUSAGI_HOME (expanded) when set, otherwise
~/.pindusagi:
from indusagi._internal.env import env_name, read_env, read_raw, indusagi_home
print(env_name("agent-dir")) # "INDUSAGI_AGENT_DIR"
log_level = read_env("log") # reads INDUSAGI_LOG, or None
region = read_raw("AWS_REGION") # non-branded, literal name
state_dir = indusagi_home() # Path("~/.pindusagi") or $INDUSAGI_HOME
The Python build's state directory is ~/.pindusagi, intentionally separate from any
other install's state directory — the Python build owns it and never reads or writes a
different layout's store. The auth.json format remains byte-compatible, but the
store itself is no longer shared.
The runtime memory engine
indusagi.runtime.memory is where conversational memory actually happens. When a
history's estimated footprint approaches the model's context window, the engine
replaces the oldest stretch of turns with a single distilled summary turn while
keeping a verbatim tail of recent turns. The barrel exposes a cheap token estimator
plus the four-part condensation flow: decide, locate, distill, stitch.
| Name | Kind | Source | Purpose |
|---|---|---|---|
estimate_context_tokens |
function | runtime/memory/estimate.py |
Coarse token estimate of a turn sequence. |
should_compact |
function | runtime/memory/compactor.py |
Decide whether history warrants condensation. |
find_cut_point |
function | runtime/memory/compactor.py |
Locate the tool-safe prefix/tail boundary. |
summarize |
async function | runtime/memory/compactor.py |
Distill a stretch of turns into one synthetic turn. |
compact |
async function | runtime/memory/compactor.py |
Top-level condensation: returns (summary, *tail). |
Import directly from the subpackage — the runtime/__init__.py barrel does not
re-export these:
from indusagi.runtime.memory import (
estimate_context_tokens,
should_compact,
find_cut_point,
summarize,
compact,
)
Token estimation
estimate_context_tokens(messages) returns a non-negative int approximating the
token footprint of a Sequence[Turn]. It is a heuristic, not a real tokenizer: a
provider-specific tokenizer is costly to run on every turn, so this sums the
character counts of every block, divides by CHARS_PER_TOKEN (4) via math.ceil,
and adds TURN_OVERHEAD_TOKENS (4) per turn for role/message framing.
Per-block character counts: TextBlock/ThinkingBlock contribute their prose
length; ToolCallBlock contributes its name plus serialized arguments;
ToolResultBlock contributes its serialized output; ImageBlock contributes its
base64 payload length; unrecognized blocks contribute nothing. Non-string values are
serialized with compact separators and ensure_ascii=False so the character count
stays consistent.
from indusagi.runtime.memory import estimate_context_tokens
tokens = estimate_context_tokens(conversation_turns)
print(tokens) # e.g. 5120
The estimate intentionally leans toward over-triggering condensation slightly rather than overflowing the window.
Compaction
should_compact(messages, model, cfg=None) returns True when the estimate reaches
model.context_window * policy.trigger_ratio. It returns False for a non-positive
context window (nothing meaningful to compare against), and falls back to
DEFAULT_POLICY when cfg is None.
DEFAULT_POLICY is CompactionPolicy(trigger_ratio=0.8, keep_recent=8) — condense
at 80% of the window, keep the 8 most recent turns verbatim. CompactionPolicy is a
frozen dataclass from the runtime contract.
find_cut_point(messages, keep_recent) returns the index of the first turn that
survives verbatim. The tail nominally starts at len - keep_recent, but the boundary
is then nudged forward (shrinking the tail, enlarging the condensed prefix) while
the candidate first-kept turn carries a ToolResultBlock — so a tool_call and its
tool_result are never split across the boundary. 0 means condense nothing; len
means condense everything.
summarize(messages, invoke) is the distillation step. It renders the turns into a
role-prefixed transcript, builds a Conversation whose system preamble is the
distillation instruction and whose single user turn is that transcript, invokes the
injected ModelInvoker, drains the streamed Channel, and returns one synthetic
turn. The summary is filed under the user role (not assistant or system) so it
stays valid input regardless of which provider runs next, and is prefixed with the
SUMMARY_HEADING [condensed earlier context].
compact(messages, model, cfg, invoke) is the top-level entry point:
from indusagi.runtime.memory import compact
# `invoke` is a ModelInvoker: Callable[[Conversation, StreamOptions], Channel]
new_history = await compact(messages, model, cfg=None, invoke=invoke)
# Returns (summary, *tail) when there is a prefix worth condensing,
# otherwise the original history unchanged.
compact resolves the policy (falling back to DEFAULT_POLICY), finds the tool-safe
cut, summarizes the prefix, and returns (summary, *tail). When the cut point is
<= 0 it returns the input unchanged, so it is safe to call unconditionally. The
model argument is accepted for signature parity and is immediately discarded
(del model); only the policy and invoker drive the work.
A note on the live caller: the conductor in Runtime
guards against an infinite compaction loop by only invoking the costly summarizer when
find_cut_point > 1. Once a history is already minimal there is nothing left to
condense. The condensation gate reads the live conversation snapshot, not a captured
payload — a deliberately preserved behavior.
Relationship to neighbors
The three pieces have very different fan-out.
indusagi.memory (the stub) is a leaf with no dependencies and no dependents.
indusagi._internal is foundational and consumed broadly. CancelToken threads
through the Agent Facade, MCP clients,
Connectors, gateway streaming, the
Capabilities tools, and Runtime
dispatch. The env helpers feed Shell App discovery,
agent sessions, API-key resolution, credentials, and the cloud connectors.
indusagi.runtime.memory depends upward on the LLM Gateway
contract (Block/Turn/Conversation/ModelCard/StreamOptions/Channel and the
emission types) and on the runtime contract (CompactionPolicy and the ModelInvoker
type alias Callable[[Conversation, StreamOptions], Channel]). Its only consumers are
the Runtime conductor — the live compaction gate — and
the Interop protocol bridge.
Back to the Architecture overview.
