Memory Layer (L5)
Purpose: Intelligent context assembly and memory management with budget-aware policies and cross-layer coordination.
Layer Position: L5 - Central memory coordination between data layers (L1-L4) and orchestration (L6)
Functional Overview
The Memory Layer is the intelligent memory subsystem that:
- Assembles context from multiple sources with budget constraints
- Manages memory lifecycle using KV policies (pin/compress/evict)
- Prioritizes sources based on quality, recency, and relevance
- Handles degradation gracefully when budgets or sources fail
- Coordinates with L8 for memory quality optimization
Core Responsibilities
1. Context Assembly
- Aggregate relevant information from L1 (Noosphere), L2 (Project Library), L3 (Workshop), L4 (Experience)
- Apply Level-of-Detail (LOD) filtering: macro → micro → atomic
- Enforce token and time budgets from orchestration requests
- Maintain coherence and avoid contradictions in assembled context
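As a minimal sketch of how LOD filtering might be applied, the snippet below filters fragments against a requested level of detail. The `LOD_RANK` ordering and `filterByLOD` helper are illustrative, not part of the spec:

```typescript
type LOD = "macro" | "micro" | "atomic";

// Illustrative ordering: macro is the coarsest level, atomic the finest.
const LOD_RANK: Record<LOD, number> = { macro: 0, micro: 1, atomic: 2 };

// Keep only fragments at or above the requested level of detail;
// e.g. "micro" admits macro and micro fragments but drops atomic ones.
function filterByLOD<T extends { lod: LOD }>(fragments: T[], maxLod: LOD): T[] {
  return fragments.filter(f => LOD_RANK[f.lod] <= LOD_RANK[maxLod]);
}
```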
2. Memory Management
- Pin: Keep frequently accessed or critical context in fast storage
- Compress: Reduce storage footprint of less-accessed but valuable context
- Evict: Remove stale, low-value, or budget-exceeding context
- Consolidate: Merge related memories to reduce fragmentation
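A rough illustration of how these lifecycle operations could be chosen per item; the `MemoryEntry` shape and all thresholds here are invented for the sketch and would be driven by the KV policies defined later:

```typescript
// Hypothetical bookkeeping record for a stored context item.
interface MemoryEntry {
  id: string;
  pinned: boolean;
  accessCount: number;   // accesses since creation
  ageHours: number;      // time since last write
  qualityScore: number;  // 0.0-1.0, e.g. from L8 signals
}

type LifecycleAction = "pin" | "compress" | "evict" | "keep";

// Toy policy: pinned items stay hot; stale low-value items are evicted;
// old but still valuable items are compressed.
function chooseLifecycleAction(entry: MemoryEntry): LifecycleAction {
  if (entry.pinned) return "pin";
  if (entry.ageHours > 48 && entry.qualityScore < 0.3) return "evict";
  if (entry.ageHours > 6 && entry.accessCount < 3) return "compress";
  return "keep";
}
```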
3. Source Prioritization
- Rank sources by quality signals from L8 evaluation
- Consider recency, credibility, and user feedback
- Apply domain-specific weighting (code vs docs vs research)
- Handle source conflicts through precedence rules
4. Budget Enforcement
- Token Budget: Ensure assembled context fits within LLM context windows
- Time Budget: Complete assembly within orchestration deadlines
- Quality Budget: Maintain minimum relevance and coherence thresholds
- Cost Budget: Optimize for computational and storage efficiency
Architecture Components

```
┌──────────────────────────────────────────────────────────────┐
│                       L5: Memory Layer                       │
├────────────────────┬────────────────────┬────────────────────┤
│      Context       │     KV Policy      │       Budget       │
│     Assembler      │       Engine       │      Manager       │
└─────────┬──────────┴─────────┬──────────┴─────────┬──────────┘
          │                    │                    │
          ▼ Assembly           ▼ Memory Ops         ▼ Constraints
┌──────────────────────────────────────────────────────────────┐
│ Sources: L1 Noosphere │ L2 Project  │ L3 Workshop │ L4 Exp   │
│  search results       │ lib content │ tool outputs│ hints    │
└──────────────────────────────────────────────────────────────┘
          ▲                                         ▲
          │ Assembled Context                       │ Feedback
          ▼                                         │
┌──────────────────────────────────────────────────────────────┐
│               L6: Orchestration (CEO/ACS/HCS)                │
└──────────────────────────────────────────────────────────────┘
          ▲
          │ Quality Signals
          ▼
┌───────────────────┐
│  L8: Evaluation   │
└───────────────────┘
```
Context Assembly Process
1. Request Analysis

```typescript
interface MemoryRequest {
  request_id: string;
  intent: string;
  entities: string[];
  budgets: {
    tokens_max: number;   // Hard limit for assembled context
    time_ms: number;      // Assembly deadline
    quality_min: number;  // Minimum relevance threshold
  };
  lod_preference: "macro" | "micro" | "atomic" | "adaptive";
  source_scope: string[]; // Which layers to query
  kv_policy?: KVPolicy;   // Override default memory policies
}
```
2. Source Coordination

```typescript
interface SourceQuery {
  layer: "L1" | "L2" | "L3" | "L4";
  query: string;
  entities: string[];
  max_results: number;
  deadline_ms: number;
  lod: string;
}

// Parallel queries to multiple layers; entities and per-source
// deadline_ms are filled in from the MemoryRequest before dispatch
const sourceQueries = [
  { layer: "L1", query: intent, max_results: 15, lod: "macro" },
  { layer: "L2", query: intent, max_results: 10, lod: "micro" },
  { layer: "L4", query: intent, max_results: 5, lod: "adaptive" }
];
```
3. Content Prioritization

```typescript
interface ContextFragment {
  id: string;
  content: string;
  source_layer: string;
  lod: "macro" | "micro" | "atomic";
  priority_score: number; // Computed priority (0.0-1.0)
  token_cost: number;
  metadata: {
    relevance: number;    // Relevance to the query
    recency: number;      // How recent the content is
    credibility: number;  // Source credibility from L8
    coherence: number;    // How well it fits with other fragments
  };
}
```
4. Budget-Aware Assembly

```typescript
class ContextAssembler {
  async assembleContext(request: MemoryRequest): Promise<AssembledContext> {
    const startTime = Date.now(); // basis for assembly_time_ms below

    // 1. Query all sources in parallel
    const sourceResults = await this.queryAllSources(request);

    // 2. Prioritize and score fragments
    const fragments = this.prioritizeFragments(sourceResults, request);

    // 3. Select optimal subset within budget
    const selected = this.selectWithinBudget(fragments, request.budgets);

    // 4. Check coherence and resolve conflicts
    const coherent = await this.ensureCoherence(selected);

    // 5. Apply KV policy for future access
    await this.applyKVPolicy(coherent, request.kv_policy);

    return {
      request_id: request.request_id,
      fragments: coherent,
      metadata: {
        total_tokens: coherent.reduce((sum, f) => sum + f.token_cost, 0),
        assembly_time_ms: Date.now() - startTime,
        source_coverage: this.computeCoverage(coherent),
        quality_achieved: this.computeQuality(coherent)
      }
    };
  }
}
```
KV Policy Engine
Memory Categories
- Hot: Frequently accessed, keep in fast storage (Redis/Memory)
- Warm: Occasionally accessed, compressed storage (Disk/S3)
- Cold: Rarely accessed, archive storage (S3 Glacier)
- Frozen: Historical/audit, minimal access (S3 Deep Archive)
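The tiers above can be sketched as a simple classifier; the access-frequency and age cutoffs below are invented for illustration and would be tuned against real access patterns:

```typescript
type Tier = "hot" | "warm" | "cold" | "frozen";

// Toy tiering rule: frequently accessed content stays hot, occasionally
// accessed content goes warm, and rarely accessed content ages from
// cold into frozen archive storage.
function classifyTier(accessesPerDay: number, ageDays: number): Tier {
  if (accessesPerDay >= 10) return "hot";
  if (accessesPerDay >= 1) return "warm";
  if (ageDays < 90) return "cold";
  return "frozen";
}
```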
Policy Operations

```typescript
interface KVPolicy {
  pin: {
    patterns: string[];  // Content patterns to always keep hot
    max_size_mb: number; // Maximum size for pinned content
    ttl_hours: number;   // Time-to-live for pinned items
  };
  compress: {
    age_threshold_hours: number; // Compress after this age
    access_threshold: number;    // Compress if accessed < N times
    compression_ratio: number;   // Target compression ratio
  };
  evict: {
    max_total_size_gb: number; // Total memory budget
    lru_weight: number;        // Weight for LRU eviction
    quality_weight: number;    // Weight for quality-based eviction
  };
}
```
Default Policies by Content Type

```typescript
const DEFAULT_POLICIES: Record<string, KVPolicy> = {
  "code_snippets": {
    pin: { patterns: ["*.js", "*.ts", "*.py"], max_size_mb: 100, ttl_hours: 24 },
    compress: { age_threshold_hours: 6, access_threshold: 3, compression_ratio: 0.7 },
    evict: { max_total_size_gb: 2, lru_weight: 0.6, quality_weight: 0.4 }
  },
  "documentation": {
    pin: { patterns: ["README*", "*.md"], max_size_mb: 50, ttl_hours: 12 },
    compress: { age_threshold_hours: 12, access_threshold: 2, compression_ratio: 0.8 },
    evict: { max_total_size_gb: 1, lru_weight: 0.4, quality_weight: 0.6 }
  },
  "research_papers": {
    pin: { patterns: ["*.pdf", "abstract*"], max_size_mb: 200, ttl_hours: 48 },
    compress: { age_threshold_hours: 24, access_threshold: 1, compression_ratio: 0.9 },
    evict: { max_total_size_gb: 5, lru_weight: 0.2, quality_weight: 0.8 }
  }
};
```
Source Prioritization Framework
Priority Scoring Algorithm

```typescript
function computePriorityScore(fragment: ContextFragment, context: AssemblyContext): number {
  const relevanceScore = computeRelevance(fragment.content, context.query);
  const recencyScore = computeRecencyScore(fragment.metadata.created_at);
  const credibilityScore = fragment.metadata.credibility; // From L8
  const coherenceScore = computeCoherence(fragment, context.existingFragments);

  // Weighted combination with domain-specific weights
  const weights = context.domain_weights || {
    relevance: 0.4,
    recency: 0.2,
    credibility: 0.25,
    coherence: 0.15
  };

  return (
    relevanceScore * weights.relevance +
    recencyScore * weights.recency +
    credibilityScore * weights.credibility +
    coherenceScore * weights.coherence
  );
}
```
Source Precedence Rules
1. L1 (Noosphere) - Global knowledge, highest credibility
2. L2 (Project Library) - Project-specific context, highest relevance
3. L4 (Experience) - User patterns and hints, highest personalization
4. L3 (Workshop) - Tool outputs, highest recency
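The precedence order can be applied as a tie-breaker when conflicting fragments survive scoring; `pickByPrecedence` is an illustrative helper, not part of the spec:

```typescript
// Precedence order from the rules above: L1 > L2 > L4 > L3.
const SOURCE_PRECEDENCE = ["L1", "L2", "L4", "L3"];

// Pick the winning fragment among conflicting ones by source precedence.
// Sorts a copy so the input array is left untouched.
function pickByPrecedence<T extends { source_layer: string }>(conflicting: T[]): T {
  return [...conflicting].sort(
    (a, b) =>
      SOURCE_PRECEDENCE.indexOf(a.source_layer) -
      SOURCE_PRECEDENCE.indexOf(b.source_layer)
  )[0];
}
```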
Conflict Resolution

```typescript
interface ConflictResolution {
  strategy: "precedence" | "merge" | "flag" | "user_choice";
  confidence: number;
  reasoning: string;
}

function resolveConflict(fragments: ContextFragment[]): ConflictResolution {
  // Detect semantic conflicts
  const conflicts = detectSemanticConflicts(fragments);

  if (conflicts.length === 0) {
    return { strategy: "merge", confidence: 1.0, reasoning: "No conflicts detected" };
  }

  // Apply precedence rules
  if (conflicts.every(c => c.severity < 0.3)) {
    return { strategy: "precedence", confidence: 0.8, reasoning: "Minor conflicts, use source precedence" };
  }

  // Flag for human review
  return { strategy: "flag", confidence: 0.6, reasoning: "Major conflicts require human review" };
}
```
Budget Management
Token Budget Enforcement

```typescript
class TokenBudgetManager {
  selectWithinBudget(fragments: ContextFragment[], maxTokens: number): ContextFragment[] {
    // Sort a copy by priority score, descending (avoid mutating the input)
    const sorted = [...fragments].sort((a, b) => b.priority_score - a.priority_score);

    const selected: ContextFragment[] = [];
    let totalTokens = 0;

    for (const fragment of sorted) {
      if (totalTokens + fragment.token_cost <= maxTokens) {
        selected.push(fragment);
        totalTokens += fragment.token_cost;
      }
    }

    // If budget allows, try to fill remaining space with atomic details
    if (totalTokens < maxTokens * 0.9) {
      const remainingBudget = maxTokens - totalTokens;
      const atomicFragments = this.getAtomicFragments(selected, remainingBudget);
      selected.push(...atomicFragments);
    }

    return selected;
  }
}
```
Time Budget Management

```typescript
class TimeBudgetManager {
  async assembleWithDeadline(request: MemoryRequest): Promise<AssembledContext> {
    const deadline = Date.now() + request.budgets.time_ms;

    // Reserve time for final assembly (20% of budget)
    const queryDeadline = deadline - request.budgets.time_ms * 0.2;

    const sourcePromises = this.queryAllSources(request, queryDeadline);

    // Wait for sources, timing each out when the query deadline arrives
    const sourceResults = await Promise.allSettled(
      sourcePromises.map(p => this.withTimeout(p, queryDeadline - Date.now()))
    );

    // Use successful results, log failures
    const successfulResults = sourceResults
      .filter((result): result is PromiseFulfilledResult<SourceResult> =>
        result.status === 'fulfilled')
      .map(result => result.value);

    return this.assembleFromPartialResults(successfulResults, request, deadline);
  }
}
```
Integration with L8 Evaluation
Memory Quality Feedback

```typescript
interface MemoryQualitySignals {
  coherence_score: number;    // How well fragments fit together
  completeness: number;       // Coverage of required entities
  contradiction_rate: number; // Rate of conflicting information
  user_satisfaction: number;  // Implicit feedback from usage
}

// L8 sends feedback to improve memory policies
class MemoryFeedbackHandler {
  async processFeedback(signals: MemoryQualitySignals): Promise<void> {
    if (signals.contradiction_rate > 0.05) {
      // Tighten conflict detection sensitivity
      await this.adjustConflictDetection(1.2);
    }
    if (signals.completeness < 0.8) {
      // Increase budget allocation for key entities
      await this.adjustEntityPriorities(signals);
    }
    if (signals.coherence_score < 0.75) {
      // Improve coherence scoring weights
      await this.updateCoherenceWeights(signals);
    }
  }
}
```
Memory Optimization Actions
- Consolidation: Merge related memories to reduce fragmentation
- Rebalancing: Adjust KV policy parameters based on access patterns
- Pruning: Remove contradictory or outdated information
- Enrichment: Request additional context for incomplete assemblies
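As a sketch of the consolidation action, related fragments could be grouped by a shared key and merged; the `Frag` shape, the `topic` key, and naive string concatenation are all illustrative simplifications:

```typescript
// Hypothetical fragment shape for the consolidation sketch.
interface Frag { topic: string; content: string; token_cost: number; }

// Merge fragments that share a topic into one entry, concatenating
// content and summing token costs to reduce fragmentation.
function consolidate(frags: Frag[]): Frag[] {
  const byTopic = new Map<string, Frag>();
  for (const f of frags) {
    const existing = byTopic.get(f.topic);
    if (existing) {
      existing.content += "\n" + f.content;
      existing.token_cost += f.token_cost;
    } else {
      byTopic.set(f.topic, { ...f }); // copy so inputs stay unmodified
    }
  }
  return [...byTopic.values()];
}
```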
Performance Characteristics
SLA Targets (v0.1)
- Assembly Latency: P95 < 500ms for typical requests
- Memory Utilization: < 80% of allocated memory budget
- Cache Hit Rate: > 60% for repeated queries
- Context Quality: Average coherence score > 0.80
Scaling Properties
- Horizontal: Multiple assembly workers with shared memory store
- Vertical: Memory-optimized instances for large context assembly
- Caching: Multi-tier caching (Redis → Disk → S3)
- Sharding: Content sharding by domain or project for isolation
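The multi-tier lookup pattern can be sketched with two in-memory maps standing in for the real tiers; `TieredCache` and the promote-on-read behavior are illustrative assumptions, with Redis, disk, and S3 as the actual tiers listed above:

```typescript
// Minimal two-tier cache: a hot map in front of a slower backing store.
class TieredCache<V> {
  private hot = new Map<string, V>();
  constructor(private coldStore: Map<string, V>) {}

  get(key: string): V | undefined {
    const fast = this.hot.get(key);
    if (fast !== undefined) return fast; // hot-tier hit
    const slow = this.coldStore.get(key);
    if (slow !== undefined) this.hot.set(key, slow); // promote on cold hit
    return slow;
  }
}
```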
Error Handling & Degradation
Graceful Degradation Strategies
- Source Failures: Continue with available sources, flag missing coverage
- Budget Exhaustion: Return best-effort assembly with quality warnings
- Time Overruns: Return partial assembly with completion metadata
- Memory Pressure: Aggressive eviction with recovery notifications
Error Recovery

```typescript
class MemoryErrorHandler {
  async handleAssemblyFailure(error: AssemblyError, request: MemoryRequest): Promise<AssembledContext> {
    switch (error.type) {
      case "SOURCE_TIMEOUT":
        // Retry with a reduced source set
        return this.retryWithFallbacks(request, error.failed_sources);
      case "BUDGET_EXCEEDED":
        // Return partial assembly with warnings
        return this.createPartialAssembly(request, error.partial_results);
      case "COHERENCE_FAILURE":
        // Fall back to simple concatenation
        return this.fallbackAssembly(request);
      default:
        throw new SystemError(`Unrecoverable memory assembly failure: ${error.message}`);
    }
  }
}
```
Contracts (v0.1)
Inputs
- MemoryRequest.v0: { request_id, intent, entities[], budgets{ tokens_max, time_ms, quality_min }, lod_preference, source_scope[], kv_policy? }
- SourceResults.v0 (internal): union of provider replies from L1/L2/L3/L4
Processing
- Query sources in parallel with per-source deadlines; prioritize by quality/recency/relevance
- Rank fragments and select within token/time budgets; ensure coherence; apply KV policies
Outputs
- AssembledContext.v0: { request_id, fragments[], metadata{ total_tokens, assembly_time_ms, source_coverage, quality_achieved } }
SLO / Targets
- Assembly p95 < 500ms; 0 token overruns; coherence score β₯ 0.8 average
Edge Cases
- SOURCE_TIMEOUT → assemble from partial results; include warning codes
- BUDGET_EXCEEDED → partial assembly with explicit trims and metrics
- COHERENCE_FAILURE → fallback to simple concatenation
Resilience & Observability
Resilience
- Timeout per source; partial acceptance; retries for idempotent reads only
- KV policy safeguards under memory pressure (aggressive eviction, compress)
Metrics
- memory_assembly_duration_seconds, memory_coherence_score, memory_cache_hit_ratio
- memory_source_latency{layer=L1|L2|L3|L4}, memory_source_errors_total
Tracing
- request_id spans across query→rank→assemble; attributes: budgets, counts, LOD profile
Privacy & Security
Controls
- Enforce privacy_mode on all fragments before assembly; no raw sensitive text persisted in hot caches
- Conflict/pruning rules remove contradictory or sensitive fragments
Access
- Least-privilege for reads/writes; encrypt at rest/in transit; segregate sensitive tiers
Implementation Roadmap
Phase 1: Core Assembly (v0.1)
- Basic context assembly from L1/L2 sources
- Simple token budget enforcement
- Basic KV policies (pin/evict only)
- Integration with orchestration layer
Phase 2: Advanced Memory (v0.2)
- Full source prioritization framework
- Compression and tiered storage
- Conflict detection and resolution
- L8 feedback integration
Phase 3: Optimization (v1.0)
- Machine learning-based priority scoring
- Predictive pre-assembly
- Advanced consolidation algorithms
- Real-time memory optimization
Documentation Index
- Context Assembler - Budget-aware context assembly algorithms
- KV Policies - Memory management and source prioritization
- Error Handling - Degradation strategies and error recovery
- L8 Integration - Memory quality optimization
Status: Comprehensive specification ready for implementation
Dependencies: L1-L4 provider APIs, L8 evaluation signals, orchestration budget contracts