Memory Layer (L5)

Purpose: Intelligent context assembly and memory management with budget-aware policies and cross-layer coordination.

Layer Position: L5 - Central memory coordination between data layers (L1-L4) and orchestration (L6)

Functional Overview

Memory Layer serves as the intelligent memory subsystem that:

  • Assembles context from multiple sources with budget constraints
  • Manages memory lifecycle using KV policies (pin/compress/evict)
  • Prioritizes sources based on quality, recency, and relevance
  • Handles degradation gracefully when budgets or sources fail
  • Coordinates with L8 for memory quality optimization

Core Responsibilities

1. Context Assembly

  • Aggregate relevant information from L1 (Noosphere), L2 (Project Library), L3 (Workshop), L4 (Experience)
  • Apply Level-of-Detail (LOD) filtering: macro → micro → atomic
  • Enforce token and time budgets from orchestration requests
  • Maintain coherence and avoid contradictions in assembled context
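
The LOD cascade above can be sketched as an ordered filter; the `LOD` type, `Fragment` shape, and `filterByLOD` helper below are illustrative, not part of the v0.1 contracts:

```typescript
// Illustrative sketch only: LOD levels ordered from coarse to fine.
type LOD = "macro" | "micro" | "atomic";

const LOD_ORDER: Record<LOD, number> = { macro: 0, micro: 1, atomic: 2 };

interface Fragment { id: string; lod: LOD; content: string; }

// Keep fragments at or above the requested granularity: a "micro" request
// admits macro and micro fragments but drops atomic-level detail.
function filterByLOD(fragments: Fragment[], maxDetail: LOD): Fragment[] {
  return fragments.filter(f => LOD_ORDER[f.lod] <= LOD_ORDER[maxDetail]);
}
```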

2. Memory Management

  • Pin: Keep frequently accessed or critical context in fast storage
  • Compress: Reduce storage footprint of less-accessed but valuable context
  • Evict: Remove stale, low-value, or budget-exceeding context
  • Consolidate: Merge related memories to reduce fragmentation

3. Source Prioritization

  • Rank sources by quality signals from L8 evaluation
  • Consider recency, credibility, and user feedback
  • Apply domain-specific weighting (code vs docs vs research)
  • Handle source conflicts through precedence rules

4. Budget Enforcement

  • Token Budget: Ensure assembled context fits within LLM context windows
  • Time Budget: Complete assembly within orchestration deadlines
  • Quality Budget: Maintain minimum relevance and coherence thresholds
  • Cost Budget: Optimize for computational and storage efficiency

Architecture Components

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     L5: Memory Layer                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   Context    β”‚   KV Policy      β”‚     Budget               β”‚
β”‚   Assembler  β”‚   Engine         β”‚     Manager              β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                 β”‚                   β”‚
       β–Ό Assembly        β–Ό Memory Ops        β–Ό Constraints
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Sources: L1 Noosphere β”‚ L2 Project β”‚ L3 Workshop β”‚ L4 Exp  β”‚
β”‚          search resultsβ”‚ lib contentβ”‚ tool outputsβ”‚ hints   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β–²                                                     β–²
       β”‚ Assembled Context                       Feedback     β”‚
       β–Ό                                                     β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           L6: Orchestration (CEO/ACS/HCS)                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                   β–²
                                   β”‚ Quality Signals
                                   β–Ό
                           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                           β”‚  L8: Evaluation   β”‚
                           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Context Assembly Process

1. Request Analysis

```typescript
interface MemoryRequest {
  request_id: string;
  intent: string;
  entities: string[];
  budgets: {
    tokens_max: number;     // Hard limit for assembled context
    time_ms: number;        // Assembly deadline
    quality_min: number;    // Minimum relevance threshold
  };
  lod_preference: "macro" | "micro" | "atomic" | "adaptive";
  source_scope: string[];   // Which layers to query
  kv_policy?: KVPolicy;     // Override default memory policies
}
```

2. Source Coordination

```typescript
interface SourceQuery {
  layer: "L1" | "L2" | "L3" | "L4";
  query: string;
  entities: string[];
  max_results: number;
  deadline_ms: number;
  lod: string;
}

// Parallel queries to multiple layers. `intent` and `entities` come from the
// MemoryRequest; the per-source deadlines shown here are illustrative.
const sourceQueries: SourceQuery[] = [
  { layer: "L1", query: intent, entities, max_results: 15, deadline_ms: 300, lod: "macro" },
  { layer: "L2", query: intent, entities, max_results: 10, deadline_ms: 300, lod: "micro" },
  { layer: "L4", query: intent, entities, max_results: 5, deadline_ms: 200, lod: "adaptive" }
];
```

3. Content Prioritization

```typescript
interface ContextFragment {
  id: string;
  content: string;
  source_layer: string;
  lod: "macro" | "micro" | "atomic";
  priority_score: number;  // Computed priority (0.0-1.0)
  token_cost: number;
  metadata: {
    relevance: number;     // Relevance to query
    recency: number;       // How recent the content is
    credibility: number;   // Source credibility from L8
    coherence: number;     // Fits well with other fragments
  };
}
```

4. Budget-Aware Assembly

```typescript
class ContextAssembler {
  async assembleContext(request: MemoryRequest): Promise<AssembledContext> {
    const startTime = Date.now();

    // 1. Query all sources in parallel
    const sourceResults = await this.queryAllSources(request);

    // 2. Prioritize and score fragments
    const fragments = this.prioritizeFragments(sourceResults, request);

    // 3. Select optimal subset within budget
    const selected = this.selectWithinBudget(fragments, request.budgets);

    // 4. Check coherence and resolve conflicts
    const coherent = await this.ensureCoherence(selected);

    // 5. Apply KV policy for future access
    await this.applyKVPolicy(coherent, request.kv_policy);

    return {
      request_id: request.request_id,
      fragments: coherent,
      metadata: {
        total_tokens: coherent.reduce((sum, f) => sum + f.token_cost, 0),
        assembly_time_ms: Date.now() - startTime,
        source_coverage: this.computeCoverage(coherent),
        quality_achieved: this.computeQuality(coherent)
      }
    };
  }
}
```

KV Policy Engine

Memory Categories

  • Hot: Frequently accessed, keep in fast storage (Redis/Memory)
  • Warm: Occasionally accessed, compressed storage (Disk/S3)
  • Cold: Rarely accessed, archive storage (S3 Glacier)
  • Frozen: Historical/audit, minimal access (S3 Deep Archive)
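
A minimal tier classifier might look like the following; the thresholds are assumptions for illustration, not spec values — in practice tiering would be driven by the KV policy configuration below:

```typescript
type Tier = "hot" | "warm" | "cold" | "frozen";

interface AccessStats { accessesPerDay: number; ageDays: number; }

// Illustrative thresholds only: frequent access keeps content hot,
// occasional access keeps it warm, otherwise age decides cold vs frozen.
function classifyTier(s: AccessStats): Tier {
  if (s.accessesPerDay >= 10) return "hot";
  if (s.accessesPerDay >= 1) return "warm";
  if (s.ageDays < 90) return "cold";
  return "frozen";
}
```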

Policy Operations

```typescript
interface KVPolicy {
  pin: {
    patterns: string[];      // Content patterns to always keep hot
    max_size_mb: number;     // Maximum size for pinned content
    ttl_hours: number;       // Time-to-live for pinned items
  };
  compress: {
    age_threshold_hours: number;  // Compress after this age
    access_threshold: number;     // Compress if accessed < N times
    compression_ratio: number;    // Target compression ratio
  };
  evict: {
    max_total_size_gb: number;   // Total memory budget
    lru_weight: number;          // Weight for LRU eviction
    quality_weight: number;      // Weight for quality-based eviction
  };
}
```

Default Policies by Content Type

```typescript
const DEFAULT_POLICIES: Record<string, KVPolicy> = {
  "code_snippets": {
    pin: { patterns: ["*.js", "*.ts", "*.py"], max_size_mb: 100, ttl_hours: 24 },
    compress: { age_threshold_hours: 6, access_threshold: 3, compression_ratio: 0.7 },
    evict: { max_total_size_gb: 2, lru_weight: 0.6, quality_weight: 0.4 }
  },
  "documentation": {
    pin: { patterns: ["README*", "*.md"], max_size_mb: 50, ttl_hours: 12 },
    compress: { age_threshold_hours: 12, access_threshold: 2, compression_ratio: 0.8 },
    evict: { max_total_size_gb: 1, lru_weight: 0.4, quality_weight: 0.6 }
  },
  "research_papers": {
    pin: { patterns: ["*.pdf", "abstract*"], max_size_mb: 200, ttl_hours: 48 },
    compress: { age_threshold_hours: 24, access_threshold: 1, compression_ratio: 0.9 },
    evict: { max_total_size_gb: 5, lru_weight: 0.2, quality_weight: 0.8 }
  }
};
```
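
Looking up a policy by content type could then be a simple map access with a fallback; the abbreviated `Policy` shape and the 0.5 GB default below are assumptions for illustration, not part of the spec:

```typescript
// Abbreviated policy shape (the full KVPolicy interface is defined above).
type Policy = { evict: { max_total_size_gb: number } };

const POLICIES: Record<string, Policy> = {
  code_snippets: { evict: { max_total_size_gb: 2 } },
  documentation: { evict: { max_total_size_gb: 1 } },
  research_papers: { evict: { max_total_size_gb: 5 } },
};

// Unknown content types fall back to a conservative default budget
// (the 0.5 GB figure is an assumption, not a spec value).
function policyFor(contentType: string): Policy {
  return POLICIES[contentType] ?? { evict: { max_total_size_gb: 0.5 } };
}
```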

Source Prioritization Framework

Priority Scoring Algorithm

```typescript
function computePriorityScore(fragment: ContextFragment, context: AssemblyContext): number {
  const relevanceScore = computeRelevance(fragment.content, context.query);
  const recencyScore = fragment.metadata.recency;         // See ContextFragment above
  const credibilityScore = fragment.metadata.credibility; // From L8
  const coherenceScore = computeCoherence(fragment, context.existingFragments);

  // Weighted combination with domain-specific weights
  const weights = context.domain_weights || {
    relevance: 0.4,
    recency: 0.2,
    credibility: 0.25,
    coherence: 0.15
  };

  return (
    relevanceScore * weights.relevance +
    recencyScore * weights.recency +
    credibilityScore * weights.credibility +
    coherenceScore * weights.coherence
  );
}
```

Source Precedence Rules

  1. L1 (Noosphere) - Global knowledge, highest credibility
  2. L2 (Project Library) - Project-specific context, highest relevance
  3. L4 (Experience) - User patterns and hints, highest personalization
  4. L3 (Workshop) - Tool outputs, highest recency
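
When the resolution strategy is "precedence", the winner among conflicting fragments can be chosen directly from the ranking above; this is a sketch, with the layer ordering copied from the list:

```typescript
// Precedence order from the rules above: lower index wins.
const PRECEDENCE = ["L1", "L2", "L4", "L3"];

interface ConflictingFragment { source_layer: string; content: string; }

// Keep the fragment from the highest-precedence layer.
// Assumes at least one fragment is passed in.
function pickByPrecedence(conflicting: ConflictingFragment[]): ConflictingFragment {
  return conflicting.reduce((best, f) =>
    PRECEDENCE.indexOf(f.source_layer) < PRECEDENCE.indexOf(best.source_layer) ? f : best
  );
}
```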

Conflict Resolution

```typescript
interface ConflictResolution {
  strategy: "precedence" | "merge" | "flag" | "user_choice";
  confidence: number;
  reasoning: string;
}

function resolveConflict(fragments: ContextFragment[]): ConflictResolution {
  // Detect semantic conflicts
  const conflicts = detectSemanticConflicts(fragments);

  if (conflicts.length === 0) {
    return { strategy: "merge", confidence: 1.0, reasoning: "No conflicts detected" };
  }

  // Apply precedence rules
  if (conflicts.every(c => c.severity < 0.3)) {
    return { strategy: "precedence", confidence: 0.8, reasoning: "Minor conflicts, use source precedence" };
  }

  // Flag for human review
  return { strategy: "flag", confidence: 0.6, reasoning: "Major conflicts require human review" };
}
```

Budget Management

Token Budget Enforcement

```typescript
class TokenBudgetManager {
  selectWithinBudget(fragments: ContextFragment[], maxTokens: number): ContextFragment[] {
    // Sort a copy by priority score, descending (avoid mutating the caller's array)
    const sorted = [...fragments].sort((a, b) => b.priority_score - a.priority_score);

    const selected: ContextFragment[] = [];
    let totalTokens = 0;

    for (const fragment of sorted) {
      if (totalTokens + fragment.token_cost <= maxTokens) {
        selected.push(fragment);
        totalTokens += fragment.token_cost;
      }
    }

    // If budget allows, try to fill remaining space with atomic details
    if (totalTokens < maxTokens * 0.9) {
      const remainingBudget = maxTokens - totalTokens;
      const atomicFragments = this.getAtomicFragments(selected, remainingBudget);
      selected.push(...atomicFragments);
    }

    return selected;
  }
}
```

Time Budget Management

```typescript
class TimeBudgetManager {
  async assembleWithDeadline(request: MemoryRequest): Promise<AssembledContext> {
    const deadline = Date.now() + request.budgets.time_ms;

    // Reserve time for final assembly (20% of budget)
    const queryDeadline = deadline - (request.budgets.time_ms * 0.2);

    const sourcePromises = this.queryAllSources(request, queryDeadline);

    // Wait for sources with timeout
    const sourceResults = await Promise.allSettled(
      sourcePromises.map(p => this.withTimeout(p, queryDeadline))
    );

    // Use successful results, log failures
    const successfulResults = sourceResults
      .filter((result): result is PromiseFulfilledResult<SourceResult> =>
        result.status === 'fulfilled')
      .map(result => result.value);

    return this.assembleFromPartialResults(successfulResults, request, deadline);
  }
}
```
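
The `withTimeout` helper referenced above is not defined in this spec; one plausible implementation races the source promise against a timer keyed to the absolute deadline:

```typescript
// Race a promise against an absolute deadline (milliseconds since epoch).
// Rejects with a SOURCE_TIMEOUT-style error so Promise.allSettled can record
// the failure without aborting the other sources.
function withTimeout<T>(p: Promise<T>, deadlineMs: number): Promise<T> {
  const remaining = Math.max(0, deadlineMs - Date.now());
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("SOURCE_TIMEOUT")), remaining);
    p.then(
      v => { clearTimeout(timer); resolve(v); },
      e => { clearTimeout(timer); reject(e); }
    );
  });
}
```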

Integration with L8 Evaluation

Memory Quality Feedback

```typescript
interface MemoryQualitySignals {
  coherence_score: number;        // How well fragments fit together
  completeness: number;           // Coverage of required entities
  contradiction_rate: number;     // Conflicting information rate
  user_satisfaction: number;      // Implicit feedback from usage
}

// L8 sends feedback to improve memory policies
class MemoryFeedbackHandler {
  async processFeedback(signals: MemoryQualitySignals): Promise<void> {
    if (signals.contradiction_rate > 0.05) {
      // Tighten conflict detection sensitivity
      await this.adjustConflictDetection(1.2);
    }

    if (signals.completeness < 0.8) {
      // Increase budget allocation for key entities
      await this.adjustEntityPriorities(signals);
    }

    if (signals.coherence_score < 0.75) {
      // Improve coherence scoring algorithm
      await this.updateCoherenceWeights(signals);
    }
  }
}
```

Memory Optimization Actions

  • Consolidation: Merge related memories to reduce fragmentation
  • Rebalancing: Adjust KV policy parameters based on access patterns
  • Pruning: Remove contradictory or outdated information
  • Enrichment: Request additional context for incomplete assemblies
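
Consolidation, for instance, might collapse near-duplicate fragments; the sketch below uses a naive exact-match heuristic on normalized content, whereas a production version would use semantic similarity:

```typescript
interface MemoryFragment { id: string; content: string; }

// Naive consolidation: keep the first fragment for each normalized content
// string and drop exact duplicates. Illustrative only; real merging would
// compare embeddings, not raw text.
function consolidate(fragments: MemoryFragment[]): MemoryFragment[] {
  const seen = new Map<string, MemoryFragment>();
  for (const f of fragments) {
    const key = f.content.trim().toLowerCase();
    if (!seen.has(key)) seen.set(key, f);
  }
  return [...seen.values()];
}
```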

Performance Characteristics

SLA Targets (v0.1)

  • Assembly Latency: P95 < 500ms for typical requests
  • Memory Utilization: < 80% of allocated memory budget
  • Cache Hit Rate: > 60% for repeated queries
  • Context Quality: Average coherence score > 0.80

Scaling Properties

  • Horizontal: Multiple assembly workers with shared memory store
  • Vertical: Memory-optimized instances for large context assembly
  • Caching: Multi-tier caching (Redis → Disk → S3)
  • Sharding: Content sharding by domain or project for isolation
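
The Redis → Disk → S3 lookup path can be sketched as a waterfall with promotion on hit; the `CacheTier` interface is hypothetical, standing in for real Redis, disk, and S3 clients:

```typescript
// Hypothetical tier interface; real tiers would wrap Redis, disk, and S3.
interface CacheTier {
  name: string;
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// Walk tiers fastest-first; on a hit, promote the value into the faster
// tiers that missed so the next lookup is cheaper.
async function tieredGet(tiers: CacheTier[], key: string): Promise<string | undefined> {
  for (let i = 0; i < tiers.length; i++) {
    const value = await tiers[i].get(key);
    if (value !== undefined) {
      await Promise.all(tiers.slice(0, i).map(t => t.set(key, value)));
      return value;
    }
  }
  return undefined;
}
```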

Error Handling & Degradation

Graceful Degradation Strategies

  1. Source Failures: Continue with available sources, flag missing coverage
  2. Budget Exhaustion: Return best-effort assembly with quality warnings
  3. Time Overruns: Return partial assembly with completion metadata
  4. Memory Pressure: Aggressive eviction with recovery notifications

Error Recovery

```typescript
class MemoryErrorHandler {
  async handleAssemblyFailure(error: AssemblyError, request: MemoryRequest): Promise<AssembledContext> {
    switch (error.type) {
      case "SOURCE_TIMEOUT":
        // Retry with reduced source set
        return this.retryWithFallbacks(request, error.failed_sources);

      case "BUDGET_EXCEEDED":
        // Return partial assembly with warnings
        return this.createPartialAssembly(request, error.partial_results);

      case "COHERENCE_FAILURE":
        // Fall back to simple concatenation
        return this.fallbackAssembly(request);

      default:
        throw new SystemError(`Unrecoverable memory assembly failure: ${error.message}`);
    }
  }
}
```

Contracts (v0.1)

Inputs

  • MemoryRequest.v0: { request_id, intent, entities[], budgets{ tokens_max, time_ms, quality_min }, lod_preference, source_scope[], kv_policy? }
  • SourceResults.v0 (internal): union of provider replies from L1/L2/L3/L4

Processing

  • Query sources in parallel with per-source deadlines; prioritize by quality/recency/relevance
  • Rank fragments and select within token/time budgets; ensure coherence; apply KV policies

Outputs

  • AssembledContext.v0: { request_id, fragments[], metadata{ total_tokens, assembly_time_ms, source_coverage, quality_achieved } }

SLO / Targets

  • Assembly p95 < 500ms; 0 token overruns; coherence score ≥ 0.8 average

Edge Cases

  • SOURCE_TIMEOUT → assemble from partial; include warning codes
  • BUDGET_EXCEEDED → partial assembly with explicit trims and metrics
  • COHERENCE_FAILURE → fallback to simple concatenation

Resilience & Observability

Resilience

  • Timeout per source; partial acceptance; retries for idempotent reads only
  • KV policy safeguards under memory pressure (aggressive eviction, compress)

Metrics

  • memory_assembly_duration_seconds, memory_coherence_score, memory_cache_hit_ratio
  • memory_source_latency{layer=L1|L2|L3|L4}, memory_source_errors_total

Tracing

  • request_id spans across query → rank → assemble; attributes: budgets, counts, LOD profile

Privacy & Security

Controls

  • Enforce privacy_mode on all fragments before assembly; no raw sensitive text persisted in hot caches
  • Conflict/pruning rules remove contradictory or sensitive fragments

Access

  • Least-privilege for reads/writes; encrypt at rest/in transit; segregate sensitive tiers

Implementation Roadmap

Phase 1: Core Assembly (v0.1)

  • Basic context assembly from L1/L2 sources
  • Simple token budget enforcement
  • Basic KV policies (pin/evict only)
  • Integration with orchestration layer

Phase 2: Advanced Memory (v0.2)

  • Full source prioritization framework
  • Compression and tiered storage
  • Conflict detection and resolution
  • L8 feedback integration

Phase 3: Optimization (v1.0)

  • Machine learning-based priority scoring
  • Predictive pre-assembly
  • Advanced consolidation algorithms
  • Real-time memory optimization


Status: Comprehensive specification, ready for implementation

Dependencies: L1-L4 provider APIs, L8 evaluation signals, orchestration budget contracts