Skip to content

Experience Retrieval ​

Online retrieval returns a few high-signal Experience Units within strict time/token budgets.

Inputs ​

  • intent (string)
  • budgets: { tokens_max, time_ms, experience_top_k? }
  • privacy_mode: allow | redact | block
  • optional filters: entities[], recency window, channel, actor.type

Scoring (hybrid) ​

  • vector_score(summary_embedding, intent_embedding)
  • lexical_score(summary, intent terms)
  • recency_boost(decay by recency_ts_ms)
  • entity_overlap_boost(entities ∩ query_entities)
  • final score: weighted sum with monotonic boosts; calibrated to keep p95 latency < 8 ms

Degradation path ​

  1. Lower top_k (e.g., 10 β†’ 5 β†’ 3)
  2. Simplify scoring (skip lexical or entity boosts)
  3. Return minimal (IDs/metadata only if privacy == block)

Reply shape (render_context_reply.v0) ​

  • slices.experience[]: { unit_id, summary?, refs?, score, reason? }
  • summary omitted if privacy.mode == block
  • reason: short justification: "entities overlap: ent.parser; recent; high lexical"

Errors and retry ​

  • error.options[] may include:
    • retry_with_reduced_scope (reduce top_k / increase deadline)
    • skip_experience_path
    • return_minimal

Observability ​

  • Log: request_id, final_top_k, effective_time_ms, scores distribution
  • Metrics: p50/p95 latency, Recall@K (offline), coverage_entities, redaction rate, zero-leak guarantee

See also:

  • ./contracts-registry.md
  • ./README.md
  • ../adapters/http-adapter.md

Debug alias (P1 optional) ​

  • GET /experience/search (authenticated, non-critical path)
    • Query: intent, top_k?, filters (entities, recency, channel)
    • Purpose: diagnose relevance and latency of L4 in isolation
    • Note: same scoring and privacy enforcement as render path

Examples ​

Request (querystring)

GET /experience/search?intent=refactor_function&top_k=5&entities=ent.parser,ent.ast&recency_days=30

Response (JSON)

json
{
  "request_id": "req-debug-001",
  "latency_ms": 4,
  "experience": [
    {
      "unit_id": "6d1c3de7-59a6-4a6a-8238-f0f6dd67f0b9",
      "summary": "Refactored parse() into parse_v2 with improved error handling; covered edge cases.",
      "refs": ["doc:design/parser.md"],
      "score": 0.81,
      "reason": "entities overlap: ent.parser, ent.ast; recent"
    }
  ]
}

Comprehensive Testing Framework ​

Retrieval Performance Testing ​

typescript
describe('L4 Experience Retrieval Tests', () => {
  describe('Retrieval Latency SLA', () => {
    test('meets p95 < 8ms retrieval latency', async () => {
      const query = {
        intent: 'implement database connection pooling',
        budgets: { tokens_max: 500, time_ms: 8, experience_top_k: 5 },
        privacy_mode: 'allow',
        filters: { entities: ['database', 'connection'], recency_days: 30 }
      };
      
      const latencies = [];
      for (let i = 0; i < 100; i++) {
        const start = Date.now();
        await experienceRetrieval.retrieve(query);
        latencies.push(Date.now() - start);
      }
      
      const p95_latency = latencies.sort((a, b) => a - b)[94]; // 95th percentile
      const p50_latency = latencies.sort((a, b) => a - b)[49]; // 50th percentile
      
      expect(p95_latency).toBeLessThan(8);
      expect(p50_latency).toBeLessThan(4);
    });
    
    test('hybrid scoring performs within budget', async () => {
      const complex_query = {
        intent: 'optimize React component re-rendering with useMemo and useCallback',
        budgets: { tokens_max: 300, time_ms: 6 },
        filters: { entities: ['react', 'performance', 'optimization'] }
      };
      
      const start = Date.now();
      const results = await experienceRetrieval.retrieve(complex_query);
      const total_time = Date.now() - start;
      
      expect(total_time).toBeLessThan(6);
      expect(results.units.length).toBeGreaterThan(0);
      expect(results.units[0]).toHaveProperty('score');
      expect(results.units[0].score).toBeGreaterThan(0.5);
    });
  });
  
  describe('Scoring Algorithm Validation', () => {
    test('vector similarity scoring', async () => {
      const vector_query = {
        intent: 'authentication middleware implementation',
        entities: ['auth', 'middleware', 'jwt']
      };
      
      const results = await experienceRetrieval.retrieve(vector_query);
      
      // Validate that results are ranked by relevance
      for (let i = 1; i < results.units.length; i++) {
        expect(results.units[i-1].score).toBeGreaterThanOrEqual(results.units[i].score);
      }
    });
    
    test('recency boost application', async () => {
      const recent_query = {
        intent: 'database migration',
        filters: { recency_days: 7 }
      };
      
      const results = await experienceRetrieval.retrieve(recent_query);
      
      // Recent items should have higher scores
      const recent_items = results.units.filter(unit => 
        Date.now() - unit.recency_ts_ms < 7 * 24 * 60 * 60 * 1000
      );
      
      expect(recent_items.length).toBeGreaterThan(0);
      expect(recent_items[0].score).toBeGreaterThan(0.7);
    });
    
    test('entity overlap boosting', async () => {
      const entity_query = {
        intent: 'implement API rate limiting',
        entities: ['api', 'rate-limiting', 'middleware'],
        filters: { top_k: 3 }
      };
      
      const results = await experienceRetrieval.retrieve(entity_query);
      
      // Top results should have entity overlap
      expect(results.units[0].reason).toContain('entities overlap');
      expect(results.units[0].entity_overlap_score).toBeGreaterThan(0.6);
    });
  });
  
  describe('Privacy Mode Handling', () => {
    test('redact mode filters PII from summaries', async () => {
      const redact_query = {
        intent: 'user account setup',
        privacy_mode: 'redact'
      };
      
      const results = await experienceRetrieval.retrieve(redact_query);
      
      results.units.forEach(unit => {
        expect(unit.summary).not.toMatch(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/); // No emails
        expect(unit.summary).not.toMatch(/\b\d{3}-\d{2}-\d{4}\b/); // No SSNs
        expect(unit.privacy_redacted).toBe(true);
      });
    });
    
    test('block mode returns metadata only', async () => {
      const block_query = {
        intent: 'sensitive data processing',
        privacy_mode: 'block'
      };
      
      const results = await experienceRetrieval.retrieve(block_query);
      
      results.units.forEach(unit => {
        expect(unit.summary).toBeUndefined(); // No summary text
        expect(unit.unit_id).toBeDefined();
        expect(unit.refs).toBeDefined();
        expect(unit.score).toBeDefined();
      });
    });
  });
});

Production Load Testing ​

typescript
describe('L4 Retrieval Production Load Tests', () => {
  test('handles concurrent retrieval requests', async () => {
    const concurrent_queries = Array.from({ length: 100 }, (_, i) => ({
      intent: `test query ${i}`,
      budgets: { time_ms: 8, experience_top_k: 5 },
      privacy_mode: 'allow'
    }));
    
    const start_time = Date.now();
    const results = await Promise.allSettled(
      concurrent_queries.map(query => experienceRetrieval.retrieve(query))
    );
    const total_time = Date.now() - start_time;
    
    const successful = results.filter(r => r.status === 'fulfilled');
    const success_rate = successful.length / results.length;
    
    expect(success_rate).toBeGreaterThan(0.95); // 95%+ success rate
    expect(total_time).toBeLessThan(5000); // 100 concurrent in < 5s
  });
  
  test('degradation path under resource pressure', async () => {
    const high_load_query = {
      intent: 'complex system architecture optimization',
      budgets: { time_ms: 2, experience_top_k: 10 }, // Tight constraints
      filters: { entities: ['architecture', 'performance', 'optimization'] }
    };
    
    const results = await experienceRetrieval.retrieve(high_load_query);
    
    // Should degrade gracefully
    expect(results.degradation_applied).toBe(true);
    expect(results.effective_top_k).toBeLessThanOrEqual(5); // Reduced from 10
    expect(results.stats.t_ms).toBeLessThan(3); // Within budget + margin
  });
});

Production Operations Framework ​

Retrieval Performance Monitoring ​

yaml
experience_retrieval_monitoring:
  metrics:
    # Core performance metrics
    - name: l4_retrieval_latency_seconds
      type: histogram
      buckets: [0.001, 0.003, 0.005, 0.008, 0.015, 0.030]
      labels: [query_complexity, privacy_mode, degradation_applied]
      
    - name: l4_retrieval_throughput_qps
      type: gauge
      labels: [instance_id]
      
    # Scoring and relevance metrics
    - name: l4_hybrid_scoring_distribution
      type: histogram
      buckets: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
      
    - name: l4_entity_overlap_effectiveness
      type: gauge
      labels: [overlap_range]
      
    # Resource utilization
    - name: l4_degradation_trigger_rate
      type: counter
      labels: [degradation_type]
      
  alerts:
    - name: L4RetrievalLatencyP95High
      condition: l4_retrieval_latency_seconds{quantile="0.95"} > 0.008
      severity: critical
      duration: 1m
      
    - name: L4RetrievalLatencyP50High  
      condition: l4_retrieval_latency_seconds{quantile="0.50"} > 0.004
      severity: warning
      duration: 2m
      
    - name: L4DegradationRateHigh
      condition: rate(l4_degradation_trigger_rate[5m]) > 0.1
      severity: warning

Deployment & Scaling Configuration ​

yaml
l4_retrieval_deployment:
  services:
    retrieval_engine:
      image: mnemoverse/l4-retrieval:latest
      replicas: 5
      resources:
        requests: { cpu: 300m, memory: 768Mi }
        limits: { cpu: 1, memory: 2Gi }
      
    vector_search:
      image: mnemoverse/vector-search:latest
      replicas: 3
      resources:
        requests: { cpu: 500m, memory: 1Gi }
        
    hybrid_scorer:
      image: mnemoverse/hybrid-scorer:latest
      replicas: 2
      resources:
        requests: { cpu: 200m, memory: 512Mi }
        
  autoscaling:
    retrieval_engine:
      min_replicas: 3
      max_replicas: 15
      target_cpu_utilization: 70
      target_latency_p95_ms: 6
      
  health_checks:
    retrieval_health:
      path: "/health/retrieval"
      interval_seconds: 10
      timeout_seconds: 2
      failure_threshold: 3

Error Handling & Recovery ​

Comprehensive Error Responses ​

typescript
interface RetrievalError {
  code: 'TIMEOUT' | 'DEGRADATION_FAILED' | 'INDEX_UNAVAILABLE' | 'PRIVACY_VIOLATION';
  message: string;
  details?: {
    effective_timeout_ms?: number;
    degradation_steps_applied?: string[];
    retry_suggestions?: string[];
  };
  stats: {
    partial_results: number;
    time_elapsed_ms: number;
  };
}

const error_handling_examples = {
  timeout_with_partial_results: {
    error: {
      code: 'TIMEOUT',
      message: 'Retrieval exceeded time budget but returning partial results',
      details: {
        effective_timeout_ms: 8,
        degradation_steps_applied: ['reduced_top_k', 'simplified_scoring'],
        retry_suggestions: ['Increase time budget', 'Reduce top_k', 'Use simpler filters']
      }
    },
    stats: { partial_results: 3, time_elapsed_ms: 9 },
    units: [/* partial results */]
  }
};