Experience Retrieval
Online retrieval returns a few high-signal Experience Units within strict time/token budgets.
Inputs
- intent (string)
- budgets: { tokens_max, time_ms, experience_top_k? }
- privacy_mode: allow | redact | block
- optional filters: entities[], recency window, channel, actor.type
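The inputs above can be written down as TypeScript types. This is a minimal sketch, not the contract itself: the filter field names `recency_days` and `actor_type`, the helper `normalizeRequest`, and the default `top_k` of 5 are assumptions made for illustration.

```typescript
// Hypothetical request model for the inputs listed above.
type PrivacyMode = 'allow' | 'redact' | 'block';

interface RetrievalBudgets {
  tokens_max: number;
  time_ms: number;
  experience_top_k?: number; // optional, per the inputs list
}

interface RetrievalFilters {
  entities?: string[];
  recency_days?: number; // assumed encoding of the "recency window"
  channel?: string;
  actor_type?: string;   // assumed flattening of actor.type
}

interface RetrievalRequest {
  intent: string;
  budgets: RetrievalBudgets;
  privacy_mode: PrivacyMode;
  filters?: RetrievalFilters;
}

type NormalizedRequest = RetrievalRequest & { budgets: Required<RetrievalBudgets> };

const DEFAULT_TOP_K = 5; // assumed default; the source does not specify one

// Rejects obviously invalid requests and fills in the optional top_k.
function normalizeRequest(req: RetrievalRequest): NormalizedRequest {
  if (req.intent.trim().length === 0) {
    throw new Error('intent must be non-empty');
  }
  if (req.budgets.tokens_max <= 0 || req.budgets.time_ms <= 0) {
    throw new Error('budgets must be positive');
  }
  const top_k = req.budgets.experience_top_k ?? DEFAULT_TOP_K;
  return { ...req, budgets: { ...req.budgets, experience_top_k: top_k } };
}
```

Normalizing once at the edge means later stages can read `experience_top_k` without re-checking for `undefined`.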
Scoring (hybrid)
- vector_score(summary_embedding, intent_embedding)
- lexical_score(summary, intent terms)
- recency_boost(decay by recency_ts_ms)
- entity_overlap_boost(entities ∩ query_entities)
- final score: weighted sum with monotonic boosts; calibrated to keep p95 latency < 8 ms
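One way to read "weighted sum with monotonic boosts" is a weighted base score plus non-negative additive boosts, so a boost can only raise a unit's score. The sketch below follows that reading. The weights, the 30-day half-life, the boost caps, and all function names are assumptions; the source does not give values.

```typescript
// Illustrative hybrid scorer; every constant here is an assumed value.
interface ScoreInputs {
  vector: number;      // vector_score in [0, 1]
  lexical: number;     // lexical_score in [0, 1]
  recencyTsMs: number; // recency_ts_ms of the unit
  entities: string[];  // entities attached to the unit
}

const WEIGHTS = { vector: 0.6, lexical: 0.4 };
const RECENCY_HALF_LIFE_MS = 30 * 24 * 60 * 60 * 1000;
const MAX_RECENCY_BOOST = 0.1;
const MAX_ENTITY_BOOST = 0.1;

// Exponential decay: full boost for a brand-new unit, half after one half-life.
function recencyBoost(recencyTsMs: number, nowMs: number): number {
  const ageMs = Math.max(0, nowMs - recencyTsMs);
  return MAX_RECENCY_BOOST * Math.pow(0.5, ageMs / RECENCY_HALF_LIFE_MS);
}

// Scales with the fraction of query entities that also appear on the unit.
function entityOverlapBoost(unitEntities: string[], queryEntities: string[]): number {
  if (queryEntities.length === 0) return 0;
  const shared = queryEntities.filter(e => unitEntities.indexOf(e) !== -1).length;
  return MAX_ENTITY_BOOST * (shared / queryEntities.length);
}

// Weighted base plus non-negative boosts, clamped to 1.
function finalScore(u: ScoreInputs, queryEntities: string[], nowMs: number): number {
  const base = WEIGHTS.vector * u.vector + WEIGHTS.lexical * u.lexical;
  const boosted =
    base + recencyBoost(u.recencyTsMs, nowMs) + entityOverlapBoost(u.entities, queryEntities);
  return Math.min(1, boosted);
}
```

Because both boosts are bounded and additive, skipping one of them during degradation lowers scores uniformly without reordering units that differ only in their base score.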
Degradation path
- Lower top_k (e.g., 10 → 5 → 3)
- Simplify scoring (skip lexical or entity boosts)
- Return minimal (IDs/metadata only if privacy_mode == block)
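The degradation steps can be sketched as a ladder that sheds one cost at a time. This assumes the steps are tried in the order listed; the `RetrievalPlan` shape and the `degrade` helper are hypothetical names introduced for the example.

```typescript
// Hypothetical plan describing how much work a retrieval attempt will do.
interface RetrievalPlan {
  topK: number;
  useLexical: boolean;
  useEntityBoost: boolean;
  minimal: boolean; // true = return IDs/metadata only
}

const TOP_K_LADDER = [10, 5, 3]; // taken from the example in the list above

// Returns the next, cheaper plan, or null when nothing is left to shed.
function degrade(plan: RetrievalPlan): RetrievalPlan | null {
  // Step 1: lower top_k to the next rung below the current value.
  const smaller = TOP_K_LADDER.filter(k => k < plan.topK);
  if (smaller.length > 0) return { ...plan, topK: smaller[0] };
  // Step 2: simplify scoring, one component at a time.
  if (plan.useLexical) return { ...plan, useLexical: false };
  if (plan.useEntityBoost) return { ...plan, useEntityBoost: false };
  // Step 3: fall back to a minimal reply.
  if (!plan.minimal) return { ...plan, minimal: true };
  return null;
}
```

A caller would apply `degrade` each time the remaining time budget cannot cover the current plan, and surface an error once it returns `null`.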
Reply shape (render_context_reply.v0)
- slices.experience[]: { unit_id, summary?, refs?, score, reason? }
  - summary: omitted if privacy_mode == block
  - reason: short justification, e.g. "entities overlap: ent.parser; recent; high lexical"
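The reply shape and its privacy rule can be illustrated with a small shaping function. This is a sketch under assumptions: `shapeSlice` and `redact` are hypothetical helpers, and the redactor only masks email-like strings as a stand-in for whatever redaction the real pipeline applies.

```typescript
type PrivacyMode = 'allow' | 'redact' | 'block';

// Mirrors the slices.experience[] entry described above.
interface ExperienceSlice {
  unit_id: string;
  summary?: string;
  refs?: string[];
  score: number;
  reason?: string;
}

// Placeholder redactor (assumption): masks email-like substrings only.
function redact(text: string): string {
  return text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[redacted]');
}

// Applies the privacy mode to a single slice without mutating the input.
function shapeSlice(unit: ExperienceSlice, mode: PrivacyMode): ExperienceSlice {
  if (mode === 'block') {
    // Copy everything except summary, so the key is absent rather than empty.
    const out: ExperienceSlice = { unit_id: unit.unit_id, score: unit.score };
    if (unit.refs !== undefined) out.refs = unit.refs;
    if (unit.reason !== undefined) out.reason = unit.reason;
    return out;
  }
  if (mode === 'redact' && unit.summary !== undefined) {
    return { ...unit, summary: redact(unit.summary) };
  }
  return unit;
}
```

Dropping the `summary` key entirely in block mode, instead of setting it to an empty string, lets consumers test for its presence.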
Errors and retry
- error.options[] may include:
  - retry_with_reduced_scope (reduce top_k / increase deadline)
  - skip_experience_path
  - return_minimal
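A caller might turn `error.options[]` into a next step along these lines. The preference order (retry once with a smaller top_k, then minimal, then skip) and the `chooseAction` helper are assumptions made for illustration; the source only lists the options.

```typescript
type ErrorOption = 'retry_with_reduced_scope' | 'skip_experience_path' | 'return_minimal';

interface Budgets {
  time_ms: number;
  experience_top_k: number;
}

type NextAction =
  | { kind: 'retry'; budgets: Budgets }
  | { kind: 'minimal' }
  | { kind: 'skip' }
  | { kind: 'fail' };

// Picks a follow-up action from the options the error advertises.
function chooseAction(options: ErrorOption[], budgets: Budgets, attempt: number): NextAction {
  const has = (o: ErrorOption): boolean => options.indexOf(o) !== -1;
  // Retry only on the first attempt, and only if top_k can still shrink.
  if (has('retry_with_reduced_scope') && attempt === 0 && budgets.experience_top_k > 1) {
    const reducedTopK = Math.max(1, Math.floor(budgets.experience_top_k / 2));
    return { kind: 'retry', budgets: { time_ms: budgets.time_ms, experience_top_k: reducedTopK } };
  }
  if (has('return_minimal')) return { kind: 'minimal' };
  if (has('skip_experience_path')) return { kind: 'skip' };
  return { kind: 'fail' };
}
```

Limiting retries to a single attempt keeps the worst-case latency bounded at roughly twice the original time budget.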
Observability
- Log: request_id, final_top_k, effective_time_ms, score distribution
- Metrics: p50/p95 latency, Recall@K (offline), coverage_entities, redaction rate, zero-leak guarantee
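Two of the metrics above have standard definitions that are easy to get subtly wrong, so here is a sketch of each. Recall@K is the fraction of known-relevant unit IDs found in the top K results; the percentile uses the nearest-rank method. The function names and the choice of nearest-rank are assumptions, not something the source prescribes.

```typescript
// Recall@K: share of relevant unit IDs that appear among the first k ranked IDs.
function recallAtK(ranked: string[], relevant: string[], k: number): number {
  if (relevant.length === 0) return 0;
  const topK = ranked.slice(0, k);
  const hits = relevant.filter(id => topK.indexOf(id) !== -1).length;
  return hits / relevant.length;
}

// Nearest-rank percentile over latency samples, for p in (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  // Copy before sorting so the caller's array is left untouched.
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}
```

With 100 samples, `percentile(samples, 95)` returns the 95th smallest value, which is the same element as index 94 of the sorted array.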
See also:
- ./contracts-registry.md
- ./README.md
- ../adapters/http-adapter.md
Debug alias (P1 optional)
- GET /experience/search (authenticated, non-critical path)
- Query: intent, top_k?, filters (entities, recency, channel)
- Purpose: diagnose relevance and latency of L4 in isolation
- Note: same scoring and privacy enforcement as render path
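A client for the debug alias needs little more than a query-string builder. The sketch below assumes the parameter names `entities` (comma-separated), `recency_days`, and `channel`; `buildDebugSearchPath` is a hypothetical helper, and authentication is left out.

```typescript
interface DebugSearchParams {
  intent: string;
  top_k?: number;
  entities?: string[];
  recency_days?: number;
  channel?: string;
}

// Builds the path and query string for GET /experience/search.
function buildDebugSearchPath(p: DebugSearchParams): string {
  const parts: string[] = ['intent=' + encodeURIComponent(p.intent)];
  if (p.top_k !== undefined) parts.push('top_k=' + p.top_k);
  if (p.entities && p.entities.length > 0) {
    // Encode each entity separately so the comma separators stay literal.
    parts.push('entities=' + p.entities.map(e => encodeURIComponent(e)).join(','));
  }
  if (p.recency_days !== undefined) parts.push('recency_days=' + p.recency_days);
  if (p.channel !== undefined) parts.push('channel=' + encodeURIComponent(p.channel));
  return '/experience/search?' + parts.join('&');
}
```

Optional parameters are omitted rather than sent empty, so the server-side defaults apply.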
Examples
Request (querystring)
GET /experience/search?intent=refactor_function&top_k=5&entities=ent.parser,ent.ast&recency_days=30
Response (JSON)
json
{
  "request_id": "req-debug-001",
  "latency_ms": 4,
  "experience": [
    {
      "unit_id": "6d1c3de7-59a6-4a6a-8238-f0f6dd67f0b9",
      "summary": "Refactored parse() into parse_v2 with improved error handling; covered edge cases.",
      "refs": ["doc:design/parser.md"],
      "score": 0.81,
      "reason": "entities overlap: ent.parser, ent.ast; recent"
    }
  ]
}
Comprehensive Testing Framework
Retrieval Performance Testing
typescript
describe('L4 Experience Retrieval Tests', () => {
  describe('Retrieval Latency SLA', () => {
    test('meets p95 < 8ms retrieval latency', async () => {
      const query = {
        intent: 'implement database connection pooling',
        budgets: { tokens_max: 500, time_ms: 8, experience_top_k: 5 },
        privacy_mode: 'allow',
        filters: { entities: ['database', 'connection'], recency_days: 30 }
      };
      const latencies: number[] = [];
      for (let i = 0; i < 100; i++) {
        // performance.now() has sub-millisecond resolution; Date.now() is too coarse for an 8 ms SLA
        const start = performance.now();
        await experienceRetrieval.retrieve(query);
        latencies.push(performance.now() - start);
      }
      // Sort once; with 100 samples, index 94 is the 95th percentile (nearest rank)
      latencies.sort((a, b) => a - b);
      const p95_latency = latencies[94];
      const p50_latency = latencies[49];
      expect(p95_latency).toBeLessThan(8);
      expect(p50_latency).toBeLessThan(4);
    });
    test('hybrid scoring performs within budget', async () => {
      const complex_query = {
        intent: 'optimize React component re-rendering with useMemo and useCallback',
        budgets: { tokens_max: 300, time_ms: 6 },
        filters: { entities: ['react', 'performance', 'optimization'] }
      };
      // Sub-millisecond timer, since the budget is only 6 ms
      const start = performance.now();
      const results = await experienceRetrieval.retrieve(complex_query);
      const total_time = performance.now() - start;
      expect(total_time).toBeLessThan(6);
      expect(results.units.length).toBeGreaterThan(0);
      expect(results.units[0]).toHaveProperty('score');
      expect(results.units[0].score).toBeGreaterThan(0.5);
    });
  });
  describe('Scoring Algorithm Validation', () => {
    test('vector similarity scoring', async () => {
      const vector_query = {
        intent: 'authentication middleware implementation',
        // Entities are passed as a filter, matching the documented inputs
        filters: { entities: ['auth', 'middleware', 'jwt'] }
      };
      const results = await experienceRetrieval.retrieve(vector_query);
      // Validate that results are ranked by descending score
      for (let i = 1; i < results.units.length; i++) {
        expect(results.units[i - 1].score).toBeGreaterThanOrEqual(results.units[i].score);
      }
    });
    test('recency boost application', async () => {
      const recent_query = {
        intent: 'database migration',
        filters: { recency_days: 7 }
      };
      const results = await experienceRetrieval.retrieve(recent_query);
      // Recent items should have higher scores
      const recent_items = results.units.filter(unit =>
        Date.now() - unit.recency_ts_ms < 7 * 24 * 60 * 60 * 1000
      );
      expect(recent_items.length).toBeGreaterThan(0);
      expect(recent_items[0].score).toBeGreaterThan(0.7);
    });
    test('entity overlap boosting', async () => {
      const entity_query = {
        intent: 'implement API rate limiting',
        // top_k belongs in budgets and entities in filters, per the documented inputs
        budgets: { experience_top_k: 3 },
        filters: { entities: ['api', 'rate-limiting', 'middleware'] }
      };
      const results = await experienceRetrieval.retrieve(entity_query);
      // Top results should have entity overlap
      expect(results.units[0].reason).toContain('entities overlap');
      expect(results.units[0].entity_overlap_score).toBeGreaterThan(0.6);
    });
  });
  describe('Privacy Mode Handling', () => {
    test('redact mode filters PII from summaries', async () => {
      const redact_query = {
        intent: 'user account setup',
        privacy_mode: 'redact'
      };
      const results = await experienceRetrieval.retrieve(redact_query);
      results.units.forEach(unit => {
        // TLD class is [A-Za-z]; a "|" inside a character class would match a literal pipe
        expect(unit.summary).not.toMatch(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/); // No emails
        expect(unit.summary).not.toMatch(/\b\d{3}-\d{2}-\d{4}\b/); // No SSNs
        expect(unit.privacy_redacted).toBe(true);
      });
    });
    test('block mode returns metadata only', async () => {
      const block_query = {
        intent: 'sensitive data processing',
        privacy_mode: 'block'
      };
      const results = await experienceRetrieval.retrieve(block_query);
      results.units.forEach(unit => {
        expect(unit.summary).toBeUndefined(); // No summary text
        expect(unit.unit_id).toBeDefined();
        expect(unit.refs).toBeDefined();
        expect(unit.score).toBeDefined();
      });
    });
  });
});
Production Load Testing
typescript
describe('L4 Retrieval Production Load Tests', () => {
  test('handles concurrent retrieval requests', async () => {
    const concurrent_queries = Array.from({ length: 100 }, (_, i) => ({
      intent: `test query ${i}`,
      budgets: { time_ms: 8, experience_top_k: 5 },
      privacy_mode: 'allow'
    }));
    const start_time = Date.now();
    const results = await Promise.allSettled(
      concurrent_queries.map(query => experienceRetrieval.retrieve(query))
    );
    const total_time = Date.now() - start_time;
    const successful = results.filter(r => r.status === 'fulfilled');
    const success_rate = successful.length / results.length;
    expect(success_rate).toBeGreaterThan(0.95); // 95%+ success rate
    expect(total_time).toBeLessThan(5000); // 100 concurrent in < 5s
  });
  test('degradation path under resource pressure', async () => {
    const high_load_query = {
      intent: 'complex system architecture optimization',
      budgets: { time_ms: 2, experience_top_k: 10 }, // Tight constraints
      filters: { entities: ['architecture', 'performance', 'optimization'] }
    };
    const results = await experienceRetrieval.retrieve(high_load_query);
    // Should degrade gracefully
    expect(results.degradation_applied).toBe(true);
    expect(results.effective_top_k).toBeLessThanOrEqual(5); // Reduced from 10
    expect(results.stats.t_ms).toBeLessThan(3); // Within budget + margin
  });
});
Production Operations Framework
Retrieval Performance Monitoring
yaml
experience_retrieval_monitoring:
  metrics:
    # Core performance metrics
    - name: l4_retrieval_latency_seconds
      type: histogram
      buckets: [0.001, 0.003, 0.005, 0.008, 0.015, 0.030]
      labels: [query_complexity, privacy_mode, degradation_applied]
    - name: l4_retrieval_throughput_qps
      type: gauge
      labels: [instance_id]
    # Scoring and relevance metrics
    - name: l4_hybrid_scoring_distribution
      type: histogram
      buckets: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    - name: l4_entity_overlap_effectiveness
      type: gauge
      labels: [overlap_range]
    # Resource utilization
    - name: l4_degradation_trigger_rate
      type: counter
      labels: [degradation_type]
  alerts:
    # The latency metric is a histogram, so quantiles are derived from its
    # buckets (PromQL syntax assumed) rather than read from a quantile label.
    - name: L4RetrievalLatencyP95High
      condition: histogram_quantile(0.95, sum by (le) (rate(l4_retrieval_latency_seconds_bucket[5m]))) > 0.008
      severity: critical
      duration: 1m
    - name: L4RetrievalLatencyP50High
      condition: histogram_quantile(0.50, sum by (le) (rate(l4_retrieval_latency_seconds_bucket[5m]))) > 0.004
      severity: warning
      duration: 2m
    - name: L4DegradationRateHigh
      condition: rate(l4_degradation_trigger_rate[5m]) > 0.1
      severity: warning
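Because the latency metric is a bucketed histogram, a p95 value has to be estimated from cumulative bucket counts. The sketch below shows one common way to do that, linear interpolation inside the bucket that contains the target rank. The `Bucket` shape and `estimateQuantile` are hypothetical names, the sample counts are made up for the example, and the monitoring backend may compute quantiles differently.

```typescript
// One histogram bucket: upper bound in seconds and cumulative observation count.
interface Bucket {
  le: number;
  cumulativeCount: number;
}

// Estimates quantile q (0..1) from cumulative buckets sorted by ascending le.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length === 0) throw new Error('no buckets');
  const total = buckets[buckets.length - 1].cumulativeCount;
  const rank = q * total;
  let prevLe = 0;
  let prevCount = 0;
  for (const b of buckets) {
    if (b.cumulativeCount >= rank) {
      const inBucket = b.cumulativeCount - prevCount;
      if (inBucket === 0) return b.le;
      // Assume observations are spread evenly inside the bucket.
      return prevLe + (b.le - prevLe) * ((rank - prevCount) / inBucket);
    }
    prevLe = b.le;
    prevCount = b.cumulativeCount;
  }
  return buckets[buckets.length - 1].le;
}

// Illustrative counts over the bucket bounds configured above.
const sampleBuckets: Bucket[] = [
  { le: 0.001, cumulativeCount: 10 },
  { le: 0.003, cumulativeCount: 40 },
  { le: 0.005, cumulativeCount: 80 },
  { le: 0.008, cumulativeCount: 96 },
  { le: 0.015, cumulativeCount: 99 },
  { le: 0.030, cumulativeCount: 100 }
];
const estimatedP95 = estimateQuantile(0.95, sampleBuckets);
```

The estimate can never be finer than the bucket layout allows, which is why a bucket boundary sits exactly at the 8 ms threshold.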
Deployment & Scaling Configuration
yaml
l4_retrieval_deployment:
  services:
    retrieval_engine:
      image: mnemoverse/l4-retrieval:latest
      replicas: 5
      resources:
        requests: { cpu: 300m, memory: 768Mi }
        limits: { cpu: 1, memory: 2Gi }
    vector_search:
      image: mnemoverse/vector-search:latest
      replicas: 3
      resources:
        requests: { cpu: 500m, memory: 1Gi }
    hybrid_scorer:
      image: mnemoverse/hybrid-scorer:latest
      replicas: 2
      resources:
        requests: { cpu: 200m, memory: 512Mi }
  autoscaling:
    retrieval_engine:
      min_replicas: 3
      max_replicas: 15
      target_cpu_utilization: 70
      target_latency_p95_ms: 6
  health_checks:
    retrieval_health:
      path: "/health/retrieval"
      interval_seconds: 10
      timeout_seconds: 2
      failure_threshold: 3
Error Handling & Recovery
Comprehensive Error Responses
typescript
interface RetrievalError {
  code: 'TIMEOUT' | 'DEGRADATION_FAILED' | 'INDEX_UNAVAILABLE' | 'PRIVACY_VIOLATION';
  message: string;
  details?: {
    effective_timeout_ms?: number;
    degradation_steps_applied?: string[];
    retry_suggestions?: string[];
  };
  stats: {
    partial_results: number;
    time_elapsed_ms: number;
  };
}
const error_handling_examples = {
  timeout_with_partial_results: {
    error: {
      code: 'TIMEOUT',
      message: 'Retrieval exceeded time budget but returning partial results',
      details: {
        effective_timeout_ms: 8,
        degradation_steps_applied: ['reduced_top_k', 'simplified_scoring'],
        retry_suggestions: ['Increase time budget', 'Reduce top_k', 'Use simpler filters']
      }
    },
    stats: { partial_results: 3, time_elapsed_ms: 9 },
    units: [/* partial results */]
  }
};