# Metrics & SLO
## SLO targets
- Retrieval latency (online): p95 < 8 ms, p50 ~ 3 ms
- Privacy: leak_rate = 0 (strict)
- Freshness: summarize lag p95 < 2 s under normal load
## Online metrics

- l4_retrieval_latency_ms {p50,p95} (Prometheus: l4_retrieval_latency_ms_bucket/_sum/_count)
- l4_minimal_responses (counter) (Prometheus: l4_minimal_responses_total)
- l4_cover_entities (count/ratio)
- l4_summarize_latency_ms {p50,p95} (Prometheus: l4_summarize_latency_ms_bucket)
- l4_index_latency_ms {p50,p95} (Prometheus: l4_index_latency_ms_bucket)
- l4_share_of_total_time (ratio) # computed as l4_time_ms / total_orchestration_time_ms
- l4_time_ms # measured inside L4 boundaries (from call start to reply)
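Each latency metric above surfaces in Prometheus as `_bucket`/`_sum`/`_count` series. The sketch below shows how raw samples map onto that text exposition format; the function name and bucket boundaries are illustrative, not part of the L4 codebase.

```typescript
// Sketch: how one histogram becomes _bucket/_sum/_count series in the
// Prometheus text exposition format. Bucket boundaries are illustrative.
function renderHistogram(name: string, samples: number[], buckets: number[]): string {
  const sorted = [...samples].sort((a, b) => a - b);
  const lines: string[] = [];
  for (const le of buckets) {
    // Buckets are cumulative: count of all samples <= the bound.
    lines.push(`${name}_bucket{le="${le}"} ${sorted.filter(v => v <= le).length}`);
  }
  lines.push(`${name}_bucket{le="+Inf"} ${sorted.length}`);
  lines.push(`${name}_sum ${sorted.reduce((a, b) => a + b, 0)}`);
  lines.push(`${name}_count ${sorted.length}`);
  return lines.join("\n");
}
```

Given such series, p50/p95 are then estimated server-side, e.g. with Prometheus' `histogram_quantile(0.95, ...)` over the `_bucket` series.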
## Pipeline metrics
- l4_ingest_qps, l4_ingest_error_rate
- l4_summarize_latency_ms {p50,p95}
- l4_index_latency_ms {p50,p95}
- l4_compression_ratio
- l4_units_total, l4_units_pinned, l4_units_evicted
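One plausible definition of l4_compression_ratio is raw bytes in over summary bytes out per summarization batch; a sketch, where `RawEvent`/`Summary` are assumed shapes rather than the actual L4 schema:

```typescript
// Sketch: l4_compression_ratio as raw bytes / summary bytes per batch.
// RawEvent and Summary are assumed shapes, not the source schema.
interface RawEvent { payload: string; }
interface Summary { text: string; }

const utf8Bytes = (s: string) => new TextEncoder().encode(s).length;

function compressionRatio(rawEvents: RawEvent[], summary: Summary): number {
  const rawBytes = rawEvents.reduce((n, e) => n + utf8Bytes(e.payload), 0);
  const summaryBytes = utf8Bytes(summary.text);
  return summaryBytes === 0 ? 0 : rawBytes / summaryBytes;
}
```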
## Quality (offline)
- Recall@K on curated queries
- Novelty vs Redundancy (duplicate content rate)
- Coverage by entity set
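Recall@K for the curated-query evaluation is the fraction of the relevant set found in the top K results; a minimal sketch, where ids stand for whatever uniquely keys an experience unit:

```typescript
// Sketch: Recall@K for curated-query evaluation. Ids stand for whatever
// uniquely keys an experience unit in the index.
function recallAtK(retrievedIds: string[], relevantIds: Set<string>, k: number): number {
  if (relevantIds.size === 0) return 1; // vacuously perfect: nothing to recall
  const hits = retrievedIds.slice(0, k).filter(id => relevantIds.has(id)).length;
  return hits / relevantIds.size;
}
```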
## Observability fields
- request_id, user_id/agent_id, session ids
- correlation with orchestration logs and L1 signals
## How to estimate "≤ 40% of total time"
- Record total orchestration time per request (t_total)
- Record L4 exclusive time (t_l4)
- Report l4_share_of_total_time = t_l4 / t_total; watch p50/p95 distributions
- If median share persistently > 40%, consider introducing budgets.experience_time_ms (P1)
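The steps above can be sketched as follows; the function name and sample shape are illustrative, and the 0.4 threshold mirrors the budget hint in the text:

```typescript
// Sketch of the estimation steps above. Name and sample shape are
// illustrative; 0.4 is the "≤ 40%" budget threshold from the text.
function shareDistribution(samples: { t_l4: number; t_total: number }[]) {
  const shares = samples
    .filter(s => s.t_total > 0)            // guard against divide-by-zero
    .map(s => s.t_l4 / s.t_total)
    .sort((a, b) => a - b);
  // Nearest-rank-style quantile over the sorted shares.
  const q = (p: number) =>
    shares[Math.min(shares.length - 1, Math.floor(p * shares.length))];
  return { p50: q(0.5), p95: q(0.95), breachesBudget: q(0.5) > 0.4 };
}
```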
## Production Implementation Framework
### Real-time Monitoring Dashboard
```yaml
l4_experience_metrics_dashboard:
  panels:
    # Core Performance Panel
    retrieval_performance:
      title: "L4 Retrieval Performance"
      metrics:
        - l4_retrieval_latency_ms{quantile="0.50,0.95"}
        - l4_retrieval_throughput_qps
        - l4_share_of_total_time{quantile="0.50,0.95"}
      thresholds:
        critical: { p95_latency_ms: 8, throughput_qps: 50 }
        warning: { p50_latency_ms: 4, share_of_total: 0.4 }

    # Quality & Coverage Panel
    experience_quality:
      title: "Experience Unit Quality"
      metrics:
        - l4_experience_quality_score{quantile="0.50,0.90"}
        - l4_cover_entities_ratio
        - l4_novelty_vs_redundancy_ratio
      targets:
        quality_score_p50: 0.8
        entity_coverage: 0.75
        novelty_ratio: 0.7

    # Privacy & Security Panel
    privacy_compliance:
      title: "Privacy Compliance"
      metrics:
        - l4_privacy_redaction_rate{mode="redact,block"}
        - l4_privacy_leak_rate  # must be 0
        - l4_pii_detection_accuracy
      sla:
        leak_rate: 0.0
        redaction_accuracy: 0.99
```
### Comprehensive Testing Framework
```typescript
describe('L4 Experience Metrics Tests', () => {
  describe('SLO Compliance Testing', () => {
    test('retrieval latency meets SLO targets', async () => {
      const test_queries = [
        { intent: 'implement authentication', complexity: 'simple' },
        { intent: 'optimize database queries with complex joins', complexity: 'high' },
        { intent: 'debug memory leaks in React components', complexity: 'medium' }
      ];

      const latencies = [];
      for (const query of test_queries) {
        const measurements = await measureRetrievalLatency(query, 100); // 100 samples
        latencies.push(...measurements);
      }

      const p50 = percentile(latencies, 0.5);
      const p95 = percentile(latencies, 0.95);

      expect(p50).toBeLessThan(3); // p50 ~ 3ms target
      expect(p95).toBeLessThan(8); // p95 < 8ms SLO

      // Log for monitoring
      console.log(`L4 Latency Metrics: p50=${p50}ms, p95=${p95}ms`);
    });

    test('privacy leak rate maintains zero tolerance', async () => {
      const sensitive_queries = [
        { intent: 'user authentication with email validation', pii_expected: true },
        { intent: 'API key rotation procedures', sensitive: true },
        { intent: 'database connection string setup', credentials: true }
      ];

      let total_retrievals = 0;
      let privacy_leaks = 0;

      for (const query of sensitive_queries) {
        const results = await experienceRetrieval.retrieve({
          ...query,
          privacy_mode: 'redact'
        });
        total_retrievals += results.units.length;

        // Check for PII leakage in summaries
        results.units.forEach(unit => {
          if (containsPII(unit.summary)) {
            privacy_leaks++;
          }
        });
      }

      const leak_rate = privacy_leaks / total_retrievals;
      expect(leak_rate).toBe(0); // Zero tolerance for privacy leaks
    });

    test('experience quality scores meet targets', async () => {
      const quality_test_queries = [
        { intent: 'implement user registration flow', expected_min_quality: 0.8 },
        { intent: 'optimize React performance', expected_min_quality: 0.75 },
        { intent: 'setup CI/CD pipeline', expected_min_quality: 0.85 }
      ];

      const quality_scores = [];
      for (const query of quality_test_queries) {
        const results = await experienceRetrieval.retrieve(query);
        results.units.forEach(unit => {
          expect(unit.quality_score).toBeGreaterThan(query.expected_min_quality);
          quality_scores.push(unit.quality_score);
        });
      }

      const avg_quality = quality_scores.reduce((a, b) => a + b, 0) / quality_scores.length;
      expect(avg_quality).toBeGreaterThan(0.8); // Average quality > 0.8
    });
  });

  describe('Pipeline Performance Testing', () => {
    test('ingestion throughput meets targets', async () => {
      const batch_events = Array.from({ length: 1000 }, (_, i) => ({
        id: `test_event_${i}`,
        intent: `test operation ${i}`,
        outcome: { status: 'success' }
      }));

      const start_time = Date.now();
      const results = await Promise.allSettled(
        batch_events.map(event => l4Pipeline.ingest(event))
      );
      const total_time = Date.now() - start_time;

      const successful = results.filter(r => r.status === 'fulfilled').length;
      const throughput = successful / (total_time / 1000); // events per second

      expect(throughput).toBeGreaterThan(500); // > 500 events/sec target
      expect(successful / batch_events.length).toBeGreaterThan(0.95); // 95% success rate
    });

    test('summarization latency stays within bounds', async () => {
      const event_clusters = [
        generateEventCluster(5),  // Small cluster
        generateEventCluster(15), // Medium cluster
        generateEventCluster(30)  // Large cluster
      ];

      const summarization_times = [];
      for (const cluster of event_clusters) {
        const start = Date.now();
        const summary = await l4Pipeline.summarize(cluster);
        const duration = Date.now() - start;
        summarization_times.push(duration);

        expect(summary.quality_score).toBeGreaterThan(0.7);
      }

      const p95_summarization = percentile(summarization_times, 0.95);
      expect(p95_summarization).toBeLessThan(2000); // p95 < 2s target
    });
  });
});
```
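The suite above assumes helpers such as percentile() and containsPII() that are not defined in this document. Minimal stand-in sketches follow: a nearest-rank percentile and a naive regex-based PII check (a production detector would be considerably stricter).

```typescript
// Stand-in for the percentile() helper: nearest-rank over sorted samples.
function percentile(values: number[], q: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil(q * sorted.length) - 1)];
}

// Stand-in for containsPII(): naive regexes, illustrative only.
function containsPII(text: string): boolean {
  const patterns = [
    /[\w.+-]+@[\w-]+\.[\w.]+/,  // email addresses
    /\b\d{3}-\d{2}-\d{4}\b/,    // US-SSN-shaped numbers
    /\b(?:\d[ -]?){13,16}\b/    // card-number-shaped digit runs
  ];
  return patterns.some(p => p.test(text));
}
```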
### Advanced Metrics Collection
```typescript
interface L4MetricsCollector {
  // Real-time performance metrics
  recordRetrievalLatency(latency_ms: number, query_complexity: string): void;
  recordQualityScore(score: number, unit_type: string): void;
  recordPrivacyAction(action: 'allow' | 'redact' | 'block', pii_detected: boolean): void;

  // Business intelligence metrics
  recordEntityCoverage(entities_requested: string[], entities_covered: string[]): void;
  recordUserSatisfaction(feedback_rating: number, experience_units_used: number): void;
  recordLearningEffectiveness(before_score: number, after_score: number): void;
}

class ProductionMetricsCollector implements L4MetricsCollector {
  private prometheus: PrometheusRegistry;
  private influxDB: InfluxDBClient;
  private alertManager: AlertManagerClient;

  recordRetrievalLatency(latency_ms: number, query_complexity: string) {
    this.prometheus.histogram('l4_retrieval_latency_ms')
      .labels({ complexity: query_complexity })
      .observe(latency_ms);

    this.influxDB.writePoint({
      measurement: 'l4_performance',
      fields: { retrieval_latency_ms: latency_ms },
      tags: { complexity: query_complexity },
      timestamp: new Date()
    });
  }

  recordQualityScore(score: number, unit_type: string) {
    this.prometheus.histogram('l4_experience_quality_score')
      .labels({ type: unit_type })
      .observe(score);

    // Alert if quality drops below threshold
    if (score < 0.6) {
      this.alertManager.send({
        severity: 'warning',
        summary: `Low quality experience unit detected: ${score}`,
        labels: { component: 'l4-experience', type: unit_type }
      });
    }
  }

  // Remaining L4MetricsCollector methods omitted for brevity.
}
```
### Observability & Debugging Framework
```yaml
l4_observability_stack:
  logging:
    structured_logs:
      format: json
      fields: [timestamp, request_id, user_id, latency_ms, quality_score, privacy_mode]
      retention: 30_days
    debug_logging:
      retrieval_decisions: true
      scoring_breakdown: true
      privacy_actions: true

  tracing:
    jaeger_integration: true
    trace_sampling_rate: 0.1
    custom_spans:
      - experience_retrieval
      - hybrid_scoring
      - privacy_filtering

  alerting:
    channels: [slack, pagerduty, email]
    escalation:
      warning: 15_minutes
      critical: 5_minutes

  dashboards:
    grafana_dashboards:
      - l4_performance_overview
      - l4_quality_metrics
      - l4_privacy_compliance
      - l4_resource_utilization
```
### Performance Optimization Guidelines
```typescript
interface L4PerformanceOptimization {
  // Index optimization strategies
  optimizeVectorIndex(): {
    target_dimensions: 384;
    quantization: 'int8';
    index_type: 'hnsw';
    ef_construction: 200;
    m_connections: 16;
  };

  // Caching strategies
  implementCaching(): {
    embedding_cache: { ttl: '1h', max_size: '100MB' };
    query_result_cache: { ttl: '10m', max_size: '50MB' };
    entity_lookup_cache: { ttl: '30m', max_size: '25MB' };
  };

  // Resource allocation
  scaleResources(current_metrics: PerformanceMetrics): {
    cpu_scaling: 'horizontal' | 'vertical';
    memory_requirements: string;
    replica_count: number;
  };
}
```
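The byte-budgeted caches above can be approximated with a small TTL cache. This sketch bounds entry count rather than bytes, evicts the oldest entry when full, and uses an illustrative class name; it is not the actual L4 cache implementation.

```typescript
// Sketch: TTL + size-bounded cache approximating the embedding/query-result
// caches above. maxEntries stands in for the byte budgets ('100MB' etc.).
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number, private maxEntries: number) {}

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.store.delete(key); // lazy expiry on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now = Date.now()): void {
    if (this.store.size >= this.maxEntries && !this.store.has(key)) {
      // Evict the oldest insertion (Map preserves insertion order).
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```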
## Success Criteria & Validation
### Foundation Success Criteria (v0.1)
Performance Targets:
```typescript
const l4_success_criteria = {
  foundation_v01: {
    retrieval_latency: {
      p50: { target: 3, unit: 'ms', tolerance: 1 },
      p95: { target: 8, unit: 'ms', tolerance: 2 }
    },
    throughput: {
      ingestion: { target: 1000, unit: 'events/sec', tolerance: 200 },
      retrieval: { target: 500, unit: 'queries/sec', tolerance: 100 }
    },
    quality: {
      experience_unit_quality: { target: 0.8, tolerance: 0.1 },
      privacy_leak_rate: { target: 0.0, tolerance: 0.0 },
      entity_coverage: { target: 0.75, tolerance: 0.1 }
    }
  },
  validation_tests: [
    'retrieval_latency_sla_compliance',
    'privacy_zero_leak_guarantee',
    'quality_score_distribution',
    'concurrent_load_handling',
    'degradation_path_effectiveness'
  ]
};
```
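Checking an observed value against a {target, tolerance} entry might look like the sketch below. The criteria object does not encode which direction is "better" (lower for latency, higher for coverage), so this sketch passes it explicitly; that convention is an assumption, not part of the spec.

```typescript
// Sketch: validate an observed metric against a {target, tolerance} entry.
// Direction of "better" is passed explicitly (an assumption of this sketch).
interface Criterion { target: number; tolerance: number; }

function meetsCriterion(observed: number, c: Criterion, lowerIsBetter: boolean): boolean {
  return lowerIsBetter
    ? observed <= c.target + c.tolerance   // e.g. latency: allow up to target + tolerance
    : observed >= c.target - c.tolerance;  // e.g. coverage: allow down to target - tolerance
}
```

Note that privacy_leak_rate, with target 0 and tolerance 0, only passes at exactly zero, matching the strict SLO.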
See also:
- ./README.md
- ../evaluation/metrics.md