Metrics & SLO

SLO targets

  • Retrieval latency (online): p95 < 8 ms, p50 ~ 3 ms
  • Privacy: leak_rate = 0 (strict)
  • Freshness: summarize lag p95 < 2 s under normal load

Online metrics

  • l4_retrieval_latency_ms {p50,p95} (Prometheus: l4_retrieval_latency_ms_bucket/_sum/_count)
  • l4_minimal_responses (counter) (Prometheus: l4_minimal_responses_total)
  • l4_cover_entities (count/ratio)
  • l4_summarize_latency_ms {p50,p95} (Prometheus: l4_summarize_latency_ms_bucket)
  • l4_index_latency_ms {p50,p95} (Prometheus: l4_index_latency_ms_bucket)
  • l4_share_of_total_time (ratio) # computed as l4_time_ms / total_orchestration_time_ms
  • l4_time_ms # measured inside L4 boundaries (from call start to reply)
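
The online metrics above map onto standard Prometheus instrument types. A minimal registration sketch, assuming prom-client as the metrics client; the bucket boundaries are illustrative:

typescript
import { Histogram, Counter, Registry } from 'prom-client';

const registry = new Registry();

// Latency histogram: p50/p95 are derived from _bucket/_sum/_count at query time.
const retrievalLatency = new Histogram({
  name: 'l4_retrieval_latency_ms',
  help: 'L4 retrieval latency in milliseconds',
  buckets: [1, 2, 4, 8, 16, 32], // illustrative boundaries around the 8 ms SLO
  registers: [registry],
});

// Counter for degraded/minimal responses.
const minimalResponses = new Counter({
  name: 'l4_minimal_responses_total',
  help: 'Minimal responses served by L4',
  registers: [registry],
});

retrievalLatency.observe(3.2); // record one retrieval that took 3.2 ms
minimalResponses.inc();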

Pipeline metrics

  • l4_ingest_qps, l4_ingest_error_rate
  • l4_summarize_latency_ms {p50,p95}
  • l4_index_latency_ms {p50,p95}
  • l4_compression_ratio
  • l4_units_total, l4_units_pinned, l4_units_evicted

Quality (offline)

  • Recall@K on curated queries
  • Novelty vs Redundancy (duplicate content rate)
  • Coverage by entity set
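
Recall@K can be computed offline from the retrieved unit ids and a hand-labeled relevant set per curated query. A minimal sketch; the id shapes are illustrative:

typescript
// Fraction of the labeled relevant units that appear in the top-K retrieved results.
function recallAtK(retrievedIds: string[], relevantIds: string[], k: number): number {
  if (relevantIds.length === 0) return 0;
  const topK = new Set(retrievedIds.slice(0, k));
  const hits = relevantIds.filter(id => topK.has(id)).length;
  return hits / relevantIds.length;
}

// Example: 2 of 3 relevant units retrieved in the top 5 -> Recall@5 ≈ 0.67
recallAtK(['u1', 'u7', 'u3', 'u9', 'u2'], ['u1', 'u2', 'u4'], 5);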

Observability fields

  • request_id, user_id/agent_id, session ids
  • correlation with orchestration logs and L1 signals

How to estimate "≤ 40% of total time"

  • Record total orchestration time per request (t_total)
  • Record L4 exclusive time (t_l4)
  • Report l4_share_of_total_time = t_l4 / t_total; watch p50/p95 distributions
  • If median share persistently > 40%, consider introducing budgets.experience_time_ms (P1)
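
A minimal sketch of that computation, assuming per-request timings are already collected upstream; the percentile helper uses a simple nearest-rank estimate:

typescript
interface RequestTiming { t_total_ms: number; t_l4_ms: number }

// Nearest-rank percentile over a sample window.
function percentile(values: number[], q: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(q * sorted.length))];
}

// l4_share_of_total_time = t_l4 / t_total, reported as p50/p95 over a window of requests.
function shareOfTotalTime(window: RequestTiming[]): { p50: number; p95: number } {
  const shares = window.map(t => t.t_l4_ms / t.t_total_ms);
  return { p50: percentile(shares, 0.5), p95: percentile(shares, 0.95) };
}
// If the p50 share stays above 0.4, introduce budgets.experience_time_ms (P1).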

Production Implementation Framework

Real-time Monitoring Dashboard

yaml
l4_experience_metrics_dashboard:
  panels:
    # Core Performance Panel
    retrieval_performance:
      title: "L4 Retrieval Performance"
      metrics:
        - l4_retrieval_latency_ms{quantile="0.50,0.95"}
        - l4_retrieval_throughput_qps
        - l4_share_of_total_time{quantile="0.50,0.95"}
      thresholds:
        critical: { p95_latency_ms: 8, throughput_qps: 50 }
        warning: { p50_latency_ms: 4, share_of_total: 0.4 }
        
    # Quality & Coverage Panel  
    experience_quality:
      title: "Experience Unit Quality"
      metrics:
        - l4_experience_quality_score{quantile="0.50,0.90"}
        - l4_cover_entities_ratio
        - l4_novelty_vs_redundancy_ratio
      targets:
        quality_score_p50: 0.8
        entity_coverage: 0.75
        novelty_ratio: 0.7
        
    # Privacy & Security Panel
    privacy_compliance:
      title: "Privacy Compliance"
      metrics:
        - l4_privacy_redaction_rate{mode="redact,block"}
        - l4_privacy_leak_rate  # Must be 0
        - l4_pii_detection_accuracy
      sla:
        leak_rate: 0.0
        redaction_accuracy: 0.99

Comprehensive Testing Framework

typescript
describe('L4 Experience Metrics Tests', () => {
  describe('SLO Compliance Testing', () => {
    test('retrieval latency meets SLO targets', async () => {
      const test_queries = [
        { intent: 'implement authentication', complexity: 'simple' },
        { intent: 'optimize database queries with complex joins', complexity: 'high' },
        { intent: 'debug memory leaks in React components', complexity: 'medium' }
      ];
      
      const latencies: number[] = [];
      for (const query of test_queries) {
        const measurements = await measureRetrievalLatency(query, 100); // 100 samples
        latencies.push(...measurements);
      }
      
      const p50 = percentile(latencies, 0.5);
      const p95 = percentile(latencies, 0.95);
      
      expect(p50).toBeLessThan(3); // p50 ~ 3ms target
      expect(p95).toBeLessThan(8); // p95 < 8ms SLO
      
      // Log for monitoring
      console.log(`L4 Latency Metrics: p50=${p50}ms, p95=${p95}ms`);
    });
    
    test('privacy leak rate maintains zero tolerance', async () => {
      const sensitive_queries = [
        { intent: 'user authentication with email validation', pii_expected: true },
        { intent: 'API key rotation procedures', sensitive: true },
        { intent: 'database connection string setup', credentials: true }
      ];
      
      let total_retrievals = 0;
      let privacy_leaks = 0;
      
      for (const query of sensitive_queries) {
        const results = await experienceRetrieval.retrieve({
          ...query,
          privacy_mode: 'redact'
        });
        
        total_retrievals += results.units.length;
        
        // Check for PII leakage in summaries
        results.units.forEach(unit => {
          if (containsPII(unit.summary)) {
            privacy_leaks++;
          }
        });
      }
      
      const leak_rate = privacy_leaks / total_retrievals;
      expect(leak_rate).toBe(0); // Zero tolerance for privacy leaks
    });
    
    test('experience quality scores meet targets', async () => {
      const quality_test_queries = [
        { intent: 'implement user registration flow', expected_min_quality: 0.8 },
        { intent: 'optimize React performance', expected_min_quality: 0.75 },
        { intent: 'setup CI/CD pipeline', expected_min_quality: 0.85 }
      ];
      
      const quality_scores: number[] = [];
      
      for (const query of quality_test_queries) {
        const results = await experienceRetrieval.retrieve(query);
        
        results.units.forEach(unit => {
          expect(unit.quality_score).toBeGreaterThan(query.expected_min_quality);
          quality_scores.push(unit.quality_score);
        });
      }
      
      const avg_quality = quality_scores.reduce((a, b) => a + b, 0) / quality_scores.length;
      expect(avg_quality).toBeGreaterThan(0.8); // Average quality > 0.8
    });
  });
  
  describe('Pipeline Performance Testing', () => {
    test('ingestion throughput meets targets', async () => {
      const batch_events = Array.from({ length: 1000 }, (_, i) => ({
        id: `test_event_${i}`,
        intent: `test operation ${i}`,
        outcome: { status: 'success' }
      }));
      
      const start_time = Date.now();
      const results = await Promise.allSettled(
        batch_events.map(event => l4Pipeline.ingest(event))
      );
      const total_time = Date.now() - start_time;
      
      const successful = results.filter(r => r.status === 'fulfilled').length;
      const throughput = successful / (total_time / 1000); // events per second
      
      expect(throughput).toBeGreaterThan(500); // > 500 events/sec target
      expect(successful / batch_events.length).toBeGreaterThan(0.95); // 95% success rate
    });
    
    test('summarization latency stays within bounds', async () => {
      const event_clusters = [
        generateEventCluster(5), // Small cluster
        generateEventCluster(15), // Medium cluster  
        generateEventCluster(30) // Large cluster
      ];
      
      const summarization_times: number[] = [];
      
      for (const cluster of event_clusters) {
        const start = Date.now();
        const summary = await l4Pipeline.summarize(cluster);
        const duration = Date.now() - start;
        
        summarization_times.push(duration);
        expect(summary.quality_score).toBeGreaterThan(0.7);
      }
      
      const p95_summarization = percentile(summarization_times, 0.95);
      expect(p95_summarization).toBeLessThan(2000); // p95 < 2s target
    });
  });
});

Advanced Metrics Collection

typescript
interface L4MetricsCollector {
  // Real-time performance metrics
  recordRetrievalLatency(latency_ms: number, query_complexity: string): void;
  recordQualityScore(score: number, unit_type: string): void;
  recordPrivacyAction(action: 'allow' | 'redact' | 'block', pii_detected: boolean): void;
  
  // Business intelligence metrics
  recordEntityCoverage(entities_requested: string[], entities_covered: string[]): void;
  recordUserSatisfaction(feedback_rating: number, experience_units_used: number): void;
  recordLearningEffectiveness(before_score: number, after_score: number): void;
}

class ProductionMetricsCollector implements L4MetricsCollector {
  // Only two of the interface methods are shown here; the remaining ones follow the same pattern.
  private prometheus: PrometheusRegistry;
  private influxDB: InfluxDBClient;
  private alertManager: AlertManager;
  
  recordRetrievalLatency(latency_ms: number, query_complexity: string) {
    this.prometheus.histogram('l4_retrieval_latency_ms')
      .labels({ complexity: query_complexity })
      .observe(latency_ms);
      
    this.influxDB.writePoint({
      measurement: 'l4_performance',
      fields: { retrieval_latency_ms: latency_ms },
      tags: { complexity: query_complexity },
      timestamp: new Date()
    });
  }
  
  recordQualityScore(score: number, unit_type: string) {
    this.prometheus.histogram('l4_experience_quality_score')
      .labels({ type: unit_type })
      .observe(score);
      
    // Alert if quality drops below threshold
    if (score < 0.6) {
      this.alertManager.send({
        severity: 'warning',
        summary: `Low quality experience unit detected: ${score}`,
        labels: { component: 'l4-experience', type: unit_type }
      });
    }
  }
}
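
A usage sketch that feeds the collector from one retrieval call; experienceRetrieval is the client used in the tests above, and the complexity label and unit type are illustrative placeholders:

typescript
async function retrieveWithMetrics(collector: L4MetricsCollector, query: { intent: string }) {
  const start = Date.now();
  const result = await experienceRetrieval.retrieve(query);
  collector.recordRetrievalLatency(Date.now() - start, 'simple'); // complexity label is illustrative
  for (const unit of result.units) {
    collector.recordQualityScore(unit.quality_score, 'experience_unit'); // unit type is illustrative
  }
  return result;
}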

Observability & Debugging Framework

yaml
l4_observability_stack:
  logging:
    structured_logs:
      format: json
      fields: [timestamp, request_id, user_id, latency_ms, quality_score, privacy_mode]
      retention: 30_days
      
    debug_logging:
      retrieval_decisions: true
      scoring_breakdown: true
      privacy_actions: true
      
  tracing:
    jaeger_integration: true
    trace_sampling_rate: 0.1
    custom_spans:
      - experience_retrieval
      - hybrid_scoring
      - privacy_filtering
      
  alerting:
    channels: [slack, pagerduty, email]
    escalation:
      warning: 15_minutes
      critical: 5_minutes
      
  dashboards:
    grafana_dashboards:
      - l4_performance_overview
      - l4_quality_metrics  
      - l4_privacy_compliance
      - l4_resource_utilization
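
A structured log entry with the fields listed above could be emitted like this. A sketch only: the field types are assumptions, and retention and shipping are left to the logging backend:

typescript
interface L4LogEntry {
  timestamp: string;          // ISO 8601
  request_id: string;
  user_id: string;
  latency_ms: number;
  quality_score: number;
  privacy_mode: 'allow' | 'redact' | 'block';
}

function logL4Request(entry: L4LogEntry): void {
  // One JSON object per line; the 30-day retention is enforced by the log backend, not here.
  console.log(JSON.stringify(entry));
}

logL4Request({
  timestamp: new Date().toISOString(),
  request_id: 'req-1234',
  user_id: 'agent-42',
  latency_ms: 4.2,
  quality_score: 0.83,
  privacy_mode: 'redact',
});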

Performance Optimization Guidelines

typescript
interface L4PerformanceOptimization {
  // Index optimization strategies
  optimizeVectorIndex(): {
    target_dimensions: 384;
    quantization: 'int8';
    index_type: 'hnsw';
    ef_construction: 200;
    m_connections: 16;
  };
  
  // Caching strategies
  implementCaching(): {
    embedding_cache: { ttl: '1h', max_size: '100MB' };
    query_result_cache: { ttl: '10m', max_size: '50MB' };
    entity_lookup_cache: { ttl: '30m', max_size: '25MB' };
  };
  
  // Resource allocation
  scaleResources(current_metrics: PerformanceMetrics): {
    cpu_scaling: 'horizontal' | 'vertical';
    memory_requirements: string;
    replica_count: number;
  };
}
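
As a sketch of the embedding_cache strategy above, a minimal in-memory TTL cache; size-based eviction for max_size is omitted and the values are illustrative:

typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (Date.now() > hit.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// embedding_cache with ttl: 1h; query_result_cache and entity_lookup_cache follow the same pattern.
const embeddingCache = new TtlCache<number[]>(60 * 60 * 1000);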

Success Criteria & Validation

Foundation Success Criteria (v0.1)

Performance Targets:

typescript
const l4_success_criteria = {
  foundation_v01: {
    retrieval_latency: {
      p50: { target: 3, unit: 'ms', tolerance: 1 },
      p95: { target: 8, unit: 'ms', tolerance: 2 }
    },
    throughput: {
      ingestion: { target: 1000, unit: 'events/sec', tolerance: 200 },
      retrieval: { target: 500, unit: 'queries/sec', tolerance: 100 }
    },
    quality: {
      experience_unit_quality: { target: 0.8, tolerance: 0.1 },
      privacy_leak_rate: { target: 0.0, tolerance: 0.0 },
      entity_coverage: { target: 0.75, tolerance: 0.1 }
    }
  },
  
  validation_tests: [
    'retrieval_latency_sla_compliance',
    'privacy_zero_leak_guarantee', 
    'quality_score_distribution',
    'concurrent_load_handling',
    'degradation_path_effectiveness'
  ]
};
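
A small sketch of how a validation test might check a measured value against one of the { target, tolerance } entries above; whether lower or higher is better depends on the metric:

typescript
interface Criterion { target: number; tolerance: number }

function meetsCriterion(measured: number, criterion: Criterion, lowerIsBetter: boolean): boolean {
  return lowerIsBetter
    ? measured <= criterion.target + criterion.tolerance
    : measured >= criterion.target - criterion.tolerance;
}

// Example: a measured p95 retrieval latency of 9 ms passes against target 8 ms with tolerance 2 ms.
meetsCriterion(9, l4_success_criteria.foundation_v01.retrieval_latency.p95, true); // true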

See also:

  • ./README.md
  • ../evaluation/metrics.md