RAG Evaluation
Measure Precision@k and Recall@k for your retrieval pipeline, detect score drift after model updates, and add a CI regression gate that fails on quality drops.
Measure Precision@k and Recall@k for your retrieval pipeline, detect score drift after model updates, and add a CI regression gate that fails on quality drops.
Write parity-driven tests that prove one query intent behaves identically across every RushDB surface.