Let's take an example: you are tasked with building a clinical decision support system for doctors. Doctors will ask questions like "What's the recommended treatment for a 45-year-old diabetic patient with kidney disease?"
- Your database has 500k medical papers
- Treatment guidelines are present
- Case studies are present
How would you architect the RAG system and what will be your key tradeoffs from a PM standpoint?
In the medical domain, accuracy has to be paramount, and the requirement should be 99% accuracy because the stakes are high. Latency is something we can compromise on as long as accuracy stays intact, so let's say under 3 seconds. Recommendations and answers should include citations as well as compliance-related information. This builds trust.
- Searching the documents, guidelines, and case studies:
  - Hybrid search (BM25 + embeddings)
    - Why? --> Medical terms need exact matches, but symptom searches can be semantic, so hybrid search works best.
  - Specialized medical embeddings (example: BioBERT), because domain-specific training is crucial for accuracy.
  - Metadata filtering first. Why? --> With 500k documents the dataset is huge, so we filter by specialty, recency, and evidence level before searching. This narrows the search path and avoids spending tokens on going through every document on every query.
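
The retrieval flow above (filter on metadata first, then combine keyword and semantic rankings) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production design: the `Doc` fields and `filter_docs` are hypothetical, the BM25 and embedding rankings are taken as ready-made lists of document IDs, and reciprocal rank fusion (RRF) is just one common way to merge two rankings; the approach described here does not prescribe a specific fusion method.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical metadata fields; a real index would define its own schema.
    doc_id: str
    specialty: str
    year: int
    evidence_level: int  # 1 = strongest (assumed convention)


def filter_docs(docs, specialty, min_year, max_evidence_level):
    """Narrow the candidate set before any ranking is done."""
    return {
        d.doc_id
        for d in docs
        if d.specialty == specialty
        and d.year >= min_year
        and d.evidence_level <= max_evidence_level
    }


def reciprocal_rank_fusion(rankings, allowed_ids, k=60):
    """Merge several ranked lists of doc IDs into one hybrid ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is its position among the documents that passed the filter.
    """
    scores = {}
    for ranking in rankings:
        kept = [doc_id for doc_id in ranking if doc_id in allowed_ids]
        for rank, doc_id in enumerate(kept, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by doc ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Placeholder documents, purely for illustration.
docs = [
    Doc("p1", "nephrology", 2022, 1),
    Doc("p2", "nephrology", 2015, 2),
    Doc("p3", "cardiology", 2023, 1),
    Doc("p4", "nephrology", 2021, 2),
    Doc("p5", "nephrology", 2020, 1),
]
allowed = filter_docs(docs, "nephrology", min_year=2018, max_evidence_level=2)
bm25_ranking = ["p1", "p2", "p5", "p4"]   # keyword (exact-term) ranking
embedding_ranking = ["p5", "p4", "p1"]    # semantic ranking
hybrid = reciprocal_rank_fusion([bm25_ranking, embedding_ranking], allowed)
```

Filtering before fusion keeps the ranking step cheap: documents outside the specialty, recency, or evidence window never reach the scoring loop.
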
- Chunking strategy:
  - Hierarchical chunking that preserves paper structure. Why? --> Because context matters in the medical domain (methodology, patient cohort, conclusions).
  - Keep the abstract and relevant sections together. Why? --> Because conclusions need the context of the study design.
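
One possible way to implement the chunking idea above is to split each section separately and attach the abstract and section name to every chunk. The sketch below assumes a hypothetical paper schema (a dict with `title`, `abstract`, and `sections`) and a simple word-count limit; real pipelines would more likely split on tokens or sentences.

```python
def chunk_paper(paper, max_words=200):
    """Split each section into chunks that carry the abstract and section name.

    `paper` is assumed to be a dict with "title", "abstract", and "sections"
    (a mapping of section name -> text). This schema is hypothetical.
    """
    chunks = []
    for section_name, text in paper["sections"].items():
        words = text.split()
        for start in range(0, len(words), max_words):
            chunks.append({
                "title": paper["title"],
                "section": section_name,        # position in the paper hierarchy
                "abstract": paper["abstract"],  # study context kept with every chunk
                "text": " ".join(words[start:start + max_words]),
            })
    return chunks


# Placeholder paper, purely for illustration.
paper = {
    "title": "Example trial",
    "abstract": "Placeholder abstract.",
    "sections": {
        "Methodology": "word " * 250,
        "Conclusions": "short conclusion text",
    },
}
chunks = chunk_paper(paper, max_words=200)
```

Because every chunk carries the abstract, a retrieved "Conclusions" chunk still arrives with the study design context the model needs to interpret it.
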
- Safety mechanisms:
  - Confidence scoring with a multiple-evidence requirement (at least 3 sources)
  - Conflicting evidence detection
  - Uncertainty flagging
  - Human in the loop (high-stakes decisions require physician review; anomaly detection triggers review)
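
The safety checks above could be combined into a single gate that runs before an answer is shown. The sketch below is only an illustration: the `stance` label on each source, the flag names, and the confidence formula are all assumptions, and how a source gets classified as supporting or contradicting is left out of scope.

```python
def assess_answer(sources, min_sources=3, high_stakes=False):
    """Return safety flags for one generated recommendation.

    `sources` is a list of dicts with a "stance" key ("supports" or
    "contradicts"). This shape is hypothetical.
    """
    supporting = [s for s in sources if s["stance"] == "supports"]
    contradicting = [s for s in sources if s["stance"] == "contradicts"]

    flags = []
    if len(supporting) < min_sources:
        flags.append("insufficient_evidence")   # multiple-evidence requirement
    if supporting and contradicting:
        flags.append("conflicting_evidence")    # sources disagree with each other

    # Illustrative confidence: share of retrieved sources that agree.
    confidence = len(supporting) / len(sources) if sources else 0.0

    return {
        "confidence": confidence,
        "flags": flags,
        # Any flag, or a high-stakes query, routes the answer to a physician.
        "needs_physician_review": high_stakes or bool(flags),
    }


result = assess_answer(
    [{"stance": "supports"}] * 3 + [{"stance": "contradicts"}]
)
```

Here three supporting sources meet the evidence requirement, but the single contradicting source still raises a flag and sends the answer for review.
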
- Evaluation strategy:
  - Offline metrics: expert evaluation by a sizeable panel of physicians, citation accuracy checks, comparison against treatment guidelines
  - Online metrics: physician feedback loop, outcome tracking (where permitted), false positive / false negative rate
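
Two of the metrics above are simple enough to compute directly. The sketch below assumes that physician reviewers supply the ground truth: which cited documents actually support an answer, and whether each recommendation was appropriate (`True`) or not. Both function names and the labeling scheme are hypothetical.

```python
def citation_accuracy(cited_ids, supporting_ids):
    """Fraction of cited documents that reviewers marked as supporting the answer."""
    cited = set(cited_ids)
    if not cited:
        return 0.0
    return len(cited & set(supporting_ids)) / len(cited)


def error_rates(labels, predictions):
    """False positive and false negative rates for binary labels.

    labels: reviewer judgment (True = recommendation is appropriate).
    predictions: what the system did (True = it made the recommendation).
    """
    fp = sum(1 for y, p in zip(labels, predictions) if p and not y)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    negatives = sum(1 for y in labels if not y)
    positives = sum(1 for y in labels if y)
    return {
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "false_negative_rate": fn / positives if positives else 0.0,
    }


accuracy = citation_accuracy(["a", "b", "c", "d"], ["a", "b", "c"])
rates = error_rates([True, True, False, False], [True, False, True, False])
```
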
- Key trade-offs:
  - Accuracy vs latency: prioritize accuracy. Mitigation for latency --> pre-compute answers for common queries, progressive loading, etc.
  - Recency vs evidence quality: weight by evidence level plus recency. Mitigation --> mark results as "Emerging research" vs "Established".
  - Cost vs coverage: start with high-evidence papers (let's say 100k) and expand based on usage. Mitigation for coverage --> track coverage gaps and prioritize indexing accordingly.
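
The recency vs evidence quality trade-off can be made concrete with a blended ranking score plus a label. Everything numeric in this sketch is an assumption chosen for illustration: the 1 to 5 evidence scale, the 0.7 evidence weight, the 5-year half-life for recency, and the level-2 cutoff for "Established".

```python
def rank_score(evidence_level, year, current_year,
               evidence_weight=0.7, half_life_years=5.0):
    """Blend evidence strength with recency into one ranking score in [0, 1].

    evidence_level: 1 (strongest) to 5 (weakest), an assumed scale.
    """
    evidence = (5 - evidence_level) / 4          # 1.0 for level 1, 0.0 for level 5
    age = max(current_year - year, 0)
    recency = 0.5 ** (age / half_life_years)     # halves every `half_life_years`
    return evidence_weight * evidence + (1 - evidence_weight) * recency


def maturity_label(evidence_level, established_max_level=2):
    """Label a source by evidence strength; the cutoff is an assumed threshold."""
    if evidence_level <= established_max_level:
        return "Established"
    return "Emerging research"


new_strong = rank_score(evidence_level=1, year=2024, current_year=2024)
old_weak = rank_score(evidence_level=5, year=2019, current_year=2024)
```

With these weights, a strong but older study can still outrank a weak recent one, while the label tells the physician how much to trust each result.
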
- Success metrics:
  - Clinical metrics: diagnostic accuracy improvement, treatment plan adherence, patient outcome correlation
  - Operational metrics: physician adoption rate, time saved per case, error rate reduction
  - Business metrics: cost per query, ROI vs hiring more specialists
Essentially, these 6 buckets form my mental model for making decisions about a RAG architecture, and the same model can be extended to any domain or industry. There are a lot of concepts and processes under RAG, which my other blogs will cover in depth. I try to keep the learning simple so it is easier to retain and faster to apply in the real world :)


