1. Basic/Naive RAG
Description: Simple retrieve-then-generate approach
Flow: Query → Retrieve relevant documents → Generate response using retrieved context
Components: Single vector database, basic embedding model, LLM for generation
Use Case: Simple Q&A applications, proof of concepts
Limitations: No query optimization, basic relevance matching
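The retrieve-then-generate flow above can be sketched end to end. This is a toy illustration only: the bag-of-words "embedding", the cosine scorer, and the templated generate function are stand-ins for a real embedding model and a hosted LLM call.

```python
# Minimal naive-RAG sketch: retrieve by similarity, then stuff the
# retrieved context into the generation prompt.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in embedding: a bag-of-words term-frequency vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for an LLM call: a real system would send this prompt
    # to a generation model.
    return f"Context: {' | '.join(context)}\nAnswer to: {query}"

docs = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
]
answer = generate("What is the capital of France?",
                  retrieve("What is the capital of France?", docs))
```

Even this stub shows the pattern's limitation: there is no query optimization, so relevance depends entirely on surface overlap between query and documents.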
2. Advanced RAG
Description: Enhanced retrieval with pre- and post-processing
Features: Query rewriting, document re-ranking, result filtering
Components: Query optimizer, multiple retrieval strategies, re-ranking models
Improvements: Better relevance, reduced hallucination, contextual understanding
Use Case: Production applications requiring higher accuracy
3. Modular RAG
Description: Flexible, component-based architecture
Features: Interchangeable modules (retrieval, generation, reasoning)
Components: Pluggable retrievers, generators, evaluators, orchestrators
Benefits: Customizable pipelines, easier testing and optimization
Use Case: Complex applications with varying requirements
4. Agentic RAG
Description: Agent-driven retrieval with dynamic decision-making
Features: Multi-step reasoning, adaptive retrieval strategies, tool usage
Components: Planning agents, retrieval agents, synthesis agents
Capabilities: Complex query decomposition, iterative refinement
Use Case: Research assistance, complex problem-solving
5. Hierarchical RAG
Description: Multi-level retrieval over a document hierarchy
Structure: Document → Section → Paragraph → Sentence levels
Features: Coarse-to-fine retrieval, contextual preservation
Benefits: Better long-document understanding, improved context relevance
Use Case: Technical documentation, legal documents, research papers
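The coarse-to-fine idea can be sketched in a few lines: rank whole documents first, then rank passages only within the winning document, so the passage keeps its parent document as context. The word-overlap scorer and the tiny corpus here are illustrative stand-ins for an embedding model over a real document tree.

```python
# Coarse-to-fine sketch of hierarchical retrieval: document level
# first, then section level inside the best-matching document.
def overlap(query: str, text: str) -> int:
    return len(set(query.lower().split()) & set(text.lower().split()))

corpus = {
    "networking-guide": ["TCP provides reliable streams.",
                         "UDP is connectionless and fast."],
    "cooking-guide":    ["Simmer the sauce gently.",
                         "Rest the dough for an hour."],
}

def hierarchical_retrieve(query: str) -> tuple[str, str]:
    # Coarse step: pick the best document by its pooled section text.
    doc_id = max(corpus, key=lambda d: overlap(query, " ".join(corpus[d])))
    # Fine step: pick the best section inside that document only.
    section = max(corpus[doc_id], key=lambda s: overlap(query, s))
    return doc_id, section

doc, passage = hierarchical_retrieve("is UDP connectionless")
```

The coarse step keeps the fine step cheap: passage scoring runs only over one document's sections rather than the whole corpus.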
6. Fusion/Hybrid RAG
Description: Combines multiple retrieval methods
Approaches: Dense + sparse retrieval, multiple embedding models
Techniques: Reciprocal rank fusion, weighted combination
Benefits: Improved recall and precision, robust retrieval
Use Case: Diverse content types, comprehensive search requirements
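Reciprocal rank fusion, the technique named above (Cormack et al., 2009), is simple enough to show in full: each retriever contributes 1/(k + rank) per document, so items ranked highly by several retrievers rise to the top without any score calibration across retrievers. The two input rankings are made-up examples.

```python
# Reciprocal rank fusion: merge rankings from heterogeneous
# retrievers using ranks only, never raw scores.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d1", "d2", "d3"]   # ranking from a dense retriever
sparse = ["d1", "d3", "d4"]   # ranking from a BM25-style retriever
fused = rrf([dense, sparse])
```

Note how d3, which appears in both rankings, overtakes d2, which appears in only one: that is the recall/robustness benefit in action.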
7. Self-RAG
Description: Self-reflective retrieval with quality assessment
Features: Retrieval necessity prediction, relevance scoring, response verification
Components: Reflection tokens, quality critics, adaptive triggering
Benefits: Reduced unnecessary retrievals, improved factual accuracy
Use Case: High-stakes applications requiring reliability
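The Self-RAG control flow can be sketched as: decide whether retrieval is needed at all, then keep only passages that pass a relevance check. The two "critics" below are heuristic stand-ins; in the actual paper the model itself is trained to emit reflection tokens that make these decisions.

```python
# Self-RAG-style control flow with heuristic stand-in critics.
def needs_retrieval(query: str) -> bool:
    # Stand-in necessity critic: chit-chat skips retrieval,
    # everything else retrieves.
    return query.rstrip("?!").lower() not in {"hello", "thanks"}

def relevance(query: str, passage: str) -> float:
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def self_rag(query: str, passages: list[str],
             threshold: float = 0.3) -> list[str]:
    if not needs_retrieval(query):
        return []                       # skip retrieval entirely
    return [p for p in passages if relevance(query, p) >= threshold]

kept = self_rag("capital of france",
                ["paris is the capital of france", "bananas are yellow"])
```

Skipping retrieval for queries that do not need it is where the "reduced unnecessary retrievals" benefit comes from.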
8. Corrective RAG (CRAG)
Description: Error-correcting retrieval with web-search fallback
Features: Retrieval quality assessment, web search integration, knowledge correction
Flow: Assess retrieval → Correct if needed → Generate with verified knowledge
Benefits: Handles knowledge gaps, improves factual accuracy
Use Case: Dynamic knowledge domains, fact-checking applications
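The assess → correct → generate flow above can be sketched as a three-way gate: keep evidence graded correct, replace evidence graded incorrect with a web search, and augment ambiguous evidence with both. The overlap-based grader and the stubbed web search are stand-ins for CRAG's trained retrieval evaluator and a real search API.

```python
# CRAG-style control flow with stand-in grader and web search.
def grade(query: str, passages: list[str]) -> str:
    hits = sum(1 for p in passages
               if set(query.lower().split()) & set(p.lower().split()))
    if passages and hits == len(passages):
        return "correct"
    return "incorrect" if hits == 0 else "ambiguous"

def web_search(query: str) -> list[str]:
    # Stand-in for a real web-search call.
    return [f"web result for: {query}"]

def crag(query: str, passages: list[str]) -> list[str]:
    verdict = grade(query, passages)
    if verdict == "correct":
        return passages
    if verdict == "incorrect":
        return web_search(query)            # discard and re-source
    return passages + web_search(query)     # ambiguous: augment

evidence = crag("python decorators", ["cats purr", "dogs bark"])
```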
9. Adaptive RAG
Description: Dynamic strategy selection based on query complexity
Features: Query classification, strategy routing, adaptive processing
Strategies: No retrieval, single-step, multi-step, iterative
Benefits: Optimized processing, cost efficiency, improved performance
Use Case: Mixed query types, production optimization
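The routing idea reduces to a classifier plus a dispatch table. The keyword classifier below is a deliberately crude stand-in for the trained complexity classifier an adaptive system would use; the point is the shape of the routing, not the heuristics.

```python
# Adaptive-RAG routing sketch: classify, then dispatch.
def classify(query: str) -> str:
    # Stand-in classifier: greetings need no retrieval, comparative
    # or compound questions get multi-step handling.
    q = query.lower()
    if q.startswith(("hi ", "hello", "thanks")):
        return "no_retrieval"
    if " and " in q or "compare" in q:
        return "multi_step"
    return "single_step"

def route(query: str) -> str:
    strategy = classify(query)
    if strategy == "no_retrieval":
        return "answer directly from the model"
    if strategy == "single_step":
        return "retrieve once, then generate"
    return "decompose, retrieve per sub-query, then synthesize"
```

Routing cheap queries away from retrieval is where the cost-efficiency benefit comes from: only the queries that need multi-step processing pay for it.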
10. GraphRAG
Description: Graph-based knowledge representation and retrieval
Features: Entity relationships, graph traversal, community detection
Components: Knowledge graphs, graph databases, relationship reasoning
Benefits: Complex relationship understanding, multi-hop reasoning
Use Case: Knowledge-intensive domains, relationship-heavy queries
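Multi-hop reasoning over a graph can be illustrated with a breadth-first walk: collect every entity within n hops of the query entity and hand that neighborhood to the generator as context. The hand-built adjacency map stands in for a real graph database such as Amazon Neptune or Neo4j.

```python
# Multi-hop neighborhood sketch over a toy knowledge graph.
from collections import deque

graph = {
    "Marie Curie": ["Pierre Curie", "radium"],
    "Pierre Curie": ["Sorbonne"],
    "radium": ["polonium"],
}

def neighborhood(start: str, hops: int) -> set[str]:
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue                     # don't expand past the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

context = neighborhood("Marie Curie", hops=2)
```

A plain vector retriever would have to find "Sorbonne" by text similarity alone; the graph reaches it through the explicit Marie Curie → Pierre Curie → Sorbonne relationship path.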
11. Multi-Modal RAG
Description: Retrieval across text, images, audio, and video
Features: Cross-modal embeddings, multi-modal fusion, diverse content types
Components: Vision encoders, audio processors, multi-modal LLMs
Use Case: Rich media applications, comprehensive content search
12. Conversational RAG
Description: Context-aware retrieval for multi-turn conversations
Features: Conversation history integration, context tracking, query refinement
Components: Memory management, context windows, dialogue state tracking
Use Case: Chatbots, virtual assistants, interactive applications
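Query refinement against conversation history is the step that makes follow-up turns retrievable: "how do I install it" matches nothing until "it" is resolved. The naive capitalized-entity resolver below is a stand-in; production systems use a dedicated query-rewriting model.

```python
# History-aware query rewriting sketch: resolve a follow-up question
# into a standalone query before retrieval.
def rewrite(history: list[str], query: str) -> str:
    # Naive coreference: replace "it"/"that" with the most recent
    # capitalized entity mentioned in the conversation.
    entities = [w.strip("?.,") for turn in history
                for w in turn.split() if w[0].isupper()]
    if not entities:
        return query
    topic = entities[-1]
    return " ".join(topic if w.lower() in {"it", "that"} else w
                    for w in query.split())

history = ["Tell me about Kubernetes"]
standalone = rewrite(history, "how do I install it")
```

The rewritten query is then sent through an ordinary retrieval pipeline, which is why conversational RAG layers on top of the other patterns rather than replacing them.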
13. Temporal RAG
Description: Time-aware retrieval considering temporal relevance
Features: Temporal embeddings, time-based filtering, recency weighting
Components: Time-aware indexing, temporal ranking, freshness scoring
Use Case: News applications, time-sensitive information, evolving knowledge
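Recency weighting is one concrete way to implement freshness scoring: multiply the relevance score by an exponential decay so newer documents outrank equally relevant stale ones. The 30-day half-life is an assumed tuning parameter, not a standard value.

```python
# Recency-weighting sketch: relevance discounted by document age.
def temporal_score(relevance: float, age_days: float,
                   half_life_days: float = 30.0) -> float:
    # Freshness halves every half_life_days.
    freshness = 0.5 ** (age_days / half_life_days)
    return relevance * freshness

docs = [
    {"id": "old-news",   "relevance": 0.9, "age_days": 90},
    {"id": "fresh-news", "relevance": 0.8, "age_days": 1},
]
ranked = sorted(docs,
                key=lambda d: temporal_score(d["relevance"], d["age_days"]),
                reverse=True)
```

With these numbers the slightly less relevant but day-old document outranks the three-month-old one; shortening the half-life makes the system favor recency more aggressively.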
14. Federated RAG
Description: Retrieval across multiple distributed knowledge sources
Features: Cross-source querying, source-specific optimization, result aggregation
Components: Multiple vector stores, federation layer, source routing
Use Case: Multi-tenant systems, organizational silos, diverse data sources
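The federation layer reduces to fan-out, source tagging, and merge. The in-memory stub stores (with precomputed scores and substring matching) stand in for per-tenant or per-silo vector stores, each of which would run its own search behind the federation layer.

```python
# Federated retrieval sketch: query several sources, tag hits with
# their origin, merge by score.
def search_source(name: str, store: dict[str, float],
                  query: str) -> list[dict]:
    # Stub source: substring match against precomputed scores.
    return [{"source": name, "doc": d, "score": s}
            for d, s in store.items() if query in d]

sources = {
    "hr-wiki":  {"vacation policy": 0.9},
    "eng-docs": {"deployment runbook": 0.7, "vacation calendar": 0.4},
}

def federated_search(query: str) -> list[dict]:
    hits = [h for name, store in sources.items()
            for h in search_source(name, store, query)]
    return sorted(hits, key=lambda h: h["score"], reverse=True)

results = federated_search("vacation")
```

Tagging each hit with its source matters in multi-tenant settings: downstream generation can cite, filter, or apply per-source access controls on the merged results.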
15. Streaming RAG
Description: Real-time retrieval and generation for continuous data
Features: Incremental updates, real-time indexing, streaming responses
Components: Stream processing, dynamic indexing, real-time APIs
Use Case: Live data feeds, real-time monitoring, continuous learning
Pattern Selection Guide
Query Complexity: Simple → Basic RAG; Complex → Agentic RAG
Accuracy Requirements: High → Self-RAG, CRAG; Standard → Advanced RAG
Data Types: Text → Basic RAG; Mixed → Multi-Modal RAG
Scale: Small → Naive RAG; Enterprise → Modular/Federated RAG
Real-time Needs: Static → Basic RAG; Dynamic → Streaming/Adaptive RAG
Relationship Importance: Entity-heavy → GraphRAG; Document-focused → Hierarchical RAG
Each pattern addresses specific use cases and requirements, and modern implementations often combine multiple patterns for optimal performance.
Key Academic References
1. Basic/Foundational RAG
Primary Paper:
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). "Retrieval-augmented generation for knowledge-intensive NLP tasks." Advances in Neural Information Processing Systems, 33, 9459-9474.
Venue: NeurIPS 2020
Citations: 3,000+ (highly influential)
Follow-up Work:
Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., ... & Yih, W. T. (2020). "Dense passage retrieval for open-domain question answering." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).
2. Self-RAG
Primary Paper:
Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." International Conference on Learning Representations (ICLR), 2024.
Venue: ICLR 2024 (accepted)
arXiv: arXiv:2310.11511
3. Corrective RAG (CRAG)
Primary Paper:
Yan, S., Gu, J. C., Zhu, Y., & Ling, Z. H. (2024). "Corrective Retrieval Augmented Generation." International Conference on Learning Representations (ICLR), 2024.
Venue: ICLR 2024 (accepted)
arXiv: arXiv:2401.15884
4. GraphRAG (Academic Foundations)
Core Concept Papers:
Yasunaga, M., Leskovec, J., & Liang, P. (2021). "QA-GNN: Reasoning with language models and knowledge graphs for question answering." NAACL-HLT, 2021.
Venue: NAACL 2021
Zhang, Y., Li, X., Cui, L., Wu, B., Gu, Y., & Dublish, N. (2023). "Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models." arXiv preprint arXiv:2309.01219.
Microsoft's Implementation (Technical Report):
Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., ... & Larson, J. (2024). "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." arXiv preprint arXiv:2404.16130.
5. Fusion/Hybrid RAG
Rank Fusion Foundations:
Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). "Reciprocal rank fusion outperforms condorcet and individual rank learning methods." Proceedings of the 32nd international ACM SIGIR conference.
Venue: SIGIR 2009
Dense-Sparse Hybrid:
Gao, L., Ma, X., Lin, J., & Callan, J. (2023). "Precise Zero-Shot Dense Retrieval without Relevance Labels." Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics.
Venue: ACL 2023
Lin, S. C., Yang, J. H., & Lin, J. (2021). "Distilling dense representations for ranking using tightly-coupled teachers." arXiv preprint arXiv:2010.11386.
6. Multi-Modal RAG (Emerging Academic Work)
Foundation Papers:
Chen, J., Lin, H., Han, X., & Sun, L. (2023). "M-BEIR: A Multi-domain Benchmark for Multi-modal Information Retrieval." arXiv preprint arXiv:2308.14565.
Li, J., Li, D., Xiong, C., & Hoi, S. (2022). "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation." ICML, 2022.
Venue: ICML 2022
7. Conversational RAG
Academic Foundations:
Qu, C., Yang, L., Qiu, M., Croft, W. B., Zhang, Y., & Iyyer, M. (2019). "BERT with history answer embedding for conversational question answering." Proceedings of the 42nd International ACM SIGIR Conference.
Venue: SIGIR 2019
Anantha, R., Vakulenko, S., Tu, Z., Longpre, S., Pulman, S., & Chappidi, S. (2021). "Open-domain question answering goes conversational via question rewriting." NAACL-HLT, 2021.
Venue: NAACL 2021
Key Publication Venues:
NLP: ACL, EMNLP, NAACL, EACL
ML: ICLR, NeurIPS, ICML
IR: SIGIR, ECIR, CIKM, WSDM
AI: AAAI, IJCAI
These papers represent the academic foundation. Many "industry patterns" (like Modular RAG, Agentic RAG) are implementation frameworks from companies like:
LangChain/LangSmith documentation
LlamaIndex research papers (often arXiv, not peer-reviewed)
OpenAI, Anthropic technical reports
AWS, Google Cloud, Microsoft technical documentation
Agentic RAG is essentially a combination/application of:
Basic RAG (Lewis et al., 2020) - for retrieval mechanism
ReAct (Yao et al., 2022) - for reasoning and action planning
Tool Learning (various papers) - for dynamic tool selection
Multi-step reasoning - from planning literature
Academic Status:
The components are academically established
The specific combination as "Agentic RAG" is primarily industry terminology
Industry Sources:
LangChain/LangGraph documentation
LlamaIndex agent frameworks
OpenAI Assistants API documentation
Anthropic Claude function calling
AWS Bedrock Agents
AWS Implementation Support
1. Basic/Naive RAG
AWS Services: Bedrock Knowledge Bases, OpenSearch, Kendra
Implementation: Direct integration with foundation models
Status: Production-ready, well-documented
2. Advanced RAG
AWS Services: Bedrock Knowledge Bases with chunking strategies, Kendra intelligent ranking
Features: Query preprocessing, result re-ranking, metadata filtering
Status: Supported through configuration options
3. Agentic RAG
AWS Services: Bedrock Agents (dedicated service)
Capabilities: Function calling, tool orchestration, multi-step reasoning
Integration: Lambda functions, API Gateway, other AWS services
Status: Native support, actively developed
4. Multi-Modal RAG
AWS Services: Bedrock multimodal foundation models (e.g., Claude 3), Rekognition, Textract
Support: Text + image retrieval and generation
Status: Supported through multimodal foundation models
5. Conversational RAG
AWS Services: Bedrock with conversation memory, DynamoDB for session storage
Features: Context preservation, multi-turn conversations
Status: Supported through application design patterns
6. Self-RAG
Implementation: Custom logic using Bedrock APIs + Lambda
Components: Retrieval quality assessment, response verification
Status: Possible but requires significant custom development
7. Corrective RAG (CRAG)
Implementation: Bedrock + custom orchestration + web search APIs
Components: Quality assessment, fallback to web search
Status: Achievable through multi-service architecture
8. GraphRAG
AWS Services: Neptune (graph database) + Bedrock
Implementation: Custom integration between graph queries and LLM generation
Status: Technically possible, limited native support
9. Fusion/Hybrid RAG
AWS Services: Kendra (keyword) + OpenSearch (semantic) + custom ranking
Implementation: Multi-retriever setup with result fusion
Status: Achievable through architecture design
10. Hierarchical RAG
Implementation: Custom chunking strategies in Bedrock Knowledge Bases
Features: Document structure preservation, multi-level retrieval
Status: Limited native support, mostly custom implementation
11. Modular RAG
Status: Framework concept, not a specific AWS service feature
Implementation: Achievable through microservices architecture
12. Adaptive RAG
Implementation: Custom routing logic + multiple Bedrock configurations
Status: Requires significant custom orchestration
13. Federated RAG
Implementation: Cross-account/cross-region custom setup
Status: Architecturally possible but complex
14. Temporal RAG
Implementation: Custom time-aware indexing + metadata filtering
Status: Limited native temporal awareness
15. Streaming RAG
AWS Services: Kinesis + Lambda + Bedrock for real-time processing
Status: Possible through event-driven architecture
Bedrock Knowledge Bases:
Vector storage (OpenSearch, Pinecone, Redis)
Automatic chunking and embedding
Metadata filtering and hybrid search
Integration with S3, SharePoint, Confluence, Salesforce
Bedrock Agents:
Function Calling: Lambda integration
Action Groups: Custom tool definitions
Memory: Conversation persistence
Orchestration: Multi-step reasoning and planning
Guardrails: Safety and content filtering
Available Foundation Models:
Anthropic Claude: Text and multimodal
Amazon Titan: Text embeddings and generation
Cohere: Text generation and embeddings
Meta Llama: Text generation
Stability AI: Image generation
Recommended Service Combinations:
Basic RAG: Bedrock Knowledge Bases + Foundation Models
Agentic RAG: Bedrock Agents + Lambda functions
Conversational RAG: Bedrock + DynamoDB + API Gateway
Hybrid RAG: Kendra + OpenSearch + custom fusion logic
Multi-Modal RAG: Bedrock multimodal models + S3 + Rekognition
Self-RAG: Custom orchestration with quality assessment
Research Status Summary
1. Basic/Naive RAG
Paper: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al., 2020, NeurIPS)
Status: Foundational paper, highly cited (3,000+ citations)
2. Self-RAG
Paper: "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" (Asai et al., 2023, ICLR 2024)
Status: Peer-reviewed, significant impact
3. Corrective RAG (CRAG)
Paper: "Corrective Retrieval Augmented Generation" (Yan et al., 2024, ICLR 2024)
Status: Recently accepted at top-tier venue
4. GraphRAG
Papers: Multiple works including "Graph-RAG" papers and Microsoft's GraphRAG implementation
Status: Mixed - concept is established, specific implementations vary
5. Fusion/Hybrid RAG
Papers: "Precise Zero-Shot Dense Retrieval without Relevance Labels" (Gao et al., 2023) and related fusion techniques
Status: Rank fusion methods well-established in IR literature
6. Modular RAG - More of an engineering pattern from LangChain, LlamaIndex
7. Agentic RAG - Emerging from agent frameworks, not formalized academically yet
8. Hierarchical RAG - Implementation pattern, limited formal research
9. Adaptive RAG - Engineering optimization, some academic work emerging
10. Multi-Modal RAG - Active research area but fragmented approaches
Others - Mostly industry terminology or implementation patterns
Key Venues for RAG Research:
ACL, EMNLP, NAACL - Main NLP venues
ICLR, NeurIPS, ICML - ML conferences with RAG research
SIGIR, ECIR - Information retrieval conferences
arXiv - Preprints (not peer-reviewed but influential)
Know your model's capabilities and limitations
It is important to consider the selected model's known capabilities and limitations when writing prompts. In this lab, we are using the Claude 3 models, which come with the following guidelines for effective prompting:
Claude 3 Models:
Input Format: Images must be provided in base64-encoded form.
Image Size: An individual image cannot exceed 5 MB.
Multiple Images: Claude 3 models support prompting with up to 5 images.
Image Format: Supported image formats are PNG, JPEG, WebP, and GIF.
Image Clarity: Clear, sharp images are more effective with Claude 3 models than blurry ones.
Image Placement: Claude 3 models work best when images are placed before text in the prompt. However, if the use case requires it, images can follow text or be interleaved with text.
Image Resolution: If an image's long edge exceeds 1,568 pixels, or the image would consume more than ~1,600 tokens, it is first scaled down (preserving aspect ratio) until it fits within the limits. Oversized input images therefore increase time-to-first-token latency without improving model performance. Very small images, under 200 pixels on any edge, may degrade performance.
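Under the constraints above, preparing an image for a Claude 3 request can be sketched as follows: base64-encode the bytes and reject anything over the 5 MB per-image limit. The dict layout follows the Anthropic Messages API image-content format; the media type and the tiny stand-in bytes are illustrative.

```python
# Sketch: build a base64 image content block for a Claude 3 prompt,
# enforcing the 5 MB per-image limit described above.
import base64

MAX_IMAGE_BYTES = 5 * 1024 * 1024   # 5 MB per-image limit

def encode_image(data: bytes, media_type: str = "image/png") -> dict:
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 5 MB per-image limit")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }

block = encode_image(b"\x89PNG tiny stand-in bytes")
```

Per the placement guideline, such a block would normally be put before the text block in the message's content list.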