Basic RAG
Description: Simple retrieval-then-generate approach
Flow: Query → Retrieve relevant documents → Generate response using retrieved context
Components: Single vector database, basic embedding model, LLM for generation
Use Case: Simple Q&A applications, proof of concepts
Limitations: No query optimization, basic relevance matching
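As a rough illustration of this flow, here is a toy sketch (not tied to any particular vector store): retrieval is naive keyword overlap over an in-memory list, and the generation step only assembles the augmented prompt where a real system would call an embedding model, a vector database, and an LLM.

# Toy sketch of the retrieve-then-generate flow; retrieval is naive keyword
# overlap and generate() only assembles the prompt an LLM would receive.
documents = [
    "Employees are entitled to 25 days of annual leave under the general policy.",
    "Case 001: respondent 669 filed a motion to dismiss on procedural grounds.",
    "Public guideline: employment contracts must state the notice period.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by keyword overlap with the query (stand-in for vector search).
    q_terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    # Assemble the augmented prompt; this is where the LLM call would go.
    return f"Answer '{query}' using only:\n" + "\n".join(f"- {c}" for c in context)

question = "How many days of annual leave do employees get?"
print(generate(question, retrieve(question, documents)))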
Advanced RAG
Description: Enhanced retrieval with pre- and post-processing
Features: Query rewriting, document re-ranking, result filtering
Components: Query optimizer, multiple retrieval strategies, re-ranking models
Improvements: Better relevance, reduced hallucination, contextual understanding
Use Case: Production applications requiring higher accuracy
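To make the re-ranking step concrete, here is a hedged sketch that re-scores first-stage candidates with a cross-encoder. It assumes the sentence-transformers package is installed; the model name and candidate passages are illustrative placeholders, and in practice the candidates would come from the retriever.

# Sketch of post-retrieval re-ranking with a cross-encoder (sentence-transformers).
from sentence_transformers import CrossEncoder

query = "What is the notice period for terminating an employment contract?"
candidates = [  # normally returned by the first-stage retriever
    "Employment contracts must state the notice period for termination.",
    "Case 001 concerns a dispute over unpaid invoices.",
    "The general policy grants 25 days of annual leave.",
]

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model
scores = reranker.predict([(query, passage) for passage in candidates])

# Sort candidates by relevance score and keep the best ones for generation.
reranked = [p for _, p in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])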
1. Basic/Foundational RAG
Primary Paper:
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). "Retrieval-augmented generation for knowledge-intensive NLP tasks." Advances in Neural Information Processing Systems, 33, 9459-9474.
Venue: NeurIPS 2020
Citations: 3,000+ (highly influential)
Follow-up Work:
Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., ... & Yih, W. T. (2020). "Dense passage retrieval for open-domain question answering." EMNLP 2020 (arXiv:2004.04906).
2. Self-RAG
Primary Paper:
Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." International Conference on Learning Representations (ICLR), 2024.
Venue: ICLR 2024 (accepted)
arXiv: arXiv:2310.11511
3. Corrective RAG (CRAG)
Primary Paper:
Yan, S., Gu, J. C., Zhu, Y., & Ling, Z. H. (2024). "Corrective Retrieval Augmented Generation." International Conference on Learning Representations (ICLR), 2024.
Venue: ICLR 2024 (accepted)
arXiv: arXiv:2401.15884
4. GraphRAG (Academic Foundations)
Core Concept Papers:
Yasunaga, M., Ren, H., Bosselut, A., Liang, P., & Leskovec, J. (2021). "QA-GNN: Reasoning with language models and knowledge graphs for question answering." NAACL-HLT, 2021.
Venue: NAACL 2021
Zhang, Y., et al. (2023). "Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models." arXiv preprint arXiv:2309.01219.
Microsoft's Implementation (Technical Report):
Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., ... & Larson, J. (2024). "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." arXiv preprint arXiv:2404.16130.
5. Fusion/Hybrid RAG
Rank Fusion Foundations:
Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). "Reciprocal rank fusion outperforms condorcet and individual rank learning methods." Proceedings of the 32nd international ACM SIGIR conference.
Venue: SIGIR 2009
Dense-Sparse Hybrid:
Gao, L., Ma, X., Lin, J., & Callan, J. (2023). "Precise Zero-Shot Dense Retrieval without Relevance Labels." Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics.
Venue: ACL 2023
Lin, S. C., Yang, J. H., & Lin, J. (2021). "Distilling dense representations for ranking using tightly-coupled teachers." arXiv preprint arXiv:2010.11386.
6. Multi-Modal RAG (Emerging Academic Work)
Foundation Papers:
Chen, J., Lin, H., Han, X., & Sun, L. (2023). "M-BEIR: A Multi-domain Benchmark for Multi-modal Information Retrieval." arXiv preprint arXiv:2308.14565.
Li, J., Li, D., Xiong, C., & Hoi, S. (2022). "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation." ICML, 2022.
Venue: ICML 2022
7. Conversational RAG
Academic Foundations:
Qu, C., Yang, L., Qiu, M., Croft, W. B., Zhang, Y., & Iyyer, M. (2019). "BERT with history answer embedding for conversational question answering." Proceedings of the 42nd International ACM SIGIR Conference.
Venue: SIGIR 2019
Anantha, R., Vakulenko, S., Tu, Z., Longpre, S., Pulman, S., & Chappidi, S. (2021). "Open-domain question answering goes conversational via question rewriting." NAACL-HLT, 2021.
Venue: NAACL 2021
Key peer-reviewed venues for RAG research:
NLP: ACL, EMNLP, NAACL, EACL
ML: ICLR, NeurIPS, ICML
IR: SIGIR, ECIR, CIKM, WSDM
AI: AAAI, IJCAI
These papers represent the academic foundation. Many "industry patterns" (like Modular RAG, Agentic RAG) are implementation frameworks from companies like:
LangChain/LangSmith documentation
LlamaIndex research papers (often arXiv, not peer-reviewed)
OpenAI, Anthropic technical reports
AWS, Google Cloud, Microsoft technical documentation
Agentic RAG is essentially a combination/application of:
Basic RAG (Lewis et al., 2020) - for retrieval mechanism
ReAct (Yao et al., 2022) - for reasoning and action planning
Tool Learning (various papers) - for dynamic tool selection
Multi-step reasoning - from planning literature
Academic Status:
The components are academically established
The specific combination as "Agentic RAG" is primarily industry terminology
Industry Sources:
LangChain/LangGraph documentation
LlamaIndex agent frameworks
OpenAI Assistants API documentation
Anthropic Claude function calling
AWS Bedrock Agents
Use case: Suppose we have some 10,000 legal documents. Some are case specific, some are general policies, some are topic specific but public, and some, such as documents about a company's internal cases (ongoing matters or judgements), are case specific and sensitive. If we have to index these 10k docs in a vector store, we need to do some pre-processing before we ingest these files into the vector store.
i.e. if the case file is named case_001.pdf, the metadata file should be created in the same location, named case_001.pdf.metadata.json, with:
{
  "metadataAttributes": {
    "respondent_id": 669,
    "case_id": 1
  }
}
We may need to classify groups of files and assign the right metadata, e.g. for general policies:
{
  "metadataAttributes": {
    "document_type": "policy",
    "sensitivity": "public",
    "access_level": "public",
    "topic": "employment_law"
  }
}
Confidential files would have:
{
  "metadataAttributes": {
    "case_id": "001",
    "document_type": "case_file",
    "sensitivity": "confidential",
    "access_level": "restricted",
    "respondent_id": 669,
    "case_status": "ongoing"
  }
}
Step 1: Prepare and create the metadata files
This means we have to run some kind of pipeline to understand these files and generate the metadata for our use case, which could require services such as entity extraction (Claude, Textract, or Comprehend).
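A minimal sketch of such a pipeline is below. It assumes the documents already sit in S3; the bucket name is a placeholder, and classify() stands in for the entity-extraction/classification step (Claude, Textract, or Comprehend in practice).

# Sketch: write a <filename>.metadata.json next to each document in S3 so the
# vector store (e.g. Bedrock Knowledge Bases) picks it up at ingestion time.
import json
import boto3

BUCKET = "legal-docs-bucket"  # placeholder bucket name
s3 = boto3.client("s3")

def classify(key: str) -> dict:
    # Placeholder for the entity-extraction / classification step.
    if key.startswith("cases/"):
        return {"document_type": "case_file", "sensitivity": "confidential",
                "access_level": "restricted", "case_id": "001", "respondent_id": 669}
    return {"document_type": "policy", "sensitivity": "public",
            "access_level": "public", "topic": "employment_law"}

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if not key.endswith(".pdf"):
            continue
        metadata = {"metadataAttributes": classify(key)}
        s3.put_object(Bucket=BUCKET,
                      Key=f"{key}.metadata.json",
                      Body=json.dumps(metadata).encode("utf-8"))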
Step 2: Ingest into the vector store
After we add these metadata files, we index, i.e. ingest, the documents into the vector data store.
Key Point: There's no "update metadata" API for documents that are already indexed. You must re-ingest, with the metadata files present in S3 at ingestion time.
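As a sketch, triggering that (re-)ingestion with the boto3 bedrock-agent client could look like this; the knowledge base and data source IDs are placeholders.

# Sketch: start an ingestion (sync) job after documents and .metadata.json
# files are in S3, then poll until it finishes. IDs are placeholders.
import time
import boto3

bedrock_agent = boto3.client("bedrock-agent")

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    dataSourceId="DS_ID_PLACEHOLDER",
)["ingestionJob"]

while job["status"] not in ("COMPLETE", "FAILED"):
    time.sleep(30)
    job = bedrock_agent.get_ingestion_job(
        knowledgeBaseId="KB_ID_PLACEHOLDER",
        dataSourceId="DS_ID_PLACEHOLDER",
        ingestionJobId=job["ingestionJobId"],
    )["ingestionJob"]
print("Ingestion finished with status:", job["status"])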
Step 3: At query time we send a metadata filter along with the query, for example:
# User with case access
filter = {"equals": {"key": "case_id", "value": "001"}}
# Public documents only
filter = {"equals": {"key": "access_level", "value": "public"}}
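For context, a retrieval call that applies such a filter via the bedrock-agent-runtime Retrieve API might look like the sketch below (the knowledge base ID is a placeholder):

# Sketch: pass the metadata filter with the query at retrieval time.
import boto3

runtime = boto3.client("bedrock-agent-runtime")

response = runtime.retrieve(
    knowledgeBaseId="KB_ID_PLACEHOLDER",  # placeholder
    retrievalQuery={"text": "What is the current status of the motion to dismiss?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": {"equals": {"key": "case_id", "value": "001"}},
        }
    },
)
for result in response["retrievalResults"]:
    print(round(result["score"], 3), result["content"]["text"][:80])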
One point to take home: this feature is meant to optimize retrieval at scale, but it could very easily be used, or rather misused, as an access control mechanism. I would like to explain why that may not be a good idea.
Technically this would work, but consider a scenario: we have a huge dataset of thousands of files, and we ran a pipeline to understand their content and create metadata files for access control. Our data store layer will then withhold documents (by filtering them out). If we hit an edge case where we need a temporary access grant for a specific scenario, we won't be able to do this without the huge cost of re-indexing just for that temporary access!
Relying solely on data-layer filters for primary access control is a security anti-pattern and a hallmark of poor system design: it creates a single point of failure and violates the principle of layered security.
Scalability and Complexity Issues: Implementing complex, fine-grained access control entirely within the database or data layer often leads to a maintenance nightmare as the application grows. It is difficult to manage user permissions and roles in a scalable way within the database itself.
Lack of Context: Data-layer filters often lack the rich contextual information (e.g., time of day, user's device posture, session risk, or business process state) available at higher application layers, which is necessary for effective, adaptive access control.
Metadata filters are NOT ideal for primary access control because:
Role changes require re-indexing - Expensive and slow for large datasets
Access policies are dynamic - User permissions change frequently
Metadata is static - Baked into vectors at ingestion time
Core Principle: Vector Stores = Immutable Content Layer
What belongs IN vector store metadata:
Intrinsic document properties (won't change)
document_type: "legal_brief"
jurisdiction: "EU"
topic: "employment_law"
publication_date: "2024-01-15"
case_id: "001" (the case itself doesn't change)
What belongs OUTSIDE vector store:
Transactional/mutable access data
User roles and permissions
Case assignments (who can access case_001)
Recommended Approach
Enforce Access Controls at the Application Level: Implement a dedicated authorization service or layer in your application logic that handles access control checks. This layer should enforce policy based on the user's identity, role, and the requested action.
Principle of Least Privilege: Ensure that users, processes, and systems only have the minimum necessary access to perform their required tasks.
Use metadata for:
Document categorization (public/internal - broad categories)
Topic/domain filtering
Performance optimization (reduce search scope)
Optimal Architecture:
Query Flow:
User Query
↓
[Access Control Layer] ← DynamoDB/RDS (mutable permissions)
↓ (determines allowed case_ids, topics, etc.)
[Vector Store Query] ← Static metadata filters
↓
[Results Post-Filter] ← Additional security check
↓
Return to User
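A sketch of that flow in application code is below. It assumes a DynamoDB table (here called user_case_access) holding the mutable user-to-case permissions; the table name, key schema, and knowledge base ID are placeholders.

# Sketch of the query flow above: mutable permissions live in DynamoDB, the
# vector store only sees a derived filter on static metadata, and results are
# post-checked before being returned. Names/IDs are placeholders.
import boto3

dynamodb = boto3.resource("dynamodb")
access_table = dynamodb.Table("user_case_access")   # placeholder table
kb_runtime = boto3.client("bedrock-agent-runtime")

def allowed_case_ids(user_id: str) -> list[str]:
    # Access Control Layer: permissions come from a mutable store, not the index.
    item = access_table.get_item(Key={"user_id": user_id}).get("Item", {})
    return list(item.get("case_ids", []))

def secure_retrieve(user_id: str, query: str) -> list[dict]:
    cases = allowed_case_ids(user_id)
    if not cases:
        return []
    # Vector Store Query: filter only on intrinsic, static metadata.
    response = kb_runtime.retrieve(
        knowledgeBaseId="KB_ID_PLACEHOLDER",
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": 10,
                "filter": {"in": {"key": "case_id", "value": cases}},
            }
        },
    )
    # Results Post-Filter: defence in depth before returning to the user.
    return [r for r in response["retrievalResults"]
            if r.get("metadata", {}).get("case_id") in cases]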
Benefits:
Cost: No re-indexing for permission changes
Performance: Vector store does what it's good at (semantic search)
Flexibility: Update access rules in milliseconds, not hours
Scalability: Independent scaling of access control vs search
Audit: Separate permission tracking from content
Multiple Knowledge Bases: When to Split
Split into separate KBs when:
Different update frequencies
Public policies (update quarterly) ≠ Active case files (daily updates)
Static legal precedents ≠ Ongoing litigation docs
Different access patterns
Public documents (all users) vs Confidential cases (restricted users)
Saves cost - don't search confidential KB for public queries
Different retrieval characteristics
Legal precedents: need high recall, search entire corpus
Case-specific: need precision, smaller focused search
Operational independence
HR legal docs managed by HR team
Corporate litigation managed by legal team
Independent update cycles, ownership, SLAs
Legal Use Case - Recommended Split:
KB 1: Public Legal Resources
- General policies
- Public case law
- Legal guidelines
- Update: Monthly
- Access: All users
KB 2: Internal Policies & Procedures
- Company policies
- Internal guidelines
- Update: Quarterly
- Access: All employees
KB 3: Active Case Files
- Ongoing litigation
- Sensitive case materials
- Update: Daily/Weekly
- Access: Case-specific RBAC
KB 4: Closed Case Archive
- Historical cases
- Settled matters
- Update: Rarely (append-only)
- Access: Legal team + authorized
Benefits of This Approach:
Cost Optimization:
Re-index only KB 3 (active cases) frequently
KB 4 (archive) almost never re-indexed
KB 1, 2 on slower schedules
Performance:
Route queries to appropriate KB based on context
Smaller search space = faster, cheaper queries
Public queries never hit expensive confidential KBs
Maintenance:
Independent teams manage their KBs
Failures isolated to one KB
Easier troubleshooting
Access Control:
Simpler RBAC per KB
IAM policies per knowledge base
Clear security boundaries
Single KB Approach - When It Makes Sense:
All documents update at similar frequency
Similar access patterns across all docs
Small dataset (<10K docs)
Unified search experience required
Metadata filtering is sufficient
Hybrid Pattern (Often Best):
3-4 KBs + Smart Routing:
Most queries hit 1 KB (fast, cheap)
Complex queries can search multiple KBs in parallel
Application layer orchestrates multi-KB search when needed
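A rough sketch of that smart routing, assuming one knowledge base per split described above (the IDs, the routing rule, and the legal-team flag are all placeholders):

# Sketch: route a query to the relevant knowledge bases, fan out in parallel
# only when needed, and merge results by score. IDs are placeholders.
from concurrent.futures import ThreadPoolExecutor
import boto3

runtime = boto3.client("bedrock-agent-runtime")

KB_IDS = {                       # placeholder IDs for the 4-KB split above
    "public": "KB_PUBLIC_ID",
    "internal": "KB_INTERNAL_ID",
    "active_cases": "KB_ACTIVE_ID",
    "archive": "KB_ARCHIVE_ID",
}

def route(query: str, user_is_legal_team: bool) -> list[str]:
    # Very rough routing rule; a classifier or LLM router could replace this.
    targets = ["public", "internal"]
    if user_is_legal_team:
        targets += ["active_cases", "archive"]
    return targets

def search_kb(kb_key: str, query: str) -> list[dict]:
    response = runtime.retrieve(
        knowledgeBaseId=KB_IDS[kb_key],
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
    )
    return response["retrievalResults"]

def multi_kb_search(query: str, user_is_legal_team: bool) -> list[dict]:
    targets = route(query, user_is_legal_team)
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        result_lists = pool.map(lambda k: search_kb(k, query), targets)
    merged = [r for results in result_lists for r in results]
    return sorted(merged, key=lambda r: r.get("score", 0), reverse=True)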
For our hypothetical use case of 10,000 legal docs, I would recommend 3-4 separate KBs based on:
Update frequency (active vs archived)
Sensitivity (public vs confidential)
Access patterns (broad vs case-specific)
This gives you operational flexibility without over-fragmenting. You can always merge later if needed, but splitting an existing KB is harder.
Bottom line: Multiple KBs aligned with your data lifecycle and access patterns = lower cost, better performance, easier maintenance.
Basic RAG: Bedrock Knowledge Bases + Foundation Models (a minimal sketch follows this list)
Agentic RAG: Bedrock Agents + Lambda functions
Conversational RAG: Bedrock + DynamoDB + API Gateway
Hybrid RAG: Kendra + OpenSearch + custom fusion logic
Multi-Modal RAG: Bedrock multimodal models + S3 + Rekognition
Self-RAG: Custom orchestration with quality assessment
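For the first mapping above (Basic RAG), a single managed call covers the retrieve-then-generate loop; a minimal sketch, with the knowledge base ID and model ARN as placeholders:

# Sketch: Basic RAG on Bedrock Knowledge Bases via the RetrieveAndGenerate API.
import boto3

runtime = boto3.client("bedrock-agent-runtime")

response = runtime.retrieve_and_generate(
    input={"text": "Summarise our policy on employee notice periods."},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID_PLACEHOLDER",   # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])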
1. Basic/Naive RAG
Paper: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al., 2020, NeurIPS)
Status: Foundational paper, highly cited (~3000+ citations)
2. Self-RAG
Paper: "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" (Asai et al., 2023, ICLR 2024)
Status: Peer-reviewed, significant impact
3. Corrective RAG (CRAG)
Paper: "Corrective Retrieval Augmented Generation" (Yan et al., 2024, ICLR 2024)
Status: Recently accepted at top-tier venue
4. GraphRAG
Papers: Multiple works including "Graph-RAG" papers and Microsoft's GraphRAG implementation
Status: Mixed - concept is established, specific implementations vary
5. Fusion/Hybrid RAG
Papers: "Precise Zero-Shot Dense Retrieval without Relevance Labels" (Gao et al., 2023) and related fusion techniques
Status: Rank fusion methods well-established in IR literature
6. Modular RAG - More of an engineering pattern from LangChain, LlamaIndex
7. Agentic RAG - Emerging from agent frameworks, not formalized academically yet
8. Hierarchical RAG - Implementation pattern, limited formal research
9. Adaptive RAG - Engineering optimization, some academic work emerging
10. Multi-Modal RAG - Active research area but fragmented approaches
Others - Mostly industry terminology or implementation patterns
ACL, EMNLP, NAACL - Main NLP venues
ICLR, NeurIPS, ICML - ML conferences with RAG research
SIGIR, ECIR - Information retrieval conferences
arXiv - Preprints (not peer-reviewed but influential)
Know your model's capabilities and limitations
It is important to consider the selected model's known capabilities and limitations when prompting it. In this lab, we are using the Claude 3 models, which come with the following guidelines for effective prompting:
Claude 3 Models:
Input Format: Images need to be provided in a base64-encoded format.
Image Size: Individual image size cannot exceed 5MB
Multiple Images: Claude 3 models support prompting with up to 5 images.
Image Format: Supported image formats: PNG, JPEG, WebP and GIF.
Image Clarity: Clear images that are not too blurry are more effective with Claude 3 models.
Image Placement: Claude 3 models work better when images come before text in the prompt. However, if the use case requires it, images can follow text or be interleaved with text.
Image Resolution: If the image's long edge is more than 1568 pixels, or the image is more than ~1600 tokens, it will first be scaled down, preserving aspect ratio, until it is within size limits. If the input image is too large and needs to be resized, this increases time-to-first-token latency without giving you any additional model performance. Very small images (under 200 pixels on any edge) may lead to degraded performance.
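Putting these guidelines together, a sketch of a single-image prompt to a Claude 3 model on Bedrock follows (base64-encoded image placed before the text, per the placement guidance); the image path and model ID are placeholders.

# Sketch: image-then-text prompt to Claude 3 on Bedrock; the image is
# base64-encoded as required. File path and model ID are placeholders.
import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

with open("contract_scan.png", "rb") as f:           # placeholder image (<5 MB)
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text", "text": "List the parties named in this contract page."},
        ],
    }],
}

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])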