RAG combines retrieval with generation. When you ask a question, the system first searches through the documents to find relevant chunks of information. These specific pieces are then fed to the LLM as context, allowing it to generate accurate answers grounded in your actual data rather than just its training knowledge.
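To make that retrieve-then-generate loop concrete, here is a minimal sketch (not this demo's code): embed a few text chunks, rank them by cosine similarity against the question, and pass only the best matches to the model as context. It assumes the `openai` Python client with an `OPENAI_API_KEY` set; the model names and the toy snippets are placeholders.

```python
# Minimal retrieve-then-generate loop. Not this demo's code: model names and
# the toy "documents" below are placeholders for illustration only.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = [
    "Placeholder excerpt about data center revenue growth.",
    "Placeholder excerpt about gaming segment results.",
    "Placeholder excerpt about share repurchases.",
]

def embed(texts):
    """Embed a list of strings with an OpenAI embedding model."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)

def answer(question, top_k=2):
    # 1. Retrieve: rank chunks by cosine similarity to the question embedding.
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in scores.argsort()[::-1][:top_k])
    # 2. Generate: hand only the retrieved chunks to the LLM as grounding context.
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content

print(answer("What drove data center revenue?"))
```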
Raw LLMs are limited to what they learned during training and can't access your specific documents or recent data. RAG gives the LLM exactly the context it needs from your documents in real time, resulting in accurate, cited answers instead of hallucinations or outdated information.
This demo lets you query a vector database containing Nvidia's 10-K and 10-Q filings from 2021 to 2026. Ask a natural-language question about anything in those filings; the RAG pipeline retrieves the relevant passages from the source documents and returns a focused answer with citations to the filings it drew from.
This system is built with production-grade tools: LlamaIndex orchestrates the RAG pipeline, ChromaDB serves as a high-performance vector database storing the document embeddings, the OpenAI API generates the answers, and Flask serves the backend. It also includes enterprise features: year-range filtering to query specific time periods, semantic search over OpenAI embeddings to find relevant context, citation tracking so every answer shows its sources, built-in content moderation to filter inappropriate queries, and rate limiting to prevent abuse.
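As a rough sketch of how such a stack can be wired together (the demo's real code is not shown here, so the collection name, metadata keys, and file paths below are assumptions): ChromaDB persists the embeddings, LlamaIndex builds the index and query engine on top of it, metadata filters restrict retrieval to a year range, and the response object carries the source nodes used for citations.

```python
# Hypothetical wiring of the stack described above; the collection name,
# the "year" metadata key, and the paths are assumptions, not the demo's config.
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.vector_stores import FilterOperator, MetadataFilter, MetadataFilters
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persistent Chroma collection holding the document embeddings.
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("nvidia_filings")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load the filings and tag each document with its filing year so that
# year-range filtering works at query time.
documents = SimpleDirectoryReader("./filings").load_data()
for doc in documents:
    doc.metadata["year"] = 2023  # in practice, parsed per file

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Restrict retrieval to filings from 2022 through 2024 via metadata filters.
filters = MetadataFilters(filters=[
    MetadataFilter(key="year", operator=FilterOperator.GTE, value=2022),
    MetadataFilter(key="year", operator=FilterOperator.LTE, value=2024),
])
query_engine = index.as_query_engine(similarity_top_k=4, filters=filters)

response = query_engine.query("How did data center revenue change year over year?")
print(response)  # the generated answer
for src in response.source_nodes:  # citation tracking: the chunks behind the answer
    print(src.node.metadata.get("file_name"), src.score)
```

Persisting the collection means the filings are embedded once and reused across queries, and the filter keeps retrieval from mixing fiscal years when a question targets a specific period.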
Note: This instance has a global limit of 10 requests per hour. If you're a serious prospect and want to try it out, feel free to contact me - I'll hook you up. I should respond within 5-10 minutes during working days.
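For the curious, a cap like that is usually a small amount of configuration on the Flask side. A hedged sketch using Flask-Limiter (the demo's actual route names, limiter backend, and handler are not shown here, so everything below is illustrative):

```python
# Illustrative only: a Flask endpoint with a global 10-requests-per-hour cap
# via Flask-Limiter. The route name and the stubbed RAG call are placeholders.
from flask import Flask, jsonify, request
from flask_limiter import Limiter

app = Flask(__name__)
limiter = Limiter(
    lambda: "global",                # one shared key, so the limit applies across all clients
    app=app,
    default_limits=["10 per hour"],  # matches the limit mentioned above
)

def run_rag_query(question: str) -> dict:
    # Stand-in for the real pipeline (e.g. the query engine sketched earlier).
    return {"answer": f"(stub answer for: {question})", "sources": []}

@app.route("/ask", methods=["POST"])
def ask():
    data = request.get_json(silent=True) or {}
    return jsonify(run_rag_query(data.get("question", "")))

if __name__ == "__main__":
    app.run(debug=True)
```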