A serverless, production-style GenAI chatbot designed to allow users (e.g., maintenance technicians) to ask natural-language questions about large PDF documents (like operational manuals) and receive accurate, contextual answers.
Built as part of a Founding Product Engineer interview challenge to demonstrate real-world problem-solving, cloud architecture, and GenAI integration.
Organizations maintain extensive collections of technical manuals, operational guides, and safety documents that span hundreds or thousands of pages. Extracting precise answers from these documents is time-consuming, as relevant information is often buried across multiple files and sections. There is a need for a system that lets users ask natural-language questions of a large document set and receive concise, accurate answers without manually searching each file.
Example Queries:
This project implements Retrieval-Augmented Generation (RAG) to replace manual document search with an intelligent, GenAI-powered chat interface.
The solution is fully serverless, scalable, and AWS-native.

| Layer | Technologies | Description |
|---|---|---|
| Frontend | React, S3, CloudFront | User interface for document upload and chat; hosted statically. |
| Edge/API | API Gateway | Exposes a secure REST API endpoint. |
| Backend/Compute | AWS Lambda (Node.js) | Handles all core logic, including ingestion, RAG, and model communication. |
| GenAI | Amazon Bedrock (Llama 2, Claude) | Large language models used for generating contextual answers. |
| Vector DB | Pinecone | Stores embedded document chunks for efficient retrieval. |
| Storage | Amazon S3 | Stores raw PDF documents. |
| IaC | Serverless Framework | Infrastructure as Code for deploying the entire stack. |
The system transforms raw PDFs into a searchable vector index:
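The chunking step of that pipeline can be sketched as below. The chunk size and overlap values are illustrative assumptions, not the project's actual tuning; each chunk would then be embedded and upserted to Pinecone with metadata pointing back to the source PDF.

```javascript
// Sketch of the ingestion chunking step: split extracted PDF text into
// overlapping windows so that sentences cut at a boundary still appear
// whole in an adjacent chunk. Sizes are illustrative.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap; // step forward, keeping `overlap` chars
  }
  return chunks;
}
```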

The architecture supports switching the underlying LLM, allowing easy comparison of outputs from the same prompt across different models.
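One way this switch can work is by shaping the Bedrock `InvokeModel` request per model family, since Llama 2 and Claude expect different body formats. The model IDs and body fields below follow Bedrock's documented conventions, but treat the specific values as an illustrative sketch rather than the project's exact code.

```javascript
// Sketch of per-model request shaping for Bedrock's InvokeModel API.
function buildBedrockRequest(modelFamily, context, question) {
  if (modelFamily === "llama2") {
    // Llama 2 chat models take a flat prompt with [INST] ... [/INST] tags.
    return {
      modelId: "meta.llama2-13b-chat-v1",
      body: JSON.stringify({
        prompt: `[INST] Answer using only this context:\n${context}\n\nQuestion: ${question} [/INST]`,
        max_gen_len: 512,
        temperature: 0.2,
      }),
    };
  }
  if (modelFamily === "claude") {
    // Claude v2's text-completion format uses Human:/Assistant: turns.
    return {
      modelId: "anthropic.claude-v2",
      body: JSON.stringify({
        prompt: `\n\nHuman: Answer using only this context:\n${context}\n\nQuestion: ${question}\n\nAssistant:`,
        max_tokens_to_sample: 512,
        temperature: 0.2,
      }),
    };
  }
  throw new Error(`Unknown model family: ${modelFamily}`);
}
```

Because only the request body differs, the same retrieved context and question can be sent to either model for a side-by-side comparison.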
Optimized the LLM interaction for clarity and reduced hallucination:
| Challenge | Problem Description | Solution / Learning |
|---|---|---|
| Llama output quality | Vague and slow responses initially. | Applied structured `[INST]` prompt formatting for improved structure and quality. |
| Context isolation | Model retrieved context from other sessions or documents. | Implemented Pinecone namespaces per chat or session. |
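The namespace fix for context isolation can be sketched as below. The naming convention and the Pinecone client usage shown in the comment are assumptions for illustration; the key idea is that every upsert and query is scoped to one session's namespace.

```javascript
// Sketch of session-scoped retrieval via Pinecone namespaces: a simple,
// deterministic namespace per chat session keeps one session's vectors
// invisible to every other session.
function sessionNamespace(sessionId) {
  return `chat-${sessionId}`;
}

// With the Pinecone Node.js client, a query would then be scoped like:
//
//   const results = await index
//     .namespace(sessionNamespace(sessionId))
//     .query({ vector: questionEmbedding, topK: 5, includeMetadata: true });
//
// Upserts use the same namespace, so retrieval never crosses sessions.
```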
The project successfully delivered a real-world GenAI RAG system while navigating key engineering and cloud constraints.
Note: This project was developed as part of a confidential interview assignment. No proprietary documents or company data are included.