What It Does
An AI-powered web app that lets users chat with their uploaded PDFs using embeddings and LLMs. It implements a full retrieval-augmented generation (RAG) pipeline, from PDF upload to context-aware answer synthesis.
Key Features
- RAG pipeline: PDF upload → text extraction → recursive chunking → embeddings → vector search → LLM synthesis
- OCR fallback: Tesseract OCR for scanned documents when PyPDF2 text extraction fails
- Real-time WebSocket Q&A: structured JSON messaging with processing status tracking
- Context-aware answers: semantic search with a graceful fallback when no relevant context is found
- Full-stack: Next.js 15 frontend, FastAPI backend, NeonDB PostgreSQL, Cloudinary storage
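
The chunking, retrieval, and fallback stages listed above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the project's actual code: `recursive_chunk`, `embed`, `retrieve`, and `build_prompt` are hypothetical names, the bag-of-words `embed` is a toy stand-in for a real embedding model, a linear scan stands in for the vector store, and the `max_chars`, `top_k`, and `min_score` defaults are arbitrary.

```python
import math
import re
from collections import Counter


def recursive_chunk(text, max_chars=200, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator first; recurse to finer separators
    only for pieces that are still longer than max_chars."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard split by length.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            chunks.extend(recursive_chunk(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=3, min_score=0.2):
    """Return up to top_k chunks scoring at least min_score, best first."""
    q = embed(question)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score >= min_score]


def build_prompt(question, chunks):
    """Build an LLM prompt, or return None when no chunk is relevant enough
    so the caller can send a fallback reply instead of calling the LLM."""
    context = retrieve(question, chunks)
    if not context:
        return None
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

In a setup like this, the caller would send the returned prompt to the LLM, and treat `None` as the signal to reply that the document contains no relevant context rather than letting the model guess.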