100% Local / No Cloud · Hybrid Search Mode · RAG With Citations
A document manager I built for my USC coursework. It does OCR on handwritten notes and PDFs, runs semantic search with pgvector, and answers questions about your docs using a local LLM.

I built Knowledge Hub because I was tired of digging through hundreds of course PDFs and lecture notes during my MS in CS at USC. I wanted one place where I could dump all my documents and actually find what I needed quickly.

The system takes in PDFs and images, runs OCR to extract text (even from handwritten notes), chunks everything up, and stores vector embeddings in PostgreSQL with pgvector. That gives me two ways to search: regular full-text search and semantic search, where I can ask a question in plain English and get back the most relevant passages.

The part I'm most proud of is the Q&A feature. It uses RAG with a local LLM running on Ollama (gemma3:1b) to answer questions about my documents and cite exactly where the answer came from. No API keys, no cloud dependency, everything runs locally.

The whole thing is a Flask API with SQLAlchemy, containerized with Docker so setup is just `docker-compose up`. It's genuinely useful. I still use it to prep for exams and review research papers.
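The chunking step mentioned above can be sketched as an overlapping word-window splitter. This is a minimal, self-contained illustration; the window and overlap sizes here are my own illustrative assumptions, not the project's actual settings.

```python
# Sketch of the chunking step: split extracted text into overlapping
# word-window chunks before embedding. Sizes are illustrative assumptions.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping chunks of roughly `chunk_size` words."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: a 500-word document yields 3 overlapping chunks
# (words 0-199, 160-359, 320-499).
doc = " ".join(f"word{i}" for i in range(500))
chunks = chunk_text(doc)
print(len(chunks))  # 3
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from both sides.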

Upload docs, and the system pulls out metadata and organizes everything automatically
Extract text from PDFs and images using OpenCV, PyMuPDF, and Tesseract
Find related content with vector similarity search, powered by pgvector and Sentence-Transformers
Ask questions about your docs and get answers with citations, powered by a local LLM through RAG
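The vector-similarity search above is performed server-side by pgvector over Sentence-Transformers embeddings. As a toy illustration of the ranking idea, here is the same cosine-similarity ordering in plain Python, with tiny hand-made vectors standing in for real embeddings:

```python
import math

# Toy illustration of the cosine-similarity ranking pgvector performs.
# In the real system the vectors come from Sentence-Transformers; these
# tiny hand-made vectors are stand-ins so the example is self-contained.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

chunks = {
    "lecture-3 notes on B-trees":     [0.9, 0.1, 0.0],
    "paper on transformer attention": [0.1, 0.9, 0.2],
    "syllabus and grading policy":    [0.0, 0.2, 0.9],
}

query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "how do B-trees split?"
ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
print(ranked[0])  # "lecture-3 notes on B-trees"
```

With pgvector the same ordering comes from an `ORDER BY embedding <=> query` clause, so the database does the heavy lifting instead of Python.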
Friction
Takeaways
Every moving part explained with interactive SVG diagrams: system architecture, ingestion pipeline, full-text search in Postgres, pgvector semantic search with IVFFlat, z-score hybrid ranking, and the RAG prompt structure.
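The z-score hybrid ranking named above can be sketched in a few lines: normalize each chunk's full-text score and vector-similarity score to z-scores, then sum them so neither scale dominates. The scores and chunk IDs below are made up for illustration; they are not the project's real values.

```python
import statistics

# Minimal sketch of z-score hybrid ranking: standardize the full-text and
# vector-similarity scores separately, then sum. Scores below are made up.

def zscores(values: list[float]) -> list[float]:
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values) or 1.0  # guard against all-equal scores
    return [(v - mu) / sigma for v in values]

def hybrid_rank(ids: list[str], fts_scores: list[float],
                vec_scores: list[float]) -> list[str]:
    combined = [f + v for f, v in zip(zscores(fts_scores), zscores(vec_scores))]
    return [i for _, i in sorted(zip(combined, ids), reverse=True)]

ids = ["chunk-a", "chunk-b", "chunk-c"]
fts = [0.2, 3.1, 1.0]     # hypothetical ts_rank scores from Postgres
vec = [0.91, 0.40, 0.55]  # hypothetical cosine similarities from pgvector

print(hybrid_rank(ids, fts, vec))
```

Standardizing first matters because `ts_rank` values and cosine similarities live on different scales; without it, whichever score happens to have the larger range would decide every ranking.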