Semantic Search

Author: Ian Jones

Purpose Of This Project

This project grew out of my professional experience in documentation. People often struggle to find the most relevant information in our documentation stores, and much of the time they don’t even know what their own documentation contains.

Given my interest in Linear Algebra, Natural Language Processing, and DevOps, I figured a solid project would be to implement a full DevOps pipeline for a Dockerized Semantic Search application.

Tooling Utilized (Subject To Change As The Project Progresses)

  • FastAPI - REST API
  • Python - Primary Programming Language
  • Docker - Containerization
  • ChromaDB - Vector database
  • VS Code - IDE Utilized
  • HTML/CSS/JavaScript - Web Frontend
  • GitHub - Version Control
  • GitHub Actions - CI/CD
  • GitHub Projects - Project Management
  • pytest - Unit Testing Framework
  • Mermaid - Diagramming

Weekly Progress

1/29/2026

Last Week’s Work

Last week I spent my time thinking about what project I wanted to work on and making connections between my work and the two other classes I’m taking right now, NLP and Linear Algebra.

My NLP class went over NLTK and how to process corpora, which got me thinking about how documentation systems are just large, distributed corpora.

I researched how NLP and Linear Algebra could be applied to documentation and stumbled on this great Medium article outlining what Semantic Search is and how these concepts might fit together. I also wanted to incorporate a little DevOps into this project, so I looked into Docker and GitHub Actions a bit.
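
To make the NLP and Linear Algebra connection concrete for myself, here is a minimal sketch of the underlying idea: turn each document into a vector, then rank documents by the cosine of the angle between the query vector and each document vector. This is only an illustration under some loud assumptions. It uses plain term counts instead of learned embeddings, so it matches shared words rather than meaning, and it is not what the project will ship. The function names and the sample documents are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(text: str) -> Counter:
    """Represent a document as a sparse term-count vector."""
    return Counter(tokenize(text))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Rank documents by similarity to the query, best match first."""
    query_vec = to_vector(query)
    scored = [(cosine_similarity(query_vec, to_vector(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical documentation snippets, invented for this example.
DOCS = [
    "How to reset your password in the admin console",
    "Deploying the service with Docker and GitHub Actions",
    "Quarterly holiday schedule for the office",
]

if __name__ == "__main__":
    for score, doc in search("docker deploy pipeline", DOCS):
        print(f"{score:.3f}  {doc}")
```

Note that the query word "deploy" does not match "Deploying" in this sketch, because term counts only see exact tokens. That gap is the motivation for swapping the count vectors for embeddings, which is where a vector database would come in.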

I cleaned up my personal portfolio website, which had been a little neglected. I stood up a page for this project and changed some of the formatting and layout to look more professional, with the intent of showing off my portfolio of work.

I researched Vector Databases a bit to understand what I was getting myself into.

Work This Week

This week I plan to:

  • Look a bit more into how to use existing Python libraries for handling .docx, .pdf, .txt, and .md files
  • Research a bit more about FastAPI
  • Look into Professor Guinn’s recommendation, Meta’s FAISS library for similarity search and clustering of dense vectors
  • Stand up the GitHub repo for my project and start building
  • Pick my vector database

Impediments

I think the only impediment is that I’ll be traveling a little over the next week, so I need to manage my time properly and find ways to take some of my planned research offline (texts, documentation sites, books, etc.). I’ll be loading this material onto my iPad and using that for my offline workflow.

Reflections

I think my process would work better if I created an architectural diagram to help me see the big picture and scoped the work into a project tracker. That would give me a reference broken down into smaller chunks and help keep me focused and on track.