
Movie Recommender

My fiancée and I have a recurring problem: we want to watch something, but we spend half an hour scrolling through streaming apps trying to find a movie that fits the vibe we’re both in the mood for. It’s exhausting. And as someone who takes probably too much pride in finding hidden gems and getting people onto movies they wouldn’t have discovered otherwise — I figured I could build something better.

This project is also where I’ve been stress-testing what I’ve learned from ML projects at work. I wanted to think like a full-stack architect for ML problems — not just training models, but building a real app around them. That meant learning product design (my fiancée has been a relentless, if contractually obligated, feedback source) and thinking carefully about how different recommendation approaches serve different user needs.

The recommendation zones

Not every recommendation problem is the same. Sometimes you want “more like this.” Sometimes you want to discover something you’ve never heard of. So I built multiple approaches:

BERT Embeddings for Similarity

I built an embedding pipeline using BERT that encodes movies across multiple axes: title, description, genres, directors, cast, and production companies. These embeddings live in pgvector, which lets me do fast similarity searches at the database level. The idea is to capture what makes movies feel similar — beyond just matching genre tags.
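As a rough sketch of the two halves described above: one way to feed several metadata axes into a single embedding space is to flatten them into one labeled string before encoding, and then let pgvector rank rows by cosine distance. The `Movie` dataclass, the `build_embedding_text` helper, and the `movies` / `embedding` table and column names are all assumptions made for illustration, not the project's actual schema. The only library fact relied on is that pgvector's `<=>` operator computes cosine distance.

```python
from dataclasses import dataclass, field


@dataclass
class Movie:
    # Hypothetical record shape; the real schema is not shown in this write-up.
    title: str
    description: str = ""
    genres: list[str] = field(default_factory=list)
    directors: list[str] = field(default_factory=list)
    cast: list[str] = field(default_factory=list)
    production_companies: list[str] = field(default_factory=list)


def build_embedding_text(movie: Movie, max_cast: int = 5) -> str:
    """Flatten the metadata axes into one labeled string for the encoder."""
    parts = [
        ("Title", movie.title),
        ("Description", movie.description),
        ("Genres", ", ".join(movie.genres)),
        ("Directors", ", ".join(movie.directors)),
        ("Cast", ", ".join(movie.cast[:max_cast])),
        ("Production", ", ".join(movie.production_companies)),
    ]
    # Skip empty fields so missing metadata does not add noise to the text.
    return " | ".join(f"{label}: {value}" for label, value in parts if value)


# pgvector's <=> operator is cosine distance, so ascending order means
# "most similar first". Table and column names here are assumptions.
NEAREST_MOVIES_SQL = """
SELECT id, title, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM movies
WHERE id <> %(anchor_id)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""
```

The string from `build_embedding_text` would be what gets passed to the BERT encoder, and the query would be run with the anchor movie's vector bound to `query`. Labeling each field is a design choice rather than a requirement; it just gives the encoder a hint about which axis each piece of text came from.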

Future idea: multiple embedding spaces focused on different axes. Right now everything’s in one space, but separating “vibe” from “cast” from “director style” could make things more nuanced.

Content-Based Anchor Search

Pick a movie you love, find more like it. Straightforward content-based filtering, but anchored to something you’ve explicitly said you care about.
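In spirit, the anchor search boils down to "rank everything else by cosine similarity to the movie you picked." The sketch below does that in plain Python over an in-memory dictionary of vectors. In the real app this ranking happens inside the database, so `more_like_this` and the toy two-dimensional embeddings are purely illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(
    anchor_id: str,
    embeddings: dict[str, list[float]],
    k: int = 10,
) -> list[tuple[str, float]]:
    """Return the k movies most similar to the anchor, excluding the anchor."""
    anchor = embeddings[anchor_id]
    scored = [
        (movie_id, cosine_similarity(anchor, vector))
        for movie_id, vector in embeddings.items()
        if movie_id != anchor_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real BERT embeddings.
toy_embeddings = {
    "anchor": [1.0, 0.0],
    "close_match": [0.9, 0.1],
    "unrelated": [0.0, 1.0],
}
results = more_like_this("anchor", toy_embeddings, k=2)
```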

Hidden Gems Finder

This is the one I’m proudest of. It finds movies with high similarity scores but low popularity — films that match your taste but flew under the radar. This is exactly what I want from recommendations: not “here’s what everyone’s watching” but “here’s something you’d probably love that you’ve never heard of.”
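A minimal sketch of the "high similarity, low popularity" idea, assuming each candidate already carries a similarity score and some popularity number. The `Candidate` shape and both thresholds are hypothetical placeholders, not the tuned values the app uses.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    similarity: float  # similarity to the user's taste or anchor (assumed 0..1)
    popularity: float  # any popularity score; the scale here is an assumption


def hidden_gems(
    candidates: list[Candidate],
    min_similarity: float = 0.7,   # placeholder threshold
    max_popularity: float = 20.0,  # placeholder threshold
    k: int = 10,
) -> list[Candidate]:
    """Keep strong matches that are not widely watched, best match first."""
    gems = [
        c
        for c in candidates
        if c.similarity >= min_similarity and c.popularity <= max_popularity
    ]
    gems.sort(key=lambda c: c.similarity, reverse=True)
    return gems[:k]


sample = [
    Candidate("Blockbuster", similarity=0.95, popularity=300.0),
    Candidate("Gem A", similarity=0.80, popularity=5.0),
    Candidate("Gem B", similarity=0.90, popularity=10.0),
    Candidate("Obscure Mismatch", similarity=0.30, popularity=2.0),
]
picks = hidden_gems(sample)
```

Here the blockbuster is dropped for being too popular and the obscure mismatch for not being similar enough, which leaves the two gems. A hard cutoff is the simplest version; a score that trades similarity off against popularity would be a natural variation.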

The stack

Backend (FastAPI): Auth, movies, ratings, favorites, watchlist, comments, and all the recommendation logic. pgvector handles the embedding similarity queries. Redis caches hot paths.

Frontend (React + Vite): TypeScript throughout. Typed API layer so there’s no guessing about data shapes. Combined user state endpoint (/users/me/state) reduces round trips after login.

Data: TMDB dataset (10k movies). Pipeline ingests, cleans, and computes BERT embeddings.
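The "Redis caches hot paths" part of the backend can be illustrated with a cache-aside pattern: look in the cache first, compute on a miss, then store the result with a time-to-live. To keep the sketch runnable without a server, `InMemoryTTLCache` is a hypothetical stand-in for Redis, and the key format, TTL, and `cached_recommendations` helper are assumptions as well.

```python
import json
import time
from typing import Callable, Optional


class InMemoryTTLCache:
    """Tiny stand-in for Redis so this sketch runs without a server."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def cached_recommendations(
    cache: InMemoryTTLCache,
    anchor_id: int,
    compute: Callable[[int], list[int]],
    ttl_seconds: float = 300.0,
) -> list[int]:
    """Cache-aside: return the cached list if present, else compute and store."""
    key = f"recs:anchor:{anchor_id}"  # hypothetical key format
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = compute(anchor_id)  # the slow path, e.g. a similarity query
    cache.set(key, json.dumps(result), ttl_seconds)
    return result
```

With this shape, the first request for an anchor pays the full cost of the similarity query and later requests within the TTL are served from the cache.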

What I’ve learned

Product design is its own skill. Getting feedback from a real user, even one who’s contractually obligated to live with me, changed how I thought about the UI and recommendation explanations. Why something is being recommended matters as much as what is being recommended.

Also: multiple recommendation zones > one monolithic algorithm. Different people in different moods need different approaches. Sometimes you want “more like this.” Sometimes you want to be surprised.

Current state

Works, but rough. The core features are functional: auth, browsing, ratings, all three recommendation zones, and the watchlist. The UI needs polish, and there’s more tuning to do on the embedding quality. It’s also painfully slow on first load, though the caching system works great afterwards.

What’s next

GPU-powered training and inference to speed things up. The BERT embedding step is slow on CPU, and there’s room to experiment with different model architectures. Collaborative filtering is also on the list: using rating patterns across users to complement the content-based approaches. I also want to feed user interactions back into a specific zone in real time.