Role
Full-Stack Developer
Stack
PyTorch · WebRTC · MediaPipe · Flask · JavaScript
Timeline
Spring 2025 Academic Project
A real-time virtual photo booth that lets two remotely connected users appear in the same artistically styled frame, using peer-to-peer video streaming and GAN-based style transfer. Built as an end-to-end web application combining computer vision, generative AI, and real-time video processing.
Create an interactive multi-user photo booth that goes beyond single-user filters (Instagram, Snapchat) by enabling real-time artistic style transfer for two people in a shared virtual space.
Machine Learning: PyTorch, Pix2Pix GAN, U-Net architecture, PatchGAN discriminator
Computer Vision: MediaPipe, HTML5 Canvas, real-time segmentation
Web Technologies: JavaScript, WebRTC, PeerJS, RESTful APIs
Backend: Flask, base64 image processing, GPU acceleration
Development: Cross-browser testing, performance optimization
Frontend: Browser-based interface (HTML5, CSS3, JavaScript)
Video Streaming: WebRTC peer-to-peer connections via PeerJS library
Backend: Flask server hosting trained Pix2Pix model with RESTful API
Computer Vision: MediaPipe Selfie Segmentation for real-time person extraction
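The base64 image handling between the browser and the Flask backend can be sketched roughly as follows. This is a minimal, standard-library-only illustration that assumes frames travel as base64 data URLs (the format produced by an HTML5 canvas); the helper names `decode_data_url` and `encode_data_url` are hypothetical and not taken from the project.

```python
import base64
from typing import Tuple


def decode_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split a data URL like 'data:image/png;base64,...' into (mime_type, raw_bytes).

    Assumption: the client sends frames as base64 data URLs.
    """
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    # validate=True rejects characters outside the base64 alphabet.
    return mime_type, base64.b64decode(payload, validate=True)


def encode_data_url(mime_type: str, raw: bytes) -> str:
    """Wrap raw image bytes in a base64 data URL for the JSON response."""
    return f"data:{mime_type};base64,{base64.b64encode(raw).decode('ascii')}"
```

In a setup like this, the server would decode the incoming frame, run it through the model, and return the stylized result with `encode_data_url`.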
Challenge: Standard GAN models are too slow for interactive applications
Solution: Optimized Pix2Pix architecture with reduced resolution and efficient server deployment
Result: Achieved 70-90 ms processing time per frame, sustaining 10-15 fps for real-time interaction
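As a rough sanity check on these numbers, per-frame latency can be converted into an upper bound on frame rate. The sketch below assumes frames are processed strictly one at a time and ignores network overhead; both are simplifying assumptions, not details from the project.

```python
def max_fps(latency_ms: float) -> float:
    """Upper bound on frames per second when each frame takes latency_ms to process."""
    return 1000.0 / latency_ms


for latency_ms in (70, 90):
    print(f"{latency_ms} ms per frame -> at most {max_fps(latency_ms):.1f} fps")
# 70 ms per frame -> at most 14.3 fps
# 90 ms per frame -> at most 11.1 fps
```

Under those assumptions, the bound of about 11-14 fps is in the same range as the reported frame rate.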
Challenge: Synchronizing video streams between different devices and network conditions
Solution: WebRTC peer-to-peer connections with PeerJS abstraction layer
Result: Stable connections with 300-500 ms end-to-end latency and minimal server infrastructure
Challenge: Running complex segmentation models efficiently in web browsers
Solution: MediaPipe's optimized WebAssembly implementation for client-side processing
Result: Robust person segmentation across lighting conditions while eliminating server load
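One common way to use a segmentation mask for placing a person into a shared frame is per-pixel alpha blending. The sketch below is a pure-Python illustration of that general technique; it is an assumption about how such compositing can work, not the project's actual code, which runs in the browser. The function names are hypothetical.

```python
def composite_pixel(person, background, mask):
    """Blend one RGB pixel: mask=1.0 keeps the person, mask=0.0 keeps the background."""
    return tuple(round(mask * p + (1.0 - mask) * b) for p, b in zip(person, background))


def composite(person_img, background_img, mask_img):
    """Blend two images given as nested lists of RGB tuples and a same-sized mask in [0, 1]."""
    return [
        [composite_pixel(p, b, m) for p, b, m in zip(p_row, b_row, m_row)]
        for p_row, b_row, m_row in zip(person_img, background_img, mask_img)
    ]
```

Applying the same blend once per user, each with their own mask, would layer both people onto a single background.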
Challenge: WebRTC and canvas inconsistencies across different browsers
Solution: Browser detection with fallback handling and optimized performance for Chrome/Firefox
Result: Consistent user experience with graceful degradation for unsupported features
Challenge: Balancing visual quality with real-time performance constraints
Solution: Strategic resolution choices (128×128 processing, 640×480 display) with client-side throttling
Result: Interactive performance achieved; identified 320×240 as optimal production resolution
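The trade-off above can be illustrated with two small pieces: a pixel budget per resolution and a frame throttle. This is a hedged sketch written in Python for illustration only; the project's throttling runs client-side in the browser, and the `FrameThrottle` class and its injectable clock are hypothetical names introduced here.

```python
import time


def pixel_count(width: int, height: int) -> int:
    """Number of pixels the model must process per frame at this resolution."""
    return width * height


class FrameThrottle:
    """Allow at most max_fps frames per second; frames arriving too early are dropped."""

    def __init__(self, max_fps: float, clock=time.monotonic):
        self.min_interval = 1.0 / max_fps
        self.clock = clock  # injectable so the throttle can be tested without sleeping
        self.last_sent = None

    def should_send(self) -> bool:
        now = self.clock()
        if self.last_sent is None or now - self.last_sent >= self.min_interval:
            self.last_sent = now
            return True
        return False


for width, height in ((128, 128), (320, 240), (640, 480)):
    print(f"{width}x{height}: {pixel_count(width, height)} pixels")
# 128x128: 16384 pixels
# 320x240: 76800 pixels
# 640x480: 307200 pixels
```

A 640×480 frame has 18.75 times as many pixels as a 128×128 one, which is why processing at the lower resolution and throttling how often frames are sent both reduce load.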
This project demonstrates the practical integration of advanced ML techniques with modern web technologies.
The successful combination of GANs, computer vision, and web technologies creates an interactive experience that pushes the boundaries of what's possible in browser-based machine learning applications.