Selected Projects

LLMs & Foundation Models

Wi-Fi Foundation Model

Wi-Fi Foundation Model Pipeline

Designed and implemented a large-scale data management pipeline on AWS S3 and SageMaker Studio to support training of a Wi-Fi domain-specific foundation model. The model uses a Transformer architecture trained with self-supervised objectives such as contrastive loss and masking. I led data ingestion, quality control, versioning, and the orchestration of large-scale training jobs. Refactoring the pipeline improved reproducibility and reduced training costs.
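To make the two self-supervised objectives named above concrete, here is a minimal pure-Python sketch. It is an illustration only, not the project's training code: the function names (`info_nce`, `mask_frames`), the temperature value, and the toy two-dimensional embeddings are all assumptions. `info_nce` is the standard InfoNCE form of a contrastive loss, where the positive for each anchor is the same-index row and every other row acts as a negative; `mask_frames` hides selected frames and keeps the originals as reconstruction targets.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; positives[i] is the positive for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the true pair as the label
    return total / len(anchors)


def mask_frames(frames, masked_indices, fill=0.0):
    """Replace the chosen frames with a fill value; return (masked copy, targets)."""
    masked = [list(frame) for frame in frames]
    targets = {}
    for i in masked_indices:
        targets[i] = list(frames[i])  # what a model would be asked to reconstruct
        masked[i] = [fill] * len(frames[i])
    return masked, targets


# Hypothetical embeddings of two signal windows and their augmented views.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_views = [[0.9, 0.1], [0.1, 0.9]]
swapped_views = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = info_nce(anchors, aligned_views)
loss_swapped = info_nce(anchors, swapped_views)
```

The loss is small when each anchor is closest to its own view and grows when the pairs are mismatched, which is the property a contrastive objective relies on.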

Machine Learning Projects

Four-Class Motion Classification

Home Monitoring 3.0 (Deep Learning)

Built a deep learning framework in PyTorch to classify four motion types (humans, pets, iRobots, fans) from Wi-Fi sensing data. Evaluated architectures including LeNet, ResNet, ViT, RNN, and Bi-LSTM. Used transfer learning to improve robustness on low-quality sensors. The project contributed to multiple publications and improved classification accuracy by ~20%.

SVM Home Monitoring

Home Monitoring 3.0 (SVM)

Developed a real-time motion detection and classification engine using SVM on Wi-Fi sensing signals. Built an end-to-end pipeline for preprocessing, feature engineering, and binary classification. In production, the engine reduced false alarms by 40% and ran on multiple sensor devices.
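The feature-engineering and binary-classification stages can be sketched in a few lines of dependency-free Python. Everything specific here is an assumption made for illustration: the two features (window variance and mean absolute difference), the toy signal windows, the hyperparameters, and the hand-rolled linear SVM trained by subgradient descent on the hinge loss. The deployed engine may well have used different features, a kernel SVM, or a library implementation.

```python
def window_features(window):
    """Hypothetical features: variance and mean absolute sample-to-sample change."""
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    mean_abs_diff = sum(abs(b - a) for a, b in zip(window, window[1:])) / (len(window) - 1)
    return [variance, mean_abs_diff]


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            weights = [w - lr * reg * w for w in weights]  # regularization shrinkage
            if margin < 1:  # sample is inside the margin or misclassified
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (motion) or -1 (no motion)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Toy amplitude windows: the first two are static, the last two contain motion.
train_windows = [
    [1.0, 1.01, 0.99, 1.0],
    [0.5, 0.5, 0.51, 0.5],
    [1.0, 1.8, 0.3, 1.4],
    [0.2, 1.1, 0.1, 0.9],
]
train_labels = [-1, -1, 1, 1]
weights, bias = train_linear_svm([window_features(w) for w in train_windows], train_labels)
```

A new window is classified with `predict(weights, bias, window_features(window))`. In a real-time setting the same feature function would run on a sliding window over the incoming signal.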

Data Analysis & NLP Projects

IEEE Affiliation

Affiliation Name Disambiguation (IEEE)

Rebuilt a machine learning pipeline for cleaning and clustering author affiliation names from IEEE publications. Applied BERT embeddings and hierarchical clustering to disambiguate affiliation entities with 89% accuracy. Sped up data preprocessing with multiprocessing and proxy-based crawling, reducing runtime by 90%. Tools included Python, AWS Redshift, and EC2.
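The clean-then-cluster idea can be sketched without any external libraries. Note the substitutions: bag-of-words token counts stand in for the BERT embeddings, and the abbreviation table, the distance threshold, and the sample names are hypothetical. The clustering itself is plain average-linkage agglomerative clustering on cosine distance, written naively for readability rather than speed.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation table used during cleaning.
ABBREVIATIONS = {"univ": "university", "dept": "department"}


def normalize(name):
    """Lowercase, strip punctuation, expand abbreviations, return token counts."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return Counter(ABBREVIATIONS.get(t, t) for t in tokens)


def cosine_distance(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative(vectors, max_distance):
    """Merge the closest pair of clusters until none is within max_distance."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(cosine_distance(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > max_distance:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


names = [
    "Stanford University",
    "Stanford Univ.",
    "Dept. of EE, Stanford University",
    "Tsinghua University",
    "Tsinghua Univ., Beijing",
]
clusters = agglomerative([normalize(n) for n in names], max_distance=0.45)
```

With these toy inputs the three Stanford variants end up in one cluster and the two Tsinghua variants in another. The shared token "university" is why the threshold matters: set it too high and the two institutions merge, which is the kind of failure that contextual embeddings are meant to reduce.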