Hi, I'm Sam

and this is me in a nutshell.

Testimonials

"... not only does he have a strong technical skill set, but he also has the ability to learn any skill. Because of his strong work ethic and general AI expertise, I recommend him to any organization."

Julien Cheng

Software & Infrastructure Engineer · Cloud9 Esports Inc.

"Sam demonstrated strong technical skills, dedication, and a strong passion ... his research skills, technical proficiency, collaborative nature, and dedication to his work make him an excellent asset"

Elizabeth Boschee

Associate Director, AI Division · USC Information Sciences Institute

"Samarth distinguished himself by being able to research exceptionally well on our use case of ML and other possibilities ... He is very committed to the task at hand and will go lengths to achieve results."

Vyshak Venkatesh

AI Team Lead · Piltover Technologies

"... exceptional motivation and perseverance towards goal ... ability to work independently ... successful projects all-around ... I would certainly like to have him in my team to work with again."

Dr. Erik Cambria

Provost Chair · NTU Singapore

"... proved to possess an abstract thinking style, innate intelligence, and solid problem-solving skills."

Dr. Bali Devi

Associate Professor · Manipal University Jaipur

"... exhibits outstanding analytical and critical thinking abilities, which he readily employs, and he has continued to develop at every chance under my wing."

Harish Sharma

Assistant Professor (Sr. Scale) · Manipal University Jaipur

"... a diligent student, Samarth displayed curiosity to learn the concepts and explore new areas of knowledge ... an excellent grasping power and logical approach towards problem-solving."

Dr. Prakash Ramani

Associate Professor · Manipal University Jaipur

"His tenacity to endure long working period with humanity and technical skill is appreciated by my colleagues as well. I strongly believe he is one of the best people I ever have worked with."

Vyshak Venkatesh

AI Team Lead · Piltover Technologies

Work Experience (2+ years)

0→1 Builder

Data Scientist

Cloud9 Esports Inc.

Los Angeles, California

June 2024 – Present

•Architected an end-to-end Generative AI content ecosystem (scripts, clips, captions) using Gemini, Claude, GPT, ElevenLabs, and DeepL, backed by a human-in-the-loop RAG pipeline (async FFmpeg, PostgreSQL, BM25, LangChain, yt-dlp), to automate 100% of short-form script drafting, multilingual captioning in 33+ languages, and viral highlight extraction; reduced manual curation time by 90%, increased content output by 300%, and saved 100+ labor hours and an estimated annualized $240k for the company
•Deployed a 'PMT Agent', an autonomous n8n workflow orchestrating Slack, Google Drive, and Gemini APIs to auto-capture, QA, and organize social media deliverables, eliminating manual reporting bottlenecks and ensuring 100% data accuracy for legal compliance
•Developed a League of Legends draft prediction engine using an ensemble of Random Forests and Sequential models, utilizing Reinforcement Learning for qualitative state evaluation and Riot API data to simulate probability-based draft scenarios
•Created an automated viewership dashboard backed by an ETL pipeline that aggregates metrics across 7 social media platforms; implemented Cron-based daily scheduling with file locking and exponential backoff. The dashboard informs key strategic decisions for the Sales, Content, and Partnerships teams and generates an estimated $50k+ in annual labor savings
•Built a semantic search engine for 80TB of unstructured media on Google Drive using Gemini API for automated asset tagging, reducing retrieval time from ~20 minutes to less than 10 seconds via descriptive metadata indexing
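
The file locking and exponential backoff mentioned for the scheduled ETL job can be sketched as follows. This is an illustrative sketch, not the production pipeline: `file_lock`, `retry_with_backoff`, and all delay parameters are hypothetical names and values chosen for the example.

```python
import os
import random
import time
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Hold an exclusive lock by atomically creating `path`.

    O_CREAT | O_EXCL makes os.open raise FileExistsError if another run
    already holds the lock, so overlapping cron runs do not collide.
    """
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call `fn`, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter so parallel jobs do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

A scheduled job would wrap each platform fetch in `retry_with_backoff` inside a single `with file_lock(...)` block, so a slow run is never overlapped by the next day's run.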

Research Assistant

USC Information Sciences Institute

Los Angeles, California

Sep 2023 – May 2024

•Architected a RAG-based knowledge delivery system integrating Llama 2, Mistral, and Hugging Face embeddings, enhancing search relevance by pinpointing QA-style results and surpassing traditional passage retrieval methods through context-aware vector search
•Spearheaded prompt engineering efforts to optimize LLM performance on domain-specific queries, integrating retrieval and generation techniques to improve answer fidelity for complex research corpora
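
The vector search step of a RAG system like the one above can be illustrated with a small sketch. It assumes passages have already been embedded; the function names and the toy two-dimensional vectors in the usage note are hypothetical, and a real system would use model-generated embeddings and a vector index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, passages, k=2):
    """Return the texts of the k passages most similar to the query.

    `passages` is a list of (text, embedding) pairs.
    """
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

For example, with passages `[("a", [1, 0]), ("b", [0, 1]), ("c", [1, 1])]` and query `[1, 0.1]`, `top_k` returns `["a", "c"]`; the retrieved texts would then be placed in the LLM prompt as context.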

Internships (2+ years)

Research Intern

Nanyang Technological University

Singapore

Jan 2022 – Jun 2022

•Developed a 'Sentic Computing' framework by scraping and processing Twitter data to assess mental well-being patterns, building a dashboard to visualize trends in online footprints via NLP-driven sentiment analysis
•Utilized a range of SenticNet APIs to enhance analysis validity, mapping unstructured social media text to psychological indicators for public health research

NLP Intern

KoiReader Technologies

India

Apr 2021 – Jul 2021

•Implemented BERT-based Named Entity Recognition (NER) models to automate information extraction from logistics documents
•Leveraged spaCy, OCR, and StanfordNLP to perform coreference resolution on freight forwarding documents, significantly improving data extraction accuracy for supply chain automation and achieving a 100% reduction in manual data entry for billing forms

Data Science Intern

The Sparks Foundation

India

Aug 2020 – Sep 2020

•Gained hands-on experience with data science tools including Keras, PyTorch, TensorFlow, and XGBoost, and performed exploratory data analysis (EDA) on industrial data

Machine Learning Intern

Piltover Technologies

India

Aug 2019 – Aug 2020

•Directed research on state-of-the-art EMG data processing and gained extensive experience developing CNN models that analyze EMG signals to replicate human arm gestures for integration into 3D-printed prosthetic limbs

Education

MS in Applied Data Science

University of Southern California

Los Angeles, California

Graduated Dec 2024

•CGPA: 3.9/4.0
•Relevant Coursework: NLP, Data Mining, Data Management, Big Data, Databases, Deep Learning, Machine Learning, Artificial Intelligence

BTech in Computer Science

Manipal University Jaipur

India

Graduated Jul 2022

•CGPA: 9.17/10.0
•Co-Curricular & Leadership: Head of Events (Founding Member) for Glitch! Esports Society of MUJ & Organizer for ACM MUJ

Projects (31)

Nuclear and Aviation Safety using NLP

RAG Docu-'Mentor' ChatAI

Session-Based RecSys with Continual Learning

Yelp Recommendation System

Smart Scout Football Analytics

Reddit Forum Analysis & Clustering

Algorithmic Trading System

Multi-Domain EDA & Business Insights

Research Work

NLP for Nuclear Safety

Award-winning NLP research automating safety trait violation detection using topic modeling and text analytics.

2023

•Recognition: Awarded Best Project and Best Data Science Developer at CKIDS DataFest '23 at USC, competing against 50+ teams across diverse technical domains.
•Problem Statement: Nuclear power plants generate thousands of safety reports annually. Manual review by safety analysts is time-consuming, expensive, and prone to human error. Developed an automated NLP system to identify safety trait violations, improving regulatory compliance and reducing operational risk.
•Methodology: Engineered a novel GuidedLDA (Guided Latent Dirichlet Allocation) topic modeling approach incorporating domain-specific seed words from nuclear safety experts. Unlike traditional unsupervised LDA, GuidedLDA steers topics toward predefined safety categories while discovering latent patterns in unstructured text data.
•Technical Pipeline: Built an end-to-end NLP pipeline featuring document parsing, text preprocessing, stopword removal, lemmatization, TF-IDF vectorization, and probabilistic topic modeling. Processed 10,000+ nuclear safety reports from NRC (Nuclear Regulatory Commission) databases.
•Key Results: Achieved 87% accuracy in identifying safety trait violations against expert-labeled ground truth. Successfully categorized reports into 8 distinct safety trait categories including Human Performance, Equipment Reliability, and Procedure Compliance.
•Impact: Reduced manual review time by ~60%, enabling safety analysts to prioritize high-risk violations. Methodology is transferable to other regulated industries (aviation, healthcare) requiring safety documentation analysis and compliance automation.
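
The preprocessing and TF-IDF vectorization steps of the pipeline above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project code: the stopword set is a tiny hypothetical subset, lemmatization is omitted, and the GuidedLDA topic modeling step is not shown.

```python
import math
import re
from collections import Counter

# Illustrative stopword subset; a real pipeline would use a full list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "was", "in", "during"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors
```

Terms that appear in few reports receive higher weights than terms shared across many reports, which is what lets a downstream topic model focus on distinctive vocabulary.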

SenticNet & Affective Computing

Research on concept-level sentiment analysis leveraging knowledge graphs, commonsense reasoning, and hybrid AI.

2022

•Research Focus: Conducted in-depth research on SenticNet, a comprehensive knowledge base for concept-level sentiment analysis that combines symbolic AI reasoning with deep learning to understand semantics and sentics (emotion-related semantics) in natural language.
•Beyond Bag-of-Words: Traditional sentiment analysis treats text as isolated tokens, losing contextual meaning. SenticNet represents concepts in a multidimensional vector space capturing semantic and affective information. Example: 'birthday party' → positive, 'surprise test' → negative, despite neutral individual words.
•Commonsense Reasoning: Investigated how SenticNet leverages commonsense knowledge graphs for inference. The framework understands implicit sentiment (e.g., 'I bought a new car' implies positive sentiment) by reasoning about typical human experiences—enabling context-aware NLP.
•Affective Computing Dimensions: Studied the Hourglass of Emotions model categorizing sentiments across four dimensions: Pleasantness, Attention, Sensitivity, and Aptitude. This enables fine-grained emotion detection beyond binary positive/negative classification.
•Hybrid AI Architecture: Analyzed integration of symbolic AI (knowledge graphs, ontologies) with sub-symbolic AI (neural networks, embeddings) achieving both interpretability and accuracy. Hybrid approach shows 15-20% improvement over pure deep learning methods on benchmark datasets.
•Applications: Evaluated real-world applications including social media monitoring, brand sentiment analysis, customer feedback analysis, and emotion detection in customer service. Strong performance in nuanced domains like sarcasm detection and implicit sentiment recognition.
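
The concept-level idea from the 'Beyond Bag-of-Words' point can be illustrated with a toy sketch. It does not use the SenticNet API or its data: the lexicon, its scores, and the function name are hypothetical and exist only to show multiword concepts being scored as a unit.

```python
# Toy concept lexicon. The scores are made-up illustrations, NOT SenticNet values.
CONCEPT_POLARITY = {
    ("birthday", "party"): 0.8,
    ("surprise", "test"): -0.6,
}


def concept_polarity(text, lexicon=CONCEPT_POLARITY):
    """Average the polarity of two-word concepts found in `text`.

    Words are matched as pairs, so 'birthday party' is scored as one
    concept instead of as two individually neutral words.
    """
    tokens = text.lower().split()
    scores = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in lexicon:
            scores.append(lexicon[pair])
            i += 2  # skip past the matched concept
        else:
            i += 1
    return sum(scores) / len(scores) if scores else 0.0
```

With this toy lexicon, 'a birthday party' scores positive and 'a surprise test' scores negative, while a text with no known concepts scores 0.0.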

Auto Tagging System

ML-powered multi-label content classification system achieving high-precision automated tagging at scale.

2021

•Problem Context: Content platforms manage millions of documents requiring accurate categorization for search, recommendation engines, and content organization. Manual tagging is expensive and inconsistent. Built an automated classification system matching human-level tagging quality.
•Multi-Label Classification: Implemented multi-label classification assigning multiple relevant tags per content piece using Binary Relevance and Classifier Chains approaches. Example: tech article tagged with 'Machine Learning', 'Python', 'Tutorial', 'Beginner-Friendly' simultaneously.
•Feature Engineering: Developed comprehensive feature extraction pipeline combining TF-IDF vectors, word embeddings (Word2Vec, GloVe), and document embeddings (Doc2Vec). Incorporated metadata features, structural analysis, and Named Entity Recognition (NER) to boost classification accuracy.
•Model Architecture: Experimented with Logistic Regression (One-vs-Rest), Random Forest, XGBoost, and LSTM with attention mechanism. Final ensemble model combined top performers using weighted voting for robust predictions.
•Performance Results: Achieved Precision@5 of 0.89 and F1-Score of 0.84 on 50,000+ documents across 200+ tag categories. Strong performance on frequent tags with hierarchical classification maintaining accuracy on long-tail categories.
•Scalability & Deployment: Optimized for real-time inference with <100ms latency per document. Batch processing capability handling 10,000+ documents/hour. Modular architecture supports adding new categories via transfer learning without full retraining.
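
Two pieces mentioned above, weighted voting across models and the Precision@5 metric, can be sketched as follows. This is a minimal illustration, not the project code: the function names, the example weights, and the tag scores in the usage note are hypothetical.

```python
def weighted_vote(model_scores, weights):
    """Combine per-tag scores from several models into a weighted average.

    `model_scores` is a list of {tag: score} dicts, one per model;
    `weights` gives each model's relative weight.
    """
    total = sum(weights)
    combined = {}
    for scores, w in zip(model_scores, weights):
        for tag, p in scores.items():
            combined[tag] = combined.get(tag, 0.0) + w * p / total
    return combined


def precision_at_k(ranked_tags, true_tags, k=5):
    """Fraction of the top-k predicted tags that are correct."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for tag in ranked_tags[:k] if tag in true_tags)
    return hits / k


def rank_tags(combined):
    """Order tags from highest to lowest combined score."""
    return sorted(combined, key=combined.get, reverse=True)
```

For instance, combining `{"ml": 0.9, "python": 0.2}` and `{"ml": 0.5, "python": 0.8}` with weights `[3, 1]` ranks `ml` ahead of `python`; if only `ml` is a true tag, Precision@2 is 0.5.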

Get in Touch

I'm always open to discussing new opportunities and collaborations, or just having a chat about AI/ML, Formula 1, Barça, or golf. Feel free to reach out!

samarth.saxena1337@gmail.com