Hi, I am
Hardeek Agarwaal
I am a
Statistical Storyteller based in the United States. Welcome to my digital space, where data comes to life through compelling narratives. I specialize in transforming complex datasets into clear, actionable insights that drive impactful decisions. Using advanced machine learning tools and techniques, I uncover statistically significant results that tell the true story behind the numbers. Whether you're looking to optimize processes, predict trends, or unlock hidden patterns, I'm here to help you navigate the data landscape with precision and creativity. Let's turn your data into a powerful story together.

Data Scientist

Built AI-powered leadership enablement agents at TARIY using AutoGen, LangChain, LangGraph, and LangSmith to deliver personalized coaching, real-time roleplay simulations, and adaptive training experiences through multi-agent orchestration. Collaborated with cross-functional teams and business units to implement agents as Microsoft Teams plugins, enabling seamless integration into existing workflows.

Data Scientist

Architected comprehensive ML pipelines and ETL workflows, and designed predictive models using XGBoost and neural networks for customer analytics and fraud detection. Built scalable data processing infrastructure with Tableau, Snowflake, SQL, and Python, achieving a 15% improvement in model accuracy. Collaborated with product teams to deliver end-to-end data solutions across multiple business units.

Data Engineer Intern

Optimized ETL workflows for the School Fuel database using an event-driven architecture built on 5+ AWS services (Lambda, S3, Glue, Athena, IAM), integrated real-time alerts via SNS, and enhanced monitoring with CloudWatch, achieving up to 40% faster processing and a 25% improvement in reliability.

Data Analyst Intern

Developed interactive visualization dashboards, resolved missing data and inconsistencies, automated workflows, and collaborated with data scientists to improve model accuracy and maintain data integrity.

AI Notes Architecture

AI Notes Generator and Interview Simulator

Built a production-scale, AI-powered e-learning platform that processes 50+ educational videos, with automated note generation using AWS Transcribe and a fine-tuned GPT-3.5-turbo. Implemented a vector-based Q&A system with FAISS and a RAG architecture, achieving 90% accuracy in content extraction and 85% relevance in automated assessments with <150ms response time.

Agentic RAG Architecture

Agentic RAG System for Financial Report QA

Built a web-scraping agent using Firecrawl, HTTP calls, and an agentic tool architecture to extract content from, and answer queries about, reports from the top 50 NASDAQ companies. Integrated GPT-3.5-turbo via Azure OpenAI with LangGraph, FAISS, Postgres, and n8n, achieving <200ms latency and a 24% gain in QA precision.

Stock Market Pipeline Architecture

Stock Analysis Pipeline

Built a real-time stock market data pipeline processing 1M+ records daily using Apache Kafka, AWS services (S3, Athena, Glue, Lambda), and Python. Implemented streaming analytics with 99.9% uptime, automated ETL workflows, and interactive dashboards for market analysis and predictive insights.

Customer Segmentation for ISPs (Cox & Starlink)

Developed a RoBERTa-based recommendation system that analyzes 50K+ social posts for customer segmentation, achieving 92.7% accuracy and reducing latency from 250ms to 35ms using model distillation and TorchScript. Fine-tuned RoBERTa on 100M+ tokens using transfer learning, and deployed a lightweight version of the model on AWS EC2 with auto-scaling.

Perplexity-like Search Engine

Built a production-scale search engine using Flask, GPT-3.5-turbo, and vector embeddings, handling 10K+ queries daily. Implemented semantic search with FAISS and real-time web scraping, and deployed on AWS Fargate (ECS) with auto-scaling, achieving <300ms response time and 94% user satisfaction.

Movie Recommendation System

Built a production-scale recommendation system processing 10M+ movie records using a Two-Tower architecture and SVD collaborative filtering. Integrated LLM-powered content generation to add variety and improve personalization, reaching 92% recommendation accuracy, and deployed with Amazon SageMaker and S3 storage.

USA Airlines Delay Analysis Dashboard

Developed an interactive Tableau dashboard analyzing 2M+ flight records across all US airlines, processing real-time delay data with 99.5% accuracy. Implemented advanced analytics with Python, SQL, and Tableau Server, enabling stakeholders to identify delay patterns and operational inefficiencies, resulting in a 15% improvement in decision-making speed.

Contact me