Portfolio

Niraj Mohabey

Data Scientist building machine learning, retrieval, and AI systems for production use.

I got into data science to build systems that people can actually rely on.

I build machine learning and retrieval systems delivered through data pipelines and product-facing tools.

Python, SQL, ML Systems, Retrieval Systems

About

What I build

I build practical machine learning systems that work reliably in real-world environments.

Impact comes from more than model accuracy. Systems need to be scalable, reliable, and aligned with the decisions they support.

My work spans document intelligence, pricing, fraud detection, and customer analytics, connecting modeling with infrastructure such as retrieval pipelines, real-time inference, and internal tools.

Most of what I build sits at the intersection of machine learning, backend systems, and business context.

Projects

Selected work

These projects show how I work: framing problems, building systems, and driving real decisions.

Customer Churn Prevention System

End-to-end churn analytics platform integrating modeling, explainability, SQL analysis, and ROI simulation.

Built an XGBoost churn model with 90% precision and cut detection latency by 40% using real-time analytics.

Python, XGBoost, Kafka, PostgreSQL

Real-Time Graph Fraud Detection

Graph fraud detection system for real-time transaction monitoring and fraud-ring identification at production latency.

Detected 95% of fraud rings using a real-time graph pipeline that monitors transactions at ~100 ms latency.

PyTorch, Kafka, Flink, Python

Customer Support Agent Workflow

Multi-step agent system for handling customer queries using tool selection, retrieval, and escalation workflows.

Designed agent routing with tool selection and confidence scoring, enabling automated resolution and fallback to human escalation.

LangGraph, OpenAI, Qdrant, FastAPI

Streaming Data Pipeline for Event Processing

Distributed streaming pipeline for ingesting, processing, and storing real-time event data.

Built a Kafka-Spark pipeline that processes streaming data into Cassandra, with Airflow orchestration and fault-tolerant ingestion.

Kafka, Spark, Airflow, Cassandra

Experience

Work history

Data Science Intern

Liquidity AI Capital Corp.

02/2026 - Present

  • Built an internal AI assistant using Python APIs and WhatsApp workflows to automate deal alerts and streamline analyst coordination.

  • Designed lead intelligence pipelines that transformed raw prospecting data into structured signals for M&A analysis and deal sourcing.

  • Developed data ingestion and enrichment workflows that improved investment data quality and significantly accelerated deal screening decisions.

AI Intern

Right Skale Inc.

09/2025 - 11/2025

  • Architected a multi-tenant RAG platform enabling secure document ingestion and retrieval, reducing enterprise document lookup time by 60 percent.

  • Improved retrieval accuracy by 35 percent using embedding-based search, hybrid retrieval strategies, reranking pipelines, and scalable vector databases.

  • Optimized LLM inference pipelines with caching and parallel execution, reducing latency by 40 percent while increasing system throughput threefold.

Data Science Intern

Sunrise Group USA LLC

06/2022 - 11/2023

  • Built a loan risk prediction model using support vector machines that achieved 83 percent accuracy, improving the reliability of credit evaluation decisions.

  • Developed a medical cost prediction model with an interactive Streamlit interface enabling scenario analysis and better decision-making for stakeholders.

  • Implemented preprocessing and feature engineering pipelines transforming raw financial and insurance datasets into structured, high-quality model-ready inputs.

Sponsored Work

Industry projects

Applied projects with external partners, built with the same systems mindset as my internships.

Enterprise GenAI Assistant

Fallon Health

08/2025 - 12/2025

  • Developed a Copilot Studio assistant enabling secure question answering over healthcare provider contract data within a controlled enterprise environment.

  • Designed prompt engineering and evaluation frameworks improving response accuracy, consistency, and robustness across complex healthcare contract queries.

  • Integrated structured retrieval workflows to enable reliable, context-aware responses over sensitive enterprise healthcare documentation and knowledge systems.

Fixed-Income ETF Pricing Engine

Mitsubishi UFJ Financial Group

01/2025 - 05/2025

  • Built a fixed-income ETF pricing engine modeling over 5,000 bonds with less than 0.5 percent deviation from Bloomberg benchmark pricing.

  • Developed real-time pricing dashboards improving portfolio risk evaluation and enabling faster, data-driven trader decision-making workflows.

  • Translated quantitative financial models into production-ready Python and SQL pipelines with integrated analytics and interactive visualization tooling.

Skills

Capabilities

ML & AI Systems

RAG, LLM pipelines, Prompting, Evaluation, Agents, Explainability

Machine Learning

Prediction, NLP, Time-series, Graph ML, Features, Validation

Data & Infrastructure

Python, Spark, Airflow, Snowflake, Kafka, PostgreSQL, Data Modeling

Tools & Platforms

Docker, MLflow, Tableau, Power BI, AWS, Streamlit, Qdrant, OpenAI

Contact

Get in touch

I typically respond within 24 hours.