Research Output

Neuro-Oncology & Medical Imaging

Current research at TCG CREST, in collaboration with the University of Pennsylvania, focuses on advanced DCE-MRI analysis of glioblastoma multiforme. The work combines voxel-wise parsimonious pharmacokinetic modelling and multiparametric habitat imaging with machine-learning pipelines for treatment-response classification. A second manuscript examines high-dimensional radiomic feature extraction across multimodal MRI sequences, achieving an AUC of 0.89 using LightGBM and Random Forest classifiers with SHAP-based interpretability.

Figure 1 panels: T1ce (pre-treatment); T2-FLAIR (oedema mapping); DCE Ktrans map; Ve parameter map.
Figure 1. Representative multiparametric MRI sequences from DCE-MRI glioblastoma analysis. Panels show T1-weighted contrast-enhanced, T2-FLAIR, and pharmacokinetic parameter maps (Ktrans, Ve) derived from voxel-wise Tofts modelling. Habitat regions are delineated by automated nnU-Net segmentation. Dataset ref.: TCG CREST × UPenn collaboration — Manuscript in preparation
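The Ktrans and Ve maps in Figure 1 come from voxel-wise Tofts modelling. In its standard form the Tofts model writes the tissue concentration as Ct(t) = Ktrans × ∫ Cp(τ) exp(−(Ktrans/Ve)(t − τ)) dτ, with the integral taken from 0 to t. The sketch below is a minimal discretised forward model for a single voxel in plain Python. The function name, the sampling interval, and the constant toy input are illustrative assumptions; the parsimonious model-selection and fitting steps used in the actual work are not described here and are not reproduced.

```python
import math


def tofts_tissue_curve(cp, dt, ktrans, ve):
    """Standard Tofts forward model for one voxel (rectangle-rule convolution).

    cp     : plasma concentration samples (arterial input function)
    dt     : sampling interval in minutes (assumed uniform)
    ktrans : transfer constant Ktrans in 1/min
    ve     : extravascular extracellular volume fraction, 0 < ve <= 1
    """
    if not 0.0 < ve <= 1.0:
        raise ValueError("ve must lie in (0, 1]")
    kep = ktrans / ve  # efflux rate constant back into plasma
    ct = []
    for i in range(len(cp)):
        # Discrete convolution of the input with the exponential kernel.
        acc = 0.0
        for j in range(i + 1):
            acc += cp[j] * math.exp(-kep * (i - j) * dt)
        ct.append(ktrans * acc * dt)
    return ct


# Toy input (assumption): constant unit plasma concentration for 20 minutes.
aif = [1.0] * 400
curve = tofts_tissue_curve(aif, dt=0.05, ktrans=0.2, ve=0.4)
```

For a constant input the analytic curve plateaus at Ve, which gives a quick sanity check on the discretisation before fitting real data.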

Machine Learning Pipeline & Model Architecture

The end-to-end radiomics pipeline integrates preprocessing, nnU-Net-based tumour segmentation, PyRadiomics feature extraction (150+ features per modality), feature selection via LASSO regularization, and ensemble classification with LightGBM and Random Forest. SHAP values provide post-hoc feature attribution aligned with clinical interpretability requirements.
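The LASSO feature-selection step can be illustrated with a small coordinate-descent sketch. It is a plain-Python illustration of the technique and not the pipeline's code: the solver, the linear (rather than logistic) loss, the penalty value `alpha`, and the toy matrix are all assumptions, and the inputs are assumed to be centred so that no intercept is needed. Features whose coefficient is shrunk exactly to zero are dropped.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_select(X, y, alpha, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + alpha*||w||_1 by coordinate descent.

    X is a list of rows with centred columns, y a centred response.
    Returns (indices of features with non-zero weight, weight list).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of feature j with the residual excluding its own fit.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / z[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    selected = [j for j in range(p) if w[j] != 0.0]
    return selected, w


# Toy data (assumption): feature 0 drives y, feature 1 barely, feature 2 not at all.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -1.95, -2.05]
selected, weights = lasso_select(X, y, alpha=0.1)
```

In a real radiomics setting the penalty would normally be tuned inside the cross-validation loop so that feature selection never sees the held-out fold.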

Figure 2: End-to-End Radiomics ML Pipeline
Stages: MRI input (DICOM / NIfTI raw data) → Preprocessing (SimpleITK; N4 / co-registration) → nnU-Net segmentation (tumour ROI) → PyRadiomics feature engineering (150+ features) → LightGBM + RF ensemble classification → SHAP explainability (XAI output)
Summary: AUC-ROC 0.89; 150+ features; 5-fold CV strategy; LASSO feature selection
Figure 2. Schematic of the end-to-end radiomics pipeline for glioblastoma classification. Raw DICOM input undergoes N4 bias-field correction and co-registration (SimpleITK), followed by automated segmentation (nnU-Net), high-dimensional feature extraction (PyRadiomics), LASSO-based feature selection, and ensemble classification with LightGBM and Random Forest. SHAP-based explainability maps feature contributions to model predictions.
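Figure 2 reports an AUC-ROC of 0.89 under a 5-fold CV strategy. As a reminder of what those two terms compute, the sketch below gives a plain-Python AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one) and a deterministic stratified fold assignment. Whether the actual pipeline stratifies or shuffles its folds is not stated, so both the stratification and the round-robin assignment are assumptions, and the labels and scores are toy values.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_kfold_indices(labels, k=5):
    """Split sample indices into k folds, keeping class proportions similar.

    Samples of each class are dealt round-robin across folds (no shuffling).
    """
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, l in enumerate(labels) if l == cls]
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return [sorted(fold) for fold in folds]


# Toy example (assumption): four cases, two per class.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
folds = stratified_kfold_indices([0] * 10 + [1] * 5, k=5)
```

Each fold would serve once as the held-out set, with the per-fold AUC values averaged to give the reported figure.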

Climate Science — Spatiotemporal Deep Learning

Ongoing first- and corresponding-author work develops a ConvLSTM–Transformer hybrid architecture for forecasting dry versus humid heatwaves over India from ERA5 reanalysis data. Physics-guided feature selection applies thermodynamic and hydrological thresholds to assign categorical labels. SHAP and Grad-CAM explainability analyses are then used to check whether the model has learned physically meaningful atmospheric mechanisms.
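Threshold-based categorical labelling of this kind can be pictured with the short sketch below. It is purely illustrative: the choice of daily maximum temperature and relative humidity as the two variables, the function name, and the numeric thresholds in the usage lines are hypothetical placeholders, not the criteria used in the manuscript.

```python
def label_heatwave(tmax_c, rel_humidity_pct, *, temp_threshold_c, humidity_threshold_pct):
    """Assign a categorical heatwave label from two thresholds.

    tmax_c                : daily maximum temperature in degrees Celsius
    rel_humidity_pct      : relative humidity in percent
    temp_threshold_c      : temperature at or above which a day counts as a heatwave
    humidity_threshold_pct: humidity at or above which a heatwave counts as humid

    Thresholds are required arguments because no defaults are given in the text.
    """
    if tmax_c < temp_threshold_c:
        return "none"
    if rel_humidity_pct >= humidity_threshold_pct:
        return "humid"
    return "dry"


# Hypothetical thresholds, chosen only to exercise the three branches.
labels = [
    label_heatwave(44.0, 20.0, temp_threshold_c=40.0, humidity_threshold_pct=50.0),
    label_heatwave(41.0, 70.0, temp_threshold_c=40.0, humidity_threshold_pct=50.0),
    label_heatwave(35.0, 80.0, temp_threshold_c=40.0, humidity_threshold_pct=50.0),
]
```

Applied per grid cell and time step, a rule of this shape would yield the class targets that a sequence model is trained to forecast.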

Figure 3: ERA5 Heatwave Spatial Distribution, India (Composite)
Colour-scale ticks: 48 °C, 43 °C, 38 °C, 33 °C. Region labels: Northwest India, Central India, Gangetic Plain, Deccan Plateau.
Figure 3. Composite spatial distribution of surface air temperature anomalies associated with dry heatwave events over India, derived from ERA5 reanalysis (0.25° resolution). The colour scale indicates temperature departure from the climatological baseline. Red regions denote severe dry heatwave occurrence, with events labelled categorically using thermodynamic and hydrological thresholds. Data source: ERA5, ECMWF Copernicus Climate Data Store. Manuscript in preparation (First Author).

Professional Experience

Artificial Intelligence Research Intern
May 2025 – Present
TCG CREST × University of Pennsylvania Collaboration
Advanced DCE-MRI analysis in glioblastoma using voxel-wise parsimonious pharmacokinetic modelling, multiparametric habitat imaging, and machine learning for treatment-response classification. Developing quantitative radiomics pipelines and contributing to multiple manuscripts on AI-driven biomarker discovery.
Subject Matter Expert — Statistics
Nov 2022 – May 2025
Chegg Inc. — Remote
700+ high-accuracy solutions across probability, regression, inference, optimization, and stochastic processes. Maintained 98%+ learner satisfaction rating across 2.5 years of continuous engagement.
Data Science Intern
Sep 2024 – Oct 2024
NIELIT Kolkata
ETL pipeline design using SQL, Hadoop, and MongoDB for heterogeneous large-scale datasets. Improved data retrieval efficiency via indexing, schema redesign, and distributed pipeline optimization (HDFS, MapReduce).