Sabari M

M.Tech in Computer Science & Engineering

Indian Institute of Technology, Kanpur

Email: msabari23@iitk.ac.in, sabarim@cse.iitk.ac.in, sabarim2131@gmail.com

LinkedIn: Sabari M

EDUCATION

M.Tech THESIS

Autonomous Swarm Drone System for Target Protection, Tracking, & Area Scouting (Mar ’24 - Present)

Guide: Prof. Nisheeth Srivastava

  • Designing a ROS2 + PX4 based swarm drone system with egocentric coordinates & minimal inter-bot communication.
  • Implementing LiDAR-based obstacle and collision avoidance (to be extended to computer vision) and validating it in Gazebo simulation.
  • Research Areas: Drones, Swarm Robotics, Egocentric Coordinates, Edge Computing, Computer Vision

INDUSTRY EXPERIENCE

Senior Associate (Senior ML/Data Innovation Engineer) | EY [ERNST & YOUNG] (May'20 - Jul'23)

  • Part of R&D, Prototyping, and Innovation team [NITRO] within EY GDS Client Technology.
  • Managed various aspects of Data/ML-centric applications including Big Data, ETL/ELT Data Pipelines, ML Experiments, Data Augmentation, Explainable AI, Continuous Learning & Human in the loop, MLOps & DevOps Pipelines [CI/CD].

Data Scientist | ATTINAD SOFTWARE (Oct'18 - Apr'20)

Software Engineer Trainee | WIPRO LIMITED (Jul'18 - Oct'18)

INTERNSHIP

Software Development Intern | TCS Remote Internship Programme | TATA Consultancy Services (May'17 - Aug'17)

RESEARCH PROJECTS

SemEval 2024: STR (Semantic Textual Relatedness) for Asian & African Languages (M.Tech - CS779) (Aug ’23 - Dec ’23)

  • Research Project under guidance of Prof. Ashutosh Modi
  • Investigated the current state of research in STR and proposed improvements. Evaluated multilingual models from HuggingFace with notable performance in STS (Semantic Textual Similarity) task and assessed their performance on STR.
  • Proposed and fine-tuned an E5 model using contrastive loss, achieving improved performance over the benchmark. Also proposed pretrained E5 embeddings + DBSCAN as an unsupervised solution.
  • Research Areas: Natural Language Processing, Semantic Textual Relatedness, Multilingual Models, Contrastive Loss, LLMs (Large Language Models)

A Survey of Optimizing Tools for Solidity Smart Contracts (M.Tech - CS738) (Jan ’23 - Apr ’23)

  • Research Project under guidance of Prof. Amey Karkare. Currently working on the final draft of a survey paper.
  • Reviewed Solidity smart contract optimizing tools, their functionalities and codebases. Investigated tools offering additional features such as vulnerability assessment, code coverage, & symbolic execution.
  • Proposed additional functionalities for Solidity compiler that are currently missing, but present in LLVM-based compilers.
  • Research Areas: Smart Contracts, Compiler Optimizations, Gas Optimization

TECHNICAL SKILLS

  • Machine Learning: TensorFlow, Keras, PyTorch, scikit-learn, XGBoost, Hugging Face, LLMs, GloVe, Word2Vec, TF-IDF, Elasticsearch, spaCy, NLTK, Stanford CoreNLP, OpenCV, YOLO, Tesseract, OCR, Azure ML, MLflow, Azure Cognitive Services, RASA, Sentence Transformers, Dialogflow, MS Bot Framework, Plotly, Matplotlib, VoTT, Doccano
  • Data Pipelines & Data Governance: PySpark, Pandas, NumPy, Modin, dbt, Prefect, Airflow, Airbyte, Purview, Immuta, Azure Data Factory
  • Data Storage: Delta Lake, Data Lake, Data Warehouse, PostgreSQL, MySQL, Redis Cache, MongoDB, Synapse, ADLS
  • Languages: Python, C, C++, Node.js
  • API: (REST API) FastAPI, Flask, Express, Restify & (GraphQL) Hasura
  • Dev Tools: Docker, Kubernetes, Git, Postman, Docker Compose, Jira, Azure DevOps, PyCharm, VS Code, Linux
  • Async: Azure - Service Bus Queue, Functions, Event Grid; RabbitMQ
  • Robotics: ROS2, PX4 Autopilot, Gazebo
  • Platforms: Databricks, Elastic, Jupyter, Azure ML Studio, Azure Data Factory, Kibana
  • Cloud: Azure

INDUSTRY PROJECTS

ESG Data Modeling (EY) (Aug’22 - Jun’23)

  • Worked on a modern Data Fabric Solution for ESG, to centralize access to diverse ESG data sources (Refinitiv & Ethos). Experimented with modern data stack components, including Prefect, DBT, Airflow, Airbyte. Developed ETL pipelines using Azure Data Factory, Databricks Workflow & PySpark (Python & SQL). Implemented Delta Lake Bronze-Silver-Gold data modeling with raw data in Azure Data Lake; tracked data lineage with Azure Purview. Also explored Azure Synapse.
  • Built database schemas in PostgreSQL and exposed data through GraphQL APIs using Hasura connected to the Database.

Automated Audit Record Processing (EY) (Jul’20 - Jun’22)

  • Built a solution for automated analysis of audit records through grouping of similar records, rule-based scoring & industry-standard mapping. Grouped text using sentence embeddings, cosine similarity & a similarity metric learned with a Keras Siamese model on pretrained embeddings (Hugging Face); fine-tuned a SQuAD-trained RoBERTa model on a custom dataset.
  • Developed multiple text scoring modules based on word limits, Flesch readability scores, and abbreviation expansion. Mapped records to standards using Elasticsearch, leveraging its multi-search and weighted search features.
  • Conducted exploratory data analysis (EDA) to determine the impact of tag components (description, category), and gathered insights from stakeholders. Utilized Azure ML Batch Inference Pipeline for parallel processing on clusters. Deployed trigger-based tasks as Azure Functions & FastAPI microservices in Azure WebApps with App Insights for tracking.

Document Parser & Entity Extraction (EY) (Jun’20 - Aug’22)

  • Parsed audit documents using PDFMiner & PyMuPDF. Extracted tables using Camelot, object detection + Tesseract OCR, and custom clustering of text blocks based on coordinates and heuristics.
  • Implemented Named Entity Recognition (NER) with spaCy & Azure Cognitive Services, and text classification using GloVe embeddings + deep learning (TensorFlow) and a multi-model solution with XGBoost + Sentence Transformer.
  • Designed & implemented MLOps pipelines for model fine-tuning, versioning, training stats reporting, and deploying across environments using Docker, MySQL, Azure ML Studio, Azure Functions & Azure DevOps Pipelines. Deployed functionalities as Python Flask microservices with Swagger and Azure AD-based Bearer token Authentication.

Stock Prediction & Analysis POC (EY) (Oct’20 - Jun’21)

  • Contributed to a Quantum Computing proof-of-concept (POC), focusing specifically on classical implementations for benchmarking. Performed Monte Carlo simulation for portfolio optimization based on Markowitz Portfolio Theory.
  • Developed the backend using FastAPI with Azure AD authentication. Sourced data from Yahoo Finance.

ML Model Reusability Platform (EY) (Jan’22 - Aug’22)

  • Contributed to designing and implementing a platform for reusing ML modules and models across the firm. Developed MLOps pipelines for model fine-tuning based on customer feedback and new use case data; packaged models and modules using Docker.
  • Created scripts for triggering MLOps pipelines and deploying them through Azure DevOps Pipelines.

Managerial Evaluation Tool (Attinad/Cymorg) (Oct’18 - Jan’20)

  • Created a custom chatbot framework capable of handling context and the waterfall model, utilizing Node.js, Redis Cache, MongoDB and RASA NLU. Built an Express API-based web app to serve the implementation.
  • Incorporated Semantic Similarity features using pretrained Transformer models & performed clustering of words for insight. Explored Dialogflow & Microsoft Bot Framework NodeJS SDK with LUIS.
  • Engaged in direct and continuous client interactions to integrate the latest AI trends into the product.

ACADEMIC PROJECTS

ML Algorithm Implementations (M.Tech - CS771) (Aug’23 - Dec’23)

  • Prototype-Based Zero Shot Classifier: Implemented a prototype-based zero-shot classification algorithm using class attribute vectors. Computed prototypes for unseen classes using two methods: (1) a convex combination of seen class means with a cosine similarity of class attribute vectors as weights, and (2) a regularized linear multi-output regression model trained to predict unseen class means based on their attribute vectors. Processing done using Numpy & Pandas.
  • Kernel Ridge Regression: Developed Kernel Ridge Regression using an RBF kernel, experimenting with varying regularization parameters. Also implemented a landmark-based variant that creates features from RBF kernel similarities to landmark points.
  • K-Means for Linearly Inseparable Data: Applied K-Means on linearly inseparable datasets by modifying features through: (1) handcrafted transformations, such as features defined in terms of the dataset mean, and (2) kernel-based techniques with landmarks.
  • Dimensionality Reduction: Utilized PCA and t-SNE for dimensionality reduction using Sklearn.

Machine Translation for Indian Languages (M.Tech - CS779) (Aug’23 - Dec’23)

  • Developed Seq2Seq and Transformer models in PyTorch for translating text between English and Indian languages. Explored individual models for each language pair & a single model for all.
  • Generated text using greedy & beam search decoding. Evaluated with chrF++, ROUGE, and BLEU scores, with a total performance score of 80%.
  • Performed EDA and text cleaning/preprocessing in Pandas, including grouping using DBSCAN + FastText embeddings, word count analysis, and filtering based on invalid characters, reducing the size of the dataset by 30%. Also explored data augmentation strategies.

Advanced Compiler Optimization Techniques using LLVM (M.Tech - CS738) (Jan’24 - Apr’24)

  • LLVM Infrastructure Setup & Analysis: Installed LLVM from source and analyzed its source code to understand its modular design, architecture, 3-phase compilation, and optimization passes such as Dead Code Elimination & Loop Unrolling.
  • IR Customization and Optimization: Updated LLVM source code to include metadata in Intermediate Representation (IR) visible with the -emit-llvm flag. Analyzed optimization levels (-O1, -O2, -O3) with C++ examples and explored limitations of LLVM optimizations, particularly those constrained by runtime data or domain-specific knowledge.
  • Control Flow and Data Flow Analysis: Performed Available Expression and Live Variable analysis on Control Flow Graphs, assessing the impact of initialization strategies on analysis accuracy.

Visual Exploration of Indian Election Statistics 2019 (M.Tech - CS661) (Jan’24 - Apr’24)

  • Developed an interactive dashboard to analyze and interpret the 2019 Indian Elections, offering insights into candidate success factors and voter behavior. EDA and data cleaning/processing performed over the dataset using Pandas.
  • Created visualizations and facilitated user-friendly data exploration using Dash & Plotly.

Classroom Exam Proctoring System (M.Tech - CS724) (Aug’23 - Dec’23)

  • Developed a classroom proctoring system using Computer Vision techniques, including Face Detection, Head Movement Detection, and Eye Tracking using OpenCV & DLib.

Autonomous Navigation Bot with Computer Vision (B.Tech Final Project) (Apr’17 - Apr’18)

  • Developed an autonomous rover using an InceptionV3 Image Classification model, OpenCV for Image Processing and a Flask app for streaming the camera feed, with a Raspberry Pi as the core processing unit & Arduino for controlling motors.

SELF PROJECTS

Exploration of Advanced NLP and Computer Vision Technologies (Self Project)

  • Analyzed LLM research papers and considered fine-tuning strategies. Explored LangChain for integrating language models into complex workflows, evaluated RAG (Retrieval-Augmented Generation), and assessed CrewAI for advanced AI agent-based solutions.
  • Investigated Vision Transformers (ViTs) for improved image classification & feature analysis, Generative Adversarial Networks (GANs) for high-quality image generation & SimCLR for self-supervised learning.

SCHOLASTIC ACHIEVEMENTS

CERTIFICATIONS

POSITIONS OF RESPONSIBILITY

RELEVANT COURSES