Mihir Bansal
Hey there! Thanks for visiting my webpage!
I am a Master's student at Carnegie Mellon University, specializing in Data Science. Previously, I worked as a Software Engineer at Microsoft, improving search engines at scale and using LLMs to improve the ranking and summarization of web search results. I graduated from BITS Pilani with a bachelor’s degree in Computer Science and a minor in Data Science. I have worked on projects involving deep learning, time series analysis, social network analysis, and NLP question answering, collaborating with prestigious institutions and organizations such as JPMorgan Chase, the University of Hamburg, and CSIR-CEERI.
I am excited about building tools for reproducible AI research: developing new algorithms and designing scalable software that helps train, visualize, and evaluate machine learning models in real time on static and dynamic datasets. I am proficient in C, C++, Java, C#, and Python, and have experience with frameworks such as TensorFlow, PyTorch, and Keras.
Please feel free to reach out to me at mihirban@andrew.cmu.edu.

News

  • [Sep 2023]: Our paper on "Using Wikidata for Enhancing Compositionality in Pretrained Language Models" was presented at the "14th International Conference on Recent Advances in Natural Language Processing" in Varna, Bulgaria, and published in the ACL Anthology.
  • [May 2023]: Joined Carnegie Mellon University to pursue a Master's degree in Data Science.
  • [Jul 2022]: Joined Microsoft in Hyderabad as a Software Engineer.
  • [Feb 2022]: Our paper on "Cuckoo search in threshold optimization for better event detection in social networks" was published in the "Social Network Analysis and Mining" journal by Springer.
  • [Jan 2022]: Joined JPMorgan Chase & Co. in Mumbai as a Quantitative Research Analyst Intern.
  • [Aug 2021]: Started collaborating with the Language Technology group at the University of Hamburg on an NLP project.
  • [May 2021]: Joined Microsoft in Hyderabad as a Software Engineering Intern.
  • [May 2020]: Joined CSIR-CEERI in Pilani as a Research Intern.
  • [Aug 2018]: Joined BITS Pilani in Hyderabad to pursue a Bachelor's degree in Computer Science.

Education

Carnegie Mellon University, Pittsburgh, PA, United States
Master of Computational Data Science  
(May 2023  -  Dec 2024)
  • CGPA: 3.93/4.00
  • Teaching Assistant for the course 'Advanced Natural Language Processing'
  • Courses: Machine Learning, Advanced NLP, Search Engines, Cloud Computing
Birla Institute of Technology & Science (BITS) Pilani, India
B.E.(Hons.) in Computer Science, Minor Degree in Data Science  
(Aug 2018  -  Jun 2022)
  • CGPA: 9.37/10.00
  • Teaching Assistant for the courses 'Object Oriented Programming' and 'Mathematics III (Differential Equations)'
  • Activities: Competitive Programming, Member of Student's Union Technical Team (Android App Development), Member of Movie Club (Video Editing Team), Pianist, Lawn Tennis player

Experience

Software Engineer, Microsoft
(July 2022  -  Aug 2023)
Improving Search Experience for Natural Language Queries on Outlook Search: I worked in the MSAI (Microsoft Search, Assistant & Intelligence) Team in Hyderabad, India, on improving the search experience for calendar- and acronym-related queries on Outlook search. I improved the ranking of acronym search results by performing multigram matching with top user AI graph nodes, improving user engagement by 14%. I also used GPT-3 to develop a webpage summarizer that generates a summary and FAQs for Bing web search results.
Quantitative Research Analyst Intern, JPMorgan Chase & Co.
(January 2022  -  June 2022)
Building a Rating Migration Model with Z-Factor approach: I worked in the Wholesale Credit Risk Team in Mumbai, India, on building a Rating Migration model using a Z-factor approach, which converts a Rating Migration matrix to a single value with minimal loss of information. The Z-values are then modeled against macroeconomic scenarios in order to project scenario-driven rating migrations in the future.
Research Intern, University of Hamburg
(August 2021  -  June 2022)
Enhancing Semantics in Pretrained Language Models: I worked in the Language Technology Group with Prof. Biemann on fine-tuning the BERT pretrained language model to improve semantics-based question answering. We fine-tuned BERT with a knowledge graph mined from Wikidata and achieved a significant improvement on the GLUE score.
Software Engineering Intern, Microsoft
(May 2021  -  July 2021)
Improving the Relevance of Natural Language Queries on Outlook Search: I worked in the MSAI (Microsoft Search, Assistant & Intelligence) Team in Hyderabad, India, on improving the relevance of person-based natural language queries on Outlook Search. I optimized the REST API calls that identify group participants in a user's meetings on Outlook, improving the execution time of the Time Based Assistant by 60.8% during the bootstrapping process and by 80.0% during successive executions.
Research Intern, CSIR-CEERI
(May 2020  -  July 2020)
Deep-Learning approach for Air Pollution Forecasting: I performed time series analysis by implementing deep learning and statistical models such as LSTM, ARIMA, and SVR in Python to predict the concentration of the air pollutant PM2.5, using hourly meteorological and air pollution data from the Delhi-NCR region. I analyzed the change in PM2.5 concentration in Delhi due to the lockdown by comparing the predicted and actual concentrations over a 6-month period.

Publications

Using Wikidata for Enhancing Compositionality in Pretrained Language Models
Meriem Beloucif, Mihir Bansal, Chris Biemann
Cuckoo search in threshold optimization for better event detection in social networks
B.S.A.S Rajita, Mihir Bansal, Bipin Sai Narwa, Subhrakanta Panda

Projects

This is an Emotion Detection project that classifies the emotion of a person in an image as 'sad' or 'not sad', using Genetic Algorithms to select the best subset of features. It has been developed in Python with libraries such as NumPy, pandas, scikit-learn, and Matplotlib. The image dataset has been preprocessed using the OpenFace Toolkit, which generates the facial landmarks of each input image.
This is an Android application, developed in Java, using the Firebase database server. It supports scheduling Student-Professor meetings, a Laundry Management System, and cab sharing.
This is an Online Cab Booking application for booking cabs according to customers' requirements. It is developed in Java Swing, with MySQL as the backend server for the application. The server and the Java application communicate through the JDBC driver. The application supports multiple customers simultaneously.
This is a Python application that follows a Client-Server architecture and uses the Shift Repeat protocol to provide reliable transport over UDP. To implement reliability on top of UDP, we designed an application-layer middleware protocol. The proposed protocol uses a 3-way handshake to establish connections reliably. It uses timers, ACKs, and sequence numbers to ensure that lost or corrupted packets are recovered and delivered in order.
This is a Java application based on Blockchain technology that helps users maintain a secure cryptocurrency exchange account. Users are authenticated through a Zero Knowledge Proof. For cryptocurrency transactions, the Elliptic Curve Cryptography algorithm is used to generate each user’s private and public keys as a KeyPair. Transactions between two users are then hashed with the public key of the organization, and the transaction block is added to the transaction history of the authenticated user.
This is an implementation of a compiler for a miniature programming language, 'C++++', which can parse sequential statements, conditional constructs, loop constructs, and functions. It has a DFA-based Lexical Analyser that recognizes the basic lexemes of code written in C++++. The generated tokens are then parsed, and the result is output as a Parse Tree. When it encounters errors, the parser reports them and continues parsing.

Achievements

  • Recipient of Institute Merit Scholarship for academic excellence in every semester at BITS Pilani. [2018-2022]
  • Recipient of Scholar's Blazer for six consecutive years of academic excellence from Class VI to Class XI. [2017]