Credit risk projects for Banks, NBFCs and Fintechs

This course is designed for working professionals from different domains. The basic qualification is a background in engineering, IT, statistics or analysis; no programming background is required. The course is methodical and follows a step-by-step approach to bridge the gap to becoming a data scientist. The total duration is 4 to 5 months.

First month

We introduce the big picture. We start with a churn prediction project, which is domain agnostic since churn happens across all domains: telecom, banking, e-commerce and so on. We cover bivariate analysis, including fine classing, coarse classing, Weight of Evidence and Information Value. We build the first project using rpart in R. It is a single decision tree model, an easily interpretable algorithm widely used in the data science industry. We also cover how to quantify business impact using the data science model, a must-have skill for cracking high-value interviews. Here you also learn many data science skills in Microsoft Excel.
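As a rough illustration of the Weight of Evidence and Information Value step, here is a minimal sketch in plain Python. The tenure bands and counts are made up for illustration, the function name `woe_iv` is hypothetical, and the convention WoE = ln(% good / % bad) is an assumption (some teams use the inverse ratio). It also assumes every bin contains at least one good and one bad record.

```python
import math

def woe_iv(bins):
    """bins: list of (label, n_good, n_bad) tuples after coarse classing.
    Returns (rows, iv), where rows holds the WoE and IV contribution per bin."""
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    rows, iv = [], 0.0
    for label, good, bad in bins:
        pct_good = good / total_good      # share of all goods falling in this bin
        pct_bad = bad / total_bad         # share of all bads falling in this bin
        woe = math.log(pct_good / pct_bad)
        contrib = (pct_good - pct_bad) * woe
        rows.append((label, woe, contrib))
        iv += contrib
    return rows, iv

# Hypothetical churn data: counts of non-churners ("good") and churners ("bad")
tenure_bins = [
    ("0-12 months", 200, 120),
    ("13-36 months", 350, 60),
    ("37+ months", 450, 20),
]

rows, iv = woe_iv(tenure_bins)
for label, woe, contrib in rows:
    print(f"{label:>14}  WoE={woe:+.3f}  IV part={contrib:.3f}")
print(f"Information Value = {iv:.3f}")
```

A negative WoE marks a band where churners are over-represented, and the Information Value sums the per-bin contributions into a single measure of how well the variable separates the two groups.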

The next project is building the same churn model using Random Forest, a mainstream classification algorithm. We compare the advantages of both algorithms and discuss their application in Fintech credit risk projects, in particular an MSME renewal credit risk model. This is an application of the trade-off between risk classification and churn, a high business impact project that has helped many students crack high-CTC data science and credit risk interviews.

Second month

We introduce Python and construct building. Python is the mainstream programming language today in data science, analytics, machine learning and artificial intelligence. We learn the 7 pillars of Python programming for data science: subset, aggregation, data structure, loop, function, join and NumPy. Along with those, we learn the 6 most frequently encountered bugs in Python and how to handle them. By the end of this month, you can comfortably write and automate Python code using loops and functions.

Along with Python skills, we teach construct building in the second month. A construct is an approach to solving a data science problem, a must-have skill for a successful data scientist. In fact, how good you are at building constructs decides how successful you will be in your data science career. We pick a real business problem and solve it using at least 5 constructs, with the programming done in Python.

Third month

We teach machine learning algorithms, covering supervised and unsupervised machine learning problems. In supervised classification, we learn the industry approach to variable selection using Information Value, stepwise selection and so on. Model building is done using the Weight of Evidence transformation, a global practice in classification modelling. We also learn the dummy (one-hot) encoding method and its application.

In classification we will cover the Random Forest and XGBoost algorithms, which are among the most frequently used algorithms in the data science industry. We will also cover current industry techniques such as hyperparameter tuning with grid search, accuracy metrics (Gini, KS, AUC and the ROC curve) and model calibration using the offset and PDO (points to double the odds) method.
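The accuracy metrics and the PDO calibration mentioned above can be sketched in plain Python. This is a minimal illustration, not the course's own implementation: the labels and predicted default probabilities are invented, and the scaling parameters (`base_score=600`, `base_odds=50`, `pdo=20`) are assumed example values.

```python
import math

def auc(y_true, y_score):
    """Chance that a random bad (1) scores higher than a random good (0); ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks(y_true, y_score):
    """Largest gap between the cumulative score distributions of bads and goods."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in sorted(set(y_score)):
        cum_pos = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s <= t) / n_pos
        cum_neg = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s <= t) / n_neg
        best = max(best, abs(cum_pos - cum_neg))
    return best

def scale_score(pd_bad, base_score=600, base_odds=50, pdo=20):
    """Map a probability of default to a score: base_score at base_odds (good:bad),
    and pdo extra points every time the good:bad odds double."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1 - pd_bad) / pd_bad
    return offset + factor * math.log(odds_good)

# Hypothetical labels (1 = default) and predicted probabilities of default
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.05, 0.10, 0.35, 0.20, 0.80, 0.60, 0.40, 0.30]

model_auc = auc(y_true, y_score)
model_gini = 2 * model_auc - 1          # Gini is a rescaling of AUC
model_ks = ks(y_true, y_score)
print(f"AUC={model_auc:.3f}  Gini={model_gini:.3f}  KS={model_ks:.3f}")
print(f"Score at PD=2%: {scale_score(0.02):.0f}")
```

With these assumed parameters, a customer at 50:1 good-to-bad odds receives 600 points and a customer at 100:1 receives 620, which is what "points to double the odds" means.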

In regression we will cover Linear Regression, Random Forest Regression, XGBoost Regression, SVM, Lasso, Ridge Regression and so on. We will solve a house price prediction problem in this course, which is a famous data science problem. We will compare the accuracy of all the regression models using accuracy metrics: R-squared, adjusted R-squared and root mean square error. In unsupervised learning we cover clustering with Manhattan distance and Euclidean distance, all standard industry practice in data science.
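The three regression metrics used for model comparison can be computed as in the sketch below. The prices, predictions and the feature count of 2 are made-up values for illustration, and `regression_metrics` is a hypothetical helper name.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Return (R-squared, adjusted R-squared, RMSE) for one model's predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))   # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)              # total variation
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared penalises models that use more features
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    rmse = math.sqrt(ss_res / n)
    return r2, adj_r2, rmse

# Hypothetical house prices (in thousands) and one model's predictions
actual = [200, 250, 300, 350, 400, 450]
predicted = [210, 240, 310, 340, 390, 460]

r2, adj_r2, rmse = regression_metrics(actual, predicted, n_features=2)
print(f"R2={r2:.4f}  Adjusted R2={adj_r2:.4f}  RMSE={rmse:.1f}")
```

Running the same function on each model's predictions gives a like-for-like comparison. RMSE is in the units of the target, while the two R-squared values are unitless.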

Fourth and fifth months

We cover 8 data science projects across the credit risk, digital marketing and e-commerce domains.

1. Credit Risk Application scorecard model – covering probability of default, Ready Reckoner, explainability using Shapley values, reject inference and alternate data. It is a high-impact project that has helped many students crack high-CTC interviews.

2. Credit Risk Behaviour scorecard model – covering probability of default later in the tenure, cross-sell and upsell strategy, renewal, and PD/LGD/EAD/ECL calculation.

3. Credit Risk Collection scorecard model – early warning model, recovery prediction model, and High-Medium-Low recovery prediction model for the written-off pool.

4. Digital Lead Prioritization – how to prioritize digital leads into Premium Plus, Premium, Average, Low and Junk categories, where Premium Plus and Premium contribute 10 times the conversion and more than 50% of revenue. A high-impact data science and machine learning use case for increasing operational efficiency, which has helped many students crack high-CTC interviews.

5. Interest Rate propensity – an unsupervised algorithm that finds lookalike customers and recommends a first-time-right interest rate. A unique solution to increase conversion and reduce processing time, an efficiency use case.

6. Credit Risk ECL calculation – PD/LGD/EAD/ECL calculation and understanding how to use macroeconomic features.

7. Custom model building for consumer, microfinance and MSME segments.

8. Model testing, model monitoring and recalibration.

From the fourth month, students also start attending interviews, and most high-CTC data science interviews are cracked in these two months.

Handholding and live support: after cracking interviews, when students start working as data scientists, we provide complete support in the initial months until they feel confident in the new role. We make sure that the career transition into data science is successful and smooth.
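The PD/LGD/EAD/ECL calculation named in projects 2 and 6 can be illustrated with a minimal sketch. It assumes the simplest single-period form, ECL = PD × LGD × EAD per loan, with no discounting, staging or macroeconomic adjustment. The two loans and the helper name `expected_credit_loss` are made up for illustration.

```python
def expected_credit_loss(loans):
    """loans: list of dicts with pd (probability of default), lgd (loss given default,
    as a fraction) and ead (exposure at default, in currency units).
    Returns the per-loan ECL values and the portfolio total."""
    per_loan = [loan["pd"] * loan["lgd"] * loan["ead"] for loan in loans]
    return per_loan, sum(per_loan)

# Hypothetical two-loan portfolio
portfolio = [
    {"pd": 0.02, "lgd": 0.45, "ead": 100_000},
    {"pd": 0.10, "lgd": 0.60, "ead": 50_000},
]

per_loan, total_ecl = expected_credit_loss(portfolio)
for loan, ecl in zip(portfolio, per_loan):
    print(f"PD={loan['pd']:.0%}  LGD={loan['lgd']:.0%}  EAD={loan['ead']:,}  ECL={ecl:,.0f}")
print(f"Portfolio ECL = {total_ecl:,.0f}")
```

In practice each of the three inputs comes from its own model, and macroeconomic features typically enter through the PD estimate; this sketch only shows how the pieces combine.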