Manoj Patra's Blog: Credit Scoring on German Credit Dataset

----------------------------------------------------------------------------------------------------------

by Manoj Patra and Nikita Naidu


----------------------------------------------------------------------------------------------------------

Abstract

The main aim of this project is to build a credit scoring model. Credit scoring, or credit risk assessment, is an important research problem in the banking industry. Its major challenge is to identify profitable customers by predicting which applicants are likely to default. The data set for the experiment is taken from the UCI Machine Learning Repository. The German Credit data set holds data on 1000 past credit applicants, each described by 20 attributes and rated as “Good” or “Bad” credit. The resulting model can be used to determine whether a new applicant presents a good or bad credit risk, and therefore whether the loan should be granted.

----------------------------------------------------------------------------------------------------------

1.0 Introduction

A customer credit scoring model is a statistical method used to predict the probability that a loan applicant or existing borrower will default or become delinquent. It is built from the characteristics of a large number of historical samples, in order to isolate the effects of various applicant characteristics on delinquencies and defaults. With the rapid growth of credit cards and other forms of personal consumer credit, preventing credit risk has become a major concern for financial institutions. It is therefore essential to establish a credit scoring model that matches customer characteristics and can support decision makers. Credit scoring models have been widely studied in statistics, machine learning, and artificial intelligence. Their advantages include reducing the cost of credit analysis, enabling faster credit decisions, closer monitoring of existing accounts, and prioritizing collections. With the growth of the credit industry and the large loan portfolios under management today, the industry is actively developing more accurate credit scoring models. The main aim of this project is to compare the performance of typical methods. We consider four classification methods: Logistic Regression, Multilayer Perceptron, Decision Tree and Support Vector Machine.

2.0 Problem Definition

Credit scoring, or credit risk assessment, is an important research problem in the banking industry. Its major challenge is to identify profitable customers by predicting which applicants are likely to default. The question is how to decide whether a loan should be granted to a new customer. We want to obtain a model that may be used to determine whether new applicants present a good or bad credit risk. The data set contains 1000 instances and 20 attributes. We have to build a model that predicts the class of a new customer, i.e. good or bad. We also measure the performance of each model using Accuracy, Sensitivity, Specificity and the ROC curve in order to select the best model.
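To make the evaluation measures concrete, here is a minimal sketch of how accuracy, sensitivity and specificity follow from a confusion matrix. It assumes that "bad" is treated as the positive class, which is a choice made only for this illustration. The function names and the ten toy labels are hypothetical and are not results from the project.

```python
def confusion_counts(actual, predicted, positive="bad"):
    """Count TP, FP, TN, FN, treating `positive` as the class of interest."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted bad, really bad
            else:
                fp += 1  # predicted bad, really good
        else:
            if a == positive:
                fn += 1  # predicted good, really bad
            else:
                tn += 1  # predicted good, really good
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all applicants that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of truly positive applicants that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Share of truly negative applicants that the model left alone."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Toy labels, invented purely for illustration (not project data).
actual = ["bad", "bad", "bad", "good", "good",
          "good", "good", "good", "good", "good"]
predicted = ["bad", "bad", "good", "good", "good",
             "good", "good", "good", "bad", "good"]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```

On an imbalanced data set, accuracy alone can look good even when most "bad" applicants are missed, which is why sensitivity and specificity are reported alongside it.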

3.0 Methodology Followed

Classification is a data mining (machine learning) technique used to predict the class label of data instances. In this project, we use four different classification techniques to predict the class label: Logistic Regression, Multilayer Perceptron, Support Vector Machine and decision tree induction. We use the WEKA suite to build and illustrate the different models. The outputs of these models are shown below in tabular form.

(Comparison of different models in tabular form)

1. Multilayer Perceptron:

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. The output we got from this model is attached below.

2. Logistic Regression:

3. Decision Tree:

4. Support Vector Machine:

5. Tree Generated:

6. Logistic Regression with SMOTE 100%

7. Support Vector Machine with SMOTE 100%

8. ROC Curve

9. Work flow Diagram

4.0 Research and Discussion

We first went through all the models available in the WEKA suite. The best results we obtained are described above in tabular form. The first result was an accuracy of 73.5% using Multilayer Perceptron. Logistic Regression then gave an accuracy of 76.5%, followed by 77.0% and 77.5% using Decision Tree and Support Vector Machine respectively. All of the above results were obtained by splitting the data set into 80% training data and 20% test data. These results were not as good as we wanted, so we applied SMOTE with Logistic Regression and with Support Vector Machine. With these two models the accuracy rose to 79.6% and 80.6% respectively.
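For readers unfamiliar with SMOTE, the idea can be sketched in a few lines. SMOTE creates synthetic minority-class rows by interpolating between a minority row and one of its nearest minority neighbours. A setting of 100% adds one synthetic row per original minority row. The sketch below is not the WEKA filter used in the project. It assumes purely numeric features (the German Credit data also has nominal attributes, which need a different distance measure) and only handles percentages that are multiples of 100. The function name, parameters and toy rows are hypothetical.

```python
import math
import random


def smote(minority, percentage=100, k=5, seed=0):
    """Return synthetic rows for a list of numeric minority-class rows.

    percentage=100 creates one synthetic row per original minority row.
    Simplification: only multiples of 100 are handled in this sketch.
    """
    rng = random.Random(seed)
    per_row = percentage // 100  # synthetic rows per original row
    synthetic = []
    for i, row in enumerate(minority):
        # k nearest minority neighbours of `row`, excluding the row itself
        others = [r for j, r in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda r: math.dist(row, r))[:k]
        for _ in range(per_row):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position on the segment row -> neighbour
            synthetic.append(
                [a + gap * (b - a) for a, b in zip(row, neighbour)]
            )
    return synthetic


# Toy minority rows, invented for illustration (two numeric features each).
minority_rows = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 4.0]]
new_rows = smote(minority_rows, percentage=100, k=2, seed=1)
print(len(new_rows), "synthetic rows created")
```

Oversampling of this kind is usually applied to the training portion only, so that synthetic rows do not leak into the data used for evaluation.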

5.0 Conclusion

Classification is a form of data analysis that extracts models describing important data classes. We have developed an effective and scalable model using SMOTE in combination with Support Vector Machine and 10-fold cross-validation (10 FCV). We have evaluated the model using several metrics including accuracy, sensitivity, specificity, Mean Absolute Error, Root Mean Square Error and Relative Absolute Error. 10-fold cross-validation is recommended for accuracy estimation, and significance tests and ROC curves are useful for model selection.
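As a rough illustration of 10-fold cross-validation, the sketch below assigns row indices to ten stratified folds and yields one train/test split per fold. It is not WEKA's implementation. The function names are hypothetical, and the label list assumes the 700 "good" / 300 "bad" class distribution documented for the UCI German Credit data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each row index to one of k folds with similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Assumed class distribution: 700 good and 300 bad applicants.
labels = ["good"] * 700 + ["bad"] * 300
splits = list(cross_validation_splits(labels, k=10))
for train, test in splits:
    bad_in_test = sum(1 for idx in test if labels[idx] == "bad")
    print(len(train), len(test), bad_in_test)
```

Every row is used for testing exactly once, and the reported accuracy is the average over the ten test folds, which makes the estimate less dependent on one particular 80/20 split.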


Wednesday, November 4, 2015
