

DECISION TREE OR LOGISTIC REGRESSION – WHICH BASIC MODEL IS BETTER?

Kitti Fodor

Department of Business Statistics and Economic Forecasting, Faculty of Economics, University of Miskolc, Miskolc, Hungary

kitti.fodor@uni-miskolc.hu

 

Abstract: In this paper, I examine which of the data recorded in the Central Credit Information System influence loan default, use these factors in an analysis based on a decision tree and on logistic regression, and ask which of the two basic models performs better. For the analyses, I used a random sample of 500 items that reflects the proportions of performing and non-performing loans in the population. In both methods, one variable proved significant: the ratio of the repayment to the contract amount. Of the data recorded by the Central Credit Information System, this is therefore the most significant with respect to loan default. Comparing the two methods, I conclude that both achieve a high level of accuracy, but logistic regression produced the better results because it identified a higher proportion of defaulted loans. The decision tree, despite its higher classification accuracy, could not identify any defaulting loans; a possible reason is the unfavourable sample composition. Overall, the logistic regression categorized the transactions with 81.1% accuracy and achieved a better AUC value and a better Gini coefficient.
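The contrast between classification accuracy and the ability to identify defaults can be illustrated with a minimal sketch. The ten loans, the two sets of scores, and the 0.5 cut-off below are hypothetical values chosen for illustration; they are not the paper's data or its fitted models. The sketch assumes the common convention Gini = 2 × AUC − 1.

```python
# Toy illustration only: all labels and scores are made up (assumption),
# not taken from the Central Credit Information System sample.

def accuracy(y_true, y_pred):
    """Share of loans classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Share of actual defaults (label 1) that the model flags as defaults."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged)

def auc(y_true, scores):
    """AUC as the probability that a random default outranks a random
    performing loan (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(y_true, scores):
    """Gini coefficient derived from AUC (assumed convention: 2*AUC - 1)."""
    return 2 * auc(y_true, scores) - 1

# Ten hypothetical loans; 1 = default, 0 = performing.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Model A: assigns the same score to every loan, so it never predicts default.
score_a = [0.2] * 10
pred_a = [1 if s >= 0.5 else 0 for s in score_a]

# Model B: gives varying scores and flags some loans as defaults.
score_b = [0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.7, 0.8, 0.4]
pred_b = [1 if s >= 0.5 else 0 for s in score_b]

for name, pred, score in [("A", pred_a, score_a), ("B", pred_b, score_b)]:
    print(name, accuracy(y_true, pred), recall(y_true, pred),
          auc(y_true, score), gini(y_true, score))
```

In this toy setting, model A has the higher accuracy (0.8 against 0.7) yet finds no defaults and has an AUC of 0.5, while model B finds half of the defaults and has the higher AUC and Gini values. This is the kind of pattern in which accuracy alone can favour a model that is less useful for detecting defaults.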

 

 

Keywords: loan default; decision tree; logistic regression; random sample; classification; ROC curve

 

http://doi.org/10.47535/1991AUOES32(2)006

 
