Scoring development in the ITCB (Bankárkéző) solution covers checking the consistency of the base data, grouping individual variables in a way that experts can accept, and (typically) building logistic regression models. The business-interpretable scorecard is derived from the final statistical model and can be modified after expert consultation if required. The solution also includes the initial validation of the final scorecards.

The steps for developing rating systems (scoring and rating models) follow international best practice and include the scoring development steps accepted in international banking groups:


Base Data Analysis

Examination of the collected base data in order to filter out inconsistent data and variables, as well as variables and cases with incomplete data. During the data analysis, we create the fixed reference data set (Reference Data Set, "RDS") either in our own data mining tool (SPSS PASW Modeler) or in the infrastructure provided by the Bank (e.g. SAS Base, SAS Enterprise Guide, SAS Enterprise Miner), so that every step of the development can be traced during any subsequent inspection. The RDS is stored in a separate file, which the Bank must keep until the next development to ensure reproducibility. The RDS is the final, filtered modeling database. It excludes the so-called "grey zone", that is, customers who cannot be part of the development (e.g. customers suspected of fraud, or late-paying customers who cannot be classified clearly as good or bad debtors).
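The filtering described above can be illustrated with a minimal Python sketch. It is not the SPSS or SAS workflow itself. The `Customer` fields, the consistency rules and the 30 to 90 days-past-due grey-zone band are all hypothetical assumptions chosen for illustration; in practice they are agreed with the Bank.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    # Hypothetical record layout, not taken from any real data model.
    customer_id: str
    income: Optional[float]        # None marks a missing value
    days_past_due: Optional[int]   # None marks a missing value
    suspected_fraud: bool = False


# Assumed grey-zone band: late, but not clearly good or bad.
GREY_DPD_RANGE = range(30, 91)  # 30..90 days past due


def is_complete(c: Customer) -> bool:
    return c.income is not None and c.days_past_due is not None


def is_consistent(c: Customer) -> bool:
    # Only called for complete records; negative values are treated as errors.
    return c.income >= 0 and c.days_past_due >= 0


def in_grey_zone(c: Customer) -> bool:
    return c.suspected_fraud or c.days_past_due in GREY_DPD_RANGE


def build_rds(raw):
    """Return (rds, excluded), where excluded maps customer_id to a reason."""
    rds, excluded = [], {}
    for c in raw:
        if not is_complete(c):
            excluded[c.customer_id] = "incomplete"
        elif not is_consistent(c):
            excluded[c.customer_id] = "inconsistent"
        elif in_grey_zone(c):
            excluded[c.customer_id] = "grey_zone"
        else:
            rds.append(c)
    return rds, excluded
```

Keeping the exclusion reason for every dropped case is one way to support the traceability requirement: the filtered set and the list of exclusions can be stored together with the RDS file.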


Definition of default

Definition of default based on which customers the client considers bad and, where appropriate, analysis and development of the client's own definition in accordance with Basel regulations. We analyze each default reason separately and aim to apply the full Basel definition.
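A minimal sketch of a default flag that records each default reason separately is shown below. It assumes the two Basel triggers (more than 90 days past due on a material obligation, and unlikeliness to pay). The `Exposure` fields, the reason labels and the materiality amount of 100.0 are hypothetical; the actual thresholds and unlikeliness-to-pay indicators depend on the client and the applicable regulation.

```python
from dataclasses import dataclass
from typing import List

DPD_THRESHOLD = 90  # default is triggered by MORE than 90 days past due


@dataclass
class Exposure:
    # Hypothetical record layout for illustration only.
    customer_id: str
    days_past_due: int
    overdue_amount: float
    unlikely_to_pay: bool  # set by the bank's own indicators


def default_reasons(e: Exposure, materiality: float = 100.0) -> List[str]:
    """List every default reason that applies, so each can be analyzed separately."""
    reasons = []
    if e.days_past_due > DPD_THRESHOLD and e.overdue_amount >= materiality:
        reasons.append("past_due_90")
    if e.unlikely_to_pay:
        reasons.append("unlikely_to_pay")
    return reasons


def is_default(e: Exposure, materiality: float = 100.0) -> bool:
    return bool(default_reasons(e, materiality))
```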


Single Factor Analysis

Single-variable data analysis in which variables that move closely with the default rate are analyzed and selected. The purpose of this step is to establish clear trends between the values of the variable and the default rate. Because of the scorecard logic, we categorize each relevant variable with the help of the data mining tool, which supports an optimal categorization of variables and captures non-linear effects. Only variables that have a sound economic (expert) interpretation can be included in the modeling. In the methodology we follow, stability over time is very important, even at the level of individual variables, so we leave out variables that are unstable over time.
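To illustrate what a clear trend between a categorized variable and the default rate means, the sketch below computes the default rate per category and checks whether it moves in one direction. The bin edges are supplied by hand here, which is an assumption: the optimal categorization itself is done in the data mining tool and is not reproduced. The function names are hypothetical.

```python
from bisect import bisect_left


def default_rate_by_bin(values, defaults, edges):
    """Default rate per bin.

    edges are sorted upper bounds: bin i holds edges[i-1] < v <= edges[i],
    and one extra final bin holds every value above the last edge.
    defaults are 0/1 flags aligned with values.
    """
    counts = [[0, 0] for _ in range(len(edges) + 1)]  # [cases, bad cases]
    for v, d in zip(values, defaults):
        i = bisect_left(edges, v)
        counts[i][0] += 1
        counts[i][1] += d
    return [bad / n if n else None for n, bad in counts]


def is_monotonic(rates):
    """True if the non-empty bins are all non-increasing or all non-decreasing."""
    r = [x for x in rates if x is not None]
    pairs = list(zip(r, r[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


income = [10, 20, 30, 40, 50, 60, 70, 80]
bad = [1, 1, 1, 0, 1, 0, 0, 0]
rates = default_rate_by_bin(income, bad, edges=[25, 55])
print(rates, is_monotonic(rates))
```

Running the same check separately on each observation period is one simple way to look at stability over time: a variable whose trend changes direction between periods would be a candidate for exclusion.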



Selection of the development sample and the test sample. The data are split into a development sample and a test sample (ideally 50%-50%; otherwise a larger development sample and a smaller test sample, e.g. 70%-30%). Logistic regression (Multi Factor Analysis) is used on the development sample to examine the significance of individual variables; in this step, insignificant variables are filtered out and the correlation between variables is also examined. Several alternative models are created, from which we select the final model together with our client. If required, other modeling techniques are tried in addition to logistic regression, e.g. cluster analysis, principal component analysis, decision trees or neural networks.
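The sample selection and the correlation check can be sketched as follows. Two assumptions are made here that the text does not state: the split is stratified by the good/bad flag so that both samples keep the same default rate, and correlation is measured with the Pearson coefficient. The function names and the fixed seed are hypothetical. The logistic regression itself is fitted in the modeling tool and is not shown.

```python
import math
import random


def stratified_split(records, label_of, dev_share=0.7, seed=42):
    """Split records into (development, test), keeping the label mix in both."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    by_label = {}
    for r in records:
        by_label.setdefault(label_of(r), []).append(r)
    dev, test = [], []
    for group in by_label.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * dev_share)
        dev.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return dev, test


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# 100 hypothetical cases as (id, default_flag); the first 20 are bad.
records = [(i, 1 if i < 20 else 0) for i in range(100)]
dev, test = stratified_split(records, label_of=lambda r: r[1])
print(len(dev), len(test))
```

Setting `dev_share=0.5` gives the 50%-50% split. Pairs of candidate variables with a high absolute `pearson` value carry largely the same information, so typically only one of them is kept in the model.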


Model testing

We test the model on the test sample using standard statistical measures. The individual measures used are defined, and their typical values described, in the documentation. Based on the coefficients of the final model, the values of the individual variables are transformed into scores (Scorecard Creation). Individual scores of the resulting statistically based scorecard can be modified on the basis of expert judgment, the value set of variables can be expanded, and new variables can also be created.
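As an illustration of both parts of this step, the sketch below computes the Gini coefficient, one commonly used discriminatory power measure (the text does not say which measures are actually applied), and converts model coefficients into scorecard points. The scaling is an assumption: it uses the common "points to double the odds" convention with a hypothetical value of 20, and assumes coefficients on the log-odds of default for dummy-coded categories.

```python
import math


def auc(pd_estimates, defaults):
    """Probability that a random bad case has a higher PD than a random good case."""
    bad = [p for p, d in zip(pd_estimates, defaults) if d == 1]
    good = [p for p, d in zip(pd_estimates, defaults) if d == 0]
    wins = sum(
        1.0 if b > g else 0.5 if b == g else 0.0
        for b in bad
        for g in good
    )
    return wins / (len(bad) * len(good))


def gini(pd_estimates, defaults):
    return 2 * auc(pd_estimates, defaults) - 1


def scorecard_points(coefficients, pdo=20.0):
    """Turn log-odds coefficients into points; a higher score means lower risk.

    coefficients: {variable: {category: beta}}, beta on the log-odds of default.
    pdo: assumed number of points that doubles the good/bad odds.
    """
    factor = pdo / math.log(2)
    return {
        var: {cat: round(-factor * beta) for cat, beta in cats.items()}
        for var, cats in coefficients.items()
    }


# Hypothetical test sample: predicted PDs and observed default flags.
pds = [0.9, 0.8, 0.3, 0.2, 0.1]
flags = [1, 0, 1, 0, 0]
print(gini(pds, flags))

# Hypothetical coefficients for one categorized variable.
points = scorecard_points({"age": {"<30": 0.0, "30+": -math.log(2)}})
print(points)
```

Because the points are stored per category, an expert adjustment amounts to overriding individual entries in the resulting table, after which the measures can be recomputed on the test sample to see the effect.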



We document the entire process in detail.

