UNIVERSITY OF CARDIFF
MAT012 Credit Risk Scoring
Assignment 2021/22
This forms your assessment (100%) of this module.
There are two parts to this assessment.
Part A contains THREE short essay-based questions and counts for 50% of the final mark.
Part B contains FIVE tasks to establish a scorecard using the given dataset and counts for 50% of the final mark. You may use Excel, SAS, R or Python to assist in the scorecard preparation.
You must answer ALL questions.
Submission must be made by 3pm on Friday 1st April via Learning Central; instructions on how to do this will follow shortly. You will need to submit a single file containing answers to all questions; any spreadsheet analysis, workings or coding necessary can be shown in an Appendix in that file. Only the submitted file will be marked.
PART A
1. Critically examine what needs to be considered when developing a credit risk scoring model.
[20 marks]
2. Detail the history of the Basel Accords and discuss the challenges in modelling the credit risk on a portfolio of consumer loans.
[15 marks]
3. There is increasing concern that consumers have limited or no access to mainstream credit because of “outdated and unnecessarily restrictive credit scoring models” [1]. Discuss the opportunities, risks and challenges this presents to lenders from a credit risk scoring perspective.
[15 marks]
[1] https://vantagescore.com/credit-scoring-and-financial-inclusion-research-study/
PART B
The dataset underpinning the analysis here is that used in the lab sessions during lectures. It has been uploaded as a spreadsheet named ‘German’ together with the data dictionary ‘German data dictionary’ describing each attribute. You will recall that the dataset consists of data for 1000 applicants along with a variable that says whether they were subsequently Good or Bad from a credit perspective.
1. Split the dataset into two subsets as follows:
Subset 1: the applicants with Duration <= 12 months
Subset 2: the applicants with Duration > 12 months
Clean the subsets if necessary.
[5 marks]
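As an illustration only, the split in Task 1 could be sketched in Python as below. The record layout, the field name `Duration`, the toy records and the cleaning rule (setting aside records whose duration is missing or non-numeric) are all assumptions made for this example; they are not taken from the 'German' spreadsheet or its data dictionary.

```python
import math

def split_by_duration(records, threshold=12):
    """Split applicant records into (short, long, rejected) by Duration in months.

    Records whose Duration is missing, non-numeric or NaN are set aside
    rather than guessed at (an assumed cleaning rule, not a prescribed one).
    """
    short, long_, rejected = [], [], []
    for rec in records:
        try:
            duration = float(rec["Duration"])
        except (KeyError, TypeError, ValueError):
            rejected.append(rec)
            continue
        if math.isnan(duration):
            rejected.append(rec)
        elif duration <= threshold:
            short.append(rec)   # Subset 1: Duration <= 12 months
        else:
            long_.append(rec)   # Subset 2: Duration > 12 months
    return short, long_, rejected

# Hypothetical toy records, for illustration only.
applicants = [
    {"id": 1, "Duration": 6, "Good": 1},
    {"id": 2, "Duration": 12, "Good": 0},
    {"id": 3, "Duration": 24, "Good": 1},
    {"id": 4, "Duration": None, "Good": 1},
]
subset1, subset2, unusable = split_by_duration(applicants)
print(len(subset1), len(subset2), len(unusable))
```

Note that an applicant with a duration of exactly 12 months falls into Subset 1.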
2. For each subset, establish a training set and validation set. Explain:
a. what principle you have used to decide on these;
b. why both training and validation sets are needed;
c. any issues encountered during the splitting exercise.
[5 marks]
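One principle that could be discussed under (a) is a stratified random split, which keeps the proportion of Goods and Bads roughly equal in the training and validation sets. The sketch below is a minimal illustration under stated assumptions: the 70/30 ratio, the random seed, the field name `Good` and the toy records are hypothetical choices, not requirements of this assignment.

```python
import random

def stratified_split(records, label_key="Good", train_fraction=0.7, seed=42):
    """Randomly split records into (train, validation), class by class.

    Splitting each class separately keeps the Good/Bad ratio of the two
    sets close to that of the full subset.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)

    train, validation = [], []
    for label in sorted(by_class):
        group = by_class[label][:]     # copy, so the caller's list is not shuffled
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation

# Hypothetical toy data: 70 Goods and 30 Bads.
records = [{"id": i, "Good": 1 if i % 10 < 7 else 0} for i in range(100)]
train, validation = stratified_split(records)

bad_rate_train = sum(1 for r in train if r["Good"] == 0) / len(train)
bad_rate_valid = sum(1 for r in validation if r["Good"] == 0) / len(validation)
print(len(train), len(validation), bad_rate_train, bad_rate_valid)
```

With a small subset, the validation set may contain very few Bads, which is one kind of issue that could be reported under (c).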
3. For each training set, choose four variables which are suitable for building a scorecard. For each training set, the four variables must include (i) at least one variable that is continuous before binning; (ii) at least one categorical variable with more than two categories, so that you can see whether categories can be combined.
Explain the rationale behind your choice of variables (using supporting statistics, e.g. chi-square). Should you be unable to choose variables satisfying the above criteria, explain the problem you have encountered and the compromise you have adopted in your variable selection.
[10 marks]
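To illustrate the kind of supporting statistic mentioned in Task 3, the sketch below computes the Pearson chi-square statistic for a table of Good/Bad counts by category, and shows the effect of combining two categories. The category names and counts are invented for the example, and the function assumes every category is non-empty and that the data contain at least one Good and one Bad.

```python
def chi_square(table):
    """Pearson chi-square statistic for {category: (n_good, n_bad)}.

    Returns (statistic, degrees_of_freedom). Expected counts assume that
    Good/Bad status is independent of the category.
    """
    total_good = sum(good for good, _ in table.values())
    total_bad = sum(bad for _, bad in table.values())
    total = total_good + total_bad
    stat = 0.0
    for good, bad in table.values():
        row = good + bad
        exp_good = row * total_good / total
        exp_bad = row * total_bad / total
        stat += (good - exp_good) ** 2 / exp_good
        stat += (bad - exp_bad) ** 2 / exp_bad
    return stat, len(table) - 1

# Hypothetical counts for a three-category variable.
fine = {"A": (30, 10), "B": (20, 20), "C": (10, 10)}
# Coarse classification: B and C have the same Good:Bad odds, so combine them.
coarse = {"A": (30, 10), "B+C": (30, 30)}

stat_fine, df_fine = chi_square(fine)
stat_coarse, df_coarse = chi_square(coarse)
print(stat_fine, df_fine, stat_coarse, df_coarse)
```

In this toy case, combining B and C leaves the statistic unchanged while using one fewer degree of freedom, which is the kind of evidence that can support merging categories.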
4. Use the binary variables obtained from the coarse classification in the above exercise to build two scorecards for each training set (so, two scorecards for those applicants with Duration <= 12 months; another two for those with Duration > 12 months), one using linear regression and one using logistic regression.
Note that the file you submit should include, in the Appendix, a table that gives the binary variables you used, together with the coefficients for those variables calculated in each regression.
[15 marks]
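The difference between the two regressions in Task 4 can be illustrated with a minimal sketch. Here both models are fitted by plain batch gradient descent so that the example needs no external library; this is a stand-in for the least-squares and maximum-likelihood routines of a statistics package, not a recommended method. The two binary variables, the counts, the learning rate and the iteration count are all hypothetical.

```python
import math

def fit_scorecard(X, y, model="linear", lr=0.1, n_iter=5000):
    """Fit [intercept, coef_1, ..., coef_k] by batch gradient descent.

    model="linear"   minimises squared error (regress the 0/1 outcome directly);
    model="logistic" minimises log-loss (models the log-odds of being Good).
    Both losses share the gradient form (prediction - target) * x.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)                      # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
            pred = z if model == "linear" else 1.0 / (1.0 + math.exp(-z))
            err = pred - target
            grad[0] += err
            for j, xj in enumerate(row, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical training data: two binary variables; (n_good, n_bad) per cell.
cells = {(0, 0): (2, 6), (1, 0): (5, 3), (0, 1): (4, 4), (1, 1): (7, 1)}
X, y = [], []
for dummies, (n_good, n_bad) in cells.items():
    X += [list(dummies)] * (n_good + n_bad)
    y += [1] * n_good + [0] * n_bad          # 1 = Good, 0 = Bad

w_lin = fit_scorecard(X, y, model="linear")
w_log = fit_scorecard(X, y, model="logistic")
for name, w in (("linear", w_lin), ("logistic", w_log)):
    print(name, [round(c, 3) for c in w])
```

The linear coefficients are changes in the estimated probability of being Good, whereas the logistic coefficients are changes in the log-odds, so the two sets of coefficients are not directly comparable in size.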
5. Derive ROC curves for all scorecards using the validation set applicable to each, showing in detail how sensitivity and specificity have been calculated. Estimate the Gini coefficient and KS values for each. Explain and comment on your results.
[15 marks]
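The calculations named in Task 5 can be sketched as follows. This is an illustrative example under stated assumptions: Good (1) is treated as the positive class, an applicant is classified Good when the score is at or above the cut-off, the area under the ROC curve is estimated by the trapezium rule, and the scores and labels are invented. Other conventions (for example, treating Bad as the positive class) are equally valid provided they are stated.

```python
def roc_table(scores, labels):
    """Return (cut-off, sensitivity, specificity) for each distinct score.

    Sensitivity = TP / (TP + FN): Goods correctly classified as Good.
    Specificity = TN / (TN + FP): Bads correctly classified as Bad.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rows = []
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        fn = n_pos - tp
        tn = n_neg - fp
        rows.append((cutoff, tp / (tp + fn), tn / (tn + fp)))
    return rows

def gini_and_ks(rows):
    """Gini = 2 * AUC - 1 (trapezium rule); KS = max |sensitivity - (1 - specificity)|."""
    points = [(0.0, 0.0)] + [(1.0 - spec, sens) for _, sens, spec in rows]
    auc = sum((x1 - x0) * (y0 + y1) / 2.0
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    ks = max(abs(y - x) for x, y in points)
    return 2.0 * auc - 1.0, ks

# Hypothetical validation scores and outcomes (1 = Good, 0 = Bad).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 1, 0]

rows = roc_table(scores, labels)
gini, ks = gini_and_ks(rows)
for cutoff, sens, spec in rows:
    print(cutoff, round(sens, 3), round(spec, 3))
print(round(gini, 4), round(ks, 4))
```

Plotting sensitivity against 1 - specificity for each cut-off, starting from the point (0, 0), gives the ROC curve.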