Data Mining and Analysis 2020/21
Assessed Coursework Specification: Team-based Data Mining Project

Overview

This coursework assignment involves analysing a real-world dataset and providing meaningful insights into it in order to address the business concerns and problems identified. The objective of this assignment is to evaluate your understanding of the basic theory, concepts, and algorithms of data mining, and to assess your skills in applying SAS® Enterprise Miner and SAS® Enterprise Guide to carry out a data mining project.

This team-based assignment is to be completed in groups of two. A number of real-world datasets for this project can be downloaded from the module Moodle site, and your team will be assigned a dataset by the module lecturer/tutor.

Your role within your team is two-fold: you work both as a business client and as a data analyst. As a business client, you are expected to raise meaningful business concerns/problems in relation to the dataset you have been given. As a data analyst, you are required to follow a proper data mining methodology and apply the techniques covered in lectures to analyse your data in order to address the business concerns and problems you have raised. Constant discussion among team members is essential.

The project deliverable is a written team report (worth 80% of the total module coursework marks). The mark awarded for this assignment will be a team mark. The report is due in week 11, by midnight on Friday 30th April 2021.

Your module tutor will check your project progress on a weekly basis. In weeks 7 and 10 in particular, you should have a detailed discussion with your module tutor regarding your data mining project.
Tasks

You are required to undertake the following tasks:

Problem Identification
- Read the data description file to learn the basic characteristics of the dataset, including the data source, the nature of the data, what it is about, the business context associated with the data, the total number of attributes (dimensions), the data type of each attribute, the value range/mode, skewness, and kurtosis of each attribute, and the total number of instances. Carry out simple data exploration with essential plotting.
- Identify and understand the business problem of interest with regard to the data.
- Identify what data mining tasks need to be performed in order to address the business problem concerned.

Data Preparation
- Transform the dataset into the proper format to be used by SAS® Enterprise Miner in order to carry out the required data mining tasks.
- Choose appropriate methods for data pre-processing, including detecting and dealing with missing values, outliers and imbalanced attribute values, changing data types, and conducting dimensionality reduction, feature extraction, data transformation, data partition, and normalisation, where appropriate.

Model Construction
- With the pre-processed dataset, undertake the data mining tasks you have identified. You are required to apply at least two different algorithms for both predictive and descriptive modelling. For predictive modelling, for example, you may use decision trees and artificial neural networks, or decision trees and a k-nearest-neighbour based algorithm. For descriptive modelling, you may choose to use k-means clustering and histograms/bar charts/Pearson's correlation coefficient.
- In order to build the most appropriate and accurate models and identify meaningful hidden patterns, different settings for the relevant model parameters should be considered for each of the selected algorithms and approaches.
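The per-attribute statistics and pre-processing steps named in the tasks above (skewness, kurtosis, Pearson's correlation coefficient, missing-value handling, normalisation) can be illustrated outside SAS. The sketch below is not part of the coursework specification and does not replace the required use of SAS® Enterprise Miner and SAS® Enterprise Guide. It is a minimal illustration in plain Python, and all function names are hypothetical. It uses the simple population (moment-based) formulas; statistical packages often apply small-sample corrections, so their reported values may differ slightly.

```python
import math
from statistics import mean


def _central_moment(xs, k):
    """k-th central moment of xs, using the population (divide-by-n) form."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Third standardised moment; 0 for a symmetric sample."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Fourth standardised moment minus 3 (0 for a normal distribution)."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def impute_mean(xs):
    """Replace missing values (None) with the mean of the observed values.
    Mean imputation is only one option; it is an assumption of this sketch."""
    fill = mean(x for x in xs if x is not None)
    return [fill if x is None else x for x in xs]


def min_max(xs):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]
```

A typical order of use would be to impute first and then normalise, for example `min_max(impute_mean(column))`, since `min_max` cannot handle missing values. Note that `min_max` assumes the attribute is not constant, and `pearson_r` assumes neither sample has zero variance.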
Model Interpretation and Evaluation
- Interpret the descriptive models created.
- Compare the performances of different predictive models in terms of accuracy, error rate, generalisation ability (over-fitting), simplicity and cost, etc. where appropriate.
- Discuss the meaningfulness and usefulness of the models built and the patterns revealed, and how the models and patterns can be used to address the original business concerns. This includes both descriptive and predictive models.

Final Report

Your final report should be well-formatted as a formal report consisting of Cover page, Table of Contents, Abstract and References. The report should be submitted electronically for non-originality check via Turnitin on the VLE.

Data Mining Project Marking Criteria Guidelines

Each element is listed with its share of the marks, followed by the descriptor for each mark band.

Business understanding and data understanding (15%)
- 0-4 Marks: Inadequate analysis of business concerns and data mining tasks. Only simple initial data exploration performed. Lack clarity and relevance.
- 4-5 Marks: Adequate analysis of the key business concerns and data mining tasks. Limited initial data exploration. Probably lack some relevance. Inappropriate means.
- 5-6 Marks: Clear analysis of business concerns and relevant data mining tasks. Probably lack some in-depth view. Essential initial data exploration performed.
- 6-7 Marks: Clear analysis of business concerns and relevant data mining tasks to a certain depth. Sensible initial data exploration performed with appropriate means.
- 7-10 Marks: Thorough and clear analysis of business concerns and relevant data mining tasks. Excellent initial data exploration with effective means.

Data pre-processing (25%)
- 0-4 Marks: Inadequate view of data quality issues. Inappropriate approaches adopted. Poor use of SAS Enterprise Guide/Miner.
- 4-5 Marks: Limited consideration of data quality issues. Some appropriate approaches adopted with limited understanding and limited coverage. Limited use of SAS Enterprise Guide/Miner.
- 5-6 Marks: Reasonable consideration of data quality issues. Appropriate approaches adopted with reasonable understanding and most of the main issues covered. Good use of SAS Enterprise Guide/Miner.
- 6-7 Marks: Good consideration of data quality issues. Appropriate approaches adopted with clear understanding and every aspect covered. Good and flexible use of SAS Enterprise Guide/Miner.
- 7-10 Marks: Thorough consideration of data quality issues. Appropriate approaches adopted with outstanding understanding. Excellent use of SAS Enterprise Guide/Miner.

Model construction (15%)
- 0-4 Marks: Inappropriate algorithms employed. Poor use of SAS Enterprise Miner.
- 4-5 Marks: Some appropriate algorithms employed with limited understanding. Limited use of SAS Enterprise Miner.
- 5-6 Marks: Appropriate algorithms employed with reasonable understanding. Good use of SAS Enterprise Miner.
- 6-7 Marks: Appropriate algorithms employed with clear understanding. Good and flexible use of SAS Enterprise Miner.
- 7-10 Marks: Appropriate algorithms employed with outstanding understanding. Modelling with excellent working knowledge of SAS Enterprise Miner.

Model evaluation (25%)
- 0-4 Marks: Poor model interpretation and comparison with regards to business concerns. No or little meaningful models/patterns provided.
- 4-5 Marks: Weak model interpretation and comparison with regards to business concerns. Very limited meaningfulness. Probably lack some clarity.
- 5-6 Marks: Basic model interpretation and comparison with regards to business concerns. Reasonable models/patterns created.
- 6-7 Marks: Clear model interpretation and comparison with regards to business concerns. Significantly meaningful models/patterns created.
- 7-10 Marks: Thorough and clear model interpretation and comparison with regards to business concerns. Excellent meaningful models/patterns created.

Report (20%)
- 0-4 Marks: Inadequate review of project findings. Lack of clarity and accuracy. Poor presentation.
- 4-5 Marks: Adequate review of project findings. Probably lack of some clarity. Acceptable presentation.
- 5-6 Marks: Clear review and summary of project findings. Good presentation with proper structure and layout.
- 6-7 Marks: Clear and concise summary of project findings. Excellent presentation. Clear structure and layout.
- 7-10 Marks: Exceptionally clear and concise summary of project findings. May raise questions for future research. Outstanding presentation. Clear structure and layout.
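
Illustration: comparing predictive models

The comparison asked for under Model Interpretation and Evaluation (accuracy, error rate, and generalisation ability) can be sketched as follows. This is not part of the specification or the marking criteria, and the coursework itself must be carried out in SAS® Enterprise Miner. The code is a minimal plain-Python sketch, and the function names and the input layout are assumptions made for illustration. It treats the gap between training and validation accuracy as a rough indicator of over-fitting.

```python
def accuracy(actual, predicted):
    """Fraction of cases where the predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def compare_models(results):
    """Summarise several classifiers, best validation accuracy first.

    `results` maps a model name to a 4-tuple (hypothetical layout):
    (train_actual, train_predicted, valid_actual, valid_predicted).
    """
    rows = []
    for name, (train_y, train_pred, valid_y, valid_pred) in results.items():
        train_acc = accuracy(train_y, train_pred)
        valid_acc = accuracy(valid_y, valid_pred)
        rows.append({
            "model": name,
            "train_accuracy": train_acc,
            "valid_accuracy": valid_acc,
            "valid_error_rate": 1 - valid_acc,
            # A large positive gap suggests the model has over-fitted
            # the training partition.
            "gap": train_acc - valid_acc,
        })
    return sorted(rows, key=lambda r: r["valid_accuracy"], reverse=True)
```

Ranking on validation accuracy rather than training accuracy is a deliberate choice: a model that scores perfectly on the training partition but much lower on held-out data generalises poorly, which is exactly what the gap column is meant to expose. Simplicity and cost, which the tasks also mention, are not captured by this sketch and need to be argued separately.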