School of Physics, Engineering and Computer Science
Page 1 of 7
Assignment Briefing Sheet (2020/21 Academic Year)
Section A: Assignment title, important dates and weighting
Assignment title: | Flexible REF/DEF | Group or individual: | Individual |
Module title: | Data Mining | Module code: | 7COM1018 |
Module leader: | Paul Moggridge | Moderator’s initials: | WJ |
Submission deadline: | 18th June 2021 17:00 |
Target date for return of marked assignment: | 5th July 2021 |
You are expected to spend about | 40 | hours to complete this assignment to a satisfactory standard. |
This assignment is worth | 40% | of the overall assessment for this module. |
Section B: Student(s) to complete
Student ID number | Year Code |
NOT NEEDED FOR ONLINE SUBMISSION |
Notes for students
• For undergraduate modules, a score above 40% represents a pass performance at honours level.
• For postgraduate modules, a score of 50% or above represents a pass mark.
• For coursework relating to modules at Levels 0, 4, 5 and 6 that is submitted late (including deferred coursework, but with the exception of referred coursework), the numeric grade will be reduced by 10 grade points for each day or part thereof (or, for hard copy submission only, each working day or part thereof) for up to five days after the published deadline, until or unless the numeric grade reaches or is 40. Where the numeric grade awarded for the assessment is less than 40, no lateness penalty will be applied.
• Referred coursework submitted late will automatically be awarded a grade of zero (0).
• Coursework (including deferred coursework) submitted later than five days (five working days in the case of hard copy submission) after the published deadline will be awarded a grade of zero (0).
• Regulations governing assessment offences, including Plagiarism and Collusion, are available from https://www.herts.ac.uk/about-us/governance/university-policies-and-regulations-uprs/uprs (please refer to UPR AS14).
• Guidance on avoiding plagiarism can be found here: https://herts.instructure.com/courses/61421/pages/referencing-avoiding-plagiarism?module_item_id=779436
• Modules may have several components of assessment and may require a pass in all elements. For further details, please consult the relevant Module Handbook (available on Studynet/Canvas, under Module Information) or ask the Module Leader. |
This Assignment assesses the following module Learning Outcomes (from Definitive Module Document):
Successful students will typically:
2. be able to appreciate the strengths and limitations of various data mining models.
3. be able to critically evaluate, articulate and utilise a range of techniques for designing data mining systems.
4. be able to understand and reflect on the underlying ethical and legal issues and constraints on the holding and the use of data;
5. be able to critically evaluate different algorithms and models of data mining. |
Assignment Brief: In the workplace, you have been assigned to a new project, “recognizing supermarket purchase patterns”. At your next meeting with management, you have been asked to explain how the FP Tree (Association Mining) works. Your response must include:
1. A technical explanation, articulating how the algorithm works and showing how to work through the algorithm by hand using your own small example (14 marks)
2. Comments on the strengths and limitations of the algorithm (8 marks)
3. A critical evaluation of the algorithm for your given use case, comparing it with other similar algorithms and use cases in research; the papers should be referenced, and how you do this is your choice (10 marks)
4. A description of, and reflection on, the ethical considerations for using this algorithm; for example, could the algorithm produce biased results, and how would this happen? (8 marks)
Note that the assignment is not to complete a data science project. Your task is to create a piece of work explaining an algorithm (for example, a video) while considering the example of using it for recognizing supermarket purchase patterns. The flexibility lies in the type of response (report or video); the intention is to allow you to perform at your best. In summary, your task is to explain how the FP Tree data mining algorithm works and comment on its fitness for “recognizing supermarket purchasing patterns”. |
Submission Requirements: You may choose how you respond to this assignment from the options below:
• Video featuring a whiteboard / drawing app / pen and paper / PowerPoint (max. 16 minutes)
• Voiced-over PowerPoint (max. 16 minutes)
• Large Poster with an Audio Recording (max. 16 minutes)
• Technical Document (max. 1700 words)
All length limits are flexible (+/- 10%) and do not include figures, captions, and references. There are no marks for production quality, although we kindly ask that you make sure the video and audio quality is fit for purpose (standard built-in webcams and microphones should be suitable). For advice, please speak to the module leader. The videos or documents are intended for a professional environment. Accepted formats for videos: mp4, webm, flv, mkv, avi, mov and wmv. Accepted formats for voiced-over PowerPoints: pptx. |
Accepted formats for a voice-over supplied separately from the PowerPoint: mp3, wav, ogg, aac, wma and m4a. Accepted formats for posters and documents: pdf, docx, odt, png and svg. The referencing format is flexible; when using a video, references can appear on screen or be spoken, and either will be accepted (please identify the title, author, and year). |
Marks awarded for: This assignment is worth 40% of the overall assessment for this module. Marks will be awarded out of 40 in the proportions given in the mark scheme below. A reminder that all work should be your own. Videos/reports exceeding the maximum length may not be marked beyond the length limit. |
Type of Feedback to be given for this assignment: Along with the marks, each student will receive individual written feedback on the online platform. |
Mark Scheme:
1.1 Explanation Quality / Algorithm Understanding
Assessment element | 0 | 1-3 | 4-6 | 7-10 | 11-14 |
A technical explanation, articulating how the algorithm works, showing how to work out different parts of the algorithm example by hand (14 marks) |
No discernable attempt at this element. |
Little/some understanding shown of the chosen algorithm. |
Good high-level understanding shown of the chosen algorithm. |
Very good understanding shown of the chosen algorithm. |
Excellent understanding shown of the chosen algorithm. |
Some steps of the algorithm are explained, with some calculations shown. |
All steps of the algorithm are explained. Most calculations shown. |
All steps are fully explained, demonstrating all calculations that need to occur at each step. |
|||
Limited use of visual aids (plots, tables, graphics) for explanation. |
Appropriate visual aids (plots, tables, graphics) have been used throughout the explanation. |
Creative visual aids have been used to articulate concisely how each step works. This can be hand drawn or digital. |
|||
The original source of the algorithm has been referenced. |
The original source of the algorithm has been referenced and recent research using the algorithm has been cited. |
||||
Broad knowledge is demonstrated, for example by explaining how a step is similar to steps taken in other algorithms. |
|||||
Edge cases and/or challenging inputs are shown, demonstrating where the algorithm would fail or be less accurate.
1.2 Knowledge of Strengths and Limitations
Assessment element | 0 | 1-2 | 3-4 | 5-8 |
Comments on the strengths and limitations of the algorithm (8 marks) |
No discernable attempt at this element |
One or two commonly known strengths and limitations of the chosen algorithm have been identified. |
Three strengths and limitations of the chosen algorithm have been described. |
Four strengths and limitations of the chosen algorithm have been analyzed. |
Time and space requirements of the algorithm are briefly mentioned. |
Time and space requirements of the algorithm are analyzed. Big O notation is mentioned. |
|||
An artificial illustrative dataset is used to highlight strengths and limitations. |
An artificial illustrative dataset is used to highlight strengths and limitations. Real-world datasets are also referenced regarding strengths and limitations. |
|||
Updates and modifications to algorithms are discussed and recent research papers are cited. |
1.3 Evaluation / Comparing performance of algorithms / datasets
Assessment element | 0 | 1-5 | 6-10 |
Critically evaluate the algorithm for your use case and compare it with other similar algorithms (10 marks) |
No discernable attempt at this element | One or two similar algorithms have been identified. |
Three similar algorithms have been identified. |
The strengths and limitations of the similar algorithms have been identified in comparison to the chosen algorithm. |
The strengths and limitations of the similar algorithms have been identified in comparison to the chosen algorithm and compared in relation to the challenges in the proposed project. |
||
Academic sources (journal and conference papers) have been referenced to critically evaluate the suitability of the algorithms for the proposed project, i.e. a paper using the same or a similar algorithm on a similar use case. |
1.4 Describing Ethical Issues
Assessment element | 0 | 1-3 | 4-8 |
Describe and reflect on the ethical considerations for using this algorithm: could the algorithm produce biased results, and how would this happen? (8 marks) |
No discernable attempt at this element | An ethical issue is raised. | More than one ethical issue is provided. |
The ethical issues raised could apply to the selected algorithm, and how the algorithm would behave differently has been briefly reflected on. |
How the issue would manifest itself into the model produced by the algorithm is explained, technical terminology is used. |
||
Methods (likely preprocessing methods) to avoid the ethical issue are identified. |