A local company needs to rent virtual machines (VRs) from cloud providers to carry out its daily computational tasks.
At the moment, it rents about 500 virtual machines, also called servers. Their hourly CPU usage, as a percentage, was collected over one month, from 24 Oct 2016 to 23 Nov 2016.
The data set has the following features: Server_Name, Timestamp, Number_of_Processors, Usage-percentage, Weekday (Boolean).
A report should contain the answers to the following tasks.
- Data exploration
- To discover the VRs whose CPU usage has never reached 0.3, i.e. 30%: (2 marks)
- To discover the VRs whose CPU usage has never fallen below 0.3, i.e. 30%: (2 marks)
- To discover the VRs whose mean (average) CPU usage is below 30% and whose usage has never exceeded 80%: (6 marks)
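The three exploration queries above can be sketched with a per-server summary of minimum, mean and maximum usage. This is only an illustration under stated assumptions: it uses pandas, the sample rows are made up, usage is assumed to be stored as a fraction (0.3 meaning 30%), and the boundary choices ("never reached" read as max < 0.3, "never exceeded" read as max <= 0.8) are one possible interpretation of the task wording.

```python
import pandas as pd

# Made-up sample rows; column names follow the feature list in the brief.
# Assumption: usage is stored as a fraction, so 0.3 means 30%.
df = pd.DataFrame({
    "Server_Name": ["a", "a", "b", "b", "c", "c", "d", "d"],
    "Usage-percentage": [0.05, 0.20, 0.35, 0.90, 0.10, 0.85, 0.10, 0.45],
})

# One row per server with its minimum, mean and maximum usage.
stats = df.groupby("Server_Name")["Usage-percentage"].agg(["min", "mean", "max"])

# Never reached 30%: the maximum stays below 0.3.
never_reached_30 = stats.index[stats["max"] < 0.3].tolist()

# Never fell below 30%: the minimum is at least 0.3.
never_below_30 = stats.index[stats["min"] >= 0.3].tolist()

# Mean below 30% and never above 80%.
low_mean_no_spike = stats.index[
    (stats["mean"] < 0.3) & (stats["max"] <= 0.8)
].tolist()

print(never_reached_30, never_below_30, low_mean_no_spike)
```

Computing the summary once and filtering it three times keeps each query to a single readable condition.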
- Data Processing
To correctly add two new features and generate a new data set that has the following features:
Server_Name, Number_of_Processors, Weekday (Boolean), Workinghour, morningpeak
- morningpeak (from 7 AM to 11 AM): the average usage over this time period; (10 marks)
- workinghour (from 9 AM to 5 PM): the average usage over this time period; (10 marks)
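One way the two window averages could be derived is sketched below with pandas. Several choices here are assumptions rather than part of the brief: the sample rows are made up, each window includes its start hour and excludes its end hour (7:00 to 10:59 and 9:00 to 16:59), the result is grouped by Server_Name, Number_of_Processors and Weekday, and `window_mean` is a hypothetical helper name.

```python
import pandas as pd

# Made-up hourly readings for two servers on one weekday.
df = pd.DataFrame({
    "Server_Name": ["a"] * 4 + ["b"] * 4,
    "Timestamp": pd.to_datetime([
        "2016-10-24 06:00", "2016-10-24 08:00",
        "2016-10-24 10:00", "2016-10-24 15:00",
    ] * 2),
    "Number_of_Processors": [2] * 4 + [4] * 4,
    "Usage-percentage": [0.10, 0.20, 0.40, 0.60, 0.50, 0.70, 0.90, 0.30],
    "Weekday": [True] * 8,
})

hour = df["Timestamp"].dt.hour
keys = ["Server_Name", "Number_of_Processors", "Weekday"]


def window_mean(start, end, name):
    """Mean usage per group for readings with start <= hour < end."""
    in_window = df[(hour >= start) & (hour < end)]
    return in_window.groupby(keys)["Usage-percentage"].mean().rename(name)


# Assumed windows: 9 AM to 5 PM and 7 AM to 11 AM, end hour excluded.
features = pd.concat(
    [window_mean(9, 17, "Workinghour"), window_mean(7, 11, "morningpeak")],
    axis=1,
).reset_index()

print(features)
```

Grouping by Weekday as well as by server gives one row per server for weekdays and one for weekends, which matches the Boolean Weekday column in the target feature list. If a single row per server is wanted instead, Weekday would have to be dropped from `keys`.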
- Classification
To use an existing classification method/tool to generate decision trees. The results should describe the data in IF-THEN-ELSE format, showing which types of VRs are always in low CPU usage (<30%), high usage (>80%), or medium usage (the rest).
- To use the original dataset. (5 marks)
- To use the data set produced in Task 2. (5 marks)
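As a rough sketch of the classification step, scikit-learn's `DecisionTreeClassifier` can be fitted on labelled usage classes, and `export_text` prints the tree as nested rules that translate directly into IF-THEN-ELSE statements. The brief does not name a tool, so scikit-learn is an assumption here, as are the made-up sample rows, the fraction encoding of usage, and the hypothetical helper `usage_class`.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up rows in the shape of the original data set.
df = pd.DataFrame({
    "Number_of_Processors": [1, 1, 2, 2, 8, 8, 16, 16],
    "Weekday": [True, False, True, False, True, False, True, False],
    "Usage-percentage": [0.10, 0.05, 0.50, 0.20, 0.60, 0.40, 0.95, 0.85],
})


def usage_class(usage):
    """Map a usage fraction to the three classes named in the task."""
    if usage < 0.3:
        return "low"
    if usage > 0.8:
        return "high"
    return "medium"


df["label"] = df["Usage-percentage"].map(usage_class)

# Booleans are cast to 0/1 so every input column is numeric.
X = df[["Number_of_Processors", "Weekday"]].astype(int)
y = df["label"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Nested rules; each branch reads as IF <condition> THEN ... ELSE ...
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
```

For the second sub-task, the same code could be pointed at the Task 2 data set by using Workinghour and morningpeak as additional input columns and deriving the label from one of those averages. On real data, setting a limit such as `max_depth` would likely keep the printed rules short enough to report.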