Overview
In the hotel industry, customers commonly cancel their bookings before check-in or fail to show up at check-in time. Both cases are usually recorded as cancellations in the hotel’s booking system. Predicting how likely a booking is to be cancelled can help the hotel manager allocate rooms effectively. In this assessment, you will predict hotel booking cancellations using data-driven models.
The data for this assessment can be downloaded from Teams. You need to write an analysis report that discusses how you completed the tasks and goes into sufficient depth to demonstrate knowledge and critical understanding of the relevant processes involved. 100% of the available marks are awarded for the written report.
Report Guidance
Your report must conform to the structure below and include the required content as described. You must supply a written report containing three distinct sections that provide a full and reflective account of the processes undertaken.
Section I: Data Loading and Preparation (15%)
As a first step, you need to download the datasets from Teams. There are two datasets: hotel_bookings_01.csv and hotel_bookings_02.csv. The variables in both datasets are briefly explained below:
Variable | Description
ADR | Average daily rate
Adults | Number of adults
Agent | ID of the travel agency that made the booking. Null if there is no agent.
ArrivalDateDayOfMonth | Day of the month of the arrival date
ArrivalDateMonth | Month of arrival date with 12 categories: “January” to “December”
ArrivalDateWeekNumber | Week number of the arrival date
ArrivalDateYear | Year of arrival date
AssignedRoomType | Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type due to hotel operation reasons (e.g. overbooking) or by customer request. Code is presented instead of designation for anonymity reasons
Babies | Number of babies
BookingChanges | Number of changes/amendments made to the booking from the moment the booking was entered on the PMS until the moment of check-in or cancellation
Children | Number of children
Country | Country of origin. Categories are represented in the ISO 3155–3:2013 format
CustomerType | Type of booking, assuming one of four categories: Contract – when the booking has an allotment or other type of contract associated to it; Group – when the booking is associated to a group; Transient – when the booking is not part of a group or contract, and is not associated to other transient booking; Transient-party – when the booking is transient, but is associated to at least one other transient booking
DaysInWaitingList | Number of days the booking was in the waiting list before it was confirmed to the customer
DepositType | Indication on if the customer made a deposit to guarantee the booking. This variable can assume three categories: No Deposit – no deposit was made; Non Refund – a deposit was made in the value of the total stay cost; Refundable – a deposit was made with a value under the total cost of stay.
IsCanceled | Value indicating if the booking was canceled (1) or not (0)
IsRepeatedGuest | Value indicating if the booking name was from a repeated guest (1) or not (0)
LeadTime | Number of days that elapsed between the entering date of the booking into the PMS and the arrival date
Meal | Type of meal booked. Categories are presented in standard hospitality meal packages: Undefined/SC – no meal package; BB – Bed & Breakfast; HB – Half board (breakfast and one other meal – usually dinner); FB – Full board (breakfast, lunch and dinner)
PreviousBookingsNotCanceled | Number of previous bookings not cancelled by the customer prior to the current booking
PreviousCancellations | Number of previous bookings that were cancelled by the customer prior to the current booking
RequiredCardParkingSpaces | Number of car parking spaces required by the customer
ReservationStatus | Reservation last status, in one of three categories: Canceled – booking was canceled by the customer; Check-Out – customer has checked in but already departed; No-Show – customer did not check-in and did inform the hotel of the reason why
ReservedRoomType | Code of room type reserved. Code is presented instead of designation for anonymity reasons
StaysInWeekendNights | Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
StaysInWeekNights | Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
TotalOfSpecialRequests | Number of special requests made by the customer (e.g. twin bed or high floor)
1. First, merge the two datasets hotel_bookings_01.csv and hotel_bookings_02.csv using R. Provide screenshots of the key steps and report the dimensions (i.e., number of rows and number of columns) of the merged dataset. (4%)
2. Do any feature columns in the merged dataset have missing values? If so, please report these features and deal with the missing values. There are usually three ways of dealing with missing values:
• removing the instances with missing values.
• filling in the missing values with other values.
• removing the columns with missing values.
Please provide screenshots of the key steps and justify the way you choose to deal with the missing values. (9%)
3. Please convert and export the prepared dataset into Excel file format (e.g. xlsx). (2%)
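As an illustration only, the merge and the three missing-value options listed above can be sketched in base R as follows. The two small data frames and their values are hypothetical stand-ins for the CSV files. The sketch also assumes that both files share the same columns, so that merging means stacking rows; if the files instead share a key column, a join would be needed.

```r
# Toy stand-ins for hotel_bookings_01.csv and hotel_bookings_02.csv.
# All values are hypothetical; with the real files, read.csv() would
# replace these data.frame() calls.
bookings_01 <- data.frame(
  ADR        = c(75.0, 98.5, NA),
  Children   = c(0, NA, 1),
  LeadTime   = c(10, 45, 3),
  IsCanceled = c(0, 1, 0)
)
bookings_02 <- data.frame(
  ADR        = c(120.0, 60.0),
  Children   = c(2, 0),
  LeadTime   = c(200, 7),
  IsCanceled = c(1, 0)
)

# Assumption: identical columns in both files, so merging = row-wise stacking.
merged <- rbind(bookings_01, bookings_02)
merged_dim <- dim(merged)  # c(number of rows, number of columns)

# Count missing values per column to find the affected features.
na_counts <- colSums(is.na(merged))

# Option 1: remove the instances (rows) that contain any missing value.
rows_removed <- merged[complete.cases(merged), ]

# Option 2: fill missing values, here with the median of each numeric column.
filled <- merged
for (col in names(filled)) {
  if (is.numeric(filled[[col]]) && anyNA(filled[[col]])) {
    filled[[col]][is.na(filled[[col]])] <- median(filled[[col]], na.rm = TRUE)
  }
}

# Option 3: remove the columns that contain any missing value.
cols_removed <- merged[, na_counts == 0, drop = FALSE]
```

Which option is appropriate depends on how many values are missing and in which features, which is the justification the question asks for. Base R has no xlsx writer, so the export step needs an add-on package such as writexl or openxlsx.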
Section II: Descriptive Analytics (25%)
In this section, you will perform descriptive analytics on the prepared dataset. You can use either Excel or R to answer the questions below:
1. How many numeric features can you identify in the prepared dataset, and what are they? Please provide a summary table of descriptive statistics (following the example below) for these numeric features, and calculate their correlation coefficients (R). (6%)
Feature name | Mean | Median | Min | Max | Standard deviation | Number of unique values
… | … | … | … | … | … | …
2. How is the customer type distributed between city and resort hotels? Please answer this question by visualizing data and report the key steps of operations in Excel or your R code. (3%)
3. Is the average daily rate of “No-Show” smaller than that of “Canceled” and “Check-Out” for both city and resort hotels? Please answer this question by visualizing data and report the key steps of operations in Excel or your R code. (3%)
4. If a customer is a repeated guest, are they more likely to check out or not? Please answer this question by visualizing data and report the key steps of operations in Excel or your R code. (3%)
5. Customers can book hotels directly or through agents. Which way is quicker and which way is cheaper? Please answer the questions by visualizing data and report the key steps of operations in Excel or your R code. (4%)
6. Draw line plots of the average daily rate and the average number of booking changes against the number of days in the waiting list. Both plots should appear in one figure, sharing one x-axis and using two different y-axes. Please answer this question by visualizing data and report the key steps of operations in Excel or your R code. Do they show any statistically significant linear correlation? (6%)
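For those working in R, questions 1 and 6 above could be approached along the lines of the following base R sketch. The data frame and its values are hypothetical, and plot_dual_axis is an illustrative helper name rather than a prescribed function. The final test compares the two averaged series with each other, which is only one reading of the question; testing each series against the days in the waiting list works the same way.

```r
# Toy data (hypothetical values) standing in for the prepared dataset.
bookings <- data.frame(
  DaysInWaitingList = c(0, 0, 1, 1, 2, 2),
  ADR               = c(100, 120, 90, 110, 80, 70),
  BookingChanges    = c(0, 1, 1, 2, 2, 3)
)

# Question 1: descriptive statistics for every numeric column.
numeric_cols <- names(bookings)[sapply(bookings, is.numeric)]
summary_table <- data.frame(
  Feature = numeric_cols,
  Mean    = sapply(bookings[numeric_cols], mean),
  Median  = sapply(bookings[numeric_cols], median),
  Min     = sapply(bookings[numeric_cols], min),
  Max     = sapply(bookings[numeric_cols], max),
  SD      = sapply(bookings[numeric_cols], sd),
  Unique  = sapply(bookings[numeric_cols], function(x) length(unique(x))),
  row.names = NULL
)

# Pairwise Pearson correlation coefficients between the numeric features.
cor_matrix <- cor(bookings[numeric_cols])

# Question 6: average ADR and booking changes per waiting-list length.
avg_by_wait <- aggregate(cbind(ADR, BookingChanges) ~ DaysInWaitingList,
                         data = bookings, FUN = mean)

# Two line plots on one x-axis, with a second y-axis drawn on the right.
plot_dual_axis <- function(d) {
  old_par <- par(mar = c(5, 4, 4, 5))  # widen the right margin for axis 4
  on.exit(par(old_par))
  plot(d$DaysInWaitingList, d$ADR, type = "l", col = "blue",
       xlab = "Days in waiting list", ylab = "Average ADR")
  par(new = TRUE)  # overlay the second series on the same plotting region
  plot(d$DaysInWaitingList, d$BookingChanges, type = "l", col = "red",
       axes = FALSE, xlab = "", ylab = "")
  axis(side = 4)
  mtext("Average booking changes", side = 4, line = 3)
}

# Pearson test for a linear relationship between the two averaged series.
linear_test <- cor.test(avg_by_wait$ADR, avg_by_wait$BookingChanges)
```

Calling plot_dual_axis(avg_by_wait) draws the figure, and linear_test$p.value gives the p-value used to judge significance. With only three aggregated points, as in this toy data, the test has very little power, so the real dataset is needed for a meaningful answer.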
Section III: Hotel Booking Cancellation Prediction (60%)
You need to use R to develop data-driven models to predict whether a customer will cancel their booking.
1. Either IsCanceled or ReservationStatus can be used as the target/response variable. Here we will use IsCanceled as the target variable; can you explain why? Can you also discuss what steps would be needed if we wanted to use ReservationStatus as the target variable instead? (5%)
2. Select two different classification models that you have studied in the course to predict cancellations in the target variable IsCanceled. Please introduce your two selected models. Each model should have a short paragraph, and the description of each should be no more than 300 words. (10%)
3. Show the key steps and your R code for model training and testing. Here we use accuracy as the model evaluation metric. (20%)
You need to:
• Describe which variables are used as model input.
• Discuss and provide screenshots of the key steps and settings of model training and testing.
• Find your random seed number in BI_Random_Seed_2021.pdf and use your allocated random seed number for all modules in your experiment if applicable.
• Show and discuss your data split ratio.
4. Select the best classification model from the two models you developed, and explain why it is the best model. Your best model selection and the model settings should be clearly presented. (10%)
5. Discuss the business insights provided by your finally selected model. For example, which variables/features are important in prediction? What types/segments of customers are more likely to cancel their bookings? What do you think might be the reasons behind the findings? Your analysis needs to be reasonable, and you can include any theories or evidence from related subjects (e.g., consumer behaviour, marketing, psychology) or other empirical studies to justify your statements along with your findings. (15%)
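The training and testing workflow in question 3 above (seed, data split, model fit, accuracy) might look like the following base R sketch. Everything specific in it is an assumption made for illustration: the data are synthetic, the seed 123 is a placeholder for your allocated seed, the 70/30 split is one common choice rather than a required ratio, and logistic regression stands in for whichever two models you select from the course.

```r
set.seed(123)  # Placeholder: use the seed allocated in BI_Random_Seed_2021.pdf.

# Synthetic data (hypothetical): longer lead times raise the cancellation
# probability and special requests lower it.
n <- 500
bookings <- data.frame(
  LeadTime               = round(runif(n, 0, 300)),
  TotalOfSpecialRequests = rpois(n, 0.6)
)
cancel_prob <- plogis(-1.5 + 0.012 * bookings$LeadTime -
                        0.5 * bookings$TotalOfSpecialRequests)
bookings$IsCanceled <- rbinom(n, 1, cancel_prob)

# 70/30 train/test split (the ratio is an assumption, not set by the brief).
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- bookings[train_idx, ]
test  <- bookings[-train_idx, ]

# Logistic regression as one example classifier.
model <- glm(IsCanceled ~ LeadTime + TotalOfSpecialRequests,
             data = train, family = binomial)

# Predict cancellation probabilities on the held-out data, threshold at 0.5.
test_prob <- predict(model, newdata = test, type = "response")
test_pred <- as.integer(test_prob > 0.5)

# Accuracy is the share of correct predictions on the test set.
accuracy  <- mean(test_pred == test$IsCanceled)
confusion <- table(Predicted = test_pred, Actual = test$IsCanceled)
```

Setting the seed before both the split and any model with random components keeps the reported accuracy reproducible. summary(model) lists the fitted coefficients, which is one starting point for the feature-importance discussion in question 5.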
The report must
• Contain your student number and course name.
• Be in PDF and no more than 15 pages (excluding cover page and references if they are included).
• Be formatted single-spaced with 11 pt font size.
• Not include this briefing document.
This assessment is an individually assessed component. If you include any citations, your citations and referencing should follow university guidelines. If you are unsure about any aspect of this assignment, please seek the advice of the course coordinator.