NIT3171 ICT Business Analytics and Data Visualisation
Semester 1 Block 2 Footscray Campus
Group Assignment: Business Analysis Case Study
Part 1: Report
Due Tuesday 13th April, 2021

Group Members:
Hajjar Elakkoumi s4598593
Jackson Naffa s4571028
Joanne Watson s1088317

INDEX
Introduction
1 – Understanding the dataset
    Bedrooms
    Bathrooms
    Cars
    Auction Results
    Price
    Region – all properties
2 – Relationships
    Building Area
    Land size
    Price by Property Type and Method of Sale
    Type of property
3 – Potential Business Analysis Tasks
    1) Increase Market Share by increasing footprint – analyse relationship between reach (number of suburbs an agent is operating in) and sales revenue
    2) Increase revenue by improving Auction results
    3) Predict Sales Prices by property type and location

NIT3171 – H Elakkoumi, J Naffa and J Watson Page 2

Introduction
This report is written from the perspective of a Business Analyst (BA) in a Real Estate Consulting firm where an ICT Project is underway.
As part of the project, the BA has been asked to review a given dataset of Melbourne housing data and to use data mining strategies to find practical solutions to business problems.

The specific problems have not been given. Therefore, as a group, we will document our understanding of the data and methodologies, using JMP Statistical Discovery software, Excel and WEKA, to identify a list of further BA tasks to pursue that could provide benefits to the firm.

This report will be summarised in a presentation to be made as a group to the class. It will also form the basis for Part 2, the Individual Report and Presentation.

1 – Understanding the dataset
The melbourne_data_description file explained the field names in the dataset melbourne_house_data, which contained 2,000 rows of data (observations) and 16 attributes: sale transaction (ID), suburb (suburb), address (address), number of rooms (rooms), type of property (type), price (price), result of auction (method of sale), real estate agent (SellerG), date of sale (date), distance from the CBD (distance), postcode (postcode), number of bedrooms (bedroom2), number of bathrooms (bathroom), number of car spots (car), land area (landsize), building size (buildingarea), year property was built (YearBuilt), local council (CouncilArea), latitude (Latitude), longitude (Longitude), Victorian region (RegionName) and number of properties that exist in the suburb (property count).

During the process of cleaning the data we:
● Added an attribute, “result”, to classify each record as Sold or Not Sold
● Re-formatted the Seller attribute to remove apostrophes
● Replaced codes with words, i.e.
Property Type and Method
● Populated some of the missing data

Identification of the data fields as nominal or numeric, along with how we treated some of the data fields, is shown in the tables below:
Table 1:
Table 2:

Bedrooms:
Where there were missing values (3 properties), we completed them manually using the average price for each number of bedrooms. As there were only three such properties, we matched each property's price with the closest average value and allocated that number of bedrooms:
Bedrooms in original data (where NIL) against manually populated values used to fill in missing data:

Bathrooms:
Where there were missing values (5 properties), we completed them manually using the average price for each number of bathrooms. As there were only five such properties, we matched each property's price with the closest average value and allocated that number of bathrooms, see details below:
Bathrooms in original data against manually populated values used to fill in missing data:

Cars:
There were 159 properties that were missing car space data, and there does not seem to be a strong enough correlation between the number of car spaces and price. For this reason, we decided not to fill in the missing data and did not include this attribute in our analytics.

Auction Results:
The data shows that just under 20% of auctions did not result in the sale of the property.

Price:
The data included all auction results, not just those that resulted in a sale.
The initial analysis on price was done within the whole data set; however, for most of the resulting analysis we then removed all of the NOT SOLD instances (records) because those prices were not realised.
Median property price for top ten suburbs (including not sold)

Region – all properties:
The graph below shows that the ranking of the regions for median price did not change between NOT SOLD and SOLD.
The probability of a property being sold (0.80050) was higher than that of a property not selling (0.19950). As we had decided to analyse price only on realised sales, we moved forward with just the SOLD data.
We then analysed who the highest seller was and tested it against region, price, the way it sold etc. to see why they were more successful and if there were any correlations. First we looked at the regions and saw that North and South Metro had the highest number of properties listed.

Level            Count   Prob
Eastern Metro      198   0.09900
Eastern Vic          6   0.00300
North Metro        624   0.31200
North Vic            4   0.00200
SE Metro            62   0.03100
South Metro        692   0.34600
West Metro         413   0.20650
West Vic             1   0.00050

Nelson was the highest seller with 238 properties (for all the data). They sold mainly in metro areas, the highest being North Metro (149) and the lowest Southern Metro (9), predominantly in Moonee Valley (62) and Moreland (65). Bathroom average: 1.4159664.
The most successful period for sales across 2016 and 2017 was between April and July. In total Nelson sold 182 properties (probability 0.76471), with only 56 not sold (probability 0.23529).
Average price: $989,196; minimum: $250,000; maximum: $3,900,000.
Type of property: house 169 (0.71008 prob), townhouse 25 (0.10504 prob), unit 44 (0.18487 prob).

Sold Properties: here it shows that North Metro had the highest number of properties sold.

Level                         Count   Prob
Eastern Metropolitan            153   0.09557
Eastern Victoria                  6   0.00375
Northern Metropolitan           520   0.32480
Northern Victoria                 4   0.00250
South-Eastern Metropolitan       57   0.03560
Southern Metropolitan           511   0.31918
Western Metropolitan            349   0.21799
Western Victoria                  1   0.00062
Total                          1601   1.00000

As the data above shows, the North Metro region has the highest number of properties sold, although properties in the Southern Metro region sold for much higher prices. For instance, the maximum price in North Metro was $3,520,000, compared with $5,500,000 in Southern Metro.
For SOLD ONLY data Nelson was still the highest with 182 properties sold (0.11368 prob), with Jellis the next most successful agent with 144 properties sold (0.08994 prob).

Comparison of the 2 highest real estate agents to see where they are most successful

Land size
    Nelson: max 2,886; min 0; mean 403.2926; median 324
    Jellis: max 4,668; min 0; mean 494.84722; median 356.5
Region
    Nelson: North Metro highest, 112 count (0.61538 prob); West Metro 53 count (0.29121 prob); South Metro lowest, 7 count (0.03846 prob)
    Jellis: South Metro highest, 64 count (0.44444 prob); North Metro 61 count (0.42361 prob); West Metro lowest, 5 count (0.03472 prob)
How it sold
    Nelson: sold after auction 2 (0.01099 prob); sold at auction 155 (0.85165 prob); sold prior 25 (0.13736 prob)
    Jellis: sold after auction 1 (0.00694 prob); sold at auction 131 (0.90972 prob); sold prior 12 (0.08333 prob)
Bathroom
    Nelson: max 4; min 1; median 1; mean 1.4120879
    Jellis: max 4; min 1; median 1; mean 1.5069444
Suburb
    Nelson: Brunswick 13 count (0.07143 prob), then Keilor East 10 count
    Jellis: Brunswick 10 count (0.06944 prob), with Kew and South Yarra equal second on 7 count
Date (highest selling period)
    Nelson: Oct 2016 – Jan 2017
    Jellis: April 2017 – July 2017
Price
    Nelson: max $3,900,000; min $323,000; mean $1,003,875; median $913,000
    Jellis: max $3,950,000; min $305,000; mean $1,369,406.3; median $1,275,000
Type of property
    Nelson: house 130 (0.71429 prob); townhouse 20 (0.10989 prob); unit 32 (0.17582 prob)
    Jellis: house 91 (0.63194 prob); townhouse 22 (0.15278 prob); unit 31 (0.21528 prob)

The comparisons between real estate agents Nelson and Jellis are quite close. Although Jellis sold 38 fewer properties, they sold the more expensive single property of the two ($3,950,000 against $3,900,000).

2 – Relationships – Sold data only
The table shows that SellerG, suburb and building area have a small but positive correlation to properties sold.

Building Area:
Nelson may have higher sales due to selling properties with a bigger building area. There is a positive correlation between building area and the number of sales per seller. Nelson usually gets bigger properties, therefore their sales are higher.
Some sellers and the sum of building areas (we tested price * land size):

SellerG          Sum of building
Nelson                    17,089
Jellis                    11,286
Hocking Stuart            10,956
Barry                      7,729
Ray                        6,898
Buxton                     6,694
Marshall                   6,657
Biggin                     4,316
Woodards                   3,258
Brad                       2,771
Jas                        2,186
Fletchers                  2,154
YPA                        1,997
Greg                       1,942
McGrath                    1,761
Noel                       1,495
Stockdale                  1,472
Gary                       1,333
Sweeney                    1,247
Village                    1,153
Love                       1,146
Harcourts                  1,111
Rendina                    1,016
Bells                        902

Land size:
There were 305 properties listed with NO land size, but as land size is of great interest for most property purchases, we filled in the missing data by using the average land size by property type in the same suburb.
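The fill-in rule described above (average land size for the same property type in the same suburb) could be sketched as follows. This is only an illustrative outline in plain Python, not the JMP/Excel procedure the group actually used; the record layout, the use of None to mark a missing land size, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def impute_land_size(records):
    """Fill missing land sizes with the mean land size of records that
    share the same suburb and property type.

    Assumption: a missing land size is stored as None (hypothetical layout).
    """
    # Collect the known land sizes for every (suburb, type) group.
    known_by_group = defaultdict(list)
    for rec in records:
        if rec["landsize"] is not None:
            known_by_group[(rec["suburb"], rec["type"])].append(rec["landsize"])

    filled = []
    for rec in records:
        rec = dict(rec)  # copy, so the input rows are left untouched
        if rec["landsize"] is None:
            known = known_by_group.get((rec["suburb"], rec["type"]))
            if known:  # no known value in the group: leave it missing
                rec["landsize"] = mean(known)
        filled.append(rec)
    return filled


# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"suburb": "Brunswick", "type": "House", "landsize": 300},
    {"suburb": "Brunswick", "type": "House", "landsize": 500},
    {"suburb": "Brunswick", "type": "House", "landsize": None},
    {"suburb": "Kew", "type": "Unit", "landsize": None},
]
filled = impute_land_size(sample)
print(filled[2]["landsize"])  # mean of the two known Brunswick houses
print(filled[3]["landsize"])  # stays missing: no known Kew unit to average
```

Under this approach a record stays missing whenever no other property of the same type in the same suburb has a known land size.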
Filling in these values reduced the missing land sizes to 4.5% (90 properties).
At one end of the range in the original data, we had a land size of 14,196 sqm for a Unit in Richmond, and at the other end of the scale a House in Northcote of only 5 sqm. This indicates that perhaps the original data is incorrect and, with the 47% of properties with NIL value Land size, this may not be a useful measure for analytics.
JMP and Excel showed a minor correlation, but the missing values and inconsistencies make Land Size an unreliable attribute as an indicator of price.

Price by Property Type and Method of Sale:
The following data is only for the Southern Metro region (sold only data), and we are looking to see if there is a correlation between price and other variables:

Mean                    1,364,119.2
Std Dev                   790,205.3
Upper 95%               1,432,795.9
Lower 95%               1,295,442.4
Max                     5,500,000
Min                       247,500
Median                  1,220,000
Lower quartile (25%)      803,000
Upper quartile (75%)    1,735,000

The average price for South Metro is $1,364,119.2. The region also had the highest selling price, at $5,500,000.

Type of property:

Type         Count   Prob
House          292   0.57143
Townhouse       52   0.10176
Unit           167   0.32681
Total          511   1.00000

Houses were the highest sellers, followed by units.
In South Metro, the probability of a sold property being a house was 57%.

How it sold

Method                  Count   Prob
Sold after Auction          9   0.01761
Sold at Auction           456   0.89237
Sold prior to Auction      46   0.09002

The most common method of sale, at 89%, was sold at auction.

3 – Potential Business Analysis Tasks
Our analysis of the data so far shows that there are potentially three business benefits to investigate further:

1) Increase Market Share by increasing footprint – analyse relationship between reach (number of suburbs an agent is operating in) and sales revenue.
The most successful agents in terms of number of sales do not necessarily operate in the most suburbs: the data shows Hocking Stuart sold 40 fewer properties than Nelson but operated in 20 more suburbs. If a correlation is found then it could support a business case for expansion.

2) Increase revenue by improving Auction results – investigate reasons for lower success rates at Auction.
If a cause can be identified, a business case may be made to change systems or processes, or to increase staff training. Inexperienced staff may be over-valuing properties, leading vendors to have unrealistic expectations, which result in reserve prices being set too high and the properties not being sold.
This sample shows that, with similar numbers of properties, Brad (agent) is much more successful at selling properties than McGrath.

3) Predict Sales Prices by property type and location – the business benefit would be to improve strategic planning of resources, revenue and cash flow forecasting, as well as decisions around future growth of the business.
Using past data from the dataset, a predictive price model could be built.

Conclusion:
There are many different ways to mine the data and identify metrics which can be used by a Business Analyst to identify projects or tasks that are beneficial to the business.
Although this dataset was small and could be manually cleaned and modified, in large databases the
task would be impossible without some machine learning technology, with the actual applications used being either established by the organisation or a preference of the analyst.
However, there must always be an identified business benefit as an outcome of the work spent mining and analysing the data, to be presented to management for review and approval (or not) to proceed further into the project.
Significant amounts of time can be spent on a business case for change for a very good project, but management may decide not to proceed with the project. That is their prerogative, and the work performed to date will be filed. Without a clear business benefit, any work performed in analysis is effectively a waste of time and resources.
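
As a possible starting point for task 1 in section 3, the relationship between an agent's reach and their sales revenue could be explored roughly as below. This is a minimal sketch in plain Python rather than JMP, Excel or WEKA. It assumes each sale is an (agent, suburb, price) row and uses total sale price as a stand-in for sales revenue; the agent names, suburbs and prices are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from math import sqrt


def reach_and_revenue(sales):
    """Return {agent: (distinct suburbs sold in, total sale price)}."""
    suburbs = defaultdict(set)
    revenue = defaultdict(float)
    for agent, suburb, price in sales:
        suburbs[agent].add(suburb)
        revenue[agent] += price
    return {agent: (len(suburbs[agent]), revenue[agent]) for agent in revenue}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


# Hypothetical sold-only rows: (agent, suburb, sale price).
sales = [
    ("Agent A", "Suburb 1", 500_000),
    ("Agent A", "Suburb 1", 600_000),
    ("Agent A", "Suburb 2", 700_000),
    ("Agent B", "Suburb 1", 400_000),
    ("Agent C", "Suburb 1", 800_000),
    ("Agent C", "Suburb 2", 900_000),
    ("Agent C", "Suburb 3", 1_000_000),
    ("Agent C", "Suburb 3", 1_100_000),
]

summary = reach_and_revenue(sales)
reach = [r for r, _ in summary.values()]
revenue = [v for _, v in summary.values()]
r = pearson(reach, revenue)
print(summary)
print(round(r, 3))
```

A coefficient near 1 on the real sold-only data would support the expansion case described in task 1, while a value near 0 would suggest that reach alone does not explain revenue. Either way, the figure measures association only and would need further analysis before being put to management.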