Assignment 1: Map-Reduce

COMP3210/COMP6210 – Big Data
Assignment 1: Map-Reduce
Semester 1, 2021
Macquarie University, Department of Computing

Dataset: 10000 Tweets dataset included in “tweets.zip” on iLearn

Programming environment: You should use Pymongo and Mrjob in Python to implement your Map-Reduce algorithms.

Task 1 (60%): MapReduce: Calculate the total number of tweets posted on each day by personal accounts (i.e., the objectType of actor is “person”) that appear in the dataset. An excerpt of one tweet document:

    {
        "_id" : ObjectId("603d975915cb074610ddb000"),
        "id" : "tag:My number 1 tweet",
        "objectType" : "activity",
        "actor" : {
            "objectType" : "person",
            "id" : "id:twitter.com:123123",
            "link" : "http://www.twitter.com/Intelledox",
            "displayName" : "Intelledox",
            "postedTime" : "2008-12-11T23:47:55.000Z",
            "image" : "https://pbs.twimg.com/profile_images/485981380585603072/inMuMtJ7_normal.png",
            "summary" : "Intelledox’s mobile-ready digitalization software helps over 1 million people to do business faster, smarter & efficiently Digitalize your business process now!",
            "links" : [
                {
                    "href" : "http://www.intelledox.com",
                    "rel" : "me"
                }
            ],
            "friendsCount" : NumberInt(486),
            "followersCount" : NumberInt(549),
            "listedCount" : NumberInt(24),
            "statusesCount" : NumberInt(1188),

[Figure annotations on the excerpt: “personal account”, “posted time”]

The partial results are similar to the following:

[Results table not reproduced]

Task 2 (40%): MapReduce: Implement either the Merge Sort [1] algorithm or the Bucket Sort [2] algorithm using MapReduce to sort the posted dates (from Task 1) according to the number of tweets (ascending order).

The partial results are similar to the following:

[Results table not reproduced; columns: posted date, number of tweets]

[1] https://en.wikipedia.org/wiki/Merge_sort
[2] https://en.wikipedia.org/wiki/Bucket_sort

Workflow for Tasks 1 and 2:

Task 1 (60%):

Step 0: Import the JSON file ‘10000 Tweets’ into MongoDB.

Step 1 (30%): Connect to MongoDB from a Python application, extract the posted dates in tweets and save them in a txt file, such as ‘postedDates.txt’. Please extract the first 10 characters from the posted time of a tweet to form the corresponding posted date.

Step 2 (30%): Implement the MapReduce algorithm for Task 1 to calculate the number of tweets posted on each day, namely, to calculate the count of each posted date in ‘postedDates.txt’, and save the results in ‘task1_output.txt’.

Task 2 (40%):

Implement the Merge Sort or Bucket Sort MapReduce algorithm to sort the posted dates in ‘task1_output.txt’ in ascending order according to the count, and save the results in ‘task2_output.txt’.

Submission:

Submit a zip file “Firstname_LastName_Assignment1.zip” to iLearn, including:

- A Word or PDF document of 2-4 pages including the flowchart and pseudocode for each task;
- Source code for Tasks 1 & 2, including code for curation, mapper(s), reducer(s) and, optionally, a combiner;
- Output file for each task, i.e., ‘postedDates.txt’, ‘task1_output.txt’ and ‘task2_output.txt’.
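Step 1 of the workflow (extracting posted dates from MongoDB) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the expected solution: it assumes the tweet's own timestamp is stored as a string in a top-level `postedTime` field (the excerpt in the brief is truncated, so the exact field location should be checked against the data), and the helper names `posted_date`, `personal_posted_dates` and `export_posted_dates` are hypothetical.

```python
from typing import Any, Iterable, Iterator, Mapping


def posted_date(tweet: Mapping[str, Any]) -> str:
    """Return the first 10 characters (YYYY-MM-DD) of the tweet's posted time."""
    # Assumption: the tweet's timestamp is the top-level "postedTime" string.
    return tweet["postedTime"][:10]


def personal_posted_dates(tweets: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Yield posted dates for tweets whose actor.objectType is "person"."""
    for tweet in tweets:
        is_person = tweet.get("actor", {}).get("objectType") == "person"
        if is_person and "postedTime" in tweet:
            yield posted_date(tweet)


def export_posted_dates(uri: str, db_name: str, coll_name: str, out_path: str) -> int:
    """Query MongoDB and write one posted date per line; return the line count."""
    # Imported lazily so the pure helpers above can be used without pymongo.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        # Filter on the server and fetch only the fields the helpers need.
        cursor = client[db_name][coll_name].find(
            {"actor.objectType": "person"},
            {"_id": 0, "postedTime": 1, "actor.objectType": 1},
        )
        count = 0
        with open(out_path, "w", encoding="utf-8") as out:
            for date in personal_posted_dates(cursor):
                out.write(date + "\n")
                count += 1
        return count
    finally:
        client.close()
```

A call might look like `export_posted_dates("mongodb://localhost:27017", "assignment1", "tweets", "postedDates.txt")`, where the database and collection names are placeholders for whatever was chosen in Step 0.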
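The counting logic of Step 2 can be illustrated with a small pure-Python sketch. It does not use Mrjob itself: the generators below only mirror the shape of the `mapper`, `combiner` and `reducer` methods that an `MRJob` subclass defines, and `run_local` is a hypothetical stand-in for the shuffle phase so the logic can be checked without a cluster. It assumes one posted date per input line, as in ‘postedDates.txt’.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(_, line: str) -> Iterator[Tuple[str, int]]:
    """Emit (date, 1) for every non-empty input line."""
    date = line.strip()
    if date:
        yield date, 1


def combiner(date: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Pre-aggregate counts on the map side to reduce shuffled data."""
    yield date, sum(counts)


def reducer(date: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Sum all partial counts for one date."""
    yield date, sum(counts)


def run_local(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Simulate map, combine, shuffle and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            groups[key].append(value)

    results: List[Tuple[str, int]] = []
    for key in sorted(groups):  # the shuffle delivers keys in sorted order
        partial = [value for _, value in combiner(key, groups[key])]
        results.extend(reducer(key, partial))
    return results


if __name__ == "__main__":
    sample = ["2014-07-07\n", "2014-07-08\n", "2014-07-07\n"]
    for date, count in run_local(sample):
        print(f"{date}\t{count}")
```

Because summation is associative, the combiner and the reducer can share the same body; in an actual Mrjob job the three functions would become methods on the job class and the framework would replace `run_local`.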
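For Task 2, one way to think about Bucket Sort in MapReduce terms is: the mapper assigns each (date, count) record to a bucket by count range, each reducer sorts one bucket, and the buckets are concatenated in ascending bucket order. The sketch below is a single-process illustration of that idea, not a Mrjob job. It assumes each line of ‘task1_output.txt’ has the form `date<TAB>count` (optionally with the date in double quotes), and `BUCKET_WIDTH`, `parse_line` and `bucket_sort` are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

BUCKET_WIDTH = 100  # hypothetical bucket size; tune to the range of counts


def parse_line(line: str) -> Tuple[str, int]:
    """Parse 'date<TAB>count'; assumed format of task1_output.txt."""
    date, count = line.rstrip("\n").split("\t")
    return date.strip('"'), int(count)


def mapper(_, line: str) -> Iterator[Tuple[int, Tuple[int, str]]]:
    """Emit (bucket index, (count, date)) so similar counts share a bucket."""
    date, count = parse_line(line)
    yield count // BUCKET_WIDTH, (count, date)


def reducer(bucket: int, pairs: Iterable[Tuple[int, str]]) -> Iterator[Tuple[str, int]]:
    """Sort one bucket by count (ties broken by date) and emit (date, count)."""
    for count, date in sorted(pairs):
        yield date, count


def bucket_sort(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Simulate the job locally: map, group by bucket, reduce in bucket order."""
    buckets: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for line in lines:
        if line.strip():
            for bucket, pair in mapper(None, line):
                buckets[bucket].append(pair)

    ordered: List[Tuple[str, int]] = []
    for bucket in sorted(buckets):  # lower buckets hold smaller counts
        ordered.extend(reducer(bucket, buckets[bucket]))
    return ordered
```

In a distributed run the global order also depends on how the framework sorts and partitions the bucket keys across reducers, so the concatenation step here should be read as a description of the intended result rather than of what any particular runner guarantees.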
