Part III: Do Yelp Reviews Skew Negative?
Part IV: Should the Elite be Trusted? (Or, some other analysis of your choice)

Requirements

This project is very simple: you are to provision a Spark cluster on AWS EMR, connect it to a Jupyter Notebook, and then run a series of queries (in Python, with the DataFrame API or Spark SQL) that answer a few simple questions about the available Yelp data. In doing so, you demonstrate your ability to configure and provision infrastructure using the AWS Elastic MapReduce (EMR) ecosystem. You also demonstrate your understanding of how to leverage transformations and actions (in Spark terminology) with PySpark in performing basic data analysis tasks o
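One possible approach to Part III with the DataFrame API is sketched below. It is not the assignment's prescribed method. The skew measure (review stars minus the business's stars, divided by the business's stars), the column names `business_id` and `stars`, and the file paths are assumptions about Kaggle's Yelp JSON-lines files. The sketch also shows the transformation/action split: `select`, `join`, `withColumn` and `agg` only build a plan, and `first()` is the action that actually runs the job.

```python
# Sketch for Part III: do individual reviews rate a business below its average?
# ASSUMPTIONS: Kaggle's JSON-lines files have `business_id` and `stars` columns;
# the paths passed in are hypothetical; the skew definition is one choice of many.


def skew_expr(review_stars, business_stars):
    """Relative skew of one review against the business's star rating.

    Works on plain floats and on pyspark Column objects alike, because
    Column overloads the - and / operators.
    """
    return (review_stars - business_stars) / business_stars


def review_skew(spark, review_path, business_path):
    """Return (mean skew, fraction of reviews below the business rating)."""
    # Imported here so skew_expr can be used without PySpark installed.
    from pyspark.sql import functions as F

    reviews = spark.read.json(review_path).select(
        "business_id", F.col("stars").alias("review_stars"))
    businesses = spark.read.json(business_path).select(
        "business_id", F.col("stars").alias("business_stars"))

    # Transformations: these only build a logical plan; nothing runs yet.
    scored = reviews.join(businesses, on="business_id").withColumn(
        "skew", skew_expr(F.col("review_stars"), F.col("business_stars")))
    summary = scored.agg(
        F.avg("skew").alias("mean_skew"),
        F.avg((F.col("skew") < 0).cast("double")).alias("frac_below_avg"))

    # Action: first() triggers the distributed job and returns a single Row.
    row = summary.first()
    return row["mean_skew"], row["frac_below_avg"]
```

In an EMR notebook with a PySpark kernel, a `spark` session is usually predefined, so a call might look like `review_skew(spark, "s3://your-bucket/yelp/yelp_academic_dataset_review.json", "s3://your-bucket/yelp/yelp_academic_dataset_business.json")`, where the bucket name is hypothetical. The same question can be answered in Spark SQL by registering both DataFrames with `createOrReplaceTempView` and passing the equivalent join and aggregate to `spark.sql`.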
Analyzing 10 GB of Yelp Reviews Data

For this project, you will provision a Spark cluster on AWS EMR, load Yelp's Reviews and Businesses dataset from Kaggle (about 10 GB), and run some analysis on it. You will run your analysis in a Jupyter Notebook, and the expected output artifact is a .ipynb file.

Requirements …
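The cluster can be provisioned from the EMR console, but it can also be scripted. The sketch below builds the keyword arguments for boto3's EMR `run_job_flow` call for a small Spark cluster that a notebook could attach to. The release label, instance types and counts, key pair name, log bucket, and region are placeholders. The application list (Spark, plus Livy and JupyterEnterpriseGateway for notebook access) is our assumption of what EMR Notebooks and EMR Studio expect, so check the EMR documentation for your release.

```python
# Sketch: provisioning a small EMR cluster with Spark for notebook use.
# ASSUMPTIONS: release label, instance types/counts, key pair name, log bucket
# and region are placeholders; the default EMR roles must already exist
# (for example, created once with `aws emr create-default-roles`).


def build_cluster_request(name="yelp-spark", key_name="my-key-pair",
                          log_uri="s3://your-bucket/emr-logs/",
                          core_count=2, instance_type="m5.xlarge"):
    """Build keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # example label; pick a current release
        "LogUri": log_uri,
        # Spark runs the analysis; Livy and JupyterEnterpriseGateway are
        # assumed to be what EMR Notebooks / EMR Studio use to reach the cluster.
        "Applications": [{"Name": app} for app in
                         ("Hadoop", "Spark", "Livy", "JupyterEnterpriseGateway")],
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "Market": "ON_DEMAND",
                 "InstanceRole": "MASTER", "InstanceType": instance_type,
                 "InstanceCount": 1},
                {"Name": "core", "Market": "ON_DEMAND",
                 "InstanceRole": "CORE", "InstanceType": instance_type,
                 "InstanceCount": core_count},
            ],
            "Ec2KeyName": key_name,  # hypothetical key pair name
            # Keep the cluster running with no steps so a notebook can attach.
            "KeepJobFlowAliveWhenNoSteps": True,
            "TerminationProtected": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def launch_cluster(region="us-east-1", **kwargs):
    """Submit the request and return the new cluster (job flow) id."""
    import boto3  # imported here so build_cluster_request needs only the stdlib

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_cluster_request(**kwargs))
    return response["JobFlowId"]
```

Keeping the request builder separate from the boto3 call lets you inspect the configuration before incurring any cost. When the analysis is finished, the cluster can be shut down with the EMR client's `terminate_job_flows(JobFlowIds=[...])` so it does not keep billing.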