Introduction
As the volume of data grows at an exponential rate, it has become essential to analyze Big Data efficiently, whether for the benefit of an organization or for research. Query processing is central to evaluating large datasets and obtaining meaningful results from them. The major tools used for query processing are Hive, which is based on MapReduce, and Spark, which is built on Resilient Distributed Datasets (RDDs). Hive, created by Facebook, is a data warehouse on which SQL queries can be executed. These queries are converted to MapReduce tasks under the hood and run on the underlying Hadoop cluster and HDFS. PySpark, the Python API for Apache Spark, is an open-source data processing framework that can also work on real-time data. It is based on RDDs and includes the PySpark SQL module for querying structured data quickly. With the emergence of these query technologies, cloud platforms now offer them as ready-to-use services, which minimizes installation and configuration effort and lets users focus on the actual data analysis and on enhancing query capabilities.
A good ETL framework provides an environment for data wrangling and data handling, and scales incrementally with load. The goal of the ETL process is to extract massive amounts of data from diverse platforms and load it into a data warehouse. The aim of this work is to develop and implement a process for extracting, transforming, and loading (ETL) raw data from a variety of data sources into meaningful and useful information in a data warehouse or data lake. Orchestration of the ETL process is also important, and Apache Airflow can be employed for this purpose. The work also explores the capabilities of cloud services for building a scalable data processing infrastructure.
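To make the extract, transform, and load stages concrete, the following is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the framework this work sets out to build: the sample data, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from files, APIs, or databases.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,not_a_number
3,alice,4.50
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with an invalid amount and normalise types."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed records
        clean.append((int(row["order_id"]), row["customer"].title(), amount))
    return clean


def load(rows, conn):
    """Load: write cleaned rows into a table (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order and return total spend per customer."""
    load(transform(extract(text)), conn)
    query = (
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run_etl(RAW_CSV, sqlite3.connect(":memory:")))
```

Keeping each stage as a separate function means a stage can later be swapped for a distributed implementation (for example, a PySpark job) without changing how the stages are chained.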
Research Area
Big Data Tools, ETL framework, Orchestration, Cloud platform, Open Source
Research Questions
Q1. How can an open-source framework for ETL be created?
Q2. How can a cloud platform's services be used to integrate Big Data tools for the ETL process?
Q3. How can huge amounts of data from different types of sources and destinations be processed in parallel?
Q4. How can the ETL framework or tool be orchestrated?
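As a rough illustration of what orchestration in Q4 involves, the sketch below runs tasks in dependency order using the standard-library `graphlib` module (Python 3.9+). Orchestrators such as Apache Airflow model a workflow as a directed acyclic graph of tasks; this sketch is not Airflow's API, and the task names, the `run_pipeline` function, and the placeholder task bodies are assumptions made purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies):
    """Run each task once, only after all of its upstream tasks have run.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    Returns the list of task names in the order they were executed.
    """
    order = []
    # static_order() yields every task after its upstream tasks and
    # raises graphlib.CycleError if the dependencies contain a cycle.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical placeholder tasks; real ones would call the ETL stages.
TASKS = {
    "extract": lambda: None,
    "transform": lambda: None,
    "load": lambda: None,
}

# Each task lists the tasks that must finish before it can start.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

if __name__ == "__main__":
    print(run_pipeline(TASKS, DEPENDENCIES))
```

A production orchestrator adds scheduling, retries, and monitoring on top of this ordering logic, which is why the brief points to a dedicated tool rather than hand-written sequencing.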
H9RCOMP: Research In Computing Thesis, NCI, Ireland