Published on May 30, 2015
Today, data is generated in greater volumes than ever before. In addition to vast amounts of legacy data, new data sources such as application logs and social media complicate data processing. The ultimate goal is to gain insights and derive prescriptions that support decisions, or to develop predictive apps. However, the preceding steps of data integration and warehousing, which make the data available for exploration and application, are usually hard and require expert knowledge to design and implement.

Peachbox solves this by providing an agile and accessible open source solution to the Big ETL process. Peachbox is a Python framework based on and conforming to the ‘Lambda Architecture’, which in turn is an abstracted pattern providing principles and best practices for real-time and scalable data systems. The main underlying technology is PySpark.

In the tutorial we will set up Peachbox and implement a general and extensible Big ETL system. Furthermore, we will explore potential applications.

Tutorial prerequisites and instructions.
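To make the ‘Lambda Architecture’ idea mentioned above concrete, here is a minimal sketch in plain Python. It does not use Peachbox or PySpark and does not reflect the Peachbox API; the event records and the function names `count_by_page` and `query` are hypothetical, chosen only for illustration. It assumes the usual split into a batch layer that recomputes a view from the full master dataset, a speed layer that covers events not yet in a batch run, and a serving step that merges both at query time.

```python
from collections import defaultdict

# Hypothetical event records as (user_id, page) pairs; not Peachbox's data model.
master_dataset = [("alice", "/home"), ("bob", "/home"), ("alice", "/cart")]
recent_events = [("alice", "/home"), ("carol", "/cart")]


def count_by_page(events):
    """Count events per page; reused for the batch view and the real-time view."""
    counts = defaultdict(int)
    for _user, page in events:
        counts[page] += 1
    return dict(counts)


def query(batch_view, realtime_view, page):
    """Serving step: add the precomputed batch count to the real-time count."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Batch layer: recompute the view from the whole master dataset.
batch_view = count_by_page(master_dataset)
# Speed layer: cover only the events that arrived after the last batch run.
realtime_view = count_by_page(recent_events)

home_views = query(batch_view, realtime_view, "/home")
cart_views = query(batch_view, realtime_view, "/cart")
```

In a real system the batch layer would typically run as a distributed job (for example on Spark) and the real-time view would be discarded once a new batch run has absorbed its events; the sketch only shows how the two views combine at query time.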