Philipp Pahl: Peachbox: Agile and Accessible Big ETL Framework





Published on May 30, 2015

Today, data is generated in greater volumes than ever before. In addition to vast amounts of legacy data, new data sources such as application logs and social media complicate data processing. The ultimate goal is to gain insights and derive prescriptions that support decisions or drive predictive apps. However, the preceding steps of data integration and warehousing, which make the data available for exploration and application, are usually hard and require expert knowledge to design and implement.

Peachbox solves this by providing an agile and accessible open source solution to the Big ETL process. Peachbox is a Python framework based on and conforming to the ‘Lambda Architecture’, an abstracted pattern that provides principles and best practices for real-time and scalable data systems. The main underlying technology is PySpark.

In the tutorial we will set up Peachbox and implement a general and extensible Big ETL system. Furthermore, we will explore potential applications.

Tutorial prerequisites and instructions.
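
For readers unfamiliar with the Lambda Architecture named in the abstract, the following is a minimal sketch of its three layers: a batch layer that recomputes a view from the full dataset, a speed layer that updates a view incrementally, and a serving step that merges both at query time. It is written in plain Python rather than PySpark so that it runs anywhere, and it does not use or represent the Peachbox API. All names (`master_dataset`, `build_batch_view`, `SpeedLayer`, `query`) and the event fields are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical, already-stored events (the immutable "master dataset").
# The field names are assumptions made for this sketch.
master_dataset = [
    {"user": "alice", "action": "click"},
    {"user": "bob", "action": "click"},
    {"user": "alice", "action": "click"},
]


def build_batch_view(events):
    """Batch layer: recompute the per-user count from scratch over all events."""
    return Counter(event["user"] for event in events)


class SpeedLayer:
    """Speed layer: count events that arrived after the last batch run."""

    def __init__(self):
        self.view = Counter()

    def ingest(self, event):
        # Incremental update, one event at a time.
        self.view[event["user"]] += 1


def query(user, batch_view, realtime_view):
    """Serving step: merge the batch and real-time views at query time."""
    return batch_view[user] + realtime_view[user]


batch_view = build_batch_view(master_dataset)

speed = SpeedLayer()
speed.ingest({"user": "alice", "action": "click"})  # arrives after the batch run

alice_clicks = query("alice", batch_view, speed.view)
bob_clicks = query("bob", batch_view, speed.view)
```

In a real deployment the batch view would be rebuilt periodically by a distributed job (for example with PySpark), after which the speed layer's state for the covered period can be discarded.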

Philipp Pahl
