Nils Magnus - Dealing with TBytes of Data in Realtime

Published on May 31, 2016

PyData Berlin 2016

Data processing often splits into two disjoint categories: classic access to an RDBMS with SQL and ORMs is well understood and convenient, but often scales poorly once data grows to considerable GByte volumes. Big-data approaches are powerful, but complex to set up and maintain. In a test setup we tried a compromise between the two: what happens if you glue more than 1000 single SQL databases into one huge cluster? We learned a whole lotta lessons!

Thanks to access to an unused IaaS cluster, we had the opportunity to research the behavior of many nodes clustered together. Data loading becomes a real challenge, and maintaining and monitoring such a drove of containers was no longer feasible by hand. We investigated the effect of changing container-to-VM ratios. For our experiments, we used Crate, an open source, highly scalable, shared-nothing distributed SQL database that comes with Python client connectors and support for several ORMs.
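To give a feel for the kind of bulk loading the talk alludes to, here is a minimal sketch using CrateDB's `crate` Python client (a DB-API 2.0 connector). The endpoint, the `measurements` table, and the batch size are illustrative assumptions, not details from the talk:

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_load(rows, hosts="localhost:4200", batch_size=5000):
    """Sketch of batched loading into CrateDB via its DB-API client.

    Requires the `crate` package; imported lazily so the chunking
    helper above works without it. The table and endpoint are
    hypothetical examples.
    """
    from crate import client

    conn = client.connect(hosts)
    cursor = conn.cursor()
    for batch in chunked(rows, batch_size):
        # executemany() sends one bulk insert per batch instead of
        # one round-trip per row -- essential at this scale.
        cursor.executemany(
            "INSERT INTO measurements (ts, sensor, value) VALUES (?, ?, ?)",
            batch,
        )
    conn.close()
```

Batching the inserts is the key point: with more than 1000 databases behind the cluster, per-row round-trips dominate load time, so the client should always push rows in bulk.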

We will share unexpected experiences with data schema design, explain some tweaking options that turned out to be effective, and campaign for more open data projects.
