
Hydrator: Open Source, Code-Free Data Pipelines, by Jon Gray, CEO, Cask

Published on Aug 24, 2016

Big Data Day LA 2016
July 9, 2016
Los Angeles

Link to slides: http://www.slideshare.net/caskdata/ca...

Abstract:

Creating and managing an enterprise data lake typically requires substantial effort to ingest, process, store, secure, and manage data from a variety of sources. Hydrator is an open source framework and self-service user interface for creating data lakes. It simplifies building and managing production data pipelines on Spark, MapReduce, Spark Streaming, and Tigon.

The goal of this talk is to demonstrate broad, self-service access to Hadoop while maintaining the controls and monitoring the enterprise requires. Hydrator provides these capabilities to the enterprise and to all of the end users who program, access, and manage enterprise data.

Some of the features that will be demonstrated:
* Supports ingestion, ETL, aggregations, and machine learning, in both real-time and batch modes. Supports major distros and cloud providers. Built to let enterprises enable self-service while meeting enterprise requirements for security and governance.
* The Hydrator open source ecosystem contains an extensive library of plugins to enable batch and real-time ingestion from traditional and modern databases, cloud services and other common data sources. There are dozens of community plugins for machine learning and analytics as well as pre-built pipelines for common end-to-end use cases.
* Drag-and-drop user interface where you build data ingestion and data processing pipelines from included, community and custom-built plugins as well as custom MapReduce and Spark jobs. Pipelines and plugins support versioning and are configured with JSON.
* Operate pipelines through a management interface. Schedule and monitor pipelines through the UI or REST APIs. Powerful metadata capabilities automatically capture complete audit and lineage information. Integrates with security and master data management (MDM) systems.
* Customize and limit access to data sources, sinks and any other plugins to provide simplified and controlled usage by non-technical users.
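The abstract notes that pipelines are assembled from plugins and configured with JSON. A minimal Python sketch may help picture what such a configuration expresses: a directed acyclic graph of source, transform, and sink stages. The field names (`stages`, `connections`, `plugin`, `type`) and the plugin names are hypothetical placeholders, not Hydrator's documented format. The sketch only parses the spec and derives a valid run order with a topological sort (Kahn's algorithm).

```python
import json
from collections import deque

# Hypothetical pipeline spec. Field names and plugin names are illustrative
# assumptions, NOT Hydrator's documented JSON schema.
PIPELINE_JSON = """
{
  "name": "orders_ingest",
  "stages": [
    {"name": "dbSource", "plugin": {"type": "batchsource", "name": "ExampleDatabase"}},
    {"name": "cleanse",  "plugin": {"type": "transform",   "name": "ExampleProjection"}},
    {"name": "lakeSink", "plugin": {"type": "batchsink",   "name": "ExampleParquet"}}
  ],
  "connections": [
    {"from": "dbSource", "to": "cleanse"},
    {"from": "cleanse",  "to": "lakeSink"}
  ]
}
"""


def execution_order(spec):
    """Return stage names in a valid run order (Kahn's topological sort).

    Raises ValueError if a connection names an unknown stage or if the
    connections form a cycle, since a pipeline must be a DAG.
    """
    names = [stage["name"] for stage in spec["stages"]]
    indegree = {n: 0 for n in names}       # number of incoming edges per stage
    downstream = {n: [] for n in names}    # adjacency list: stage -> next stages
    for edge in spec["connections"]:
        src, dst = edge["from"], edge["to"]
        if src not in indegree or dst not in indegree:
            raise ValueError(f"unknown stage in connection {src} -> {dst}")
        downstream[src].append(dst)
        indegree[dst] += 1

    # Start from stages with no inputs (the sources).
    ready = deque(n for n in names if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any stage never reached has an unresolved input, i.e. a cycle.
    if len(order) != len(names):
        raise ValueError("pipeline connections contain a cycle")
    return order


spec = json.loads(PIPELINE_JSON)
print(execution_order(spec))  # ['dbSource', 'cleanse', 'lakeSink']
```

Treating the configuration as plain data is what makes a drag-and-drop editor possible in principle: the UI only has to emit stages and connections, and validation (such as rejecting cycles) can run before anything is submitted to Spark or MapReduce.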

The talk includes a live end-to-end demo of building and running ingestion and machine learning data pipelines.
