Matt has taken the radix join as implemented in R's data.table and parallelized and distributed it in H2O. He will describe how the algorithm works, provide benchmarks and highlight advantages/disadvantages. H2O is open source on GitHub and is accessible from R and Python using the h2o package on CRAN and PyPI.
----------------------------------------
Scalæ By the Bay 2016 conference (http://scala.bythebay.io) is held on November 11-13, 2016 at Twitter, San Francisco, to share best practices in building data pipelines, with three tracks:
* Functional and Type-safe Programming
* Reactive Microservices and Streaming Architectures
* Data Pipelines for Machine Learning and AI