Published on Dec 4, 2016
Of the more than 2.5 billion GB of data produced daily, about 75% is unstructured, and only about 0.5% is ever analyzed. The goal of big data analytics is to fish useful insights out of this rising tide of data. The first step, though, is to parse the raw data, and today's most popular parsing tools are built on a shaky foundation.

Most tools for processing unstructured text (e.g. Perl, PCRE, Elasticsearch, Splunk, and most Apache parsers) rely on regexes, which extend classical regular expressions. But regexes are hard to write and notoriously difficult to read and maintain. Their performance also varies surprisingly in practice: a backtracking regex engine can take exponential time on some combinations of pattern and input. So it's best to keep a regex engine out of your big data pipeline.

In this talk, we introduce Rosie Pattern Language (RPL), an alternative to regexes. RPL shares key concepts and notation with regexes, but RPL patterns are more powerful. RPL is designed like a programming language: composable patterns are bound to identifiers, comments and whitespace are allowed within patterns, and patterns may be grouped into modules. These features make patterns easier to create, maintain, and share. Finally, RPL matching (parsing) is consistently fast, often several times faster than competing tools.

Rosie Pattern Language is implemented in Lua. The RPL compiler produces expressions that the lpeg pattern-matching engine executes at run time. While patterns are defined in RPL, post-match processing (including data format conversion) is done in Lua. Lua is thus the extension language for RPL, allowing users to add their own data format conversion and validation routines. Rosie is open source, released under the MIT License, and can be found at https://github.com/jamiejennings/rosi....
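Here is a rough illustration of what "composable patterns bound to identifiers" offers over one monolithic regex, written as a minimal Python sketch. It is not RPL syntax and not Rosie's API. The combinators (lit, chars, seq, choice, named, parse) and the to_date converter are hypothetical helpers written for this example. They loosely follow the style of a PEG matcher such as lpeg: ordered choice, and repetition that never gives characters back. to_date stands in for the kind of post-match format conversion that, according to the abstract, Rosie performs in Lua.

```python
import datetime
from typing import Callable, List, Optional, Tuple

# Hypothetical mini-PEG (not RPL's API): a pattern takes (text, pos) and returns
# (end_pos, captures) on success, or None on failure.
Capture = Tuple[str, str, list]          # (name, matched text, sub-captures)
Result = Optional[Tuple[int, List[Capture]]]
Pattern = Callable[[str, int], Result]

DIGITS = "0123456789"


def lit(s: str) -> Pattern:
    """Match the literal string s."""
    def match(text: str, pos: int) -> Result:
        return (pos + len(s), []) if text.startswith(s, pos) else None
    return match


def chars(allowed: str, lo: int = 1, hi: int = 1) -> Pattern:
    """Match lo..hi characters from `allowed`, greedily, never giving any back."""
    def match(text: str, pos: int) -> Result:
        end = pos
        while end < len(text) and end - pos < hi and text[end] in allowed:
            end += 1
        return (end, []) if end - pos >= lo else None
    return match


def seq(*parts: Pattern) -> Pattern:
    """Match each part in order; fail as soon as one part fails."""
    def match(text: str, pos: int) -> Result:
        caps: List[Capture] = []
        for part in parts:
            r = part(text, pos)
            if r is None:
                return None
            pos, sub = r
            caps.extend(sub)
        return (pos, caps)
    return match


def choice(*alts: Pattern) -> Pattern:
    """PEG-style ordered choice: the first alternative that succeeds wins."""
    def match(text: str, pos: int) -> Result:
        for alt in alts:
            r = alt(text, pos)
            if r is not None:
                return r
        return None
    return match


def named(name: str, p: Pattern) -> Pattern:
    """On success, wrap p's result in a named capture holding its sub-captures."""
    def match(text: str, pos: int) -> Result:
        r = p(text, pos)
        if r is None:
            return None
        end, sub = r
        return (end, [(name, text[pos:end], sub)])
    return match


# Patterns are ordinary values bound to identifiers, so they compose and stay readable.
year = named("year", chars(DIGITS, 4, 4))
month = named("month", chars(DIGITS, 2, 2))
day = named("day", chars(DIGITS, 2, 2))
iso_date = named("iso_date", seq(year, lit("-"), month, lit("-"), day))
us_date = named("us_date", seq(month, lit("/"), day, lit("/"), year))
date = choice(iso_date, us_date)


def parse(p: Pattern, text: str) -> Optional[List[Capture]]:
    """Match p anchored at position 0 (trailing input is allowed); return captures or None."""
    r = p(text, 0)
    return None if r is None else r[1]


def to_date(caps: List[Capture]) -> datetime.date:
    """Post-match conversion: turn a date capture tree into a datetime.date."""
    _, _, fields = caps[0]
    values = {name: int(txt) for name, txt, _ in fields}
    return datetime.date(values["year"], values["month"], values["day"])
```

For example, parse(date, "12/04/2016") yields a us_date capture. Its month, day, and year sub-captures are what to_date converts into datetime.date(2016, 12, 4). To support another date format, you bind one more named pattern and add it to the choice instead of editing a long regex.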