Thank you. Hello everyone, and welcome. My name is Abraham Elmahrek. I work on Hue and I'm a contributor on Sqoop. We're here to take a look at what Hue can do and how we can use it as a starting point for exploration and real-time interaction with Hadoop. So we're going to start off by describing what Hue is, and then I'll pass the torch to my colleague Romain here, and he'll give a demo of how to use it.

Okay, so what is Hue? Well, Hue is the Hadoop User Experience, and that's actually a picture of the HBase browser within Hue. How many folks are familiar with Hadoop? Show of hands. Oh, just about everyone, yeah. So something that we've focused on over the past few months is making HBase a little bit more usable: being able to really interrogate your data, look at your data, and even mutate some of the cells in HBase. And this is what it looks like.

So Hue sits on top of what I would consider the Hadoop ecosystem. That means Beeswax, HiveServer2, Impala, HDFS, Oozie, the JobTracker, MapReduce, Sqoop2, HBase, and Pig. Something else we'll throw in there is Cloudera Search, a search service which we also have a user interface for. So Hue is essentially a collection of apps, and we have an app for each one of these services. Take Hive: you have a query that you need to execute. Generally you would use the command line interface, right?
Well, Hue exposes an interface that makes it really easy, with syntax highlighting, etc., which Romain will show in a little bit. It makes it really easy, really nice, and you can even save your queries.

Okay, so why a UI for Hadoop? Well, I would imagine that most people are not engineers, and the power of Hadoop can be used by just about anyone, ranging from data scientists to students to people who are completely new to Hadoop. So if you're an engineer who knows nothing about Hadoop or big data and you're very interested in learning more about it, I imagine that Hue would be the perfect system for you to get ramped up on. Also, if you're a data scientist at, I don't know, Facebook or maybe a startup, and you have a lot of useful data, Hue has many apps that would assist you with that. All of these apps are also in some sense interconnected, which makes it really easy to use many of the components in the Hadoop ecosystem. So as you can see, it makes it easier to use the power of big data.

So what is an app in Hue? Let's take a step back and think about what applications are in general. If you look at your phone, whether it's an Android or an iPhone doesn't really matter, there's a concept of an app or an application. It's an entity within your phone that performs a certain function. That's exactly what an app is in Hue. Like I said earlier, there's an app for several of the components in the Hadoop ecosystem: HBase, Oozie, Sqoop, Hive, Pig, HDFS. It also provides a kind of cohesiveness. That's the goal of Hue, by the way: to make it much easier to use the components in the Hadoop ecosystem.

So how is it built? What does it look like in terms of the internals? It's your typical web application in some sense.
We strive for single-page layouts, so that means we have a very heavy front end: we use Knockout.js, jQuery, and routie.js. We also have Mako templates on the Django side. On the back end we use Python and Django; it's a Python and Django driven website.

The selection of Django is actually quite interesting. The apps that were mentioned previously are actually Django apps or extensions to Django apps, so it's very easy to build your own. All you have to do is run this command on the command line: build/env/bin/hue create_desktop_app, followed by the name. It will create a predefined template where you just fill in the blanks, and you have your own app. Has anyone here built a Django application before? Just a quick show of hands. So it's rather easy. The ramp-up costs are very minimal in my opinion, so I would imagine it should be no problem.

Going back to the back end real quick: for the web serving layer we use Spawning or CherryPy. It's one or the other. These also handle all of your static resources, such as images, CSS files, and JavaScript files. We also have client interfaces for many of the Hadoop components, such as HDFS (through WebHDFS) and the JobTracker.

So what other batteries are included? Well, there's an LDAP back end, which means that you can authenticate using an LDAP server. We have an OAuth back end. We also have SPNEGO, and it's very extensible.
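As a rough illustration of the extensible authentication mentioned above: Django authentication back ends are classes that expose an authenticate method and a get_user method. The sketch below mimics that shape in plain Python so that it runs without Django installed. The names InMemoryBackend and User, the in-memory user store, and the hashing parameters are all hypothetical; this is not one of Hue's actual back ends.

```python
import hashlib
import hmac
import os


def _hash_password(password, salt):
    # PBKDF2 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class User:
    """Hypothetical user record; Hue keeps its users in its own database."""

    def __init__(self, user_id, username, password):
        self.id = user_id
        self.username = username
        self.salt = os.urandom(16)
        self.password_hash = _hash_password(password, self.salt)


class InMemoryBackend:
    """Hypothetical back end with the two methods Django looks for.

    A real back end would check the credentials against LDAP, OAuth,
    or another external system instead of a dictionary.
    """

    def __init__(self, users):
        self._by_name = {u.username: u for u in users}
        self._by_id = {u.id: u for u in users}

    def authenticate(self, request=None, username=None, password=None):
        # Return the user on success and None on failure.
        user = self._by_name.get(username)
        if user is None or password is None:
            return None
        candidate = _hash_password(password, user.salt)
        # Constant-time comparison avoids leaking how many bytes matched.
        if hmac.compare_digest(candidate, user.password_hash):
            return user
        return None

    def get_user(self, user_id):
        # Look a user up by primary key; None if unknown.
        return self._by_id.get(user_id)
```

In an actual Django project a class like this would be listed in the AUTHENTICATION_BACKENDS setting; how Hue wires its own back ends is not covered in the talk.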
So if necessary you can write your own very easily. Authentication and authorization live in Hue. What that means is that users and groups, and how you access apps within Hue, all live in Hue. Hue has its own database, and within that database we have a set of users, which you can import from LDAP or some other system, and they'll be managed within Hue.

As I said earlier, resources are served through Hue. We support multiple browsers and multiple databases: that means SQLite, MySQL, PostgreSQL, and Oracle. We're also internationalized in over eight languages.

One thing I really want to stress is the, excuse me, not cohesiveness; the word I'm looking for, I believe, is how well the apps integrate with each other. Let me give you a quick example. Imagine you have a MapReduce job that has executed and created a bunch of files on HDFS. It's really easy in Hue to just click on the job output and go directly to that output in HDFS. It becomes super easy, right? That would be kind of difficult otherwise: you would have to go through the normal JobTracker UI, etc. But within Hue it's just one click. So we really strive to improve the user experience, and that means for the data scientist, the newbie, even developers. Developers like myself: I like using it because it makes my life much easier.

So Hue has over 2,000 commits. It's an open source project. You can download it from gethue.com. We do a lot of video tutorials on there as well, so if you need an introduction on how to do things in Hue, how to work with all of the different components, that's all on gethue.com. It's a pretty mature project. It's about four years old, and there are a lot of supporters, a lot of people in the community. So many people comment on the mailing lists.
It's absolutely wonderful. And so, without further ado, to see all of this in action, I'm going to pass the torch to my colleague Romain. Thank you.

So one of the best ways to show the product is to show it in action. I'm going to do a demo starting from some Yelp data, reviews of restaurants, and show how you can use Hadoop with Hue and various query tools like Pig and Hive to process it and get some insights into the data.

So I'm going to switch to a browser. When you log in to Hue, that's the web UI. We support different back ends: you can use LDAP, or even build your custom authentication mechanism. Here I'm just in the File Browser. People are a bit familiar with Hadoop, so that's the file system where you can see all the files. You can browse like in a regular file browser, going through the files, and use quick navigation. We support the HDFS trash, so you can even restore files like in a real operating system. You can directly preview images or even PDFs without having to download them and then look at them on your OS.

So I'm going to upload a Yelp data file. This is from the Yelp Dataset Challenge; I'm just taking one of the files. In one click I can upload it onto HDFS, directly from my browser. And here is the file. For the demo it's only 200 megabytes. As you can see, it's a JSON file. If you go on the website you will see the schema: basically a series of votes for one restaurant, with the final rating and a bunch of text for the review.

When you upload data and want to process it, most of the time you need to do some ETL processing and clean it a little bit. So here I'm going to clean it a bit, escape some parts of the text, and transform it into a TSV-like format to make it easier to process with Hive, Pig, and Impala. For doing this I'm going to use the Pig editor. Pig is one of the query tools for Hadoop.
It's a bit lower level than Hive. If you are familiar with Pig, we provide a nice editor with syntax highlighting and auto-completion of HDFS paths. You can also auto-complete Pig keywords; if you make a mistake it won't highlight, so you avoid typos. It auto-completes aliases too, so I can reuse them.

To save some time I'm going to load a script that takes as input the JSON file that I just uploaded, loads it with a JSON loader, which is native in Pig, and then explodes the map of ratings, putting each vote into a column. It does some string escaping on the text of the review, a little bit of cleaning, then just dumps it.

Then I can just click and it will run the script. It's submitted for you and will print the various logs. The page refreshes automatically to show you the progress. If you're familiar with MapReduce, you could look at the various MapReduce jobs created by Pig. Same thing: it is a live update, with one click to get access to the logs, which is pretty neat. You don't need to go to the command line, then go to the JobTracker URL, get your job, get the logs. It's all in one click.

The Pig script is finishing. Same here, we put links: for example, this script created one MapReduce job, so I could click on the job and then just go back to the previous page I was on. We also put links on the HDFS paths. I was loading this file and dumping it as a CSV on HDFS, so I can just directly click on it and see what we just ran right now. Again, I can quickly look at it directly in the browser. In this case, you see everything is CSV now.

So now I have a CSV file, but I want to do actual queries on it. A good way to do that is to use one of the main query tools, which is called Hive. It's like SQL on top of Hadoop. Before that, I need to create a table that represents this data.
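The cleaning step described above (load each JSON review, spread the map of votes into separate columns, escape the review text, and write delimited output) can be sketched in plain Python. This is not the Pig script from the demo, which the talk does not show. The field names business_id, stars, votes, and text, and the vote keys funny, useful, and cool, are assumptions about the Yelp review schema, and the helper names are hypothetical.

```python
import json

# Assumed vote categories; the talk only says there is a map of votes.
VOTE_KEYS = ("funny", "useful", "cool")


def clean_text(text):
    # Collapse tabs, newlines, and repeated spaces so that one review
    # always stays on a single tab-separated line.
    return " ".join(text.split())


def review_to_row(line):
    # Parse one JSON review and flatten it into a list of column values.
    review = json.loads(line)
    votes = review.get("votes", {})
    row = [review.get("business_id", ""), review.get("stars", "")]
    row.extend(votes.get(key, 0) for key in VOTE_KEYS)
    row.append(clean_text(review.get("text", "")))
    return row


def json_lines_to_tsv(lines):
    # Convert an iterable of JSON lines into tab-separated text,
    # skipping blank lines.
    out = []
    for line in lines:
        if line.strip():
            out.append("\t".join(str(value) for value in review_to_row(line)))
    return "".join(row + "\n" for row in out)
```

Calling json_lines_to_tsv on an open file handle of the uploaded JSON file would produce one tab-separated row per review, which is the kind of flat layout a table wizard can split on a single separator.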
So Hue comes with a wizard that helps you create tables. For example, I just give the name of the new table I want to create and pick one of the files that I want to import. Hue is a bit smart and will give you a preview of what the new table would look like, so I can quickly spot that my separator looks good. Then I just click next and put names on the columns. To save some time, you have some nice shortcuts: you can provide a list of column names and they will be pre-filled. Sometimes you need to double check the types of your data, and I have two sample rows that let you evaluate whether your schema looks correct or not. So then I create the table.

I can also use the table browser app in Hue to look at my tables, look at the manually created table, and even look at a sample of the data. I can quickly sort the sample of the data or go directly to another column, so we try to improve the horizontal scrolling.

Now that I have my data, I'm switching to the Hive editor, which, a little bit like Pig, provides syntax highlighting. We get the tables in your currently selected database, or you can switch between databases. Here are the fields in the review table. So for example, I could get the text; I'll just get five records. Then in one shot I can submit it. It will submit the query and report the progress. On the left,
I have a link to the MapReduce jobs, and here is the data. We support various things, like all the work on tables with all the Hive properties. So if I want, I could upload and use, for example, a custom UDF. I can add a bunch of JARs to my script, then define a function, myUpper. We have a blog tutorial about this one, but it basically creates a new to-upper function that transforms string text to upper case. So I'm going to apply my UDF on this one and execute it. Same as before, you get the live progress. You can share queries, you can save queries, and, like you see, you can upload and use a UDF quickly.

Going back to a more interesting example: if I want to get the top 10 coolest reviews of restaurants, that would be this script. I could also use Impala. Impala is like a super fast Hive, which is developed by Cloudera. I'm going to show you the same query to get the top 10 coolest restaurants, and show you the difference in speed. So here I'm executing the Hive one. Here I'm executing the Impala one. But I think I forgot to create it. Okay, so that's a good story. I'm going to switch to this one and... yeah, it's because Impala is not on this machine. So anyway, that's a quick and easy way; Hue is used a lot for Hive. Now we have support for the new HiveServer2 and a new product called Sentry that provides full security. So that's for Hive, a nice query tool.

Then, because it's NoSQL, I'm going to demo a little bit the new HBase app. Many people are using HBase, so we added a new UI. For example, we can list tables in various clusters and provide a nice search, which just searches on top of HBase. This is a new table where, for each restaurant, the row key is the ID of the restaurant, or the name. For each day, I have the average rating of the restaurant, so day by day I have columns that show the average. You can scroll, and you can also scroll horizontally. HBase is like a sparse database.
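To make the "sparse" layout concrete, here is a minimal Python model of the table just described: one row per restaurant, one column per day, and a cell only where a rating exists. It is a toy in-memory structure and does not use the HBase API. The SparseTable class, the row keys, and the rating:YYYY-MM-DD column naming are assumptions made for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a sparse, row-key-ordered table (not the HBase API)."""

    def __init__(self):
        # row key -> {column name -> value}; absent cells are simply not stored.
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # dict.get on the outer mapping avoids creating empty rows on reads.
        return self._rows.get(row_key, {}).get(column, default)

    def row_keys(self):
        # Rows are returned in sorted key order, as in an HBase scan.
        return sorted(self._rows)

    def cell_count(self):
        # Only cells that were actually written are counted.
        return sum(len(columns) for columns in self._rows.values())


table = SparseTable()
table.put("apple-cafe", "rating:2013-03-01", 4.5)
table.put("apple-cafe", "rating:2013-03-02", 4.0)
table.put("zen-sushi", "rating:2013-03-02", 5.0)
```

Here zen-sushi has no cell for 2013-03-01, so nothing is stored for it. That is why a grid that aggregates the columns that do exist is easier to read than a full rectangular table.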
So we have a nicer layout where we can aggregate. You can inline edit data directly, without having to deal with the HBase API or the shell. We also have ways to create tables or add new rows. The coolest part of this app is really the search bar. We provide auto-completion of the row keys, so for example I could select Apple Cafe, and it will directly pick up the record and do a scan. If I want the five next row keys after this row key, I just add +5 and I get five more rows. I can use column filtering, meaning I can ask for only the data for the month of March. I could also ask for March to the end of the year, or only until July. So that's search into HBase directly. You can also put data.

We can visualize different types of data. If I go in the cell editor, I will see the history of my cell. I could upload pictures or any binary data, showing up in the window. And for people who want to have a quick look at their data, we also support Avro. You can preview PDFs like in the File Browser. So it's a pretty nice app, and everything is standard and just sits on top of HBase. You just need the Thrift service in HBase to be running, then you just point Hue to its URL and the tables will just show up in your browser.

And then I'm going to finish with the last app, which is the search app. I have all these reviews and ratings. I could use Solr on top of HDFS to ingest all of my data, define a schema of the fields, think like Google search, and then search for my ratings.
So by default I created a simple schema with metadata, a bit like the Hive table, with a bunch of ratings and a bunch of text. But I want to search it, and I want to customize the look and feel a bit. So those are all the fields in my index, but I'm going to show that we can create something a bit more good-looking. For example, I want to add the text of the review. We show a preview of what the actual search results will look like. We could also add, for example, the rating, which is called stars, on the left, and use a WYSIWYG editor to add some bolding. We support HTML: in the end it's just an HTML snippet of code for each result. If you want to do advanced styling you can insert custom styles or custom JavaScript.

You can add Solr facets. Facets are what you sometimes see on the left, like on Amazon: range categories of products or ratings. You could add, for example, a rating facet. Then I go back to my search, and you see I now have the customized view. If you play a bit more with it, here is an example of a template you could create, with clickable links to Google Maps. I want just restaurants with five stars, so I put a filter on this. I look for sushi, for example, and it highlights the search term. So you can build your custom search engine depending on your use case. It's a pretty powerful way to do it: many people know SQL, and even more people know how to search. So that was it.

Excuse me, what are the three different colors?

So here in Yelp, for each review, people can say if the review was helpful. Then they can say if the restaurant was cool, or if the atmosphere of the restaurant was funny.
So we put some colors to make them show up a little bit more, but you could put your own style, your own colors, whatever you want.

So on gethue.com we have a list of videos that show in more detail what I just presented, and also blog posts. If you have any questions, feel free to ask them now or follow up on the user list. It's pretty active. Thank you.