Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of DataVercity. We'd like to thank you for joining the DataVercity webinar today, Improving Advanced Data Prep and Analytics in Spreadsheets, sponsored by Altrix. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. If you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag DataVercity. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the upper right-hand corner for that feature. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me introduce our speakers for today, Lisa Aguilar and Hassan Habibati. Lisa is a Senior Product Marketing Manager at Altrix. She is passionate about the analytics space and about showing how innovative analytic technology can help analysts move past mundane data tasks, elevate their skills and expertise, and deliver increasingly sophisticated insights. Hassan is a Solutions Engineer at Altrix. Hassan empowers customers by helping them get to insight fast. He teaches data science-related topics at the University of California, Irvine, has hands-on analytics experience, and was instrumental in launching various analytic practices in the OC/LA area. And with that, let me turn it over to Lisa and Hassan to get us started. Thank you so much, Shannon. As Shannon mentioned, I'm Lisa Aguilar, and with me I have Hassan. I wanted to spend a few quick moments going through what we're going to be talking about today.
And first and foremost, I want to say thank you so much for joining us and taking the time out of your busy schedules. We hope that you will be able to learn something interesting from what we showcase today. Throughout this webinar we're going to be focusing on a few things. We recently commissioned a report by IDC, and we are going to be going over some of the results of that report, really focusing on some of the most prevalent use cases, activities, and functions analysts like yourself perform when conducting analysis in spreadsheet-like software. We're going to take a quick look at the top barriers organizations face, and then some of the recommendations that came out of this report. And finally, my colleague Hassan here is going to be doing a quick demonstration on how you can augment your process and overcome some of the challenges of working with spreadsheets. Before we dive into the details of this report, I want to level set, give a bit of background, and paint a picture of who participated in this survey. Feedback was aggregated from over 550 respondents across a wide range of users, looking at the usage and challenges of advanced spreadsheet use across organizations and industries. It touched on a broad range of individuals within organizations, so it wasn't skewed by one type of respondent or one size of organization. Now, one of the more interesting pieces of information discovered in this report was the fact that self-service data prep and analytics via spreadsheet-like solutions has been going on for years, even prior to BI and visualization trends, and it's something that still continues to be the case today. According to IDC, 8% of employees still engage with spreadsheets as their software of choice to do some sort of data preparation.
Big data becoming mainstream has actually driven the desire to extract insights from all available data to brand new levels. It's resulting in extremely high demand for self-service data preparation and analysis by tech-savvy business users who want to be able to access their data without having to depend on other departments, which is also fueling the usage of spreadsheet analysis, and advanced spreadsheet analysis, in business units today. In addition to that, IT is actually playing a big role in this process, trying to understand and determine how, as an internal organization, they can help business units deliver results faster and quickly answer the questions that impact employees and organizational direction. Now, when we look at the data of the IDC report and we look at the usage, three of the most frequent uses of spreadsheets focus on data integration, ad hoc data analysis, and data visualization. The typical analysis performed when using spreadsheets focuses on what-if analysis, cleansing, pivoting, and prepping that data for presentations in something like PowerPoint. Of the common functions that analysts perform at least on a weekly basis, a lot begin with summarizing data, manipulating it, blending it, building what-if statements, and then moving informally into advanced statistical calculations. Now, it may not seem like there's a lot there, and it's a pretty common practice, but on average people who are doing this are spending about 26 hours per week working in spreadsheets. So finding a better way to perform these processes is actually critical. It's critical to your success if you're using spreadsheet-like software, and it's critical to the insights that your organizations depend on to make decisions. So how can we become more efficient? Well, it starts with some of the basic techniques that a lot of people use when beginning to work with data.
One of the first things that people do when they begin to work with data and spreadsheets is make a copy of that data and paste it into a spreadsheet. Sometimes it's done because you don't want to lose the original; sometimes it's done because it's the path of least resistance, and copying and pasting from a source is the easiest way to begin to actually work with that data. But copying and pasting data from the source into a spreadsheet is one of the biggest pitfalls in working with data. So ask yourself: are you copying and pasting data from its original source into a spreadsheet before you begin working on it? If you are, you really need to consider how wasteful this process is for you, as well as for the insights that you're going to be developing. Here's why. First, it contributes to wasteful repetition in the data preparation process. In addition, copying and pasting isn't even the easiest way to get data into a form that you can manipulate and work with, and it isn't the most accurate way to begin the process of working with that data; it actually greatly increases the possibility of human error. Coupled with that, IT isn't necessarily a fan of this method either. They are responsible for data integrity and data security, and copying and pasting data from a source can result in internal policy and external compliance violations. Most importantly, copying and pasting data from its source into a spreadsheet before you begin working on it actually erodes the data integrity. You are not dealing with data at the source, and therefore it takes away the ability to audit the data trail and validate the insights you have created. Just imagine if one of your managers or executives discovered an error in a report that you were building and asked you whether they could trace it back to the source and find out where it went wrong.
Would you have to tell them that you copied and pasted data from a source, incorrectly, into a spreadsheet, and that's what introduced the error? It's a very uncomfortable conversation to have, and it's a very uncomfortable question to be asked: why are you copying and pasting data from the source into spreadsheets to begin with? Now, in addition to the data integrity issues that are introduced by copying and pasting data from the source, working with spreadsheets also introduces risk exposure. Those of you who work in organizations that are focused on governance and on data lineage should be concerned about using spreadsheets for your data work. A lot of IT departments, when they hear self-service analytics, think of the wild, wild west: data marts and Excel spreadsheets running amok, with no one sure how old they are or how relevant they are. And spreadsheets really contribute to this perception. In addition, as I mentioned earlier, spreadsheets introduce the human error aspect into the analysis process, and many times in the news, as I'm sure many of you have seen, there have been costly mistakes organizations have had to face due to a spreadsheet error. So, given all these concerns, why aren't more organizations turning to alternative solutions outside of spreadsheets? The results of the IDC survey break it down into three major issues. There are compatibility concerns with other systems within the organization that are already established and maintained. There are concerns about lengthy implementation times and how quickly you can actually get started working with an alternative solution. And there are concerns about high costs. Spreadsheets tend to be free; other alternative solutions tend to cost money, and how can you actually support and fund that? Now, the reality of software applications is actually different from this perception. Applications today are actually more open than ever before.
And software is actually becoming easier and more intuitive to use, install, and get going with, reducing the learning curve and implementation times, if that's what you're worried about. It may also be more costly for organizations, as I mentioned earlier, to keep introducing human error through spreadsheet solutions than it would be to pay for an alternative solution. The reality also is that stand-alone spreadsheets never really solved the problem of preparing data for analytics. Spreadsheets are great if you're working with one or two forms of data. But when you start to expand that to three, or four, or eight, or nine, or up to 15 data sets, across different worksheets you're trying to combine, it's going to get difficult, not only for you to keep track of that data in those spreadsheets and those sources, but also to manipulate the data in the spreadsheet itself. You're going to get a lot of hanging, and you may see Excel crash. As a result, there has been an emergence of self-service analytic software that puts the data preparation, data cleansing, and data merging of analytics into the hands of analysts, without you having to go to different departments to get the data that you need, and without you having to turn to different departments to produce the insights that you actually need. Now, a lot of organizations say that, well, these solutions are competing against free solutions. So let's circle back to some of those numbers that I talked about earlier from IDC. When I said that 8% of the data workers within organizations are actually using Excel or spreadsheets for their data work, I want you to take a look at what that really means. It means five million users in the United States alone and 16 million users in the rest of the world. There are a lot of people using spreadsheet-like software to do analytics. But what does that also look like for you?
What does that really mean in terms of what you do on a day-to-day basis? When you look at the insights that I talked about earlier from IDC, on average, 26 hours per week are spent working in spreadsheets. Given a 40-hour work week, that equates to 65% of your time spent performing manual tasks. The majority of your week, if you think about it and you're doing analytics in a spreadsheet, is spent on manual work. And 35% of the time that you're spending in your spreadsheet is wasted just repeating simple efforts when data sources are updated. So not only are you spending a tremendous amount of your work week working with data in spreadsheets, you're wasting a great amount of that time doing repetitive tasks. So what does this mean? It means that there is a huge population of data workers like yourself who are wasting a tremendous amount of time, day in and day out, working with something that is supposedly a free solution. When you try to quantify this, that is 1.3 billion hours wasted on duplicate work in the United States alone. That doesn't even account for the time spent in spreadsheet-type software on the other data tasks that I talked about, which you typically perform. But the truly scary number is this one. The number of hours wasted in the U.S. alone is equivalent to flushing away $60 billion a year on repetitive manual tasks in spreadsheets. This should really clearly paint the picture that free isn't always free. Of the 5 million advanced spreadsheet users in the U.S., eight of the 26 hours spent working in spreadsheets each week go to repetitive tasks, such as updating data. Based on a baseline salary estimate, that's $12,000 per worker per year wasted, just because analysts are not able to automate routine manual processes.
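As a quick sanity check on the arithmetic behind those figures, here is a back-of-the-envelope sketch. The $30-per-hour loaded rate and 50 work-weeks per year are assumptions chosen so the quoted per-worker and aggregate numbers line up; they are not taken from the report itself.

```python
# Rough check of the quoted figures. HOURLY_RATE and
# WORK_WEEKS_PER_YEAR are assumed values, not from the IDC report.
HOURS_IN_SPREADSHEETS = 26       # hours per week spent in spreadsheets
WORK_WEEK = 40                   # hours in a standard work week
REPETITIVE_HOURS = 8             # weekly hours lost to repetitive updates
WORK_WEEKS_PER_YEAR = 50         # assumption
HOURLY_RATE = 30                 # assumed loaded cost per analyst-hour, USD
US_SPREADSHEET_USERS = 5_000_000

share_of_week = HOURS_IN_SPREADSHEETS / WORK_WEEK              # 0.65 -> 65%
wasted_per_worker = REPETITIVE_HOURS * WORK_WEEKS_PER_YEAR * HOURLY_RATE
aggregate_waste = wasted_per_worker * US_SPREADSHEET_USERS

print(share_of_week, wasted_per_worker, aggregate_waste)
```

Under these assumptions, 8 hours a week works out to $12,000 per worker per year, and across 5 million workers that reproduces the $60 billion figure.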
Now, I can assure you, if you or your manager walked into an executive meeting with an annual budget in hand saying, we're going to allocate a couple million to repetitive manual tasks this year, I'm pretty sure that would not go over well. But the reality is it's happening, and it's happening every day. It's just that no one's admitting it. Is it because the true cost of waste on repetitive tasks isn't really understood, or because organizations don't understand that free doesn't actually equal free when it comes to spreadsheet-like software for data analytic tasks? So what does IDC recommend based on the results of the survey? Users of standalone spreadsheets are unable to achieve the desired levels of integrity in the data preparation process and its outcomes, and there are huge gaps between the importance of data analysis and data integrity and what one is actually able to achieve in spreadsheet-like software. The big problem is that the cost of maintaining the status quo of performing data work in spreadsheets is staggering. Free is not always free. IDC recommends that analysts and users consider self-service data preparation software and self-service data analytics software as alternatives to spreadsheets, and that they build business cases for self-service data analytics based on productivity cost savings. They also recommend that users strengthen the business case for self-service data preparation software by highlighting some of the additional benefits self-service data analytics software offers: better control; higher levels of integrity in data work and analysis; and help in providing trust, availability, security, and compliance. Now, the purpose of this webinar is to give you a sense of the benefits of using an alternative solution over spreadsheets for data analytics. Spreadsheets are at the core of almost every single business, and they're not going to be replaced.
The purpose here is really to highlight how an alternative workflow-type solution can augment and help with the processes and tasks that you may be struggling with when you are working in spreadsheet-like software to do your analytics. Now, before we can do that, let's go over some brief background on why people use spreadsheets. Most of the time, I'm sure most of you are using them because they're something you're familiar with. They're easy, they're free or low cost, and they're always available. A lot of times, people like what they can do with spreadsheets because they can instantly see the changes in the data that they're making. Typically, spreadsheets will cover almost everything that an analyst needs to do on a day-to-day basis for their data work. But as I mentioned earlier, there are really a lot of issues in working with spreadsheets. There's the productivity cost, which is huge when you start to quantify it. There's the difficulty of trying to work with two or more data sources. There are Excel row limitations on how much data you can actually work with within a spreadsheet. The other thing that tends to be a nightmare is collaboration within spreadsheets. I mean, imagine if somebody sent you a spreadsheet that had all of these formulas in it: are you going to be able to pick up that analysis and work on it yourself? Or is that spreadsheet going to have to wait until you can sit down with the person who created it and actually go through all the formulas and functions that are in it? The other thing is, analysts tell us all the time that they don't trust each other's spreadsheets and want to redo the work themselves. So those are some of the issues when working with spreadsheets. Now, what can be an alternative solution? Workflows are great augmentations to some of the spreadsheet work you may be doing. Workflows are a lot more transparent.
They will allow you to understand what's taking place through each step of the analytic process, because a workflow is self-documenting. And because it's self-documenting, others can quickly and easily understand what's happened to the data, just by looking at the icons and the workflow itself. The other thing that workflows help with, because they're easy to understand, is that collaboration amongst analysts is actually quite easy. You can create an analytic insight and pass it on to a colleague from a different department who can use it for their downstream process. The other thing is, in terms of visualizing the data, spreadsheets do allow you to see the data straight away. But with a workflow, you get to see the data at each step in the process by running that process. You understand exactly what's happening to your data while it's happening. Now, another reason why workflows are a great alternative is for the simple reason that a large percentage of your day-to-day job is repeatedly preparing and updating data when the data set changes on an hourly, nightly, daily, weekly, monthly, or quarterly basis. Having to do that work over and over again within a spreadsheet-like environment is time-consuming and tedious. And as we talked about earlier, it's extremely costly, not only for you and your productivity, but for your organization as well. Workflows actually free you from the tedious task of producing the same reports over and over again. They allow you to build out the process that you want to perform on your data, and whenever that data changes, you simply point to the new data set and run it through. It doesn't require any coding, and it allows you to actually move on from common data prep tasks to more advanced analytic work. The other thing that workflows actually allow you to do is productionalize your workflow and your analytic process.
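The build-once, repoint-and-rerun idea described here can be sketched in code as a small parameterized pipeline. This is illustrative only; the file, column, and function names are made up for the example.

```python
import io
import pandas as pd

def monthly_report(source):
    """One reusable prep process: read, derive, summarize.
    When the data changes, point this same pipeline at the new
    source and rerun it -- nothing is rebuilt by hand."""
    df = pd.read_csv(source)
    df["revenue"] = df["units"] * df["price"]   # derived column
    return df.groupby("region", as_index=False)["revenue"].sum()

# When January's extract is replaced by February's, the call is the
# only thing that changes:
#   monthly_report("sales_feb.csv")
```

The point of the sketch is that the logic lives in one place; a visual workflow achieves the same reuse without writing any code at all.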
In spreadsheets, this is typically done with tedious coding in Visual Basic or by building macros, and if you want to productionalize your work, those are your only options. Workflows actually allow you to automate and schedule data refreshes, and you don't have to worry about adjusting anything as data sets change. If you're trying to do this in Excel with Visual Basic scripts, you will actually have to merge or recode your scripts to get them to work together. With a drag-and-drop visual workflow, you can simply add a secondary process and set everything to run automatically. The other thing that workflows allow you to do is really embrace analytic flexibility. You're no longer really tied to where your data is stored or what it looks like. You don't have to worry about whether your data is in a SQL Server database, or in Teradata, or in the cloud on AWS, and having to write custom scripts and code to extract that data before you can perform analytics on it. Workflows actually break down those barriers and allow you to connect to the data at those sources and bring it together visually and intuitively, without turning to another department to do it for you. With that, I'm going to turn this over to Hassan, my colleague here, who's going to show you how a workflow process can help augment what you are currently doing in Excel. Thank you, Lisa. That was mind-boggling. I'm going to go ahead and share my screen. I'm going to show you today how we would build one of these workflows that Lisa was mentioning, to hopefully help save time and get back some of that $60 billion that we are collectively wasting. What you're looking at is Altrix Designer. Here on top, I have all the tools that I can use to visually build out a workflow and document a process that's going to describe how I'm going to prep, clean up, and blend my data.
These tools are color-coded to make them easier to find, but they are also organized by functional areas. I have tools that allow me to input and output data in my workflow. I have tools that allow me to prepare my data and do some data cleansing and data filtering. I can also blend my different data sets together. I even have tools to do advanced things like predictive analytics, et cetera. Most of what I'm going to show you today is going to exist under this Favorites tab. In order to build my workflow, I'll simply drag and drop these tools onto this open space here that we call the canvas. As I bring the tools to the canvas, the results of what I'm doing and how my data is being transformed will show up down here in the results window. This will allow me to see how each one of my tools is transforming my data, and what my data looks like both before and after that tool was added to the workflow, giving me real transparency and data lineage. Over here on the left is the configuration window. As I bring in each one of the tools, you will see that the configuration window changes, allowing me to configure that particular tool. For the workflow that we want to build together today, let's say I have customers all over the country in different regions, and my salespeople for the different regions send me their data in region-specific files. I want to bring those files in together, make them into one data set, clean that data set, and maybe do some analysis on it. I've noticed that I'm contributing to that $60 billion money loss by doing this manually, month over month and week over week. If some new data comes in, I have to do the same repeatable task, and I would like to automate this a little bit. So I'm going to go ahead and build a workflow to help me with that. To bring in some data, the first thing you would do is bring the input data tool onto the canvas. Notice the configuration window changed, allowing me to configure this input data tool.
We can bring in data, as Lisa mentioned, from many different types of sources, really making it easier for you to get at your data. For database connections, we support a wide variety of databases: as long as you have an ODBC or OLE DB driver and you have access to that database, we will allow you to bring in that data. But let's say I have my data in files. Let me go ahead and navigate over to where my data is. These are the different regions, and that's how my data comes in. If I go over here to show the file types, of course, not everybody follows the same rules. Some people send me their data as a comma-separated CSV file; some people send me their data as an Excel file. Which is fine; Altrix will help me with that as well. But let's say I want to bring in this first comma-delimited CSV file representing region one. I have about eight of those, and then I have an Excel sheet that has the same type of data. So I selected R1 and I'm going to go ahead and click open. And you can see right away, Altrix was able to bring in that data. It shows me a sample down here in the configuration window, just letting me know that it was able to bring in some of that data. But I want to bring in more than that, and I don't want to have to create another input tool for each one of those files. So I'm going to go over here to my region file path, and instead of R1, I'm going to use a wildcard. Now if I click run — Altrix is a runtime environment, so when I click run, it will actually go ahead and process my workflow — notice I now have 2,356 records that were brought in. If I look at the messages, it's actually going to show me all the files: I was able to bring in R1 through R8 in one input tool. Now I want to bring in that one additional file, which is actually typed as an Excel sheet. I'll go ahead and drag and drop another input tool onto the canvas, and this time I'm going to navigate over and grab that Excel worksheet, R10.
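The wildcard input step has a rough code equivalent; here is a sketch in Python with pandas, where the region file names and columns are hypothetical stand-ins for the demo's R1 through R8 files.

```python
import glob
import io
import pandas as pd

def read_region_files(files):
    """Read every region file and stack them into one DataFrame --
    the same effect as a single input tool with a wildcard path."""
    frames = [pd.read_csv(f) for f in files]
    return pd.concat(frames, ignore_index=True)

# With real files on disk, a wildcard such as R*.csv would pick up
# R1.csv through R8.csv in one step:
#   combined = read_region_files(sorted(glob.glob("R*.csv")))
```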
I'll select the sheet name, and now you can see that I have my data down here. It looks a little bit different; again, some people do things differently and have different systems, so it looks like this data is actually not as clean as the comma-delimited files. First of all, I notice that my first row actually has data, so I'm going to select the option indicating that the first row contains data. But now I need to clean this up a little bit. It looks like my column headers are not where I want them to be, so I'm actually going to have Altrix skip the first two rows to get me to the data that I really want. In order to do that, I'm going to use a sample tool, and I'm going to tell Altrix to go ahead and skip the first two rows of data. Then we can examine our output and see if that's the desired result or not, and that's what's really beautiful about having a visual way to build your workflow: you're actually going about it step by step and seeing whether this is what you wanted. So notice what happened. Now what looks like my column headings are actually in the first row of data, which is actually what I wanted, because now I have a tool that's going to help me clean that up even further. I know I need to rename these columns using this first row of data, but let's say I'm new to Altrix, I'm not really sure how to do it, and I'm just learning. I'm going to go up here to this global search window and I'm going to just type the word rename.
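The two cleanup steps just described, skipping the junk rows at the top and then promoting the first remaining row to column headers (which the rename step will handle), could be sketched like this; the messy layout shown is a made-up example:

```python
import io
import pandas as pd

def skip_and_promote(source, rows_to_skip=2):
    """Drop the leading junk rows, then promote the next row to
    column headers -- mirroring a sample step plus a dynamic rename."""
    raw = pd.read_csv(source, header=None, skiprows=rows_to_skip)
    raw.columns = list(raw.iloc[0])        # first remaining row -> headers
    return raw.iloc[1:].reset_index(drop=True)
```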
Notice Altrix searches for any tools within the platform that will help you with the term that you're searching for, but it also searches online help documentation and the Altrix community, made up of customers, Altrix associates, and employees; we're always pushing out content to help our customers. Let's say you find a tool that you like; this dynamic rename is the one I'm looking for. You can open up an example and actually see how that tool is being used. And what's really, really amazing is that when I find a tool that I like, I can drag and drop that tool right onto the canvas, without even having to worry about where that tool lives. I think that's amazing because of what it does for me: it keeps me in the process, it keeps me thinking about the task at hand. I don't get distracted by going out and doing a Google search or anything. I find a tool, I like it, it looks promising, and I'm just going to go ahead and use it in my workflow. So, dynamic rename tool: rename my data. There are several options; I'm going to select the one that says take field names from the first row of data, and then I'm going to click run to have it process my data and examine my results. You can see that my column headings now actually came from that first row of data that was buried somewhere in my data set. So at this point I have these two data sets here, and I'm ready to bring them together: regions 1 through 8, and this is region 10. These are all my regions, and now I'm ready to stack these results together into one data set. I'm going to go ahead and bring in the union tool. The union tool is going to allow me to do exactly that: stack my data together into one data set. Over here on the left, I'm going to select the manual configuration. Even if all your columns have the same names and you could use the auto-config, as a best practice I like to do the manual config and then just look at my data and see if it's lined up correctly. And it looks like most of my columns are lined up.
first name, last name are called differently again, not everybody is following the same rules and we all are humans so I'm going to take these two columns here first name, last name I'm going to simply just rearrange my columns to have them lined up so I'm not copying stuff in Excel I'm not, you know, control C control V and making mistakes on where I insert the column I simply aligned all my columns to make sure they're aligned correctly now if I click run I can now I have if I click on output coming out of the union I have 2,678 records representing all my regions you can see the region column over here so now at this point I have, this looks like a decent data set I'm going to take a further look at it before I build a workflow and invest my time I want to save some time by doing some data quality, data profiling on this data before I go ahead and use it I'm going to drag this Browstool over here and then hit run again the Browstool is going to allow me to get really quick and very valuable data profiling information about the data that I just bought in notice my columns are now color coded really indicating the health of this data this is demo data so it's very, very clean but if you hover over the columns it tells you how much, you know if it's nulls, if there's how many null values are there if my data is okay or not as you select a column you can come over here to the left and you see how many unique values are there so this way you're kind of getting this health check on your data before you invest your valuable time building a workflow and then realizing you have bad data but let's say that happens and it happens sometimes and then you realize there's some things here that I need to pay attention to as I mentioned this is a very clean data set we offer a data cleansing tool I'm just going to go ahead and bring a data cleansing tool on here the data cleansing tool allows you to do very common and often repeated data cleansing operations things like what do 
I do with nulls without replacing with blanks or zeros I have unwanted characters in my data set maybe if I came from excel I have a lot of white spaces maybe if I came from the web and I did some web scraping I have line breaks and tabs and things like that maybe there are letters where they're not supposed to be maybe people are putting numbers in the same field and it's not supposed I can clean all of that up and it's really nice but I can do it all in one tool I can even modify the case if I want to make everything lower case or upper case I can actually hit run now and what's really nice about these tools that I'm adding notice there's an incoming anchor and outgoing anchor out of each one of these tools the incoming anchor tells you exactly what the data looked like before this tool worked on it and the outgoing anchor the arrowhead shows you what happened to the tool after it was processed to the data after it was processed by this tool so you can see my first name, last name there was kind of a mixed bag of different upper case I guess they're all upper case and now they're all turned into lower case because that's what I told it to do my cleansing tool so there are different options for you as well so let's say at this point everything came in from a power in Excel and I can click on this metadata down here to show me kind of the metadata of what I have and it looks like everything came in as a string which is very common sometimes and you may have to change some of them to numbers but let's say you don't want to have to worry about this you want Alt-Rex to do that for you we have a tool called Autofield that really tells Alt-Rex what to want to select the best field for me based on the data that is contained in that field so it's going to select the most optimal size and the right type of the field for that column based on the data the one exception I make is I always unselect the zip file just in case I have some zips that are starting with zeros, I have 
Other than that, Auto Field works very nicely on the rest of my data types. Now if I click run and want to see what the Auto Field tool did, I can click on the anchor coming in, look at the metadata, and confirm that everything was coming in as a string. If I click on the arrowhead coming out, I can see I now have a mixed bag of types, some numbers and some strings, and Altrix selected the best size for each field based on the data that was coming in. And this is dynamic: if new data comes in and Altrix needs to adjust, it does so every time you run the workflow, so this step is automated for you. At this point I have all my customer-related information coming in from my different regions, so I'm going to bring in some transaction data, and I want to show you how to do that by connecting to a database rather than a file. I'll go back to my favorite step and bring in another Input Data tool; as Lisa mentioned, there are really no artificial limits on how many of these input tools you can have, or how many rows they bring in. This time I'm going to connect to SQL Server. We can connect to many different types of databases; as long as you have the driver installed and configured, chances are we'll be able to create a connection for you. I have a SQL Server instance running here on my demo machine, so I'll connect to it and bring in our transactions table. I can bring the table in as is, or select only some of the columns; if I have a stored procedure that I normally use to bring in my data, I can leverage it here; and if I want to write my own SQL code, I can do that as well. But for the most part, if you want to, you can keep this a completely code-free environment.
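The Auto Field behavior described above, inferring the best type for each column while protecting ZIP codes, is roughly analogous to this pandas sketch (fabricated data; not Altrix's actual inference logic):

```python
import pandas as pd

# Everything arrives as strings, as often happens with spreadsheet imports.
df = pd.DataFrame({
    "customer_id": ["1", "2", "3"],
    "sales": ["5.76", "12.40", "8.00"],
    "zip": ["02139", "90210", "10001"],   # must stay a string!
})

# Convert every column except 'zip' to its best-fitting numeric type,
# mirroring the demo's trick of excluding ZIPs so leading zeros survive.
for col in df.columns:
    if col != "zip":
        df[col] = pd.to_numeric(df[col])

print(df.dtypes)
print(df.loc[0, "zip"])   # leading zero preserved
```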
At this point I'm going to bring in all my transactions, click OK, and update the sample to see what's in here. I have customer ID, order ID, store number, some order dates, some shipping dates, shipping methods, and so on. Altrix is what I would call date-and-time aware: it knows what time it is, and you can do comparisons of dates and times. So let's say I want to figure out how long it takes to ship each of these orders. We've noticed our business is down a little, and I want to develop some intelligence about the causes; maybe it's taking too long to ship the orders, and that's hurting our sales. So I'm going to calculate how long it takes to ship an order. To add any formulas or business logic to your workflow, you use the Formula tool. The Formula tool lets you update any existing column or add a new one; in this case I'm going to add a new column and call it something like "shipping time," because I need to understand how long it's actually taking to ship my product. I'm going to compare the time a customer ordered a product with the time it was actually shipped to them. If you click on this fx button, there are a lot of functions available to you, including date-time functions, and among them is one called DateTimeDiff. I'll select it: date one is going to be my shipping date, date two is going to be my order date, and then I'll type "days" so it gives me the difference in days. This calculates the duration between my shipping date and my order date.
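The DateTimeDiff(shipping_date, order_date, "days") formula above can be mimicked in pandas; this sketch, on invented dates, computes a shipping_time column in days:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [101, 102],
    "order_date": ["2023-03-01", "2023-03-02"],
    "ship_date": ["2023-03-04", "2023-03-09"],
})

# Parse the strings into real datetimes so they can be compared.
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["ship_date"] = pd.to_datetime(orders["ship_date"])

# Rough equivalent of DateTimeDiff(ship_date, order_date, "days").
orders["shipping_time"] = (orders["ship_date"] - orders["order_date"]).dt.days

print(orders[["order_id", "shipping_time"]])
```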
Now if I click run, I see this additional column in my data set; if I flip over to the data, let me make this a little bigger, I now have a shipping time column indicating the number of days it's taking to ship the product. Great. Next I want to see whether this is affecting customers, so I want to build some aggregations at the customer level. I'm going to bring in a Summarize tool and connect it to my data stream; the Summarize tool lets you aggregate on whatever fields make sense for you. I'm going to group by customer ID, because I want to understand what's happening at the customer level, and I also want to know how many times each customer shows up in my transaction file, so I'll add a count and rename it "transactions." I also want to see the shipping time per customer, the new field I just created, so I'll take shipping time and say: just average the shipping time for me. But notice I can't find the average function, because I missed a step: my shipping time came out as a string. Because I have a visual workflow, I know exactly where the previous step is, the one where I calculated the shipping time; I can go back there, change the type to double with no sweat, and the error is fixed. Imagine doing this in Excel and trying to figure out where it all went wrong; what a nightmare that would have been. I'll rename the field "average shipping time" for the customer, and while I'm at it, I'll take my sales field and sum the sales per customer. So simply by selecting the fields and saying how I want to aggregate them, I'm able to create a summary.
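The Summarize configuration described here, group by customer, count the transactions, average the shipping time, sum the sales, maps naturally onto a pandas groupby. A small sketch on made-up transactions (and just as in the demo, the mean would fail with a type error if shipping_time were still strings):

```python
import pandas as pd

tx = pd.DataFrame({
    "customer_id": [2, 5, 5, 7],
    "shipping_time": [2, 3, 5, 1],   # must be numeric, or mean() will fail
    "sales": [5.76, 10.00, 4.00, 9.50],
})

# Group by customer: transaction count, average shipping time, total sales.
summary = tx.groupby("customer_id").agg(
    transactions=("customer_id", "count"),
    avg_shipping_time=("shipping_time", "mean"),
    total_sales=("sales", "sum"),
).reset_index()

print(summary)
```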
Now if I run, I can show you what the results look like: for customer ID 2, he or she has one transaction, an average shipping time of 2 days, and total sales of $5.76. Great. At this point I want to know when I last had an interaction with each customer, so I'm going to calculate the number of days since the last order. First I need the most recent order date: I go back into the Summarize tool, find my order date column, and take the max order date, the latest date on which this customer placed an order. Let me run that so you can see the max order date field. Now I'll add another Formula tool, and this time I'm going to calculate the number of days since I last interacted with this customer; maybe this is meaningful for me, maybe I'll use it later to build a predictive model or something like that. I'll create a new column called "days since last order," or last transaction, and use the same DateTimeDiff function as in my first formula. This time date one is going to be now, so I select the DateTimeNow function; date two is the max order date I just got from my summary, which represents the last time this customer placed an order; and I ask for the difference in days. In the data preview, which shows a calculated result based on the first row of data, I get a warm fuzzy feeling that the formula calculates correctly and I have no errors. Now if I click run, I have a new field called days since last order, and I can actually do something with it.
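As a rough analog of the max-order-date step followed by DateTimeDiff(DateTimeNow(), max_order_date, "days"), this pandas sketch (fabricated orders, invented column names) computes days since each customer's last order:

```python
import pandas as pd

tx = pd.DataFrame({
    "customer_id": [2, 5, 5],
    "order_date": pd.to_datetime(["2023-01-10", "2023-02-01", "2023-03-15"]),
})

# Most recent order per customer: the "max order date" from the summary.
last = (tx.groupby("customer_id")["order_date"]
          .max()
          .reset_index(name="max_order_date"))

# Rough equivalent of DateTimeDiff(DateTimeNow(), max_order_date, "days").
now = pd.Timestamp.now().normalize()
last["days_since_last_order"] = (now - last["max_order_date"]).dt.days

print(last)
```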
So what do I have so far? Two streams: up here on top I have my customer data, and down on the bottom I have my transaction data. At this point I want to join these two data sets together. This is very similar to a VLOOKUP in Excel, or a join in a database; let me show you how simple it is, and how easy it is to fix any problems you may have. To do a join, I bring these two data streams into a Join tool and tell Altrix how I want to join them: on customer ID, so I select customer ID from both the left and the right. Coming into the join I have a left side and a right side, which gives me data lineage, so I can always trace my steps back and see where the data came from. Coming out of the join I have three outputs, represented very well here in this Venn diagram: the J output, which is my inner join; the right unjoined, the records in the right input that did not match; and the left unjoined, the records in my left input that did not match. Down here I have total transparency about what's coming in: I see the columns that came in from the left, the columns that came in from the right, and I can deselect columns if I like. Customer ID comes in from both sides; since it's my joining field, I can deselect the duplicate. I can also rename fields: say I want to take this max order date field and call it something else, like "last order," I can rename it to whatever makes sense for my flow. Now I'll click run and show you the output from the Join tool, and as I do, keep in mind the kind of information you would and wouldn't have when doing this in Excel.
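The Join tool's inner join and post-join rename can be approximated with pandas merge; a minimal sketch on invented customer and summary tables:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["R1", "R2", "R1"],
})
summary = pd.DataFrame({
    "customer_id": [2, 3],
    "total_sales": [5.76, 20.00],
})

# Inner join on customer_id: the "J" output of the Join tool.
joined = customers.merge(summary, on="customer_id", how="inner")

# Rename a column after the join, as in the demo.
joined = joined.rename(columns={"total_sales": "sales"})

print(joined)
```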
Of course we have the J output, our inner join, which represents the combined data set: you can see my customer information and my transaction information all in one data set, joined on customer ID. But I can also see what did not join. On the right, nothing, I have zero here; but on the left, I see there are records that did not join. What's really nice is that I can take these and develop some intelligence about them: these represent customers that did not have any transactions, so maybe I can do more outreach, more marketing, to these customers. I don't lose this output; whatever fell out of the join, I can grab it and build a whole section of the workflow to handle it. Now let's say I want to automate this workflow: it's going to run on a schedule, and I want to be alerted if any records come out of the left unjoined output. For that I'm going to use something called a Test tool; I'll show it to you real quick and then take it back out. The Test tool lets you test for a condition: I'll add a new test, call it "no transactions," and say: alert me if the record count is greater than zero. Now if I run my workflow and something comes out of the L output, notice the workflow errors, because that condition was met. To finish automating the workflow, I can come here to Events, add a new event, and say: send me an email when this workflow runs with errors. So now I'm notified, I'm kept in the loop, whenever something goes wrong.
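The left-unjoined output and the Test tool alert can be imitated with a merge indicator; in this sketch (toy data), customers with no transactions are surfaced and flagged rather than silently dropped:

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3]})
transactions = pd.DataFrame({"customer_id": [2, 3, 3]})

# An outer merge with indicator=True exposes all three join outputs at once.
merged = customers.merge(transactions.drop_duplicates(), on="customer_id",
                         how="outer", indicator=True)

# "left_only" rows are the customers without any transactions.
left_unjoined = merged[merged["_merge"] == "left_only"]

# Test-tool analog: raise a visible alert instead of losing these records.
if len(left_unjoined) > 0:
    print(f"ALERT: {len(left_unjoined)} customer(s) have no transactions")
```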
I don't actually need this alert for the rest of the demo, so let me remove it; I just wanted to show you what it would look like if you choose to automate your workflow. So now, coming out of the join, I have my combined data set, cleansed and looking nice, and I'm going to output it to an Excel file with a different tab for each of the regions that came in. To do that, I add an Output Data tool to my workflow, point it at a file, call it something like "webinar output," this is my customer list, and select Excel as the format. I can give it a sheet name, but what I really want is to make the sheet name dynamic: I choose the option to take the file or table name from a field, select "change file/table name," and pick a field in my data to drive the sheet name, in this case my region field. Now if I run, you can see I've created an Excel file with sheets R1, R2, R3; I can open one, drag it over here, and you can see I was able to output the summary I created, all my cleaned-up data, split by region. Let me fix the option I selected there and rename the file "report"; that's what happens when you're running something live and you're human. Okay, there we go: now it's creating my report in Excel with these different sheets. But let's say this isn't how I finally want it; maybe I'm using a visualization tool and I want to output the data there instead.
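Writing one output per region, as with the dynamic Excel sheet names above, looks like this in pandas. Note that this sketch writes one CSV file per region rather than Excel sheets, to stay dependency-free; the grouping idea is the same, and all file names are invented:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "region": ["R1", "R1", "R2", "R3"],
})

out_dir = tempfile.mkdtemp()

# One output file per region, named from the data itself --
# the same idea as driving the Excel sheet name from the region field.
for region, chunk in df.groupby("region"):
    chunk.to_csv(os.path.join(out_dir, f"customer_list_{region}.csv"),
                 index=False)

print(sorted(os.listdir(out_dir)))
```

With the openpyxl package installed, the same loop could call `chunk.to_excel(writer, sheet_name=region)` against a single `pd.ExcelWriter` to get one workbook with a tab per region, closer to the demo.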
What's really nice is that I'm not tied down to one output: I can take the same data and output it in several different ways. I find this very handy with our customers, because some of them say, well, my executives want to see a dashboard or something like that. So I can also output this somewhere else, in another file type; in this case I'll select TDE, a Tableau Data Extract, give it a name, say "TDE output," and run it. In my messages you can see that 2,272 records were written to this Tableau TDE file; now I can open it in Tableau, build my workbook, and visualize my data. So at this point, now that I've automated prepping, blending, and cleaning the data coming in from all the different regions, instead of spending the 26 hours a week I used to spend on prep and cleanup, I actually have time to do analytics that bring value to my organization. And notice that the more I automate, the more time I have to go deeper and add even more analytics: maybe I can leverage some of the predictive tools Altrix has, or do spatial or drive-time analytics to find out how far my customers live from my stores. Very quickly, say I want to predict the likelihood of somebody responding to a marketing campaign. In my data I have a responder field, I've been collecting responder information on my customers, and I want to predict the likelihood of them responding in the future. I'm doing this at a very high level, just to show you that these predictive analytics models are ready to go for you out of the box. I connect my data to the model and give it a name, something like "marketing model"; I select what I want to predict, my responder field; and then I select the variables that influence it, maybe the number of transactions, the average shipping time, how long they have to drive to get to my store. Once I select those variables, I can run, and now I have a predictive analytics model built out of the box. (This one is erroring only because I don't have it set to overwrite.)
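Altrix's predictive tools are prebuilt and R-based, so nothing here needs to be written by hand. Purely to show what a response model like the "marketing model" does conceptually, here is a minimal hand-rolled logistic regression in NumPy on fabricated responder data; every feature and number is invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated features: [number_of_transactions, avg_shipping_time].
# Responders (label 1) tend to buy more often and get faster shipping.
X = np.vstack([
    rng.normal([8.0, 2.0], 1.0, size=(50, 2)),   # responders
    rng.normal([2.0, 6.0], 1.0, size=(50, 2)),   # non-responders
])
y = np.array([1] * 50 + [0] * 50)

def sigmoid(z):
    # Clip so exp() cannot overflow once predictions get confident.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

# Plain logistic regression trained by batch gradient descent.
Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend an intercept column
w = np.zeros(Xb.shape[1])
for _ in range(2000):
    grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    w -= 0.1 * grad

pred = (sigmoid(Xb @ w) > 0.5).astype(int)
print("training accuracy:", (pred == y).mean())
```

A prebuilt tool would also handle validation splits, coefficient reports, and scoring new data; this sketch only shows the core fit-and-predict idea.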
As you can see, the more I build, the more time I have to do other things. And if you're interested in this, Altrix has many analytics kits to help you get started with advanced analytics: if you go here under the Help menu and look at the sample workflows, you'll see starter kits for things like predictive analytics, where you can learn how to build different models, a forecasting model, linear regression, logistic regression, all available here as templates. All of these starter kits can be downloaded from our website, at altrix.com/kits. I really hope you've enjoyed this, and I hope you've seen how you can stop spending all this time on data prep and blending and actually move over and start doing some analytics. Shannon, I'm going to send it back to you to see if there are any questions. Would you mind quickly handing me the Q&A? Yes, I will do that. Somebody stole the ball from me... there you go, I think you have it. Would you mind passing me that ball while you do the Q&A?
I thought I'd pass it to you; it looks like you just sent it to me. There you go. Great, so we're going to get to the Q&A here; the presentation will be made available to everyone who saw it today. With that, let me pass this on to Shannon, who will run us through some of the most popular questions. Just to let everybody know: as Lisa mentioned, I will send a follow-up email by end of day Thursday with links to the slides and the recording. There are a lot of fantastic questions coming in, so if we don't get to yours, I'll try to get answers into the follow-up email as well; keep them coming in, but we've got just a few minutes here for some of the great questions. So let me ask: is this a SQL engine on Excel files, and what's the difference from Microsoft Access? Okay, well, if I understand the question correctly, Altrix is really database agnostic; we can connect to practically any database. Whether you bring in data from Excel, from Access, or from SQL, once you bring it into Altrix it all looks the same to us. There are some limitations: for a database you have to have the driver installed and configured, and we respect existing database permissions, so you need permission to that database, or to that Excel sheet, in order to connect to it. Next: what volume of data can Altrix Designer handle before running into performance issues? We have customers that use it in really great ways and bring in a lot of data. My colleague who sits next to me, a former user, was able to bring in up to a billion records at her previous shop, right on her company-issued laptop. The only limitation is your own physical machine, the memory and hard drive you have available. And don't forget, we also have in-database processing.
So if you're working with data that has big-data characteristics, you can push the processing down and run it inside the database itself; we have in-database processing as well. We have a few more questions here: do the predictive functions come included with Altrix, or are they an additional cost? And can you import Excel formulas into Altrix? Regarding the predictive functions, you have to download the predictive starter kit to get the predictive tool categories. What's important about them is that they're all based on R, and you can actually open the predictive tools and edit the R code if you want to do something specialized, more advanced, or specific to you. As for the second question, about Excel formulas: unfortunately, you do have to rewrite in Altrix any custom formulas you're currently using in your data processing. I'm afraid that is all the time we have. What a great presentation, Lisa and Hassan, thank you so much, and thanks to our attendees. Again, we didn't have time to get to a lot of the questions today, but we will get those over to Altrix, who will help follow up with answers and get them to you; keep the questions coming in and I will make sure to route them appropriately. And again, just a reminder to everyone: I will send a follow-up email by end of day Thursday with links to the slides, a link to the recording, and additional information. Thanks again to Altrix, and to Lisa and Hassan, thank you so much.