Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager at DataVersity. We'd like to thank you for joining the DataVersity webinar today, Improving Advanced Data Prep and Analytics in Spreadsheets, sponsored today by Alteryx. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A panel in the bottom right-hand corner of your screen. If you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataVersity. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the upper right-hand corner for that feature. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me introduce our speakers for today, Lisa Aguilera and Hassan Habibati. Lisa is a Senior Product Marketing Manager at Alteryx. She is passionate about the analytics space and about showing how innovative analytic technology can help analysts move past mundane data tasks, elevate their skills and expertise, and deliver increasingly sophisticated insights. Hassan is a Solutions Engineer at Alteryx. Hassan empowers customers by helping them get to insight fast. He teaches data science-related topics at the University of California, Irvine, has hands-on analytics experience, and was instrumental in launching various analytic practices in the OC/LA area. And with that, let me turn it over to Lisa and Hassan to get us started. Hello and welcome. Thank you so much, Shannon. As Shannon mentioned, I'm Lisa Aguilera, and with me I have Hassan. I wanted to spend a few quick moments going through what we're going to be talking about today.
And first and foremost, I want to say thank you so much for joining us and taking time out of your busy schedules. We hope that you will be able to learn something interesting from what we showcase today. Throughout this webinar, we're going to be focusing on a few things. We recently commissioned a report by IDC, and we are going to go over some of the results of that report, really focusing on some of the most prevalent use cases, activities, and functions analysts like yourself perform when conducting analysis in spreadsheet-like software. Then we're going to take a quick look at the top barriers organizations face, and then some of the recommendations that came out of this report. And finally, my colleague Hassan here is going to do a quick demonstration of how you can augment your process and overcome some of the challenges of working with spreadsheets. Now, with this report, I want to level-set and give a bit of background before we dive into the details, and paint a picture of who participated in this survey. Feedback was aggregated from over 550 respondents across a wide range of users, looking at the usage and challenges of advanced spreadsheet use. And we looked at it across organizations, industries, and partners. It touched a broad range of individuals within organizations, so it wasn't skewed by one type of respondent or one size of organization. Now, one of the more interesting pieces of information discovered in this report was the fact that self-service data prep and analytics via spreadsheet-like solutions has been going on for years, even prior to BI and visualization fads, and it's something that still continues today. According to IDC, 8% of employees still engage with spreadsheets as their software of choice for some sort of data preparation.
Big data becoming mainstream has fueled the desire to extract insights from all available data to brand-new levels, and is resulting in extremely high demand for self-service data preparation and analysis by tech-savvy business users who want to be able to access their data without having to depend on other departments. That is also fueling the usage of spreadsheet analysis, and advanced spreadsheet analysis, in business units today. In addition, IT is playing a big role in this process, trying to understand and determine how, as an internal organization, it can help business units deliver results faster and quickly answer the questions that impact employees and organizational direction. Now, when we look at the data of the IDC report and we look at usage, three of the most frequent uses of spreadsheets focus on data integration, ad hoc data analysis, and data visualization. The typical analysis performed when using spreadsheets focuses on what-if analysis, cleansing, pivoting, and prepping data for presentations in something like PowerPoint. Of the common functions that analysts perform at least on a weekly basis, a lot begin with summarizing data, manipulating it, blending that data, building what-if statements, and then moving into advanced statistical calculations. Now, it may not seem like there's a lot there; it's a pretty common practice. But on average, people who are doing this are spending about 26 hours per week working in spreadsheets. So finding a better way to perform these processes is critical. It's critical to your success if you're using spreadsheet-like software, and it's critical to the insights that your organizations depend on to make decisions. So how can we become more efficient?
Well, it starts with some of the basic techniques a lot of people use when beginning to work with data. One of the first things people do when they begin to work with data and spreadsheets is make a copy of that data and paste it into a spreadsheet. Sometimes it's done because you don't want to lose the original. Sometimes it's done because it's the path of least resistance, and copying and pasting from a source is the easiest way to begin to actually work with that data. But copying and pasting data from the source into a spreadsheet is one of the biggest mistakes in working with data. So ask yourself: are you copying and pasting data from its original source into a spreadsheet before you begin working on it? If you are, you really need to consider how wasteful this process is for you, as well as for the insights you're going to be developing. Here's why. First, it contributes to wasteful repetition in the data preparation process. In addition, even though copying and pasting is the easiest way to get data into a place where you can manipulate and work with it, it isn't the most accurate way to begin the process of working with that data. It greatly increases the possibility of human error. Coupled with that, IT isn't necessarily a fan of this method either. They are responsible for data integrity and data security, and copying and pasting data from a source can result in internal policy and external compliance violations. Most importantly, copying and pasting data from its source into a spreadsheet before you begin working on it erodes data integrity. You are not dealing with data at the source, and therefore you lose the ability to audit the data trail and validate the insights you have created.
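For readers who want to see the alternative in code: the "work against the source, not a pasted copy" idea can be sketched in a few lines of Python. This is an illustrative sketch, not part of the webinar demo; the file name and sample data are hypothetical stand-ins for a real system export.

```python
import csv
import os
import tempfile

# Hypothetical source export standing in for a system of record.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "sales_export.csv")
with open(src, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["region", "amount"])
    writer.writerow(["R1", "100"])
    writer.writerow(["R2", "250"])

def load_source(path):
    """Re-read rows straight from the source file on every run,
    instead of pasting a stale copy into a spreadsheet."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

rows = load_source(src)
total = sum(int(r["amount"]) for r in rows)
```

Because the load step is repeatable, rerunning it after the source changes picks up the new data automatically, which is exactly what a manual copy-and-paste step cannot guarantee.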
Just imagine if one of your managers or executives discovered an error in a report you were building and asked you whether they could trace it back to the source to find out where it went wrong. Would you have to tell them that you copied and pasted data from a source incorrectly into a spreadsheet, and that's what introduced the error? It's a very uncomfortable conversation to have, and it's a very uncomfortable question to be asked why you're copying and pasting data from the source into spreadsheets to begin with. Now, in addition to the data integrity issues introduced by copying and pasting data from the source, working with spreadsheets also introduces risk exposure. Those of you who work in organizations that are focused on governance and data lineage should be concerned about using spreadsheets for your data work. A lot of IT departments, when they hear self-service analytics, think of the Wild Wild West: data marts and Excel spreadsheets running amok. How old are they? How relevant are they? And spreadsheets really contribute to this perception. In addition, as I mentioned earlier, spreadsheets introduce the human error aspect into the analysis process. And many times in the news, as I'm sure many of you have seen, organizations have had to face costly mistakes due to a spreadsheet error. So, given all these concerns, why aren't more organizations turning to alternative solutions outside of spreadsheets? The results of the IDC survey break it down into three major issues. There are compatibility concerns with other systems within the organization that are already established and maintained. There are concerns about lengthy implementation times and how quickly you can actually get started with an alternative solution. And there are concerns about high costs. Spreadsheets tend to be free.
Other alternative solutions tend to cost money, and how can you actually support and fund that? Now, the reality of software applications is different from this perception. Applications today are more open than ever before, and software is becoming easier and more intuitive to use, install, and get going with, reducing the learning curve and implementation times, if that's what you're worried about. It may also be more costly for organizations, as I mentioned earlier, to keep introducing human error into spreadsheet solutions than it would be to pay for an alternative solution. The reality also is that stand-alone spreadsheets never really solved the problem of preparing data for analytics. Spreadsheets are great if you're working with one or two sources of data. But when you start to expand that to three or four, or eight or nine, or up to 15 data sets on different worksheets that you're trying to combine, it's going to get difficult. Not only for you to keep track of that data in those spreadsheets and those sources, but also to manipulate the data in the spreadsheet itself. You're going to get a lot of hangs, and the spreadsheet may crash on you. As a result, there has been an emergence of self-service analytic software that puts the data preparation, data cleansing, and data blending of analytics into the hands of analysts, without you having to go to different departments to get the data that you need or to produce the insights that you actually need. Now, a lot of organizations say that these solutions are competing against free solutions. So let's circle back to some of those numbers I talked about earlier from IDC. When I said that 8% of data workers within organizations are using Excel or spreadsheet-like software for their data work, I want you to take a look at what that really means.
That means five million users in the United States alone and 16 million users in the rest of the world. But what does that also look like for you? There are a lot of people using spreadsheet-like software to do analytics. What does that really mean in terms of what you do on a day-to-day basis? When you look at the insights I talked about earlier from IDC, on average, 26 hours per week of working in spreadsheets is wasted. Given a 40-hour work week, that equates to 65% of time wasted performing manual tasks. The majority of your week, if you think about it and you're doing analytics in spreadsheets, is wasted on manual work. And 35% of the time you're spending in your spreadsheet is wasted just repeating simple efforts when data sources are updated. So not only are you spending a tremendous amount of your work week working with data in spreadsheets, but you're wasting an even greater amount of that time doing repetitive tasks. So what does this mean? It means there is a huge population of data workers like yourself who are wasting a tremendous amount of time, day in and day out, working with something that is supposedly a free solution. When you try to quantify this, that is 1.3 billion hours wasted on duplicate work in the United States alone. That doesn't even account for the time spent in spreadsheet-type software on the other data tasks I talked about, which you typically perform. But the truly scary number is this one: the number of hours wasted in the U.S. alone is equivalent to flushing away $60 billion a year on repetitive manual tasks in spreadsheets. This should clearly paint a picture that free isn't always free. Of the 5 million advanced spreadsheet users in the U.S., eight of the 26 hours working in spreadsheets are spent on repetitive tasks, such as updating data.
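The headline percentages above follow directly from the quoted hours. A quick back-of-the-envelope check in Python (the figures are the approximate ones quoted from the IDC report, not exact data):

```python
# Back-of-the-envelope check on the IDC-style numbers quoted above.
hours_in_spreadsheets = 26      # hours per week, per the report
work_week = 40                  # standard work week

share_of_week = hours_in_spreadsheets / work_week        # 0.65, i.e. 65%
repeat_share = 0.35                                      # share spent re-doing work on refresh
repeat_hours = hours_in_spreadsheets * repeat_share      # roughly 9 hours per week
```

So roughly two-thirds of a standard week goes to spreadsheet work, and around a third of that spreadsheet time is pure repetition when sources update.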
Based on a baseline salary estimate, that's $12,000 per worker per year just wasted because analysts are not able to automate routine manual processes. Now, I can assure you, if you or your manager walked into an executive meeting with an annual budget in hand saying we're going to allocate a couple million to repetitive manual tasks this year, I'm pretty sure that would not go over well. But the reality is it's happening, and it's happening every day. It's just that no one's admitting it. Is it because the true cost of wasted repetitive tasks isn't really understood? Or that organizations really don't understand that free doesn't actually equal free when it comes to spreadsheet-like software for data analytic tasks? So what does IDC recommend based on the results of the survey? Users of standalone spreadsheets are unable to achieve the desired levels of integrity through the data preparation process and its outcomes, and there are huge gaps between the importance of data and analysis integrity and what one is actually able to achieve in spreadsheet-like software. The big problem is that the cost of maintaining the status quo of performing data work in spreadsheets is staggering. Free is not always free. IDC recommends that analysts and users consider self-service data preparation and self-service data analytic software as alternatives to spreadsheets, and build business cases for self-service data analytics based on productivity cost savings. They also recommend that users expand the business case for self-service data preparation software by highlighting some of the additional benefits it offers: better control, higher levels of data work and analysis integrity, and help in providing trust, availability, security, and compliance. Now, the purpose of this webinar is to give you a sense of the benefits of using an alternative solution over spreadsheets for data analytics.
Spreadsheets are at the core of almost every single business, and they're not going to be replaced. The purpose here is really to highlight how an alternative workflow-type solution can augment and help the processes and tasks you may be struggling with when you're working in spreadsheet-like software to do your analytics. Now, before we can do that, let's cover some brief background on why people use spreadsheets. Most of the time, I'm sure most of you are using them because they're something you're familiar with. They're easy, they're free, they're low cost, they're always available. A lot of times people like what they can do with spreadsheets because they can instantly see the changes they're making to the data. Typically, spreadsheets cover almost everything an analyst needs to do on a day-to-day basis for their data work. But as I mentioned earlier, there are really a lot of issues in working with spreadsheets. There's the productivity cost, which is huge when you start to quantify it. There's the difficulty of trying to work with two or more data sources. There are Excel row limitations on how much data you can actually work with within a spreadsheet. The other thing that tends to be a nightmare is collaboration within spreadsheets. I mean, imagine if somebody sent you a spreadsheet with all of these formulas in it. Are you going to be able to pick up that analysis and work on it yourself? Or is that spreadsheet going to have to wait until you can sit down with the person who created it and go through all the formulas and functions in it? The other thing is, we have analysts tell us all the time that they don't trust each other's spreadsheets and want to redo the work themselves. So those are some of the issues when working with spreadsheets. Now, what can be an alternative solution? Workflows are great augmentations to some of the spreadsheet work you may be doing. Workflows are a lot more transparent.
They allow you to understand what's taking place through each step of your analytic process, because the workflow is self-documenting. And because it's self-documenting, others can quickly and easily understand what's happened to the data just by looking at the icons and the workflow itself. The other thing workflows help with: because they're easy to understand, collaboration amongst analysts is actually quite easy. You can create an analytic insight and pass it over to a colleague who sits in a different department, who can use it for their downstream process. The other thing is, in terms of visualizing the data, spreadsheets do allow you to see the data straight away. But with a workflow, you get to see the data at each step in the process by running that process. You understand exactly what's happening to your data while it's happening. Now, another reason why workflows are a great alternative is the simple fact that a large percentage of your day-to-day job is repeatedly preparing and updating data when the data set changes on an hourly, nightly, daily, weekly, monthly, or quarterly basis. Having to do that work over and over again within a spreadsheet-like environment is time-consuming and tedious, and as we talked about earlier, extremely costly, not only for you and your productivity, but for your organization as well. Workflows free you from the tedious task of producing the same reports over and over again. They allow you to build out the process you want to perform on your data, and whenever that data changes, you simply point to the new data set and run it through. It doesn't require any coding, and it allows you to move on from common data prep tasks to more advanced analytic work. The other thing workflows allow you to do is productionalize your workflow and your analytic process.
In spreadsheets, this is typically done with tedious Visual Basic coding or by building macros, and if you want to productionalize your work, those are your only options. Workflows allow you to automate and schedule data refreshes, and you don't have to worry about rebuilding anything as data sets change. If you're trying to do this in Excel with Visual Basic scripts, you will actually have to merge or recode your macros to get them to work together. With a drag-and-drop visual workflow, you can simply add a secondary process and set everything on automatic. The other thing workflows allow you to do is really embrace analytic flexibility. You're no longer capped by where your data is stored or what it looks like. You don't have to worry about whether your data is in a SQL Server database, in Teradata, or in the cloud in AWS, or about having to write custom scripts and code to extract that data before you can perform analytics on it. Workflows break down those barriers and allow you to connect to the data at those sources and bring it together visually and intuitively, without turning to another department to do it for you. With that, I'm going to turn this over to Hassan, my colleague, who's going to show you how a workflow process can help augment what you are currently doing in Excel. Thank you, Lisa. That was mind-boggling. I'm going to go ahead and share my screen. I'm going to show you today how we would build one of these workflows that Lisa was mentioning, to hopefully help save time and get back some of that $60 billion we are collectively wasting. What you're looking at is Alteryx Designer. Here on top, I have all the tools that I can use to visually build out a workflow and document a process that's going to describe how I'm going to prep, clean up, and blend my data.
These tools are color-coded to make them easier to find, but they are also organized by functional areas. I have tools that allow me to input and output data in my workflow. I have tools that allow me to prepare my data and do some data cleansing and data filtering. I can also blend my different data sets together. I even have tools to do advanced things like predictive analytics, et cetera. Most of what I'm going to show you today lives under this Favorites tab. In order to build my workflow, I'll simply drag and drop these tools onto this open space here that we call the canvas. As I bring tools to the canvas, the results of what I'm doing and how my data is being transformed will show up down here in this results window. This allows me to see how each one of my tools is transforming my data and what my data looks like both before and after that tool was added to the workflow, giving me real transparency and data lineage. Over here on the left is the configuration window. As I bring in each tool, you will see that the configuration window changes, allowing me to configure that particular tool. For the workflow we want to build together today, let's say I have customers all over the country, and the salespeople in the different regions send me their data in region-specific files. I want to bring those files in together, make them into one data set, clean that data set, and maybe do some analysis on it. I noticed that I'm contributing to that money loss, that $60 billion, by doing this manually month over month, week over week. Every time new data comes in, I have to do the same repeatable tasks, so I would like to automate this a little bit. I'm going to go ahead and build a workflow to help me with that. In order to bring in some data, the first thing you would do is bring this Input Data tool onto the canvas.
Notice the configuration window changed, allowing me to configure this Input Data tool. We can bring in data, as Lisa mentioned, from many different sources, really just making it easier for you. You could have your data behind a database connection, and we support a wide variety of databases. As long as you have an ODBC or OLE DB driver and access to that database, we will allow you to bring in that data. Let's say I have my data in a file. Let me go ahead and navigate over to where my data is. Let's say these are the different regions, and that's how my data comes in. If I go over here on the right to show the file types: of course, not everybody follows the same rules. Some people send me their data as a comma-separated CSV file. Some people send me their data as an Excel file, which is fine; Alteryx will help me with that as well. Let's say I want to bring in this first comma-delimited CSV file, representing region one. I have about eight of those, and then I have an Excel sheet that has the same type of data. I selected R1, and I'm going to go ahead and click Open. Alteryx was able to bring in that data. It shows me a sample down here in the configuration window, just letting me know that it was able to bring in some of that data. But I want to bring in more than that, and I don't want to have to create another input tool for each one of those files. So I'm going to go over here to my region file name and replace the one: instead of R1, I'm going to use a wildcard and say bring in R*. Now if I click Run: Alteryx is a runtime environment, so when I click Run, it will actually go ahead and process my workflow. Notice I now have over 2,356 records that were brought in. If I look at the messages, it's actually going to show me all the files: I was able to bring in R1 through R8 in one input tool. So now I'm going to bring in that one additional file that is actually a different type, an Excel sheet.
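The wildcard-input step just described, one input picking up every matching region file, can be sketched outside of Designer with Python's standard library. This is an illustrative sketch, not the webinar's tooling; the three sample files stand in for the demo's R1 through R8.

```python
import csv
import glob
import os
import tempfile

# Hypothetical region files R1..R3 standing in for the demo's R1..R8.
tmpdir = tempfile.mkdtemp()
for i in range(1, 4):
    with open(os.path.join(tmpdir, f"R{i}.csv"), "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["customer", "region"])
        w.writerow([f"cust{i}", f"R{i}"])

# One "input" step, many files: the wildcard picks up every matching region.
rows = []
for path in sorted(glob.glob(os.path.join(tmpdir, "R*.csv"))):
    with open(path, newline="") as f:
        rows.extend(csv.DictReader(f))
```

Dropping a new R9.csv into the folder and rerunning would pull it in with no change to the logic, which is the same property the wildcard input gives the workflow.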
I'll go ahead and drag and drop another input tool onto the canvas, and this time I'm going to navigate over and grab that Excel worksheet, R10. I'll select the sheet name, and now you can see that I have my data down here. It looks a little bit different; again, some people do things differently, and they have their different systems. It looks like my data here is actually not as clean as the comma-delimited files. First of all, I notice that my first row actually has data in it, so I'm going to select this, indicating that the first row contains data. But now I need to clean this up a little bit. It looks like my column headers are not where I want them to be, so I'm actually going to have Alteryx skip the first two rows to get me to the data that I really want. In order to do that, I'm going to use a Sample tool, and I'm going to tell Alteryx to go ahead and skip the first two rows of data. Then we can examine our output and see whether that's the desired result. And that's what's really beautiful about having a visual way to build your workflow: you're going about it, doing it, and seeing whether this is what you wanted. So notice what happened. Now my column headings are actually in the first row of data, which is exactly what I wanted, because now I have a tool that's going to help me clean that up even further. I know I need to rename these columns using this first row of data. But let's say I'm new to Alteryx. I'm not really sure how to do it, and I'm just learning. I'm going to go up here to this global search window, and I'm going to just type the word rename. Notice Alteryx searches for any tools within the platform that will help you with the term you're searching for, but it also searches the online help documentation and the Alteryx Community, made up of customers, Alteryx associates, and employees, and we're always pushing out content to help our customers.
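The two steps just walked through, skipping junk rows with a Sample tool and then promoting the first remaining row to column names, map to a few lines of Python. This is an illustrative sketch with hypothetical sample data, not the Designer tools themselves.

```python
import csv
import io

# Hypothetical messy export: two junk rows, then the real header row.
raw = io.StringIO(
    "Monthly extract\n"
    "Region 10 system\n"
    "first_name,last_name,region\n"
    "Ada,Lovelace,R10\n"
)
lines = raw.read().splitlines()

data_lines = lines[2:]                  # the "Sample tool" step: skip the first 2 rows
reader = csv.reader(data_lines)
header = next(reader)                   # the "Dynamic Rename" step: first row becomes names
records = [dict(zip(header, row)) for row in reader]
```

The two-row skip count is the kind of thing you confirm by inspecting the output, exactly as the demo does visually at each step.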
But let's say you find a tool that you like. This Dynamic Rename is the one I'm looking for. You can open up an example and actually see how that tool is being used. And what's really amazing is, if I find a tool that I like, I can drag and drop it right onto the canvas, without even having to worry about where that tool lives. And I think that's amazing because of what it does for me: it keeps me in the process, it keeps me thinking about the task at hand. I don't get distracted by going out and doing a Google search or anything. I find a tool, I like it, it looks promising, and I just go ahead and use it in my workflow. The Dynamic Rename tool allows me to rename my columns. There are several options; I'm going to select the one that takes field names from the first row of data, and then I'm going to click Run to have it process my data and examine my results. You can see that now my column headings actually came from that first row of data that was buried somewhere in my data set. So at this point I have these two data sets here, and I'm going to bring them together. This is regions 1 through 8, and this is region 10. Regions 1 through 8 and 10, I guess. But these are all my regions, and now I'm ready to stack these results together into one data set. I'm going to go ahead and bring in the Union tool. The Union tool is going to allow me to do exactly that: stack my data together into one data set. Over here on the left I'm going to select the manual configuration. If all your columns have the same names, you can use the auto-config, and that's the best practice. I like to do the manual config and then just look at my data and see if it's lined up correctly. It looks like most of my columns are lined up, but first name and last name are named differently. Again, not everybody follows the same rules, and we're all human. So I'm going to take these two columns here, first name and last name.
I'm going to simply rearrange my columns to have them lined up. So I'm not copying stuff in Excel; I'm not doing Ctrl+C, Ctrl+V and making mistakes about where I insert the column. I simply align all my columns to make sure they line up correctly. Now if I click Run and click on the output coming out of the Union, I have 2,678 records representing all my regions. You can see the region column over here. So at this point, this looks like a decent data set, and I'm going to take a further look at it. Before I build out a workflow and invest my time, I'm going to save some time by doing some data quality and data profiling work on this data before I go ahead and use it. So I'm going to drag this Browse tool over here and then hit Run again. The Browse tool is going to give me really quick and very valuable data profiling information about the data that I just brought in. Notice my columns are now color-coded, really indicating the health of this data. This is demo data, so it's very, very clean. But if you hover over the columns, it tells you how many null values there are and whether my data is okay or not. As you select a column, you can come over here to the left and see how many unique values there are, how many nulls, how many blanks. So this way you're getting a health check on your data before you invest your valuable time building a workflow and then realizing you have bad data. But let's say that happens, and it happens sometimes, and you realize there are some things here that you need to pay attention to. As I mentioned, this is a very clean data set, but we offer a Data Cleansing tool, and I'm just going to go ahead and bring a Data Cleansing tool on here. The Data Cleansing tool allows you to do very common and often repeated data cleansing operations. Things like: what do I do with nulls? Do I replace them with blanks or zeros?
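The union-with-manual-alignment step described above, renaming mismatched columns so two data sets stack cleanly, can be sketched in plain Python. The sketch and its column names are hypothetical illustrations of the idea, not the Union tool itself.

```python
# Two region data sets whose columns match in meaning but not in name.
r1_to_r8 = [{"FirstName": "Ada", "LastName": "Lovelace", "Region": "R1"}]
r10 = [{"first_name": "Grace", "last_name": "Hopper", "region": "R10"}]

# The manual-configure step: line the differently named columns up.
rename = {"first_name": "FirstName", "last_name": "LastName", "region": "Region"}
aligned = [{rename.get(k, k): v for k, v in row.items()} for row in r10]

# The Union step: stack everything into one data set.
unioned = r1_to_r8 + aligned
```

Making the column mapping explicit, rather than eyeballing a paste in Excel, is what keeps values from landing under the wrong header.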
Or I might have unwanted characters in my data set. Maybe, if my data came from Excel, I have a lot of white space; maybe, if it came from the web and I did some web scraping, I have line breaks and tabs and things like that. Maybe there are letters where they're not supposed to be, or people are putting numbers in the name field where they're not supposed to. I can clean all of that up, and it's really nice that I can do it all in one tool. I can even modify the case if I want to make everything lowercase or uppercase. I can actually hit Run now. And what's really nice about these tools I'm adding: notice there's an incoming anchor and an outgoing anchor on each one of these tools. The incoming anchor tells you exactly what the data looked like before this tool worked on it, and the outgoing anchor, the arrowhead, shows you what happened to the data after it was processed by this tool. So you can see my first name and last name were kind of a mixed bag of cases. I guess they were all uppercase, and now they're all turned into lowercase, because that's what I told my cleansing tool to do. So there are different options for you as well. Let's say at this point everything came in from a CSV or an Excel file. I can click on this metadata view down here to show me the metadata of what I have, and it looks like everything came in as a string, which is very common. You may have to change some of those to numbers. And let's say you don't want to have to worry about this; you want Alteryx to do that for you. We have a tool called Auto Field that tells Alteryx: select the best field type for me based on the data contained in that field. It's going to select the most optimal size and the right type for that column based on the data. The one exception I make is I always unselect the zip field.
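The cleansing operations just listed, nulls to blanks, stripping stray whitespace and line breaks, normalizing case, are easy to picture as one small function. A rough Python sketch with hypothetical sample records, standing in for what the Data Cleansing tool does in one pass:

```python
# Hypothetical dirty records: stray spaces, a tab, a line break, a null.
dirty = [
    {"first_name": "  ADA ", "city": None},
    {"first_name": "GRACE\t", "city": "New\nYork"},
]

def cleanse(row):
    """Nulls to blanks, collapse whitespace/tabs/line breaks, normalize case."""
    out = {}
    for key, value in row.items():
        value = "" if value is None else str(value)
        value = " ".join(value.split())   # collapses spaces, tabs, line breaks
        out[key] = value.lower()
    return out

clean = [cleanse(r) for r in dirty]
```

Bundling the common fixes into one reusable step is the point: every refresh of the source data gets the same cleansing, with no manual find-and-replace.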
That's just in case I have some zips starting with leading zeros that I don't want to touch. Other than that, it works very nicely on the rest of my data types. Now if I click Run and want to see what this Auto Field tool did, I can click on the arrow tail coming in and look at the metadata: everything was coming in as a string. If I click on the arrowhead coming out, I can see I now have a mix of different types, some numbers, some strings, and Alteryx selected the best size based on the data that was coming in. And this is dynamic: if new data comes in and this is part of my workflow, Alteryx re-evaluates this every time you run the workflow, so this step is automated for you. At this point, this is all my customer-related information coming in from all my different regions. Next I'm going to bring in some transaction data, and I want to show you how we do that by connecting to a database rather than a flat file. So I'm going to go back to my favorite step and bring in another input data tool. As Lisa mentioned, there are really no artificial limits on how many of these input data tools you can have, or how many rows, or things like that. This time I'm going to connect to a SQL Server. We can connect to many different types of databases; as I mentioned, as long as you have the driver installed and configured, chances are we'll be able to create a connection to that database for you. I have a SQL Server running here on my demo machine, so I'm going to connect to it and bring in a transactions table. I can bring in this table as is, or I can select some of the columns.
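The Auto Field step above is doing type inference: pick the narrowest type each column's values will support, while exempting fields like zip codes so leading zeros survive. Here is a rough, hedged sketch of that idea in plain Python; the column names and values are hypothetical, and real Auto Field logic is more sophisticated.

```python
def infer_column(values, exempt=False):
    """Infer int, float, or str from what every value in the column parses as."""
    if exempt:
        return str  # e.g. zip codes: keep as text so "02134" stays "02134"
    try:
        [int(v) for v in values]
        return int
    except ValueError:
        pass
    try:
        [float(v) for v in values]
        return float
    except ValueError:
        return str

# Everything arrives as strings, as it would from a CSV or Excel export.
columns = {
    "sales": ["5.76", "10.20"],
    "store": ["12", "7"],
    "zip":   ["02134", "90210"],
}

types = {name: infer_column(vals, exempt=(name == "zip"))
         for name, vals in columns.items()}
print(types)
```

As in the demo, the inference runs on whatever data is present, so re-running it on new data updates the types automatically.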
If I have a stored procedure that I normally use to bring in my data, I can leverage it here; I don't have to throw it away. If I want to write my own SQL code, I can do that here as well. But for the most part, if you want to keep this a code-free environment, you can. At this point, I'm going to bring in all my transactions, click OK, and hit Update Sample to see what's in here. I have customer ID, order ID, store number, some order dates, some shipping dates, shipping methods, et cetera. Alteryx is what I would call date- and time-aware: it knows what time it is, and you can do comparisons of dates and times. So let's say I want to figure out how long it takes to ship each one of these orders. We've noticed our business is down a little, and I want to develop some intelligence on the causes. Maybe it's taking too long to ship the orders, and that's hurting our sales. So I want to calculate how long it takes to ship an order. To add any formulas or business logic to your workflow, you use the formula tool. The formula tool lets you update any existing column, or you can add a new column if you'd like. In this case, I'm going to add a new column and call it something like "shipping time", because I need to understand how long it's taking to actually ship my product. I'm going to compare the time a customer ordered a product with when it was actually shipped. I'm going to use a function: if you click on this FX button, there are a lot of functions available to you, including date-time functions. There's one called DateTimeDiff; it should be over here somewhere. Okay, I forgot my alphabet there. Date one is going to be my shipping date, so I'm typing fast here; shipping date, selected.
My second date is going to be my order date, when they ordered it. And I'm going to say: give me the difference in days, so I'll just type "days" here. This is going to calculate the duration between my order date and my shipping date. Now if I click Run, I see this additional column in my data set over here. If I flip over to the data (let me make this a little bigger), I now have shipping time, indicating the number of days it's taking to ship this product. So this is great. Now I want to see whether this is affecting the customers, so I want to develop some aggregations at the customer level. I'm going to bring in the summarize tool and connect it to my data stream. The summarize tool allows you to aggregate on whatever fields make sense for you. So I'm going to take the customer ID and say: group by customer ID, because I want to understand what is happening at the customer level. I also want to know how many times each customer shows up in my transaction file, so I'll add a count, which counts how many times that customer ID shows up, and I can rename it over here to "transactions". I also want to see the shipping time for the customer, the new field I created, so I'm going to take shipping time and say: just average the shipping time for me. Notice I can't find the average function here, because I missed a step: my shipping time is a string. Because I have a visual workflow, I know what the previous step is and where I calculated the shipping time. Imagine doing this in Excel and figuring out, oh my God, where did this go wrong? What did I do? Here I can come back and, no sweat, just change this to a double, and now I've fixed the error.
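The DateTimeDiff(shipping date, order date, "days") formula described above has a direct stdlib equivalent. A minimal sketch, with invented sample dates:

```python
from datetime import date

def shipping_time(order_date: str, ship_date: str) -> int:
    """Days between order and shipment, given ISO-formatted date strings."""
    return (date.fromisoformat(ship_date) - date.fromisoformat(order_date)).days

# Ordered March 1, shipped March 4: a three-day shipping time.
print(shipping_time("2019-03-01", "2019-03-04"))
```

The same subtraction works for a "days since last order" style calculation by swapping in `date.today()` as the first date.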
Compare doing that in Excel, and what a nightmare that would have been. So I'm going to go back over here to the numeric actions and say: okay, give me the average shipping time for the customer. And while I'm at it, I'm going to take my sales and sum the sales for the customer. Simply by selecting the fields and saying how I want to aggregate them, I'm able to create a summary. Now if I run it, I'll show you what that looks like. For this customer, ID 2, he or she has one transaction, the average shipping time is 2, and the total sales are $5.76. So that's great. At this point I want to see: when was the last time I had an interaction with this customer? I want to calculate the number of days since the last order, so I need to find the most recent order date. I can do that in the summarize tool by finding my order date column and asking for the max order date: the latest, most current date on which this client placed an order. Then I'm going to take this max order date and build a formula around it. Actually, let me go ahead and run it to show you: I have max order date here. Now I'm going to add another formula tool, and this time I'm going to calculate the number of days since I last interacted with this customer. Maybe this is meaningful for me; maybe I'll develop some intelligence from it or build a predictive model. I'm going to create a new column called "days since last order", or "days since last transaction", something like that, and I'm going to use the same DateTimeDiff function I used in the first formula. This time date one is going to be now, so I select the DateTimeNow function. Date two is going to be this max order date that I just got in my summary.
That represents the last time this customer placed an order, and now I'm going to get the difference in days. You can see in the data preview that Alteryx shows me a calculated result based on the first row of data, which gives me a warm fuzzy feeling and lets me know this is calculated correctly and I have no errors. Now if I click Run, I see I have a new field called days since last order, and I can actually do something with it: sort it and try to figure out what is going on with these customers. So what do I have so far? Two streams: one up here on top representing customer data, and transaction data down here on the bottom. At this point I want to join these two data sets together. This is very similar to how you would do a VLOOKUP in Excel or a join in a database. Let me show you how simple this is, and how easy it is to fix any problems you may have. To do a join, I'm going to click on my join tool and bring these two data sets together, and I have to tell Alteryx how I want to join them. I'm going to join them on customer ID, so I select customer ID from both the left and the right. Coming into the join I have a left side and a right side, which gives me data lineage: I can always trace my steps back and see where the data came from. Coming out of the join I have three outputs, represented very well here in this Venn diagram. I have the J output, which is my inner join; my right unjoined, records in the right input that did not match; and my left unjoined, records in my left input that did not match. Down here I have total transparency into what is coming in: I see the columns that came in from the left and the columns that came in from the right. I can deselect some of the columns here if I would like.
Customer ID comes in twice because it's my joining field, so I can deselect one of them if I want. I can deselect other columns, and I can rename some of these fields if I'd like. Let's say I want to take this max order date field and call it something else, like "last order": I can rename it, whatever makes sense for my flow. Now I'm going to click Run and show you the output from the join tool. Keep in mind, when you do a join or a VLOOKUP in Excel, the kind of information you have here versus what you would have in Excel. Of course we have the J, our inner join, which represents the combined data set: you can see my customer information and my transaction information all in one data set, joined on customer ID. But additionally, I see what did not join. On the right I have zero, but on the left I see 406 records did not join. What's really nice is that I can actually take that output and develop some intelligence about it: these represent customers that do not have any transactions, so maybe I can do more outreach, more marketing, to them. I don't lose whatever fell out of the join; I can grab it and build a whole section of the workflow to handle it. Now let's say I want to automate this workflow so it runs on a schedule, and I want to be alerted if this happens, if any records come out of the left side here. I'm going to use something called a test tool. I'll go up here and find the test tool; I'm just going to show you this real quick and then take it out. The test tool allows you to test for a condition. I'm going to come up here, add a condition, call it "no transactions", and say: alert me if the record count is greater than zero.
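The summarize-then-join sequence, plus the "alert if anything fell out of the join" check, can be sketched with plain dictionaries. This is an illustrative stand-in, not the Alteryx implementation; all the data below is made up.

```python
# Hypothetical inputs: a customer list and a transaction log.
customers = {1: "ann", 2: "bob", 3: "cam"}   # customer_id -> name
transactions = [                              # (customer_id, sales, ship_days)
    (1, 5.76, 2),
    (1, 9.10, 4),
    (2, 3.00, 1),
]

# Summarize: group by customer, count transactions, sum sales, average shipping time.
summary = {}
for cid, sales, days in transactions:
    s = summary.setdefault(cid, {"transactions": 0, "sales": 0.0, "ship_days": 0})
    s["transactions"] += 1
    s["sales"] += sales
    s["ship_days"] += days
for s in summary.values():
    s["avg_shipping_time"] = s.pop("ship_days") / s["transactions"]

# Join on customer_id; anything without a match is the "left unjoined" output.
joined = {cid: {"name": name, **summary[cid]}
          for cid, name in customers.items() if cid in summary}
unjoined = [cid for cid in customers if cid not in summary]

# The test-tool condition: flag it when customers fall out of the join.
if len(unjoined) > 0:
    print("alert: customers with no transactions:", unjoined)
```

Keeping the unjoined records as a first-class output, rather than silently dropping them as a VLOOKUP would, is the design point the demo is making.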
Now if I run my workflow and it finds that something came out of the L output, notice the workflow breaks, because that condition was met. So as far as automating my workflow goes, I can come here to Events, add a new event, and say: send me an email when this workflow runs with errors. Now I'm notified, I'm kept in the loop that something happened and I have an error. So I can automate my workflow and be notified when something goes wrong. I don't actually want to do any of this right now, so let me just remove it; I just wanted to show you what it would look like if you choose to automate your workflow. At this point, coming out of the join, I have my combined data set. Now that I've cleansed this data and combined it and it looks nice, I'm going to output it to an Excel sheet, and I want a different tab for each of the regions that came in. To do that, I'm going to add an output data tool to my workflow and output to a file; I'll call it "webinar output" or something like that, maybe this is my customer list. I'll go ahead and select Excel from the different file types and click Save, and I can give it a sheet name. I'll also set it to overwrite in case I run this again. But what I really want is to make the sheet name dynamic, so I'm going to say: take the file or table name from a field, changing the file or table name, and then select a field in my data to drive the name of the sheet. In this case it's going to be my region, so I select region here. Now if I run, you can see that I created an Excel workbook with sheets R1, R2, R3. I can actually open one of these and drag it over here.
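The dynamic "take the file or table name from a field" trick, one output per region value, can be approximated with the stdlib. This sketch writes one CSV per region instead of one Excel sheet per region (to stay dependency-free); the rows and file-name pattern are invented for illustration.

```python
import csv
import os
import tempfile

# Hypothetical cleaned-up, joined data with a region column.
rows = [
    {"region": "R1", "customer": "ann"},
    {"region": "R2", "customer": "bob"},
    {"region": "R1", "customer": "cam"},
]

outdir = tempfile.mkdtemp()

# Group rows by the driving field, then write one file per group.
by_region = {}
for row in rows:
    by_region.setdefault(row["region"], []).append(row)

for region, group in by_region.items():
    path = os.path.join(outdir, f"output_{region}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["region", "customer"])
        writer.writeheader()
        writer.writerows(group)

print(sorted(os.listdir(outdir)))
```

The same grouping step would drive per-sheet output if an Excel library were available.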
And you can see I was able to output the summary I created; this is all my cleaned-up data. Hmm, maybe I selected a different option here on the change file/table name setting; let me go back. Maybe it's because I have it overwriting. Sorry about that. Okay, let me give it another name, "my report"; it's going to be Excel; give it a sheet name; now click Run. And it's running. That's what happens when you're running something live and you are a human. Okay, there you go: now it's creating my Excel report with these different sheets. But let's say this is not how I finally want it. Maybe I'm using a visualization tool, and I want to output this to a TDE file to visualize it there. What's really nice is that I'm not tied down to one output: I can take the same data and output it in different ways. I find this very handy with our customers, because some of them say, well, I have to send an Excel sheet to somebody, but my executives want to see a dashboard. I can definitely output this to another file type as well. In this case, I'm going to select the Tableau data extract type and give it a name, something like "T output", and then just go ahead and run it. You can see in my messages it now says 2,272 records were written to this .tde file. Now I can open this up in Tableau, build my workbook, and visualize my data. So at this point, now that I've automated prepping and blending the data and cleaning it from all the different regions coming in, instead of spending 26 hours a week on prep, blend, and cleanup, I actually have time to do analytics that will bring value to my organization. And as you notice, the more I automate, the more time I have to go deeper and add even more analytics.
Maybe at this point I can leverage some of the predictive tools that Alteryx has. Maybe I can do spatial analytics or drive-time analytics to find out how far my customers live from my stores, things like that. So very quickly, let's say I want to predict the likelihood of somebody responding to a marketing campaign. In my data I have a responder field: I have been collecting responder information on my customers, and I want the likelihood of them responding in the future. I'm doing this at a very high level just to show you that these predictive models are ready to go for you out of the box. I'm going to quickly bring in this logistic regression model and connect my data to it. I can give it a name, something like "marketing model", and select what it is I want to predict: my responder field, whether they will respond to a marketing campaign. Then I select the variables that influence that field: maybe the number of transactions, maybe the average shipping time, maybe how much time they spend with me. I could even add a spatial tool and figure out the distance they have to drive to get to my store. Once I select these variables, I can run, and now I have a logistic regression model built out of the box. It's erroring out here because I don't have it set to overwrite. And you can see, the more I build, the more time I have to do other things. If you're interested in this, Alteryx has many analytics kits to help you get started with advanced analytics. If you go here under the Help menu and look at the sample workflows, you can see some of these starter kits for analytics and for things like predictive analytics, and learn how to do different models: a forest model, linear regression, logistic regression.
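To make the idea behind that marketing model concrete, here is a from-scratch sketch of logistic regression: predict a 0/1 responder flag from features like transaction count and average shipping time. The training data is synthetic and the gradient-descent trainer is a toy; Alteryx's actual predictive tools are R-based, as noted in the Q&A.

```python
import math

def sigmoid(z):
    """Logistic function, clamped to avoid math.exp overflow."""
    if z < -60.0:
        return 0.0
    if z > 60.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.1, epochs=1000):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Synthetic data: (transactions, avg_shipping_time) -> responded to campaign?
X = [(1, 6), (2, 5), (8, 2), (9, 1), (1, 7), (10, 2)]
y = [0, 0, 1, 1, 0, 1]
w, b = train(X, y)

def predict(x):
    """Probability of responding, for a feature tuple."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

print(round(predict((9, 1)), 3), round(predict((1, 7)), 3))
```

Frequent customers with fast shipping score high; infrequent ones with slow shipping score low, which is the kind of signal the demo's model is after.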
These are all here as starter kits, and all of them can be downloaded from our website at alteryx.com/kits. I really hope you've enjoyed this, and that you've seen how you can stop spending all this time on data prep and actually move over and start doing some analytics. Shannon, I'm going to send it back to you for any questions. [The presenters hand screen control back to Shannon for the Q&A.] Great, we're going to get to the Q&A here. The presentation will be made available to everyone who attended today. With that, let me pass this on to Shannon, who will run through some of our Q&A. Just to let everybody know, as Lisa mentioned, I will send a follow-up email by end of day Thursday with links to the slides and the recording. There are a lot of fantastic questions coming in; if we don't get a chance to get to your question today, I will forward it on to Alteryx to get answers, and maybe we can include some of those in the follow-up email as well. Keep them coming in; we've got just a few minutes here to get to some of the great questions. Let me ask: is this a SQL engine on Excel files, and what's the difference from Microsoft Access? Okay, well, if I understand the question correctly, Alteryx is really database agnostic. We can connect to any database, so whether you're bringing data from Excel, from Access, or from SQL, once you bring it into Alteryx it all looks the same to us after that point. There are some limitations with databases: you have to have the driver installed and configured.
We also respect existing database permissions, so if you have permissions to that database, or you have the data in an Excel sheet, we'll be able to connect to it. Next question: what's the volume of data Alteryx Designer can handle before running into performance issues? We have customers that use it in really great ways and bring in a lot of data, up to a billion records. A colleague who sits next to me, a former Alteryx user at her previous shop, was able to bring in up to a billion records right on her company-issued laptop. So really, the only limitation is your own physical machine, the memory and hard drive you have available. And don't forget we also have in-database processing: if you're concerned, if your data has big-data characteristics, you can run the processing inside the database itself. All right, and from one questioner we have a few questions: do the predictive functions come included with Alteryx, or are those an additional cost? And can you import Excel formulas into Alteryx? Regarding the predictive tools, you have to download the predictive starter kit to get the predictive categories. What's important about them is that they're all based on R, so you can actually open the predictive tools and edit the R code if you want to do something specialized, more advanced, or specific to you. As for the second question, about Excel formulas: unfortunately, you have to rewrite those formulas in Alteryx if you have custom formulas that you're currently using in your data processing. All right, well, I'm afraid that is all the time we have. What a great presentation; Lisa and Hassan, thank you so much, and thanks to our attendees.
Again, we didn't have time to get to a lot of the questions today, but we will get those over to Alteryx and follow up with answers for you, so keep the questions coming in and I will make sure to route them appropriately. And a reminder to everyone: I will send a follow-up email by end of day Thursday with links to the slides, links to the recording, and any additional information requested. Thanks to our attendees for being so engaged; we just love the participation, and I hope everyone has a great day. Again, thanks to Alteryx for sponsoring, and Lisa and Hassan, thank you so much. Thank you. Thanks. Thank you.