Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager at DATAVERSITY. We'd like to thank you for joining this DATAVERSITY webinar, Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines, sponsored today by Stonebranch. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DATAVERSITY. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just a note: Zoom defaults the chat to send to just the panelists, but you may absolutely change it to network with everyone. To find the Q&A or chat panels, click the icons found in the bottom middle of your screen. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar.

Now let me introduce our speakers for today, Scott Davis and Ravi Marugison. Scott is the global vice president at Stonebranch, with over 20 years of experience in the software technology industry. Scott's tour of duty has included senior leadership roles at such well-known global technology brands as SAP and Calero. At Stonebranch, he leads all corporate growth initiatives across the globe. His expertise extends to international industry market research and analysis and technical marketing. Ravi is a seasoned IT industry professional versed in application and infrastructure automation, RPA, ITPA, and cloud automation, with a solid background in DevOps and programming. At Stonebranch, Ravi is a senior solutions engineer responsible for leading technical conversations and developing the right solution architecture to meet enterprise customer needs in areas such as cloud automation, data pipeline automation and orchestration, and integration strategy. And with that, I will give the floor to our speakers, Scott and Ravi, to get today's webinar started.

Hi, everybody. Thanks, Shannon. I can't express how excited I am to share this story with this audience. We've got a whole bunch packed in here, so I'll go ahead and get started. What we're going to talk about today is really a mix of DevOps, or rather DataOps, as an orchestration layer. We're going to talk about data pipelines and, more importantly, how to orchestrate data pipelines. Then we're going to do a demo for you that illustrates how to do it in a practical sense. And of course, we'll leave time open at the end for some Q&A. I hear this is a chatty group, and I encourage everybody to ask as many questions and interact as much as possible.

So let's start off with something simple. This should look very familiar to everybody on this call today: it's your simple view of a data pipeline. At the beginning of the data pipeline, you typically have some sort of data source where you're doing your collection. You're moving that data via data integration or ingestion; typically this is your ETL phase, or your streaming data, or your ELT, or any number of different acronyms for it. Ultimately, you're moving that data to data storage, and this is where you have your data warehouses and your data lakes.
Some of your master data management happens here, or maybe even as a layer above, but we put it in here. Then you're passing that information to your analysis phase; typically that's your machine learning and your predictive analytics. You get your data scientists in there playing with the data, maybe storing it back into a data warehouse, and then it's delivered to its audience, your intended users. It could be anybody, but you're using dashboards, reports, emails, texts, whatever. This is your presentation layer. I show this up front because I'm going to show an image similar to this throughout the presentation, just to help everybody stay on the same page.

Now, this is a lot more complex than what that last slide shows, of course. The complexity lies in the types of tools that fall within each of these layers, and the trick is that no single company, or even department, has or uses the exact same tools. So you could have any number of data sources, any number of data integration or ingestion tools, and so on throughout this whole process. Even this is simplified, because it's not as linear as this by any stretch of the imagination. But I always like to show this slide because it is such a complex thing to tackle, which is why everybody's on this call today.

So when I think about orchestration of this data pipeline, there are some common ways that people connect the tools today. Number one, they use point-to-point integrations. A lot of times they'll use custom scripts. But at the end of the day, a lot of times they just don't connect them at all. That last one actually came from a discussion I had with Gartner, because initially I was illustrating this as just point-to-point integrations and custom scripts, and they said, hey Scott, there's really one more big category: the ones that don't connect at all. And unfortunately, the reason is that there are so many different tools that people can't integrate them properly.

So when you're thinking about orchestration, what you're really trying to get to is a centralized view of what's going on in each of the tools, whether they're data tools or data sources or anything along the pipeline I've shown, a centralized view that can tell you what's going on from an automation standpoint. You want to be able to root-cause issues, and you get that because you have the centralized view. If something breaks, it's not like looking for a needle in a haystack; it's a report or a dashboard that says this thing didn't fire, you need to go take a look at it. That cuts down the time significantly. And a lot of times, what you're trying to get to when you start to scale these data pipelines is proactive support. As a data architect or a data engineer or anybody involved in the operations of this, these dashboards and reports help you make sure that if something breaks, you're getting to it and fixing it before, worst case scenario, your CEO calls and says, hey, my report's not updating, what's the deal? So it cuts down on that tyranny-of-the-moment sort of scenario. But ultimately, when you're trying to get to a proper orchestration solution, it's really to achieve scale. We'll talk more about that as we get through this, but anybody can take one of the approaches I'm about to show and build some level of data pipeline; it just winds up not being sustainable or scalable. So what are our pain points?
What keeps us from being able to orchestrate versus just automate? And there's a subtle difference there: you can automate any single thing, but orchestration requires automating across a lot of different tools in the pipeline. So the pain points. These are the types of things people typically use.

They'll use inbuilt schedulers. What I mean by inbuilt schedulers is that just about every data tool or application out there has some sort of built-in job scheduler. These job schedulers typically aren't able to connect to other tools. They can schedule what's inside their tool, but they can't schedule what's to the right or left of them in the data pipeline. A good example of this is Informatica. It's an excellent tool for ETL, for transformation, for everything that it does, but it has a built-in job scheduler that only schedules things inside of Informatica. Those of you who work with Informatica are probably very familiar with this.

The next option people typically use is open-source schedulers, and the most common one in the data world is Airflow. Data people love Airflow, and Airflow is good; it's a very good scheduler. The problem is that it's usually batch or time-based. There are some workarounds, of course; I've read about them and heard from some customers that they use workarounds to get event-trigger-like behavior out of a DAG (there's a quick sketch of that pattern below). It can be done, but it's not efficient and it's not scalable. Still, oftentimes in the data world people start with these open-source schedulers, specifically Airflow.

They're also doing a lot of their work in cloud schedulers. AWS, Azure, Google Cloud, you name the cloud service provider, they have pretty decent schedulers. They have batch schedulers, and they even have event-based schedulers; AWS's Lambda is a great event scheduler that does real-time triggers. The problem people run into very quickly with these schedulers is that they're only focused on their own ecosystem. So you have a lock-in problem: you're forced to use their ecosystem of services instead of doing what you really need to do, which is work multi-cloud, and to go a step further, typically in a hybrid IT environment, which includes data sources and applications that are on-prem as well as data sources, applications, and data tools that are in the cloud.

And then finally, and we see this most often, just about every company has some sort of what I call a legacy, on-premises or mainframe-focused scheduler. You'll see the term workload automation out there, and there are a lot of big companies, probably three of them, that own most of the market. The problem is they don't work well in the cloud and they're not going to do a very good job with hybrid IT automation. They will be able to orchestrate or even automate multiple tools, but those tools are all going to be on-premises. So you're kind of stuck between these options in most cases. I would imagine a lot of people are nodding their heads right now, like, yeah, I use that and that and that. It's never just one of these tools. You're probably using all of them, or three of them, or two, some mix of these tools, and it's causing your pain.

So I want to take a moment to talk about orchestration a little bit more. Gartner released a report on Thursday of last week called Gartner Data and Analytics Essentials: DataOps. It's the second version of this report, and it's not a report as much as it is a presentation.
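To make the Airflow workaround just mentioned concrete, here is a minimal sketch, assuming Airflow 2.x: a sensor task polls for a condition (a file landing, in this illustrative example) and then kicks off downstream work. The DAG id, file path, and intervals are made up for illustration; this approximates event-driven behavior, but, as Scott notes, the DAG itself still runs on a schedule and polls rather than reacting in real time.

```python
# Minimal Airflow 2.x DAG illustrating the "event-trigger" workaround:
# a sensor task polls for a condition (here, a file landing) on a fixed
# interval, then downstream tasks run. It approximates an event trigger
# but is still schedule/poll-based rather than truly real-time.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor

with DAG(
    dag_id="polling_ingest_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="*/15 * * * *",  # the DAG itself still runs on a timer
    catchup=False,
) as dag:
    wait_for_extract = FileSensor(
        task_id="wait_for_extract",
        filepath="/landing/central_region.csv",  # hypothetical landing path
        poke_interval=60,        # re-check every 60 seconds
        timeout=60 * 30,         # give up after 30 minutes
    )

    load_to_warehouse = BashOperator(
        task_id="load_to_warehouse",
        bash_command="echo 'load step would run here'",  # placeholder step
    )

    wait_for_extract >> load_to_warehouse
```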
So Robert Thanaraj, who I've had the pleasure of speaking with a whole lot, released this report. It's an update to last year's report. I've really simplified the view here a whole lot, because in its final form it's filled with a whole bunch of tools, their vendor landscape. I didn't want to steal their slide and put it here, but I wanted to give you a sense of what's available in this report, and to give a plug for Robert and for Gartner, because it really is a good report about DataOps. So if you're looking for what DataOps is and how it applies to your business, go check out this report. There's a link here in this deck when you get it, and I believe Nadia on my team, or one of my team members, is going to be posting it in the chat. If you have a Gartner subscription you can get it; if you don't, I'm afraid you can't, but you can always see if somebody in your business has a Gartner subscription and get ahold of it. It's a good deck.

The thing I really wanted to point out is how he has this set up. This is, directionally, the by-category way that he's looking at DataOps vendors. At the bottom, you have the cloud portfolio service providers. These are the people with the closest thing to a soup-to-nuts DataOps platform that you can use, and really there is no soup-to-nuts DataOps platform; you usually have some mix of tools, including a lot of the ones in the specialist categories. Off to the right are vendors that typically do a lot for DataOps, but more from a services side: they have a consulting service that they offer, and then some software that they use to help execute it. In the middle, you're familiar with these terms and these companies; you wouldn't be surprised by the vendors you'd see show up in this list. But at the top, you have the orchestrators. He had a layer of orchestration last year, but interestingly, Robert flipped it to the top, and I think that's a better place for it, because the orchestration layer is what taps into each of the tools mentioned in the specialist piece and helps you orchestrate across that data pipeline, similar to what I showed in my opening slides. Now, orchestrators: obviously Stonebranch is in there, but to give you a sense of some others, you have BMC, you have Airflow in there as an orchestrator, and a few others that do similar things to what we do. We have a more unique take on it than they do, and everybody has their own spin, but there is a legitimate category out there that you may or may not have heard of, and I'm just drawing attention to it. So go check out the report. It's a good one, and you can see all the vendors you can take a look at and explore.

So let's dive into data pipeline orchestration. You remember my simple data pipeline from before? I'm going to keep using it for the next few builds, just as a way to keep us focused in a simple way. So number one, how do you accomplish this concept of real-time automation? Which, by the way, needs to include some managed file transfer, or file transfer or streaming between clouds, between multi-cloud and hybrid cloud environments, or even a hybrid IT environment where you're trying to get to the on-prem world. And so what do you need to accomplish that?
Well, first you need to be able to integrate with the tools that are in use within each of these categories, so that you can schedule and orchestrate all of the automated processes within those tools along your data pipeline. The ways you typically get in there are with agents, which is a somewhat older way of doing it when APIs don't exist; an agent is a little piece of code that you drop on the tool, and a lot of times you see this on mainframes or distributed servers. Outside of legacy tech, you typically see APIs in use, and they're pretty standard across everything in the cloud, and even some of the on-prem stuff these days.

So once you're connected, once you're building out this automation that connects the dots and the workflow across your data pipeline, you get to this level of achievement. Number one, and we talked about some of this earlier, you're getting observability. You get all the log data, you have governance, you have security. It gives you a whole level of security that you probably don't have today, because you're using open-source or individual schedulers and you can't look at it all in one place.

Number two, and this goes back to the title, putting the Ops in DataOps, you get a lifecycle management approach with some of the vendors out there. Obviously we're one of them, or I wouldn't be here talking about it. If you boil it all down, and I don't want to overly simplify it, DataOps is in a lot of ways just DevOps approaches applied to data. It's more convoluted and complex than that, but I've participated in a ton of discussions on it and heard analysts talk, and if you boil it all down, there are a lot of similarities. So what do you get once you apply DataOps lifecycle management? Well, you get to develop, and you get to have a whole bunch of people working on it, whether they're developers, cloud architects, cloud engineers, IT ops, whoever needs to be involved in the creation of this. Then you can test it in another environment, then you can push it to production, and really you can add any number of environments, dev, test, test 2, prod, whatever they are. You can build this whole lifecycle and you can simulate it. You can make sure that it works before you put it out there into the wild.

The next thing you get when you're doing proper data pipeline orchestration is centralized control and visibility. And when I talk about that, it's not just from a data standpoint. A lot of times the data teams, and a lot of the people on this call, you want to build it and give it to somebody else. You don't want to sit there and manage it; that's a big pain. You want to go off and build the next thing. Well, what you know and what your ops team knows are two different things, and your ops team wants to get visibility at a more aggregate level, across not just your pipeline but all of the pipelines in your company. A tool that is a real orchestration platform has the ability to drill into individual flows or pipelines, or look at all pipelines together, however you want to look at it.

The other thing you get is the ability to create workflows in a very visual fashion, and what Ravi is going to show you in a little bit is what one of those workflows might look like. So this is drag-and-drop capability.
This is low code, no code if you want it or it could be a straight up infrastructure or jobs as code where you're right out of Visual Studio or whatever tool you want to use as a developer and pass it up there. So I've seen data people that depending upon where they started off in their careers want one or the other or both. They want to be able to write it in a code way or they want to do the drag and drop. Either way, you have both been a proper tool. And then of course, as I mentioned earlier, you need to be able to root cause the issues as they're happening, not way after the fact that the worst case scenario, the thing I always hear is I just don't want to get yelled at again when something breaks. And in this sense, not only can you simulate it and make sure it's going to work before you put it out there, but when something happens, you get a media alert, you have like a command center sort of dashboard text alerts you can set off. You can have alerts that shoot off to your help desk solution of choice. I think we'll show service now as an example. You can have alert sent teams or Slack or whatever you want to do it. But you can have the whole world blow up around you if something goes down. So you're on it right away. Now, none of this is possible if you can't integrate. And I mentioned that at the top, but here's just a few solutions that you'd want to be able to integrate with. But when you kind of look at the whole thing, I like to think about it more like meta orchestration. And this is a term that isn't widely adopted or used across the world, but I've seen it a few places and I like it a lot. So when I think about meta orchestration, what you're really doing is sometimes you're just orchestrating the automators. You're reaching into something starting on the left, like AWS batch or AWS Lambda or even Apache airflow. And you're using the orchestration tool to reach into those tools and automate them. If that's how you want to do it. And we have a lot of customers use airflow as a direct example that, you know, they want to move over to our tool and have it automate everything eventually. But you know, to get started quick, they don't want to tear down what they've built an airflow. So they just use our integration with airflow and they use our tool to centrally control what happens in airflow, but then that way they get all the monitoring. Oh, and by the way, if you're using airflow or AWS batch, you can suddenly make what is time-based automation that is native to those tools event-based. So you can use our tool to trigger the real-time automation within time-based tools, which is a big benefit. But as you go to the right, you can see just really disguise the limit in terms of the types of solutions you can tap into. So you have a lot of cloud tools, you have a lot of infrastructure tools and, you know, there's a whole set of webcasts that we can do on each one of these categories, but I just want to throw it out there so that you guys can get a sense of the types of things that you can meta-orchestrate or in other words, tap into to run their automation that's native within their tool. So of course, you can use our tool direct and go in and automate it on your own. All right, so now we know what we can orchestrate. We know sort of how to orchestrate. Let's talk about some of the things that you may get with the orchestration tool. 
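Before that, a minimal sketch of the "orchestrating the automators" idea, assuming Airflow 2.x with its stable REST API and a basic-auth backend enabled: an external orchestrator can create a DAG run on demand, which is roughly how a time-based tool gets driven in an event-based way. The URL, credentials, and DAG id are placeholders, and Stonebranch's actual Airflow integration may work differently under the hood.

```python
# Sketch: trigger a DAG run in Apache Airflow from an external orchestrator,
# turning Airflow's time-based DAG into something that runs on demand when an
# upstream event fires. Uses Airflow 2.x's stable REST API with basic auth.
import requests

AIRFLOW_BASE = "https://airflow.example.com"   # hypothetical Airflow webserver
DAG_ID = "polling_ingest_example"

def trigger_dag(conf: dict) -> str:
    """Create a DAG run and return its run_id."""
    resp = requests.post(
        f"{AIRFLOW_BASE}/api/v1/dags/{DAG_ID}/dagRuns",
        json={"conf": conf},                    # payload passed through to the DAG
        auth=("svc_orchestrator", "********"),  # basic-auth backend must be enabled
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["dag_run_id"]

if __name__ == "__main__":
    run_id = trigger_dag({"region": "central", "source_event": "upstream-go-signal"})
    print("triggered Airflow run:", run_id)
```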
So one of the things that I've heard a lot with some of the older approaches that people are chained to right now is that the automation is centralized in a way where just a select few are able to use the tool. In the data world, it might be just the data team that can use Airflow. In the cloud world, it might be just the few cloud people that have access to it. And with the legacy, mainframe-focused workload automation tools, there's usually a central group that runs that. They run a bunch of background processes, but as data people, you don't get to go in there and play with it. You have to give them orders or ask them to help you, and you're begging and pleading to get your stuff worked on.

In the orchestration platform we're talking about here, you're able to access it through a portal, a central portal, where you get to create and do everything your admin allows you to do, which usually limits you to the integrations and the pipelines you're responsible for. But you can also access it through the things I talked about before: ServiceNow, where you can go in and trigger workflows or see whether a workflow ran; you can get alerts; you've got Teams, Slack, Jira, a whole bunch of different tools you're in every day where you now get a lot more control. This is especially important for collaboration. If you're a data person used to working just in your data tools, and you know you need to work with ops, your cloud ops team, your IT ops team, your developers, whoever, now you can all work in the same tool on the same builds, and it becomes a centralized collaboration platform. Ultimately, if you're an IT ops person on this call, you can basically provide automation as a service: the data teams are able to go in and do what they need to do, your ops teams do what they need to do, the developers do what they need to do, but you still maintain operational visibility to make sure everything's running and your SLAs and KPIs are met.

Now, I talked a lot about DataOps earlier, so I want to put a little more visual representation around that. In the DataOps scenario, again a lot like DevOps, in our tool, or maybe some of the other tools out there depending on which one you're looking at, you have an environment where you're doing the development of the orchestration, an environment where you're doing the test, and an environment for production, and this all spans your CI/CD pipeline. With our platform particularly, there are a few different ways you can do this. In this case, I've illustrated it with two controllers, a development controller and a production controller. These are identical systems, and there are two ways you can promote something between these controllers, or between the different lifecycle stages. The first is with a button that resides inside our tool that just says promote to the next stage. It's very simple, done via the built-in web GUI; the whole platform is web-based. But of course, as I mentioned earlier, some people love working in code.
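For that code-inclined path, here is a rough sketch of what Git-driven promotion between lifecycle stages could look like. The job-definition format, controller URLs, and /api/import endpoint are invented for illustration and are not Stonebranch's actual schema or API; the point is simply that the definition lives in version control and a CI/CD step pushes it to the next environment, mirroring the "promote" button in the web GUI.

```python
# Sketch: promoting a job definition through dev -> test -> prod from a CI/CD
# pipeline instead of the web GUI. The job-definition format and the
# /api/import endpoint are invented for illustration; a real controller
# (Stonebranch or otherwise) will have its own schema and API.
import json
import sys

import requests

CONTROLLERS = {
    "dev":  "https://uac-dev.example.com",
    "test": "https://uac-test.example.com",
    "prod": "https://uac-prod.example.com",
}

def promote(job_file: str, stage: str, token: str) -> None:
    """Push a version-controlled job definition to the target stage."""
    with open(job_file) as fh:
        job_definition = json.load(fh)           # definition lives in Git
    resp = requests.post(
        f"{CONTROLLERS[stage]}/api/import",       # hypothetical endpoint
        json=job_definition,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    print(f"promoted {job_file} to {stage}")

if __name__ == "__main__":
    # e.g. called from a CI step after tests pass:
    #   python promote.py jobs/load_sales.json test $UAC_TOKEN
    promote(sys.argv[1], sys.argv[2], sys.argv[3])
```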
So choose your favorite code development platform, Visual Studio, whatever, and you can promote it via GitHub or any other third-party repository that you use. You're able to adopt this DevOps-like process as DataOps, and you can do the lifecycle promotion through your CI/CD process just like you would as a developer working on a piece of code, which is super cool.

With that said, I'm going to pass it over to Ravi. Ravi's awesome; he built this pipeline. Ravi doesn't get a chance to speak in front of a lot of people all the time like I do, so bear with him as he jumps in; he's going to do awesome. If you have any questions for us, we'll flip back around and answer them on the Q&A side. So Ravi, why don't you take us from here?

Thank you very much, Scott. Hello, everyone. This is Ravi. Today we're going to show you how a data pipeline can be orchestrated using the Universal Automation Center. As you might have seen in Scott's slides, there are quite a lot of tools available in the market, and customers often tend to use best-of-breed tools across the different functions of the data pipeline. So for today's demo, we created a workflow with a bunch of tools that are commonly used, and we'll show you what it looks like in Universal Automation Center. In the demo, we have a couple of sources. Some of the sources could be in the cloud, some on-prem, and some could even be apps that generate the data for your data pipeline. So we'll be using cloud, on-prem, and application sources. For data ingestion and transformation, we're going to use Informatica. For data storage, we're going to use Azure Blob, Snowflake, and AWS S3. The delivery in the demo will be via Tableau, but essentially it could be any other business intelligence tool, like Power BI, Qlik Sense, or whatever tool your data consumers use to get insights from the data.

With this, I will share my screen. You should be able to see the Tableau dashboard we built for this demo. Before getting into this Tableau dashboard, let me set some context. As you might know, there are various people involved in creating, maintaining, fixing, and monitoring the data pipeline, and there are also people who get information out of the data pipeline and might not even be aware there is a whole data pipeline running in the background. To talk about a couple of viewpoints from such players in your data pipeline, we created this dashboard in Tableau. This is something your data consumer, in our case a business user, is using, and he sees there's some data missing. This is sales data which comes in nationwide from different parts of the country, divided region-wise: east, south, and west. But we're missing the central region data. In this case, the business user can't do much; he's going to wait until the central region data arrives. In the meantime, he just waits.

To talk about the other view, let me switch my screen. Here you see Universal Automation Center, which is our orchestrator, and we're going to use it to orchestrate our data pipeline.
To give the other point of view: it could be your data architects or data engineers, or anyone in a data role working on the data pipeline. They would be responsible in one way or another for maintaining it and fixing it, and when it runs into problems, they take responsibility for this data pipeline workflow. Here you see a dashboard specific to data pipeline orchestration, and they get to see only the jobs and tasks that are running across the data pipeline. If you look at the Universal Automation Center's default dashboard, it supports a variety of use cases, and there could be many other jobs running in the system beyond the DataOps-related ones, but we have a more specific dashboard for DataOps. From another perspective, here I've logged in as an administrator, so I get to see everything in the system, but you could also have dedicated logins and roles and responsibilities created for a DataOps user, so they get to see only the items related to DataOps. To give you a view of that, I've logged in with another user, call it the DataOps user. He gets to see only the DataOps dashboard; there may be a default dashboard, but he doesn't have access to any other dashboard. He's free to work on his DataOps elements: he can create them, run them, and build a pipeline using these login credentials, with access limited to the current set of automation elements with respect to DataOps.

Switching back to my original screen, let me give a bit of an overview of this dashboard. The first box here shows the different kinds of jobs we're going to use in our data pipeline. As we saw in the slide, there are different sources and different tools for data ingestion, transformation, and delivery using Tableau, and that same set of items is what you see in this box: there's a Tableau job, a Snowflake job, and some Power BI, Databricks, and Azure Data Factory related jobs. More or less, these are the jobs we're going to use; when I go into the workflow, you'll see how these jobs are executed using Universal Automation Center.

Below, you can see some more boxes dealing with SLA violations. For example, this box deals with forecasted SLA violations. When you run your data pipeline, if any SLA breach is forecasted, meaning your data is going to be delivered late, or some processing in one of your tools is taking longer than expected, we have the capability to predict that something is going wrong and that your data might be delivered late. In that case, there's a forecast, and the users downstream can be kept informed that there's a potential delay. There's also another box for actual SLA violations, which shows the violations actually happening in the system, for example a job that was expected to start and hasn't started yet, something like that. And to the right, we have the DataOps alerts and notifications. This shows the different kinds of notifications we send out during the data pipeline execution.
During your pipeline execution, say, for example, you need to send an SLA alert or an email; this shows what actions we have taken so far. For this breach, we have already sent some notifications and executed something. Throughout the execution, there might be a few other notifications going out to the different teams responsible. So from this dashboard you can see, as a whole, what is happening across your DataOps workflow execution.

Now let's get into the workflow execution. Before we get into this workflow, there are a couple of views, or perspectives, you can visualize here. One is your data flow, and the other is your control flow. By data flow, I mean moving the data from the source all the way to delivery. The control flow is essentially not the data flow: it describes how you orchestrate between the tools you're using to transform and manipulate the data and hand it over to delivery. Essentially, the control flow is all about the orchestration. As we move across the workflow, we'll see the actual differences.

There are a variety of ways we can launch the data pipeline workflow. In this case, we're waiting for, or monitoring, an event. This could be hooked up to a webhook: a third-party business application, when it finishes all its processing, could call a webhook to give the go-ahead for the data pipeline, and from that webhook we could trigger this workflow in real time. That's one possibility. Or, if you're using a tool like Kafka, Kafka could publish an event, and we could pick up that event and launch this workflow in real time. In our demo scenario, right now we are monitoring an AWS SQS messaging queue; we're looking for a message, an event, in the AWS SQS queue. Once that message is dropped in, we kick off the data extraction and the further processing in the workflow.

To simulate this, I'm going to drop a message on the AWS SQS messaging queue using Postman (a small Python equivalent of this step is sketched after the demo). Let me open up my Postman; here is my POST message to the SQS queue. Let me send it. As soon as I send this message from Postman, the job monitoring the SQS queue goes to success, and we begin our data extraction process. This message indicates we got the approval to go ahead with extracting the central region data. Maybe it's end of business or end of day, whatever process has been set up across your landscape; that's the go signal for data extraction. Immediately, once we get that event, we trigger our SAP data extraction process, which extracts the data from SAP and keeps it on the application server. In the same way, we have a Windows job that extracts data from the Microsoft SQL Server and keeps it on the application server. Now we need to move this data to a central repository, and in order to move it further, we do some pre-checks.
For example, we check whether the central repository has sufficient space and whether the directory is clear so we can move the freshly extracted data into the central repository. This is a space-availability check we put in place as a prerequisite. As soon as we release this, it checks the space in the target directory; if sufficient space is available, it goes to success. Then we do some cleanup activity in the central repository before we transfer the freshly extracted data. After that, we have the file transfer tasks.

To showcase Stonebranch's file transfer capability, we have built-in file transfer tasks that support different protocols, including our own proprietary protocol. We have a file transfer protocol named Universal Data Mover, a Stonebranch proprietary protocol that transfers data from source to destination using the agents; it's fast and secure. And to showcase the more traditional file transfer protocols, SFTP, FTPS, and plain FTP, in this case we use SFTP to transfer the SQL Server extract. So the SAP data extract goes via the Stonebranch file transfer mechanism, UDM, and the other one goes via SFTP.

Now we've come to the phase where we need an approval. To add a business layer on top of this data pipeline orchestration, someone from the business side needs to give an approval before the workflow can continue. There are a variety of ways an approval could be raised. In this case, we use a manual task and send the approval notification across different channels: your traditional email approval, or your modern messaging platforms like Slack and Teams, which can handle interactive messages. We can drop an approval message there, and the business user, with the click of a button, approves the job right from their messaging platform. In this case, we send the approval message to Slack, and Slack posts an approval notification. The business user gets to see what the approval notification is for: it lists the files that have been dropped into the central repository and asks whether it's good to go and whether he can approve the process to continue, since it's going to be a compute-heavy process from here. He clicks the approve button, and as soon as he does, the task records who approved, from a compliance perspective, in Slack. If you go back to the workflow, you can see the job here goes to success. The approval is done.

Now, what we're going to do in our pipeline is upload the data from the central repository to the cloud storage. The SAP data we extracted goes into Azure Blob, and the SQL Server data extract goes into the AWS S3 bucket. Now that both sets of source data are available in the cloud platforms, we can trigger the Informatica process. With Informatica, we're not trying to replace anything, and we're not pushing the data into the Informatica server in this case. Informatica has its own integrations and its own jobs built within the Informatica platform.
It knows how to fetch the data from Azure Blob and AWS S3, and it also has its own workflow engine with the subsequent steps it has to perform. In this case, once we give Informatica the signal that we've finished this part of the process, Informatica knows to fetch the data from Azure Blob and AWS S3, load it, and transform it. Once it finishes its transformation, it kicks off the Snowflake process, which is our cloud data warehousing. Snowflake also has its own integrations; it can directly connect and talk to AWS buckets. In this case, we're going to load the Snowflake tables directly from the transformed data that Informatica has uploaded: we fetch the SAP data from Azure Blob, and in the same way we fetch the transformed data from AWS S3 in the other Snowflake job.

If you look at the integration here, you can see the difference between the data flow and the control flow. Informatica knows where it has to pick up the data, how it transforms it, and where it puts the data back; that's a point-to-point integration within Informatica. The job is there within Informatica, and it knows what it takes in and what it does. It's the same with Snowflake: Snowflake knows it needs to fetch the data from here and load it into these tables. So in this case, we are purely orchestrating; we're not dictating the data flow. That's the difference.

While we've been talking, you can see the Tableau job has gone to success. That means the central region data should have been published, and the business user would get a notification; we can send a real-time email as soon as the Tableau job finishes, saying the job is done and the central region data should be available. Let's refresh the Tableau dashboard. Once we refresh, you can see the central region data has come in, and across the other dashboards you also see the central region data. Now the business user gets all of his nationwide sales data, and he's happy.

Switching back to the workflow, you can see there is also another branch of the workflow. The same workflow could be used by two different teams; in this case, our data analyst team, or the data science team, wants to use this transformed data for their machine learning models. They want to train their machine learning models on this data. For that, we use an altogether different set of tools: Azure Data Factory and Azure Databricks for the machine learning models, and then they're going to visualize the data in Power BI. So in the previous scenario we used Tableau, and in this one we use Power BI.

This Power BI job is failing intentionally, to showcase the kind of real-world scenarios we might face during the data pipeline. In case of a failure, there are different notifications sent across different channels. For example, Slack gets a notification saying a job has failed, and likewise you also get a ServiceNow ticket. If you look at the ServiceNow ticket, it shows a ticket for the Power BI failure. So we know something went wrong in the data pipeline, there's a ticket, and some user has to work on it; in this case, the Power BI user would pick it up.
They can immediately grab the output and see what's happening, and in the same way, modern incident management tools like PagerDuty have more advanced functions. In this case, we also create an incident in PagerDuty, and you can see the job logs attached to the PagerDuty incident. We also have other features: for example, if operations or the upstream team needs to re-run the job, or force-finish it, set it to OK, and continue with the rest of the flow. If we assume the Power BI team has done its work and needs to set this job to OK, they can do it with the click of a button; we call it a webhook action. The controller then knows this job has been set to OK by the Power BI team, finishes the job automatically, and the workflow goes to completion. Meanwhile, when we switch back, we've already completed the data pipeline execution, we've sent the emails to the different teams, and the notifications are up to date. When you get to the dashboard and refresh it, you can see we've finished the entire process, and these are the different kinds of alerts we sent to the different teams. So this is more or less the end of the data pipeline demo. I'll hand it back over to Scott.

Scott, you're still muted. Yeah, thank you very much. Thanks, Ravi, very well done. In case you guys couldn't tell, Ravi's actually a pro and does this all the time. It is a really good use case that shows the flexibility of the orchestration capabilities across the entire pipeline, with a few add-ins like the self-service capabilities, the checks, and the BPM-like activities where you can have manual approvals. It's a good example of something simple, and of course it can become a lot more complex; we have people that create workflows you'd think are a mile long with tasks and jobs, but it's all something that can work together.
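To make the demo's kickoff step concrete: Ravi simulated the upstream "go" event by posting a message to an AWS SQS queue from Postman. An equivalent minimal sketch in Python with boto3 follows; the queue URL, region, and message body are placeholders rather than the demo's real values.

```python
# Sketch: the "go" signal Ravi simulated with Postman, sent from Python
# instead. An upstream application drops a message on an AWS SQS queue,
# and the orchestrator's SQS monitor task picks it up and launches the
# pipeline. Queue URL, region, and message body are placeholders.
import json

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

response = sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/data-pipeline-go",
    MessageBody=json.dumps({
        "event": "central_region_ready",   # what the monitor task is waiting for
        "business_date": "2022-06-30",
    }),
)
print("sent message:", response["MessageId"])
```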
What I wanted to do real quick is talk through at least one use case here. I can't share the name of these guys, but they're one of the largest global food manufacturers, headquartered out of France. What's interesting about them, and the challenge they faced, is that they had a data pipeline built in Azure with Data Factory, and it worked great in that environment. But just like I talked about earlier, they were locked in to the Microsoft Azure environment with their data pipeline, and they had Informatica, they had Snowflake, and all these other tools they wanted to include in it. They also had a real focus on doing it in real time, so they couldn't settle for something done in a batch or time-based way.

Not to spend too much time on it, but this is an example of what they built, using the same framework I showed before. In their case, they had S3 as a data source, Google Cloud as a data source, and then some other application databases. They still used Azure Data Factory for the Azure side of the business, but they also used Informatica, so they actually used two different data integration tools for ingestion. They had a whole bunch of different data storage locations where they were putting things for different use cases, and they had the analysis layer, and finally the delivery layer, which went out to Power BI.

I like to tell this story because it really showcases the pipeline, but also some of the things they were looking for. File transfer was a big part of it. A lot of times, if you're using a standalone scheduler, you don't have managed file transfer built in; with our tool you do, and not to be specific about our tool only, in enterprise-grade schedulers that are common in the market you'll generally find some sort of managed file transfer built in. DataOps as a process, the DataOps lifecycle, was very important to them; being able to simulate things before pushing them into production was a key thing, it was like number one on their RFP when they first came to us. From an integration standpoint, it was a big challenge for them to connect their on-prem world to their cloud world, and that's a big piece of what we came in and fixed.

One of the other things they talked about, when I had a chance to speak with them after everything was said and done and they'd been using it for almost a year, was another criterion that wasn't evident in the RFP: this aspect of collaboration amongst teams. In addition to all these other tools we've talked about, they had one of these legacy workload automation tools, and they needed to be more agile. In fact, what did he say? It was a quote, something like, "The cloud is agile and our automation needed to be too." What he meant by that was that all of their automation had to go through one centralized automation group, because it was an on-premises tool, and you couldn't make it so that your developers, your data people, and your cloud people could all come in and collaborate on these workflows. So fast-forward a year: they're using it, and it's not just the data pipeline they're orchestrating, they're orchestrating everything. They're actually orchestrating the legacy workload automation tool from our tool.
I mean, it's pretty crazy what you can do with it once you have it in. So, at the risk of doing a very brief commercial: listen, we have a platform that does this stuff. Managed data pipelines, which you can see at the very bottom of this circle, are a big piece of our set of pillars, of what we do. But as you look around this, we talked about just about all of these things today. You're event-driven, meaning you can be real-time. You have beautiful workflows you can build. There are a lot of self-service capabilities built in. We didn't talk about infrastructure-as-a-service automation today, but that's a big part of our cloud story, being able to orchestrate things like Teradata, sorry, Terraform, and Ansible and Jenkins and Chef and Puppet, or doing infrastructure on your own. So all that real-time, hybrid IT automation across your on-prem, cloud, and containerized microservices. That's us.

I saw a comment come in in the chat earlier, and I'll address it real quick because I happen to have a slide. The question was something along the lines of, how do you integrate? Well, we have a ton of pre-built integrations, and we put a lot of focus specifically on the data pipeline. Out on our website you can check it out; it's called the Integration Hub, and 99 percent of the integrations are free, with the exception of our SAP integration, which is much more detailed and deeply deployed than all the other integrations. You can come in and grab them; all the ones we showed today are out there. So I encourage you to check it out. We add new integrations daily, weekly, monthly; they're constantly coming out, and we have a whole team of people building them. One of the other things I wanted to mention is that, in addition to the ones we make, our customers make them too. We have a software development kit, templates, and documentation, and we have customers that are off building and deploying their own, either because they're not available through us, or they don't want to come to us to get them, or they just have the capability; they can do it fast, in hours, not days. So there's a really cool place you can go search around and see what's available, and if we don't have it on there right now, ask; we may have one in development, or in production with a customer, that we can just change, or we can build it in days.

So what do you want to look for, in summary, in a data pipeline orchestration solution? Number one is real-time data flow; this is the differentiator between Airflow and some of the other tools out there. You really want something that can get you to a real-time scenario. DataOps-enabled is critical, especially if you're building at scale; you don't want to push something out there that's a mess or doesn't work, and you want to be able to get the right people in there to test it and play around with it. Proactive monitoring and alerting. Built-in managed file transfer, I think, is an important one, especially in the data pipeline world. And just overall centralized control. The last thing I'll mention here is the lower left-hand corner, which is Kubernetes and Docker container technology. That is something a lot of our customers really care about, especially in the cloud world, and there are some vendors in this space doing data pipeline orchestration whose whole approach is built
around containers. Of course we can do containers, but we don't have to, either. So just know that's something you should be looking for, because if you haven't fully adopted it yet, your cloud world will in the near future. Now, with that said, I think we have some time for some Q&A, so I will pause and let Shannon take us back and ask some questions. Ravi and I will do our best to answer them; if we can't answer them here, we will get back to you afterwards.

I love it, thank you so much. There have been some questions coming in throughout the presentation in the chat here; if you have questions, feel free to submit them in the Q&A. I think we have time to get in at least one, if not two. Does the platform support jobs as code?

Yeah, I saw that one come through. So yes, absolutely, jobs as code. We didn't invent the term jobs as code, BMC did, but BMC is one of those giant companies, so we adopted the term. And just like with BMC, you would create it, if you want to, using Visual Studio, the SDK, or, I'm sorry, your development tools, and push it up to GitHub or whatever, and be able to move it around like that. So you can create all this stuff as code: you can do infrastructure as code, jobs as code. So yes is the answer.

I love it. We are right up at the top of the hour here, but to answer the most commonly asked question: just a reminder, I will be sending a follow-up email to all registrants by end of day Thursday with links to the slides, links to the recording, and a couple of other things requested throughout. Stonebranch also asked that we post a little survey on how they did, so feel free to fill that out; it's in the chat section there for y'all, and I'll put it in the follow-up email as well. And Scott and Ravi, I apologize we don't have much time for any more questions, but I'll get those over to you, and get the chat over to you, so you have the opportunity to answer.

Right, we'll be sure to follow up individually with the people whose questions we received. And we'd just like to say thanks for spending the time with us here at the bottom of the hour. This is an exciting one for us. We're loving the customers that are using it, they're loving us, and anybody who wants to take a look, just give us a call.

Love it. Again, thank you so much, thanks to Stonebranch for sponsoring today's event, and I hope you all have a great day. Thanks, everyone. Awesome, thanks. Thank you.