Live from New York, it's theCUBE, covering AWS Global Summit 2019. Brought to you by Amazon Web Services.

Welcome back. I'm Stu Miniman, with my co-host Corey Quinn, and we're here at the AWS New York City Summit. We're always happy when we have users on the program to tell their story, and joining us for the first time is Amit Hatter, the lead enterprise data science architect at AGCO, an agriculture company based down in Georgia. Thanks so much for joining us.

Thank you for having me.

All right, so agriculture, obviously, we understand in general. The joke I have for most people is, well, luckily your industry isn't going through much change. And of course, yeah, that's the response we get from most. But give us a thumbnail of AGCO: how long the company has been around, its focus, and some of the changes that you're seeing in the industry.

Sure, so let me start with AGCO. AGCO is about a $9.4 billion agricultural tech equipment manufacturer. We've been around for 20-plus years, and we are well known in the industry through some of our famous brands like Valtra, Fendt, and Massey Ferguson. Coming back to your other point, about the industry not going through a lot of change: I get that very often, and you know what? It was an eye-opener when I joined AGCO. The farming industry is actually going through a lot of change. You must have heard of agtech. Farmers now want better, more efficient solutions that will help them manage their farms while they focus on their core work of farming. And they're looking to companies that manufacture agricultural equipment to provide that digital support, to provide solutions that help them manage their farms better, do predictive maintenance better, optimize the equipment, and so on. And that's where we are trying to help them out.

Yeah, so it's always easy to look at any industry and say, oh, they have it easy and it's not changing that much. You've got data science in your title.
Talk a little bit about your role inside the company. We know how important data is to most companies, but of course, as a data scientist, it's your job to help unlock that power.

Sure, definitely. Let me give you a little bit of background that will help frame this much better. AGCO realized the power of data a while ago, but very recently they started working on this through something called the digital customer experience program. What that does is create a set of connected solutions that manage the data of our customers, our dealers, our parts, and our machines in a fast, reliable, and secure manner. All these digital solutions that we are creating are powered by analytics, to leverage new market insights, to unlock new opportunities, and to help us understand our customers better. Given that space, I helped design AGCO's data science vision. That involves, first of all, setting up a data science platform that enables us to maximize the use of the data we have. Secondly, working with our business to identify analytics use cases that could be part of the product roadmap, building them out, and then executing them on the data science platform. And thirdly, from the point of view of architecture, understanding what goes into the design, making sure everything is state of the art, helping with the design documentation, and making sure that we stay right at the top in terms of agriculture and in terms of data science, pushing the boundaries in all the products.

One of the, I guess, hidden secrets of data science across the board is the sheer amount of time and effort that has to be put into data normalization before you can start getting useful information out of it. Was that a significant concern given what you do, or, given the fact that you more or less control the entire thing, can you reformulate the data as it's ingested?

That was a very valid concern.
I mean, what most people don't talk about is the quality of data. They only talk about data science or the fancy things. We had the same challenges. Our data was distributed in different places, had different formats, and had different levels of cleanliness. So what I did, while building the data science platform, was recognize this challenge proactively and make sure that we do cleanse the data. We normalize it to a format that's usable for our use cases, but we don't do it all at once. We go use case by use case. We identify business priorities, we cleanse the data, we normalize it, and we bring it to a format that can be used going forward, and we do that with every use case. So over time the majority of it will be normalized, but that will take an incremental, gradual effort.

All right. Amit, bring us into the role of cloud in your environment.

Sure. So cloud is a very important component. Historically, we were more of an on-premises organization, and when we went to the cloud, it was a very important change. More so from this point of view: if you think about it, for a company to transform itself into a software organization in terms of data science, you need a lot of accelerators. You need data scientists. You need infrastructure. You need data engineers. And you need people to manage all of this. And hiring all that talent takes time. But what cloud does, with the ability to procure services on demand, and services that are fully managed or serverless, is allow you to overcome a lot of those barriers quickly, so you have time to actually build the solutions on top of the cloud. Over time, when we understand our processes and our demands better, we can think about where it makes sense to go hybrid. But cloud is that great accelerator that allowed us to set up this data analytics platform, which we did in roughly 15 weeks.
Before that, I was working in another organization where we did this on-premises, and I can tell you, it took at least three times as long, if not more. So I think that's the real value of cloud. Apart from all its machine learning services and everything, it helped us accelerate our processes.

In the workflow that you wind up going through, how close is the data that you're generating to the cloud? Are you doing this at the edge? Are you doing this in the field in some cases? Or, I guess, where is this data entering your pipeline?

Sure, so there are different forms of data that we have. We have a lot of customer-related data, which is more or less slow-moving data that we have within the organization. That makes up a major bulk of the data. Apart from that, we have data coming from machines, the smart machines operating in the field; that data comes through satellites to our servers. We also have data that comes from the edge, from some of the machinery operating in the factory. Among all these different data sources, I would say the initial focus, the pilot focus, is to start with the data that we have in abundance. That's essentially our customer data and our dealer data, to be able to understand that better and derive new market insights. But our focus going forward is to get data from these machines and combine it with soil data, farming data, and agronomy data, to deliver things like precise planting schedules, predictive maintenance of machines as they operate out in the field, and value-driven care. So those are things that we are hoping to do as well.

You mentioned machine learning. Where are you along your journey with ML, AI, and the like?

So that's a really good question.
At AGCO as a whole, I think we are at different stages in different parts of the organization. A lot of the organization is focused on generating value through descriptive and exploratory analytics, whereby we explore the data, find insights, and then make decisions on top of them. We moved into the area of predictive analytics fairly recently, about a year or so ago, and that is essentially our next step. So we went into predictive analytics. We are creating machine learning models. We are creating combined, stacked models. We are using services like SageMaker on the cloud. We are using Spark ML libraries. We are using scikit-learn. We are using R, all of that to create predictive analytics solutions. In terms of the technology that we use right now, it's pretty much state of the art. We have created our own model management engines. We use what Amazon provides and supplement it with what we have. So we are pretty much state of the art in terms of what we are currently doing, and we are hoping to take that and apply it to large parts of the organization.

So as you look at, I guess, some of the higher-level differentiated services coming down in the world of machine learning, do you find that a lot of what you're doing today is, in a few years, going to be handled automatically, so you're able to focus on the more interesting parts of the work? Or is there really no end in sight for, I guess, some of the current blocking and tackling that a lot of data scientists are struggling with today?

I'm sorry, I couldn't hear part of that.

No, my apologies. As you see things continuing to evolve in this space, are you predicting that there are going to be more, I guess, higher-level services that solve some of this problem for you? Or does a lot of the blocking and tackling not really have a relief point in sight?

That's a very good question. I get that very often. I would like to say the answer is that it depends, but I'll elaborate. There are some parts of machine learning and AI that I think will be solved by newer services, by technology going forward. I'll give you a concrete example. SageMaker is a fairly recent offering by Amazon, and about a year ago, when we started using it, it didn't have a lot of the components that it currently has. We had to build a lot of those components ourselves to get towards something called model management. Now, we built all of that, but lo and behold, after re:Invent, they actually added a lot of these. So over time, technology will take care of a lot of the things you currently do yourself, through smart automation. Now, smart automation can take care of a lot of things. It helps you identify when you need to retrain a model. It helps you deploy your model. It helps you identify the trigger points. But where I think the challenge will come is in how to actually apply analytics to the business, because that needs a lot of context. For that, you need to understand where the specific pain points are. Where do you actually apply it? Does it make sense to use a prioritization model? Does it make sense to use an exploratory model? Does it make sense to use an attribution model? Helping define that use case in the beginning, going from a business landscape to a specific problem that you want to solve, is the part that I think will take some time and can't be readily addressed by these technologies. But for everything down the line, I fully expect that in a few years all of that will be available.

Amit, are you speaking here at the conference?

No, I actually spoke at the keynote at the summit in Atlanta.

Great. Tell us a little bit about what you get out of coming to some of the regional summits here from Amazon.

Yeah, definitely. I get a lot out of it.
The biggest thing is, I get to know the different things that are happening in the industry from the point of view of business. So not just the technologies, right? There are lots of different technologies coming out, but how are people using them? How do they make an impact on their business? Because for me, the intersection of technology and business is the key point. So coming to these regional summits, where the different business partners come in and describe their work, and connecting with them, that for me is the main draw. Apart from that, there's the other piece, which is that you get to know about the different things being done in this space. For example, if you go to an AWS summit, you get to know everything that is coming to the cloud, and you can try it and experiment with it, and you can basically create a nice ecosystem. If you go to an Azure summit, you get something similar. So the state of the art is also important, but the more important draw is that intersection.

And I guess just one follow-up on that. In the data science community, what are some of your best sources for learning and sharing today?

That's a very good question. Data science is one of those fields that has two parts to it. I mean, now there are machine learning engineers too. But one part is the technical part, being able to create these models with pinpoint accuracy. And the second is application. For the first part, learning about creating these models, the best source is self-learning. When I did my PhD, I learned a lot of this, and now I go through a lot of articles when new things come out. As for what the different sources are, there are lots of them. For the second part, application, I have found the best source of learning is actually interacting with people who use these technologies.
Interacting with people who, let's say, have no experience of data science but have experience of the business, and then working with them to understand how you can take the insight that's created out of a model and bring it into the business. For that, there's no substitute for just talking to people, understanding their pain points, and then solving those.

All right, well, Amit, thank you so much for giving us the update on AGCO and your role inside. Thanks so much for joining us.

Thank you.

All right, for Corey Quinn, I'm Stu Miniman. We'll be back with more coverage here from AWS's New York City Summit. Thanks as always for watching theCUBE.