Hello everyone and welcome back to the attic. Apologies for that slight delay; it had to do only with us, with our production team, and nothing to do with our keynote speaker, who has been patiently waiting. We're so happy to have him here to talk to us about the foundations of data teams. A successful data project needs to be built on solid foundations. To tell us just how important this is and how to do it reliably, we have with us Jesse Anderson, managing director of Big Data Institute. Jesse, welcome. Welcome to Big Things Conference and apologies for the delay. Looking forward to listening to you. We'll catch up on those minutes. Whenever you're ready.
I'm ready. Thank you again. Thank you for the introduction. Thank you for taking the time to be with me. So let's try and make this more interactive. Let's try to use Zoom for what it's good for. I have the chat open. Tell me, are you a manager? Put a yes in the chat if you're a manager. If you're an individual contributor watching this, put an N. That will help me get to know who you are so that we can talk about this accordingly. So what we're going to talk about are the foundations of data teams. And here's our itinerary for this talk. We're going to be talking about houses. You may not be thinking about a house if you're at a big data conference, but we're going to be talking about that as a metaphor, and I'll explain what that metaphor means. Then we're going to talk about disclosure. What does it mean to disclose problems? Who's responsible for that? Then we'll talk about the data teams themselves. What does it look like when we do or don't have that data team? Finally, we'll talk about the implications. There are very specific implications for this, and I think it's really important that you know them. So just put that in the chat and I'll be able to see it. So sometimes what we want versus what we get is really implicit.
Sometimes we don't actually state this. Sometimes it's kind of implied: when somebody says, I want data, they don't specifically say, I need data in this particular system, and I need it in this format. They kind of say, well, you need to go figure that out. It's implicit in that. And very often when somebody says, I want data science, I want that cool data science or I want that big data, they don't specifically call out, I want this, this and this. It's implicit in that assumption. And so sometimes when we get that ask, maybe from our business, from whomever, what we're focused on is the exterior, the cosmetic, which you can see in that picture there on the right. As you look at that picture on the right, it seems like Italy to me. It just looks like Italy. And it's incredibly beautiful. The houses are all beautiful. But as we look at those beautiful houses, we don't think, oh my goodness, what kind of foundation do they have? I wonder who engineered and architected that. What we think is, that house looks beautiful. I'd love to live there. I'd love to have that view. But there's all kinds of things going on there. There are houses literally hanging on the hillside, so some level of engineering had to go into that. But we as consumers of that house focus on the exterior, focus on the cosmetic. So in this metaphor, what we're going to be focused on is houses. The houses are the value that we create as data engineers. So what kind of value do we create as data engineers? It's our house. And we'll use that metaphor throughout. And then there's our construction. How do we go about creating that value? We need to have that be constructed. We need to have something that somebody built, somebody engineered, somebody put effort into saying, this is how the house should be built.
And the responsibility for that construction is actually ours as data teams. We need to be doing that. So we'll be talking about what that construction means for us as data teams. A lot of times our focus is on the facade. As you look at this picture, you'll see a movie lot. If you've ever seen how movies are made, what happens is they will build what's called an exterior or a facade. Somebody will stand in front of, in this case, Pete's joke shop, and all you see on the camera is that window and that door. But could somebody actually live there? No, of course not. It's just a facade. There is literally nothing behind it. We're missing the complete rest of the structure. Oftentimes when we are working with data teams or creating our data teams, we're so focused on that facade, so focused on that model, that we forget the rest of the structure. This can be a huge problem, which we'll get more into later. It's a trap people commonly fall into, especially data scientists, and especially business leaders who've really bought into the notion that you only need data scientists. The reality is that we need all of the teams. We need all of the data teams. If we just have data science, that's just the facade. If we have the rest of our teams, then we're going to have the rest of the building. Somebody could actually live there. And this is that value being created. Other times, people will talk about foundations. They'll wonder, if we have a data team, do we just spin up that data team for three, four, six months, and then say we don't need them anymore? The real question they're asking is, do we need a foundation for these data teams? Or are they so transient that we don't need one? Here in this picture, we see what's called a yurt. A yurt is really interesting because it's a type of house that doesn't need a foundation. It was created by nomadic peoples.
What nomadic means is that they don't stay in one place forever. In fact, they specifically created their houses to be able to move. They could pick that house up and move it elsewhere. But is that really what we need with data teams? Well, a house doesn't really need a foundation if you don't need it to stand for long. But data teams do need to stand for a long time. They aren't a six month thing. They aren't a one year thing. From this point onwards, we are going to continue to have our data team. We are going to continue having that data team be a part of us. So what we have to do is make sure that our foundation is solid. We have to actually create a foundation. We aren't nomadic data teams moving from one thing to the next. We have to build a solid foundation for them. So we do need a foundation, because we want it to stand for a long time. Sometimes people will buy into this thought that, well, it's an easy renovation. I want data science and that's an easy thing. I'm just going to hire a data scientist, plop them in and say, okay, we're ready to go. We've got that data scientist. Well, that isn't really the case. As you can see in this picture, they're just doing something as simple as remodeling the outside of the building. But how many people did it take? There's all kinds of people. There's all kinds of scaffolding. So when we think about adding something like data science, this isn't a simple addition. It's not a simple, easy renovation. Adding these advanced analytics requires a lot. It oftentimes requires more than what managers have been told, more than what they've seen. So really what we need to do is see what actually has to happen. If we say, I want data science, this is what actually has to happen. This isn't just an easy renovation. So this brings us to the question of disclosure. What does disclosure mean?
Well, I know many of you are from around the world. Oftentimes, at least in the US, we have laws that say, if you're going to sell a house, you have to disclose any problems. You actually have to say, I'm selling my house and my house has a problem with the roof. I know about it. You should know about it too. However, sometimes sellers don't actually want to disclose everything. It makes the value of their house drop. They thought they could sell their house for 500,000 euros. Now they can only sell it for 400,000 euros because it's going to cost 100,000 euros just to fix that roof, for example, or it's going to cost that to fix the foundation. And oftentimes, depending on the country that you're in, this non-disclosure, this not telling of the problems, actually has legal and monetary implications. If somebody doesn't say, hey, there was this problem with the house, you can go back and sue them. You can say, you should have told me about these problems. Now I have to pay for them. And if you look at the image there on the right, this is the disclosure form for the state that I live in in the U.S. It actually says, here are the problems, and if you fail to disclose them, here are the consequences. So tying back to our data teams and our houses, who's responsible for this? Well, let's talk about that. Whose job is it to actually disclose problems on your teams, issues on the teams? You can go through that list. Is it a vendor's job? Is it management's job? Is it an individual contributor's job? Is it a book's job? Well, let's go through each one of those. Is it a vendor's job?
Do you remember how we were just talking about the issues with non-disclosure, where if sellers actually disclose and tell you, hey, there's this problem, it makes their jobs difficult, their house worth less, as it were? So what vendors will oftentimes do is tell you it's easy. They'll actually tell you the opposite. They'll say, hey, I don't want to make it sound like it's too difficult, so I'm going to tell you the opposite. And that's an issue. So you can't really trust the vendors to do that. It's going to make it so they can't sell. Is it up to management? Well, you're getting closer. Management needs to be able to say it, but there's often this issue that they don't know what they don't know. They don't actually know that there was a problem. They don't actually know that you do need all the data teams. There can be lots of issues like that. Is it up to your individual contributors? Is it up to your data scientists and data engineers to say, hey, I'm just one person. I actually need more people. I need a foundation here to build on. Or is it books? Well, there's all sorts of books out there on data science. Many are rah, rah, rah data science: everybody's going to do data science and here's all kinds of value you can get. They don't really tell you, here's the work that you have to do. So there are some missing pieces in whose job it is. I mentioned this a bit earlier, but sometimes you'll actually hear from vendors, and maybe you've even heard from vendors at this conference, and they say, ah, don't worry about it. It's easy. This particular technology, my technology that I'm selling, is easy. Everybody else's is difficult. Mine is easy because I'm able to do this, this, and this for you. Well, is it really easy? I would say some vendors are actually misleading in their disclosures. They're trying to sell you this house and they're going to say, hey, it's easy.
You don't need that foundation. You don't need those data engineers. It's all easy. But is that the reality? Is that actually true? I would say oftentimes those vendors are misleading in their disclosures. They're telling you something that isn't actually true. It is actually the opposite of what the industry sees, what we see in the field, and it's sad, quite frankly. So now let's talk about the data teams themselves. What do you need for data teams? Here we have a diagram of the three data teams we need. Sometimes when we talk to management, the managers will say, which one of these teams do I need? I just need one. I can just get by with data science, right? Or they've been told by somebody, or they've read something, or the focus of an entire book was that you just need data science. Well, here's the issue. You actually need all three. Each one of these teams is equally important, and each contributes a specific part of the value that a data team creates. If we don't do this, we're going to have a problem. So in the next few slides, we're going to talk about what actually happens if you're missing one of these teams. But these teams, data science, data engineering, and operations, form a triangle that is required. They all have to be there. So let's say we didn't have data science. What would happen? Well, let's say there is a foundation. As you can see in this picture, we have a foundation. It's more or less clean. I wouldn't say it's the cleanest thing, but no one could live there. Why can't anybody live in this place? Well, it's because there's no value being created. I would go so far as to say, if you're not going to hire data science, if you're not going to do data science, what is the real value of what you've done through all of your data engineering or through operations? Yes, you can get to some level of value with business intelligence. Definitely.
We've been getting some level of value with them. But the maximum, the highest and best usage, the highest and best value possible, is definitely when we have data science. They're kind of the cherry on top. If you think of that sundae, that ice cream that you're eating, well, that cherry on top gives you that thing where you say, oh, yes, we're generating incredible value. So here, when we lack data science, the business doesn't get enough value created, so nobody lives there. We have to have data science. But what happens when we don't have data engineers? That's a whole problem unto itself. As we can see in this picture, the lack of engineering in this house made it crumble. Maybe it was a large snowstorm; whatever happened, it was due to a lack of engineering. Likewise, in our data teams, when we don't have data engineers, the infrastructure and the data products are going to crumble because there is a lack of sound engineering. This is what always happens. This consistently happens. And we have different words for this. We have words such as technical debt. In my experience, and I've worked with a lot of teams around the world on this, when there are just data scientists and no data engineers, the amount of technical debt that is created is significant. It's this huge weight that sits there on the roof and makes it crumble. And for you as managers, what this looks like is that the team is far less able to actually be productive. They can't really do anything because they're always limited by that technical debt. You hear it in the meetings and the stand-ups, for example. They say, I would really love to do X, but I can't. Or, it's going to take too long. This is what it looks like when you don't have data engineers. You lack that sound, scalable infrastructure, and everything is always crumbling due to the technical debt. Likewise with operations: when we lack it, we can see what happens in the picture.
Hey, there's a foundation that hasn't crumbled. You could say that there's some amount of value being created. Somebody lived there at some time. I wouldn't say they're living there now, but somebody lived there at some point. Without this operational excellence, the business can't use these data products. If they're completely neglected, if they're messy, if they're unkempt, the business will never be able to trust them. I don't know if you've ever talked to a business person or a business user about a data product and had them say, it was so unusable, I had to create my own. I've talked to people who've had to do that. And it was because there was no operational excellence. The data was so dirty. The infrastructure was always down. They could never wait for it. They could never use it. They could never be confident that it was ready to be used. This is a lack of operational excellence, and it very specifically will inhibit you from actually getting some usage out of this. Without operational excellence, you can do all this incredible data engineering and build the most incredible models ever, but if it's not up and running, it can't generate value. You absolutely have to have operations too. As you can see now, you need all three of these. All three provide different levels of value. So what's the implication here? Maybe you're about to start a team. Maybe you've created a team already and you're starting to see some of the issues with your foundation. Well, guess what? It's always better to put the effort into a solid foundation initially. Always better. By doing this, we're able to get a solid foundation. As you can see in this picture, we're building a skyscraper here, and we should put the effort into this. As you can see, there's a lot of rebar. There's a lot of concrete going into this foundation.
They are putting a significant amount of time, money, and effort into this. And they're saying, if we get this foundation right, we will be able to build on it, and we'll be able to build a building that stays up forever, or for a long period of time. But if we don't, there are big problems. If we don't build that solid foundation, it is going to cost significantly more to go back and fix it later. Significantly more. And this is what that looks like. There is a building in San Francisco called the Millennium Tower. You may not have heard of it, but here's what happened. They had a problem with their foundation. Now, there are fingers pointing everywhere over whose fault it was, and that's interesting unto itself, the point being that they had a problem with their foundation. And this tower was not a cheap tower. This was $750 million US, a pretty significant amount of money. The actual apartments that were in there, penthouses, that sort of thing, were luxury penthouses. These were very, very expensive. And so a lot of wealthy people lived in this tower. Well, what happened? As you can see, I've taken the liberty of taking some screenshots of headlines about it, where they said, well, there's a $200 million lawsuit to go back and fix it, and to say, by not creating the solid foundation, you cost us $200 million. And oh, by the way, in addition to that $200 million, it's going to cost $100 million just to fix this foundation. It's a big problem. And you can see that from the pictures here. And oh, by the way, this picture is actually a picture of the Millennium Tower's foundation being created. So who knows what the actual problem was or whose fault it was. But suffice it to say, there was a problem with the foundation, and this problem was significant. So let's talk about this in terms of your teams. Well, would you rather create your team the right way? Would you rather put the right foundation in?
Or would you rather go back and spend $300 million to fix it? And this $300 million is probably on the low end. It's probably going to cost even more for them to fix this right. So those are some pretty significant problems. So you as a manager, or maybe you're a team lead, maybe you're a person on the team, an individual contributor: what you need to go back and talk about and think about is this solid foundation. You have to do it no matter what. As we talked about, we're not in yurts. We actually need a solid foundation. We need to put a significant amount of money and time into our foundation. And if we don't do that, if we don't get the right people and the right teams in place, we're going to have to go back and fix it later. And fixing it later when we're, for example, missing a data engineering team, and I've experienced this personally, first hand, doesn't mean we spend a week or two cleaning up some of the data scientists' code. It actually means we spend a year fixing data scientists' code, because there was no data engineering in place to say, hey, that infrastructure that you did, that code that you did, is not good. The way you wrote that, we need to do it this way instead. Or operationally: I've experienced that too. There's no operations in place. The business starts having trouble. The business starts losing money because you can't be confident in your infrastructure always being ready. So there are definite issues there. Likewise with data science. Well, data science is really that facade, that last piece that we need. So if we're missing that, it's going to be more difficult to add later. So I would say the two key initial ones are definitely data engineering and operations. So why would we want to do this? Let's say we have an existing team. What does this look like?
And so I wanted to create a visualization for you as managers, maybe as tech leads or what have you, to see what exactly it looks like, and perhaps even to explain what you're seeing right now, to put some words and visuals to what you've been seeing. So we have two lines. We have a red line and a blue line. The blue line is a team with a solid foundation, and the red line is a team lacking a foundation. What happens when teams lack a foundation is that they'll get to a certain point and they'll just crash. As you can see, they'll get to a certain point and crash. They'll work, work, work, work, get to a certain level, and then either the infrastructure dies or they're missing the right people, and they're never able to build on a solid foundation and leverage that. They're always going up and down, trying to get back to where they were. It's really sad. It's really, frankly, unfair to the teams that they're being set up for failure like this. However, if we look at that blue line, the team with a solid foundation, they will gradually go up and get to new levels, and then they'll kind of level out, go up to a new level of complexity, and level out again. Notice there's no going down. There's no crashing. It's because they built a solid foundation each time and they're able to build on that foundation. And as you can see, once the team gets into more and more complex things, by building on that solid foundation, they don't have this crash. And this is really what's key. If you are a company or a manager on this team and you're seeing this crashing consistently, it could be due to technical debt, where you say, I want to do this, and then they crash and there's no productivity. What you were told initially would take a month now takes six months. That's what a crash looks like.
Whereas when you do have these solid foundations and the right teams in place, you'll be able to build on top of that, build on to the next level, and the next level. And you will have a significantly more productive team as a direct result. So if it isn't clear already, it's always more important to have an ounce of prevention. You may have heard the saying, an ounce of prevention is worth a pound of cure. You may not know what an ounce is where you are, but maybe it's a milligram of prevention is worth a kilogram of cure, if you were to put it in metric terms. Well, in data teams, I say a milligram of prevention is worth hundreds of kilograms of cure. If you just put the effort in initially, you're going to save yourself so much on the back end, so much afterwards. And what I want you to take away from this is that it's always possible to fix your foundations. It's always possible to go back and say, we are going to fix this. It is going to be costly, and the longer you wait, the more expensive it's going to be, but it's always possible. So do take that away, those of you with existing teams: you can fix them, but the longer you wait, the more difficult it's going to be, and the more technical debt there will be. One thing to really take away, really to internalize, is that this is not purely a technical issue. That's what happens sometimes. People say it was an issue with Spark, it was an issue with Kafka. Well, there may be issues with that, but the key issue was both technical and organizational, and you have to fix them both at the same time. So this ounce of prevention is to set yourselves up organizationally so that you can leverage the technologies correctly. And that's really key. So although I haven't talked specifically here about how to set up organizationally, I just wrote a book called Data Teams.
And in it, I share more about what each team does, what each person on that team should be, what a data engineer is, what a data scientist should be, what an operations person should be. Perhaps the part I'm most proud of is that a good portion of the book is actually talking about how data teams interface with the business. This is part of how we create that value. Once we get that solid foundation, we need to be creating that house, a house that we can live in, so that the business can actually generate value with our data products. This is really key. Another thing I'm really proud of is that, for those of you with existing teams where you're not understanding what's happening, I have an entire chapter in there on what you need to do and how to fix that. So I would encourage you to read that. There's a website for it. It's called datateams.io. So with that, I'd like to thank you. I really appreciate you spending some time with me and attending this keynote session. Thank you.
Thank you so much, Jesse. Excellent. Very interesting. And you answered a lot of the questions. We have time for a couple of them. They ask you, do you need a chief analytics officer or chief data officer? Can we have both? Should we have both?
That's an interesting question. It's actually one I talk about in the book. Who should the data teams be under? To give an adjacent answer before I answer that question directly, one of the issues is that if you have your data science and your data engineering in separate parts of the org, and you have to go all the way up to the CEO to resolve any issues with resources, that's going to be a problem. So what companies are trying to do is centralize their data teams under the same chief level officer of the company. So to that question of CDO versus CAO, I haven't seen anybody have both. It seems like it's one or the other.
And it also seems to depend on how the company, or how the data teams, came about. Was it chiefly data science focused? If it's a really data science focused organization, I've found it's a chief analytics officer. If it's a really engineering heavy organization, I've found it's a CDO. But at the same time, that CDO or that CAO will be able to do the job. They may not be as strong on one part or the other, and they'll be relying on strong people, really good people on both sides, to lean on. If they aren't as good on the data science, they lean on the data science manager; if they aren't as good on the data engineering, they lean on the data engineering manager. I think what is really key for CDOs and chief analytics officers is to realize that. One thing that I would suggest is that I actually have an interview in the book with the chief analytics officer for Stitch Fix. We go through that and talk about how they have their team set up.
Excellent. We're going to have to read the book. I already read some reviews of the book on Twitter, and people just highlight all of it. Everything is so interesting. So congratulations on that. We have time for a very quick question, if you promise a quick answer, Jesse. They ask about the common reasons why data engineering teams fail. If you can pinpoint at least two or three of the most common mistakes, quickly, what would you say?
The first and foremost one is having DBAs as your data engineers. DBAs, in my definition, are not data engineers. Data engineers have to be software engineers. So that's one. Okay. What companies will do is take their data warehouse team and say, you're data engineers now, you're going to do that big data project. That's a recipe for failure. The other one is to not give the data engineers the resources to learn these new technologies.
These new technologies are difficult, no matter what your vendors say, and they do need the time and resources to learn.
Okay. Excellent. Well, thank you so much. I bet we can read a bit more in your book, Data Teams, that came out in September, last September, if I'm not mistaken. Actually, I think it came out on my birthday, if it was on the 23rd.
So I wrote it for you. You just didn't know it yet, and we didn't know each other yet.
Okay. Thank you so much. I recommend everybody get the book and obviously listen to all of your talks and follow you around. Very interesting. Thank you so much for pointing out that we need data scientists, data engineers, and operations mainly. So, Jesse Anderson, thank you so much for being with us today in this edition, the ninth edition of Big Things Conference, and we hope to see you very soon. Stay around in the attic for the last of our keynote speakers today in a minute.