Hello and welcome back to theCUBE, where we're now over in our very mobile and agile press room set right at the moment. So excited about this next guest, got to meet him yesterday and hear him talk about his stories and his journey with Databricks. I think it's really very relatable, even though the industry may not be the same as the industry that you're in. We've got Alexander Booth, who's with the Texas Rangers out of Major League Baseball. You may know them. Again, my beloved Red Sox are kind of at the basement of the AL East. You guys are at the top. Just such a great story about how data is really changing all of these athletics in particular. And you're on the baseball operations side, right? Yeah, that is correct. I work in baseball operations for the Texas Rangers. And so you're not dealing with the fan experience and stuff like that. You're more, hey, what are the guys doing, the players, and how are we getting them better, and things of that nature? Yep, we're working with the players. We're looking for how we can develop them, how we can make them better. We work with the amateur draft: which amateur high school and college players do we acquire? We have the trade deadline coming up. How can we maximize the team and really put together a World Series caliber roster? It's surprising they let you out to come over here with the trade deadline coming up in a couple weeks. I know, if this conference was a couple weeks later, I don't know. July's our busiest month of the year. Draft, too. The draft is the same month as the trade deadline now. Yeah, and really, baseball gets real when the weather gets hot and everything like that. And it was funny, I think one of the comments you had during the press event yesterday was around climatology and weather forecasting and checking all of that. Yeah, we know that certain stadiums have played differently because of weather. So Denver, Colorado, the Mile High City, right?
Balls fly further, there's always more offense, more likelihood of hitting a home run. But we have never really had that at the batted ball level until now. So that's on the individual hit, the individual pitch: how does the wind, how does the air movement affect the trajectory of the ball? So it's a crazy new data source. Yeah, I think it's awesome to see. I think it's changing how people approach it. Some people love it. Some people are like, hey, it's too much. But I think that you can never have too much data for these teams and things like that. But you were saying, and I think we were talking about this, it's not like they necessarily have this data at their fingertips inside the game unless it's preloaded or something like that. Right, during the game, we do provide reports that are looked at before the game. So if you actually watch a broadcast, you may see a player take off his hat and check the inside of his hat, or the catcher check his wrist or arm band. Those are preloaded with reports that we've generated using our data and visualizations. So the hat one is the defensive positioning. They look at where they should be standing in the outfield for every batter. And then the catcher's arm band is very complicated, but it basically helps them communicate to the pitcher that, like, in a two-strike count, we should throw a breaking ball or something like that. That's pretty amazing. I did wonder, I figured it was something defensive on the cap, and it's always interesting. And now, even when they're on the live broadcast and they're talking to somebody who's in the outfield during the game, and he's going like this, I'm like, okay, he's checking where he must be for this person. So there must just be tons of data that you have to bring in. Right, and the coaches do have this in the dugout too.
One of my claims to fame is when they do like a little pan of the dugout and the pitching coach is holding a report that I helped make. I'm like, ah, I made that. He's using it, it was just great to see. You know it's getting actually used. You want your work product to be used. Exactly. I think that's always great. And you've been building this on top of Databricks. You guys have transitioned over the years and how did you really get started with Databricks? Absolutely, over the last couple of years there's been an explosion of new big data sources in baseball. So I already mentioned the weather data. Every five seconds we get the wind, the humidity, the temperature, and then we model kind of the stadiums now in almost like a fluid dynamics way to see kind of how the ball moves through the air. But the other big data source we've now acquired is biomechanical data. So we have pose tracking data coming from markerless motion capture systems that track the movements of the body for all the players. So every pitch that gets thrown and every swing that gets made, we can see how the players' arms are moving, how their legs are moving, how their hips are moving. Okay, really tracking their head, shoulders, and knees, and toes. Yes, I like that saying that you used yesterday. And I think it's really interesting because again, with the trade deadline coming up, with all of the draft later in the year, you were talking about how there's a lot of unstructured data that you're turning into structured data. And that seems to be a big theme for you as well. Yeah, absolutely. So these large data sources, these large unstructured data sources of kind of the biomechanics data, of the wind data, that has to be transformed into an actionable insight where we need to make that data available so our players and coaches are able to get meaning from it. 
And so we found that our other solutions weren't working with this new data. Databricks came in, and when we did our POC with them, we worked on how we could scale up these processing models, really using Databricks for that transformation layer. It was just a natural fit. You guys have used other things before, other solutions, as well? Yeah, so this is also where it's like, even though we work in baseball, we have the same data problems as everybody else. So we started with an on-prem system, and then that on-prem system couldn't scale. We moved to the cloud. We had some issues with our traditional data warehousing: data replication issues, governance issues. Now our team has doubled in size and our data has only gotten bigger and bigger and bigger, and we naturally just moved to Databricks to facilitate all of that. I would assume governance is a big piece of it as well, and within the organization, who can see what, when. Yeah, absolutely. So governance, we take that in many ways at the Rangers. Even though the Rangers are headquartered in Texas, we have clubs in Arizona and North Carolina and the Dominican Republic. Our minor league teams are all there, and we have analysts across all those teams as well. And then, as you said, we need governance for the players' and coaches' access, as well as our apprentices and our senior executives: who should be able to see what data? And as we communicate with scouts too, they need to see certain metrics but maybe not all the metrics that we have in the warehouse. Has MLB come down and said here's how you use AI yet, or are they still trying to figure it out as well? Yeah, not quite yet, but I know that they're definitely interested in the product and in trying to keep all the teams on an equal playing field.
But I think that being cutting edge with AI, adopting these new machine learning models, exploring use cases for LLMs, being at the lead of that, I think, will be a competitive advantage for us against these other teams. Yeah, no trash cans needed or anything like that. Not that I had to take a shot at the folks down the street from you guys. But when you start to look at it, I think that all of this data coming together just enables your team. And again, I think a lot of teams are doing this, I would assume, and trying this, maybe not in the same ways that you guys are doing it, but it goes all the way back to Billy Beane and Moneyball, when he was doing it more with spreadsheet-oriented things, which is crazy to think about back then. Absolutely, but that's what this data is. It's the same idea as Moneyball and Billy Beane. We want to find new ways of analyzing players to find a market inefficiency, to be able to construct a winning lineup first. And so all this data that we're trying to make available is going to cause that disruption. The wind data, the biomechanics data: someone, some team, maybe us, is going to figure out how to make metrics from that, metrics that can be used to evaluate players and performance and optimize those inefficiencies. So are you using a whole bunch of the different products, the Lakehouse products and data warehousing products from Databricks? Yeah, so in Databricks we utilize Delta Live Tables, and Unity Catalog for all of our governance. We've sped up a lot of our workloads with Photon, and we're pretty heavy users of Auto Loader too. Okay, what was the most exciting thing you heard about today in the keynote? Yeah, there's a lot. I think Lakehouse Apps are really cool. I think being able to use these startups, get that integrated, and try out new things is going to be huge.
And of course Delta Lake 3.0 with all those new additions. We do have some applications that utilize things with Iceberg, so being able to just reference Delta Lake objects is going to be really cool too. Yeah, and it seemed like the way they're going, with being able to be the control plane over that data, would be interesting as well. Yeah, absolutely. And this is also only on the engineering side. We haven't even touched too much on some of the machine learning models we're using on Databricks and some of the new stuff with MLflow. I think, I don't know if they announced it, MLflow 3.0 today. They talked about the MLflow extensions that they were doing, and I'm trying to remember back, there were two that they were open sourcing and bringing out. We travel around to a lot of these different conferences, and we were up in Vancouver most recently at the Open Source Summit. Oh nice. And, you know, Spark's there, Trino, Presto, all of the data mesh folks, and then all the Linux folks. And I think, you know, our whole thing is open has won and will continue to win. For me, sitting back in my analyst chair and also thinking of way back in the day when I was in IT in an organization, I was in financial services, I would want to be using my own models or be able to bring an open source model in, versus using a third-party provider's model. No, yeah, we love open source. That's one of our new tenets for our modern data strategy: invest in open source as much as we can. So it's just great seeing that community. We've submitted bugs and the community has fixed them for us in a couple of different products, which is exciting to hear. So yeah, I truly agree. I think open has won. And Databricks investing in open source and contributing to open source, the amount that they do, has been awesome to see in our partnership.
Yeah, I mean, I think for me, it's almost mandatory at this point that you have some component of open source that you're contributing back to. But I also think it's even more important in AI because of transparency. Right, absolutely. And that's also an interesting segue into some of the machine learning models and communication that we have at the Rangers. Transparency is huge. Players and coaches want to know where this data is coming from. They want to know where these predictions are coming from. So transparency in the technology and transparency in the models are really important tenets too. Well, it sounds like you have a lot of fun dealing with the data and all the kinds of data that you have. I really appreciate you coming on. And I really thank you for coming here and bringing this knowledge, because I think it'll be very helpful for people to see that your industry is not too unlike their industry. Just different data. Honestly, it's been absolutely my pleasure. But yeah, at every single game of baseball now, you can look up and spot those cameras around the stadium that are doing all that tracking, and then think about the player movements maybe a little bit more. Everything is data driven now, and baseball is no exception. Absolutely. And maybe we'll get out of the cellar and see you guys in the playoffs or something like that. We'll see. We'll see you at Fenway. Yeah, if you're ever up there, let me know, and we'll stop by for a beer. But I want to thank you, Alexander, for coming on and really helping us again. I think it always helps when you put things in that context to really shift how you're going to watch the game. It's also about the data behind the game. So thank you also for watching, and we'll be back shortly. This is theCUBE from Databricks Data and AI Summit 2023.