Hi there, everybody. Welcome to today's event. This one is at 6 p.m. Pacific, so we're running a little late, but today's session is about how focusing on the developer and partnering with the Linux Foundation helps Delta Lake build a flourishing community. So, a little bit about ourselves. Oh, there we go. Hi there. My name is Denny Lee. I'm a staff developer, oh, senior staff developer (I forgot to update that) for Databricks developer relations. I'm a contributor to Apache Spark and MLflow, as well as a Delta Lake maintainer. Previous to that, I was a principal program manager at Microsoft for Azure Cosmos DB, Project Isotope (which is known as HDInsight), SQL Server, and Bing. Before that, senior director of data science engineering at SAP Concur, and yeah, those are the most important aspects for myself. And now, Carly.
Hi, everyone. My name is Carly Akerly, and I am a marketing and communication specialist at the Linux Foundation. I have been with the LF for a little over four years, and I've worked with a variety of open source projects. Currently, I work with MLflow and Delta Lake. I specialize in social media management, content creation, email, and events. I'm also a graduate of the University of Dayton with a degree in marketing and business administration.
Perfect. So if you have any questions for this wonderful crowd here, by all means, please ask. If you are not familiar with Delta Lake, Delta Lake is an open source storage framework that enables building a Lakehouse architecture with compute engines, including Spark, PrestoDB, Flink, Trino, and Hive, and APIs for Java, Scala, Spark, Rust, Ruby, and Python. Say that three times fast. It is the foundation of your Lakehouse. For those who are not familiar with the term, yes, I'll freely admit that it's a marketing term, but the key context is that, in the age of transitioning from databases to data lakes, the reality is that there are pros and cons to both approaches.
And a Lakehouse takes the best of these two worlds: the manageability, simplicity, and transactional reliability of databases with the flexibility and scalability of a data lake, hence the concept of a Lakehouse. And so how did Delta get here, before we talk about the growth of the community? Just to provide a little context. So before we even had Delta Lake, in 2017, the good was that at Databricks we had these awesome distributed Spark frameworks that were processing lots and lots of data, so that was really, really good. We were super, super happy about that. The bad, and in reality the really ugly, was that files do not equal a database. There were a lot of transactions that failed. Corruption of the data. Data quality was a problem. There was no schema enforcement. When you added the cloud, there was a whole new set of complexities. Loading large tables became extremely, extremely slow. So that was the state in 2017. And during Spark and AI Summit 2017, we were going, man, there's got to be a better way. And so in 2018, we announced Databricks Delta. Cool project. A fully transactional storage system that preserves the best of the cloud, the flexibility of the cloud, but also provides reliability for the data. And then we had Dominique Brezinski from Apple go ahead and present on stage how his system was processing petabytes of data. So really cool. Battle tested by hundreds of customers. We were super happy, but we actually had to open source Delta. Why? Because there was a massive outpouring of users from the community saying, we need to have this project not just work on Spark, but work on other projects. So in 2019, we open sourced Delta Lake. We open sourced the protocol, we open sourced the features, and we ensured that every single system that went out there was battle tested. So yay. So we're all happy now, right? But we actually had to build a community. Even though we had these really cool numbers, these are the numbers that we have now.
Okay. The reality is we had to build a community to ensure Delta Lake would actually be successful whether Databricks was involved or not. Okay. So right now these are the numbers from Databricks, and these are last year's numbers. Okay. So 1.7 exabytes of data processed a day. Not stored, processed. Okay. 7,000 customers in production. And because of the work that Carly, myself, and the rest of the Delta Lake community did together, we were able to see a 663% increase in contributor strength. That's a Linux Foundation metric that comes directly from LFX Insights, which I believe Carly will talk about a little bit later. So this is really cool. But how did we get here? And that's actually the purpose of this particular conversation today. Okay. And part of it starts with the organizations. We have lots of organizations that contribute directly to Delta Lake. This is just a small subset of some of the logos that are involved. That's cool. But how did we get here? And that's the crux of today's conversation, which is that when you build open source communities, you build them one developer at a time. That's the most crucial aspect. This idea that you can simply go ahead and say, I build an open source project, I drop it into GitHub, and everything magically works well by itself, is a fallacy. You actually have to build up relationships with developers one person at a time. And so we're going to talk about these key concepts. And so I'll just jump right into it. To start, like I said, it's all about the developers. The reason I like calling this particular concept out is because it's crucial for everyone to understand that when you're working with users or contributors to your project, it doesn't matter if it's open source or not, they're actually taking a career bet on your project. If they're taking a career bet, that means their own career status is based on the success of the project, and also on the project's ability to boost them.
Unless you remember that basic tenet, the reality is the project and the people who work on that project will not succeed. And so that's what we've got to go do. And so it goes into this idea that developers are the heart of that community. This is not just in the domain of open source. A lot of the practices that I've talked about today, in fact, we practiced back when I was with SQL Server, and SQL Server is from Microsoft, which was not really known, especially at that time, for being an open source advocate. Doesn't matter, because the whole premise is that we actually cared about the individual developer. And so now, when we switch back to open source, why do these developers care about contributing? Well, part of it is just things like communication skills, right? Writing skills are extremely important to begin with. And guess what? Engineers, how do they communicate with each other? They write, okay? If they don't know how to write, guess what? They can't actually get their point across, especially when they're across multiple time zones. And so Amazon's famous for that writing culture. And basically it's a requirement for remote teams to be extremely effective. And so that's why we call that out. You actually have to ensure your communication skills are strong. You don't have to be the most verbose. You don't have to be the most salesy. But you have to ensure that you know how to communicate. Maybe you don't want to speak. That's fine. But you have to know how to write. All right. Another important aspect is collaboration skills. We actually are working together with other people across multiple time zones. It's hard when people are in different time zones to work together. Really, really hard, in fact. So that's why the writing skills are so important, because that way we can communicate with each other asynchronously. In other words, somebody is writing something at 9 a.m. in China. Then somebody is reading it at 9 a.m. Pacific.
Guess what? Unless they're writing it down cleanly, they're not going to understand each other. And what's even more brutal than that, they often are working with not just different organizations, but potentially competing organizations. They're actually fighting with each other. But guess what? When it comes to these open source projects, they're still going to find ways to collaborate. If you know how to actually work together in these types of communities where technically the companies are competing, guess what? You can probably do a really good job in whatever company you're in. So another important aspect, which is very common not just in engineering but in business generally, is technical stewardship and mentorship. The idea that you're going to go ahead and actually help individuals out, help them become better writers, better engineers, better communicators. Doesn't matter. The idea is that you're actively helping each other out. This is a super crucial aspect when it comes to building these types of communities and building these types of projects. Unless we're helping each other out, what happens if you have a developer who says, hey, this project is pretty cool? They go ahead and ask a question. Crickets. Nobody's answering anything. What happens when you do that? They feel like, oh, the community is not going to help me. Well, then I don't feel like learning. I don't feel like helping that project because nobody's going to help me out. So this is why technical stewardship and mentorship are so important. And then, especially in this day and age, even before the current layoffs this season that we're seeing with a lot of different technical companies, the reality is that the most important technical resume, at least from an engineering perspective, is not what you write in LinkedIn. Now, that's important. So please do that. I'm not trying to tell you not to do that.
The most important one is that one right there, which is your technical resume: your GitHub repo. How active are you? How much stuff are you doing? Carly's going to talk a little bit more about some of the tools that companies use to understand and assess some of these things as well. But this is just one quick screenshot saying, how green, i.e., how active, are you in contributing to projects? If that thing isn't green all over the place, automatically, most companies are going to say, oh, you're not actually writing any code. Guess what? You're not an engineer. Forget it. We're done. Okay. And so this leads me into this concept of branding for thee before branding for me. Okay. So it seems like I'm trying to be Shakespearean, and maybe I am. But the context is, when it comes to working within the realm of these developer communities, the job of the stewards of the community is to boost the other individuals in that community versus boosting themselves. That's actually their job. So for example, I'm a Delta Lake maintainer. My job is not to boost me. My job is to boost the other people who are contributing, whether they're maintainers or not, to boost them so they're better known. Guess what? Why? Why do I care about that? Remember how I started off? It's about their career prospects. They're betting their career on your project. If they're betting their career on your project, and I'm boosting them, then they're likely to have better career prospects. So guess what? That's my job. My job is to actually help them. By me boosting them, that actually helps boost me. So I still achieve my goal, because I like being paid too. But the context is the job is branding for them before branding for myself. That's the context. So like I said, I like to talk about Shingo Okawa. He's a relatively new contributor to the Delta community. I want to say this was like from four weeks ago, in fact. But guess what?
He produced a Delta Sharing project called Kotosiro. Pretty sweet. It's a really cool project, which is basically the Delta Sharing protocol built in Rust. Awesome stuff. So as soon as he did that, what was the first thing that Carly, myself, and the other Delta maintainers did? We boosted him. We made sure he was blogged. We made sure he was socialized. We made sure other people were aware of his project so people could go and start contributing to that project, or looking at the project, or downloading the project, which is important because now he's looking at the Delta Lake community as a way to boost him, which is super important. Some of these can be long-term relationships. I like pointing these guys out, Jeff Freeman and Robert Thompson, because they're buddies of mine. Simple as that. How long have I known them? I've known them since around 2000. Basically back, I think, when we were all at Bing. They went down a completely different path than I did. I went down the Spark path. They went down SQL Server. We worked on Analysis Services cubes together back in 2000, yet right now in 2023, they're at T-Mobile, I'm at Databricks, and guess what we're talking about? Delta Lake. Pretty sweet, because that's the whole purpose of developer relations. That's the whole purpose of building communities. These are long-standing relationships. These aren't one-time things. These are things in which, 23 years later, we're still working together. Completely different companies; doesn't matter. Or across different companies. Christina Taylor, she's awesome. I cannot overemphasize how cool of a data engineer she is. I've known her since Disney, Bread Finance, Carvana, and now Catalyst Software. She's worked across multiple companies. She's advocated for Delta Lake. She's advocated for Databricks. Guess what?
She is producing tons of great content and socializing and advocating for Delta Lake and Databricks across multiple different companies, because we went through the process of helping boost her right from the beginning. So it pays dividends when you focus on boosting other people in the community. And now I'm going to switch to Carly. She can go into this section.
Yeah. Awesome. Thank you so much, Denny. That was great. So as Denny mentioned, a lot of people are taking a career bet when they agree to work on an open source project. And the open source community is comprised of so many diverse individuals that need to work together, as he mentioned, across time zones, across companies, but everyone's bringing something really unique to the table. So I'd like to highlight just a few of our contributors and maintainers on Delta Lake. So we have R. Tyler Croy, who actually codes live on Twitch, which is awesome. People get to learn live and ask questions when he's working on Delta Rust. He produces some awesome blog content. He's really active in the community and very involved, whether it be community office hours or D3L2 sessions. He's someone that you can really ask any questions when he's there. We also have QP Hou, who might surprise you, because he literally built the Delta Rust protocol in two weekends. He is incredibly intelligent and brings a lot to the table on the Rust side. He's currently a software lead at Neuralink. And he also has some really great content that we're able to boost on social and email to help educate other community members on things that they may not know about within the Delta Rust project. Also, Christian Williams, who was at Scribd, built Kafka Delta Ingest to ingest Kafka topics into Delta Lake, and has presented at Data and AI Summit, educating a lot of people on Rust as well. Gerard is also excellent. And this is an example of a community highlight that we do.
We like to feature different community members on the delta.io website, through social, and all that good stuff. So usually once a month, sometimes once every two months, we feature a Delta Lake contributor and give them a lot of recognition on our website. As you can see here, we also pull quotes on how they like to work with Delta Lake and the open source community. Another way that we really like to boost and collaborate is through in-person events, whether those be meetups or just getting together at actual summits. It really builds a lot of camaraderie among the community members. We get to celebrate wins and milestones, and we can also share different ideas of what everyone's working on. That seems to be an awesome way for people to really connect with who's working on the project. But when everyone's coding and developing, who is putting on the events and working on the social media? That's a huge part of what the Linux Foundation and Delta Lake work on together: when they're working on the technical items, we're able to boost that through email and social media. So one of the tools that we use is LFX Insights. As you can see here, we have two different charts pulled. One's contributor strength. The other one's total commits across all of the repositories. So for GitHub contributors, we sync all of their data to LFX Insights and we're able to view the activity in one single view. We also like to let the community know what we're working on and how we're doing. So we send out a quarterly contribution report to the entire community with key updates, releases, and events, and they're able to get a good glimpse of what we're working on. And to talk more about tools, Delta Lake uses a variety of their own and also some from the Linux Foundation. So when we're working with virtual events, we use Bevy. For email campaigns, we're using HubSpot. A great community management tool is Common Room.
We're able to look at who's really engaging with Delta Lake on social media across all channels, whether that be Reddit, LinkedIn, or Twitter. We also have Sprout Social, which is where we get most of our social media scheduling and metrics from. And speaking of social media, this is definitely a big success for Delta Lake in the last year. We ended up growing our LinkedIn to over 20,000 followers, which was a really awesome milestone. As you can see here, I pulled metrics from the last year. We're getting about 92,000 link clicks and about 146,000 engagements. And you can see in this chart over time the ebbs and flows our audience has gone through. You'll notice some peaks around summits and conferences and some dips in holiday times. But we also do a lot on email. We distribute releases via email. We have newsletters and a lot of different event notifications. Denny has some really awesome sessions, like his Ask Us Anything sessions, which we live stream to LinkedIn, Zoom, and YouTube. We also have Delta Lake Discussions with Denny Lee, in which Denny interviews thought leaders in the space. And community members, everyone on our LinkedIn, are able to ask questions live. And the Linux Foundation team is monitoring and doing the event logistics on the back end. So as you can see here, this is our Bevy page with the Linux Foundation. We have all of our upcoming events and all of our past events. So if anyone ever wants to check out different discussions we've had, they can always visit our Bevy. And it's also featured on our YouTube. And there's a variety of ways to join the community. So I'll pass it over to Denny.
Thank you very much, Carly. So one of the reasons Carly and I together decided that we wanted to talk about this is that I started off with this concept of saying it's all about the developer, all about the developer, all about the developer. And developer relations means I'm having an engineer-to-engineer conversation.
You'll notice that in order for us to boost it, though, there's a lot of logistics, a lot of events, lots of social that has to be done on the back end. And that's what I mean by the partnership with the Linux Foundation: we're working closely with Carly so she can go ahead and figure out, what are the right programs? What are the right mechanisms, so we can ensure that all the work that each of the individual community members is doing is actually getting boosted, is actually getting recognized, that we go out of our way to recognize these users? So for example, you go to delta.io or GitHub or Slack or Twitter. That's great. Come on board. But the reality also is that with each one of these avenues, we're giving opportunities. We ensure we're giving opportunities to each and every one of the Delta Lake community, not just the maintainers. I mean, not just the contributors, even just users in general, letting them have the ability to have a voice, to tell us what's wrong with the project, not just what's right. We don't want to just be told, hey, everything's great. Every project has its ups and downs. Every project has its own issues. We want to make sure that they're able to voice even the problems, because that means they know that they're being heard. That's how we make the project better. And in the end, the reason we're also collaborating with the Linux Foundation is not just about all the logos here, though that's really cool. It's also because we're working closely with other Linux Foundation projects. One of the things I love to call out is FINOS. Traditionally, you're going, wait, you're talking about a financial organization? Yes, we're talking about a financial organization. In the end, open source is not just about being magnanimous to the community. Don't get me wrong, that's a good thing too. The reality is, what are the potential commercial or enterprise opportunities when you build these open source projects?
Well, working with FINOS means we're working together closely with the financial organizations as well. So they're using not just Delta Lake and MLflow (well, that's obviously a good thing in our opinion) but all the other Linux Foundation projects. These are important things to actually happen. And so it's really about collaboration among developers, open source, and, I know people will find it weird when I say this, also business. And so in the end, we just wanted to end today's session by showing, like, we have this really cool slide that says, hey, we're the most widely used lakehouse format in the world, 11.1 million downloads, blah, blah, blah. But here's the callout. Every single month, there's a contribution of some release, and tracking it has become a nightmare because we have so many different things. How do you think we track that? If we expected the engineers to do it, it would all be a gigantic cluster. That's all it would be. It's because we're working with the Linux Foundation to help us advertise, help us track, help us get the emails out, help us socialize, so we can showcase what we're doing. Otherwise, nobody would be able to understand what's going on. So in the end, the most important aspect when it comes to building these communities isn't just about saying, hey, let's go ahead and write some really great code. It's also about us collaborating on marketing, social media, all these other aspects where we work really closely with the Linux Foundation, so we can ensure that we're boosting all that. So that's it really for today. Any questions? Go for it. Otherwise, not. I think we're good to go. Okay, I think we're good. Oh, sure. What about it?
Yeah, basically each month we have multiple releases of different projects within Delta. So for example, if you look at May of last year, that's when we released Delta 1.0 and Delta Sharing 0.1.
And then June, the Delta Rust Python 0.5 bindings; July, Delta Rust 0.4; August, Delta Sharing 0.2; and so forth and so forth. So each month, yeah, we're releasing. Then we released Flink, released, you know, Presto. We just kept on. The reason the downloads are going up is because we keep releasing more stuff. And the whole point is that it's the community that's releasing the stuff, right? It's not just us. For example, shout out to Trino. I only have numbers from last year because that's all I kept it up to. But literally in December of last year, the Trino community pumped out, I think, 36 or 40 different PRs for a single release, just for Delta Lake alone. Yeah. Yeah, sorry. Yeah. Yeah. No, no, no, this is a release on anybody's end. Yeah, that's related to Delta. Most of the stuff's actually in the Delta repo, because we have multiple GitHub repos in the delta-io organization. But using the Trino example, no, that's actually in their own repo. Yeah, and so we did want to call out, like, hey, this is when the Trino release happened. Yeah. In fact, that's why she was talking about how LFX Insights helps with that.
Yes, we're able to track all the releases across all the repositories and the most active contributors on every repo.
Yeah. And churn rates too. So you can see, oh, are people not staying around? Yeah, the whole game. Good. Yeah. Yes. So if you're part of the Linux Foundation, obviously you can do that. The tool that Carly called out was Common Room as well. We're big fans of that. That can also sort of help with that if you're not an LFX project.
You're able to track not only GitHub on Common Room; you're also able to track who's talking about you on Reddit and people that have questions about your project, and it gives a very nice overview of all of them combined.
Oh yeah, and positive and negative sentiment even. Yeah. Yeah. Cool.
Oh, and one thing too, which I should have mentioned earlier, but while we're ending: I was actually at an event for Delta Lake, and it was, I felt, a really nice visual of the community, because we were all celebrating the third anniversary, the third birthday of Delta Lake. And I was there getting a group photo of everyone. And when I got everyone together, people started leaving the group shot because they thought, oh, I'm not a huge contributor to this project. I've only done a line of code. And Michael Armbrust was like, I don't care what you've contributed to this project. If you have done anything, you have made it what it is today, because we needed it. Whether you made a social post about the project or you've spread awareness, you're part of this, you own a piece of this project. And everyone got back in the photo. And that's, I would say, a really good visual of what the Delta Lake community is about: everyone has a piece of this project. Whatever you're doing, whether you're writing 10,000 lines of code, or you're making social content, or you're attending events, you have an important role. And that's it.