All right. Seems like we're at time. So welcome. Thanks for coming. This is Data Processing Is Made of People: a case study in role-empathic API design in Sahara. And that word, empathic, is really the core of what I want to talk about today. Fundamentally, this is a talk about user experience and user stories in OSS infrastructure. So what are we doing here?

To start out, a little story. I started my career not as a software engineer, but as a nurse. And one of the first things that I did in nursing school was open my textbook to the first chapter, and I saw a figure that looked a little bit like this. It said, one of the most important virtues to have as a nurse is the ability to show empathy, to understand what another person is feeling or thinking. And one of the ways you can demonstrate empathy is by placing your hand on someone's shoulder. See figure 4.2. And so I've created sort of a copyright-safe version; I didn't want to actually use the picture, because of legal ramifications and whatnot. But I mention this for two reasons. The first is that empathy is a really hard virtue to talk about without getting very sickly sweet very quickly. We all know how important a virtue empathy is. We all know how important it is to connect with another person, understand where they're coming from, and respond to it appropriately. But to talk about it in depth is hard to do effectively. And secondly, it's very obvious why a nurse needs empathy; that's very core to that job. But at the same time, I'm going to argue that empathy is really at the heart of what software developers do as well, and that in the end, we can only do our job as well as we are able to be empathic toward our end users. And we're going to get more concrete about it, though, than figure 4.2.

So moving on. The problem that we're going to be talking about fundamentally revolves around the idea that building a very clean, intuitive user experience in OSS infrastructure products, in which you're trying to tie together many, many back ends, many implementations of the same use case, is very hard to do in a user-empathic way, in a way that lets the user know where they are and what to do next. Secondly, we're going to talk about user stories, which are a traditional, Agile approach to dealing with this problem. And then we're going to talk about a solution to the problem that occasionally works, and did for one use case in Sahara this cycle. It's going to have a little to do with empathy, and we're going to use a Liberty feature in Sahara as an example of it. And then we're going to have a conclusion.

So in case you're not terrifically familiar with Sahara: who isn't really familiar with Sahara? Luigi, get your hand down. You're ridiculous. So, a few people. Cool. Sahara fundamentally has two major portions. API v1.0, the cluster management API, allows you to create templates, which let you reproducibly create big data processing clusters, whether those are Hadoop, Spark, Storm, or any of the various distributions of Hadoop: Hortonworks, MapR, Cloudera. We have plugins for all of those data processing engines, and you can spin up any of those clusters using Sahara. Then we have the Elastic Data Processing (EDP) engine, which allows you to create templates for the jobs you've written, store them in Sahara, and then run them on data over and over again using a more limited set of configuration values. And that's a lot of what we're going to be talking about today.
Today, primarily, we're going to be talking about EDP. And we're specifically going to be talking about the bit right between Create Job Template there, where you're solidifying the job that you've written, putting it into Sahara in a reproducible way, and the step where you actually run it. That's a critical flow arrow, and we're going to be talking about why in just a few minutes.

So, why you might want to be here. Hopefully, if you're an OpenStack ATC, you're very much the target audience: you care about UX, you care about your users' sanity, you want to write APIs that are going to make people feel at home and like they know where to go next. And you want to compare notes with me, because we did something that I think is cool, that people might want to know about, and that might well be reproducible in other projects. If you're a Sahara customer, or you're interested in potentially adopting Sahara in the future, you will learn about a new feature that will make it easier for organizations to run EDP jobs reproducibly and efficiently. Or you're just a really huge UX geek, and you really like to talk about Agile, and that's great, and you should come talk to me afterwards and we'll geek out forever and it'll be fabulous.

All right, so how are we doing for time? We're doing perfectly fine. So, the problem that we're addressing. Let's start with the concrete case. Sahara allows you to create these job templates that encapsulate a reproducible big data job. They map to specific jar files which you've written; you may have written them in Java, in Pig, in Hive, in Storm, Spark, et cetera. They allow configuration per run: when you write a big data job, there are a number of configuration parameters that you can bake into it to make it more usable in various circumstances. So the challenge that the Sahara team had prior to Liberty was to create a configuration scheme that was broad enough to allow you to run any of these different types of jobs, which have very different ways that each of them likes to be configured, different ways that information likes to be passed to them, and to do that as simply as possible.

And we succeeded, right? There was a solution, as of Kilo, that had three basic categories of configuration parameter that you could shove into a job. There were configs, which is a dictionary that's processed by the engine. An example of that might be the number of mappers in a Hadoop job; that gets processed by the engine itself. There are params, which is another dict that gets processed by the job itself. These are named arguments that you have written into your job specifically to make it more reusable. And then finally, there are args, which are also processed by the job and take the form of a list, so they're only referenced positionally by the job itself. And as you can see, different job types (Java, MapReduce, Pig, et cetera) can use different sets of these. There's also a complication with input and output data sources: some of these job types specifically take top-level input and output data sources as arguments to the job, and some don't. It worked.

And this is fundamentally what it looks like as of Kilo when you're trying to run a job. You've got a main class. Here, this is a Java job type. So this stuff up here, the main class, the Java options, these check marks, these are fundamentally UI sugar. There's nothing in the API to express these, but we built them into our UI because we know that these are things Java jobs need, and it was nice to have them there.
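To make those three categories a bit more concrete, here is a rough sketch of what a Kilo-era job launch looked like through the raw API, written as a Python dict. The field names and the Hadoop property key are reconstructed from memory of the EDP documentation rather than quoted from it, so treat this as an illustration of the shape of the thing, not the authoritative schema.

```python
# Illustrative sketch only: approximate shape of a Kilo-era EDP job
# execution request. Exact key names should be checked against the
# Sahara EDP documentation.
job_execution_request = {
    "cluster_id": "<cluster-uuid>",
    "job_configs": {
        # configs: a dict processed by the engine itself,
        # e.g. the number of mappers in a Hadoop job.
        "configs": {
            "mapred.map.tasks": "4",  # Hadoop property name assumed
        },
        # params: a dict of named arguments processed by the job you
        # wrote (used by job types like Pig and Hive).
        "params": {
            "INPUT": "swift://demo-container/input",  # hypothetical param
        },
        # args: a plain list, referenced by the job only positionally.
        "args": ["first-positional-arg", "second-positional-arg"],
    },
}
```

Nothing in that structure tells you what any particular job actually expects to find in configs, params, or args; that's the gap we're about to talk about.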
But if you're using the API itself, there really aren't any rails to keep you in place. What you have are these little buttons down here for configuration arguments. So, just a pop quiz: if you're trying to run TeraGen, the basic example of a Hadoop job, who knows what to put into those fields to run TeraGen? There are some people in this room who probably know, but they don't count, because they work on Sahara. So this is hard. It takes a fair amount of specialized experience to know what to do. And you can figure it out; it's not the end of the world. You can look at the Sahara documentation for a while, you can look at the job that you're running, and you can figure out what to map. But it takes time, and it can be a little aggravating, even if you already know what's happening.

So there are a few assumptions that we made as a team that made this a pretty good solution. And I want to be clear that these are valid assumptions; there's nothing wrong with any of these statements, really, or at least not deeply wrong. When you're wrapping multiple back ends, exposing the deep utility of those back ends trumps ease of use. That's true. You want to enable your power users to use all the features of your back ends; otherwise, they're going to hit a wall and they're going to go somewhere else. We're writing software for experts, fundamentally, in OpenStack. OpenStack operators are professionals. If they hit those walls, they're going to stop. So we have to give them the full feature set. And finally, the last assumption, and this is a really basic software assumption that's incredibly important to keep in mind all the time, and that bit us in this case, is that the simplest thing that can possibly work is usually the best solution. And that's absolutely, stunningly true. That'll never stop being true. But we'll get to a caveat.

And this solution was a great start. Because in the end, what we're doing with Sahara and other OSS projects in OpenStack is pretty hard, right? These abstractions that we're creating, Sahara for data processing engines, Trove for databases, block devices in Cinder, there are a lot of implementations of these. Each of them has its own unique features. And creating a proper abstraction that can actually deal with all of these at the same time is a legitimately difficult task. Making it easy is even harder. But nonetheless, that is fundamentally our challenge. We are here to unlock the entire market through cloud infrastructure. The upside is that if we actually do, the world gets better. We commoditize cloud infrastructure, many different companies can get in on the game, and it's better for the whole market if we can win this fight. And making it easy enough that it's widely adopted is a huge part of how we get there. So this is a real challenge. It's a hard one, but it's unambiguously a good one.

The bad news is that it's not always really possible, right? In this Sahara case, we could make a pretty UI for each engine. And in fact, we did: we put in as much sugar for the Java job type as we could to make that easy to run. But fundamentally, we don't know what your job needs. You haven't written it yet. We can't know what parameters you're going to want to pass in. We can't know what arguments you're going to want to pass in. There's no way that we can actually create your UI for your job and make it usable.
You have to provide those values. We can only build pipes for you to use. The question that we haven't asked yet, and the critical question for this talk, is: who are you? And we're going to be talking about that next. We've been saying "you" a lot, but who you are actually matters very deeply in this case.

So we're going to be talking about user stories now. I say congratulations on attending the least bleeding-edge talk at the OpenStack Summit, because fundamentally, this concept has been around for a long time. The Agile Manifesto happened a while ago. User stories have been around. But they are very seldom actually properly used. There's a place for user stories in our spec templates, and there's actually an OpenStack Summit track called User Stories, and it's about user experiences with OpenStack. But I've very seldom seen the term user stories within the OpenStack community actually map to what the traditional Agile user story is. So we're going to talk about that now. So, an admission: I am impressed that this talk was voted in. It was really interesting to me, and it's incredibly important, but it's not the most exciting, hip thing happening right now. As I say, you could all be at some presentation about containerizing bare metal; just go.

But seriously, since you're all still here and nobody's walking out the door yet, this is a user story. The Agile classic has three parts. The first is "As a...", where you define the role of the user that you're talking about; then "I want...", the feature; and "so that...", the value. So an example of that: as a big data framework user, I want to create a job template so that I can launch my job repeatedly with various configurations. Sounds pretty good, right? That's a fundamental user story. No. That was terrible. That was just an awful user story. A big data framework user is not nearly specific enough. We need to get a lot more concrete about that, because in the end, there are a few different categories of people who are interested in using these frameworks.

So first off, you've got someone who actually needs data: someone who's going to make decisions based on something that they can't see yet, because they have too much data to actually parse through it and reach the decision they need to come to. We have a data scientist, who writes algorithms and spends all their days thinking about algorithms, thinking about the best, most efficient way to process the data through. We have a job developer, who takes those algorithms, actually writes code, and stores those jobs in Sahara, hopefully, because Sahara is just great. And we have a cluster operator, who actually then runs those jobs and maintains the clusters on which they run. Any number of these people in any one organization may well be the same person, right? Especially in small organizations, you are legitimately going to have job developers who operate their own clusters, and data scientists who write jobs, or just job developers who develop their own algorithms and don't have a data scientist backing them. That's perfectly valid. But any number of these people may be wholly separate. And in the case that they are separate people, you really need to think about who each person is and what they know.

So it's important to remember that UX isn't actually one-size-fits-all. The job developer knows how to configure these job types, knows how to write them, and knows what the jobs themselves want in order to work.
And at least at first, before documenting it, that job developer is the only person on Earth who knows how to configure the job that they just wrote. No one else knows. And they probably don't know the cluster. I was this person, so I can tell you: I didn't know the paths in the cluster that my jobs were eventually going to run on. That wasn't my domain. I sort of knew Hadoop administration; I was OK at it, but I only sort of came in to pinch-hit. And usually, as developers, we think that our code makes sense. We wrote it that way for a reason, hopefully, and we imagine that what we've written is naturally intuitive. We're frequently wrong. I think the experience of every developer speaks to that, to how often what we intend is misinterpreted. But we keep going anyway.

The cluster administrator, on the other hand, knows cluster administration like the back of their hand. They know how to make the cluster sing. And they're the only person on Earth who knows the minute-to-minute data flow on that cluster. And the relationship with the job developer is probably punctuated by frequent complaints that the jobs aren't documented well enough. The jobs are probably documented on some wiki somewhere, which is constantly sinking into lack of maintenance. And usually, the cluster operator is right. And communication with the job developer becomes somewhat of an onerous task.

So right here, there's this role transition. The job developer is going to register their binaries and create a job template. The cluster operator is going to be configuring the cluster, launching it, and probably running the job. And right there, there is a change between people, a change between skill sets. And it's easy for us, as developers of OpenStack who know the whole flow and think about the whole flow, to imagine that everybody has all of our knowledge and everybody is going to be able to map these things through effectively. Because of that role transition, that's really not true. And if we wrote two different user stories for those two different people, we'd end up with a very different set of requirements, which would end up with a very different implementation in the end. And that's fundamentally what happened here.

So let's recheck our assumptions from earlier. When wrapping multiple back ends, exposing that deep utility, not letting people hit a wall of expertise, trumps ease of use. That's still true. There's still nothing untrue about that, so that's great; we get to keep that one. We're building infrastructure, and our users are experts. That's true, but our users have different sets of expertise, which we've just unveiled. No one's going to be an expert in everything. We like to imagine the superhuman user who knows development, knows cluster operation, knows the whole picture, and can bear the weight of the world on their shoulders. But that person seldom exists in real organizations or in the world itself, and it's silly to expect that. Fundamentally, the purpose of OpenStack is to make this stuff easier, not to only allow the best of the best to run it. And finally, the simplest thing that can possibly work is usually the best solution. That's true. But the question that we're asking now is: who is the solution actually simplest for? Is it OpenStack devs? Is it the tech underlying the engine? Or is it the end users themselves? What's simplest for them is the real question.

So now we're going to get into what we did to actually address this problem in this use case.
So what we created was an interface map: a tool that allows you to effectively create a method signature for your job. When you register a job template, you can now register a number of arguments along with it. So in this case, this is TeraGen, which we took the pop quiz on earlier. It has an example class and a number of rows of data to create. TeraGen happens to create massive amounts of data for load testing of Hadoop clusters; usually you run TeraSort on it to benchmark, and TeraGen itself does some nice benchmarking for you too, on map-only stuff. So: rows; an output path, because it's going to need somewhere to actually write the data to; and finally, a mapper count, since maybe we want to adjust the number of mappers involved. We talked earlier about args, configs, and params. You can give each of these arguments a mapping type. So these three need to be args, and they're in that order. For configs, the mapper count is a config that gets shoved into the MapReduce map tasks property. They each have a value type. Each can be required or not required. And you can give each one a default value, if you like.

So now we've taken the task of configuring the job away from the cluster operator. The cluster operator doesn't have to care as much about figuring out how the job needs to be configured. We've pushed that back to the job developer, because the job developer, on average, is going to be doing the job template creation, and they're going to know. They're going to know exactly what their job needs, and they're going to be able to define a schema. So now, when the cluster operator builds the job launch config, all they have to do is this: job ID, et cetera, and then they pass in an interface. They can have rows, mapper count, output path. Because the example class had a default value, they don't have to provide it. And it provides a nice semantic view. And in the UI, it gets even more semantic, as we'll see in a few minutes.

So this is fundamentally what you do to create an interface. This is Horizon. You select a value type for each argument. You can name them. You can give them a description, so that the cluster operator has a good idea of what actually goes there; it can be as verbose as you like, and being verbose is friendly. That way, you don't have to maintain that wiki, which always falls over. Give them a mapping type, a location, and a value type. Tell us whether it's required, and provide a default value if you care to. And that's fundamentally the feature. So we are, in fact, putting a schema in our schema, so that you can define your job's interface while you define the job template. And that's fundamentally what this is. All we're doing is building a DDL; we're building a data definition language. It's not exciting, tech-wise, but it's fantastically powerful and useful at those transitions between roles.

There didn't actually have to be any change to the engine here. That's what I like about this solution: because we had already created a solution that allowed any use case to be filled, that allowed the args and configs and params to pass all the way through to the job in any case, all we had to do was bolt a translation layer on top, an additional pipe by which the job developer could provide information and the cluster operator could take it, use it, and have a much simpler interface, an interface which spoke much better to their knowledge of both the specific jobs that are running and the engines that are powering them.
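Since this all went by on slides, here is a rough sketch, again as plain Python dicts, of what the TeraGen interface registration and the resulting launch request look like. The field names (mapping_type, location, value_type, and so on) and the Hadoop property key are my reconstruction of the Liberty job-interface schema, so check the Sahara EDP documentation for the authoritative version.

```python
# Illustrative sketch only: approximate shape of a Liberty-era job interface.
# What the job developer registers alongside the TeraGen job template:
teragen_interface = [
    {"name": "Example Class", "mapping_type": "args", "location": "0",
     "value_type": "string", "required": True, "default": "teragen",
     "description": "Which example in the examples jar to run"},
    {"name": "Rows", "mapping_type": "args", "location": "1",
     "value_type": "number", "required": True,
     "description": "This number of 100-byte rows will be generated"},
    {"name": "Output Path", "mapping_type": "args", "location": "2",
     "value_type": "data_source", "required": True,
     "description": "Where the generated data will be written"},
    {"name": "Mapper Count", "mapping_type": "configs",
     "location": "mapred.map.tasks",  # Hadoop property name assumed
     "value_type": "number", "required": False},
]

# What the cluster operator supplies at launch time. Because "Example Class"
# has a default, it can simply be omitted.
launch_request = {
    "cluster_id": "<cluster-uuid>",
    "interface": {
        "Rows": "1000000",
        "Output Path": "hdfs://namenode/user/demo/teragen-out",
        "Mapper Count": "4",
    },
}
```

Because the interface declares which values are required and roughly what type each one should be, Sahara can fill in defaults and reject a launch that's missing a required value before anything ever reaches the cluster, which is the machine validation we'll come back to in a moment.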
It's really the job developer's job to know what Pig wants, what Hive wants, what Java wants. The cluster operator shouldn't have to care about that. They should care about HDFS. They should care about keeping the cluster running. Defining an interface does require dev effort, of course, but it also fills documentation requirements, so there's good reason for the job developer to want this too. Maintaining that wiki is annoying, they need to do a lot of onerous communication, and it can get problematic. Creating this allows them to define that schema once and have it be machine validated as well, so that easy errors in job configuration can be caught early by Sahara, and we can stop them before they hit the cluster and take up time, which is expensive. So the takeaway here is that we were right earlier: we can't know your job's config schema, because you haven't written it yet. But we can give you tools to describe it. We can give you tools to communicate.

And that's fundamentally where the empathy piece comes in. By thinking about what the job developer needs, and by thinking about what the cluster operator needs, we've actually found a way to put ourselves in both of their shoes, fundamentally by allowing them to put themselves in each other's. We have empathy all around. This is just a big triangle of empathy between the OpenStack dev and these two people. So fundamentally, humans win here.

So we did a pop quiz earlier about this job and asked whether we could run it. Not many people said yes. How do we feel about this job now? There's an example class, and it has a default value that's pre-populated with teragen. Data rows to generate says, if you mouse over the little question mark, that this number of 100-byte rows will be generated. It gives you an output path, which defaults to an HDFS path, and it has a number-of-mappers field. Could people run this? Matt can't. But Matt's a manager now, so that happens. All right. I did see some heads nodding yes, though, so that's good.

So the summary and takeaway here is: know your users' roles. Look for those inter-role transitions in your flow. And this is hard. You either have to have real-world experience with organizations running your tech, or, if you don't, you really have to go and actively poll them, ask for that feedback, and really act on it and internalize it. And then ask: OK, within the flow that I'm imagining we're going to create, does each user actually have enough information, enough knowledge and expertise, to do what they need to do? If you can give that to them directly, do. In this case, we couldn't. The simplest thing to do can be to leave it dynamic, but sometimes building tools to help your users communicate is really the right answer. And pure dynamism, like we had before with args and configs and params, was good. It did what we needed. But that's a last resort. Generally, you really want to build more communication tools than that.

So all right, good. We're doing well for time. A few closing thoughts before we open for questions. There's this fallacy that we've already talked about, of the power user, the super user who knows everything. And as developers, we sometimes get sucked into this idea: OK, I'm really smart, right? Great minds think alike. Other people who are pretty smart must think like me, must understand my code and what I'm imagining things are going to do here. And therefore, whatever I write, because I'm pretty smart, anybody who's sufficiently bright should be able to understand. And this is not great.
I don't think that's actually what happened with the original Sahara team; there were time pressures as well. But there's a little bit of, well, this is good enough, and people who are smart enough will be able to figure it out. That's a frequent part of any development process I've ever been involved with, and it is a little bit distressingly common in OSS sometimes. There's a little bit of an almost macho culture in OSS that it would really be better to resolve.

I've talked about the word empathy a fair amount, and I actually really like that word. I think it's important. We talk a lot in software about user focus and about customer focus, and I don't think they're adequate. Both imagine the users as business resources, as cogs in some machine. They're not. They're people. And creating flows that allow them to do their jobs well doesn't matter just technically; it matters morally. Fundamentally, these virtues, making complexity accessible, enabling structured, clear communication, empowering everybody who uses our products, those are some of the fundamental virtues of the software engineering craft. And anybody in this room who is a software engineer, I hope, takes those very seriously and thinks daily about how to do them well. We sometimes think of engineering as this heavily corporatized thing, but it's a mission as well. We do good work. We do good things. We make things possible that weren't before. And that virtue of empathy is fundamental in what we do. Because especially in OpenStack, we do need to win. This infrastructure does need to be democratized. It's not OK to allow there to be a handful of technologies owned by a handful of people, and not allow the infrastructure of the world to be anything more than that. And to commoditize cloud, we really do need solid UX. That's the way we're going to attract users. And what we're doing is fundamentally important. So if you're a developer in this room, this is a how-to-contribute talk. I know some people are here because they're interested in Sahara as well. But fundamentally, I wish to exhort you.

In closing: I have made a lot of fun of this presentation for not being the newest thing in the whole world. At the same time, I am really heartened that it was selected. I'm heartened that, for a how-to-contribute presentation, we're doing pretty well for audience. The stuff that we're talking about, in terms of user stories, role awareness, human empathy, is really important for us to succeed as OpenStack. And I'm very glad that we actually talked about it here. So thank you, if you voted for this, and thank you for coming if you didn't. And just, thanks.

So, one very last note before I close for questions. This is a how-to-contribute talk, and it really didn't fit; in fact, there wasn't really a track that fit at all. When I was looking at the various tracks at the OpenStack Summit, there really isn't a category for developers talking with each other about lessons learned and best practices. And that's understandable. OpenStack Summit is fundamentally a marketing exercise as well, and there's nothing wrong with that. That needs to happen too, to get people excited about our product, because we need to win. At the same time, I've been running around Japan for a week and a half before now, and, especially coming from the US, Japan is sort of an exercise in sacred geography and sacred ground. And this is our sacred ground as developers.
This is where we come together and collaborate and talk about how we're going to make more things possible in the future. So it might be cool to actually have a track for this kind of thing to happen from here on out. All right. So, any questions before we close?

Being new to Tokyo? Oh, I see. Sure. Tokyo, and Japan in general: in the United States, we destroyed a fair amount of our history, and things are pretty new. There are few places that we consider important to ourselves as a people. But walking around the city of Tokyo, and walking around Japan in general, you get a very deep sense of how deeply the history and traditions of the Japanese people map to specific places in the world around them, in a way that's wonderful and fascinating and different. Sure, we're getting a little philosophical here. OK, we got a little philosophical, so turnabout is fair play, right? No, that's cool. Yeah, when I say that Summit is sacred ground, yes, I suppose that it's both geography and time. But this is a place that's important to our people as OpenStack developers, probably as important as they get. And next time the sacred ground will be in Austin, and then in Barcelona. But we take it with us where we go. Thank you. Yeah, any other questions?

Mm-hmm. So right now, Sahara's UI is in contrib in Horizon, right? It's not in the core panels; it's in the contrib path. It may move into its own repository in the near future. It was in its own repository in the past, and it may be there again. But for right now, it's in contrib, and those panels are all under Jobs. Everything we talked about was under the Jobs panel. Yeah. Please. Sure. Yeah, that can be a problem; it's documenting the documentation feature. Yeah, absolutely. So, I mean, I have some documentation patches in the Sahara EDP guide. If you take a look at them, I'd be perfectly happy to receive feedback. They're pretty explicit, I think. But yeah, they're there. Sure. Yeah, absolutely. A lot of this presentation comes down to the idea that documentation is really important, which we say all the time, and nobody likes to do it. And I don't know, I kind of like writing documentation. It's very peaceful. Not all of the brain has to go into it at the same time.

Any other questions? I don't know yet, right? So one of the big things that I want to tackle next cycle, personally, is our image generation process, which currently is difficult. It takes a bit of a super user to do it well, or at all, sometimes. And admittedly, it's a process which often you're going to do once, and then you're going to be able to use Sahara for a very long period of time. And it's something that's put on the administrator of OpenStack, fundamentally, rather than on the end user in a public cloud. So it's been deemed good enough, especially because the Sahara community itself can frequently generate images, publish them, allow them to be used, and offload that from the end user. But especially as we have more and more configuration options, we're going to need to allow users to create their own images. The matrix is just going to swell to a point at which we can't possibly support everything anybody could want. So that needs to be cleaned up.

Yes. Yes. Sure. Well, I've actually talked with some of the places I worked at before Red Hat, before OpenStack, about, hey, do you want to hear the good news about OpenStack? And a lot of them are adopting cloud.
At the same time, they don't have the resources to take on their own cloud administration right now. The more we can reduce that bar and allow people to start running their own clouds, or at least create enough public clouds that there's better competition and a better market, the better things get. And that absolutely will depend on OpenStack operator UX. The more we can push that bar down, the better we do. Period. Just absolutely, yes.

And that's the beautiful thing about what happened with this specific feature: we had already created the really dynamic thing that let you do anything you wanted to, and we built this nice translation layer on top of it. That previous layer still exists, and you can still use it. So we kind of get the best of both worlds, right? We get the deep config, we get the user experience with a nice pretty UI for the operator, and all's well. So yeah, I think both concerns are valid. You still have to allow the full feature set to stand.

Are we out of time? I think we are. So all right, thank you all for coming. I've enjoyed talking, and I hope you've enjoyed the presentation. So thank you.