All right. Well, thank you for joining us today, and everybody online. I'm going to be talking about agile, dependable service development with Bosque and Morphir. My name is Mark Marron. I'm from Microsoft Research, and this is a project I've been working on with Stephen Goldbaum from Morgan Stanley. He wasn't able to make it today due to some travel restrictions, so I'm going to be doing most of the presenting.

So let me just get started and ask: why are people interested in microservices? Well, they seem like they should be these super easy things. Someone comes to you, they need some functionality implemented. They say, we're going to give you a little data. We want you to take this data, make some calls to other services, do some computation, and give us back an answer. This should be some easy code to write. I can put it on the cloud where it runs, I don't need to worry about server management or any of this other stuff, and it's just a super easy thing to set up, hand over, and then never have to worry about again.

But unfortunately, as most of us know, reality isn't quite that simple. At the end of the day, we now have this thing. We've got to pick some frameworks. We've got to document it, version it, make sure dependencies are up to date. We have to pick a logging framework. In many industries, like finance, there's a big emphasis on auditability and record-keeping so that we can manage all of this. Of course, we have to have telemetry and metrics, so if the service goes down, we know about it and can fix it. What began as a simple task focused on achieving a business objective suddenly grows a lot of other hairs around technology management, and it becomes an ongoing cost, not just a simple one-off expense.

So what did we want to do about this? Well, we came at it from a viewpoint that I think a lot of programmers are familiar with: we want to automate away as much of that as we can and let you focus on the important and interesting parts of the problem. There are two parts to this that we're going to talk about today. The first one is Bosque, which is focused on how we can do a better job of producing meaningful, easy-to-use, easy-to-get-right APIs, and how we can automate a lot of the parsing and validation work that goes on. We also want to show some things that I think are really exciting, on using automated reasoning to deal with versioning and quality assurance when building these kinds of cloud services. The second project I'm going to talk about is Morphir, which is used to automate the deployment and management of these things. It takes all of the choices that are very important for running the service, but not important to the logic, lets you specify them as configuration, and handles all of the generation and deployment for you, allowing you to focus on the actual business value and on working with your customers to make sure they're getting what they need out of your software.

So let me start off with service specification and validation in Bosque. I want to use an example here. I went to the National Weather Service and looked at some of the APIs for their RESTful services, and they have them specified in this Swagger format, where they'll give you an example of what the return value for a request should be.
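Something like this hypothetical sample, reconstructed to match the fields described next; the names and values are illustrative, not the actual National Weather Service schema:

```typescript
// Hypothetical Swagger-style example response, reconstructed from the talk.
// Field names and values are guesses for illustration, not the real NWS API.
const exampleForecast = {
  lowTemp: 42,                    // a number -- degrees Fahrenheit, presumably?
  highTemp: 77,                   // a number -- same guess
  minWindSpeed: 5,                // miles per hour? knots? we have to guess
  maxWindSpeed: 12,
  windDirection: "NW",            // a free-form string
  shortForecast: "thunderstorms", // a free-form string
  precipChance: 30                // a percent, presumably 0..100
};
```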
So in this case, if you ask for the forecast for a city, it has fields low and high that it shows are numbers, a wind speed, a wind direction, some sort of forecast detail, and a percent. And from this, we kind of have to guess what each of these is. The low and high should probably be degrees Fahrenheit, and we can figure some of this out. We might go today and use something like OpenAPI or one of these nicer IDL frameworks, or even TypeScript, and come up with a signature, or the shape of this API, that looks somewhat like this. We now have a type called Forecast. It has a low and a high that are numbers. We can use the neat discriminated string union feature to make the wind direction one of the cardinal directions and not allow arbitrary strings there. And then we'll say, okay, there's some function called getForecast that takes a city and returns a Forecast, and we might annotate it as a service so we can auto-generate RESTful bindings for it.

So starting from here, let's Bosque-ify this. Let's make this a really rich description of what the service actually does and what it gives us. The first thing we'll do, and we can do this in TypeScript and a lot of other languages too, is make the wind direction an enumeration of compass directions. That just cleans things up a little. We'll also change these numbers, which could be integers, floating point, or whatever else in JavaScript/TypeScript. We'll actually discriminate them, and we'll now have an Int and a Nat value, which is kind of nice: a Nat is a non-negative integer, and since the wind speed is always measured in the direction of the wind, we don't want a negative wind speed; that would be a little weird with this API. So we can actually specify that.

We can do a little better, because Bosque supports typed numbers. We can say that the temp isn't actually just an Int; it's an Int that represents degrees Fahrenheit, and I don't want to accidentally assign a miles-per-hour value to a Fahrenheit temperature. So we can clean this up a little as well. Now, percent is kind of interesting, because it's a natural number, but it also has the semantic constraint that it should be less than 100. We can annotate these data invariants on our typed number declarations in Bosque as well. And now, if we generate bindings for this, we'll make sure that whenever you have a percentage value, we've validated this constraint before we pass it to you, so you don't have to worry about strange results popping up from this sort of thing.

We can also do this kind of thing with typed strings. Our short forecast in this description is a string, and who knows what that string could be. It could be thunderstorms, it could be bananas; there's no other constraint here. But we can refine this type and say, here's a regex that describes the structure of the data in this string. We'll say it's a typed string of forecast detail, and that can be a regex of showers or thunderstorms or snow or fog, with one or more of these in that short forecast description. So this is now actually a really comprehensive and understandable description of what you're going to get back in a Forecast data type, what that data means, and what the legitimate values for it are.

So we can go, okay, well, now we might also want to have some preconditions, right?
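To make that concrete, here's a rough TypeScript sketch of the shape the talk walks through; the Bosque-specific features (typed numbers, typed strings, invariants) are approximated in comments, and all names are reconstructed rather than the actual demo code:

```typescript
// The OpenAPI/TypeScript-style shape, annotated with what Bosque adds.
type WindDirection = "N" | "NE" | "E" | "SE" | "S" | "SW" | "W" | "NW";

type Forecast = {
  lowTemp: number;          // in Bosque: a typed number like Fahrenheit (an Int),
  highTemp: number;         // so Fahrenheit and MPH values can't be mixed up
  minWindSpeed: number;     // in Bosque: a Nat-based MPH -- non-negative by type
  maxWindSpeed: number;
  windDirection: WindDirection;  // in Bosque: an enum of compass directions
  shortForecast: string;    // in Bosque: a typed string refined by a regex,
                            // roughly /(showers|thunderstorms|snow|fog)+/
  precipChance: number;     // in Bosque: a Percent typed number with a data
                            // invariant keeping the value in range
};

// Annotated as a service so RESTful bindings can be auto-generated.
declare function getForecast(city: string): Forecast;
```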
We might want to be able to say that the city should be a non-empty string, and just have that as a precondition of how you're supposed to use this service. Again, we can auto-generate the validation logic for that, so you don't have to write it and accidentally forget it. We might also want some postconditions. We might want to say that whatever value you get back, when you get back a Forecast here, the low temp is always going to be less than the high temp, and the min wind speed is always going to be less than the max wind speed. And we can do even a little better. These ensures clauses, or postconditions, are really data invariants on the Forecast. So let's just actually say the Forecast is an object with some structure on it, and whenever you see one of these, you can always be sure that the low temp is less than the high temp. This is now a really nice way to describe the API: the data you put in, the data you get out, and the constraints it has to satisfy.

Now, what's kind of interesting here is that these constraints are not just types or simple structures; they're actually executable code. And this means that if you go and implement the code behind this interface, you can say, well, I want to somehow validate that these executable conditions are satisfied by my implementation. For instance, I could run these conditions on the live data and, if I violate one of them, flag it up to my APM monitoring tool. If I want to make a change to my implementation and I happen to have recorded a bunch of data, I can replay this historical data against the new change and see if it violates a postcondition, and then I know I have a bug in my change without having to roll it out to users.

It also makes it a lot easier to start doing aggressive fuzz testing, because no longer is the fuzzer having to say, here's an input as a string, and just start guessing random characters. For example, that short forecast has a regular expression, so the fuzzer knows the short forecast has to conform to that regex and can be much more effective at generating inputs that get deep into the logic of your application rather than just continuing to fail at the validation layer. And you can use all sorts of fuzzers: random, genetic, coverage-guided; it's very nice there.

But the problem is, that works great only if your service is isolated. Most of the services we implement take a request, do some computation, make a call out to another service, come back, and do some more computation. So now, to do good fuzzing, I also need implementations of these other services. The obvious way to do this is to build a test deployment, which is going to set up every service that my service depends on, and transitively the services that they depend on. I'm going to try to set it up in some constrained environment with some dummy data, I'm going to try to make sure that all the requests that go through this system happen deterministically so I don't have lots of flaky tests, and now I have to set up a deployment strategy for this. It's like managing a whole second service just to test the service that I'm trying to deploy.
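As a rough picture of what that auto-generated validation layer does, here's a minimal TypeScript sketch, reusing the Forecast type from the earlier sketch; getForecastImpl and the wrapper are hypothetical names, since the real tooling generates this from the Bosque spec:

```typescript
// Minimal sketch of auto-generated pre/postcondition checking.
declare function getForecastImpl(city: string): Forecast; // your actual logic

function checkedGetForecast(city: string): Forecast {
  // precondition from the spec: city must be a non-empty string
  if (city.length === 0) {
    throw new Error("precondition violated: city must be non-empty");
  }
  const f = getForecastImpl(city);
  // postconditions / data invariants from the spec (strict <, as stated in
  // the talk -- a choice that turns out to matter later)
  if (!(f.lowTemp < f.highTemp)) {
    throw new Error("postcondition violated: lowTemp < highTemp");
  }
  if (!(f.minWindSpeed < f.maxWindSpeed)) {
    throw new Error("postcondition violated: minWindSpeed < maxWindSpeed");
  }
  return f; // violations could also be flagged to an APM tool instead
}
```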
And so I think, in practice, most people will not do this. They'll say, we're just going to write unit tests for our simple bits of logic, and we'll roll out to a flighting ring, and if we see our APM light up with a bunch of errors, then we'll roll back real quick and we know there was a bug. But obviously that's less than ideal.

Now, Bosque is interesting because we have all these executable constraints, and Bosque is a unique language in the sense that the code is convertible, not just executable: it maps very closely to first-order logic. So we can take this code, convert it to a logical representation of what it means, and then pass that to a solver that will find a solution satisfying the constraints of that code. What this means is that, instead of hand-implementing a mock for what that weather service does, I can actually take that code and that type description, convert it to logic, and ask the solver: hey, suppose my unit tests generated these inputs; can you generate me a response that matches the specification of that service?

Let me give an example here. You can see we've got the code that I just showed, and there's no implementation, there's no body, but we're going to go ahead and execute it. So I'll give it some string, I'll give it Albuquerque, and you can see that it has actually produced a Forecast object by solving those constraints: it has snow, and the max wind speed is greater than the min wind speed. You can see I also do it for Phoenix, and it shows that the min wind speed is going to be zero, the low temp is going to be zero, and the high temp is going to be one. So this is kind of interesting: the solver has no idea about weather or anything about what it's doing; it's just finding values that satisfy the specifications.

This is really nice for two reasons. One is that the mock is now entirely self-contained. I don't need to set up any other service, or have any other server running, or hand-write any special data. The solver will just generate the data it needs to satisfy the requests it sees coming in as part of the unit test. The other thing that's very nice is that, since it generates unusual values, or rather since it doesn't know what a usual value is, it does a very nice job of making sure you don't end up depending on some quirk of a specific implementation. It'll generate any value that satisfies the specification, not just the specific one you happen to be testing against, or a specific one that happened to be coded into your mock implementation.

So we thought this was really cool, and a really nice way to start thinking about specifying these APIs, and specifying them in a way that lets you work with them as part of larger collaborating software systems without having to deal with a lot of the overhead of deployment management and mock generation and all that. The next problem we started thinking about was: okay, this is great. I can define a service, I can write an implementation for it, I have a way to test it very effectively. Then I find a bug and I need to change it, and now I need to deal with versioning, right? So let's talk a little bit about our vision for how versioning should work, or how we'd like to make versioning a much more automated and predictable process.
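Conceptually, the solver-backed mock behaves something like this hedged sketch; mockFromSpec is a made-up name standing in for the Bosque tooling, and the sample output echoes the values shown in the demo:

```typescript
// Hypothetical stand-in for the solver-backed mock: there is no hand-written
// body for getForecast -- the checker converts the spec to first-order logic
// and asks the solver for *any* Forecast satisfying the constraints.
declare function mockFromSpec(city: string): Forecast; // invented API name

const phoenix = mockFromSpec("Phoenix");
// e.g. { lowTemp: 0, highTemp: 1, minWindSpeed: 0, maxWindSpeed: 1,
//        windDirection: "N", shortForecast: "snow", precipChance: 0 }
// Satisfies lowTemp < highTemp and minWindSpeed < maxWindSpeed -- the solver
// knows nothing about weather, only the constraints.
```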
So we're going to talk about what we call M.B.E.F. versioning, and it's kind of based on semantic versioning, which is, I mean, a great concept, but it's a little challenging because it relies almost entirely on convention and human judgment. There's no formal definition of what it means to be, let's say, some very minor change or a patch. We just say, oh, it fixed a bug. Well, it's not clear what that means, right? How can I test whether or not a semantic version number bump matches the change in the code? It's a little vague and hand-wavy. So we want to address these issues. We want to provide a logically grounded basis for versioning, with definitions that are very clear, so I can tell you: if you change a version number, does the domain of valid inputs get larger or smaller? How does it impact the possible outputs on inputs that were valid in the previous version? We also wanted to use this to support automatically validating version changes, both for upstream providers and for downstream consumers of some library.

Let me talk about what we set up here. It's not surprisingly different if you're familiar with semver. A major version would correspond to some arbitrary change: you're adding a bunch of new features; it's essentially a new library. Anything can change, so you should be ready for anything. The next sort of breaking change we split out is a bug fix. We specify this formally as a change where some value used to produce output o1, and I'm now going to change my program so that that value might produce a different output o2. I'm saying o2 was actually the right value, but you may have unfortunately taken a dependency somehow on it returning o1. So while in theory this kind of change shouldn't impact a downstream client, it could if they're unlucky, so they'll need to do some validation. And we'll see how this splits out from the classic notion of a bug fix in the other categories.

An extension version falls into a category of upgrades that you should not need to worry about testing at all. It says the original domain of values that were valid for your service is still valid, and the output on every one of those values is still the same output. So there's no change for existing clients. New clients might access new features, but otherwise there's no change anywhere. You should be able to take one of these version bumps without any worries, no pagers going off at all. And then a fix would be something that previously crashed or returned a 500, and now returns a valid value. Again, as a consumer, if the library was crashing or just returning a 500 and now returns a good value, this shouldn't impact you at all, so you should be able to upgrade very cleanly.

So this makes things a lot easier, because now we can look at the type structures and these data shapes and determine: what is the domain of values that they describe? And if you change one of them, we can ask: does that enlarge the domain? Does it shrink the domain? Is the domain unchanged? We can actually do proofs on the pre- and postconditions to check implication and equality relationships between them. And we can then compare implementations, using fuzzing or other techniques, to see if we can find a difference. But the consumer can also validate the changes. They can check the data signatures. They can look at pre- and postconditions.
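Put a bit more formally, here's one way to read those four categories, written as relations between the old service f and the new one f'; this is my reconstruction from the talk, not the project's official formalization:

```latex
% D(f) denotes f's domain of valid inputs.
\begin{align*}
\textbf{Major:}\quad & \text{no relationship between } f \text{ and } f' \text{ is guaranteed} \\
\textbf{Bug fix:}\quad & \exists x \in D(f).\; f(x) = o_1 \;\wedge\; f'(x) = o_2 \;\wedge\; o_1 \neq o_2 \\
\textbf{Extension:}\quad & D(f) \subseteq D(f') \;\wedge\; \forall x \in D(f).\; f'(x) = f(x) \\
\textbf{Fix:}\quad & \forall x.\; f(x)\ \text{succeeds} \Rightarrow f'(x) = f(x),\ \text{while crashes/500s may now succeed}
\end{align*}
```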
But if you want to be a little aggressive and you want to buy into this, say, Bosque programming ecosystem, you can actually use full-program validation to check behavioral equivalence between your code with the old library version and your code with the new library version. Let me show an example of that. Let's take two clients that we might write against the forecast library we talked about. They're both very simple. One is the float time: given a forecast and a distance, it'll tell you the fastest time a balloon might be able to float that distance in the wind. The other is a wardrobe recommender: given a forecast, it'll tell you, oh, if it's cold or hot, you should bring a coat, bring shorts, or bring both, depending on the high and low temperature. So these are both clients of our forecast API.

Now, people may have noticed that, like all good talks, I have subtly inserted a bug into my specification of what should be true of a forecast. The low temp should usually be less than the high temp, but it's possible that the two temps are actually equal. So I might say, okay, this is a mistake. This is actually going to be a bug fix; it might impact my downstream clients. I'm going to change my service so that it now returns a low temp that is less than or equal to the high temp. Now, as a consumer, I want to know: can I roll out this change safely, or do I have to worry about a pager going off while I'm at home eating dinner?

So here we go. Let's actually run this check tool on our two applications. I should say that this is something we're experimenting with in a conceptual prototype, so there's a little hard-coded copying and pasting of code around in here, but the verification part is all being run live and is not demo-ware at all. We'll run on the two of those. It's going to check, on each client, whether there's any input that can trigger a new bug. And it says, for the wardrobe recommender, there's no new bug triggered and there's no change in any of the possible executions of my code. It's actually proving that; it's generating a proof over all possible executions of this code. For the float time, it tried to generate a proof and failed, and it said, okay, I couldn't generate a proof, let me start looking for a witness input that would trigger the bug I think might happen with this new version of the library. And you can see here that, yes, it did find a bug. In fact, when the wind speed is zero, both the max and the min, you'll have a div-by-zero error when you try to compute that float time.

And interestingly, this is so typical of these subtle bugs, where you thought, oh, it's just a small change, no one depends on this. But it just so happens that if the min wind speed has to be strictly less than the max wind speed, and the min wind speed is lower-bounded by zero, that means the max wind speed has to be at least one, which means that that division is always safe, by pure accident, right? And this subtle change that shouldn't really have impacted anybody caused this bug, and we were able to check this and find the bug without any of our actual clients having to run into the problem themselves. So let me take a minute to talk about how this actually works and what we're doing with it.
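To see why the old spec made this client safe by accident, here's a hedged TypeScript sketch of the float-time client, reusing the Forecast type from earlier; the real demo is in Bosque, and the names and formula are reconstructed:

```typescript
// Sketch of the floatTime client. Under the original spec,
// minWindSpeed < maxWindSpeed together with minWindSpeed >= 0 (a Nat)
// forces maxWindSpeed >= 1, so the division below can never hit zero.
// After the "bug fix" relaxes the spec to minWindSpeed <= maxWindSpeed,
// the checker finds the witness min = max = 0 and a div-by-zero appears.
function floatTime(f: Forecast, distanceMiles: number): number {
  // fastest time a balloon could cover the distance, riding the max wind
  return distanceMiles / f.maxWindSpeed;
}
```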
So we're not the first people to try to use a solver to check a program for bugs, or to convert a program into first-order logic to look for bugs. There are decades and decades of work on this topic, almost since people started talking about what programming languages were. But if you look at how these systems actually work today, they hit serious limitations that prevent their use in practice. One of them is scalability. Generally, a fully automated program checker is limited to small programs, often restricted to programs that only manipulate scalar values, or that allocate a fixed amount of memory at the beginning of the program and only ever use that bit of memory. You can certainly verify larger pieces of software; there are a number of both academic and industrial cases where people have verified compilers, operating systems, and large parts of networking and crypto stacks. But these are generally done with substantial manual effort and expertise. The amount of specification and proof code is usually double or more the amount of actual implementation code, and it requires a team of people who intricately know how the prover works and can structure the application so that the proofs go through.

I think both of these might be technically solvable problems, but the real showstopper is that, once you have this heavyweight machinery around verifying your code and fully describing what it does, there's a sort of ossification of the application. When someone comes and gives you a new business requirement that this bit of code is supposed to satisfy, it's very difficult to act on it quickly, because you need to update lots and lots of machinery and other specs and make sure all the pieces line up. It makes it very difficult to respond in an agile way to the changing needs of your business or your customers.

So rather than try to come up with some amazing new technique for program verification, we came back and took a different approach. Rather than trying to be really clever and analyze Java, which is an amazingly hard problem that many really smart people are working on, we observe that in these sorts of cloud applications, you're not running a lot of heavyweight, sophisticated code. You're oftentimes pulling data together from different sources, merging it somehow, filtering it a little, and passing it back out. And this is a domain that works very well with functional-style programming languages; Elm, and Bosque as well, are in that vein.

We'd really like to design this language, and the representation for it that we analyze, differently from the classic setup, where you have a source language, you have a compilation target, and the intermediate representation is designed to map closely to those constructs and support efficient compilation. We want to build the IR to support automated reasoning, so we can do this sort of checking very efficiently. So it ends up being structured with immutable data. It's loop-free: you use filters, you use maps, you use joins; this is the stuff that works very well. And it's deterministic, which is a very nice additional benefit when you're doing cloud programming: given the same inputs, you get the same output every time, so you really cut down on heisenbugs, which is kind of a bonus.
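For a flavor of the code shape this IR is built around, here's a small hedged TypeScript sketch in that pull-merge-filter style; the domain and names are invented for illustration, not code from Bosque or Morphir:

```typescript
// Loop-free, immutable-data style: merge two feeds, filter, and reduce.
type Quote = { symbol: string; price: number };

function bestPrices(feedA: readonly Quote[], feedB: readonly Quote[]): Quote[] {
  const valid = [...feedA, ...feedB].filter(q => q.price > 0); // drop bad ticks
  // keep the lowest price per symbol -- a join/reduce, no explicit loops
  const best = valid.reduce(
    (m, q) => {
      const cur = m.get(q.symbol);
      return cur !== undefined && cur.price <= q.price
        ? m
        : new Map(m).set(q.symbol, q); // copy-on-write keeps data immutable
    },
    new Map<string, Quote>()
  );
  return [...best.values()]; // deterministic given the same inputs
}
```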
And interestingly, despite sounding kind of restrictive, it's pretty easy to target from a lot of languages. So Bosque, which I've been showing you, is one. Elm, or at least the subset of it that Morphir uses as its primary front-end language, is another.

The other thing we did is, we took this checking not as "I want to prove that your program is perfectly correct," but as validation: we wanted to increase your confidence in the quality of your code in whatever way we could. A proof of correctness is great, but it's not always practical. If there's a bug, we want to just find that witness input so you can debug it; that's probably just as useful to you as a proof of correctness. And as many of you know, usually if I find a bug and file an issue on someone's GitHub repository, they say, I would like a small proof of concept that reproduces that bug; give me a small reproduction. In fact, people generally observe that most bugs have a very small input that will trigger them. So we take that approach and say: it might be difficult to find a bug in the entire program space, with 64-bit ints and 1,000-character strings and a million entries in your dictionary. But if we can constrain the search space and say, is there an input, let's say with 8-bit integers and strings of at most 250 characters and maps with at most five elements, that triggers your error, let's search for that. That's going to make our search much more effective, and, with high probability, if a bug exists, it exists in that kind of space. So even if we can't give you a complete proof or always find a bug, we can always give you some degree of confidence that your code is correct and doing what you expect.

For each possible error that we encounter in your program, we follow the same flow. We try to prove that the error is not possible under any input to your program. We try to find a witness input that triggers the error, on small inputs or large inputs. If we can't do that, we might try to prove that the error can never happen on some subset or simplified set of inputs: smaller integers, smaller strings, ASCII and a few bits of Unicode rather than the full Unicode character set, just to simplify the problem if we need to. And if no witness input or proof can be found before we exhaust the time budget, well, at least we've done a large search over this space, and you can be relatively confident that if that bug is possible, it's not triggered in a simple way or by a very small input; it's going to be a bit exotic and unlikely to be seen in practice. So here's that flow: we start off with the error, we try to find the small proof that it's impossible, we look for a proof that it's impossible in the full space, we look for the small witness, we look for the large witness, and then hopefully we come up with an answer.

So let me show an example of how this works on another bit of code. We have a service that's just a simple calculator. You pass it an enum that tells it what operation you want, and it takes two big-integer arguments. But negate is a unary operation, so the second big-integer argument is optional. And then we have these two casts from the second argument, which might be null or might be a BigInt, to explicitly BigInt, and these might fail if someone passes the wrong input.
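Here's a rough TypeScript rendering of that calculator; the demo is in Bosque, so the names and the exact shape of the cast are reconstructed, and the comments note where the checker finds errors:

```typescript
// Sketch of the calculator service. In Bosque the cast of the optional
// second argument is a checked cast that can fail; TypeScript's `as` is
// unchecked, so the comments mark where the real errors live.
type Op = "add" | "sub" | "mult" | "div" | "negate";

function calc(op: Op, arg1: bigint, arg2: bigint | undefined): bigint {
  if (op === "negate") {
    return -arg1;                 // unary: arg2 is ignored
  }
  const rhs = arg2 as bigint;     // checked cast in Bosque -- fails on undefined
  switch (op) {
    case "add":  return arg1 + rhs;
    case "sub":  return arg1 - rhs;
    case "mult": return arg1 * rhs;
    case "div":  return arg1 / rhs; // a second checkable error: div by zero
  }
}

// The checker's witness from the demo, roughly: calc("add", 14n, undefined).
// The fix is a precondition: op === "negate" || arg2 !== undefined.
```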
So I'm going to run this checker in verbose mode so you can see everything it's trying to do. It starts checking each error, and you can see here that for the cast, it tried to prove that it was unreachable with a small input. It wasn't able to do that, so it tried to generate an input that would witness the error, and sure enough, there it is: if you pass it add, 14, and null, that'll fail. So now we go and say, oh, the only legal way to call this is if you require that, when the operation is not a negate, the second argument is not null. And you can see it checks everything here: it checks for small inputs, it proves it's impossible for larger and larger inputs up to the full 64-bit ints, it proves it's impossible on all inputs, and it proves that for all the possible errors that can occur in the program, since there are several casts and a division and that sort of thing.

So that's where we're coming from on the correctness side, and we thought that was a lot of fun, but it still doesn't solve a lot of that back-end boilerplate implementation. This is where Morphir comes in. Morphir is a FINOS project that was contributed by Morgan Stanley, so it's all open source. What they have is a business-focused intermediate representation. They have a lot of front-end languages, domain-specific languages, GUI tools, all of this stuff that maps into this IR. And then they have the need to deploy and run this code on Scala, JavaScript, and SQL; they need to instrument it with logging, they need to do APM hookups, they need to ensure auditability and compliance. So rather than having to bake all of these choices into the implementation of the code from the beginning, they want to be able to write the code as pure logic, compile it to an IR, and then have all these cross-cutting concerns handled at the configuration layer or some other layer, so you don't have to bake these choices in and manage them all the time at the code layer. And this allows us to hook in things like the Bosque checker as one of these back-end targets: rather than compiling down to, say, TypeScript or Scala to run, you'll compile it down, run it through the Bosque checker, get your error validation, and then compile it down to your actual executable target.

So I mentioned that they're really focused on rapidly iterating with their customers and making sure they're addressing business needs. One of the challenges there is that, when you're dealing with an end user, they can sort of tell you what they want, you can give them some code, and then they'll run it and say, well, it didn't produce the output I expected and I don't know why. And they're not going to be able to go look at the code and understand quite what happened. So here's an example of a simple bit of code. It takes a wind speed and wants to categorize it, for a sailboat rental or your weather app or whatever. Depending on how fast the wind is, it comes up with an enumeration: is this a calm day, is it high winds, is it dangerous winds, and so on. The end user might want this and might not understand the code, but Morphir can automatically generate a visualizer that will show you, for any input, what the conditions were and what flow it took to get the answer it got.
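A sketch of that categorization logic might look like this in TypeScript; the thresholds are invented, since the talk doesn't give the actual cut-offs, and the real code is Elm going through the Morphir IR:

```typescript
// Hypothetical wind categorization -- the branch structure is what the
// Morphir visualizer renders as a decision flow for any given input.
type WindCategory = "calm" | "breezy" | "highWinds" | "dangerousWinds";

function categorizeWind(speedMph: number): WindCategory {
  if (speedMph < 5) return "calm";        // invented threshold
  if (speedMph < 20) return "breezy";     // invented threshold
  if (speedMph < 40) return "highWinds";  // invented threshold
  return "dangerousWinds";
}
```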
So an analyst, or whoever is asking for this, can look at it and say, oh, I understand why I got the answer I got, and can then quickly get feedback to you: no, no, I explained myself wrong, you implemented the wrong set of logic here, and quickly pinpoint where it is, rather than having an extended back-and-forth. This is another great example of how, by having this sort of agnostic IR layer, you can plug in this kind of visualization tool regardless of what your source language was; you don't have to write it on a per-language or per-project basis.

So here's some Elm code. I've been showing you a lot of Bosque, but this highlights how Morphir can come in here: for each function it can take the input and translate it into the API spec. It can produce OpenAPI, it can produce Bosque, it can bind it to RESTful endpoints or whatever other endpoint services you want. For the data types, it can generate the persistence layer, it can generate all the code in JavaScript and Scala, and it can bind this to the endpoints using, well, Dapr is the framework that I've used to deploy most stuff. Using Dapr, it can do replication, automated retry, scalability, partitioning, all of this great stuff for you, so you don't have to worry about the details and try to integrate it in your code.

And just as an example of how this goes, here's some Elm code that we're going to compile into the Morphir IR and then validate with the Bosque validation chain. You can see, and I should hop back just a sec, there we go: this application is basically taking in a list of orders. Each order has a quantity, which is an int, and a price, which is a float, and it's going to run a map over these to multiply the price by the quantity to get the total value for each order. Then it's going to sum this up and divide by the number of orders taken, so you get a quick-and-dirty average of the order value. Let me play this, right? And this is the kind of thing where, I know I've written similar things many times, and you think, okay, this is easy: I've got a list, I sum it up, I divide by the number of values. And you forget that, particularly if you're filtering or doing something else, you might actually end up with an empty list, and you might actually get a div by zero there. And we're able to pipe everything through and generate the witness input that says, hey, when you give this the empty list, it's obviously going to have a div-by-zero error, so you should handle this case correctly.

So, in conclusion, up at the top we have what the development of a service-based application looks like today, and there's a lot of manual effort in many of the steps. We're trying to bring developer automation in to eliminate as much of that manual effort as we can and let you spend your time focused on the important parts of the problem: either automation to help you reason about those parts, like this Bosque verification work, or automation that takes all of the rest and handles it for you without you ever having to think about it, in the code and the build and the deployment layers, with Morphir. So we are open source. You can come check out the Bosque language repo; Morphir is on FINOS. And we're not just open source as in we throw it out there and don't take feedback.
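Rendered in TypeScript (the demo itself is Elm compiled through the Morphir IR), the average-order-value logic is roughly this; note the failure mode the checker catches:

```typescript
// Quick-and-dirty average order value -- a sketch of the Elm demo's logic.
type Order = { quantity: number; price: number };

function averageOrderValue(orders: Order[]): number {
  const total = orders
    .map(o => o.quantity * o.price)   // value of each order
    .reduce((sum, v) => sum + v, 0);  // sum them up
  // Witness input from the checker: the empty list. In the Elm/Bosque
  // semantics this is a div-by-zero error; in JavaScript you'd silently
  // get NaN, which is arguably worse.
  return total / orders.length;
}
```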
We're really excited about working with people in the community. Particularly on the Bosque language side, we've got a great core bit of technology with the prover, and we're really trying to understand how to best fit it into various workflows. So we'd love feedback: hey, that solves my problem, or it doesn't solve my problem, or could I use it this way? You know, bug reports, everything else. And Morphir is a much more mature system; it's being used extensively, and I would definitely check it out if it sounds like you'd like to spend less time babysitting the services you run and more time adding value through them. And I think that's it for me. I definitely have some time for questions for anybody who's interested; happy to chat about anything. Okay. That sounds good. Well, I'll let everybody go get an early coffee or whatever the break food is. Thanks for coming. Appreciate it.