Let's talk about how Netflix makes gRPC easy to serve, consume, and operate. I'm Benjamin Padurka, my pronouns are he/him, and I work on the Java Platform at Netflix. My team specializes in the experience for RPC-oriented APIs: we deliver a consistent experience for point-to-point communication in JVM applications, and we do this by providing a variety of RPC frameworks to our engineers. Today we're going to talk about Netflix gRPC Java, our client-server framework for gRPC. We also have a variety of what we call our HTTP frameworks for RESTful-ish traffic. You might be familiar with some of our OSS libraries, like Netflix Concurrency Limits and Netflix Hystrix, and my team also owns all the application framework integrations for these frameworks. I see people already taking pictures of the slides; go for it. I've also already uploaded them to Shgedd, so you can get them there. I'm going to reference a couple of other talks, and I'll have a link slide at the end that you can take a picture of, or again, grab everything off of Shgedd.

Netflix gRPC Java is our most highly featured and complex RPC framework. We have over 600 applications serving 1,300 services, and we have 1,500 applications with gRPC clients. It's critical that all these integrations are easy; we call making things easy our paved road. We approach this from five different angles. First, tooling: how do engineers send ad hoc requests and actually author their protos? That's what we have to do to get started. Security: how do we establish and enforce RPC integrity and access controls? Resilience: how do we ensure that RPCs succeed or degrade gracefully? Observability: how do we understand the behavior of the system? And finally, ergonomics: how do we make all of this and other common RPC activities easy?

Throughout the presentation I'm going to show a few code samples. These are from a Spring Boot Netflix application; Spring Boot Netflix is our paved-road framework for Java back-end servers. To learn more, I recommend the recent presentation "How Netflix Really Uses Java Today" by Paul Bakker.

Before we even get to serving gRPC, we need a proto specification and stubs. When a Java project is initialized through our project generator and gRPC is a requested feature, we include a proto definition module and a client module. These modules include Gradle plugins which orchestrate the OSS protobuf Gradle plugin for stub generation, include additional protoc plugins for adding ergonomic accessors (as an example, we will generate Java Optional accessors for all of the optional fields in your proto3 spec), generate clients with bundled configuration and dependency injection bindings, and finally advise on breaking changes via protolock.

Here we have a code sample: a Hello World RPC. We generate stubs for all the services to help our engineers get started and see where they should be adding code. We have a couple of proto options here, where you can see we've got API documentation built into the proto spec. Finally, if I were to remove that RPC, we would get a nice error out of protolock.

Clients ship with all the necessary configuration to connect and to configure the supported features. If necessary, these settings are overridable by our dynamic configuration service. Clients use a configuration-aware managed channel, which will rebuild the interceptors and the underlying transports so we can automatically swap them in for new RPCs.
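As a rough sketch of that pattern (illustrative only, not the actual Netflix implementation; the target string and the configuration-change hook are assumptions), a delegating Channel can swap its underlying ManagedChannel when configuration changes, so in-flight calls drain on the old transport while new calls pick up the new one:

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.MethodDescriptor;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: a Channel that swaps its delegate when configuration changes.
final class ReconfigurableChannel extends Channel {
  private final AtomicReference<ManagedChannel> delegate = new AtomicReference<>();

  ReconfigurableChannel(String initialTarget) {
    delegate.set(build(initialTarget));
  }

  // Called by a (hypothetical) dynamic-configuration listener when settings change.
  void onConfigChange(String newTarget) {
    ManagedChannel old = delegate.getAndSet(build(newTarget));
    old.shutdown(); // in-flight calls on the old channel are allowed to drain
  }

  private static ManagedChannel build(String target) {
    return ManagedChannelBuilder.forTarget(target).useTransportSecurity().build();
  }

  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> newCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions options) {
    // New RPCs always start on whichever channel reflects the latest configuration.
    return delegate.get().newCall(method, options);
  }

  @Override
  public String authority() {
    return delegate.get().authority();
  }
}
```

Stubs created against a channel like this keep working across a reconfiguration, because each new call simply asks for the current delegate.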
This means that we can reconfigure your RPCs without interrupting any of the call flow. We generate Java clients by default, but we also have paved-road support for Python and Node.js clients, though they have a reduced feature set. Finally, we generate a server module that's a Spring Boot Netflix application. It's completely wired up to the server stubs, with examples for how to test your application. At the end of the initialization process, engineers have a gRPC server running in tests with an example service endpoint and published clients ready for consumption. About once a month I run a lab for new engineers at Netflix: we have them start from zero and then deploy to multiple regions in prod in 90 minutes or less.

Another key development activity is sending ad hoc requests. As we saw before, we have proto options for API documentation, so we can annotate our RPCs, our messages, and our enums, and we can now serve this documentation generically via the existing service reflection interface. With that, we can use common tools to display the documentation. As an example, we have grpcurl here at the bottom, from FullStory, and that just works out of the box and shows the documentation; it's a really good fit. We've also extended the gRPC UI to include this proto documentation. Finally, we translate all the protos and docs into an OpenAPI spec, which allows engineers to have a single interface for gRPC and RESTful APIs.

Our security features are pretty boring, just like they should be. Our developers and servers are issued certificates signed by our internal CA, which establishes their identity. We use these certificates to build mutual TLS connections as our first level of security. We also propagate end-to-end identity information, so we can authenticate calls for both the immediate caller of the RPC and the initiator of the call chain. Our engineers don't need to worry about setting up the SSL context, they don't need to worry about propagating identities, and they don't need to worry about integrating with authentication or authorization infrastructure. All of this is handled by our RPC framework and our application framework. All an engineer needs to do to add security to a method is add an annotation to it, and it will be automatically enforced.

Resiliency is probably our largest investment, and it doesn't lend itself to good pictures, so I've just included a couple of diagrams I use for training new team members. We start with name resolution. We're using Eureka, our open source project, as our preferred name resolver; we've extended it with a really small adapter to bring it over as a gRPC name resolver. We also support DNS, because Eureka doesn't cover all of our use cases. As an example, if we need to do some special call routing to a different region or a different environment, we'll often fall back to DNS for that. Name resolution has also been extended with support for our canaries. This allows us to do A/B tests, and it's how we validate changes so easily.

Once we're done with name resolution, it's time for load balancing. Our default is choice-of-two, and we're doing all the load balancing within the client itself; we're not using very many intermediate load balancers. This is completely configurable for special needs, so we can handle subsetting and additional tweaks to the choice-of-two algorithm. We can also swap in different algorithms.
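The core of choice-of-two is small enough to sketch. This is a minimal, illustrative version (the Backend record and the use of in-flight counts as the load signal are assumptions, not the framework's actual code): pick two backends at random and route the call to the less loaded of the pair.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of choice-of-two (power-of-two-choices) selection.
final class ChoiceOfTwoPicker {
  // Hypothetical per-backend state: an address plus an in-flight request counter.
  record Backend(String address, AtomicInteger inFlight) {}

  Backend pick(List<Backend> backends) {
    ThreadLocalRandom rnd = ThreadLocalRandom.current();
    Backend a = backends.get(rnd.nextInt(backends.size()));
    Backend b = backends.get(rnd.nextInt(backends.size()));
    // Choose whichever of the two random candidates has fewer outstanding requests.
    return a.inFlight().get() <= b.inFlight().get() ? a : b;
  }
}
```

In grpc-java terms, a selection rule like this would sit inside a LoadBalancer's SubchannelPicker; the sketch only shows the picking logic.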
Some customers prefer to run with round robin or load balancers with stickier behavior. This goes as far as being able to inject custom load balancer implementations through the application framework.

We bring concurrency limiters to both our clients and our servers, using our OSS Netflix concurrency limits library. On the client side we primarily run fixed limiters, and these serve mainly as a latency-based circuit breaker: if your downstream server goes latent, your concurrency is going to spike, and we can start short-circuiting those calls. That also helps protect us from engineers who might inadvertently spin off a few thousand async requests without anything throttling them; the concurrency limiters throttle that really quickly and help us control the traffic. On the server side, we run concurrency limiters to protect us from thundering herds of clients and to ensure that we keep a base level of traffic flowing. Server side we're using our Gradient2 algorithm, and AIMD for other special cases. These are both adaptive limiters, so they can learn the maximum concurrency of the server within a few parameters.

We have deadlines. OSS gRPC obviously has full deadline support in the context, and we've extended this in a couple of ways. One, it's completely configurable on our clients and servers: our engineers specify the default deadline for an application, for their service, and for their RPC endpoint, and we take the smallest of those deadlines and apply it. The other extension is that we propagate deadlines through our other IPC frameworks. A deadline can start at a gRPC server and then flow through the Netflix WebClient, maybe we're going to make a GraphQL call, and that whole call chain will execute with the same deadline.

We have hedging and retries. We've had them for quite a while, so we're not using the OSS implementations; we've implemented them via the interceptor chain. Again, this is configurable per service and per RPC. A good example of when you might want retries but not hedging, or hedging but not retries, is whether you're running batch traffic: hedging is really good if you need to ensure a particular call latency, and batch traffic isn't going to need it.

If everything has failed and we actually get a failure back on the wire, we support fallbacks in a really easy way. Our service owners ship default, statically configured responses right in the proto spec, and then we can construct those in the client and return them to the application. It's often easier for applications to receive a fully formed answer that represents a degraded experience than it is for them to handle the errors directly. Sometimes that static response isn't sufficient, so we also support dynamic fallbacks: our engineers just implement an interface, and again we'll load that in through the application framework, wire it up, and return it through the stubs.

With all these features we're pretty good at handling outages and bumps. However, we learned that we can't wait for an outage to occur to see how the system is going to perform, so we perform what we call failure injection testing. We can annotate an RPC to cause failures at any point in the call chain, so we can inject latency or failed responses. If you want to learn more about this, I recommend the AWS re:Invent talk "Building Confidence Through Chaos Engineering on AWS."
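To make the mechanics concrete, here is a hedged sketch of the idea in grpc-java. The header name and the failure rule are made up for illustration (this is not Netflix's actual FIT protocol): a server interceptor checks a propagated header and fails the flagged call before it ever reaches the service implementation.

```java
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import io.grpc.Status;

// Illustrative sketch: fail calls that carry a (hypothetical) failure-injection header.
public final class FailureInjectionInterceptor implements ServerInterceptor {
  private static final Metadata.Key<String> FIT_HEADER =
      Metadata.Key.of("x-fit-inject-failure", Metadata.ASCII_STRING_MARSHALLER);

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    if ("true".equals(headers.get(FIT_HEADER))) {
      // Fail the call before it reaches the service implementation.
      call.close(Status.UNAVAILABLE.withDescription("failure injected for testing"), new Metadata());
      return new ServerCall.Listener<ReqT>() {};
    }
    return next.startCall(call, headers);
  }
}
```

A latency-injection variant would delay before delegating to next.startCall rather than closing the call outright.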
Observability. I was really excited to see the talk about observability during this morning's keynotes; we're doing a lot of the same stuff. We track metrics for every call, on both the client and the server. This gives us really important visibility into how calls are impacted and how they might be experienced differently on different sides of the network. At an OSS level, we're using our Spectator API and the specification in there for what we call our common IPC metrics, so we're tracking per-attempt call latency, in-flight counters, and message sizes, both incoming and outgoing. Our OSS concurrency limits also have metrics. These help us understand when the concurrency limiters are engaged and, more importantly, what the state is for our adaptive limiters, which tells us why we learned a particular limit. Not in OSS, but also helpful to us: we track the deadline remaining for incoming and outgoing calls. This helps us understand if a call arrived on the box without enough time left to answer it, and we also run analysis on it so we can identify if a service is running with inappropriate deadlines. With all these retries, we track what we call our top-level call metrics. These are more API-oriented than RPC-oriented: we're tracking at the stub level how the application is experiencing the RPCs. Finally, we've instrumented our load balancers and name resolvers to give us more information on sub-channel states. This is pretty handy when a lot of connections start failing; we can check whether there was a spike in sub-channel failures without needing to log onto the box. Again, all of this is integrated with our OSS Spectator library, using Atlas as our metrics database.

Tracing is all OSS. It was previously internal, but we've since migrated to OpenZipkin and Brave. Tracing is expensive, so we sample at the edge, but we do get these nice distributed tracing graphs. If you want to learn more about tracing, there's the tech blog post "Edgar: Solving Mysteries Faster with Observability"; again, I'll share a link at the end of the presentation.

Our logging implementation is pretty straightforward. We emit access logs, so you can get an Apache-style log of latency and results. We don't want to do that on every single RPC for higher-RPS services, but it is available. We also have MDC integration, so when you're reading logs in your logging framework, we can tell you which RPC was being called when each log message was generated. The more interesting part is some automated error analysis, which you can enable for errors coming back across the wire. One really handy case is when you get a deadline exceeded, meaning our internal watchdog timer has gone off: we gather some additional information and annotate the failure with it, so we can understand where that deadline came from and whether there was actually enough time to service the call. We have several of these failure rule sets, and they help our engineers quickly understand what's going on within the system.

We bucket a large number of other common activities into what we call ergonomics. Anything that's done frequently for an RPC, we want to bring into the framework to make it easy. Caching is our number one ergonomic feature: over half our production calls are served from a cache owned by our RPC framework.
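As a minimal sketch of the idea, assuming the generated Hello World types GreeterGrpc, HelloRequest, and HelloReply (illustrative names, not a Netflix API), a small client-side wrapper can memoize unary responses so repeated identical requests are answered locally and never hit the wire:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch only: memoize unary responses keyed on the request message.
// GreeterGrpc, HelloRequest, and HelloReply are assumed generated Hello World stubs.
public final class CachedGreeterClient {
  private final GreeterGrpc.GreeterBlockingStub stub;
  private final ConcurrentMap<HelloRequest, HelloReply> cache = new ConcurrentHashMap<>();

  public CachedGreeterClient(GreeterGrpc.GreeterBlockingStub stub) {
    this.stub = stub;
  }

  public HelloReply sayHello(HelloRequest request) {
    // Protobuf messages implement equals/hashCode, so identical requests share one entry;
    // repeat calls are answered from the map instead of reaching the network.
    return cache.computeIfAbsent(request, stub::sayHello);
  }
}
```

The framework version drives cacheability, keys, and eviction from configuration rather than hard-coding them like this, as described next.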
Cache configuration, so cacheability, keys, and eviction rules, is all stored in proto options. Advanced needs are covered by allowing clients to inject custom code, so they could, for example, programmatically determine whether a call should be cached. On the server side, cached calls are answered by the framework and never reach the server stubs; on the client side, the calls never reach the network. We support distributed caching through our OSS EVCache, and we also have on-heap and off-heap caching for staying on box. The kind of neat one is that we have a cache that runs per incoming RPC. So if an engineer has written, maybe, a naive data fetcher that's repeatedly making the same call, maybe fanning out traffic, we can catch that call and return it from the cache instead of hitting the wire multiple times.

At the application framework level we've got quite a few integrations. As I mentioned, almost everything we manage in the RPC framework we allow our engineers to extend. Something I want to show off here is a smoke test. This is an online test for making sure that the client you're shipping is actually able to talk to the server you're also shipping, and it demonstrates how our engineers bring in stubs. They don't need to call the managed channel builder, they don't need to add all these interceptors, they don't need to set up all these features. All they need to do is annotate their stub and we'll wire it in for them. Because our clients ship with all the configuration they need to run, our customers rarely need to override anything, so they just need these two lines and they get the fully featured client. Again, these are plain gRPC stubs. That's intentional, because if you want a different API, maybe you're going to abstract the RPC away; just inject the stub into that other API client and you can provide the better experience. The other application framework integration we provide is a set of Spring Boot admin actuators that give us a live view of the gRPC framework on every instance. This helps us identify misconfigured components or a particularly bad node.

There are way too many other ergonomic features to go into detail on. We take Jakarta validators and map them over to bad-request trailers. We catch exceptions thrown from the server and map them to detailed status codes instead of just returning INTERNAL. We provide batching at the client layer. We expose all of the Netty client and server tunables as dynamic configuration, so again, we can change your max header list size without impacting call flows. And we support the gRPC-JSON transcoder from Envoy for clients that lack gRPC support.

So with all of this done, what's next? First, we're going to work on service mesh. All of this has been implemented in Java, but I mentioned that we also have clients using Python and Node.js, so what we're looking to do is shift many of these features into a sidecar proxy to reduce duplication and improve our polyglot experience. We don't have enough language platform teams to pave every single language that our engineers want to use, and we want to enable them to have resilient RPCs. We're also looking to invest further in API and schema management. We've gained a lot of experience recently with GraphQL Federation, and we want to bring those learnings back to the RPC space.

So that's how Netflix makes gRPC easy to serve, consume, and operate. I'll put this slide up, these are the talks I mentioned, and I'll hold for questions until people want to go to lunch.
Hey, thanks for your talk. Antoine from Datadog here. Wow, that's a lot of features. How big is the team that maintains all of that? Can you give us an idea of how much effort went into developing all these things?

Yeah, so the Java IPC team is currently three engineers, and, I'll just say this with a big smile, two open headcount.

And where does the headcount need to be located, if people want to ask?

I would recommend checking our job site for that.

But are they local engineers, or do you also hire remote engineers?

So I am not local to the Bay Area; I flew in from Michigan. Another one of my team members is in Arizona. We do have one engineer there in the Los Gatos office. As far as hiring remotely for the open headcount, I recommend checking the posting on the job site.

In one of your slides, you talked about batching, but I didn't really understand what you do.

Yeah, so some clients, because they've got so many RPCs, might be iterating over a list and making multiple calls to a stub, and we will actually capture the calls to the stub and build a batch request out of them. So we translate to a different RPC call that allows multiple fetches, send that off, and then break it back into multiple responses at the stub layer.

Oh, that happens transparently, so the developer doesn't even know about this?

It takes some configuration from the client, because you have to write the logic for how to batch and translate the requests. But once that's done, it gets shipped off to all of your clients inside your jar, and that way the customers of the service don't actually have to worry about it.

I have another question. You also talked about fallback, so you can provide a default answer if there was a connection failure?

Yes.

Did I get it right that in your proto file you can specify a default answer?

Yes.

How do you plumb it in, or how do you implement it?

It's a proto option, so we can read it off the RPC, and our code knows to go take that proto option. The engineers put it in there in JSON format, and then we build that back into the message, construct it, and return it from the client stub. They can also inject code to do an on-box calculation of what a fallback response could be.

Okay, and you said in many cases, instead of getting an error, getting the default answer is a better option?

We found this for some of our APIs, because it allows the calling application to interact directly with the stub: they can assume that they're going to get some usable response back from it. It avoids having to wrap all the stubs in try/catch to handle the status exceptions.

On one of the developer tooling slides, you had something side by side, and on the left-hand side there was something that looked like YAML. What was that?

Oh yeah, I think this one. What are we looking at on the left here? So this is an actual example of the configuration our clients ship. When you write your gRPC service, you configure how your clients should operate with a YAML file, and that gets rolled into your client jar, which gets shipped out to all the applications.

So is that something Netflix specific?

Yeah, it's currently Netflix specific. So what we do is we monitor this configuration. It's statically configured when you put it in the jar, but it's also wired into the dynamic configuration system.
So we monitor that configuration for changes, and then we rebuild any necessary interceptors or channels to support those changes.

Okay, follow-up question. Talk to us a little bit about the time dimension of all these things. You have, what did you say, 1,500 services or something? I assume people don't get them right the first time; people need to change stuff over time. What does versioning look like for you? Is all this in one repo, and you've got to change everything every time you change a protobuf or something? How are you handling that? Do you hold to "we always make backwards compatible changes"? Is there a ton of if statements in your code about "is this here, is that here"? Talk a little bit about the time dimension.

So on the wire, the proto format is extremely backwards compatible, and we use protolock to help our engineers not make breaking changes on the wire from a serialization standpoint.

What did you call it, protolock?

Yeah, we use protolock. It's a Gradle plugin.

Okay.

Well, I think it's actually a binary that's wired into a Gradle plugin. But we use that to help our engineers avoid making breaking changes on the wire. As far as compatibility at the RPC level, so maybe the behavior of an RPC has changed, that's handled by our canary tooling, just to help our engineers avoid breaking anything. We do gradual rollouts, and if there's any spike in errors, we'll automatically roll them back.

So do you have anything like a protoc plugin that would say, hey, you're going to change this and it's actually going to break in production? Like, we can tell this service at this version is deployed and that's going to break, you can't roll that out. Or do you do everything statically?

That's what protolock gives us: if you were to make a breaking change in your proto, it's going to generate an error for you at that time.

Okay. Do you have all of this sitting in one repo, or many repos?

We are not using monorepos; mostly, every application sits in its own repository. So your repository is going to have your proto definition, your server implementation, and then the modules for your client and stubs, and those are just generated off of the protos.

Okay, thank you.

Hey, I had a quick question. Hey, Zach, your mic stopped working. So you have a pretty rich feature set here. I'm just wondering, technically, how does this all plumb into the gRPC system? Is it a wrapper? Is it a component within the system? From a high level, how do you plumb in a lot of these features?

Yeah, so a lot of these are implemented as either interceptors or modifications to the channel builder. We have what we call a channel feature or a service feature, and that's how we control the ordering of all these components. At application startup, we'll discover all the channel features and service features that are on the classpath, and then we use a visitor pattern to basically build up what the comprehensive state of your client or server will look like. So it's going to be a stack of interceptors and then either a Netty channel or a Netty server underneath it. All of that is wired in through our application frameworks, so we've got Guice adapters for our legacy framework, and our new framework is all on Spring Boot. Does that cover it?

Yeah. Hey, we have a question over here.

I have a couple of questions, and I'm seeing a lot of new terms here.
Really new terms, like hedging, fallback, failure injection testing. Can you talk about this failure injection testing? Is it in the live environment?

Yes.

Like, something happens and you're retrying it in the production environment?

Yes, we do a lot of testing in prod. So if you go to watch something, it's possible that your request will get flagged as being part of a failure injection test. We will annotate your request to say this service needs to fail on the call chain, and then we can monitor whether you were actually able to watch your show, whether the resiliency features were working through that section.

How do you manage the data? Do you use the same data you got in the request, or is synthetic data applied and retested?

The failure injection rules are tracked in headers, which get propagated throughout the call chain.

Oh, okay. Interesting. The second point: you said two lines of code in your actual service itself, and you said you run those in each region, or how do you do that? I mean, before deploying it, you run these tests?

So part of your system testing is going to confirm that your actual gRPC client, wired up with your actual network configuration, talks to the server. I think you're talking about this slide.

This slide, yes.

So this is an online test that uses a local Netty client-server pair to confirm that, again, you might have changed security rules or something like that, and this is what confirms that it's going to work.

But it's only implemented in Java, not in the other languages?

There are test fixtures for the other languages. Java has received most of our investment so far, and I focus on Java, so it's hard for me to go into too many details, but we do have gRPC clients for both Python and Node. We don't run gRPC servers in those languages.

Yeah, one last question. I've seen the slide about streaming, so you have streaming options on the Java side. Do you have the Java client being a streaming receiver? How do you use it with different engines like Apache Beam or a Dataflow engine? How do you make a call to your streaming servers? Do you have any simple framework to make these connections and receive a lot of requests and post requests?

We have an internal data streaming framework, I'm not sure it's really standing on OSS, that provides a tight integration with gRPC.

Okay.

It just wires up against the observers and listeners.

We're struggling with how to do it using Apache Beam, so I thought you might have some frameworks built in already.

Nothing specifically comes to mind.
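As context for the observers and listeners mentioned in that last answer, consuming a server-streaming gRPC endpoint from plain grpc-java looks roughly like this. The Greeter types and the sayHelloStream method are assumed example stubs for illustration, not a Netflix API.

```java
import io.grpc.stub.StreamObserver;

// Illustrative sketch: receive a server-streaming RPC via the async stub's observer callbacks.
// GreeterGrpc, HelloRequest, and HelloReply are assumed example types.
final class StreamingConsumer {
  void consume(GreeterGrpc.GreeterStub stub) {
    stub.sayHelloStream(HelloRequest.newBuilder().setName("grpc").build(),
        new StreamObserver<HelloReply>() {
          @Override public void onNext(HelloReply reply) {
            // Each streamed message arrives here; hand it to the downstream pipeline.
            System.out.println(reply.getMessage());
          }
          @Override public void onError(Throwable t) {
            System.err.println("stream failed: " + t);
          }
          @Override public void onCompleted() {
            System.out.println("stream finished");
          }
        });
  }
}
```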