My name is Jay. I'm an engineer at Indeed.com. I work on our service architecture team, and today I'm here to talk to you about how we're starting to adopt gRPC services and facilitate newer capabilities inside our organization. I've been with Indeed for four and a half years. If you want to reach out to me, you can contact me via email, Twitter, GitHub, you name it.

By and large, I'm here to tell you a story about migration. In 2009 Indeed developed boxcar, a proprietary distributed services framework. In 2012 we announced the performance improvements it offered to our infrastructure, and in 2013 we did a tech talk on the protocol and some of its finer components. For the purposes of today's talk you don't actually need to know all that much about boxcar; if you're interested in the finer details, you can check out our tech talk on YouTube. That said, keep a few details in mind. First, boxcar is written on top of protocol buffers. It balances connections between servers, not requests: there's one ongoing request per connection, and the load balancing scheme requires a fixed number of connections to be pre-established in order for things to work. The scheme lies somewhere between round robin and the proxy-based model, but by and large it tends to manifest as naive round robin.

Today boxcar still plays a fairly large and important role in our architecture. We have over 160 services running in production, and these services are very high performance.
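To make that connection-balancing detail concrete, here is a minimal sketch. The names and structure are illustrative assumptions, not Indeed's actual code: a fixed pool of connections is assigned to servers up front in naive round-robin order, and each connection then carries one request at a time.

```go
package main

import "fmt"

// conn represents one pre-established connection; boxcar balances these
// connections across servers, not individual requests, and each connection
// carries a single in-flight request at a time.
type conn struct{ server string }

// prebuildPool assigns a fixed number of connections to servers in naive
// round-robin order, mirroring how the pool must exist before traffic flows.
func prebuildPool(servers []string, n int) []conn {
	pool := make([]conn, n)
	for i := range pool {
		pool[i] = conn{server: servers[i%len(servers)]}
	}
	return pool
}

func main() {
	// e.g. four pre-established connections spread across two servers
	for _, c := range prebuildPool([]string{"svc-a", "svc-b"}, 4) {
		fmt.Println(c.server)
	}
	// prints svc-a, svc-b, svc-a, svc-b
}
```

Because the pool is fixed at startup, every new client adds its full complement of connections, which is why connection counts grow linearly with headcount, as we'll come back to later.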
Client-perceived latency is very low, and it runs out of the box without any additional configuration. Meanwhile, HTTP and REST have started to play a more and more crucial role in our infrastructure, and we see about 20 or so of those services in production today. These systems tend to have very high latency when you communicate with them, and configuring them and getting them running actually takes a fair amount of work from our operations team, especially if you require TLS in your means of communication.

In its original implementation, boxcar was a library that your team would pull into your project. This pattern is often referred to as a thick client, and each web app embeds a small load balancer inside itself. There's one load balancer per service: if you have an account management service, it gets its own dedicated load balancer, and if you have a candidate data service, it gets its own load balancer as well. The problem with this solution is that as Indeed has grown over the last several years, we've needed to adopt new languages to stay relevant. With boxcar having only a few native library implementations, adding support for languages like Python and PHP became extremely difficult, and in some cases impossible. The service framework also required a lot of tribal knowledge to get started and was very slow to iterate on, so when we wanted to add newer features, we needed to wait several quarters for those features to roll out to all of our production systems. Additionally, it was hard to test locally: teams would have to spin up local proxies and make manually configured requests, things that were just very slow to get going.

As with any iterative solution, we looked back over our original implementation to see what improvements we could make. One thing we could do is decouple the boxcar implementation from our web application. This would allow us to write web apps in any language we wanted
to, and the load balancing would be encapsulated by the sidecar process. So that's what we started to consider: we went and developed a sidecar. This is a very popular pattern that's starting to emerge. Canonically, Indeed refers to these as co-processes, which is a little odd, but even companies like Microsoft, Vox and Netflix have all talked about co-processes running in their infrastructure as well.

With a sidecar we're able to solve a lot of the development toil that we encountered in the library implementation. While Indeed needs to continue to maintain backwards and forwards compatibility on the wire protocol, we're now able to control the release cycles a little better by manually deploying these sidecar processes. This makes it so we can have the sidecar pick up required features by a certain date and push the architecture a little further. Because our engineers are historically bad at naming things, we obviously named this co-process "sidecar". In its original implementation it would take an HTTP/1.1 request, translate it into a boxcar request, and then perform the boxcar request on the client's behalf.

By introducing sidecar we were able to solve some of these problems. Most languages have native HTTP clients, which made the sidecar really easy to communicate with. Clients no longer had to worry about the specific implementation details of load balancing, and languages like Python and PHP could easily get started with this system. The tribal knowledge requirement was reduced quite a bit: by encapsulating load balancing logic inside the sidecar, clients no longer needed to concern themselves with it and only needed to know how to construct a request to that process. But because we had written so many tools to test boxcar, we didn't really solve the local testing problem with this solution, and that's why development toil is left unmarked. Additionally, one of the other things that got introduced along with the sidecar process was some more
development toil. A custom library was written in Python that encapsulated the logic for speaking with sidecar. It took a protocol buffer file and generated a small footprint for the consuming application. This was yet another thing that we needed to maintain and iterate on as time went on.

So, acknowledging that Indeed has been growing and starting to adopt new languages and new technologies, we really wanted to reconsider our service architecture. Over the summer, a few of my friends and I went through an innovation rotation where we started to reimagine some of these components. An innovation rotation is a small three-month block where we're able to work on something that we find valuable to the company and then present our findings at the end of that block. In our innovation rotation we set out to do three things. One was to improve RESTful services at Indeed: make it so we didn't have as much configuration, and make it so we could iterate forward. The second was to support gRPC and HTTP/2 as means of communication. And the last was to evaluate service mesh opportunities.

By the end of the summer we had tested and monitored the overhead of an HTTP/2 connection, and now all of our Java processes support HTTP/2 as a means of communication out of the box. For the service mesh, we established the criteria we wanted in a solution, evaluated the various options on the market, and ultimately selected one. In our considered v2 architecture we wanted something that would look a little more like this. Obviously we need to continue to maintain boxcar and sidecar as legacy services, but ideally we'd move toward a proper service mesh with gRPC and RESTful based services in the end. Now, as we started to work on this v2, one question I asked myself was: how might we migrate our existing infrastructure to it?
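To keep the starting point in view while thinking about migration, here is a hedged sketch of the kind of translation the original sidecar performs: accept a plain HTTP/1.1 request from the local web app and relay it as a boxcar call. It uses only the Go standard library; `performBoxcar`, the `/service/method` path convention, and all names here are assumptions for illustration, not Indeed's actual implementation.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// parseTarget splits a path like "/acctsvc/getAccount" into the service and
// method being requested (the path convention is assumed for this sketch).
func parseTarget(path string) (service, method string, err error) {
	parts := strings.SplitN(strings.TrimPrefix(path, "/"), "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("expected /service/method, got %q", path)
	}
	return parts[0], parts[1], nil
}

// performBoxcar is a hypothetical stand-in for the real boxcar client call;
// the actual sidecar would serialize the payload with protocol buffers and
// send it over a pre-established, load-balanced connection.
func performBoxcar(service, method string, payload []byte) ([]byte, error) {
	return []byte(fmt.Sprintf("boxcar(%s.%s, %d bytes)", service, method, len(payload))), nil
}

// sidecarHandler translates an incoming HTTP/1.1 request into a boxcar
// request and writes the boxcar response back to the caller.
func sidecarHandler(w http.ResponseWriter, r *http.Request) {
	service, method, err := parseTarget(r.URL.Path)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	payload, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	resp, err := performBoxcar(service, method, payload)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	w.Write(resp)
}

func main() {
	// Stand the translator up on an ephemeral port and issue one sample
	// request, the way a web app's native HTTP client would.
	srv := httptest.NewServer(http.HandlerFunc(sidecarHandler))
	defer srv.Close()
	resp, err := http.Post(srv.URL+"/acctsvc/getAccount", "application/octet-stream", strings.NewReader("payload"))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // boxcar(acctsvc.getAccount, 7 bytes)
}
```

The point of the shape is that everything the client needs to know fits in an ordinary HTTP request to localhost; all the boxcar-specific machinery lives behind that one handler.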
And sidecar occupied a very interesting position in this process. But before we could leverage sidecar, we needed to make a few improvements. One, we needed improved performance: having multiple network connections to a single sidecar process was very crufty, and using a text-based protocol was non-optimal for the types of requests we were performing. Two, we wanted to remove the toil of adding new languages: no more custom libraries that people would have to write or maintain, no more code generation, none of that. And lastly, we wanted to keep in mind that we would treat this process as an intermediary as we migrated.

Going back and re-evaluating our original solution, a quick optimization comes out of the box from using HTTP/2. With it, all of our requests go over a single TCP connection. They're multiplexed, so we don't have to worry about spawning up additional connections as needed, and this makes much more efficient use of our network. The other thing it does is facilitate the use of a binary protocol rather than a text-based one, so we no longer had to base64-encode all of the requests we made when communicating with sidecar.

From there we sought to solve the client language problem. After understanding how sidecar took requests and relayed them to boxcar, it was really easy to go in and add gRPC support. Our sidecar process is written in Go, and Go's implementation of gRPC has a nice little feature called the unknown service handler. For those of you who don't know, the unknown service handler is invoked whenever a requested service is not registered on the target server. From there we can take the request, parse out components such as the service that we're calling and the method that we're invoking, and then relay it accordingly to the various boxcar services. With that solution we no longer needed to maintain our custom sidecar Python library. But
as I started to get more and more client languages supported at Indeed, I found adding all the dependencies to my local box to be a little crufty. Each language requires its own set of dependencies to compile and generate source code, and Indeed supports five out of the box: Java, Go, Python, PHP and Node. So I wanted to simplify this process a little and wrote a quick open source library that lets us generate code inside a Docker image. When invoking it from the command line, you can specify a whole bunch of different arguments, primarily the language, the source proto directory and the target directory you're generating into.

And, really, I don't trust demos, so we did a video. The grpc-gen-docker library has a demo branch. [attempting to play the video] Is it actually playing? Indeed alpha thord/grpc-gen-docker. It says it's playing; I don't believe it. All right, network buffering is not cooperating, so we'll go back. The nice thing is that this little script encapsulates the logic of pulling down the various Docker images for you, performing all of the code generation in the background, and then copying the built artifacts out of the Docker image. [audience question] Yep, we do all of our code generation inside a managed build system, so it's able to encapsulate some of those dependencies. But when you want to add new support, you don't want to go out to every build server and add every dependency. This goes and actually stands up the Docker images, can do it in parallel, pulls all of the information out, and then has all of your client libraries ready and available for you. I'm going to skip that video.

Like I said, as we evolve and iterate on sidecar, we want to keep in mind that we want to move to a service mesh later on. One of the key things to remember about a service mesh is that it helps encapsulate a lot of business logic that you don't want to encumber your applications with.
Things like circuit breaking are really easy to get configured in applications, but as you start to support multiple languages, you find every circuit breaking library has its own way to configure things, different feature support, and so on. Having one canonical reference implementation would be great, and service meshes give you that ability. In our considered v2 we wanted our system to look something like this: web apps make HTTP/2 requests to on-box Linkerd instances, and those Linkerd instances communicate with off-box Linkerd instances, ultimately targeting the destination service. In the traditional boxcar setup our web apps connect directly to the other web apps, so there's a little bit of teasing apart that we have to do here. By delegating all of our load balancing logic into our sidecar process, we now have smaller and dumber clients, and our pattern starts to look a little more like the service mesh we targeted originally.

From here we can A/B test both the service mesh and our existing solution: make sure performance is on par, make sure requests are being relayed completely. For certain services that are read-only, we can even do dark traffic tests, where we fork a request and just throw out its response. Ultimately we'd want to kill off our old sidecar and boxcar based implementations and settle on the Linkerd service mesh based implementation.

Again, revisiting some of the core concepts and benefits of using a service mesh: all of that business logic and key implementation details such as circuit breaking, load balancing and service discovery are encapsulated in one place. You don't have to write that flavor of library for every client language you want to support. Your request path is consistent, so whether you're writing an HTTP/2 or RESTful based service, you have knowledge of how the request was performed. You don't have to know the finer working details of every service implementation out there to debug it; you can hop right in
and understand how everything is flowing. Lastly, this gives you the ability to centralize visibility into these request flows. Things like OpenTracing and Zipkin play very nicely with this implementation, and you can see where requests fail along the way. By and large this is a very easy integration, especially for developers to understand: all of your communication is pointed at localhost, and when you're writing clients you don't have to think too hard about how things happen.

So where are we today? Indeed currently has gRPC support in sidecar and has three client libraries generating for it: one each for Go, Python and Node.js. We have a bridge layer that lets us continue to use boxcar generated code but perform all of our communication over gRPC. This will make it easy for our boxcar services to migrate over to this infrastructure later on: they can continue to use thick client options, or they can delegate all of that to the service mesh layer. One of the things still in progress is the full adoption of a service mesh. We're considering systems like MySQL, Redis and MongoDB as immediate adopters, and we have data teams and different development teams starting to stand up their own proxies for communicating with some of these various systems.

One of the things we got a little blocked on was how to handle gRPC in Java. Naturally, when you run your own Hadoop cluster internally, the question of how you deal with the proto3 library version comes into play. There are a few solutions out there; the grpc.io forums call out shading that library and handling it that way. I did a bit of a compatibility analysis between proto3 and proto2 serialization and found that they were roughly compatible with one another, without too many sharp edges. But the way we've been looking at things lately has been using things like OSGi, and shading at build time rather than during source code compilation. So let's
recap the slew of things that we have talked about. We talked about some of the inefficiencies in the boxcar framework as Indeed has grown and adopted new languages. Additionally, as I called out, boxcar requires a fixed number of connections, so as you hire 400 new engineers, you have 1,600 new connections to deal with. We talked about evolving sidecar to support gRPC and removing some of the toil of needing to communicate with existing boxcar services in production, while quickly adding support for new languages that we wanted to offer. We talked about how we can leverage sidecar as a means of migrating towards a service mesh. And lastly, we talked about the state of the world where Indeed is using gRPC today. Thank you.

Questions? [audience question about proto2/proto3 compatibility] It's not that it wasn't compatible. We found that, so long as you're still generating code using the proto2 compiler, serialization tended to play well across the two, as long as you didn't wind up with proto3-generated code. There's some more extensive testing that I wanted to do that I haven't gotten around to yet, but we've seen pretty good success with these Python and Node clients that are using our proto3-generated code and interoperating with our existing boxcar services that use proto2. Now, when you go into the Hadoop landscape, you're kind of forced into the bootloader that they have there, with proto2 at the bootloader layer versus proto3 at your application layer. So we wanted to make sure that when invoking certain aspects of gRPC we preserved the proto3 invocation path, and one of our teams has started using OSGi in that case: any time a client invokes code within your service package layer, we make sure we invoke gRPC and proto3 properly. And then shading, not at runtime but at build time, is another possibility. When we were
doing HTTP/2 based testing, there was very little overhead, especially for on-box HTTP/2 calls. Sub-millisecond, effectively; I had to go down to the nanosecond level to get any kind of rough timing around it. Obviously there's a lot more overhead on the initial connection establishment, but ideally your application isn't issuing requests in the critical path on that initial connection establishment. We have background dependencies that are constantly pinging these services and keeping these channels open for us, so we can ensure that when a request comes in, we have a pre-established connection ready.

[repeating an audience question] What drove the need to support multiple languages? Right. So our labs team is by and large an incubator; they're making new projects on a day to day basis. They use Python by and large, and the reason for that is speed of delivery: they're able to prototype an idea and get it out in a much smaller amount of time than if they were to use Java. Now, they need access to existing production data, so rather than rolling their own system, they wanted to talk to boxcar. Using something like gRPC to communicate with those existing services proved promising, because gRPC offers code generation across a large variety of languages. We don't need to worry too much about adding our own support and trying to build that into our timeline; a lot of this just comes naturally, and we can offer it quickly. I think I got the Node.js implementation working in a day or so, so there's not really any overhead there.

One more? Yes. We did not have to migrate any RESTful implementations over to using protobufs. We have a few HTTP APIs that use protocol buffers as their means of transport, but they also support things like JSON and XML. The Spring framework, if you're using Java, has a message converter built into it for converting requests based on protobuf, JSON or XML. And so you can use protobuf to
describe your request and response models, and then add that converter on both the client layer and the server layer to use protocol buffers as your communication, while also letting testers go and make natural JSON based requests. We use that for our navigation system. Sorry the demo didn't work; thanks, y'all, for coming.