So please welcome our next speaker, Soro. Soro will be talking about building state-of-the-art API gateways.

Hello everyone, am I audible? The talk I'm focusing on is how we can build API gateways for microservices that can be scaled out pretty easily.

So, who am I? I'm a student, also working with a startup right now to migrate their monolithic architecture to a microservices-based architecture. What we are facing there is that we have a number of services we are trying to build out, and if we integrate them directly with the devices, we would have to make 200 or 300 different API calls, one per service, on each and every device we support. That's time-consuming as well as wasteful, because it's pure redundancy: I call one service from my Android device, and now I have to implement the same call on the iPhone too. That's not what we want to do.

Before starting with API gateways, I'd like to note that the things which apply to scaling API gateways also apply to microservices, and vice versa.

So what exactly are microservices in this context? Say you have an application you are going to deliver, maybe an e-commerce site, and you create some services, each specializing in one particular task. For an e-commerce site like Amazon, you might have one service delivering the product page, one delivering the comments, one delivering the price, and one managing the cart for the customer. That's microservices: different services communicating with each other over the network, building a cohesive system, so that one service doesn't need to know how another service works internally. Two services communicate over a network protocol through well-defined API standards.

Here is a pretty simple microservice architecture: one client, three services, each with its own database. Why its own database? Because we do not want to share internal schemas. We want to keep some opacity between different teams, so that one team doesn't need to know the internals of another team's service; they just call it through the well-exposed APIs. We are building cohesive services, where each service specializes in one task.

That sounds good, at least under ideal conditions, when we are able to build completely cohesive services. So where do we find the problem? With three services, scaling out was easy: you just make three or four API calls from each device. But what if the number of services were 20, 200, 2000? (Don't hold me to the 20, it's an arbitrary number.) And then there are the different devices. Each device is different: there are hardware limitations, software limitations, and limits on what the platform supports. My mobile phone has a battery, but I cannot make 400 requests from it; that would kill the battery in a few hours. And then another problem is supporting a new device. A different device comes onto the market, we want to target it, the user base is quite large.
So now we have to write 400 API calls for that device too. That's not feasible. Or imagine one service crashes, or you want to change how one particular API call behaves: you will be changing that API call in each and every device you support, multiplied by the number of services you have.

That was the devices. Another thing is protocol support. Mobile devices are pretty good at supporting HTTP; the HTTP protocol is well supported. But what about other protocols, like AMQP for message queuing, posting messages over a message bus? That can be challenging if we have to support a number of different protocols. Say one of my services works over HTTP, and another communicates directly with the message bus through AMQP. Now that's an issue: we end up writing middleware to translate those AMQP calls to HTTP between the device and the server. That can be done, but it gets harder as the number of services grows. One day we have a service, the next day we want to retire it and put a new service in its place, and now we are rewriting the whole middleware. And all of this takes a toll on battery life, at least on mobile devices with very limited battery power.

API upgrades. Say we have made some major changes to our application, and now we have to support them on our devices too: rewriting each and every API endpoint on each and every device we support, probably introducing a lot of unhappy users and a completely filled-up Bugzilla.

Security. When you expose your services directly, you are also giving someone direct exposure to where each service is located and on which port it is running. You are creating a large surface where attacks can happen. One such issue is DDoS: each and every service being flooded with far more requests than it is able to handle.

Let me quickly recall the problems we are facing: it's difficult to support new devices; protocol support is hard; API upgrades can be challenging; and maintaining security is difficult. So what solution can we reach for? One solution is the API gateway.

So what exactly is an API gateway? We introduce a middleware which talks to the client, and whatever request it receives, it translates to the different microservices in our application. The client sends a request; the API gateway provides a single endpoint through which all the data is fetched from the different microservices, and a response is returned to the client.

What is an API gateway going to offer us? Why not call the microservices directly? First, we get a single entry point. The client no longer deals with each and every one of our microservices. If we are going to show a product page, that's it: slash host slash product. The API gateway is responsible for talking to each microservice, getting the data, and returning it to the client. We have offloaded that work from the client to the server.
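To make the single-entry-point idea concrete, here is a minimal sketch of such a gateway in Python, using only the standard library. The backend URLs and the /product endpoint are made-up placeholders, not anyone's real stack: one public endpoint fans out to the internal services in parallel and aggregates their answers.

```python
# Minimal API gateway sketch: one public endpoint fans out to several
# internal services and aggregates their answers into one response.
# The backend URLs below are hypothetical placeholders.
import json
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BACKENDS = {                      # internal services, hidden from clients
    "product":  "http://product-svc:8001/product/42",
    "comments": "http://comments-svc:8002/comments/42",
    "rating":   "http://rating-svc:8003/rating/42",
}

def fetch(url: str) -> dict:
    with urlopen(url, timeout=2) as resp:    # one internal call
        return json.loads(resp.read())

class Gateway(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/product":          # single public entry point
            self.send_error(404)
            return
        with ThreadPoolExecutor() as pool:   # call all services in parallel
            parts = dict(zip(BACKENDS, pool.map(fetch, BACKENDS.values())))
        body = json.dumps(parts).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), Gateway).serve_forever()
```

The client only ever sees /product on the gateway; the internal calls, their hosts, and their ports stay invisible.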
Another benefit is filtering by device type. On the desktop, you can be pretty comfortable displaying everything on a product page: user comments, the product details, complete descriptions. But on a mobile device there is a restriction on how much we can show the user in one go, so we might display just the product description, the number of comments, and maybe the average rating. That translates to maybe four different microservice calls on the desktop and two calls on the mobile. Through the API gateway, we detect the device and filter the request accordingly. As the gateway, I received a request from a desktop client, so I call four microservices — product, comments, ratings, and the cart service — and return everything to the client. Now I get a request from a mobile, so I deliver just the product page along with the average rating the product got.

Next, a mechanism for API rate limiting. It would be fine to support an unlimited number of people at the same time if we had infinite resources on our servers, but that's pretty much impossible, at least for the time being. So we limit the number of API calls a particular device can make, or, when our services are experiencing heavy traffic, we limit the number of API calls coming from clients. A possible use case is limiting calls during a DDoS attack: you are getting some huge number of requests which are of no use, so why make our services bear a load that isn't going to be useful? (There is a rough sketch of such a limiter after this section.)

And security mechanisms. When we implement an API gateway, we are practically hiding where our microservices are running. The client doesn't know where a particular microservice runs; it can be anywhere, and it might crash on one server right now and come up on another one. The API gateway is responsible for translating the calls, so the client doesn't have to worry where a service runs or how it runs.

So, the benefits we are getting. The first is increased simplicity, at least on the client side: developers building applications don't have to deal with each and every microservice the team has built. They just call a single API endpoint through which the data is fetched. Instead of calling 500 services, we might have 10 different API endpoints through which data can be requested and returned. Management also becomes easier: if we change some microservice in our application, we do not have to change the API endpoints. The client doesn't know what changed internally; what the client gets is the same experience, maybe faster, maybe slower, depending on the change.

Then, separation between the application layer and the requesting parties. We have separated the client-side developer from having to know what we have internally in our stack. Say I updated my application and it crashed. Whether the user gets to know that my application is not functioning properly internally depends on me: I can serve a response that was cached before the request came in, and the user doesn't see the downtime. We get this separation because the user never needs to know whether a service is running or how to deal with that particular service. Even if a service uses some protocol that isn't actually supported by the device, it still works: the device just makes a simple HTTP call to the API gateway, and the gateway does everything after that, translating the call to the proper service, possibly calling any number of services.
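Here is the rate-limiting sketch mentioned above: a minimal token-bucket limiter, one bucket per client. The capacity and refill rate are arbitrary example values, not recommendations.

```python
# Minimal token-bucket rate limiter sketch for a gateway.
# Capacity and refill rate are arbitrary example values.
import time

class TokenBucket:
    def __init__(self, capacity: float = 10, refill_per_sec: float = 1.0):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False            # over the limit: reject this request

buckets: dict[str, TokenBucket] = {}   # one bucket per client

def check(client_id: str) -> bool:
    return buckets.setdefault(client_id, TokenBucket()).allow()
```

At the gateway you would key the bucket on whatever identifies a device or client, and answer HTTP 429 when allow() returns False; during a flood, rejected requests never reach the services at all.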
There is also prevention from attacks. We reduced the exposure of our internal architecture: where the services are running, where our database currently runs. And we are limiting the number of requests a particular device can make, severely limiting the surface where an attacker can take advantage. Take authentication requests, for example: if we limit a client to five or six calls to our authentication service in an hour, we severely limit that client's ability to get some invalid authentication done, in case credentials were leaked.

So, getting API gateways to scale. Now we have a middleware which interacts with all our clients, and since it is the only point where requests arrive, we need to make sure this API gateway doesn't crash. Once it does, even the microservices won't be of any use, because who's going to talk to them? There are a few points here: fail fast if a failure occurs; limit the number of requests; caching and strategic upgrade mechanisms; and my favorite one, logging. Log each and every single thing.

So, failing fast. If you are facing 100 requests per hour, that's one thing. But when you have a million requests coming at you in a span of minutes, there's a good chance some services will experience higher load than others. Take an e-commerce app like Amazon: users are constantly querying products, but it is not necessarily true that they are logging in and authenticating at the same rate. In that case the product service might be seeing a high number of requests while the authentication service is not. But we have a single API gateway that calls different services, so our response will be as slow as our slowest service. If our product service responds in 2 seconds while the product's ratings are retrieved in 10 milliseconds, there is no way around it: we cannot deliver the response until the 2 seconds are up. That's going to be a negative experience for the user; no one likes to wait. Even when Amazon isn't loading, I quickly think, okay, leave it for now. And for user-facing services, that is going to cost the company a lot if they are earning from them.

So what can we do? We can limit our response times. If a service takes more than our specified limit — say we set a 20 millisecond cap on the maximum response time, and in this example the cart service is taking 60 milliseconds — we can cut that piece out of our overall response and deliver a partial response.
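Here is a minimal sketch of that budgeted fan-out, using Python's standard concurrent.futures; fetch() is just a stand-in that sleeps for a random time, not a real service call. Whatever hasn't answered within the budget is dropped and the rest is returned as a partial response.

```python
# Partial-response sketch: give the fan-out a fixed time budget and
# return only what answered in time. fetch() is a stand-in downstream call.
import random
import time
from concurrent.futures import ThreadPoolExecutor, wait

BUDGET_SECONDS = 0.020  # the 20 ms budget from the example above

def fetch(name: str) -> dict:
    # Stand-in for a real service call; sleeps 5-60 ms, like the cart example.
    time.sleep(random.uniform(0.005, 0.060))
    return {"service": name, "data": "..."}

def aggregate(service_names: list[str]) -> dict:
    pool = ThreadPoolExecutor()
    futures = {pool.submit(fetch, n): n for n in service_names}
    done, _not_done = wait(futures, timeout=BUDGET_SECONDS)
    # Stop waiting on stragglers and free their resources (Python 3.9+).
    pool.shutdown(wait=False, cancel_futures=True)
    return {futures[f]: f.result() for f in done if f.exception() is None}

print(aggregate(["product", "comments", "rating", "cart"]))
# Only the services that made the 20 ms cut-off appear in the output.
```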
But yes, you have to be careful with that. This can be done only if the service's response is not critical. Imagine building a microservice architecture for a financial company: the deduction happened in one service — the user's account was debited — and another service was supposed to deposit that amount into another user's account, but that didn't happen. There will be chaos, probably a legal case. So you have to be careful when you cut a response: it depends on whether the service is critical. For less critical ones, you can skip the response.

In a typical deployment you are running multiple clones of the same service, but in high-load scenarios all the clones may be experiencing heavy traffic, and you cannot scale any further. In that case the best approach can be to cut that response out and terminate the request so that the resources are freed. The advantage is that there is no more resource consumption for the original request, and the service can process the requests it already has. That may introduce a delay of 2 or 10 seconds for some users, but at least providing a partial response is better. If it was a critical service, it is far better to respond with "we are facing heavy traffic, service unavailable" and terminate the whole response. It really depends on the context of the service whose response you are waiting for.

[Audience question about where to check response times.] Yes, it is a good option to do this check at the API gateway itself, because every time you call some separate service to check the response time, you are introducing extra latency for that check, and that latency adds up to your response time.

On the security context: a first layer can be implemented by filtering requests. There might be a service which should only serve gateway requests; that can be enforced at the gateway side, accepting only gateway traffic for that particular endpoint. But if you are calling different services from your API gateway for one single endpoint — like the product endpoint I mentioned, which calls a product service, a rating service, and a couple more — you can also pass the request through and let each service filter whether it supports that request. That won't cost much time if the service processes requests quickly; if the service is heavy, there might be some delay, but because this type of filtering happens at the very start of the request, I don't expect any real delays from it.

The same fail-fast approach applies when a service you deployed fails because of an update. There's a chance you made a mistake — everyone makes them. In that case we can simply break the circuit and return an error code to the user. It's our fault, we are failing, but at least the user learns something. It's better that the user is told "we cannot provide your response right now, try again later" than making them wait for the next 30 seconds while we still fail to generate any response. So just fail fast, iterate, and get the service back online.
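A minimal sketch of that circuit-breaking idea, assuming fn is whatever callable hits the downstream service: after a few consecutive failures the circuit opens and we fail fast with an error instead of making users wait on a broken service, and after a cool-down we let an attempt through again. The thresholds are arbitrary example values.

```python
# Minimal circuit-breaker sketch: fail fast once a service keeps erroring.
# Failure threshold and cool-down are arbitrary example values.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args):
        if self.failures >= self.max_failures:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Circuit is open: fail fast, don't touch the broken service.
                raise RuntimeError("service unavailable, try again later")
            self.failures = 0          # cool-down over: allow a retry
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()   # open the circuit
            raise
        self.failures = 0              # success closes the circuit again
        return result
```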
Next, limit the number of requests. If everyone in this room threw requests at me at once, I wouldn't even be able to stand here. Every service, every piece of infrastructure has limits on how much load it can take. So as a developer you have to check whether a client is using your application as intended or abusing the application stack. If we cannot limit requests, there is a big chance that in very high-load scenarios — like when a sale is happening on an e-commerce site and it experiences very heavy traffic — the whole application will crash, and no user anywhere in the world will be able to access any service of that site. That is something no one wants.

Next: cache your requests. Why make the effort to query a service when we already know the response? In heavy-use scenarios there is a huge possibility that the same request was already made by some other client. Why go back to our service, re-request that data, increase the load, and only then respond to the user? That takes a lot of time. To reduce response times, we can implement caching directly at the API gateway: an incoming request is first checked against the cache, and if the results are there, they are returned from the cache itself without going back to the service. If they are not, we go to the service, take the response, put it into the cache, and return it to the user. The cache approach is certainly going to help a lot, reducing response times and probably making some users happy.

What we have actually implemented is caching at the API gateway level and caching at the service side as well. The gateway cache handles the common requests. We work with an HR company dealing with jobs: if a user is searching for jobs in engineering, the response will be pretty much the same for the next 10 to 20 requests, so why query the service? We respond directly from the API gateway. But if it's something specific to the user — say the user authenticated and clicked "I want to interview for this job" — we skip the gateway cache and go to the service, check there whether that particular request has a cache entry associated with it, and return that. It's a somewhat redundant approach, but it is helping us send some responses much faster: requests that could have taken around a second to answer, because there are a lot of requests happening at the same time, we have reduced to something like a 20 to 30 millisecond scale. Still, I'd say the better approach is to implement caching at the service side first, because the double layer adds extra effort; the gateway layer is something to add when you face very high-load scenarios. So that was caching.
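A minimal sketch of that gateway-side cache, as a simple time-to-live (TTL) dictionary in front of a stand-in fetch_from_service(); the 30-second TTL and the fake payload are assumptions for illustration.

```python
# Gateway-side response cache sketch: answer repeated common requests
# from memory instead of re-querying the service. TTL is an example value.
import time

TTL_SECONDS = 30.0
_cache: dict[str, tuple[float, bytes]] = {}   # key -> (stored_at, response)

def fetch_from_service(key: str) -> bytes:
    # Hypothetical stand-in for the real downstream call.
    return b'{"jobs": ["..."]}'

def cached_get(key: str) -> bytes:
    hit = _cache.get(key)
    if hit and time.monotonic() - hit[0] < TTL_SECONDS:
        return hit[1]                   # fresh enough: skip the service
    body = fetch_from_service(key)      # miss or stale: go to the service
    _cache[key] = (time.monotonic(), body)
    return body
```

User-specific requests — the authenticated "interview for this job" click — would bypass this cache entirely and go to the service, as described above.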
Now the deployment side, and everyone's favorite thing: testing. Test, test, test. Write unit tests for this function, for that function, for this endpoint, then check the integration. If it works, do some more testing. If it fails: fix, develop, test, repeat.

Next, delaying major version changes. It's usually fine to fix your API endpoints — say we had an endpoint which was not showing the intended behavior, serving only GET requests when we also wanted to handle POST requests; changing that is pretty okay. But deprecating old endpoints and introducing new ones on a very frequent schedule can be problematic, especially if you are a service provider and there are a number of developers working on the client side. They won't be happy if they have to redevelop their application and change the endpoints every month, just because you are developing at an insane speed. It's a better approach to delay your major API endpoint changes; that provides a far more stable environment for development.

Blue-green deployments. This is something I will focus on as well. Say we have made some pretty big changes and we are not sure whether they will work; they might fail. What we do is provision two identical environments: one running our previous, stable code (blue), and another provisioned with the updated code (green). We set up our request routers to send some requests to blue, the stable version, and the others to the green environment, the new version, and we log everything. Now we actually have time to detect failures at the new endpoints: did our new API gateway fail, or did it work? After some time, when we are sure the new gateway is stable and properly serving our clients, we switch environments: we discard the blue one, keep the green one, and repeat the process the next time we make a change. (There is a small routing sketch for this after this section.)

Logging. A number of things can happen: particular requests can fail, or something can work beyond your expectations, like a user trying to log in and getting access to your whole database. Logging tells you which service is failing and what request caused it to fail, and that is a very good approach when you are facing security issues, because you can actually find out which request was responsible, and who on earth the person behind it was. So yes: log your requests.

Microservices and API gateways are a pretty big world, with some of the same problems we face in any deployment. So what tools do we have to manage all this? Jenkins, for the testing part: continuous integration and deployments. As soon as you release a change, check whether it works, then deploy it; something like a blue-green deployment can work here, with the additional steps it introduces. Then there is Kubernetes, the helpful orchestrator: we can run multiple API gateways and scale them up as the number of requests increases. And Grafana, one of the tools we are using right now to make our metrics and logs user-friendly and catching to the eye. I love a good UI.
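Here is the small blue-green routing sketch mentioned above; the 10% split and the upstream addresses are arbitrary, assumed values. The router sends a share of traffic to the new environment and logs every decision, so failures can be attributed to a version.

```python
# Blue-green routing sketch: send a small share of traffic to the new
# (green) environment and the rest to the stable (blue) one, and log it.
# The split and upstream addresses are arbitrary example values.
import logging
import random

logging.basicConfig(level=logging.INFO)

BLUE = "http://gateway-blue:8080"     # previous, stable version
GREEN = "http://gateway-green:8080"   # new version under observation
GREEN_SHARE = 0.10                    # start with 10% of traffic on green

def pick_upstream(request_id: str) -> str:
    upstream = GREEN if random.random() < GREEN_SHARE else BLUE
    # Log the routing decision so failures can be attributed to a version.
    logging.info("request %s -> %s", request_id, upstream)
    return upstream

# Once green proves stable, raise GREEN_SHARE to 1.0 and retire blue.
```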
Docker: the API gateway is pretty much a separate service of its own, so we can run it in a completely isolated container so that it doesn't mess with other resources on our servers.

So, that was all from my side. Any questions?

[Question about blue-green deployments.] We have our blue and green deployments; there are some new features released in the green one which are not present in the blue one. The point is that some requests might fail, but at least some will pass and get results back to the user, so we are not going to annoy each and every user who uses our servers. If the new features work as intended, we shift them to the production side.

[Question about UI testing.] UX testing is something we are working on right now, and it's pretty much internal: when we release some new feature set, it's tested internally by the employees of the company, and if it seems stable it is made available in the production environment for live testing. We have been using blue-green deployments across something like seven different regions.

Any other questions? [Question, partially inaudible.] That's something I'm not aware of right now, actually, because that part was shifted to the quality assurance team. I guess we are not using it in testing right now, because we are pretty much in the middle of the shift to microservices, so that part hasn't been rolled out yet.

[Question: getting back to failing early — when latencies grow and a service gives no answer, why not detect that and drop the request at the load balancer?] So the question is: why not do that check at the load balancer side itself? We can do that; the load balancer can be used to check whether our response is late. But in that case, which side is the load balancer on: in front of the API gateway, or in front of the services? If we are delivering a partial response, a load balancer in front of the API gateway will turn it into an error for the whole request, whereas at the service side we can still deliver a partial response back to the user. So if you are okay with just giving an error code back to the user, then check the response times at the load balancer in front of the API gateway. But if you want to return a partial response that can still be useful to the user, do that checking at the API gateway: if one part of the result passes the deadline, get rid of it and send the rest out.

Any more questions? [Question: on that same point, do you establish service level agreements between the services and the API gateway?] So, the question is about service level agreements. Yes, we have them, because there are some services which depend on other ones. In such cases, even if we don't want to, we have to fail: we break the circuit so that at least we do not deliver a bad response to the user. A half-baked, improper response is as bad as making the user hang. So yes, we do that.
Right now we are actually building out a tool set which is able to deploy and manage all the microservice stuff directly: tracking where the services are running and rerouting requests to them. It's in a pre-production stage right now, but we are hoping to get it right soon. Any more questions? Then I guess that's the end of it. Thank you.