All right, hi everyone. Nice to see all of you interested on a Monday evening, so welcome. I'm Saranya, based here in Singapore, and I'm part of Microsoft's distributed database team, Azure Cosmos DB. That's my Twitter handle. How long do we have for today's session, about 30 minutes, give or take? All right, just prompt me and cut me off if you think we're over.

So today is really about giving you an overview of the world of distributed computing, the potential opportunities, and where its relevance lies. Personally, it's an area I've been fascinated with for the last five-plus years, just following Leslie Lamport, who is often called the father of distributed computing and who came up with multiple consensus algorithms and papers. It's a very interesting field if you want to pick it up for your weekend reads and hobbies. At the end of today's session I'll also give you some pointers to YouTube videos that have come out of US universities; because of COVID they've put entire distributed computing semester courses online. I went through them myself, and I'm sure you'll enjoy them as well.

Okay, so usually key takeaways come at the end of a talk, but I thought the Junior Dev audience, today's youngsters, would probably want to know what they're getting into for the next 25 minutes or so, so here's a quick rundown. Distributed systems are complex; they're hard to work with. The net-net starting point of this whole conversation is: don't choose distributed computing or distributed systems if you can avoid them. It's not something fancy or cool to go do. If you can run things on a single system, a single machine, a single node, a single database, a single queue, a single ledger, whatever it is, just do that. Distributed systems involve a lot of hard work that comes along as you go. They're chosen out of a necessity to scale. As you know, scaling is either scale-up or scale-out: scale-up has an upper limit, a ceiling, while if you architect well for scale-out, you know you have the option of adding more computing power and things just work like magic. So it's chosen out of a necessity of scale, and of price.

We'll also talk a little about the CAP theorem; those of you who are college students are probably familiar with CAP. And distributed computing encompasses various categories; we probably won't have time to cover each in detail, but top of the chart is databases and data stores. There are also compute nodes, like how do I run big data crunching; various kinds of distributed file systems and storage; messaging systems; ledgers; and applications as well. Even a stateless web application hosted on ten servers maps onto some sort of distributed front end, so to speak, in a very simplistic sense. So that's where we're heading today.

Okay, what is distributed computing? Distributed computing, or a distributed system, is a system running on several nodes. Nodes could be VMs, bare-metal machines, and so on.
Distributed systems are connected by a network, which means they're characterized by partial failures: failures in network bandwidth, or bad or malicious nodes on the network, and so on. They're also characterized by unbounded latency; you don't know when you're going to get the bits across the network, because the latency is not bounded. So the protocols we design for distributed systems need different kinds of parameters. By that I mean a global clock just doesn't work in a distributed system, because there's no concept of shared global time. Instead, what you have are vector clocks; we won't touch on this too much, but vector clocks can establish a precedence ordering of events, so the distributed system has an idea of the order in which things happened.

Simply put, in plain English for you and me, a distributed system is a group of computers working together that gives an end user the seamless feel of working with a single computer. They have shared state, each of these nodes operates concurrently and fairly independently, they can fail independently, and so on. So that's really what distributed computing is.

Why did it even emerge, and what's the real use case? As we said, scaling horizontally gives you a lot of ease: you can scale out by just adding another machine, some compute and storage power, and things work like magic. That's possible only in a distributed system, because then you can potentially scale both compute and storage as far as you want. But there's a math to it. On the second bullet here, heavy workloads: the graph shows a tipping point before which you don't really need distributed computing, or you're not going to benefit much, because the cost of setting it up will outweigh the ROI. It comes down to how much extra capacity is needed versus the price per capacity unit, and only beyond that tipping point does the solid line you see here become manageable; it's not the exponential curve you'd potentially see when scaling up a single system.

So those are the motivations for distributed computing, along with fault tolerance: how can I have disaster recovery and high availability baked into my core architecture, in the most basic sense, as opposed to having a backup DR setup as a separate service component? There's also tolerance at the network layer, for message and packet losses, and consensus concepts as well. Finally, latency issues can also be addressed by distributed computing. For example, because the system is distributed, users can access the computing endpoints closest to them, because you can't cheat physics, right? You can't beat the speed of light, if you know what I mean.
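Since vector clocks come up here, and again later when we talk about event ordering, here is a minimal sketch of the idea in Python. This is purely my own toy illustration, not anything from the slides: the point is that comparing two clocks gives a partial "happened-before" order, with no global clock needed.

```python
# Minimal vector clock sketch (illustrative toy). Each node keeps one
# counter per node; comparing two clocks gives a partial order.

class VectorClock:
    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.clock = [0] * num_nodes

    def tick(self):
        # Local event: bump our own counter.
        self.clock[self.node_id] += 1

    def send(self):
        # Attach a copy of our clock to an outgoing message.
        self.tick()
        return list(self.clock)

    def receive(self, msg_clock):
        # Merge: element-wise max with the message clock, then tick.
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.tick()

def happened_before(c1, c2):
    # c1 -> c2 iff every entry of c1 <= c2 and they are not equal.
    return all(a <= b for a, b in zip(c1, c2)) and c1 != c2

a, b = VectorClock(0, 2), VectorClock(1, 2)
a.tick(); b.tick()
print(happened_before(a.clock, b.clock))  # False: concurrent events
print(happened_before(b.clock, a.clock))  # False: concurrent events

b.receive(a.send())                       # message from a to b
print(happened_before(a.clock, b.clock))  # True: send precedes receive
```

Notice that two independent events compare as incomparable in both directions; that "concurrent" verdict is exactly what a global wall clock cannot give you reliably across nodes.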
Light is the fastest-moving thing in the universe that we've seen, and packet transfers are nowhere close to the speed of light. So how can we achieve millisecond latencies for responses from our websites and web applications? One way is to bring the computing power, both compute and storage, to the users, as opposed to taking the users to the data, which is how the traditional client-server model works. That really brings round trips down to very low latencies. Those are interesting concepts as we go.

So, for the Junior Dev audience, I drew up these pictures just before the session; I thought it was interesting and would add some spice. Essentially, over the next ten minutes, let's go through a practical example of a distributed database, just to pick one as an example, so we can understand what we're talking about in distributed systems generally.

Step one: consider a two-tier architecture. The front end could be a web or mobile app; the back end is the database, literally one node, a single system. Step two: let's assume the number of users and their read requests keep increasing. One thing we can do is replicate the back-end database into multiple copies, so users can read from whichever copy they want; that's your replication strategy. It's as simple as: I insert a property a1 with value 'a' into database A, but I can select the record where a1 = 'a' from database B or C. Now, it's a no-brainer that this very much exists today in the various NoSQL databases out there. The key concept here is that data is replicated; that is replication.

And what are the pitfalls, the practical issues, with replication? It's a trade-off, a big CAP-theorem trade-off, and that's where we're heading after this, so bear it in mind. If you're not familiar with CAP we'll do a quick run-through; if you are, then you know what I'm talking about. The practical issue is this: replication supports many readers, who can go to the various databases and read. However, in the previous picture you're assuming the writes all go to the central DB 1, while the reads are served from all the other copies. So DB 1 becomes a bottleneck, the limiting parameter, which introduces the next concept in the world of distributed stores: sharding, or partitioning.

So you probably want to partition DB 1 here on your screen into physical shards: partition one, partition two, and partition three. The web app then knows, via hashing, perhaps some sort of consistent hashing, which partition needs to be accessed for which partition-key ranges, and so on. The net-net is: if the value of a1 starts with a letter from A to K it goes to partition 1, K to P goes to partition 2, and P to Z goes to partition 3. This concept is called sharding. So in typical distributed computing you have sharding on one axis, and replication on another.
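To make the routing concrete, here's a toy router for exactly that A-K / K-P / P-Z example. The partition names and ranges are my own illustration, and the hash-based variant at the end hints at what consistent hashing refines (so that adding a shard doesn't remap most keys).

```python
# Toy shard router for the alphabetical example above (hypothetical
# partition names; real systems usually prefer consistent hashing).

import hashlib

def route_by_range(key: str) -> str:
    c = key[0].lower()
    if c < "k":
        return "partition-1"   # A..J
    if c < "p":
        return "partition-2"   # K..O
    return "partition-3"       # P..Z

def route_by_hash(key: str, num_partitions: int = 3) -> str:
    # Stable hash of the key (Python's built-in hash() is salted
    # per process, so we use a digest instead).
    digest = hashlib.md5(key.encode()).hexdigest()
    return f"partition-{int(digest, 16) % num_partitions + 1}"

print(route_by_range("apple"))   # partition-1
print(route_by_range("mango"))   # partition-2
print(route_by_range("zebra"))   # partition-3
print(route_by_hash("apple"))    # deterministic, more evenly spread
```

The web app (or a routing tier in front of the shards) runs exactly this kind of lookup on the partition key before every read or write.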
So imagine in this picture that you have four replicas of every piece of data for high availability, and you have three shards. Automatically you have about 12 physical nodes, and these now need to work as if they're a single machine. And that's before we even speak about global replication: if this is your high-availability setup in Singapore, and somebody in Australia says "I want the same setup here," then this whole setup needs to be replicated again in another location, and both of them now have to work as a single computing unit. It's fascinating to see how things work through all of this.

So that was a teaser to get you thinking about distributed computing, but no conversation about distributed computing is complete, or even properly started, without the CAP theorem. C stands for consistency: if your data is replicated in two databases, is the data exactly the same, or, since replication takes time, how much lag is there? Are they inconsistent? That's your consistency. A is availability: is your system going to be available, or is it going to say "wait, let me complete the full replication before the client gets the ack back saying your transaction is done"? In simple English: you go to Facebook and post a comment, and the application is immediately available for you to do something else; that's availability being prioritized. On the other hand, if the system first writes to all the other replicas to ensure the data is fully replicated before the application accepts the next user input, the application is less available. I'm just giving you the basics here. And P is partition tolerance: the system keeps operating even when the network between nodes is partitioned.

The CAP theorem basically says you can have at most two out of these three. You can have, say, consistency and availability, but you have to trade one off; you can't have all three. Now, what's interesting is that in a distributed computing world, the very fact that the system is distributed means partition tolerance is a given; you have to tolerate partitions, that's why it's a distributed system. So your trade-off is really between the consistency of your data and how available your data is.

For example, here we have three regions, West US, East US, North Europe, just for the sake of conversation, and in each of them you have four copies, four replicas, of your storage, with that set of four replicated across the three regions. This is exactly the story I was telling. Now, say there's a value 5 for some item, and at the start all the regions have that same value 5. A user comes and updates 5 to 6. Of course that needs to be propagated to the other regions; say it propagates to the other part of the US but fails in the Europe region because a network disruption happens. So one region has 6 and another has 5.
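Here's a tiny simulation of that scenario, again just my own toy illustration: the write propagates everywhere except a region cut off by a partition, and an "available" system happily serves the stale value from the cut-off region, which is exactly the trade-off the theorem is about.

```python
# Toy illustration: a network partition drops one region's update,
# so what users read diverges by location.

replicas = {"WestUS": 5, "EastUS": 5, "NorthEurope": 5}

def write(new_value, partitioned=()):
    # Propagate the write to every region except partitioned ones.
    for region in replicas:
        if region not in partitioned:
            replicas[region] = new_value

write(6, partitioned=("NorthEurope",))
print(replicas)  # {'WestUS': 6, 'EastUS': 6, 'NorthEurope': 5}

# Prioritizing availability: answer from the local replica anyway.
print(replicas["NorthEurope"])   # 5  <- a stale read
# Prioritizing consistency: refuse or stall until the partition
# heals - the other horn of the CAP trade-off.
```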
Eventually, what do the users in region B, the one that missed the update, read? They will go ahead and read 5, while the users in region C read 6, and the actual value is 6. This is what we call inconsistency. So the question, as I said, is how you prioritize. Do I wait for all the regions to be updated and acknowledgments received back at the leader before the leader confirms the write? That would be a strongly consistent system; that's strong consistency. On the other end of the spectrum is what you call eventual consistency, where you know the system will eventually be consistent, but at any point in time you may be reading stale data. There are interesting papers on various other consistency models in between, such as session consistency, where every user gets monotonic reads and monotonic writes; by that I mean whatever I write, I will read back exactly that value within my session. And there are various others, some closer to strong, some closer to eventual, and so on.

But the intent of this diagram is not to catalog inconsistencies, interesting as those are; it's to show that the CAP theorem tells us that whenever, for example, a network disruption happens, you have to trade off how consistent you want your system to be against how available you want it to be. In reality, though, most of the time the network is working fine; there's no partition, it's just that packets take time, things are slower. There's no network disruption, but there is network latency. So there's another interesting academic formulation called PACELC. It basically says: given that Partitioning happens in distributed computing, when a partition occurs your trade-off is between Availability and Consistency; Else, when there's no partition, your trade-off is between Latency and Consistency. It's not about availability then; it's just that your system is slow, and you don't want your system to be slow. Those are the trade-offs people have to work with and tune, and that's a very core point of distributed computing. I took a database as the example, but this really applies across all systems and computing components.

And here's a reused slide of mine; it says Cosmos DB, but today's intent is not to talk about Cosmos DB. I kept it because it brings the terminology into one picture. We're all familiar with the SQL world of relational data, governed by the ACID properties: atomicity, consistency, isolation, durability. NoSQL components like MongoDB, CouchDB, Cassandra, Cosmos DB, DynamoDB and so on are governed, and this is not 100% true of all of them, by what we call BASE: Basically Available, which speaks to replication; Soft state, meaning the data is transient in a way; and Eventually consistent, the system becomes consistent eventually, depending on the level of staleness you're willing to accept. And you can set that staleness level to say, I'll tolerate data that's at most 10 seconds stale, or at most 20 operations behind; those kinds of interesting trade-offs can also be brought in.
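To get a concrete feel for session consistency, here's a toy read-your-writes sketch using a session token. The API is entirely hypothetical, just the shape of the idea: a write hands back a version token, and reads in that session are served only by replicas that have caught up to it.

```python
# Toy session-token sketch of "read your own writes" (hypothetical
# API, loosely inspired by session consistency as described above).

class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, value, version):
        self.version, self.value = version, value

class Store:
    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.version = 0

    def write(self, value):
        self.version += 1
        # Simulate replication lag: only replica 0 applies it now.
        self.replicas[0].apply(value, self.version)
        return self.version                 # the session token

    def read(self, token):
        # Serve only from a replica that has reached the token.
        for r in self.replicas:
            if r.version >= token:
                return r.value
        raise TimeoutError("no replica caught up yet; wait or retry")

store = Store(3)
token = store.write("v6")
print(store.read(token))   # "v6" - read-your-writes in this session
```

Other sessions without the token may still read stale replicas; that's what places session consistency between strong and eventual on the spectrum.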
What is interesting is that consistency in the SQL world is really about a single node: data durability and data integrity within a transaction commit. In the distributed world, consistency is across various nodes; it's not about integrity inside one transaction, it's about multiple nodes and the consistency of the system as a whole. And then there's this interesting newer world called NewSQL. If you haven't heard of it, I encourage you to go take a look: it's about bringing the goodness of the SQL world into the NoSQL world. NoSQL has its own plus points, but how can I add certain ACID characteristics on top? That's just a plug, a term for you to go look up later.

Trying to be cognizant of time, but for completeness we have to bring in the other categories of distributed computing. What we saw in today's session was really about databases and data stores, because any time you look up distributed computing those are the first examples that hit you in the face, and that makes total sense: data is the heartbeat of any system, data is king. How can I access my data in a consistent, reliable, efficient, fault-tolerant fashion? That's the first piece.

Outside of that, there's distributed compute. In the older world we had algorithms like MapReduce. You don't need to unlearn that, but there are newer architectures, Lambda architectures and Kappa architectures, coming up now. The point is: how can I crunch my data across multiple nodes efficiently? For example, in the Lambda event-driven architecture we talk about a batch layer and a speed layer. The batch layer takes your data and compute requests in batches and processes them through a cold path; the speed layer is about real-time data ingestion, events firing as triggers and responses being spit out through a hot path. Those are situations where your compute channel is distributed; that's your distributed compute.

You'll also come across distributed file systems. HDFS, the Hadoop Distributed File System, was pretty prevalent over the last decade. This decade I'm seeing IPFS more, using consensus protocols for file-system coordination; those are interesting concepts to look at as well. Distributed messaging is another category that's got its own niche: multiple channels, producers publishing to queues and subscribers reading from their respective queues. Kafka topics, Kafka in general, is very prevalent in this space, and various other messaging queues are available.
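Since topics and subscriber queues come up here, this is a toy in-memory topic with per-subscriber offsets. It's just the shape of the publish/subscribe idea, not the real Kafka API; with real Kafka you'd use a client library and a broker cluster.

```python
# Toy in-memory topic with per-subscriber read offsets, illustrating
# the publish/subscribe shape described above (not real Kafka).

from collections import defaultdict

class Topic:
    def __init__(self):
        self.log = []                       # append-only message log
        self.offsets = defaultdict(int)     # consumer -> next index

    def publish(self, message):
        self.log.append(message)

    def poll(self, consumer):
        # Each consumer reads from its own offset, independently.
        start = self.offsets[consumer]
        self.offsets[consumer] = len(self.log)
        return self.log[start:]

orders = Topic()
orders.publish({"id": 1, "item": "book"})
orders.publish({"id": 2, "item": "pen"})
print(orders.poll("billing"))    # both messages
orders.publish({"id": 3, "item": "lamp"})
print(orders.poll("billing"))    # only the new one
print(orders.poll("shipping"))   # all three - independent offset
```

The append-only log plus independent consumer offsets is why many subscribers can read the same stream at their own pace without coordinating with each other.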
Finally, last but not least, distributed applications. Even the microservices-based architectures we see are classic examples of distributed applications: stateless front-end applications, load-balanced across application nodes, with context shared across nodes, and so on. I can speak from the Microsoft side, because many Microsoft services use Service Fabric, which is a microservices-based load-balancing and clustering unit. I'm from Cosmos DB and we use Service Fabric; our IoT teams use Service Fabric, and so on. But there are distributed application frameworks available outside as well. A common example for the open-source folks is BitTorrent: there you can see computing distributed across various peer nodes performing various tasks. I didn't have time to go through that, but it's an interesting topic as well. And last but not least, distributed ledgers: blockchain, blockchain-based data storage. That's also coming up big time; I won't go into that either. But if you have questions, I'm happy to take maybe one or two; I've been talking for a while now.

I just wanted to leave you with the thought that there's a lot to learn here; we've only touched the tip of the iceberg. If you're interested in the technical nuances, this is your page. You can go look at how consensus algorithms work; how consistency models are sorted out; what the various replication strategies are; what the fault models and failure-tolerance limits are; and then how messages are broadcast across networks and how broadcast events are ordered. For example, in the distributed world, just because a node receives an event doesn't mean it delivers that event to the system. The node receives the event, looks at the vector clock attached to it, and checks, for example, whether its own clock is lower than the clock on the received message; I'm simplifying here. Based on that, it either delivers the event, which is like getting a letter in your actual mailbox and opening and reading it, or it holds the letter and says, "I won't open this until I've received the previous letter," so things are read in the right order. Those event-ordering concepts are very interesting. And then there are latency considerations and trade-offs. Those are really the technical nuances.

There are also interesting business conversations to look out for in the distributed computing world: which industries have more need than others, and which scenarios and user personas need it most. Because, as I said, I want to conclude by saying you do not want to use distributed computing when you can avoid it. If there's any chance of getting the ROI you want without entering the world of distributed computing, the good choice is to stay away from it. But in today's world, with huge amounts of data being assimilated, huge numbers of users, real-time data access being expected, and public clouds built on distributed systems, it's a very interesting topic to keep tabs on. So, happy to take questions. Thank you.

Thanks so much, Saranya. Folks, we have maybe five minutes for questions. I think there's one from Avis: just to get an expert opinion from outside, what's your opinion of the SAP HANA database? Does it live up to its hype? Does it scale out well? What products would you use it for?

Yeah, so SAP HANA is in-memory, but at the same time they run on really beefy machines. They try to bring in, without saying so, this concept of HTAP, hybrid transactional/analytical processing. The market share of the various kinds of modules, CRM and ERP modules, that can run on HANA makes it so useful to host on HANA: you can do CRUD operations on them, and at the same time you can do batch analytical operations on them. Very good store.
Very expensive, though; it's more of an enterprise offering, as you probably know. So yeah, good one. Thanks. I hope that answers your question, Avis.

From Jason: he's curious whether microservice applications will be the norm in the future, something worth investing in, or just hype at the moment. No, it's not hype. Jargon can sound like hype, but if you open the jar, peel the onion and see what's actually inside, it's pretty simple and basic to understand. We went from client-server to service-oriented architecture and now to microservices architecture, and the jargon can be daunting. But essentially, in service-oriented architecture, whole business processes, business units, and functionalities are separated and isolated into separate systems. In microservices architecture, the separation is more granular. By more granular I mean down to a single module: in your online retail application, say, you have a product catalog system, a checkout system, a social sharing system; you do a lot of things on that website. In microservices, the intent is that each of these smaller modules can be deployed and load-balanced on its own container nodes. The interesting additions are things like a health checker, a heartbeat-and-revival checker, and some data structures at the back end to manage the stateless and stateful workers so that you can allocate work. It's going to be there, for sure; it will definitely be there. The thing is, it might become easier: a lot of it might be abstracted away and you'll have APIs for it. But it's not going away.

Maybe, Michael, that's a good topic for us to pick up in the next meetup: let's talk more about microservices and domain design. Sounds good. Cool. If anyone wants to talk about it, reach out to Michael and me. All right, any other questions for Saranya? You can always reach out to her on Twitter; her handle is on the screen right now.

Yes, yes, sure. Let me stop sharing. Thank you for the opportunity, Michael, nice to know you and all these folks on the call. Good luck, and lovely to be a part of this conversation today.

Thank you so much for coming and speaking to us about distributed computing, this broad world of so many words that we'd only heard in passing so far, now in a little more detail. It was really good. Thank you so much. And with that, maybe we can move on.