I am Nagesh Kamali. I am from the Apology Group. There was a mention of Sunbird, and people will be working on Sunbird, so I decided to give the same presentation I gave a while back, since it would be useful. I have another presentation on my own projects, which is different. I am not doing anything with Sunbird myself, but it is essential that you know how systems are built nowadays. You may have to learn many systems that already exist; you cannot keep building systems the way you have been building them all along. You have to know why such systems are useful nowadays and how they can be made scalable.
Some of these slides were not there last time, which, as I said, was about a month back. I tried to gather some information on Open edX. You might have seen what Open edX is, and you must have felt how it behaves when you access a course or do something with it. You must have found that it is a little slow sometimes and nicely responsive at other times, so there are some problems.
If you look at the philosophy of Open edX, you have a traditional course structure. You have courses in the university, and now they have refined those courses and want to give them out to the whole world, saying that these are the best courses. They come from thousands of universities across the world; even IIT Bombay is participating. We generally call these xMOOCs: the x stands for extension, an extension of existing courses that are well refined and of very good quality. The third point is that the groups are closed. If a course ran, say, two years back, you will not be able to go through it again unless you had already registered or something like that. The group, that is, the people communicating in the forum, lasts only for the duration of the course.
If you are on Facebook, your account is active now and will be active for another 10 years or 100 years, right? That means your group is still there. In Open edX the group is limited to the course, and that makes sense, because in a classroom you do not interact with anybody outside the classroom; you interact with the people around you. That is the philosophy.
Sunbird follows what we call cMOOCs, which is the connectivist approach. On YouTube, people generate content, and you must have come across very good content there. The task of generating content is given to the public, and then, if I am an expert, I can aggregate all of this content and maybe create a course around it. So the philosophy is slightly different: I do not generate all my content, but I aggregate what already exists and make it available in the form of a course, a module, or anything like that. Courses can also be generated in this fashion.
There was a site, not exactly an experiment, called Connexions, cnx.org; you may want to write it down, cnx.org. You must have had a faculty member in your class who has written a book himself, and people refer to that book, so it becomes a kind of general textbook. Connexions started at Rice University. Sidney Burrus had written almost 10 books on signal processing, and he said that nobody understood his textbooks. So the site had students write the book for a particular course. The students themselves wrote those books, and that is how they came to understand the material, because you often do not understand material written by somebody else.
So there is a facility where you can create your own content, collaborate across any number of students, and give something back to the community. That is the philosophy, and it is more of a connectivist approach: collaborative development of a book. It also builds communities, and you can then have a large number of communities.
Now look at the architecture. This talk is based on the architectural differences. I am focusing more on the Sunbird architecture because it is the more recent one. I did a little study comparing Open edX and Sunbird, and I have put some points here.
Open edX looks modular, given the way it started. They were building it component by component, but what has happened is that the code has become somewhat monolithic. In the beginning they did not have the philosophy of putting things out in the form of services. Later they evolved towards pulling code out into microservices, but it is still rather monolithic. There are strong data dependencies between different modules in the system, and it is highly coupled; if you go through the code, you will understand why.
It is also built around technical concerns rather than business concerns. I have taken some of these points from Open edX conferences, where they themselves stated them. By technical I mean that you describe things by their technicalities, for example a data analytics pipeline. Once I call it a data analytics pipeline, I cannot generalize it at all; it stays technical. Instead, you could say that it is a pipeline for some activity.
Then the focus is on the business, but if the focus is more on the technicalities, it is more of a technical concern. Open edX is extensible; it is a pluggable framework. LTI is Learning Tools Interoperability; if you have not come across LTI, you can search for it, there is a whole lot of material. In fact, Shukla Nag listed a lot of things about Open edX, and those are there as well, along with SSO and asynchronous processing. Refactoring of code is required if you are going to work with Open edX, so if you are going to change some code in Open edX, it may take a long time.
If I compare this with Sunbird, Sunbird has only just evolved, and what they have done is slightly different from the way systems are usually built. The most important thing is concurrent computation, where you do not lock the data source at all. You just allow things to run concurrently, and that is where you gain some performance. The systems we have today are multi-core architectures, so you need to take advantage of all of these cores and make the computation available for the resources your system requires.
Sunbird is also built around microservices. If you do not know microservices, just go and Google it. The idea is that you build small services; even if you want one single service, you break it into small pieces and come up with individual services, with REST API connections between them, so that services do not call each other's code directly. You can write your service in any language. It does not matter whether it is Python, Java, or anything else; you write it in any language, and with REST APIs everything is going to work.
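
To illustrate the point about services talking over REST regardless of implementation language, here is a minimal sketch using only the Python standard library. The course data, the `/courses/<id>` route, and the names `ContentHandler`, `start_service`, and `fetch_course` are assumptions made for illustration; they are not Sunbird's actual API. The caller sees only HTTP and JSON, so the service behind the URL could equally be written in Java or Scala.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data for an illustrative "content service".
COURSES = {"cs101": {"title": "Intro to Programming"}}


class ContentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Route: GET /courses/<id> returns the course as JSON, else 404.
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "courses" and parts[1] in COURSES:
            status, payload = 200, COURSES[parts[1]]
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_service():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), ContentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_course(port, course_id):
    # The client knows only the URL and the JSON format, not the server's code.
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", f"/courses/{course_id}")
    resp = conn.getresponse()
    data = json.loads(resp.read().decode("utf-8"))
    conn.close()
    return resp.status, data


server = start_service()
status, course = fetch_course(server.server_port, "cs101")
missing_status, _ = fetch_course(server.server_port, "unknown")
server.shutdown()
server.server_close()
```
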
Then there is message passing at the code level. Have you done functional programming? Anybody here? Which language: Scheme, Haskell? Anybody, Haskell? Yes. So these are the things it carries over, and these are the frameworks available today; there are many, by the way. If you are building a front end with this type of framework, there is one called Elm. There are other languages available as well, such as Clojure. WhatsApp, which is part of Facebook, uses Erlang, I think; they do not use Java, and they do not use Python. That is why you find it is highly scalable.
If a service shuts down, they do not bother to write code that does exception handling and all those things, so that time is saved. If a service crashes, it crashes, and then it restarts and everything starts again. These are things you will find in this type of technology that you have not come across before. Previously you were used to writing a lot of exception handling in Java, C++, and so on, and you might never have done what is called message passing. I have a function here and a function there, and I communicate between them using messages and nothing else; I do not make a function call. This is slightly different from the way you have been writing systems earlier.
Another language that is well received in the industry, which matters if you go to industry tomorrow, is Scala. Scala is very promising. You have heard of Scala? Yes. Scala runs on the JVM, the JVM runs everywhere, and you can write Scala in a procedural style, an object-oriented style, or a functional style. So you have a lot of flexibility when building systems with Scala. Let us explore some of the components.
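
Before moving on to the components, the message-passing and crash-and-restart style described above can be sketched in a few lines. This is only an illustration in Python with threads and queues, not how any particular actor framework is implemented; the names `worker` and `supervisor` and the division task are hypothetical. The worker contains no exception handling; a separate supervisor simply restarts it when it crashes.

```python
import queue
import threading


def worker(inbox, outbox):
    """Process messages from the inbox; deliberately no error handling."""
    while True:
        msg = inbox.get()
        if msg == "stop":
            return
        # A message of 0 raises ZeroDivisionError and kills this worker.
        outbox.put(100 // msg)


def supervisor(inbox, outbox, restarts):
    """Run the worker; if it crashes, record the restart and start it again."""
    while True:
        try:
            worker(inbox, outbox)
            return  # the worker stopped cleanly
        except Exception:
            restarts.append(1)  # the bad message is dropped, the worker restarts


inbox, outbox, restarts = queue.Queue(), queue.Queue(), []
thread = threading.Thread(target=supervisor, args=(inbox, outbox, restarts))
thread.start()

# The sender never calls worker() directly; it only posts messages.
for msg in (5, 0, 20, "stop"):
    inbox.put(msg)
thread.join()

results = []
while not outbox.empty():
    results.append(outbox.get())
```

The second message crashes the worker, yet the third is still processed after the restart, so the sender does not need to know that anything went wrong.
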
They have something called Cassandra. Cassandra is a key-value store kind of system with a peer-to-peer architecture. Earlier we used to put everything in databases. You said that everything should go in a database, and you planned well in advance how it was going to be structured. But you cannot structure things so well that the system will still be useful after another 20 or 30 years; the structures are changing, everything is changing. So what do you do? You put very little information in structured form. You know what big data is, right? You do not force everything into a structure up front; you devise new schemas as and when you need them. The core data still goes into MySQL, Postgres, or some other database.
Cassandra itself came from a combination of two architectures: Dynamo and Google Bigtable. Dynamo is a peer-to-peer key-value store. Peer to peer means there is no master; there is no master-slave architecture, so any node can become a master. There are algorithms for doing that. I do not remember all of them, but if you are interested you can learn them; a very recent one is the Raft algorithm for election, which decides who is going to be the master. If you have 10 different nodes and you want to know which one is going to be the master, all the nodes send out packets to each other, and one of them gets elected. So now you have 10 nodes there, not one or two. If you put in MySQL, you have just one node, and that is a kind of bottleneck.
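
The election idea mentioned above can be illustrated with a heavily simplified sketch. This is not the Raft algorithm itself, only its majority-vote core: the node whose randomized timer fires first stands as candidate, every reachable node grants it one vote, and it becomes the leader only with votes from more than half of all nodes. The function name `elect_leader`, the `down` parameter, and the timeout range are assumptions for illustration.

```python
import random


def elect_leader(node_ids, down=frozenset(), seed=0):
    """Run one simplified election round and return (leader_or_None, votes)."""
    rng = random.Random(seed)
    up = [n for n in node_ids if n not in down]
    # Each reachable node waits a random time; the first timer to fire
    # makes that node the candidate for this round.
    timeouts = {n: rng.uniform(150, 300) for n in up}
    candidate = min(timeouts, key=timeouts.get)
    votes = 1  # the candidate votes for itself
    for n in up:
        if n != candidate:
            votes += 1  # each reachable node grants its single vote
    # A leader needs votes from a majority of ALL nodes, reachable or not.
    majority = len(node_ids) // 2 + 1
    return (candidate if votes >= majority else None), votes


nodes = list(range(10))
leader, votes = elect_leader(nodes, down={7, 8, 9})
no_leader, few_votes = elect_leader(nodes, down={0, 1, 2, 3, 4})
```

With three of ten nodes unreachable a leader is still elected; with five unreachable nobody can gather the six votes required, so the round produces no leader.
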
In such systems you have high data replication. Data is well replicated, maybe by a factor of 3 or 4, depending on your needs, and sharding happens across the nodes. In Dynamo you have eventual consistency. You know what consistent means, right? Your system is consistent if, when any transaction is performed, the data goes into the system and, when you access it a second time, you get the updated value. Those are called strongly consistent systems. In fact, Google's Bigtable is strongly consistent.
Dynamo falls under what is called AP, that is, availability and partition tolerance. If you are using Google, it is consistent, because you get the right results, and it is highly partitioned, but it is sometimes not available. Have you noticed that? Have you noticed Google sometimes not being available? I have noticed it. If you are using Dynamo as a product, the trade-off is different. I am telling you this because you cannot have all three in a distributed system; it is impossible. You can have only two of C, A, and P: consistency, availability, and partition tolerance.
Google's system is master-slave. I will not go through each of these; you can read the papers listed, which may be beneficial if you want to do research in data storage architectures and things like that. They have what is called a gossip protocol. One protocol I told you about was the Raft protocol; there are many protocols available. And they use what are called sorted string tables, or SSTables. Let me go through it. If you look, the read path and the write path are totally different; they do not share the same path.
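
To make the replication and eventual-consistency idea above concrete, here is a small simulation. It is a sketch under stated assumptions, not Dynamo's or Cassandra's actual protocol: the classes `Replica` and `Cluster`, the single global version counter, and the explicit `propagate` step are all hypothetical simplifications. A write is acknowledged after reaching one replica, so another replica can return a stale value until replication catches up.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Keep only the write with the highest version (last write wins).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    def __init__(self, replication_factor=3):
        self.replicas = [Replica() for _ in range(replication_factor)]
        self.pending = []  # replication messages not yet delivered
        self.version = 0

    def write(self, key, value):
        # Acknowledge once one replica has the write; queue it for the rest.
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, self.version, value))

    def propagate(self):
        # Deliver the queued writes, as background replication would.
        for replica, key, version, value in self.pending:
            replica.apply(key, version, value)
        self.pending.clear()


cluster = Cluster(replication_factor=3)
cluster.write("course", "v1")
cluster.propagate()
cluster.write("course", "v2")

fresh = cluster.replicas[0].read("course")   # already sees the new write
stale = cluster.replicas[2].read("course")   # still sees the old value
cluster.propagate()
converged = [r.read("course") for r in cluster.replicas]
```

The system stays available for reads on every replica throughout, at the price of the temporarily stale answer.
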
If you look here, Cassandra writes the data to an in-memory table, the memtable, and it is flushed later; it is not flushed to the SSTables immediately. When you write, you also maintain a commit log, so if your system fails you can recover everything from the commit log. The commit log has to be written somewhere, so your system has to guarantee that certain portions of the hardware do not fail. So you have a write path like this, and then you have a read path. When you read, you read from the SSTables, and you also read from the memtable, because recent writes are still in memory. You combine these two, merge them here, and get the final result. This is totally different: if you explore any traditional database architecture, it would not look like this.
Look at the traditional RDBMS web-tier architecture: there are a lot of servers here and a lot of middleware servers (you are probably aware of the term middleware), and then there is a bottleneck, because I have just one RDBMS there. So people started building what are called parallel databases, if you have heard of them. Previously people were building distributed databases based on an actual RDBMS, and then they started moving towards parallel systems, but not much work was done in that area. There are still cluster components available, such as clustering for MySQL, where you get a cluster architecture rather than a single database. But what you see here is very different from what you can get out of that. Here you have many nodes, and one of them is a master.
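
Returning briefly to the write and read paths described above, a toy version can be sketched as follows. It is an illustration only, not Cassandra's storage engine: the class `TinyStore`, the `memtable_limit` parameter, and the choice to drop commit-log entries once they are flushed are simplifying assumptions. Writes go to a commit log and an in-memory table; a full memtable is flushed to an immutable sorted table; reads check memory first and then the tables from newest to oldest.

```python
class TinyStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # durable record of writes not yet flushed
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # Write path: log first, then memory; disk tables are not touched.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted table, then start afresh.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed writes no longer need the log

    def read(self, key):
        # Read path: merge memory with tables, newest data wins.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def recover(self):
        # After a crash the memtable is gone; replay the commit log.
        for key, value in self.commit_log:
            self.memtable[key] = value


store = TinyStore(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)      # memtable is full, so it is flushed to a table
store.write("a", 3)      # newer value lives only in memory and the log
before_crash = (store.read("a"), store.read("b"))

store.memtable = {}      # simulate a crash that loses memory
after_crash = store.read("a")
store.recover()
after_recovery = store.read("a")
```
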
So the web servers here know whom to communicate with, and I can increase the number of nodes: I can go to 2,000 or 10,000 nodes or whatever, and all these nodes run as VMs or Docker containers. So you can scale it to any level. With the RDBMS architecture I cannot scale like that at all; it is impossible.
They also have what is called Elasticsearch. If you look at the architecture of the systems of this type that have been built over the years, they are all peer to peer and not master-slave. This tells us that any system we build should have a peer-to-peer architecture. If you are building a system and you say that these are the core business activities of my software, you cannot just write code. If you just write code and think it is going to be scalable, nothing is going to happen. You have to follow a methodology under which you can say that the same system can run on multiple nodes. The components you see that are scalable are the ones that are peer to peer, not master-slave or a single component. You have to design the whole architecture so that there is always a path available, and so that if one subsystem is very heavily used, another instance can be spawned immediately from the available resources and used for the computation. That way availability is always there; when we want a system to be up and running, it has to be available.
These are some of the technicalities: JSON objects, which you may have heard about. Elasticsearch also has auto-sharding, fault tolerance, and replicas, meaning the data is replicated. You have a library to handle indexing and searching, and it uses an inverted index mechanism. Do you know what an inverted index is?
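
For reference, an inverted index maps each term to the set of documents that contain it, so a search looks up terms rather than scanning documents. The following is a minimal sketch of that data structure, not Elasticsearch's implementation; the function names and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "Cassandra is a peer to peer store",
    2: "Elasticsearch uses an inverted index",
    3: "Kafka passes messages between services",
}
index = build_inverted_index(docs)
hits_phrase = search(index, "inverted index")
hits_peer = search(index, "peer store")
hits_none = search(index, "missing")
```
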
Does anybody know what an inverted index is? You do not know? We have studied this right from kindergarten. Go to the back of your book: the index there is an inverted index. You can look up the technicalities if you do not know them, but I am giving you a real-life example.
Logstash, Kibana, Kafka: there are all of these things. I do not think I should cover so much; you can read up and figure out what they are. This is a data pipeline. Kibana is for visualization, and Kafka is for message passing. This is exactly what is in the stack. If you want to configure your stack, it will be something like this: I can have n Kafka workers here, n Logstash instances, and n shippers. These could be Docker containers or VMs or anything. I have Logstash here and Elasticsearch here as the searching mechanism, and a web server here that lets me do visualization. Today you do not need to build a visualization tool; it is all available for you. You only need to know how to connect things properly. This is a puzzle, and you have to solve the puzzle using the best technology.
They have Keycloak and OpenID; we talked about OAuth and things like that. They use these for single sign-on, and for OAuth and the other authentication mechanisms required in the system. Whichever mechanism you want is available today. You do not have to build everything; it is all there.
The core services of Sunbird are the learner service and the content service, and the names themselves tell you what they are. There is the actor service, the one I told you about, and the player service, which uses something called the Play Framework. If you have not heard of the Play Framework, you can read up on it. And then there is a Keycloak service. There are many services; in fact, she listed them out.
These are some of the major services that I am listing. They have the Play Framework: you can write code using the Play Framework, and its RESTful APIs are fully asynchronous, as is anything you write with it. There is hot reloading: I can reload some portion of the code without restarting anything. These things are very important now when you are building systems. There is Scala support as well.
Then the middleware. You have all these components: telemetry, organization, dashboard. She covered all of that framework. And then, if you do not know about this library, please go and see how it helps you. You can do 3 billion transactions per second using commodity hardware. Figure it out. There are so many things to learn, but you have to choose. It is a puzzle: what is the best available today? This was not available 10 years back.
I will skip this. These are all libraries available in the core modules. This is a dynamic UI framework, written entirely in JavaScript. This is a framework for mobile-based applications.
This is the architecture that you have. These are the APIs we were talking about: the learner service, the content service, and all the other services. You have a middleware here. This part is also scalable; remember that. We know that this part is scalable and this part is not. So what do we store in the part that is not scalable? We store only the most essential information, not all of it, which can be an order of magnitude less, about 10 times less, than what you will find in Cassandra. And you have all these visualizations happening here with Elasticsearch and Logstash. All the logs are recorded in Cassandra and then visualized using this tool. For each of these tools there is a lot to learn. So if you are planning to study them and then use them, do not plan just to study.
You have to use them. If you use them, you will understand much better.
These are the tools available, as you see: Docker, Swarm, Ansible, and another is Jenkins. In today's world you are required to be a developer and also to know how to write Ansible scripts, how to use GitHub, and how to use other tools for monitoring and all those things. This is a DevOps kind of approach, where you do not just write code. You do much more than that; you should even know how to configure systems. That is all part of what is required.
This one is the architecture of the deployment environment they have. There is a CI server here. CI is continuous integration. Continuous means that when I write code, it is pushed to Git. Any code you write goes to Git. Can you tell me why it should go to Git and not sit on your computer? Yes, you do not lose it. That is one reason, and you also want versioning. But there is something more to it. If I commit, I can test immediately. I do not need to test after two months, when you are all gone, right? That is why it has to be logged somewhere, and then a series of activities should start automatically. These are all automated tests, which you can write and are supposed to write.
I will not go through this; this is not for you. And these are all references. I think these slides will be put up somewhere, I believe.