Thank you for coming to this session. I'd like to present the architecture of a microservice transaction system based on distributed consensus with Kubernetes. My name is John Peniston. I work for Hitachi in Japan, and this is my first time attending and enjoying an international conference in English, so I'm a bit nervous, but I'll do my best. In my career, I developed web servers and application servers like GlassFish for 10 years. After that, I created modernization plans for our customers — for example, infrastructure changes such as moving on-premises systems to the public cloud. And now I'm researching and developing a microservice transaction system.

Today, I'll be presenting "Consensus on Transaction Commit", the so-called Paxos Commit. This paper was written by two famous people. The first is Dr. Jim Gray, who is famous for relational database technology. The second is Dr. Leslie Lamport, who is famous for the distributed consensus algorithm Paxos. We are developing a microservice transaction system that applies Paxos Commit to cloud-native customer needs. Today, I will introduce our practice of implementing Paxos Commit in a Kubernetes environment. This story originated from the difficulty of achieving distributed transactions with microservices, so I will explain it in this order: first, microservices and distributed transaction challenges; second, the causes of inconsistency; third, theoretical approaches to prevent inconsistency; and fourth and last, Paxos Commit operation in Kubernetes.

First of all, what is the microservice architecture? It is a way to develop a system by dividing it into several small applications according to business responsibility boundaries. This approach makes each service easy to understand, lets engineers improve the system service by service, and lets engineers choose the best technology for each service.
But engineers sometimes face difficulty in architecting microservices, particularly in data management. Engineers want to adopt microservices in their enterprise systems because of the need for system agility, and business data agility may be necessary in some cases to increase overall agility. So we split the business application and the data in the database according to the responsibility boundaries and turn each into a service. The Database per Service pattern is a famous way to split the data in microservices. As a result, the databases become distributed, and updating data across two DBs increases the risk of inconsistency.

What is an inconsistency? Let me describe an example. A transfer service moves money from bank A to bank B: bank A withdraws the money from bank A's account, and bank B deposits the same amount into bank B's account. In this scenario, if bank B cannot deposit the money for some reason, such as a failure, an inconsistency arises between bank A's balance and bank B's balance. The famous way to prevent this type of inconsistency is so-called distributed transaction technology. But when engineers build a distributed transaction system on microservices, we found the problem that the system becomes complex.

Please look at the graph. This is the result of a survey of our engineers. Many engineers think that complex distributed transactions are a challenge, and they are the top concern when building a transaction system with microservices. What is a complex distributed transaction? The microservices textbook, Microservices Patterns (with examples in Java), explains the benefits of microservices and the challenges of applying global transaction management to microservices. First, it mentions that the DB must be XA-compliant, so modern DB technologies such as NoSQL cannot be chosen.
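To make the inconsistency concrete, here is a minimal sketch in Go. The account names and functions are illustrative assumptions, not our actual bank services: when the two updates are not atomic and the deposit fails after the withdrawal succeeded, money simply disappears.

```go
package main

import (
	"errors"
	"fmt"
)

// Two independent "databases", one per bank service.
var bankA = map[string]int{"alice": 100}
var bankB = map[string]int{"bob": 50}

// depositFails simulates a failure in the bank B service.
var depositFails = true

func withdraw(account string, amount int) error {
	bankA[account] -= amount
	return nil
}

func deposit(account string, amount int) error {
	if depositFails {
		return errors.New("bank B is down")
	}
	bankB[account] += amount
	return nil
}

// transfer updates two databases without any atomic commitment.
func transfer(from, to string, amount int) error {
	if err := withdraw(from, amount); err != nil {
		return err
	}
	// If this step fails, the withdrawal is NOT undone: inconsistency.
	return deposit(to, amount)
}

func main() {
	before := bankA["alice"] + bankB["bob"]
	_ = transfer("alice", "bob", 30)
	after := bankA["alice"] + bankB["bob"]
	fmt.Println(before, after) // the totals no longer match
}
```

The withdrawal has been applied but the deposit has not, which is exactly the inconsistency between the two banks' balances described above.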
Second, the technology stack is limited to frameworks that include a transaction monitor, such as Java EE or Spring. And third, from the CAP theorem, consistency must be given up to some extent in a distributed system. Therefore, citing the Saga paper, the author describes the means of achieving distributed transactions in the user application.

To resolve the inconsistency problem, the Saga pattern is the most famous approach in microservice architecture. A saga links the business processes together in a workflow. When a process fails, a compensating transaction is called to cancel the updates that have already been made and restore consistency. But please look at the code sample of the business application. The implementation of transaction management is mixed into the business logic, and as a result, the business application becomes complex.

Why is it complicated? That is because the three factors causing inconsistency in distributed transactions are very complex events. There are three main causes: first, business errors; second, system failures; and last, write-write conflicts.

The first source of complication is that an update may be rejected by business validation in the DB — for example, by a rule such as exceeding the maximum balance or having insufficient balance. Then it becomes necessary to cancel the services that have already been updated. To achieve this, we have to implement compensating transactions in the application logic.

The second source of complication: for example, when a process goes down during transaction execution, the transaction state is lost and inconsistency happens. To prevent inconsistency caused by system failures like this, a persistence mechanism is needed. But failure handling is also a challenge. Failures can occur at various times, and how to settle the transaction depends on the kind of error, so the code to deal with the numerous errors ends up exposed in the application logic.
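The Saga idea above can be sketched roughly as follows. The step names and the in-memory balances are my illustrative assumptions, not an actual saga framework: each business step is paired with a compensating transaction, and when a step fails, the compensations for the already-completed steps run in reverse order.

```go
package main

import (
	"errors"
	"fmt"
)

// errDown simulates a business error or outage in bank B.
var errDown = errors.New("bank B is down")

// A saga step pairs a forward action with its compensation.
type step struct {
	name       string
	action     func() error
	compensate func()
}

// runSaga executes steps in order; on failure it compensates
// the already-completed steps in reverse order.
func runSaga(steps []step) error {
	for i, s := range steps {
		if err := s.action(); err != nil {
			for j := i - 1; j >= 0; j-- {
				steps[j].compensate()
			}
			return fmt.Errorf("saga aborted at %s: %w", s.name, err)
		}
	}
	return nil
}

func main() {
	balanceA := 100
	err := runSaga([]step{
		{
			name:       "withdraw from bank A",
			action:     func() error { balanceA -= 30; return nil },
			compensate: func() { balanceA += 30 }, // refund
		},
		{
			name:       "deposit to bank B",
			action:     func() error { return errDown },
			compensate: func() {},
		},
	})
	fmt.Println(err, balanceA) // the compensation restored balanceA
}
```

Note how the compensation logic lives alongside the business logic — this is exactly the coupling that makes the business application complex.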
And the last source of complication is write-write conflicts. For example, if multiple transactions interfere with the same account record, updates will be overwritten and inconsistency will occur. To prevent this inconsistency, concurrency control is needed. A lock is one type of concurrency control, and the lock information itself becomes state, so the lock information must also be persisted. On top of that, the code to deal with concurrency control is exposed to the application logic.

Please look at the picture below. What we originally wanted to do was simply have two services update their data. When distributed transactions are implemented, these three factors complicate the application and its structure. This is the root cause of the complexity of transaction management. The saga itself only has the capability to handle business errors and cannot deal with failures and write-write conflicts. In particular, the saga is seriously flawed when it comes to write-write conflicts. Therefore, no matter how hard we try, we cannot prevent the inconsistencies.

There are already established mechanisms in academia to prevent write-write conflicts in distributed transactions. That is "The Principle of Commitment Ordering", a paper from 1992. It says that the only mechanisms to prevent write-write conflicts in distributed transactions are two-phase locking or avoiding cascading aborts — and saga has neither. Today I won't go into the details, but this is the academic position. However, many microservice engineers don't know this fact. Why? We think that academia is fragmented into factions. Microservice evangelists like Chris Richardson prefer to cite Garcia-Molina and Salem's Saga paper, but the solutions for failures and write-write conflicts are inadequate in those papers. On the other hand, there is a faction that considers how to prevent failures and write-write conflicts automatically.
In that faction, there are the commitment ordering paper and the papers on distributed consensus under failures. From now on, I'll explain what happened when we implemented the theory of this faction. One of the techniques in this faction is two-phase commit, which is often called an anti-pattern in microservices. Two-phase commit provides concurrency control: it requires a lock during the DB update, and the locks are released only after the atomic commitment agrees. This is so-called two-phase locking. Two-phase commit also aggregates each DB's update availability, like "prepare OK", and finalizes the commit when all are prepare OK. This is the so-called atomic commitment.

Two-phase commit has two merits for engineers. The first is that it prevents inconsistency from write-write conflicts. The second is that the exposure of transaction control to the application can be very small — as little as one line, like JTA's @Transactional annotation. This reduces coupling with the business logic.

We first attempted to reproduce this mechanism in a Kubernetes environment. We tried to implement the separated transaction management architecture shown here. I apply my prototype @Transactional annotation as a library to the transfer application on Quarkus. A prototype JDBC driver is applied to the bank applications on Quarkus. From there, the resource manager process is launched as a sidecar of each bank app, and the transaction manager is started as a separate pod. The transaction manager and the resource managers were built from GlassFish, and the JTA interfaces were used to make them independent.

First, @Transactional internally numbers a transaction ID at the same time as the transfer API call is accepted. Then the transaction ID is set in a REST API header field and passed around. The transfer service calls bank A's withdrawal API and bank B's deposit API. After each API call, the transaction ID is passed internally to the JDBC driver in the bank services.
The JDBC driver internally passes the transaction ID and the update SQL to the resource manager, and the resource manager issues the update SQL to the DB. In addition, before the update, the row is locked by SELECT FOR UPDATE. When the DB update is finished and control returns to the transfer service, @Transactional's post-processing sends a prepare to the transaction manager. After that, the transaction manager sends a prepare to the resource managers, and the atomic commitment starts. Each DB returns its update availability, prepare OK or prepare NG, to the transaction manager. If all return prepare OK, the transaction manager sends a commit to the resource managers. If even one prepare NG is found, the transaction manager sends a rollback to the resource managers. At the same time, the transaction manager persists the result of the atomic commitment in the transaction log. After that, the DBs release their locks, and the transaction manager responds with the result of the atomic commitment. This method keeps the application simple.

Nevertheless, this two-phase commit has a well-known weakness: the single-point-of-failure problem of the transaction manager. In particular, if the transaction manager process goes down during the atomic commitment phase, the transaction state is lost, the DB locks are not released, and a spontaneous rollback breaks the atomicity. This is commonly known as a heuristic hazard. To prevent a heuristic hazard, it is necessary to provide a failover operation to a standby system, so-called primary/backup. However, network partitions can easily occur in a cloud environment, and partitions make it difficult to prevent heuristic hazards. There are two reasons why the cloud environment and primary/backup are incompatible. First, primary/backup uses fencing devices to switch systems, but the premise of primary/backup is that the fencing device must be able to receive the failure notification immediately.
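The transaction manager's decision rule can be written down in a few lines. This is a hedged sketch of the generic two-phase-commit rule, not our actual GlassFish-based code: commit only when every resource voted prepare OK, otherwise roll back.

```go
package main

import "fmt"

type Vote int

const (
	PrepareOK Vote = iota
	PrepareNG
)

type Outcome string

const (
	Commit   Outcome = "COMMIT"
	Rollback Outcome = "ROLLBACK"
)

// decide implements the atomic commitment rule of two-phase
// commit: commit iff all participating resources returned
// prepare OK; a single prepare NG forces a rollback.
func decide(votes []Vote) Outcome {
	for _, v := range votes {
		if v != PrepareOK {
			return Rollback
		}
	}
	return Commit
}

func main() {
	fmt.Println(decide([]Vote{PrepareOK, PrepareOK})) // COMMIT
	fmt.Println(decide([]Vote{PrepareOK, PrepareNG})) // ROLLBACK
}
```

The rule itself is trivial; the hard part, as described next, is what happens when the single process holding this decision goes down mid-phase.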
In an environment such as the cloud, failure notifications are sometimes delayed and the system does not switch over. Second, the single coordinator itself becomes a single point of failure, whether it is a primary/backup, a two-phase commit, or a saga. Therefore, a mechanism to overcome the single coordinator's single point of failure ends up having a single point of failure itself, and by taking countermeasures against that, the system gradually becomes dependent on the hardware. The cloud's value is to abstract the hardware away from the software stack, but if you increase the dependence on hardware, there is no point in building the system in the cloud.

Let me discuss this from the theoretical side of computer science again. There is a so-called failure model in academia. The model describes the severity of IT system failure that a protocol can deal with. Saga and two-phase commit can only operate in a failure-free environment. Primary/backup is a protocol that works properly under synchronous fail-stop, that is, an environment with no communication delay or loss. The cloud is an environment where the infrastructure is hidden and communication delays sometimes happen. So by academic theory, it is very difficult to apply saga, two-phase commit, or primary/backup in the cloud. Two-phase commit is a theory from before the cloud became popular, so a new theory is required to construct distributed transactions in a cloud-native system. That is to incorporate the theory of consensus in a failure environment that Dr. Lamport and Dr. Lynch have worked on. In other words, we adopted protocols such as Paxos or Raft, which can reach consensus even in a distributed failure environment.

Let me explain what Paxos is. Paxos is a consensus mechanism used in Google Spanner for multiple database replication and in blockchain ledger replication. What is consensus? Consensus means that a group of multiple participants obtains a single result.
This is a bit confusing, so I will use the example of the parliament system of the Greek island of Paxos. In a parliamentary democracy, the process of enacting a law is: a bill is submitted to the parliament, discussed by the members, and agreed or rejected by majority vote. The parliament system on the Paxos island is unique: in the original Paxos paper, "The Part-Time Parliament", the secretaries and the legislators are part-timers. The secretaries have their own ledgers, but the congress wants a single vote result for each bill, so it is necessary to have consistency among the ledgers. To achieve this consistency, Paxos incorporates a mathematical theory into the parliament system and the communication of the legislators.

Please look at the vote process on the Paxos island. Legislator 1 makes a proposal to agree. Then each of the legislators votes on the proposal. Another legislator, legislator 2, may make an opposing proposal, but all the legislators finally converge on one agreement through a consensus process such as discussion. Then secretary 1 records the result decided by majority vote in the ledger. Now, the legislators and secretaries on the Paxos island are part-time, so some secretaries and legislators may go home after the vote is over. If a secretary leaves with the ledger, the result of the vote would be lost. On the other hand, there is a desire to keep track of the vote result even after some people have left. What should we do? We can vote again, and another secretary can record the result of the vote. The majority vote — the so-called quorum mechanism — works here. Even if one of the five legislators leaves, the previous vote result and the later vote result will be the same. In other words, even if one process is down, as long as the majority of the processes are still alive, consistency can be assured. This follows from the pigeonhole principle, a mathematical theory. It is very difficult to explain here, sorry.
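The quorum argument can actually be checked mechanically. The sketch below (an illustration of the pigeonhole principle, not part of any Paxos implementation) enumerates every majority of five legislators and confirms that any two majorities share at least one member, which is why a later vote always sees the earlier result.

```go
package main

import "fmt"

// majorities enumerates all subsets of {0..n-1} of size > n/2.
func majorities(n int) [][]int {
	var out [][]int
	for mask := 0; mask < 1<<n; mask++ {
		var s []int
		for i := 0; i < n; i++ {
			if mask&(1<<i) != 0 {
				s = append(s, i)
			}
		}
		if len(s) > n/2 {
			out = append(out, s)
		}
	}
	return out
}

// intersect reports whether two quorums share a member.
func intersect(a, b []int) bool {
	seen := map[int]bool{}
	for _, x := range a {
		seen[x] = true
	}
	for _, y := range b {
		if seen[y] {
			return true
		}
	}
	return false
}

func main() {
	qs := majorities(5)
	ok := true
	for _, a := range qs {
		for _, b := range qs {
			if !intersect(a, b) {
				ok = false
			}
		}
	}
	fmt.Println("every pair of majorities overlaps:", ok) // true
}
```

Two majorities of five have at least three members each, and 3 + 3 > 5, so they must overlap — that shared member carries the earlier vote result into the later vote.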
If you are interested in this theory, please look it up. So where should this Paxos mechanism be mounted? A simple solution is to replace the primary/backup storage area with a NoSQL store that implements Raft or Paxos — for example, storing the transaction log in Redis. But this approach is not efficient in terms of the number of communications needed for consistency and transaction termination. It is extremely efficient to implement Paxos at the atomic commitment itself. The heuristic hazard arises in the phase that aggregates the update availability from the resources participating in the distributed transaction, and if that phase relies on primary/backup, the failover will be delayed or never happen in the cloud. So we apply the Paxos island parliament system to the atomic commitment. Each resource's prepare availability becomes the input to the Paxos protocol: the prepare OK is proposed, a vote is taken, and after exiting the Paxos protocol, the transaction is finalized as commit or rollback. Since the result is decided by majority vote rather than by a single coordinator, even if one legislator fails, the commit or rollback can be finalized immediately.

Let's look at an example of this using Kubernetes. The characters are almost the same as in two-phase commit, but in the lower left corner, please place the redundant coordinators in pods. We prepare the same pods for bank B. Here we first update DB A. This update process is not really different from two-phase commit. Next, a prepare is sent from the transfer service to the sidecar of each bank service. At that time, the sidecar creates a ballot box to collect the consensus voting results and the transaction states of the bank services. Next, DB A returns a prepare OK. At this point, the sidecar records the transaction state as prepare OK. After that, the sidecar of bank A broadcasts this prepare OK to the redundant coordinators.
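The ballot box can be sketched as follows. This is a deliberately simplified model of the Paxos Commit idea — I model only the votes the redundant coordinators have persisted, not ballot numbers or leader election: each resource's prepare outcome counts as finalized once a majority of the coordinators have recorded it, and the transaction commits only if every resource is finalized as prepare OK.

```go
package main

import "fmt"

// ballotBox maps each resource to the vote each redundant
// coordinator has persisted for it ("OK", "NG", or "" if that
// coordinator is down or has not heard yet).
type ballotBox map[string][]string

// finalized returns the outcome for one resource once a majority
// of coordinators agree on it, or "" if there is no majority yet.
func finalized(votes []string) string {
	count := map[string]int{}
	for _, v := range votes {
		if v != "" {
			count[v]++
		}
	}
	for outcome, c := range count {
		if c > len(votes)/2 {
			return outcome
		}
	}
	return ""
}

// decideCommit commits only when every resource is finalized
// as "OK"; a finalized "NG" rolls back, and anything else must
// wait for more coordinator votes.
func decideCommit(box ballotBox) string {
	for _, votes := range box {
		switch finalized(votes) {
		case "OK":
			continue
		case "NG":
			return "ROLLBACK"
		default:
			return "UNDECIDED"
		}
	}
	return "COMMIT"
}

func main() {
	// Three redundant coordinators; one is down ("") for bank A,
	// but the surviving majority still finalizes the commit.
	box := ballotBox{
		"bankA": {"OK", "OK", ""},
		"bankB": {"OK", "OK", "OK"},
	}
	fmt.Println(decideCommit(box)) // COMMIT
}
```

The contrast with plain two-phase commit is the majority rule: no single coordinator's failure can leave the outcome unknown, which is what removes the heuristic hazard.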
The redundant coordinators persist the update availability, like "prepare OK, bank A", in their own volumes, and the coordinators send the update availability to the sidecar of bank B. This means that the bank B sidecar now knows the update availability of bank A. Bank A and bank B are both prepare OK, so the bank B service finalizes the commit state to the DB. The redundant coordinators for bank B send the update availability of bank B to the sidecar of bank A in the same way. Even if one of the redundant coordinators goes down, as long as the majority of the coordinators are still alive, the transaction can commit immediately. And if more redundant coordinators are added, availability gets ever closer to 100%, and the probability of a heuristic hazard can be brought as close to zero percent as possible. In the current implementation, the redundant coordinators are packed into a single pod, but if the coordinators are geographically distributed and rearranged, it will be possible to provide things like disaster recovery.

Now, to summarize: there are multiple challenges to achieving distributed transactions in microservices. First, implementing transaction management complicates microservice systems. Second, the technology stack is limited to frameworks that include a transaction monitor, such as Java EE or Spring. Third, from the CAP theorem, consistency must be given up to some extent in a distributed system. And last, the DB must be XA-compliant. Regarding the first issue, we confirmed that the @Transactional annotation can be used to describe transaction boundaries simply. For the second issue, separating the business application from the transaction controller allows us to solve the problem. We have standardized the API interface between the business application and the transaction controller components. With that standard, we can create a library in languages such as Go or JavaScript that works the same as @Transactional.
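The claim that more redundant coordinators push availability toward 100% can be sketched numerically. This is a back-of-the-envelope model under an assumed, hypothetical per-coordinator availability p with independent failures — not a measurement of our system: with n coordinators, the commit can proceed whenever a majority is alive.

```go
package main

import "fmt"

// binom computes the binomial coefficient C(n, k).
func binom(n, k int) float64 {
	res := 1.0
	for i := 1; i <= k; i++ {
		res = res * float64(n-k+i) / float64(i)
	}
	return res
}

// pow computes p^k for small integer k.
func pow(p float64, k int) float64 {
	res := 1.0
	for i := 0; i < k; i++ {
		res *= p
	}
	return res
}

// majorityAlive is the probability that a majority of n
// coordinators is up, if each is independently up with
// probability p: the sum of the binomial terms for k > n/2.
func majorityAlive(n int, p float64) float64 {
	total := 0.0
	for k := n/2 + 1; k <= n; k++ {
		total += binom(n, k) * pow(p, k) * pow(1-p, n-k)
	}
	return total
}

func main() {
	p := 0.99 // assumed availability of a single coordinator
	for _, n := range []int{1, 3, 5} {
		fmt.Printf("n=%d coordinators: %.6f\n", n, majorityAlive(n, p))
	}
}
```

With one coordinator the figure is just p itself (the single point of failure), while adding coordinators drives the majority-alive probability toward 1 — the numeric form of "as close to 100% as possible".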
The third issue starts from the CAP theorem: the claim that consistency must be given up to some extent in a distributed system. In fact, all the CAP attributes can, in effect, be guaranteed by Paxos. What does this mean? The CAP theorem — originally the CAP conjecture — is the hypothesis that a distributed system cannot satisfy consistency, availability, and partition tolerance all at the same time. In the cloud, availability and partition tolerance are considered important, so consistency is usually relaxed to eventual consistency. In contrast, we follow Google's and Dr. Lynch's academic analyses of CAP. If a partition does not occur, both consistency and availability can be achieved. Partitions rarely occur in real systems, and even when one does occur, Paxos can prevent the loss of consistency, because Paxos provides redundancy and waits for the commit while the partition lasts. When consistency and partition tolerance are required, 100% availability cannot be achieved; however, redundancy can bring availability as close to 100% as possible. In other words, by using distributed consensus algorithms such as Paxos or Raft, consistency, availability, and partition tolerance can, in effect, all be satisfied simultaneously.

The fourth and final point: there remains the issue that the DB needs to be XA-compliant, so modern DBs such as NoSQL cannot be selected. However, this issue can be solved by putting a transaction wrapper on NoSQL, like ScalarDB or the Cherry Garcia paper's approach. In other words, the microservice transaction problem can be solved by using all of today's technology. If you encounter any of these cloud or microservice transaction challenges, please contact me. I'm sure there are many areas where we can help. That's all for today. Thank you for listening to my presentation.