Hello, good evening everyone. Thanks a lot for coming; I hope this is the last session for you, so I'm standing in between you and your Guinness journey. Today we have an interesting topic: how we can accelerate Hyperledger Fabric specifically, and the use case we tried it on. For any blockchain, the business value is very important before going into whether we need to accelerate, what kind of model to choose, and so on. At AMD we first tried this journey about five years ago, when Hyperledger 0.9 got released, but at that time we didn't have a strong business use case to proceed further, and we restarted the journey about a year ago. So we'll walk through what we have done. Quick disclaimer, and a quick update about AMD. As you all know, we are a semiconductor company, and we have a presence in all the possible combinations of acceleration: NICs, standard CPUs, GPUs, and FPGAs. Our presence spans all the way from edge to cloud and even beyond, in most of the satellites, the Mars rover, and similar areas, and from supercomputers down to the edge. All of this presence also creates a unique challenge for us as a company, mainly in this particular area. In the recent COVID situation you might have heard about the chip shortage, which created huge publicity for chip companies; it has been constant news for at least the past six months. Specifically in automotive, in many countries people have had to wait somewhere between two months and two quarters to get a new car, and for the first time in US history used-car market prices increased. Because of the shortage, we are also seeing a different kind of market happening in semiconductors, which we generally call the gray market.
To point out a few areas here: there is a huge revenue loss because of the gray market. Current estimates are around a 75 billion dollar loss from various IP theft, tampered chips, and so on, and it also has a huge impact on the job market itself. So how can we address these things? We constantly look at how we can deliver a properly working chip to our customers: nothing tampered, not bought from the wrong source, and doing exactly what it is supposed to do. As part of this we validated and worked on various solutions, especially supply-chain blockchain use cases. We came up with three different options for how a blockchain could address this need. One is product provenance, as you all know. We also looked into tokenization as well as POS, point of sale: what happens between a distributor and the real client, and whether they are getting the real product we are shipping. In the end we started focusing on product provenance, where we have a lot of use cases to address and solve. On product provenance, just to touch base: this is a very, very abstract, simplified diagram of the semiconductor supply chain. We have a die bank, where the fab creates our product; then packaging; then a separate team for testing; and then finished goods. From there it gets shipped to a customer, and sometimes through a contract manufacturer. So it is a long chain before the real customers get their finished goods. In this whole model, what happens today is that AMD is the central party: we receive the data, filter out something, and send it to the next process.
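To make the chain-of-custody idea concrete, hash-linked stage records are one simple way to model provenance. Here is a minimal Go sketch; the stage names and record fields are illustrative, not AMD's actual schema:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ProvenanceRecord is a hypothetical record for one supply-chain stage
// (die bank, packaging, testing, finished goods). Field names are
// invented for this sketch.
type ProvenanceRecord struct {
	Stage    string
	PartID   string
	PrevHash string // hash of the previous stage's record
	Hash     string // hash of this record
}

// appendStage links a new stage record to the chain by hashing the stage
// data together with the previous record's hash.
func appendStage(chain []ProvenanceRecord, stage, partID string) []ProvenanceRecord {
	prev := ""
	if len(chain) > 0 {
		prev = chain[len(chain)-1].Hash
	}
	h := sha256.Sum256([]byte(stage + "|" + partID + "|" + prev))
	return append(chain, ProvenanceRecord{stage, partID, prev, hex.EncodeToString(h[:])})
}

// verifyChain recomputes every hash; any tampered record breaks the links.
func verifyChain(chain []ProvenanceRecord) bool {
	prev := ""
	for _, r := range chain {
		h := sha256.Sum256([]byte(r.Stage + "|" + r.PartID + "|" + prev))
		if r.PrevHash != prev || r.Hash != hex.EncodeToString(h[:]) {
			return false
		}
		prev = r.Hash
	}
	return true
}

func main() {
	chain := []ProvenanceRecord{}
	for _, s := range []string{"die-bank", "packaging", "testing", "finished-goods"} {
		chain = appendStage(chain, s, "part-001")
	}
	fmt.Println("chain valid:", verifyChain(chain))
	chain[1].Stage = "repackaged" // simulate tampering at the packaging stage
	fmt.Println("after tamper:", verifyChain(chain))
}
```

Any later edit to a committed stage breaks every hash downstream, which is essentially the tamper-evidence property the blockchain brings to this supply chain.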
Similarly it happens throughout, with AMD being the central point for all this data management. With a blockchain solution, what we are looking at is how we can make it a mesh, so that all the parties can exchange data freely, and, based on a smart contract, each party gets whatever data they are entitled to. During this process, one more interesting observation we made is that with a blockchain the number of transactions is obviously going to increase compared to a standard database, where I can have a really huge table with multiple columns that people keep updating; here everything is based on transactions. What that means is we need a high-performance blockchain. The transaction volume is going to keep increasing, and all the downstream processes also need higher performance, because they expect something like database-level performance, which traditional blockchains generally lack. We also don't want to create a very complex architecture with hundreds of nodes; we want to keep the whole process simple. So this is when we started looking at the various solutions available in the market that could address some of our future needs. While doing that research, what I noticed is that within AMD itself one of our researchers had done some acceleration work on a blockchain use case. So let me invite Harris to proceed with the next steps. Thank you. Hello. Okay. So I'm going to be a bit more technical than Muthu: he explained the business use case and why we wanted to go into the blockchain space, supply chain, and product provenance. Before I go into the details, just a quick recap of how transactions flow through a Hyperledger Fabric network.
Most of you already know this: there is a client on the left-hand side where transactions get created, and they are sent to endorsing peers. Once there are enough endorsements on a transaction, the client sends those transactions to the ordering service, where the consensus protocol is run, and what comes out of there is blocks of transactions. Those blocks are sent to all the peers in the system, highlighted in yellow here; we call them validator peers, and I will come back to that later. In these blocks the transactions are validated and finally committed to the ledger. What I want to highlight here is that, in our observation and experience, the validator peers significantly limit the performance of a Hyperledger Fabric network. The way transactions flow through this lifecycle, an external party can only see the data after the blocks have been validated and committed to the ledger; that's when a transaction is confirmed. That is why we wanted to look into these validator peers, find the performance bottlenecks, and accelerate them for higher transaction volumes and near-real-time updates. Here's a snapshot of a performance dashboard from one of the validator peers in our network. What we are showing is that if you take the standard vanilla Fabric peer using CouchDB, the performance you get from such a peer is only about 150 transactions per second, far from what we wanted to achieve at the end of the day. And we are not the only ones to see this sort of performance: there are a lot of other papers, which I've highlighted here, that have shown multiple times that validator peers become one of the major bottlenecks in a Hyperledger Fabric network. Let me switch gears and talk about FPGA cards. So an FPGA card is just like a GPU card.
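Before moving on to the hardware, it may help to see what part of the validator's work looks like. A simplified Go sketch of the MVCC check a Fabric validator performs on each block; the real implementation uses Fabric's own types and versioned state database, this only shows the idea:

```go
package main

import "fmt"

// Tx models the parts of a transaction relevant to MVCC validation: the
// versions of keys it read at endorsement time, and the keys it writes.
// This is a simplified sketch, not Fabric's actual types.
type Tx struct {
	ID       string
	ReadSet  map[string]int // key -> version observed at endorsement
	WriteSet map[string]string
}

// validateBlock performs a simplified MVCC check: a transaction is valid
// only if every key it read is still at the version it saw; valid
// transactions then bump the versions of the keys they write.
func validateBlock(state map[string]int, block []Tx) []bool {
	results := make([]bool, len(block))
	for i, tx := range block {
		ok := true
		for key, ver := range tx.ReadSet {
			if state[key] != ver {
				ok = false // stale read: an earlier tx in the block updated this key
				break
			}
		}
		results[i] = ok
		if ok {
			for key := range tx.WriteSet {
				state[key]++
			}
		}
	}
	return results
}

func main() {
	state := map[string]int{"acct1": 0, "acct2": 0}
	block := []Tx{
		{"tx1", map[string]int{"acct1": 0}, map[string]string{"acct1": "90"}},
		{"tx2", map[string]int{"acct1": 0}, map[string]string{"acct1": "80"}}, // conflicts with tx1
		{"tx3", map[string]int{"acct2": 0}, map[string]string{"acct2": "50"}},
	}
	fmt.Println(validateBlock(state, block)) // tx2 is invalidated by tx1's write
}
```

Every committing peer repeats this check for every transaction in every block, which is part of why validation throughput, not ordering, becomes the ceiling.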
You can buy it off the shelf and plug it into a PCIe slot of your server. But unlike GPU cards, you can actually program these FPGA cards with a custom hardware accelerator that you build for your particular application. And not only that, you can reprogram them for different sorts of applications: I can program one for a blockchain application, then reprogram it for a machine learning application, and so on. Inside an FPGA card there is typically something called an adaptive SoC chip; a simple example is shown on the left-hand side. You might not be able to read all of it, but there are a lot of blocks here: some of these blocks are Arm processors, some do PCIe connectivity, some do Ethernet connectivity. But the most important part is in the middle, the red block, called adaptable hardware or programmable hardware. This is where your custom hardware goes, and you can reprogram it with different custom hardware. So with an FPGA card we can do compute, network, and storage acceleration all together in a single card, have it connected to the CPU, and build a hardware-software co-design setup that is heavily accelerated for the application you want to implement. With that in mind, coming back to Hyperledger Fabric and the validator peers becoming the bottleneck: our solution to that problem is an accelerator we call Blockchain Machine. Essentially, Blockchain Machine is an accelerated validator peer for Hyperledger Fabric. You can see a picture there: it's a multi-core server with an FPGA card, or maybe multiple FPGA cards in the future, programmed with the accelerator we designed for the Hyperledger Fabric validator. It is specifically designed for high performance.
What I mean by that is much higher transaction validation rates, reduced block validation latency, and so on. It is also designed to be adaptable. All of you know that Hyperledger Fabric by design supports smart contracts, right? You can install a new smart contract. So what if someone installs a new smart contract? We should be able to adapt, reprogram our hardware, and upgrade it to include the new smart contract; that's how we can support more smart contracts over time. And finally, the Blockchain Machine peer is compatible with the existing nodes in the network. That is important because I can bring up a Fabric network where all the nodes are standard nodes, for example standard peer and orderer nodes, and one of the nodes is a Blockchain Machine peer, and it will work with all those existing nodes. That way you can first evaluate it, see whether it gives you the performance you want, and then later upgrade the rest of the nodes in the system over multiple cycles. The current status of our work is that it's available for Fabric v1.4, although I don't think anybody uses that anymore, and it's also available for v2.2, and it has already been open sourced as a Hyperledger Labs project called Fabric Machine. The software code is all open source, and the hardware is available on request; if you want to try it out, please reach out to us. So let me give a bit more detail on what Blockchain Machine is. Here's an overview: as I mentioned earlier, we have a server with a multi-core CPU here on the right-hand side and an FPGA card plugged into the PCIe slot, and there are a bunch of modules inside the FPGA card. The FPGA card has a network interface, which means all the network traffic comes into the FPGA card and then goes through these modules.
What these modules do is implement certain types of network acceleration, for example efficient transfer and access of block and transaction data from the Ethernet packets. There's a compute accelerator which accelerates the verification and validation of the transactions. Overall, data comes in, gets processed, and is then ready to be accessed and further processed on the CPU side. We call this bump-in-the-wire processing; it's a typical term and a popular paradigm for high-performance computing: data streams in through the FPGA card all the way to the CPU. A little more detail. The protocol processor, at a very high level: we have a very hardware-friendly protocol to send blocks from the orderer to the validator peer, and it extracts all the relevant data needed for validation of the block and its transactions, for example transaction IDs, block IDs, read-write sets; people who work with Fabric will know these, and how the endorsement policies work, and so on. The block processor is basically the compute accelerator. We have different types of block-level and transaction-level pipelines here: we want to process as many transactions as possible in parallel and in a pipelined fashion so that we can achieve high throughput. Internally it has a lot of different types of transaction processing engines, validation engines, and so on. Finally, we have something called a memory map or register map. The idea is that once the data has been processed and the transactions have been validated, the results are written into a register map which can be accessed by the CPU. We provide a Go language API which is then used by the Fabric software code to interact with the hardware, get the data out of the hardware, and combine it with the block on the software side.
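To illustrate the parallel, pipelined transaction processing in software terms, here is a rough Go analogy: a pool of workers standing in for the hardware's multiple verification engines. The real accelerator does this in FPGA logic, not goroutines, and the `verify` stand-in below just hashes repeatedly instead of checking signatures:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// verify stands in for a compute-heavy per-transaction check (on the
// accelerator this would be ECDSA signature verification); here we just
// hash repeatedly to simulate the cost.
func verify(payload []byte) bool {
	h := sha256.Sum256(payload)
	for i := 0; i < 1000; i++ {
		h = sha256.Sum256(h[:])
	}
	return len(h) == sha256.Size
}

// verifyBlock fans transactions out to a pool of workers, mimicking how
// the block processor keeps many validation engines busy in parallel.
func verifyBlock(txs [][]byte, workers int) []bool {
	results := make([]bool, len(txs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = verify(txs[i])
			}
		}()
	}
	for i := range txs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	txs := make([][]byte, 100)
	for i := range txs {
		txs[i] = []byte(fmt.Sprintf("tx-%d", i))
	}
	res := verifyBlock(txs, 8)
	fmt.Println(len(res), res[0])
}
```

The hardware version goes further than this sketch: each engine is itself a pipeline, so multiple transactions are in flight inside a single engine at once.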
This also gives us a way to overlap the communication and computation happening on the CPU side versus the FPGA: certain operations run on the CPU while other operations run on the FPGA in parallel. Here is the final snapshot of how the whole system looks in terms of implementation and how the hardware and software are partitioned. On the left-hand side we have the entire system, where you can see the CMAC subsystem; without going into details, the CMAC subsystem gives you connectivity to the Ethernet port on the FPGA, where the Ethernet packets come through. Then we have the QDMA subsystem, which provides connectivity and access to the CPU, so the CPU can interact with the accelerator. In the middle, the user logic box is where the accelerator goes; you can see we have the protocol processor, the block processor, the maps, and everything in there. On the right-hand side we show how the hardware and software run the different operations of the validator peer. In a typical standard vanilla Fabric validator peer, all of these happen on the CPU. For us, we offload most of the computationally intensive operations, for example verification of the block, validation of the transactions, the VSCC checks, the state database, MVCC, and so on, to the hardware. The CPU is then only reading the final data from the FPGA, the hardware accelerator, combining it with the block, and writing it to the ledger. What comes out, the ledger from this Blockchain Machine peer, is exactly the same as from any other peer in the system. So now the main question, the billion-dollar question: what were we able to achieve with all this? Here is a quick snapshot. We have a lot more results.
If you're interested, we have a research paper in ICDCS, the International Conference on Distributed Computing Systems; you can take a look at that. What I'm trying to highlight here is that on the left-hand side, a standard Fabric validator peer with multiple CPU cores can achieve up to about 6,000 transactions per second, and on the right-hand side, with Blockchain Machine, we were able to see up to 70,000 transactions per second in validation throughput. The way we benchmark this is using Hyperledger Caliper, the standard benchmarking tool, with the Smallbank benchmark, a typical banking application that creates accounts, transfers money, and so on, and all these peers are using LevelDB. So I'll stop here and hand it back to Muthu, who will talk about how we took this research project and applied and integrated it into a production-level network. Thank you, Harris. So Harris was able to convince us that yes, we can get higher throughput, but as a typical IT organization we were not convinced yet whether it would work, or whether we would need some different, compatible type of programming; all the questions any other customer would have existed for us too. So what we did next: we wanted to implement this particular supply chain solution. Initially we didn't use any of this custom hardware; as phase one we used typical standard software, wrote everything for the CPU, and got our actual application working first. Then as phase two, we introduced the Blockchain Machine to just one org to see how it works. It worked transparently: Harris and Steve, who is our blockchain SME, worked together and we were able to do everything transparently, without any code change from the development point of view.
Currently we are in phase three, working on how we can introduce the Blockchain Machine to multiple orgs. The overall idea, in a simplified diagram: we'll have a standard Hyperledger network where AMD already has a Blockchain Machine, we'll introduce it to a couple of customers at a later stage, and slowly roll it out to the various vendors and partners we work with. On the AMD side, we didn't implement this just for the sake of blockchain; we introduced the full application scope as it is. For example, this application is integrated all the way from our ERP to our client application. We also took a modular approach: we have message queues where all the data is received from our standard systems, we created client workers, small parallel workers, as transaction processors, and we created various REST engines so there is a standard interface. All the chaincode and the rest run on our accelerator hardware. We also created additional off-chain databases, and separate client applications; in this case a mobile application that uses the Hyperledger Fabric backend. Now I can show a quick demo, which is recorded. The semiconductor supply chain is a space where significant value can be extracted from enterprise blockchain solutions. The AMD supply chain is an interconnection of organizations, activities, and resources for transforming raw materials into a finished product for delivery to the end customer. Managing the integrity of products and processes in a multi-stakeholder supply chain environment is a significant challenge. With rising expectations from our customers on end-to-end visibility, establishing reliable provenance and preventing fraud and counterfeiting is crucial for AMD.
Now that compute-intensive tasks are addressed by the Blockchain Machine, AMD is able to create a supply chain pilot to showcase a real-world implementation using accelerated Hyperledger Fabric. Okay, so this is a quick demo of how we developed the UI, which is mobile, and modeled the entire process in various steps: various transactions are created with a proper transaction code, and people can go back and verify in the blockchain what happened in a particular transaction. This is the application we created for some of our external customers. Moving on, we'll also show what exactly happens on the performance side. In this video, we show a live demo of our Hyperledger Fabric network with a hardware-accelerated validator peer. We set up a typical Prometheus- and Grafana-based dashboard to look at a few performance metrics of the validator peers; for this demo, we only show the most relevant metrics. First, I would like to go through the configuration we used, which is shown here in the top row. As you can see, we use Fabric v2.2, and we also report the total number of blocks that have been committed to the ledger; so far, the ledger has more than 60,000 transactions. The first set of metrics I would like to focus on relates to the validation of a block and its transactions. We define this as the time it takes to validate all the transactions of a block and commit them to the state database. The top row shows the throughput for a vanilla peer running on a multi-core server, while the second row shows the same for an FPGA-accelerated peer. We configured a block size of 50, but it can be smaller sometimes depending on the incoming transaction rate, as shown here. The software peer is only able to achieve a few hundred transactions per second, which we realized is because of its slower database accesses. We use CouchDB as the state database because our application needs to run various queries on the committed data afterwards.
The hardware peer, on the other hand, is able to process thousands of transactions per second because all of the bottleneck operations in the validation phase are being executed on the FPGA card. This results in a huge speedup, which is shown here on the right-hand side. You can also see the throughput over time of both the hardware and software peers, plotted in real time as the transactions are received and validated by the peers. The second set of metrics relates to committing the block to the ledger on disk. The reason we separate out this operation is that it always runs on the CPU. Here, what I would like to highlight is that the ledger throughput of both the hardware and the software peer is about the same, because the ledger is written by the CPU in both cases. The third set of metrics is the commit throughput of the peer, which is a combination of the validation and ledger throughputs shown earlier. Here, you can see that the commit throughput of the software peer is very close to its validation throughput, because it is dominated by the database accesses. The hardware peer, on the other hand, delivers at least 10 times more throughput, and sometimes much more, because of the hardware acceleration; this is shown as the speedup here on the right-hand side. However, a very interesting point to note is that this throughput is quite a bit lower than the validation throughput of the hardware peer reported at the top. From further analysis, we found that even after offloading the bottleneck operations to hardware, the Fabric software running on the CPU is still much slower. We realized that we need to better optimize the software in tandem with the hardware: for example, we can use a better disk to improve ledger throughput, and better overlap operations in hardware and software. We are currently working on these optimizations. Thank you.
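The gap between validation throughput and commit throughput, and the reason a 100x-plus validation speedup shrinks end to end, is essentially Amdahl's law: the ledger write stays on the CPU. A small Go model, with illustrative rather than measured numbers:

```go
package main

import "fmt"

// commitTPS models commit throughput when validation and ledger write
// run back to back for each block. All rates are illustrative.
func commitTPS(blockSize, valTPS, ledgerTPS float64) float64 {
	perBlock := blockSize/valTPS + blockSize/ledgerTPS
	return blockSize / perBlock
}

// amdahl returns the overall speedup when a fraction p of the work is
// accelerated by a factor s.
func amdahl(p, s float64) float64 {
	return 1 / ((1 - p) + p/s)
}

func main() {
	// Software peer: both phases on CPU.
	fmt.Printf("software commit: %.0f tps\n", commitTPS(50, 1000, 2000))
	// Hardware peer: validation 100x faster, ledger write unchanged.
	// Commit throughput improves far less than 100x.
	fmt.Printf("hardware commit: %.0f tps\n", commitTPS(50, 100000, 2000))
	// Same effect as Amdahl's law: offloading ~90-95% of the work by a
	// large factor lands the end-to-end speedup in the 10-20x range.
	fmt.Printf("amdahl(0.90, 120) = %.1f\n", amdahl(0.90, 120))
	fmt.Printf("amdahl(0.95, 120) = %.1f\n", amdahl(0.95, 120))
}
```

This is why the talk's optimizations target the remaining CPU portion (faster disk, better hardware/software overlap): the non-accelerated fraction sets the ceiling.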
So overall, we are able to achieve better performance: at peak, from the pure validation point of view, we get up to 120x, whereas for the entire operation we get somewhere between 10x and 20x overall. All of this is running on the Alveo C1100 FPGA card, which is a commercially available card. When it comes to implementing the whole solution, we followed a standard IT approach. All deployment happens through CI/CD, because we had to try things out multiple times, so we created an entire DevOps automation pipeline, and all the integration happens through a standard REST interface so that neither customers nor the various applications need to rewrite anything. From the architecture point of view, we decoupled all the aspects, so nothing is tightly coupled and there is no dependency on the hardware, the blockchain version, or the ERP version. Monitoring is also key; we created some additional monitoring, as you saw previously with Prometheus and the other dashboards. So end to end, we created a deployment methodology that is ready for production deployment. Now some of the insights; I wouldn't call them challenges. The first insight is about hardware adoption. There is always some dilemma about whether to put in new hardware, but nowadays in enterprise data centers acceleration has become more and more common: for many training workloads we use GPUs, and similarly acceleration in general is coming into the mainstream in many enterprise data centers. And as usual, get your management support so that the whole process goes smoothly.
Compliance and legal are also very important, because here we are talking about smart contracts and other things that are closer to legal than to IT code. So involve compliance and legal while you are thinking about the blockchain itself, and about the other concerns and questions: why blockchain, what it can and cannot do, all those aspects. Security is also a tricky portion: as usual, InfoSec had tons of questions about the security of the whole solution, which we were able to address. And UI is one piece we hadn't considered at the initial stage; later we found out that for blockchain we need a different type of UI, so that we can show the full strength of the blockchain. So these are the overall key takeaways. What we noticed is that we get better traceability with this whole blockchain model, since all the data comes straight into the blockchain, and we were able to show that gray-market tracking can be done better. Overall we reduced administration and paperwork, because otherwise we would need to go and scrape some other website to find out whether a part is valid, and so on. And we achieved better transaction throughput, which will be usable once we go mainstream with multiple partners and customers. With this particular solution, we are very confident we should be able to get a minimum of 10x, and much more beyond that, using this hardware acceleration. Thank you. Any questions? Yes, please. Yeah, hi, thanks for this very insightful presentation. I have basically two, maybe three questions. The first one: if you're using the FPGA, how did you get the code into the FPGA? Did you use some kind of high-level description language and then port it over via VHDL? What's the toolchain you used for that?
And if you're using FPGAs and want to scale, isn't there some point in time where you want to move all this into an ASIC, because it's much cheaper if you want a lot of them? And last but not least, I don't know if you can comment on this, but you mentioned that we currently have a shortage in the semiconductor supply chain, yet I heard that this has reversed and in some areas at least there is a surplus of chips. How is that going to balance out in the near term? Thanks. Maybe I can answer the last question, and the other two questions Harris can address. Yes. Okay, when it comes to the shortage: yes, the trend is currently reversing, but what we noticed is that whenever there is a semiconductor shortage, the gray market becomes more and more visible, and many of our customers end up buying somewhere like eBay, not sure whether they are buying a real product. That's where we want to have the product provenance aspect. The other two questions, Harris? Yes, so for the first one: the implementation is a combination of a few different setups. We have used HLS, we have used VHDL, and we have also used something called the P4 language, which is specifically for network processing. For example, our protocol processor is implemented in P4, which is a high-level language you can compile down to VHDL. Some of the parts are in HLS, but the main core of the block processor, the validation of transactions and things like that, is in VHDL, for the best performance you can get out of it. So we used whatever made more sense for each part of the implementation; it's not like you have to use one particular thing, you can implement it any way you want as long as you get the performance out of it. For the second, what was the second question? Yes, so yeah, I agree with you.
I think at some point we would have to move these into hardened blocks, either inside the adaptive SoC or as a separate ASIC, once it becomes more mainstream. At this point we are still exploring; we have something we are trying to move into production, but yes, I agree that once it matures it should become an ASIC or some hardened blocks inside FPGA chips. So the question is when it's going to be available in the cloud. Cloud is a little bit challenging; we are still at a pilot stage. Once adoption becomes more mainstream, there is definitely a possibility of cloud, but there is one catch: we are using the network port of the FPGA, which is not generally exposed in many of the public clouds, though in a few specific use cases they are ready to open it up. Once that option becomes mainstream, we can definitely look into it. Right now it's available to try out in some of the colocation clouds, not the standard public clouds. Yeah, so basically the implementation is not specific to an FPGA card in your own on-premises server; with Amazon F1 instances, which offer FPGA cards, this should work there too. But because the FPGAs receive data directly from the network, the FPGA network ports need network access, which is typically not provided by cloud providers. Once that becomes available, of course, this should be available to try out there as well. One more question from here: first of all, do you really need to support 14,000 transactions per second in your system, and also how many third-party or internal systems did you have to integrate to make this a viable solution? For this particular small client app, we ended up integrating with around seven to eight different systems for the data feed. On transactions per second: much of the data comes to us in bursts.
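The burst concern can be made concrete with basic queuing theory: in an M/M/1 model, the time a transaction spends in the system grows sharply as the arrival rate approaches the peer's service rate. A small Go illustration with made-up rates:

```go
package main

import "fmt"

// mm1Time returns the mean time a transaction spends in an M/M/1 queue
// (waiting plus service), for arrival rate lambda and service rate mu,
// both in transactions per second. Rates here are illustrative.
func mm1Time(lambda, mu float64) float64 {
	if lambda >= mu {
		return -1 // unstable: the queue grows without bound
	}
	return 1 / (mu - lambda)
}

func main() {
	mu := 15000.0 // suppose the peer can validate 15k tps
	for _, lambda := range []float64{5000, 12000, 14000, 14900} {
		fmt.Printf("load %5.0f tps -> %.1f ms in system\n",
			lambda, mm1Time(lambda, mu)*1000)
	}
}
```

At 5,000 tps incoming the delay is negligible, but at 14,900 tps against a 15,000 tps peer it is two orders of magnitude worse; that is why peak capacity, not average load, has to be provisioned.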
That's why the transaction rate keeps increasing; it's something like a sustained 20,000 transactions throughout the day. We don't want to queue it; as much as possible we want to complete it faster, but it's still currently focused on a very small set of products, not the wider AMD product range. Yeah, just to add on those 14,000 transactions: as you said, if you have a burst of transactions coming into the system, then if you know queuing theory, the delay and latency for transactions increase exponentially. What we want is near-real-time updates, so under peak workload we want to support as many transactions per second as we can. It's just like the Visa system: during the shopping season they typically say they have to do 65,000 to 70,000 transactions per second, while in normal workloads maybe they are only doing 10,000 to 15,000; but if the system cannot support 65,000 or 70,000 during the shopping season, there would be huge delays, minutes or hours, for transactions to settle. The same thing applies here as well, and that's why we have the FPGA card to accelerate. So this is maybe a technical question, but I was wondering where you got most of the speedup from CPU to FPGA: was it the data access or the actual computations themselves? So yeah, if you're more interested you can look at our paper, but the summary is, if I remember correctly, about 10 to 20% of the time is spent on accessing the data, because all the data in Hyperledger Fabric uses protocol buffers; it's a layered structure. If you do profiling, you will see about 10 to 20% of the time spent there, accessing the transaction data, the endorsements, and so on. The rest of the time is actually dominated by the ECDSA verification: the digital signatures, which use the ECDSA scheme, take about 50 to 60%, and in some cases up to 70%, of the time, together with the SHA-256 hashing.
And then the remaining 10 to 25% is spent in LevelDB accesses. We get most of the speedup by using multiple ECDSA engines in the FPGA: we have parallel and pipelined transaction processing which efficiently uses these engines to do the verifications as fast as possible. And then we have an in-hardware database, which is of course going to be faster than what you get with Go LevelDB. That's where the rest of the speedup comes from. One last question from my side: you can replace the existing orderers with the Hedera consensus protocol, that's what I heard. Have you tried that? If you do that, then your problem goes away. I'm not aware of the Hedera consensus protocol and whether it can replace Raft, but what do you mean by the problem going away? Can you elaborate? I thought that the orderers would go away once you replace Raft with Hedera, and then you get 100,000 transactions per second in Hyperledger itself. I'm not very sure; I'm not a techie, but that's the way I understood it. Okay, I'll look into it later and get back to you. I'm not sure about the outright performance, but it is another Hyperledger Labs project, so you should take a quick look at it. On the cloud side, though, there are a couple of providers; I'm thinking of Equinix Metal, which used to be Packet. It would be interesting to talk to folks like that, possibly Ridge Cloud. In other words, once you get beyond the hyperscale clouds, there are some bare-metal cloud providers that could be interested in this. Yeah, we are already working with a few bare-metal providers to see how we can host this. Right now we are planning to host a demo environment so that people can come and try it out, and we are working with a few others as well. Any other questions? Okay, if not, thank you. Thanks a lot for having the session. We'll all meet you at the Guinness. Okay, thank you. Thank you.