So why did we choose this name? Because it's a very useful English word, and I think there are no conflicts with other project names.

To start, let's talk about my understanding of open source hardware. Everyone knows open source software very well; the full name should be "free and open source," not just "open source." One very well known event is the GNU project started by Richard Stallman in 1983. At that time, of course, people did not think about what open source hardware would be. Some people thought that since hardware was cheap enough and you could buy it everywhere, it was in some sense already free and open source. But nowadays we see more and more open source hardware: we adopt the ideas of open source software and then build hardware with them. So nowadays people know what open source hardware is: it is a hardware design that can be freely obtained in the public domain.

Let me show a few well known projects. What is RISC-V? It's open source hardware for processors. You can download the specification of the instruction set, and you can also download hardware implementations of the architecture. If you have the money for a tape-out, you can send the design to a foundry. The design is freely available, so it's open source hardware. There are also board-level projects such as Arduino; you should be aware of them. You can get the specification of the PCB, you know how to buy the components, and you can send the design to Huaqiangbei and have one made for you. The design is freely available, so it's also open source hardware. And there is also a project for data-center servers: the Open Compute Project, started by Facebook. They release the specifications of the electrical design and also the mechanical design, so if you find a manufacturer that can build servers, you can send the design to them and get a server built according to the specification. These are all examples of open source hardware.

In our project, we focus on open source processors. A processor is one part of the hardware: it's not a PCB, it's only a chip. According to our definition, an open source chip design includes three parts: the instruction set, the microarchitecture, and the toolchain. The instruction set and the microarchitecture can be turned into actual hardware, and we also need software to make it usable; in particular, we need a compiler to make programs executable.

Let's review the example from general-purpose computing that I just mentioned. RISC-V defines an instruction set, the ISA, and has also released some demo implementations, for example the Rocket Chip project. You can download the source code and compile it to get the Verilog, the hardware description language, which you can manufacture later (a small sketch of this step follows below). And there is also riscv-gcc, so you can compile a C program into a binary that can execute on your RISC-V processor. So it consists of the three parts.

In our project, we are interested in specialized processors. So we also propose an instruction set, we implement the microarchitecture, and we implement a compiler backend that generates code for it. So this is our project.
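As a minimal sketch of that compile-to-Verilog step (the module below is a toy for illustration only, not from Rocket Chip or our project, and it assumes Chisel 3.5+ where emitVerilog is available):

```scala
// Toy illustration only (assumes Chisel 3.5+): a hardware design written
// in Chisel is elaborated into synthesizable Verilog, which is the same
// kind of step you run when building Rocket Chip from source.
import chisel3._

class TinyAdder extends Module {
  val io = IO(new Bundle {
    val a   = Input(UInt(8.W))
    val b   = Input(UInt(8.W))
    val sum = Output(UInt(8.W))
  })
  io.sum := io.a + io.b // simple combinational add
}

object Elaborate extends App {
  // Writes TinyAdder.v; that Verilog is what an FPGA or foundry flow consumes.
  emitVerilog(new TinyAdder)
}
```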
To make the open source processor system complete, in fact we also need the analog IP, for example DDR, Wi-Fi, or USB IP. Those should also be made available, but currently only the protocols implemented as digital circuits are open source, and I don't know how to make the analog IP open source. So if you have an idea, you are very welcome to contribute.

We also mentioned the software, the compiler; it is part of the open source hardware. But think about it: after you get the Verilog, the hardware description language, you still need the EDA software, the design automation software, to get a layout from the Verilog. Currently EDA tools are very expensive, so most of them are commercial, and although there are some open source tools, they are not complete yet. There is a project in the US called OpenROAD that tries to build a complete flow, and we also have a future plan for a similar project to get a complete open source EDA flow.

The green part of the slide is RISC-V; you should know there are a few RISC-V associations based in China. In our project, we focus on specialized processors. It includes three parts: the compiler backend, the instruction set architecture, and the microarchitecture. We would like it to be adaptable to AI applications, and later on we hope it can drive the development of the RISC-V ecosystem on the specialized co-processor side. We would also like it to drive an agile development methodology. So this is the general idea and the purpose of our project.

Let's see briefly what our project comprises. It includes the compiler backend. We start from some neural network format, for example from the TensorFlow or PyTorch frameworks. Currently there are a lot of compiler frontends to use, so we make use of an existing frontend to get the intermediate representation, and from the intermediate representation we generate the instructions for our processor, instructions that are executable on our microarchitecture (a rough sketch of this lowering step is shown below).

For specialized processors, especially tensor processors, we collected a few other open source projects: for example NVDLA from NVIDIA, VTA from the University of Washington, and also DANA from Boston University. These are all examples of open source tensor processors, and you may try to run them on your own. But our project starts in a different way, so let me explain a little bit.

Let's review two example projects. On the left-hand side is NVDLA. The NVDLA people started from the software stack, because NVIDIA has a lot of software engineers who work on cuDNN and the optimizations. So first they had the software stack, and later they developed a chip for autonomous driving and released the source code of the accelerator on that chip. That's NVDLA: they started from the software first, and later they released the open source processor. VTA is another example, on the right-hand side, originating from the University of Washington. It also started from the compiler first, and the processor is only used as an example to demonstrate how to use the compiler. Our project is different from these two: we start from the processor, the microarchitecture design, at the beginning.

Our goal, as we mentioned, has two parts. One is to drive the ecosystem of co-processors within the RISC-V ecosystem. And downstream in the flow, we would also like to drive the open source EDA tools. So we would like to have an open source processor that can not only be used as a co-processor but will also drive our design methodology, and we hope to develop more design automation tools around this project. So that's our project.
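As a rough sketch of that backend idea, lowering intermediate-representation nodes into coarse instructions for the processor (every name here is a hypothetical illustration, not our real IR or instruction set):

```scala
// Hypothetical sketch of the backend flow "framework -> frontend ->
// intermediate representation -> instructions". None of these names come
// from the real project; they only illustrate the idea.
sealed trait IRNode
case class Conv2D(input: String, weight: String, out: String) extends IRNode
case class MatMul(a: String, b: String, out: String)          extends IRNode

sealed trait Instr
case class Load(buffer: String)             extends Instr
case class Compute(op: String, out: String) extends Instr
case class Store(buffer: String)            extends Instr

object Backend extends App {
  // One IR node expands into several coarse-grained processor instructions.
  def lower(node: IRNode): Seq[Instr] = node match {
    case Conv2D(i, w, o) => Seq(Load(i), Load(w), Compute("conv2d", o), Store(o))
    case MatMul(a, b, o) => Seq(Load(a), Load(b), Compute("matmul", o), Store(o))
  }

  val graph = Seq(Conv2D("x", "w0", "t0"), MatMul("t0", "w1", "y"))
  graph.flatMap(lower).foreach(println) // the generated instruction stream
}
```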
As I mentioned, it includes three parts: the microarchitecture, the instruction set, and the compiler backend. You can see we only use about 5,000 lines of code. We use the same programming language as the RISC-V Rocket Chip project: we use Chisel to implement the microarchitecture, so we only need a few thousand lines of code to get a very basic version. We also implement a shell for the Xilinx FPGA acceleration card, and it will be easy to port the design to embedded FPGA boards, so you can run it either on an FPGA server or on an embedded FPGA card.

Before we started the project, we also thought about this question: what is the motivation of our project, especially when you can already buy AI chips on the market? For example, you can buy chips from Huawei; they build very powerful AI processors. So why build another, open source one? We have different purposes.

One is education, of course. In universities, we can educate students using this framework. Students can easily implement new techniques, replicate ideas from existing publications, or you can organize contests around it. My two students worked on this project; when they started, they knew only a little about hardware development and deep learning acceleration, but after this project they are kind of experts. In the afternoon you can talk to them in our workshop. So education is a very important purpose.

There are also new scenarios. We find that in the IoT domain there are many fragmented application demands, and commercial AI chip vendors may not be motivated enough to serve all these small markets. Open source ones will have the opportunity to serve these fields.

And there are new algorithms. Once a commercial AI chip is built, it is hard to add more functionality to it. When you have an open source one, you can adapt to a new algorithm in a shorter turnaround time: you have all the source code, you don't need to compete with the commercial AI chips, and you can just add features to see whether the result satisfies your demand. These are all the motivations. And we also hope to use this open source processor to drive our design methodology.

So we are developing an agile development flow around our processor. Here is the flow of how we design, test, deploy, and estimate the chip; there are multiple stages. At the beginning is the chip design stage, which uses the Chisel language, so you only need your laptop or workstation to develop your processor. The second stage is the FPGA prototyping or deployment stage. You can buy an FPGA board for only a few hundred RMB, which should be affordable for most developers, or you can rent an FPGA server on Huawei Cloud or Aliyun. You can either plug a card into your machine, use the cloud FPGA infrastructure, or even ship the FPGA as an intermediate product, because the FPGA is already faster than your ARM processor or your RISC-V processor. Then, when you find that your market is big enough, or that there are some advantages, for example in performance or power, you can go to the estimation stage, the ASIC estimation, where you estimate the power, performance, area, and cost. The stages marked with a check mark are the ones you can perform entirely with your own laptop or something easily affordable, and that helps developers a lot. So we have already figured out how to do all of this with only your laptop; the sketch below shows the kind of software simulation you can run at this stage.
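For example, here is the kind of laptop-only check you can do at the design stage; this is a hypothetical unit test, assuming the chiseltest library and the TinyAdder toy module from the earlier sketch:

```scala
// Hypothetical laptop-only test (assumes the chiseltest library and the
// TinyAdder toy module from the earlier sketch): the design is simulated
// in software, with no FPGA card and no foundry involved.
import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class TinyAdderSpec extends AnyFlatSpec with ChiselScalatestTester {
  "TinyAdder" should "add two bytes in simulation" in {
    test(new TinyAdder) { dut =>
      dut.io.a.poke(3.U)     // drive the inputs
      dut.io.b.poke(4.U)
      dut.clock.step()       // advance one simulated clock cycle
      dut.io.sum.expect(7.U) // check the result
    }
  }
}
```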
But at the end, if you would like to tape out, the tape-out is outside the open source domain, because you have to pay for the EDA tools, pay the foundry, and pay for the packaging and testing. So that part is no longer in the open source hardware domain. But the first three stages you can do entirely on your own, and only at the end do you decide whether you should tape out a chip or not.

For the second stage, when we deploy our processor on the FPGA card, here is an illustrative example. You prepare the data and compile the program on the CPU. FPGAs nowadays have very useful infrastructure: you can transfer the data and the instructions of the program to the FPGA card's device memory, and then you can start our processor. The processor reads the data and the program from the device memory, and after it finishes computing, it stores the results back in the device memory; then the CPU can move them back to the host memory. That's the deployment of the processor, and you can use this setup to test it. You can run it at, for example, 200 MHz, which is fast enough to exercise the processor.

Here is a short introduction to our instruction set architecture. We did a survey of the popular instruction set architectures in this domain, for example the ISA of Cambricon and also the ISA of the Google TPU. After studying many instruction set architectures, we proposed our own. It consists of two parts. One is the macro instruction level, which is like CISC, a complex instruction set: we can move data from host memory to the on-chip memory, or move data from the on-chip memory back to host memory, and we can also issue some computation inside the chip. For the computation we need to describe more complicated behavior, so we also define a level of micro-operations: each complicated instruction is decoded into a few micro-operations that spell out what should be done within that one instruction. The micro-operations are like RISC, a reduced instruction set.

Now, in our processor design, if we would like to support one new instruction, how do we do it? We have to modify the microarchitecture and also the compiler backend. For the microarchitecture, it's easy: to support one new instruction, we just add one more macro to handle that instruction. Of course, when you add more and more instructions, you may use up your hardware resources, and then you can think about how to optimize. Currently we do this manually; in the next one or two years we hope to develop some automated design tools that do the optimization for us. For example, we may have one macro that supports instruction A and relies on a few fine-grained modules, and another macro that also relies on fine-grained modules; we can merge these two macros into a single bigger macro that supports both instructions. That's the basic idea: to add support for a new instruction, we just add one more macro, and later on we think about how to merge macros to save hardware. We also need to do more work on the compiler backend; my students will show how to add support for one new instruction and how to modify the compiler backend in the afternoon tutorial. The sketch below illustrates the two-level instruction idea.
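Here is a small sketch of that two-level idea, a macro instruction decoded into RISC-like micro-operations; all names, fields, and encodings are hypothetical illustrations, not our real ISA:

```scala
// Hypothetical sketch of the two-level ISA: one coarse, CISC-like macro
// instruction expands into a sequence of RISC-like micro-operations.
// Names and fields are illustrative, not the project's real encoding.
sealed trait MicroOp
case class LoadTile(hostAddr: Int, onChipAddr: Int)  extends MicroOp
case class MacTile(rounds: Int)                      extends MicroOp
case class StoreTile(onChipAddr: Int, hostAddr: Int) extends MicroOp

sealed trait MacroInstr {
  def expand: Seq[MicroOp] // decode: one macro -> several micro-ops
}

// "Move data on chip, compute, move the result back" as one instruction.
case class ConvMacro(src: Int, dst: Int, rounds: Int) extends MacroInstr {
  def expand = Seq(LoadTile(src, 0), MacTile(rounds), StoreTile(0, dst))
}

object DecodeDemo extends App {
  // Supporting a new instruction means adding one more macro like ConvMacro.
  ConvMacro(src = 0x1000, dst = 0x2000, rounds = 4).expand.foreach(println)
}
```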
And if you know about VTA, the project I mentioned that started at the University of Washington, you may find that our design at this stage has a lot in common with VTA, because we use the same compiler framework, TVM. So we look similar at this stage, but our compiler supports both the VTA instructions and new customized instructions. Another difference is that we are aiming this design at tensor processing in general: our processor is one example through which we study how to develop a processor for a specific domain. The ultimate goal of our project is a design methodology, not just a single processor: a methodology for how, given a new domain, we can develop a new processor for it. Also, VTA's main purpose is to serve as a co-processor, while our processor can be a standalone processor. That's the difference.

This is how we do the compiler backend; I will skip the details. Currently we implement all these modules in the Chisel language. Chisel uses Scala as its host language; it's a domain-specific language built on top of Scala. We will also release the documentation: currently we have already released the source code, but without documents it's not that helpful for developers, so we will get the documentation out as soon as possible. Currently we are implementing a very simplified baseline version that can compute vector processing, matrix processing, convolution, and pooling. Later on, we will extend it to support a few irregular small tasks, and also extend the architecture to support high-throughput big tasks. We have already found some collaborators to work on this version. In the documentation we are working on, we will show how to include a new feature, how to modify the microarchitecture, and how to modify the compiler backend. In the future, we can be part of the RISC-V ecosystem as a co-processor, and later on we hope this design methodology can drive the development of more specialized co-processors.

Here are some details of the design flow we are working on. We start from the computation of a specific domain and build the macro library; the library can be synthesized into a microarchitecture. The computation can be described as a compute graph, and we then choose how to implement the compute graph on the microarchitecture and do the placement and routing. That's what I mentioned at the beginning: we would like to use our processor to drive the development of design automation tools.

Currently, the core team has only three members: one faculty and two students. We started this project at the end of last year and have developed the first working version. You can download the code from the OpenI website, openi.org.cn; the project number is 99, and I like this number. In the afternoon, there will be a workshop showing how to add support for a new instruction using this infrastructure. Thank you.