Okay. Sorry about that. So, hi. My name is Schuyler. I'm a developer of a new hardware description language called Chisel, and of a related thing called FIRRTL. But what I'm going to be talking about today is how to build accelerators for the accelerator socket that Luca was just talking about for ESP, and about trying to use all the power of software engineering to make it dead simple for people to build accelerators.

Okay. So, this is a classic academic slide that's basically just saying that accelerators are important because we're in the end-of-Moore's-Law era. So, one thing you can always do to get a lot more performance is go and build accelerators. However, there's one caveat from a pretty cool paper, which was saying that the big benefit you get from accelerators is a one-time thing. It's not about iterating on the... right, we've got Dave Winslow in the room. So, basically it's just saying that if you want to build an accelerator, you're probably going to get a big benefit going from CPU to accelerator. Thanks, Palmer. But after that... So basically, the design is what's important. You want to make it easy for people to build accelerators and do it fast.

Okay. So, usually I like to think about things in terms of type hierarchies. What we're really concerned with at this point are loosely coupled accelerators as opposed to tightly coupled accelerators, which are something like SIMD instructions you might add to a processor. A loosely coupled accelerator is something like a fixed-function accelerator: you have a C function, and you want to go and build an accelerator that's going to accelerate that C function. So, in this case, things like a fast Fourier transform, or the model that you would have for something like a GPU, where you do a CUDA memcpy over to the GPU, go and run some kernel on it, and then do a CUDA memcpy back, or stuff like a neural network.
I have a neural network and I have an image and I want to classify it or something like that. So, this is sort of restricting the domain of the problem that we're talking about here.

And just to concretize that, this is an example toy timing diagram of an accelerator that is computing the sum of an array in memory. What that looks like is you start off at the top and there's some configuration. You're saying, hey, go look at address hex A. It's a vector of size three. And I want you to compute the sum and write the output to address B. So that happens. The accelerator says, OK, I'm working. It goes off and does a DMA request to address A and goes and grabs that data. The data comes back over multiple cycles. Here you're seeing that with one, two, three. The accelerator does some computation. OK, computing the sum is not that complex, but it finishes some time later. The result is six. It writes it back and then reports that it's done. So this is the class of accelerators that I'm talking about. All right, and this example of building up an adder accelerator goes through this whole talk. There's also an example available on the GitHub right now.

OK, so Luca's Embedded Scalable Platform comes into the picture here because it's basically saying, we want to be a platform where you can go and just grab your accelerator. You have an accelerator, or you want to build an accelerator, and you just integrate it with the system. And the pitch for that is great: bring your accelerator in any language, and the accelerator can then be easily integrated with this SoC. But there are a couple of little flies in the ointment here if you want to go and build in Verilog or VHDL. The first thing is the ESP framework has to know about the accelerator.
In that sense, there's all this metadata that you have to describe to that nice, pretty GUI that Luca was showing, and that's all in the form of an XML file. The second thing is there's a defined interface for this ESP accelerator, but you can also configure that with additional configuration lines if your function has more parameters, right? Now, the thing that's annoying is if you just put this in a README and say, OK, you have to write this file, and maybe you provide a schema for it. And you also say, I have these wires and they have these names and you have to code to that. That's really brittle. You're basically saying that a user has to go and parse a spec and come up with a bunch of correct strings that are going to work with this system. And I come from the school of, you know, I would like to get a compiler warning if I screw up any of this.

So this is just one more example of what this accelerator looks like. You have an ESP accelerator here. The things which are in green or a sort of white color are things which are fixed. So the socket is providing a DMA port, a debug port for something like an error code when you're done, something the accelerator can use to say it's done, and something the processor can use to tell the accelerator to start working. But everything in red is something that the user has to go and implement. These are things like an optional vector of configuration parameters that you have to pass to this. You have all of the business logic associated with the internals of the accelerator. And then you also have this XML, which is supposed to be consistent with what you've written here.

Okay, so the usual answer for something like this is, well, just code it up in Verilog, SystemVerilog, or VHDL.
However, Verilog and VHDL are kind of annoying because you don't have optional IO. This is just a fundamental feature that they don't support. Second, we're kind of coding in the 1970s here, or even earlier: there's no object-oriented programming in SystemVerilog or VHDL. With a slight caveat: SystemVerilog supports classes, but only for verification. You can't go and build hardware that uses any of this stuff. And with the further slight caveat of the BaseJump STL stuff, which is great work from Michael Taylor. But even he has highlighted that SystemVerilog has some annoying features that make it hard to do this kind of stuff.

So where I come in is with a language jointly developed at IBM, UC Berkeley, and a startup, well, a not-so-small startup, called SiFive: Chisel and FIRRTL. What is Chisel? Chisel is a hardware domain-specific language. Think of it like you have a bunch of classes that are describing hardware components, and then you can go and extend them and add things to them. And FIRRTL is a circuit IR. You all know about LLVM IR. LLVM IR is a program IR; FIRRTL IR is for describing hardware circuits. You put all this together, and because this is a language in Scala, you get all the benefits that Scala has to offer. So you get simple parameterization, parametric polymorphism, first-class function support, functional programming, and object-oriented programming. So basically you don't have to wait for the vendor tools to come along, and you don't have to wait for a standards body to make an addition to the SystemVerilog specification. You can just say, hey, I'm going to go and use all of this awesome power to go and build hardware.

So what does this process actually look like?
You write your circuit in Chisel. That runs through the Chisel front end, which generates FIRRTL IR. That runs through the FIRRTL compiler, which generates a lowered form of FIRRTL IR, and then that runs through a Verilog back end and you get Verilog out of it. You can customize this whole process with custom transforms that you inject into really any stage of this process, but here it's just shown adding custom transforms into the FIRRTL compiler. Okay, so there's a whole website on this, chisel-lang.org. It's an open source project, check it out.

But really the point of this talk is how you can use Chisel to restrict and define a sort of hardware API for this ESP accelerator socket. The abstraction that we came up with is this notion of a specification and an implementation. A specification is the encapsulation of one of these sockets, and the implementation is the actual hardware associated with that. So for one specification, you can think about lots of different implementations. As an example, you could build an FFT accelerator. It could either be pipelined or not pipelined. Those would be two different concrete implementations, but they would both have the same specification. So the specification is the thing that handles all of the configuration. What are my IOs? How much memory do I need? All of those types of things. But the implementation is the actual hardware. This is also a type hierarchy kind of thing: starting from the Chisel 3 Module class, which is just a generic hardware module, you have an ESP implementation that extends that, and an accelerator mixes in a specification with that. Okay, and I think this is on later too, but this is the website of the project for this specific part of it.

And then going off of the example adder accelerator that we had from before, this is what the code for writing the specification for this adder looks like.
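The slide's code is not captured in this transcript. As a rough stand-in, here is a minimal sketch of the specification idea in plain Scala, with no Chisel dependency. Every name in it (Parameter, Config, Specification, AdderSpecification, and the field names) is hypothetical and is not taken from the project's real API; the memory footprint and device ID values are placeholders.

```scala
// Plain-Scala sketch of the "specification" idea; not the project's real API.
// All names and values below are illustrative assumptions.
object SpecSketch {
  // One configuration register that the accelerator exposes to software.
  final case class Parameter(name: String, description: String)

  // Metadata a framework would need in order to integrate an accelerator.
  final case class Config(
      name: String,
      description: String,
      memoryFootprintMiB: Int,
      deviceId: Int,
      params: Seq[Parameter]
  )

  // `config` is abstract here, so a concrete class that never defines it
  // is rejected by the Scala compiler instead of failing at integration time.
  trait Specification {
    def config: Config
  }

  // Specification for the adder: where to read, how many elements to sum,
  // and where to write the result.
  trait AdderSpecification extends Specification {
    override lazy val config: Config = Config(
      name = "adder",
      description = "Sums a vector read from memory",
      memoryFootprintMiB = 1, // placeholder value
      deviceId = 0x12,        // placeholder value
      params = Seq(
        Parameter("readAddr", "Address of the input vector"),
        Parameter("size", "Number of elements to sum"),
        Parameter("writeAddr", "Address to write the sum to")
      )
    )
  }
}
```

Because the metadata lives in ordinary typed values, a misspelled field or a missing parameter list shows up as a compile error rather than as a mismatch discovered later.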
So you have three parameters here. These are the things that you would configure: your read address, your size, and your write address. There are some additional things that are useful for humans to look at, like a name or a description, as well as things that the ESP framework cares about, like the memory footprint and the device ID. What's cool is that I can just say, hey, you as a user, if you want to go and work with ESP and write Chisel, you just have to implement this API. I mean, it's just an abstract class. If you don't implement it, the compiler yells at you. Or if you miss something, it'll yell at you, and that's nice. You want errors as early as you can get them. All right, so then the implementation is really just something that mixes in that spec. You define the actual implementation name, which is how I differentiate this from other implementations, and then finally all of the business logic associated with this. If you want to look at this example, I finished it last night and it's up online. Just go and check it out.

Okay, so a quick, kind of canned demo of what's going on here. You just go into the project and type sbt run. This will go and build all of these accelerators for you. One of them is this toy adder accelerator. What you get out of that, automatically, is the Verilog for this accelerator, and you also get this XML file. What's really going on here is that we add a transform into the FIRRTL compiler that goes and looks at the design and then emits this additional XML data which we care about. These are the two things that the ESP framework wants to consume, and then you're ready to go. So basically, there's no confusion about what this is supposed to look like. What is the interface? What is the schema for the XML? We just take care of it for you. And just to give you some concrete idea of what that XML output looks like.
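The slide's XML is not reproduced in this transcript either. The sketch below, again in plain Scala with no Chisel or FIRRTL dependency, illustrates the general idea of an implementation mixing in a specification and the XML being derived from that one source. The class names, the element and attribute names in the output, and the device ID are all assumptions for illustration and should not be read as the real ESP schema or the project's actual transform.

```scala
// Plain-Scala sketch: derive XML metadata from a typed specification.
// Names, XML layout, and values are illustrative assumptions only.
object XmlSketch {
  final case class Parameter(name: String, description: String)
  final case class Config(
      name: String,
      description: String,
      deviceId: Int,
      params: Seq[Parameter]
  )

  trait Specification {
    def config: Config
  }

  trait AdderSpecification extends Specification {
    lazy val config: Config = Config(
      "adder",
      "Sums a vector read from memory",
      0x12, // placeholder device ID
      Seq(
        Parameter("readAddr", "Address of the input vector"),
        Parameter("size", "Number of elements to sum"),
        Parameter("writeAddr", "Address to write the sum to")
      )
    )
  }

  // Stand-in for the hardware module. The name distinguishes one
  // implementation of a specification from another.
  abstract class Implementation extends Specification {
    def implementationName: String
  }

  class SimpleAdder extends Implementation with AdderSpecification {
    val implementationName = "simple"
  }

  // Render the metadata as XML text. No escaping is done, so this only
  // works for names and descriptions without XML special characters.
  def toXml(impl: Implementation): String = {
    val c = impl.config
    val open =
      s"""  <accelerator name="${c.name}_${impl.implementationName}" desc="${c.description}" device_id="${c.deviceId.toHexString}">"""
    val params = c.params.map { p =>
      s"""    <param name="${p.name}" desc="${p.description}"/>"""
    }
    (Seq("<sld>", open) ++ params ++ Seq("  </accelerator>", "</sld>")).mkString("\n")
  }
}
```

Calling `XmlSketch.toXml(new XmlSketch.SimpleAdder)` yields one `param` element per configuration parameter, so the XML cannot drift out of sync with the parameters the code declares.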
This is what it expects: what are the configuration parameters, the optional IO, that I have for my design? And you can see this example online.

And then finally, just wrapping all of this up. We currently have three ESP Chisel accelerators. Two of them are toys, one of them is not. There's a counter accelerator, which just reports done after N cycles. There's this adder accelerator, and we also have an FFT accelerator that integrates work from UC Berkeley on building nice, fast, efficient FFTs. And future work for this kind of stuff: obviously, with this notion of emitting extra XML, you could also think about emitting test benches or emitting Linux drivers, trying to make it just dead simple for people to go and write hardware. And if they don't want to use the SystemC, HLS, high-level synthesis kind of approach, they can do this but still get the benefits of all the collateral that automatically gets generated.

All right, so this is just links to the project and some stuff about me. The main project is the ESP project. There's the Chisel accelerators, which is a submodule of that. Chisel 3 is on freechipsproject, but that will eventually switch to another project called CHIPS Alliance. We have a Chisel 3 Twitter, there's the FIRRTL project, and my GitHub, if that's of any interest, and that's it. And I think I somehow got back on time after the interesting start. So thank you.

Sure, any questions? Shoot.

So we try and just take advantage of open source tools. Verilator is an open source tool that compiles a Verilog file to C++, which you then build into an executable with GCC. So that's how you get it to C++ to simulate it, and we just use Verilator for all of this stuff. But there does exist another project called Treadle, which lets you directly simulate the FIRRTL IR, and you can use that for testing too. So we have a whole bunch of unit tests with this thing.
You just type sbt test and it runs all the tests for the accelerator as it goes along.

I'm actually not... sorry, I need to repeat the question. So the question was, can you go from SystemC to FIRRTL? I'm not aware of that right now, but the Yosys project does have a Verilog front end and a FIRRTL back end. So you can theoretically take advantage of the FIRRTL compiler ecosystem from Verilog. You could go SystemC to Verilog to FIRRTL if you wanted to, but you'd probably lose a lot of semantic information along the way.

Yeah, so the question is, what are the benefits of Chisel versus something like SystemC to Verilog? Generally, the way I understand you're going to use SystemC is that you have these macros and you're very restricted in the set of what you can do. The benefit of Chisel is basically that you get all of the last 40 years of software engineering, and you can apply whatever programming paradigm you want to the process of hardware generation. So you have first-class functions, you have parametric polymorphism. There are no restrictions in the sense that you don't have a synthesizable subset. It's just whatever you want to use to describe hardware and how it connects. And you can build more complicated libraries on top of this for doing whatever you or your company may need.

So, got a minute, anything else? Do you want to borrow the minute? I can yield time too. So I'll be around today and tomorrow if anyone wants to talk about Chisel or any of this kind of stuff. So thanks, guys.