So there's been a lot of excitement about ProgPoW. And what is ProgPoW? Well, it's a programmatic proof of work. And we're going to be tackling all your questions today. This will be a medium-expertise presentation, so not too technical, but not too generic. And if you really want to go on a deeper dive, you should check out the official GitHub repo with the official code and the white paper, as well as our Medium accounts.

But first, a little bit about the team. Too many people put an emphasis on the people behind algorithms or work, which is highly amusing to me, especially considering that cryptocurrency is all about no trust. But hey, I learned the first time, when we released ProgPoW, that you actually have to do these disclaimers. So I figure we have to do a disclosure on who we are. ProgPoW is made by three people: IfDefElse. I am Ms. If. I'm currently Chief Technology Officer of Core Scientific, an artificial intelligence and blockchain infrastructure and hosting company. I'm also the founder of the Mineority Group, who are responsible for our fantastic graphics today, and I'm ex-Genesis Mining, the world's leading hashing power provider. Then we also have Mr. Def. Mr. Def is an experienced systems engineer and a blockchain enthusiast, and I like to call him a professional cat herder, because that's pretty much what 24 hours of his job consists of. And then we have Mr. Else. Mr. Else is quite an experienced semiconductor engineer, with over 10 years of experience working in this space, and he's also an experienced GPU architect. And he loves ProgPoW. But not the ProgPoW we're all familiar with; he loves this porg. And together, we are IfDefElse.

So what is ProgPoW? Well, as I said before, it's a programmable, or programmatic, proof of work. Or, as we like to say, a GPU-tuned extension of Ethash. It's also sometimes called PorgPoW after this little guy, our little hero from Star Wars: The Last Jedi. And it also takes its roots from the word programmatic. Now, I quite like programmatic, because it's a great way to explain algorithms. Algorithms are actually very musical. Every algorithm has a beat. And optimization, when we talk about optimization of software or hardware, is all about matching the beat of an algorithm to the beat of the hardware. So Ethash, I like to say, is a little like trapcore. Monero is like dubstep. Zcash is a little like rock. And these beats, like I said, need to synchronize with the hardware itself.

So let's do a big deep dive. Right now: why was ProgPoW created? Well, today there's kind of a problem with algorithms. You see, traditionally, algorithms are designed by software engineers. Now, you might be asking, what's wrong with that? Well, would you want your hardware designed by a software engineer? The answer is no, because that's actually how you end up with things like this: the Windows Phone. We always talk about how you need to let the people who are experts do the things that they're expert in. Hardware engineers should tune for hardware. Software engineers should tune for application layers. That's the crux of the problem today, and how we get fixed-function hardware, or in this case, ASICs. Proof of work has traditionally taken an algorithm that's fixed and tried to shoehorn the hardware in to make it efficient at executing it. But that leaves many unused parts of the hardware that you can just shave off or get rid of.
Instead, ProgPoW flips this paradigm. We take the hardware that already exists, and we modify an algorithm to match it. Because an efficient proof-of-work algorithm needs to match the access patterns and the available resources of the hardware it runs on.

Now, I know what everyone right now is waiting for me to say, given how vocal I am about ASIC resistance, but I hate to say it: ASIC resistance is a fallacy. There's no such thing. See, ProgPoW is mistakenly trumpeted as an ASIC-resistant algorithm, but quite frankly, that's bullshit. Proof of work requires some form of an ASIC to do work, be it a CPU, a GPU, an FPGA, or a mining ASIC. And guess what? Oops, they're all ASICs. They're all built using the exact same fabs, the exact same technologies, the exact same materials. The difference in this hardware is the class of algorithms they're designed to execute efficiently and that they're optimized for. So generally, when people say ASIC resistance, what they mean is centralization resistance. And we can dive into why that's important in the breakout session, but there's my statement on ASIC resistance: it doesn't actually exist.

So why does hardware actually matter? Why do we care about hardware with Ethereum? Well, for the time being, hardware actually defines your user base. And no blockchain is suited for all user bases, just like no application is suited for all customers. GPUs, they're for the common folk. They're readily available, they're flexible, they're adaptable, they're quite cheap, and pretty much everyone has one today. Look at your phones: you all have a GPU in there. Every single laptop you have has a GPU. Every video game console out there has a GPU. FPGAs, they're for enthusiasts; really rich enthusiasts. They're high cost, high expertise, they run hot, and they're definitely not plug and play. And I should know, the Mineority Group hosts around 5,000 of these things. And then we have, of course, ASICs. These are really for enterprise miners. They're relatively simple, they're plug and play, they're consumable goods, they kind of have short lifespans, and they generate a lot of heat and consume a lot of power. But the density of an ASIC lends itself really well to professional farms.

So what user base do we actually want for Ethereum? Well, if Ethereum is a decentralized application blockchain, that means we want as many participants and as many players as possible. Application user bases usually need to be as decentralized and distributed as possible to prevent malicious actors. And if you want an enterprise-level blockchain, you also wanna make sure as many participants as possible are active, to ensure your application and your code are executed on as many diverse nodes as possible. A GPU card is naturally decentralized. There's widespread adoption, it's in every device, and you have multiple manufacturers and a healthy ecosystem outside of the blockchain. Now, that point is really important. Because when you only have one line of business, you are naturally incentivized to protect it at all costs. That's an important statement.

But wait, what about CPUs? Well, there's a problem with CPUs. See, they're naturally exploited for cryptocurrency mining. They're really great with botnets, and you should probably trust me on that, because I might have written a few back in the day. CPU implementations are also the easiest to execute on fixed-function hardware because of their very single-threaded nature.
That lends them really well to fixed silicon implementations. On top of that, CPU implementations are also hard to tune for. Each architecture shift or revision has some tricks of the trade to get the optimizations in, in order to perfectly saturate the hardware. And of course, they're not that great at math compared to an FPGA or a GPU. They're kind of slow, and not as dense. You can't really pack a mining machine full of CPUs; you can pack a rack full of ASICs or GPUs. Density is really, really important to mining enthusiasts. And also remember, Ethash was a GPU-based algorithm from the start. That's what memory hardness means: it means that you need a GPU. We simply tuned it for GPUs.

Now, a warning: this is gonna get a bit technical from this point on. This is our Ethash spaghetti and logo. Lego, rather. Ethash itself is relatively simple. It requires a semi-constant scratchpad file; we call it the DAG. And then all instances reuse that same DAG for the next 100 hours. Mining Ethash really just involves grabbing random slices of the data set and hashing them together. And the DRAM peak bandwidth up there, which we can see, is easy to calculate: it's 64 iterations, each reading 128 sequential bytes. So Ethash is really just a memory-hard proof-of-work algorithm. That means it requires two major things: a relatively large frame buffer, and as much memory bandwidth as possible. Both of these things a GPU has; great. But guess what, an ASIC can have that too. So can an FPGA. So can a PCB with just a ton of DRAM and a little compute core. Because that's all you need: memory, DRAM chips. Ethash is only tuned to memory, not to the rest of the hardware. And so again, how do you make an ASIC? You just remove all the other parts that are currently unused, or rather all the wasted space, and you get a nice increase. That's ASIC creation in a nutshell, guys. Just remove all the unwanted or wasted parts, be it silicon or compute cycles. Voila.

So the inefficiencies in Ethash become more obvious when you profile a graphics card with, say, ethminer. The top picture there is taken from the Nsight profiler. That's NVIDIA's new performance profiler, pretty sexy. The bottom one is AMD's. So you see SM up there. SM stands for streaming multiprocessor. Those are the computational cores of NVIDIA GPUs, which create, manage, schedule, and execute instructions from many threads in parallel. And they consume most of the GPU's die area. So when we talk about a GPU ASIC, we're referring to the GPU core. As we can see, they run at less than 30% utilization right now. And on the bottom, CodeXL, AMD's performance profiler. We don't have as many details, but you can see there: VALU busy. That stands for vector ALUs. Vector ALUs are the digital circuits that perform arithmetic and bitwise operations for AMD GPU cards, and they're kind of the same thing as the math core for NVIDIA GPUs. Both of these things are the building blocks of a GPU ASIC. So that means today, Ethash consumes less than 30% of a GPU core. That's one of the major deficiencies with Ethash today, and the reason you can get a large ASIC performance gain: because really, all it does is a small 128-byte read from main memory. That small access size is the reason that GPUs that utilize GDDR5X or GDDR6 memory were so inefficient at executing Ethash. Some of you may be familiar with our tool, ETHlargement. It went kind of viral on the internet. ETHlargement abused this.
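To make that access pattern concrete, here's a minimal C sketch of the inner loop just described. The index derivation and helper names are simplified relative to the real Ethash spec, so read it as an illustration of the beat, 64 iterations of 128 sequential bytes each, rather than as reference code.

```c
#include <stdint.h>

/* Simplified sketch of the Ethash inner loop: 64 iterations, each grabbing
 * one pseudo-random 128-byte slice of the DAG and FNV-merging it into a
 * 128-byte mix state. Index selection is abbreviated versus the spec. */
#define FNV_PRIME   0x01000193u
#define MIX_WORDS   32u   /* 32 x 4-byte words = 128-byte mix state */
#define SLICE_WORDS 32u   /* each DAG read is 128 sequential bytes  */
#define LOOP_COUNT  64u

static uint32_t fnv(uint32_t h, uint32_t d) { return (h * FNV_PRIME) ^ d; }

void ethash_inner_loop(uint32_t mix[MIX_WORDS],
                       const uint32_t *dag, uint32_t dag_slices)
{
    for (uint32_t i = 0; i < LOOP_COUNT; i++) {
        /* the next slice depends on the mix through one cheap FNV step */
        uint32_t slice = fnv(i ^ mix[0], mix[i % MIX_WORDS]) % dag_slices;

        /* the only real work: a 128-byte sequential read, merged with FNV */
        for (uint32_t w = 0; w < SLICE_WORDS; w++)
            mix[w] = fnv(mix[w], dag[(uint64_t)slice * SLICE_WORDS + w]);
    }
}
```

Notice there's nothing here for a compute core to do beyond FNV and the address calculation, which is exactly why the profiles above show the cores mostly idle, and why a dumb memory controller plus a tiny core can mine it.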
So we matched the access patterns of GDDR5X and GDDR6 memory to Ethash to enable 128-byte loads to run at full speed. We have a public version of that tool for GDDR5X and a private version for GDDR6. We made a lot of money off that, just because an algorithm wasn't effectively tuned to the hardware. We see up there the Titan X Pascal. It's similar to a 1080 Ti, and it's pretty awful. You can see up there it has less than 20% core utilization and less than 60% memory utilization, which is on the bottom.

There's also another issue: Keccak. That's the hash function at the start and end of Ethash, and it can be executed much more efficiently on an FPGA or an ASIC. In fact, that's actually what the Acorn line of FPGAs is designed to do: offload Keccak computations to save system power and increase performance. You shouldn't be forced into buying extensions to your hardware just to compete with other hardware. But hey, guess what? ASICs can do that too. Profiling Ethash with Keccak removed shows us that the compute cores of a card are really only utilized 20% of the time, allowing for a 10% efficiency gain right off the bat. Similarly, our per-source-line profiling up there shows us that more than 20% of the instructions are Keccak, which can all be offloaded. That's a 30% performance gain, folks. 30%.

Okay, so I've talked a little bit about the deficiencies of Ethash, so what the hell did we do about it? Are we just gonna sit up here and trash Ethash? No, we went and kind of fixed it. So that's our ProgPoW spaghetti and Lego. This was made on a plane while I was slightly drunk, so excuse the mess. There are five major changes here. We changed Keccak from f1600 to f800, and I'll go into why that was really important in a moment. We increased the mix state. We added a random sequence of math in the main loop. We added reads from a small low-latency cache that supports random addresses. And we increased the DRAM read from 128 bytes to 256 bytes. And this is what it all became. See, the most efficient and effective algorithms are the simplest, and any ASIC designer will tell you that. Complexity leads to weakness in proof of work.

Okay, so why these changes? Well, let's start with Keccak and the DRAM read. The Keccak hashes used at the start and end of Ethash get reduced from f1600 with a word size of 64 bits to f800 with a word size of 32 bits. Why? Well, GPUs actually have 32-bit datapaths, and f1600 requires twice as many instructions to execute on a graphics card. It's wasted cycles. Ethash does not use the extra data processed by f1600, so reducing this amount of data has no effect on the security of the algorithm. But what it does do is reduce any possible efficiency gain from offloading the Keccak computations from the GPU. That eliminates your ASIC speedup. That eliminates your FPGA speedup. And I should have just put that up there. Oops. So bye-bye, 30% performance gain.

We leave the number of accesses to the DAG, which is also the number of loop iterations, unchanged from Ethash's 64. We don't need to touch that; no need to dig around with it. But the DAG read size we increase from 128 bytes to 256 bytes. This allows ProgPoW to be efficiently executed on all current, and hopefully near-future, DRAM technologies without requiring overclocking. That means there's no more system tuning. It just works off the bat, fully tuned, fully optimized. Now, for this part: see, as I said, GPU cores are most efficient when they're doing 16-byte, or float4, loads.
And in order to have our 256-byte loads, we do 256 bytes divided by 16 bytes per lane, which ends up with 16 lanes working together in parallel. If I'm getting too technical, let me know.

When we work back from the frame buffer interface, GPUs have an L2 cache, an L1 cache, and a texture cache. We use these things for graphics and virtualization. We haven't discovered a way of making use of these caches that's both efficient and portable across GPU architectures, so we leave those alone. That does allow for a potential efficiency gain in a fixed-function ASIC, where you could shave those off, and we'll touch on that in the breakout session. So ProgPoW doesn't target those caches; it simply passes the DAG load through.

But next to the caches, we have something that I love, which is called scratchpad memory. So let me tell you a little bit about scratchpad memory. It's high-speed internal memory used for temporary storage of calculations, data, and other work in progress. Think of it like your brain. NVIDIA and CUDA refer to this as shared memory. AMD and OpenCL refer to this as local memory. The defining feature of this memory, compared to DRAM, is that it's highly banked with a large crossbar. That means it allows accesses to random addresses to be processed really quickly. Something that fixed-function hardware doesn't do too well with. NVIDIA's Pascal line supports a scratchpad of up to 96 kilobytes. AMD's Polaris and Vega lines support up to 64 kilobytes. And the AMD OpenCL kernel currently requires additional scratchpad space in order to exchange data between lanes. So in order to execute effectively on all existing architectures and not limit our occupancy, the cached portion of the DAG, our cached bytes up there, is set to 16 kilobytes.

Now, the compute core of a GPU card, the ASIC itself, is really just a large number of registers that feed high-throughput programmable math units. The inner loop of Ethash just uses the DAG load and then FNV to merge the data into a small mix state. ProgPoW, however, adds a sequence of random math instructions and random cache reads that get merged into a much larger mix state. Why is this important? The bigger and more random the mix state, the bigger the die size gets on fixed-function hardware.

Speaking of FNV, I wanted to touch on it for a moment, because I did promise some folks that I would. So there's this rumor and myth going around about FPGAs with some killer performance on Ethash even when they're just bound to DDR4 memory. FNV, Fowler-Noll-Vo, is not a secure hash function. There is a possible attack where, if you know iteration i is accessing DAG[x], then only a small number of possible next DAG[y] entries will be accessed. And technically, you can do a pre-calculation attack where you pre-compute DAG[x] with enough SRAM or BRAM, followed by all combinations of DAG[y], and you completely skip half the memory accesses. What that means is that you violate the memory hardness of Ethash. Most of these FPGA rumors seem to require 64 gigabytes of memory, which suggests pre-computation. Combining DAGs doesn't actually work, but in case there is an FNV attack, well, there's kind of good news. See, ProgPoW doesn't have this problem. Ethash is DAG, to FNV, to DAG. But ProgPoW is DAG, then a bunch of random math, then DAG, then a bunch of random math; repeat, repeat, repeat until the final DAG access. Now, multiple people have claimed to attack this weakness in FNV to date. And who knows, maybe I might have too.
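Since FNV keeps coming up, here's the one-line difference, for reference. Ethash merges with classic FNV1; ProgPoW's spec moved to the FNV1a variant, which swaps the order of the xor and the multiply so fresh data gets diffused by the multiply. A quick sketch:

```c
#include <stdint.h>

#define FNV_PRIME 0x01000193u

/* FNV1 merge, as used in Ethash: multiply first, then xor. The incoming
 * data word is only xor'd in at the end, which is what the pre-computation
 * attack described above leans on. */
static uint32_t fnv1(uint32_t h, uint32_t d)  { return (h * FNV_PRIME) ^ d; }

/* FNV1a merge, as used in ProgPoW: xor first, then multiply, so the fresh
 * data participates in the multiply and diffuses across the whole word. */
static uint32_t fnv1a(uint32_t h, uint32_t d) { return (h ^ d) * FNV_PRIME; }
```

Neither variant is a cryptographic hash, and that's fine; the real defense in ProgPoW is structural, because random math sits between consecutive DAG reads.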
So let's move on to the random math function and why that matters so much. Randomness and fixed-function hardware do not mix. Random math helps to match the GPU's programmability and the integer math that GPUs normally excel at. Fixed-function hardware, like I said, doesn't like randomness. FPGAs don't really like it much either; they require reprogrammability via JTAG, USB, or some other mechanism to employ randomness. Some people think that the random math in ProgPoW is a bit naive. They assume it's cheap to implement on ASICs because we use bit operations. But our implementation uses rotates, which require a full barrel shifter. That's pretty expensive on fixed-function hardware. And sure, if the bitstream on an FPGA could be updated every 12 minutes, it could definitely implement the random math portion of ProgPoW at lower power than a GPU by creating a fixed pipeline. But there's a problem with that. The cache accesses fuck all of that up. Since you're gonna need a high-throughput crossbar, you know, the thing GPUs naturally have, it's gonna cause a problem.

And we use KISS99 for our random math. That's kind of important for a number of geeks out there like me. We don't use Mersenne Twister, because it can be efficiently implemented on a specialized ASIC. Actually, we don't use a lot of the random number generators out there, because we need something that passes the TestU01 statistical test suite. Why? Well, TestU01 actually performs checks to determine whether a random number generator produces truly random numbers. For some reason, most of them fail horribly. So yeah, xorshifts: they're not your friend, folks. They're not your friend.

So by now you're getting kind of bored, and you're thinking: okay, okay, great, you talked a little bit about ASICs, what are the actual results? Well, boom, baby, look at that. Look at that: almost 90% utilization in both the core and the memory. And on top of that, we can also see our scratchpad memory up there; the second screen is also completely saturated. And then look at the 580. The 580 has 88.8% occupancy and saturation. Up there on NVIDIA, 88.4%. Now, in Ethash, we all know that NVIDIA GPUs and AMD GPUs are not equal. They're not matched. They're not balanced. ProgPoW fixes that. A proper proof-of-work algorithm is all about balance.

And yeah, while our Keccak calculation is halved, ProgPoW does add that series of KISS99 calculations as part of our mix stage. Remember we talked about why mixing is important? Well, the final mix stage has too much data to offload to an external FPGA now. It could be implemented on a really small chip within an ASIC using a small accelerator, but that's only 7% of our compute utilization. So we see up there: 91.9% of the instructions are executed within the DAG access, the random math, and the random cache accesses, things we can't move off the GPU. Keccak and the final mix only account for about 7%. That's only a 7% speed gain.

So what does that actually mean? Well, in layman's terms, it means that an ASIC for Ethash looks just like this: a high-bandwidth memory interface, a Keccak engine, usually on an FPGA, and a small compute core to do the FNV loop and modulo operations. But a ProgPoW ASIC, well, it's gonna have the high-bandwidth memory interface. It's gonna have a compute core with a large register file, a compute core with high-throughput integer math, a high-throughput, highly banked cache, and small Keccak and KISS99 engines. Wait, I think that's also known as a GPU. And ProgPoW is also tunable.
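For the geeks, here's a compilable sketch of the pieces just mentioned: the KISS99 generator and the op table the random math sequence draws from. This follows the public reference code, though details may drift between ProgPoW versions, so treat it as illustrative. Note the rotates, the ops that force a full barrel shifter into fixed-function hardware:

```c
#include <stdint.h>

/* KISS99 (Marsaglia): a multiply-with-carry pair, a three-shift xorshift,
 * and an LCG, combined. Tiny state, single-cycle ops, passes TestU01. */
typedef struct { uint32_t z, w, jsr, jcong; } kiss99_t;

static uint32_t kiss99(kiss99_t *st)
{
    st->z = 36969 * (st->z & 65535) + (st->z >> 16);
    st->w = 18000 * (st->w & 65535) + (st->w >> 16);
    uint32_t mwc = (st->z << 16) + st->w;
    st->jsr ^= st->jsr << 17;
    st->jsr ^= st->jsr >> 13;
    st->jsr ^= st->jsr << 5;
    st->jcong = 69069 * st->jcong + 1234567;
    return (mwc ^ st->jcong) + st->jsr;
}

/* Helpers that are all single (or near-single) instructions on a GPU. */
static uint32_t rotl32(uint32_t x, uint32_t n)
{ n &= 31; return (x << n) | (x >> ((32 - n) & 31)); }
static uint32_t rotr32(uint32_t x, uint32_t n)
{ n &= 31; return (x >> n) | (x << ((32 - n) & 31)); }
static uint32_t mul_hi32(uint32_t a, uint32_t b)
{ return (uint32_t)(((uint64_t)a * b) >> 32); }
static uint32_t clz32(uint32_t x)
{ uint32_t n = 0; while (n < 32 && !(x & 0x80000000u)) { n++; x <<= 1; } return n; }
static uint32_t popcount32(uint32_t x)
{ uint32_t n = 0; while (x) { n += x & 1; x >>= 1; } return n; }

/* The random-math op table: a KISS99 output r picks one of 11 operations. */
static uint32_t math(uint32_t a, uint32_t b, uint32_t r)
{
    switch (r % 11) {
    case 0:  return a + b;
    case 1:  return a * b;
    case 2:  return mul_hi32(a, b);
    case 3:  return a < b ? a : b;       /* min          */
    case 4:  return rotl32(a, b);        /* barrel shift */
    case 5:  return rotr32(a, b);        /* barrel shift */
    case 6:  return a & b;
    case 7:  return a | b;
    case 8:  return a ^ b;
    case 9:  return clz32(a) + clz32(b);
    default: return popcount32(a) + popcount32(b);
    }
}
```

The loop count, the cache size, the DAG read size, and how often this sequence gets re-rolled are all plain parameters sitting on top of this.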
So that's a really important feature of your proof-of-work algorithm. What happens if, in the future, IfDefElse isn't around to keep tuning ProgPoW for the next generation of GPUs when they come out in six years? Well, any GPU architect, and there are thousands of them, can go and open up NVIDIA's Nsight profiler or AMD's CodeXL and simply tune these parameters to make sure that the GPU is completely saturated. And quite frankly, we shouldn't be on proof of work for Ethereum in six years, otherwise I'll be very disappointed.

And really, that's it. That's the technical walkthrough of ProgPoW. It doesn't have to be that complex. Proof of work, like I said, is just about matching the algorithm to the hardware through some minor changes. And so that's it. Thank you very much. So I imagine there's gonna be tons of questions, and I'm petrified. So, shoot.

Great talk, thank you. I wanna play devil's advocate and challenge the premise that utilization and saturation is actually a valid metric. And the reason is that as a GPU designer, you're not looking to solely optimize energy expenditure per unit of proof of work. What you're trying to do is you're designing for gamers, where performance is a key thing. And you also have constraints in terms of die area. So you wanna optimize for power. So basically, what you do is you design your ASIC to have a reasonable trade-off between performance, power, and die area. But as a miner, the only thing you care about is power; well, energy spent per unit of proof of work. So you could build something that is functionally equivalent to a GPU, but might have a larger die area or might have a higher latency, but consume, let's say, 10 times less energy per unit of work. And utilization would be 100%, but from your perspective, that is not a good metric.

So there are three things to address there. One is a philosophical thing, where we talk about what proof of work actually meant when it was originally created. Proof of work kind of means work, a.k.a. energy expended. But that's really a philosophical debate. From a miner's standpoint, miners really care about balance. Now, ProgPoW does consume more power, simply because it uses the core, which was not used before. But the important thing is that if you take an ASIC, which also has the core and also has the memory, and take a GPU, same thing, they will match. Your other point was that miners only care about power, but I'd really contest that. Miners really, really just care about fairness. That's the crux of it. We care about all being on the same level playing field. And then you touched on the fact that GPUs are just naturally tuned for video games. That's not true at all. See, GPUs are tuned for math. GeForce cards and AMD's gaming line, sure, they're tuned for FPS, but by modifying the firmware, you can also just tune them for math. That's the great thing about GPUs: they're tunable and programmable and customizable. What were the other parts of your question?

You can try and minimize latency, you can try and minimize area, or you can try and minimize power, or you could find a reasonable trade-off. If I was to go and build an ASIC for ProgPoW, what I would do is make it energy efficient first. So I would choose my cells extremely carefully, so that I don't care if it's 10x the latency or 3x the die area. That's all I care about, because the revenue as a miner is basically going to be proportional to the amount of work that you can get done per joule burned.
Only when it's compared to other miners. Yeah, of course, yeah. That's the difference. Yes, so the other miners will use GPUs, and I will use something that is functionally equivalent to a GPU, but potentially has a much larger die area and runs at a much higher latency.

But when you go and build a ProgPoW ASIC, you're only going to be able to shave off... I get your question now. You're talking about the fact that you're gonna shave off all the GPU pipelines, the extra caches, all the things we've left, to optimize for power.

It's not about shaving off things, it's about fundamentally designing the circuit to optimize for power.

You can't really optimize for power when you just have a bunch of random math churning through there. That's gonna consume a lot of power no matter what. The only way you can get power savings in that circuit would be to probably shave off a few transistors.

So let me give you one specific example. Let's say you wanna build a 256-bit multiplier. You have two options. One is that you can build a massively parallel multiplier, where basically you'll have a 256-bit by 256-bit array, and then you'll have massive reduction trees with lots of compressors. And that's gonna use a lot of dynamic power. The other option is to have a much smaller circuit which does fewer things in parallel, much more sequential work. And that is going to consume less energy per 256-bit multiplication. Now, it's possible, and actually it's likely, that GPU designers did not build their circuits to minimize energy expenditure per unit of work.

I wouldn't say that. I'd go and look at the GPU's design. Again, the ones we target here are GeForce. But if you actually go and look at AMD's or NVIDIA's GPUs that do target low power per unit of work, it's their new Tesla lines, specifically the GDDR6 ones that consume about 55 watts per card for, I think it's 9.6 teraflops. That's a great example of optimizing for power. It's just about which line of GPUs you're going to target. So they do build things like that.

Hi. So there is speculation that this algorithm was developed by AMD and NVIDIA in order to protect GPU mining interests. And you did obfuscate who's behind it by calling them IfDefElse. Why does it matter who created it?

So, two parts to that. First, given the amount of work and research that goes into designing an algorithm, both AMD and NVIDIA have way better things to do. For a start, cryptocurrency mining, and I'm really, really, really knowledgeable on this considering I contributed to most of it, only made them about $384 million last year, and then it made a tiny amount of money this year. And you can publicly track that, because it's all in their financial reports; perks of a publicly traded company. Now, their actual revenue per quarter for GPUs is somewhere in the five-plus billion. Which is a lot. It's a lot. See, they don't care about GPU mining. They're fighting the battle in the AI space, in the TPU space, in the CPU space, and in the FPGA space. The other thing is that if AMD had designed this, they would probably have done a better job at fixing their shitty OpenCL implementation, and they wouldn't have given me so much hell in trying to get fixes. And the other thing is, again, the whole point of ProgPoW is to protect Ethash while it transitions to proof of stake.
Let me make this very clear: IfDefElse does not want proof of work to exist when proof of stake is ready. The main reason for this is to make sure that proof of stake has a fighting chance in Ethereum and doesn't get destroyed by centralized entities while it's going through its transition phase.

I just think that there is a conflict of interest for you to push ProgPoW right now.

Why? I can make an ASIC. Let's be very, very public: ProgPoW destroys every single optimization that I get pocket money for. This destroys ETHlargement. This destroys all my private optimizations. This destroys the Acorn line of FPGAs, which I think some of you are familiar with, which the Mineority Group is selling, and which we very much support. This destroys a lot of things that we very much like.

I'm just curious about the statistics and security of the random math. How do you prove that it can't be performed much better than FNV? Because from what I checked, it's just some random mathematics calculation based on the previously calculated value, if my understanding is correct.

Previous block header, not the previous value. That's a mistake. So specifically, the random math function takes a few parts. It takes KISS99's random number generator. It takes the block header that changes; the block hash, sorry. And the input also changes every 50 blocks. Now it's tweakable, where you can adjust it to change every 25 blocks instead, or every five blocks if you really want to be painful. It's FNV1a that we use, not FNV itself. Like I said, it takes the cDag, which is a combination of your KISS99, your block header, and some random math, mixes it with more random math, keeps doing that for, I think we're doing 64 cycles, then spits it out. The code is the proof. You can go and grab the code today, and you can profile it yourself, usually through Synopsys or another simulator. Again, it's all in the white paper, yes. Everything's completely public with that. We actually did a huge write-up on Medium, which we published two days ago, about this, about the security of the FNV1a function.

Just to the point you made earlier: so this is purely an altruistic effort by IfDefElse to transition safely to proof of stake?

God, yes. Core Scientific loves ASICs. Please, we want ASICs to be around, because they consume more power. It's better for me. And IfDefElse couldn't really care less right now. They're GPU enthusiasts. We think it's really cool. We like algorithms a lot. We like proof of work a lot. It's kind of fun.

I noticed that someone on Bitcointalk claimed that he found a way to represent the full... Zawawa, Zawawa, yes. The full DAG in several megabytes. So, if he is not lying, how will that affect ProgPoW?

Again, we pointed that out. It's the FNV1a attack, or FNV attack. It doesn't affect ProgPoW. But yes, Zawawa is onto something. And he will be a very rich man.

Is there any economic analysis on, first of all, how much the hardware that you can currently buy improves current GPU performance versus ASIC miners, and how that would change in a world of ProgPoW?

So the biggest thing is, when we do our economic analysis, we compare silicon costs to silicon costs. I never play in the realm of what MSRP is for GPU cards, because there are ways around that. I'm guessing you're talking about the cost of an ASIC versus the cost of a GPU in the world of ProgPoW?
Yeah, well, I mean comparing the cost of an ASIC versus the cost of a GPU plus an Acorn plus all the other stuff.

Oh yeah, so the cost of a GPU card, we'll talk roughly about that. Today, it's around $100 for the memory on its own, for eight gigabytes of GDDR5; GDDR6 is around $150. The GPU ASIC itself is around $40 or $50, depending on whether it's NVIDIA or AMD. The PCB is 25 bucks, and the heat sink, admin, and miscellaneous is around $15. Now, an ASIC, an ASIC is really cheap. We'll talk a little bit about the E3, because it's the one I have the most experience with. I haven't analyzed Linzhi's Ethash ASICs, but for the E3 chips, the DRAM chips were bought for $3.50 each. That's actually public; if you check Alpeda's filings, you can see how much was bought in that certain period. Then the PCB is only around $20 to $25. The heat sink, the sheet metal, everything else, assume $30. So there is quite a large gap, and then you have to factor in the MSRP as well. Now, in a ProgPoW world, they're going to actually match on silicon and heat sink cost. And the only savings, as I said before, you're going to get is by shaving off those unused caches that we can't really find a use for. And that's gonna save you around $2 to $3 per GPU card.

There's also the ASIC design costs, right? And fab and manufacturing and all that stuff, right?

ASIC design, R&D, I mean, I'm not sure how public some people's costs are, but for an Ethash ASIC, or even a SHA-256 ASIC, you can get them as cheap as 10K in R&D cost. And a ProgPoW ASIC is pretty simple to make, since all the code is open source. And hell, maybe when we have some spare time, we can just make open-source schematics of a ProgPoW ASIC. Why the hell not? What better way to prove our point?

Thank you. Yeah, I just wanted to make, I guess, a quick point for people in the audience, coming from a miner. Miners mostly just care about the profit that they're making.

Correct.

So the revenue is gonna be, you know, how much coin you're mining times its value, and all the miners have real-world fiat costs. They have to pay their power companies in fiat, so they have to convert the crypto to fiat. So at the end of the day, they just care about their profit margin. And so, you know, if someone wants to develop an ASIC for ProgPoW, and there's only a 20% or 30% speedup, just the development costs for rolling out the silicon and, you know, building all of that, which costs millions and millions of dollars, won't actually make up for the gap. So it's not like you guys have to be just as fast as an ASIC. Even if ASICs are 10% or 20% faster, it's just not economical for them to go out and deploy all that NRE cost to actually build it. So I don't think you guys need to make it so that there's no efficiency speedup at all; it just needs to be that, even if there's a 10% or 20% speedup, it's good enough.

So, I mean, right now there's like a 7% to 12% speedup, but we also just like to make it so that there's no major speedup, because, A, it's fun. Sure, yeah. B, it fits with the chaotic theme. Sounds good.

Just a follow-on question to that: presumably, ProgPoW works on standard commodity GPUs, which can be redeployed for other uses, like artificial intelligence, even after Ethereum goes to proof of stake.
And that's another very big factor if you're considering deploying tens or hundreds of millions of dollars on an ASIC, which has only got one purpose, whereas a GPU farm can be reused. Is that the case?

That is very much the case. I can talk a little bit about how awesome GPUs are. They're repurposable. I love that. You can play a game on them, though most miners won't be playing games with their mining farms, but you can use alternate proofs now. Golem is probably the one you're most familiar with. You can sell your GPU cycles now for rendering. So that gives mining farms a little bit more profitability, and that contributes to a pretty cool world. Proof of useful work is something that's being explored as well, trying to find a use case for all of these proof-of-work machines out there. There are some interesting things in computational fluid mechanics. There's some interesting stuff in the medical space. Nothing has been fleshed out to date, simply because people haven't solved the liability problem that comes with having a very decentralized system. But Golem's on the right track, which is great. And that's taken just a year of development. Imagine what we're gonna see next year. So yeah, GPUs are pretty awesome, because they're repurposable. And on that note, FPGAs are as well, but they do have a much higher skill level. You do need to go and rewrite the bitstreams, you need to reconfigure them and tune them, and the number of GPU compute architects out there significantly dwarfs the number of VHDL and Verilog architects.

Hi, great talk. I'm just wondering: you have tunable parameters. They're integers? Yep. How about tuning which opcodes we use? I think there was a list of 11 opcodes that we choose from randomly. And maybe tuning also the hashing algorithm. Maybe something where, you know, you say that some version of Keccak is better for some GPUs. How about some different algorithm altogether? Or maybe two different algorithms, a different one at the beginning and at the end. Just someone that's naive, that says: why this, why that, poking holes, you know, at every part of your diagram. Can we do something different here?

We can. The biggest thing you don't wanna do, though, is... one, we wanted to make ProgPoW simple today, to slot into the existing Ethash implementation. So we didn't wanna dick around too much, because that would have required way more test analysis. It dicks around with the CPU verification as well, which we know is very, very important to the Ethereum ecosystem. And quite frankly, as long as you have enough random math in there, it's going to make it impossible to offload. We could replace Keccak with another hash function; I haven't really thought too much about it. We also had a really cool idea about merge mining, which would be a great way where, you know, other proof-of-work algorithms could use ProgPoW in tandem to compute their final cycles to get a bit more ASIC resistance. That would actually defeat the NiceHash problem that we all have in mining. But that's a lot of development work as well. And just to be clear, these tunable parameters are just for future revisions of GPUs. So to date, all of these are completely tuned for NVIDIA's latest line of GPUs, which is the RTX 2000 series, and AMD's latest line, which is their Vega version three, I guess they're on now. And we also had an Intel GPU sample, but I don't think those will be massively used by miners unless you're very, very rich.

So, great talk. Thank you.
Even though I think I probably only got 10% of it. I wonder, from an integration standpoint, how feasible, how far along are we? Because I remember seeing that the time to verify the proof of work increased by a margin that might make it infeasible to actually use it.

Yeah, so let's make that very public. That is simply because of my terrible Go skills. We are really great on the GPU side, not so great on standard client code. We fixed that with an optimized implementation; we had some great help from the Parity devs, from the ethminer developers, from the whole Ethereum community, really. It is gonna take two times as long to do a verification, simply because two times the data is consumed. That's the 128 bytes increased to 256 bytes. But as long as everyone's verifying proof of work the same, it evens out and it levels out. And a 2x slowdown is much more palatable than what it originally was, which was a 9x slowdown. So that's what happens when you don't optimize your code, guys.

Hello, me again. On the legal front, someone might say some things are copyrighted, some code is sort of intellectual property, and other people are running this code. Would there be any barrier? Does someone own this intellectual property? Does anyone own the copyrights to ProgPoW? Would there be someone that could later on say: okay, you're using my code, you're running it now?

I mean, it's open source, so if you wanna go patent open-source code licensed under the MIT license, go for it, guys. No, there's no copyright. Again, this has been public, fully public, fully open source since, I think we released it in April.

Oh, God, it's GPL-infected. Well, there we go. There we go.

I have several questions regarding the specs. Are there official specs now?

Yes, there are. In the beginning it was a bit janky. Yes, I know.

I realized that there is still some use of FNV in ProgPoW. FNV1a, I get it. No, the normal one. Oh, yeah, yeah. Because I've seen the primes used by the normal one. Why is that?

Simply because we didn't wanna mess around with the original proof-of-work algorithm too much, because it affects the balance of the compute core of the GPU.

Is it in the DAG generation? Yes, it is. Oh, okay. It is. Again, if we go back to the original...

Then the question is if we shouldn't just change this. If we change ProgPoW, we should maybe just change this, if this guy from the...

We can, we can definitely take FNV out altogether. We'd need to find another really light, fast verification hash, which would just require some research.

Maybe we should just change it to FNV1a. Yeah, we can do that. Maybe, yeah, okay. Slice it all out.

Then the cache, the small cache in the shared memory. Why is it filled that way? It's filled with, like, the first 128 bytes from the DAG or something, the first elements from the DAG. Why that? Why not some randomness or something?

Simply because that would... So, if you were going to make a ProgPoW ASIC, because it doesn't saturate all of it completely, in theory, in theory, you could go and create a sort of compute core that takes all this randomness and does a pre-compute with an FPGA. It would only have to do it every 12 minutes. You could have a very dedicated bitstream developer. And, this is the important part, you wouldn't always get that balance between an NVIDIA GPU and an AMD GPU. Sometimes, if there was too much randomness, you'd end up going way over the cache availability for an AMD GPU. The AMD GPU here is the big bottleneck.
I think both architectures have to save the same amount of data. Correct. But if you have the randomness changing the bytes, that screws with the saturation. And then that would cause the AMD GPUs to underperform. We did try that early on. It didn't work out too well.

Then the update. So we have an update every 50 or 60 blocks?

Yeah, we put it at 50 blocks. I think we finalized on 50 blocks. You could change it to 25 blocks, but there's really no point. 50 blocks is pretty good, but I guess if you wanna be super hardcore, change it to 25 blocks.

What's the overhead of this update? Do you have some... What do you mean by overhead? Like, do you have to pre-compute the kernel that runs? We do. On a GPU, it's really fast. We published it publicly; it should be like 0.2 or 0.3 seconds. 25 blocks gets a little painful. On an FPGA, you're still gonna have to go and re-push the bitstream every few blocks, which will be painful. That will be a three to seven minute overhead.

Do you know how much CPU time computing the kernel costs? I don't, actually. I should measure that. I know that some of the mining rigs are running at almost full capacity on these really small Intel cores. Most of them have a Core 2 Duo, which should be okay with it, quite frankly. And all of our computation takes place on the GPU, and we can actually make sure that the CPU is just doing a very slight amount of verification. So we could actually tune that. That's not on the proof-of-work side, though; that's more in the application, or the client side. So definitely we need to work with the Ethereum developers on that. Again, the clients are not our strong suit, as is proven by the CPU verification.

Will you update the EIP to have the new spec? Yes, that has been on my list of things to do. I did get the Medium article out, I did update the GitHub, but yes, I need to update the EIP.

Will you develop some test cases for other clients to use? Yes, I need help with those, quite frankly. I think we could... That would be amazing. I do not have enough spare cycles to date to be able to do all of these things. Okay, thank you.

We have about five minutes left.

Could you speak a little more about whether this changes the bandwidth consumption on the bus, on the GPU, because of the increased cache size? Yeah, that's the whole point. It actually does change the bandwidth consumption. Again, with Ethash it's only around 60% consumed, which is a problem. We do 90%. You can never get 100% bandwidth consumption in real-world applications. It's just impossible. It's one of the things GPU architects struggle with. So that's the best we're able to do to date. In theory, you could... I'm sure there's a trick, and I'll go noodle on this, because if ProgPoW does get adopted, naturally I'll wanna go and break it again. I'm sure there's a 1% speedup you could get with some clever tricks on the miner side. But other than that, yes, it changes your bandwidth consumption; it consumes 90% of the bandwidth. That's basically what we mean by bandwidth utilization: how much of the bandwidth is consumed.

Hi, I'm still thinking about how to break your scheme. That's okay, man. That's okay. I wish Def and Else were here, even if they were wearing masks. They'd love you.

So, I mean, even though, let's say I assume you're using 100% of the GPU... 80%, but yeah. Let's assume you get it to 100%. You're still not using the full flexibility of the GPU. So for example... Well, why?
You're not using the full flexibility of the GPU. So for example, there will be a circuit that does addition and then a circuit that does multiplication, and then you call them one after the other. And the reason why these two circuits would be separated in a GPU is so that you can mix and match. But because you're calling them in a specific order, as an ASIC designer, what I could do is build a clever circuit that does A and B at the same time and uses less energy than if you first used the A circuit and then the B circuit.

Yes. Actually, that was something I was gonna put in my slides. We use a... Again, I mentioned we use a barrel rotate shift. Our implementation of rotate is more... GPU implementations, I should say, are more efficient than any existing ASIC to date, than any existing fixed-function hardware. Again, I think most people forget: GPUs are really just ASICs for math. That is what a GPU is. So what you're specifically referring to is a barrel shift rotator, and the GPU to date is the most efficient implementation of it. That's why, when I say that if we wanna go and make a ProgPoW ASIC it just ends up mimicking a GPU, it's because you just mimic that. I'd love to touch base with you after this on that, actually, because I have a write-up on it; we got that question asked a lot from an FPGA developer who was talking about putting square roots and other things into ProgPoW.

So, I'm done. You're releasing me from my prison.