I'd like to talk about why I'm actually working on FPGAs at the moment and what I'm doing with them. I started off in electronics quite a long time ago, probably 25 years ago, but in all that time I hadn't really used FPGAs. So let me tell you a little bit of a story about how I got there. You'll probably recognise some of these people, certainly the two on the left. One of you may recognise the person on the right as well; we'll say more about him in a little bit. So here's me and some of my friends. It's my friends that need the FPGAs, not me, but it will help you communicate with them. This is something we're seeing a lot: one company's idea of the future, in terms of very high requirements for getting information in and out of silicon and processing that information, via FPGAs, et cetera. For me it all started in my childhood with one of these. I don't know if anyone recognises those things. I see Kenneth smiling. I probably had something similar, which sparked my interest. That took me through a few different magazines, working my way up through different kinds of publications. I would make things. It didn't always work. It mostly didn't work, in fact, but it got me going. Then one day I got one of these, which sent me off on a bit of a tangent. It led me on to how these things come together, how they work, how to use them. You can probably see some numbers there: the 6809, one of my favourites; the Z80, of course; the 6502. Which meant I had to understand this. Anyone know what this is? This is the von Neumann architecture, which pretty much everything I've written software on - assembler or code, for the last 30 years - has been based on, because of him, of course. I had to learn all this stuff. When I first learnt it, we didn't have much in the way of computers to run things on. We were inputting with hex keypads and all sorts of strange devices. We did a lot in workbooks and notebooks and things like that. We'd have to talk through our programs, because we couldn't always get access to a computer to actually run them. We had things like this. It was quite a luxury in those days when you could actually key your program in, although you had to convert it to hex to do so. Later you had things like printouts of your program, and you could record them. Eventually - I think this one is from an Apple - you could actually manipulate these on screen and start running them interactively. I was still also working with these; at the end of the day, my interest was primarily electronics, not computers. When I went to university, I came across something this guy did in 1957. Anyone know what that is? A perceptron. This was some work he did when he was looking at how the brain worked. He was looking at neurons, and this is his generic model of a basic neuron. Then, whilst I was at university, I read this book, which really interested me. It was about the concepts of parallel distributed processing, rather than having a sequential type of system for processing things, and it examined all sorts of things, from perceptrons to linear algebra to all sorts of different matrix manipulations. That got me thinking: could you combine these two things? How would you go about creating something that could do what perceptrons and neural nets do, with logic, et cetera?
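The perceptron model, in its standard textbook form, is just a weighted sum of inputs followed by a hard threshold:

```latex
y = \begin{cases} 1 & \text{if } b + \sum_{i=1}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}
```

where the x_i are the inputs, the w_i are learned weights and b is a bias. It's exactly this sum-and-threshold structure that maps naturally onto parallel logic rather than a sequential processor, which is the combination being wondered about here.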
However, the 68000 happened, which took my attention. This occurred - if any of you remember it, this was the very first Macintosh, and I actually got a chance to use one of these. They weren't really used commercially much before the Mac Plus from Apple. I did things like this in Pascal, because we had to learn Pascal rather than C at university, which was a bit disappointing to me. That took me away from all that. Then the 8086 happened, followed by several generations: 186, 286, 386 - an SX in this case. This one actually came with a floating-point co-processor as an option, which was good because the stuff I was working on required floating point. I actually started implementing some of this - in this case, I think, in Turbo C on a 55SX. Again, it was fairly rubbish, because the amount of memory available and the processing power at the time were still nowhere near enough to get beyond early research on these things. Then we had an AI winter. That's meant to be snow, by the way; I couldn't find a picture for it. Nobody was into parallel distributed processing, nobody wanted to talk about artificial intelligence, et cetera. Then this happened - a lovely early version of Windows - which got me involved in working on these sorts of things: graphics cards, manufacturing graphics cards, writing drivers, designing hardware, et cetera. In the early days that meant the ISA bus, EISA and Micro Channel architectures, and later PCI, et cetera - also NuBus and things like that on the Mac. Meanwhile, the internet happened, with lots and lots of racks of computers, cabling, Ethernet, et cetera. Because the clock-speed ramp ran out, I moved into working with things like pi calculus and lambda calculus, dealing with concurrent processing. These grew and grew. The example here, on the right-hand side, is a dual-core Intel processor. I've lost count now; I think we're up to 22 cores on a Xeon you can buy at the moment. There's the Adapteva Epiphany, which has 16 cores and was also going to be available in a 64-core version. We're seeing cores, cores, cores and more cores. That got me back into using concurrent processing in embedded systems and real-time systems. We've got the Epiphany on the right-hand side there, on the Parallella board, which some of you may be aware of. Down below I've snuck that one in - I haven't ever built one of these, but that's a Transputer board at the bottom there. Some of you may know David May. David May started a company called XMOS, and I do a lot of work on XMOS stuff: embedded hardware, multi-core, real-time work, et cetera. Meanwhile, because of the internet and the perceptron, this has made a real comeback. People like Facebook, Google, Twitter, IBM, Microsoft - anyone that's big in data, big in the cloud - are now spending enormous amounts of money on research, poaching various famous people from the AI world who have worked on neural networks, convolutional networks, perceptrons, et cetera. They've had a great deal of success working with things like images - facial recognition, picking out patterns, buying patterns on Amazon, for example. So this new load on the cloud, if you like, is taking up many, many cycles; it's actually overtaking the traditional loads running on cloud servers. Remember these graphics cards? Well, they had very specific graphics engines in them that weren't particularly von Neumann-like, but were designed to move information about very quickly and do very simple tasks repeatedly on the same information, whether that's graphics, textures, et cetera. Well, these things grew up into this.
This is the NVIDIA Titan, which until recently was the card you would put in your workstation - or you would put several of them in, PCI slots allowing - to accelerate your machine-learning algorithms. These literally have thousands of GPU floating-point processing units in them, as well as on-board memory. That then led to this. This is the inside of NVIDIA's latest release, the Pascal, and it has literally tens of thousands of GPU processing units. It really is a monster. But unfortunately, even though these are very powerful and good for crunching things like machine learning - simulating neural networks, simulating convolutional networks, and all these complex matrix-type calculations - they use a lot of power. So they're being used for training, for example, because when you're doing machine learning you need to run through the network with different examples and teach the network how to do different things. Actually reacting - running the network - takes a certain amount of processing; training takes thousands, even hundreds of thousands of times more processing, because you have to run through all the examples. So these are often used to accelerate that process. But we are all really still dancing to the von Neumann tune. We're still running these on von Neumann architectures, and although that's a very good architecture in a generic sense, it is a very slow way of calculating things like perceptrons. You really are using a sledgehammer when you just need something to tap with. Which took me back to this recently. I realised that in order to make these devices more accessible - this is an open-source robot arm, for example - in order to get machine learning into something like this and have it smart enough to do closed-loop feedback and so on, to make it useful, without needing one of these to do the processing in order to train it, and without the power envelope of multiple graphics cards going into the system, we're going to need something different. So we say goodbye to Mr von Neumann, and we say hello again to Mr Turing this time. It's back to basics in terms of what we're going to process. We're going to reshuffle the pack here; it moves on down, and it changes the game. Here is, if you like, the von Neumann S-curve, and this is the other curve that is starting to emerge. What you're seeing is a lot of technology emerging at the bottom here, on the second S-curve, and a lot of hybridisation and combination at the top. Now, that hybridisation is coming in different forms. One of them is GPUs, which are a kind of souped-up von Neumann. But you've also got very long instruction words, SIMD instructions, vector-based instructions being added into cores to speed up the numerical performance of the devices. You're also getting some clever things happening, such as additional tiny processors being added on to the side of the full-blown processors themselves. But you're still getting problems with memory, because of the memory architecture of von Neumann machines. So what we're actually seeing is that memory is starting to be broken up in different ways, to enable concurrent access for processing large matrices of information. And there are some companies already down there on the new S-curve that have gone nearly all of the way. People like IBM are building completely new chips that look nothing like a von Neumann machine, and they're applying them to very specific neuromorphic applications.
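To make that memory point concrete - this is just an illustrative sketch of the general technique, not anything from the slides, and the module and signal names are invented for the example - on an FPGA you can split a matrix across independent block RAMs so that several elements are read in the same clock cycle, rather than queuing through one shared bus:

```verilog
// Illustrative sketch: two independent RAM banks read in parallel
// every clock cycle - the concurrent memory access that a single
// shared von Neumann bus can't provide.
module banked_read (
    input  wire        clk,
    input  wire [7:0]  addr,     // same index into both banks
    output reg  [15:0] bank0_q,  // element from bank 0
    output reg  [15:0] bank1_q   // element from bank 1, same cycle
);
    reg [15:0] bank0 [0:255];    // e.g. even columns of a matrix
    reg [15:0] bank1 [0:255];    // e.g. odd columns

    always @(posedge clk) begin
        bank0_q <= bank0[addr];  // both reads happen concurrently
        bank1_q <= bank1[addr];
    end
endmodule
```

A synthesis tool will typically map each array onto its own block RAM, so the two reads genuinely happen in the same cycle; scale the number of banks up and you can feed a whole row of multipliers at once.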
Part of that is to do with research: they've got money from DARPA to build, or simulate, a human brain over the next ten years. To do that, they've got to get down to picojoules per perceptron - and if you try to simulate a perceptron on a normal processor, it's not picojoules, it's watts. So to simulate a trillion neurons, you're going to need literally tiny amounts of power for processing each neuron and its connections. In fact, sometimes it's the connections that use the power, not the processing - it's the movement of the data. Which means you need new building blocks. Here I've got perceptrons, things like fast Fourier transforms, convolutions, image convolutions. And new tools - here is just a simple example in Verilog, one of the tools we already have to hand - bringing me to what I actually need to talk about, which is FPGAs. I want to talk about an open-source project by a chap called Clifford Wolf. That's his avatar, by the way; he doesn't really look like that. Well, he's quite similar. He's done a board that fits on the Raspberry Pi, which you can see on the left there. It has some PMOD ports, and it has the Lattice chip on it that I'm going to talk about, and there's a picture of a CAD file for it. He has created a complete open-source stack that currently targets the Lattice chips, but it's actually an open stack: you can target a number of different FPGA chips. Yosys, which is one of those recursive acronyms, is basically the synthesis engine - the thing that takes in your HDL. Project IceStorm is really about reverse-engineering the Lattice bitstream; there was enough information available that he could do this, and he's very good at this sort of thing. Arachne-pnr is the place-and-route part: it does the optimisation and also produces the output files ready for loading onto the chip itself. The three of these together form a toolchain that allows you to program FPGAs using completely open-source software, on Linux or macOS. I imagine it will eventually get ported to Windows as well, though perhaps not in native form - it might need Cygwin or something similar. So why Yosys? Well, because it's open source. For me, I wanted to learn Verilog and I wanted to understand what's going on. I didn't want to download the 100, 200, 1,000 gigabytes or whatever it takes to install the Xilinx tools, et cetera. I don't know exactly how big this software is, but I should imagine it's less than 100 megabytes all in, depending on the dependencies, et cetera. It's very small. It's expandable by different people - people are already working on it at different levels - and it's extensible, so it's not specifically tied to Lattice, although he targeted Lattice to start with. It's very small, it's actually pretty fast, and you can get stuff done really quickly. The way I like to work is with a basic editor and a command line, just sending stuff backwards and forwards to the development board. Yosys was created initially to support these small FPGAs - in this case the Lattice ones we're talking about, the iCE40 range, which comes in at about 1,000 to 8,000 logic cells. They're not huge FPGAs, but you can still do some fairly useful stuff with them. And they do have some nice features, like low-voltage differential signalling to get information in and out quickly if you need to - with cameras and that kind of thing - or high-speed ADCs.
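The Verilog from that slide isn't reproduced in this transcript, but as a hedged sketch, a perceptron as a hardware building block might look something like this - fixed-point, hard threshold, with the widths, weights and names chosen purely for illustration:

```verilog
// Illustrative fixed-point perceptron: weighted sum of four inputs,
// then a hard threshold. All four multiplies are computed in
// parallel - no instruction fetch, no shared memory bus.
module perceptron #(
    parameter signed [15:0] W0 = 3, W1 = -2, W2 = 5, W3 = 1,
    parameter signed [15:0] BIAS = -4
) (
    input  wire signed [15:0] x0, x1, x2, x3,
    output wire               y
);
    wire signed [31:0] sum =
        x0 * W0 + x1 * W1 + x2 * W2 + x3 * W3 + BIAS;

    assign y = (sum > 0);  // step activation
endmodule
```

On a von Neumann machine those multiply-accumulates are sequential instructions; here they're just logic, which is the whole point of moving to the second S-curve.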
They're actually low-power as well, so you can run them off batteries. These were designed to be low-power FPGAs from the start, which makes them applicable in the low-power device market. In terms of packaging, there's obviously CSP, which is a new thing, and BGA, which is a bit fiddly if you're doing it yourself, but they do have QFP packages. So, for example, the 4K - the 4,000-cell version - comes in a QFP-144 package, and you can actually solder these yourself, which is quite nice. You only have to go up to BGA for the 8,000 look-up-table version. And they're low-cost: about $5 to $10, which is not a lot of money. Anyone that's looked at FPGAs will probably know that they start at silly prices and move quickly on to ridiculous amounts of money, so these are very good to start with. What would you make with something like this? Well, vintage games consoles and emulation are popular. Not really my bag, but a lot of people do that. Driving LED arrays is very popular with this sort of part. Video graphics generation, whether that's just creating screens, making basic games, or doing things like coloured backgrounds for TVs to create different coloured rooms, that kind of thing. Complex multi-channel audio processing - again, that's not particularly difficult on an FPGA; it's mostly numerical manipulation of fast streams of data, over I2S or something similar. You can also use them for voice processing: if you have arrays of microphones and you want to focus in on different people in a room, for example, that can be done using I2S streams. And real-time digital signal processing on ADC data - anything you need to do mathematically fast with your real-time data. In terms of putting soft cores on there, it's fairly easy. There's a PicoRV32 example that Clifford has published, which comes in at less than 1,200 look-up tables - so if you've got the 8K parts, you've still got plenty left to play with. There are lots of different ones out there; he's ported something like 15 different soft cores and had them running through Yosys and onto the Lattice chips. Or you could model a 300-neuron worm. Or I can, because I'm actually working on that at the moment, purely because the information is there: I have the map of the worm and its neurons, and I can simulate it just about in an 8K part - probably less once I optimise it. So I'm going to do it anyhow, because it's fun. Along the bottom here, there's a small board which is about £13 from Farnell that you can use with Yosys - if you can get the damn things; they're always on back-order at the moment. There's a larger one, the 8K one, which I used; that was about £26. And there's Clifford's board - I don't know how much he was charging for it; it was about 90 euros, but it was just a one-off he did. There are some other boards he's got it running on as well, some Kickstarter projects and some other boards out there. And it's not just Lattice that this is being targeted at.
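To give a flavour of how small the development loop is on these boards, here's the classic first project - an LED blinker - with a typical IceStorm-style build sequence in the header comment. The clock frequency, pin-constraints file and device flag are assumptions; check your board's documentation for the real values:

```verilog
// blink.v - divide the on-board clock down to a visible blink.
//
// A typical open-source build flow (flags illustrative):
//   yosys -p "synth_ice40 -blif blink.blif" blink.v
//   arachne-pnr -d 1k -p blink.pcf -o blink.asc blink.blif
//   icepack blink.asc blink.bin
//   iceprog blink.bin
module blink (
    input  wire clk,   // board oscillator, assumed 12 MHz
    output wire led
);
    reg [23:0] counter = 0;

    always @(posedge clk)
        counter <= counter + 1;

    assign led = counter[23];  // ~0.7 Hz with a 12 MHz clock
endmodule
```

That whole loop - edit, synthesise, place and route, pack, program - runs in seconds from a plain command line, which is exactly the editor-plus-terminal workflow described above.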
Has anyone heard of the Silego GreenPAKs? These are the tiniest FPGAs ever - they're like an 0402, really, really tiny. They have about 18 legs on them, and they're great for doing glue logic. They're almost like a CPLD, but they're actually mixed-signal: they have ADCs in them, some look-up tables and logic units, et cetera. Some of them have DACs as well, some have I2C, and some have SPI. They're really useful - in places where I used to use something like an ATtiny, I tend to use these now. Well, somebody has ported Yosys to support these devices, so you can actually use Yosys to write your Verilog for them. Now, they have a really nice tool of their own, but it's schematic-based, and it gets really complicated if you've got a complicated set of logic - it's very difficult to follow - so doing it in Verilog makes that a lot easier. That's going to be really useful. You can also target the Xilinx 7 Series as well; I haven't done any of that, but it is possible. So it's not just confined to those Lattice chips - people are already porting it to different devices and building on his tools. It's got very modern Verilog support. I was quite shocked when I started using Verilog. I've programmed in about 15 different programming languages, and when I hit Verilog it was just: crikey, this is very old-looking, and very limited. Then I realised that what I was reading was a text based on Verilog-1995, and I subsequently realised there was a Verilog-2001, et cetera. This is a very modern implementation, which makes writing Verilog a lot better than it was. I'm not going to say that Verilog is particularly nice even so, but the modern implementation is a lot better. It also includes some nice features, such as recognition of things like memories, just to make things a bit simpler. It's got checking, monitoring, synthesis, and it's got timing tools as well. Not all of those are complete at this point, but they're coming on pretty fast. But parallelism is generally hard - whether you're using CUDA or OpenCL, or MPI on a von Neumann-type architecture, or whether you're trying to use Verilog. What we really need is higher-level abstractions that are easier to program with. I think these are going to turn out to be, if you like, kinds of Turing primitives that get reused and combined in different ways. It's a bit like writing in maths, if you like, rather than just plain algorithms - but matrix maths. I'm finding ways of expressing things such as convolutions, neural networks, perceptrons and fast Fourier transforms, and bringing these into some kind of lingua franca that will help us describe these more complex problems when it comes to things like machine learning and pattern recognition. So one of the things I'm looking at is something I call matrixed open Turing engines. It would be nice to be able to build a subset of these things that can then be recombined to make those basic blocks, because what you'll probably see is a load of proprietary ones emerging over the next five to ten years, and I think we need to make sure there's a good open lingua franca, if you like, for describing these things at a higher level. There's a bit more detail about the front ends and back ends that Yosys supports; I won't go too much into that. How am I doing for time? So: have a go. If you already know Verilog, or you want to learn Verilog, or you want to get into FPGAs, and you're using Linux or macOS, take a look at Yosys. It really is quite simple to use. You can target these chips, or even the new silicon that we'll see emerging soon. This is going to help me and my friends get even further. Thank you - I'll take your questions.