Okay, so my name is Khem Raj. I primarily contribute to the OpenEmbedded project, and I've become quite interested in RISC-V of late; I've been working on various pieces on the software side of RISC-V for about a year and a half now, or more. So today I'm just going to cover what it is, the software work that has happened, what the status is, what has been upstreamed into the various open source projects, what works, and what's happening these days. I think that will be pretty much it, so feel free to ask questions or make comments as we talk through it. So, how many of you have worked with RISC-V so far? Good — so we've got only one guy. That's bad. So, RISC-V. It's a relatively new ISA that came out of Berkeley in 2010. Two students started it as a summer project. They looked at the various other ISAs that were out there, predominantly Arm and x86, and there were other factors like licensing, so they thought: wouldn't it be nice to have an instruction set that is written from the ground up and made available to other communities for free? That's how it came along. It started as a three-month project and then developed into a larger project at the university. It has 32-bit, 64-bit, and 128-bit instruction possibilities as of today. Primarily it is little-endian; I think you can have a big-endian version if you want. The specs are open specs, so you can go to GitHub — they maintain repositories under the riscv GitHub handle for the various kinds of specs — and take a look at all the specifications. Some of them are now finalized, some are still being worked on, so you can participate and contribute if you have ideas. I see discussions all the time around various subsystems, people contributing to various pieces of those specs, and it's all discussed in the open on mailing lists. You can download the PDFs for all the specs and read through them. Some of those specs are now in a locked state; the user-space
specs, for example, are locked down. Some, like virtualization, are still being discussed, so there is a lot of input you can contribute if you are interested in the CPU side of things. I've put the link here; you can go to the specifications and download whatever area interests you. So the architecture came from UC Berkeley, but it is now shepherded, or rather also promoted, by a foundation called the RISC-V Foundation, and a lot of companies and organizations have joined it as members. What it provides is a common collaboration space for everyone to participate in — not only the specifications but also the other pieces, especially around software: bringing up software and the discussions around that. So, about the ISA itself: one of the things I want to talk about is how it differentiates itself from previous CPUs, and that is that it's a scalable ISA. What that means is that the core set of instructions is really small — I think it's fewer than 40 instructions. You have the 32-bit base integer instruction set, RV32I, which is kind of the minimal set, so if you were to design a microcontroller or something, that's all you need. You just implement that; you are not supposed to implement the whole shebang of all the instruction sets. That gives you a really simple set of instructions to work with, but then, if you're designing a more customized solution, you can go and add additional instruction implementations and get the more powerful compute engine you are looking for. It's a ground-up approach, which is very strong in my mind, in that you can come up with a lot of hardware designs for things that today we do in software. I think this is the kind of ISA that is required if you want to customize chips and yet have a common instruction set, where the software can still be common, right?
So it's a different approach, where you can have different hardware and still have, say, a Linux Debian distribution running on top — thinking in a reverse manner, where you can share the same software across multiple RISC-V implementations. There are certain extensions that are already defined; you'll find references to these if you go through the specs or elsewhere in software. There is I, the base integer set; M, integer multiply; A, atomic instructions; and C, compressed instructions — if you're familiar with Arm and MIPS, for example, there are 32-bit and 16-bit instruction encodings for smaller code size in embedded systems, and it's a similar concept here. Then of course there is F, single-precision floating point, and D, double-precision floating point, and you can concatenate these together to express what's implemented in your CPU. So for example RV64IM is 64-bit with integer multiply, and likewise: you will see this kind of concatenated string representing the CPU implementation you have. G stands for "general", which is shorthand for IMAFD, which is why RV64GC is the representation for the common instruction set being used in the Linux world — primarily all the desktop distributions and other operating systems have agreed to use RV64GC. That's kind of the general instruction set. But in the embedded space that might not be the case; you might have various other implementations based on RV32I and others, with these extensions not present.
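As an aside, the extension-naming scheme just described can be sketched in a few lines of Python. This is purely my illustration of the convention (an rv32/rv64/rv128 prefix, then single-letter extensions, with G as shorthand for IMAFD) — it is not code from any RISC-V tool:

```python
# Sketch of the RISC-V ISA naming convention: "rv" + register width +
# extension letters, where "g" (general) expands to IMAFD.

def parse_isa_string(isa):
    """Return (xlen, set_of_extensions) for a string like 'rv64gc'."""
    isa = isa.lower()
    if not isa.startswith("rv"):
        raise ValueError("ISA string must start with 'rv'")
    rest = isa[2:]
    # Peel off the register width (32, 64, or 128).
    digits = ""
    while rest and rest[0].isdigit():
        digits += rest[0]
        rest = rest[1:]
    xlen = int(digits)
    if xlen not in (32, 64, 128):
        raise ValueError("unsupported XLEN: %d" % xlen)
    exts = set()
    for ch in rest:
        if ch == "g":          # G = general = IMAFD
            exts.update("imafd")
        else:
            exts.add(ch)
    return xlen, exts

# The RV64GC profile the desktop distributions agreed on:
print(parse_isa_string("rv64gc"))    # 64-bit, IMAFD plus compressed
# A plausible embedded variant without floating point:
print(parse_isa_string("rv32imac"))  # 32-bit, no F/D
```

This is why an RV32IMAC microcontroller and an RV64GC Linux-class core can be named by the same scheme even though they implement very different subsets.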
So the upstreaming began somewhere around 2014 on the software side — we'll primarily cover software here. One of the first patches sent out was for autotools, to support the RISC-V tuples; that began around 2014. A lot of work was done by students at the university to get all the basic software up. Next in line is usually the toolchain: you need compilers, and you need other libraries like the system C library and others; that was the next set of things to be upstreamed. Here there's a difference: the architecture was still in its definition state, but all the support for the specs was already getting upstreamed into tools like GCC, binutils, and others. It's a very different approach from the previous architectures, where, say, the Arm architecture existed forever before it was upstreamed into the various tools. In fact there were ports kept outside the tree, and there are still out-of-tree ports for Arm and MIPS — a humongous number of patches, in the thousands, maintained outside. Here, all the changes were upstreamed; the target is always to upstream into whatever the upstream open source project is. So binutils 2.28 got the support for the 64-bit ISA, GCC 7.0 got full support, and similarly glibc 2.27 — these are fairly recent releases, but they are out there in the open now. So today, if you use a fairly modern toolchain, you can easily build a RISC-V toolchain just from the upstream repositories. [Audience question about musl] Yes, I'll talk about it — actually, I'm participating in that, so you can ask me more questions if you want. But yes, the port is already available. It's not yet upstreamed; it is actually on the verge of being accepted.
Hopefully, maybe in a few weeks, once everything the maintainer flagged in review is addressed — maybe we'll get the 64-bit port in first and then 32-bit a bit later, but it's in the works. So, as you can see, these are the primary tools I've covered here, which give you a hosted Linux toolchain, a bare-metal toolchain, an emulator, and a debugger. With this you have all the basic ingredients you need to bootstrap a new architecture, and after that come your bootloaders. There are a few options out there, and a few of them are already ported to RISC-V. coreboot was one of the initial ones, and U-Boot got it recently, last year. The proxy kernel, BBL (the Berkeley Boot Loader), was actually the original bootstrapper, I would say, from the university itself, and it was what got used. Then I think Western Digital recently contributed OpenSBI, which is basically replacing that implementation with a more standard implementation of the SBI spec at the platform level, where all these bootloaders and others can speak the same language using OpenSBI. [Audience: Are you covering OpenSBI in your talk?] Yeah, I think there is another talk in the afternoon where you can get into more on that side; there are a lot more details there. GRUB is actually very new: the RISC-V port got accepted into GRUB, and it's probably going to be in the 2.04 release. What that means is that all the desktop distributions use GRUB, so it opens up and simplifies things for the desktop distributions quite a lot. It's a very positive thing in my mind — getting the support into GRUB is a big deal. There is a UEFI spec available as well; I don't think there is an implementation yet. Some folks from HPE were trying to do it.
I'm not sure how far they have gotten implementing UEFI, but we got GRUB in there, and hopefully next time we meet we will have more. On the kernel side, I'll cover a little bit of Linux and then some of the RTOSes at a high level. 4.15 is when the user-space ABI was submitted and accepted upstream, which means all the syscalls and the spec-based user ABI were accepted. We still used forks, though, because driver work was still happening and 4.15 was not that usable from upstream. But now, starting with 5.x, I think you can boot the HiFive board, which is the Freedom U540 — I think you can just build directly from the upstream kernel.org kernel and it will have most of the support. Hopefully it's usable; I would say that in practice you'd still use the SiFive kernel fork, but they have upstreamed quite a lot of it — hardware capability (HWCAP) support, for example, got submitted recently for 5.1 — and likewise they are moving all the fixes they have been carrying for a long time into the upstream kernels. The majority of the kernel work was done by the University of California at Berkeley, SiFive, and Andes Technology. The work has been happening for years; it's not rushed work, so it has gone through a lot of testing at a local level. The patches were there;
they made them better and better, upstream-worthy, and now they're upstream. So, Zephyr, as you might know, is one of the RTOS options out there; Zephyr is a project under the Linux Foundation. This was fairly early — I think a year or so ago somebody submitted the RISC-V port for Zephyr, and Zephyr is actually deployed now on real RISC-V chips, microcontrollers of course. There are a bunch of microcontrollers listed here — the HiFive1, for example — for which a port is available, and it's all part of the SDK: when you download the Zephyr SDK, they have a bunch of toolchains available, like ARC, x86, Arm, and RISC-V is one of them, with a 32-bit target. So the SDK has all the support, thanks to all the toolchain work that happened. FreeRTOS: very recently I read the blog from the AWS folks where they announced that they now have RISC-V support for FreeRTOS, which is great news. As per their article it currently supports both 32-bit and 64-bit instruction sets, but of course it's just the soft-float profile. So that primarily covers the kernel pieces, and then I'll move on to the next topic, which is Linux distribution support — putting the whole operating system together. We talked about this earlier: the desktop distributions have agreed upon a common ISA-with-extensions, and that is RV64GC with the 64-bit ABI. That is for RV64; maybe in the future there will be an RV32 port.
I don't know, but so far the focus is only on the 64-bit port, and little-endian is the default endianness for the common Linux operating systems. The embedded Linux story is a bit different, though: you can have all these variants of CPUs, where a standard distribution might not be sufficient, so you might need an embedded-specific distribution, and we'll cover those as we go through the distributions. I think there is a talk in the afternoon where embedded Linux will be covered in much more detail, so you'll probably get much deeper insights there; I'll go over the other distributions in general. You might have custom extensions, for example — if you have those, these embedded distributions will provide you the infrastructure to deal with that kind of thing. So, Fedora is actually very active in doing RISC-V work, and in fact it's in very good shape right now. There is a wiki with a really good description; if you are a Fedora user, go try it out. They have images available for emulators, so you can basically bring it up on QEMU with all the packages. I saw it's now actually being added to Koji, which is their standard build farm, and the bootstrap images are already available, so I think at this point it's in fairly good shape as an architecture. They support the HiFive Unleashed board, which is actually the one Linux-ready board that's out there, and then QEMU, of course, which is the other "board" that is supported. Debian also has a very active community working on RISC-V. I always like this graph from Debian, which shows how many packages have been ported, and as you can see RISC-V is way above IA-64, for example — a fairly large number of packages in Debian are now buildable on RISC-V, and it's always going up and up. I think soon it will be in the same league as x86 and Arm, so
it's a good measure of how the porting work is going. SUSE: openSUSE is at a similar level; they have pre-built images available, and if you use Tumbleweed you can just install Tumbleweed on one of the HiFive boards or on QEMU and use it. So if you do any kind of application-level work, porting some specific piece of software, the distro of your choice is probably already working on RISC-V — go ahead and try it out. Then you have the OpenEmbedded project. Primarily I've covered a few of the main desktop distributions, but there are a lot more that have ports either available or in the works, and there is a software status page on the RISC-V GitHub. That's a live document: whenever something is added or upstreamed, the status is reflected there. So for more accurate information, or if you use a particular piece of software I didn't cover here, it will probably be listed there, reflecting the current state of that package or major piece of infrastructure. I encourage you to go there and check the things you are interested in and what state they are in. OpenEmbedded and Yocto: this was actually one of the first embedded distros used at UC Berkeley to bootstrap the architecture. There was a port called poky-riscv that was used to do all that work, but they never upstreamed it into OpenEmbedded, so it diverged quite a bit for a while and was based on a very old fork. So what we did later on — I also contribute to the OpenEmbedded project — is we upstreamed a lot of the base support into our core layers, and then we created an architecture layer, which is hosted, again, under the riscv GitHub handle. It interacts with the upstream layers, and you can actually build for a lot of boards, including the emulator, 32-bit as well, with musl or glibc — all those options are what OpenEmbedded
gives you: you can pick and choose and put together your distribution the way you like. You can also put together SDKs for other things, like bare-metal work — I talked about Zephyr, and the Zephyr SDK is actually generated using OpenEmbedded. So you can use OpenEmbedded not just as a distro, but also as infrastructure to generate your toolchains for your embedded work, or if you are doing some FPGA work where you are targeting a microcontroller rather than a full Linux distribution; it can help you put together all the tooling you will require. I think in the talk this afternoon this one slide will be discussed in much greater depth. Until recently we were using BBL as our bootstrapper, and that has moved on: OpenSBI is what we use in OpenEmbedded now. It's actually, at the moment, one of the leading distributions for RISC-V in terms of implementing the various pieces, and I think it's on the latest and greatest kernel — thanks to Western Digital for a lot of those contributions. The emulator is also in very good shape; the OpenSBI contributions came entirely from them, and the RV32 support also came from them. The way it is set up today actually ties into what I said about the scalability of the architecture: we currently only support the base common ISA variants, but you can easily extend it — say you implemented a chip with double-precision floating point, or you eliminated it; you can easily represent that in OpenEmbedded and generate your own distro. So I think in the future these will go hand in hand: when RISC-V really scales up, it's a good distribution to go with that philosophy. [Audience question about upstreaming into OE-Core] Yeah, so we've always been targeting that. I think one of the entry points currently in OpenEmbedded is testing; that's the path for it to get into core.
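As a sketch of that pick-and-choose idea, a `local.conf` fragment for an OpenEmbedded RISC-V build might look like the following — the machine and libc names here are my own illustrative assumptions based on the meta-riscv layer, not something shown in the talk:

```conf
# Hypothetical local.conf fragment for an OpenEmbedded RISC-V build.
# Assumes the meta-riscv layer (and its dependencies) are already
# listed in bblayers.conf.
MACHINE = "qemuriscv64"    # 64-bit QEMU machine; a 32-bit variant also exists
TCLIBC  = "musl"           # choose musl instead of the default glibc
```

Swapping `MACHINE` or `TCLIBC` here is all it takes to retarget the same build to a different board or C library, which is the "infrastructure, not just a distro" point being made above.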
Don't get me wrong: if you look at meta-riscv, it's a very light layer. We don't have much in there — it's only BSP definitions, a few other things, and a few patches we carry for certain packages. But as we move into the future, I think one of the acceptance criteria for getting into core will be the ptest framework, and that still doesn't work. I think it's a small problem that we need to fix, but the hope is that in the coming releases, probably this year, if we fix this problem, then it stands a fairly good chance of getting our QEMU machines into core. That is what we were talking about offline here as well — that's probably a priority for us to get fixed, and then we can start pushing it for the next release, provided we can fix those issues; they are still at large. Buildroot also got the support: if you are a Buildroot user, Buildroot has had both a 32-bit and a 64-bit port since December — I think the 32-bit port was added immediately after the 64-bit one. A separate staging branch was used before, but right now you can just use their upstream master branch and build for the emulator as the target. And I was actually the primary contributor for that work. So that's, at a high level, what has been done so far and is available upstream for you; now I'll go over some of the ongoing work. If you go to the riscv handle on GitHub, you will find there are several forks of projects still there. It's a temporary staging area for the RISC-V community to host those repositories and do the ports, the idea being that they will be upstreamed and eventually those repositories shouldn't exist.
So what you will see is that all those ports get upstreamed, as has happened for other projects, but for now there are still several projects hosted there that are being worked on. An LLVM port is actually available; Clang is also available as an experimental port, so you can build it and use it. I've tried it: it's not at a production stage or anything, but it's usable — you can build it, you can compile smaller examples; it's progressing in the right direction. The musl C library — we talked about it a little. The patches were done last year and have gone through a few rounds of reviews; for lack of time the review comments haven't always been addressed completely, so I think there is another review cycle happening as we speak. I think most of the 64-bit port now has all the review comments addressed, so hopefully the maintainers will accept it in the coming weeks. What that gives us is that certain distributions become possible upstream: OpenWrt, which uses musl as its C library, and Alpine Linux, for example — the most widely used distribution in the Docker world — which uses musl as its standard C library. Alpine has actually already taken these patches and vetted them; I think one of their developers has a fork with the port, and they blogged about having the basic Alpine port working on RISC-V, which is great news. OpenWrt also has a port available, and they also publish the feeds; they are not upstream yet, but they are in a staging area on the way upstream.
So that's happening. If you dabble in those projects, these are the areas you could look into and spend your time on, if you participate in those communities. Go: there's a port that exists in a fork under the riscv GitHub handle. It doesn't have all the features of Go — cgo is not there yet, I think — and it's not upstream yet either. I don't know what the upstreaming plan is, but I'm pretty sure there is one. I haven't played with RISC-V Go myself, so I don't have many details on what shape it's in, but I think it has already been used on some distributions, so I assume it's in a usable state, in that fork. OpenJDK has only very basic support — there's no HotSpot support and so on. The V8 engine: nothing has been done there so far; Node.js is in the same boat. So the larger ecosystems, the bigger runtimes, are still not fully done. If you work in those communities, there is quite a bit you could help with — feel free to participate and make RISC-V a first-class citizen there. Hardware-wise, talking about a full hosted Linux distribution, there is the SiFive Freedom board. It is a bit expensive right now — it's somewhere under a thousand dollars — but it's actually a fairly good four-core CPU, and you can do a lot of native work on it. Hopefully there will be new, affordable boards coming out, in a Raspberry Pi form factor maybe — I don't know, that's my wish. If that happens, I think that will be really good for the community. So much has already happened on the architecture and in porting all the large pieces of software; I think an affordable board would just take it to the next level, but it's a
But it's a Matter of when it is going to happen Somebody will come up with a board which is affordable. I'm pretty sure You know the way. Yeah Yes, so You can go especially on sci-fi's GitHub handle that they have published several Schematics in there so I Think they keep publishing every non-den as well. So I don't know what the current state is but they have several Open like, you know reference implementations that they're published. Yeah, it's a bit complex Yeah, so I think there are certain tries in that area that people are doing and I think it's an area that probably You can Yeah, yeah Yeah, so I think definitely more You know, if you dig more into it, probably it's doable. I think so but probably know what I'm trying it yet Given that, you know the complexity you need for the chip for running a full Linux kind of it with all Virtual memory and stuff, I don't know Maybe it's doable Maximum Mm-hmm Yeah, I think that I'm not privy into like all this licensing and stuff, but you might be right like, you know in the ballpoint figures it depends Who do you know the other guy is and you know, they have all those contracts, but it is expensive. Yeah Yes, so that's that's why, you know, it caught my attention and my kind of imagination that you know What it can do is amazing in you know in this given, you know, AI of things world right Where everything is a compute which is not same as others a general computer is kind of lived its way it's good in data centers, right but as AI becomes more and more common your computing units are very unique so You know, you cannot do everything in software for example, right? So the philosophy that we had for so long was you know, let's create hefty computers put them in cloud Get the data process it there It can only get us so far So it has a fairly I mean it's coming up in a right time. 
I would say, and in interesting times. There is more information linked here. There is the software status page — as I mentioned, it's a live document; you can always go there and look, and as people announce different SoCs they're developing, those get added there too. That list is growing: it used to be four, and now I see it's around twenty, so that's good progress. And if you're interested in the kernel, there are several other mailing lists listed here where you can get involved. But the philosophy is that everything is upstream, so you really don't have to interact with the RISC-V community to get patches accepted: you can fix a package and send the fix to the upstream where it lives. You don't have to go through RISC-V first and then upstream — it doesn't matter; most of it is already upstream, and if it is not, fix it and send it upstream. It's as simple as that. There are IRC channels too: there's a #riscv channel on Freenode where a lot of folks hang out, so you can go ask questions or just listen to what people are talking about. I think it's a pretty informative channel — people discuss a lot of interesting stuff. Stack Overflow also has a tag, so people are asking questions about RISC-V in the open on Stack Overflow; you can go answer, participate, and provide your feedback. I think that spreads the knowledge to people who are trying to use it. It's also been getting a lot of press coverage in the past few months, and in fact since late last year, with people now asking what it means: is it too open, or is it not that open? They are doing comparisons, right?
How has the processor industry been operating, and what can this get us? There are interesting articles out there. I read this article about being "too open to fail", which talks about what you were alluding to — the licensing and all those kinds of things — and at the very end, interestingly, the author said the next article would be "too open to succeed". So that will be an interesting read going forward, and I see it being covered in a lot of other places, on Hacker News and elsewhere. There is a lot of good stuff happening around RISC-V, and I'm excited about that. I always draw this analogy — if you recognize it, this is the email from Linus Torvalds when he announced Linux: "I've written a kernel, it works on i386, go have fun with it." That was in 1991, and today we know Linux is everywhere. The one common thread is that it was open source. As we know, history tends to repeat, and maybe this time it is for a processor — who knows; time will tell, but these are interesting times. I was talking to a few folks, and they were saying that the number of startups doing hardware has gone up in the recent past, like in the last year. Given that the number of architectures had shrunk so much, people had been innovating in software for so long — now the hardware innovation cycle is returning, and there are several people trying to do a lot of things using RISC-V, and maybe other architectures too, but certainly RISC-V is contributing to that wave. I'm just hoping this will turn out to be something really interesting for us in the future; twenty years from now, maybe this will be one of the architectures to reckon with, handling all our AI and machine learning loads, which require specific compute that is not general in nature.
So I think there is movement in that direction: Nvidia has announced their cores, for example, which are RISC-V based — one of the open cores they've announced, I think — and there are others working in those directions too. TPUs are a typical example of the requirement in this industry; or the car industry trying to do visual compute and those kinds of things. Those are special kinds of workloads, where you can't put a server-class CPU, with its fans, into your car and do that kind of compute there. You need specific compute modules, and this is a good core that somebody can pick up and say: I have an idea to implement a neural network, and I have an ISA I don't have to license from anyone — I can just download the spec and design my hardware. One concern is the tools — especially the EDA tools people use for design. I'm not sure those are as good as the others yet, but SiFive has done some of this Chisel work, which sounds pretty interesting: they're doing high-level design using Scala, I think, which is a very different approach to design. I'm hoping the tooling around hardware will catch up and become pretty modern, so that, like how we do software today — we do and we redo and we redo — you could do hardware too. It doesn't have to be that difficult: you could design it, redesign it, and the cycles would be smaller. Today we don't think twice about software — okay, there's a bug, we can fix it — but in hardware it's a big thing, a very big expense, because of all the verification we do. A veteran at Intel told me that in some cases they do local optimizations on the assembly by hand,
so they cannot recompile the whole thing, because that hand tweak would go away. So most of it is very manual for them, and verification takes very, very long. You need high-level tools for all of this to speed up, and I think there is something happening in those areas as well. I'm hoping this will be a vehicle for people to try hardware innovation again. So I think that's what I had for today; if you have questions, we can discuss them. Thank you. There is a rep from Western Digital here — I'll let him answer that one. [Audience: Do you have the HiFive board, the Freedom U540 board?] Yes — there's the SBC from SiFive, which is not cheap; it's like $1,000, something like that. [Audience: The Raspberry Pi Foundation joined the RISC-V Foundation recently?] Yes. I would not read too much into it, but I'm hopeful they will come up with a design in the future — maybe there will be two different kinds of Raspberry Pi, I don't know. SiFive is the company; the board name is HiFive. There are some microcontroller boards already out there that you can get fairly easily — they're under 100 bucks, 50 bucks — but they're microcontroller boards. For full Linux-distribution-level work, you would either use QEMU or you would have to get that thousand-dollar board. [Audience question] Yeah, I think there are also a few products already shipping RISC-V into the market, in that compute space. [Audience question] I think so. This is my take on it: unless you can do a native build — you have a system you can hand to a developer where he says, okay, I can do native work — the software ecosystem is not going to ignite as much. So RISC-V has to have that full system. To a certain extent, maybe it is not very cheap and can't get to that level, for whatever reasons,
But unless we reach that point, it will remain a bit of a problem for the developer community, so I think it does have value to have that kind of system out there. Network cards, though — definitely; I think that's a use case people are actually exploring as of today, and from a product point of view it's more practical, cheaper. [Q about compatibility] What I mean by that is the APIs you would generally design against are versioned: the implementations that have been finalized follow versioned specs, so you implement, say, version 2.0 of the user spec, and then the instruction sets are known. You want to implement an instruction set, and what those instructions do is in the standard, so you implement that. So basically, if everybody implements 2.0, then they're compatible with each other. [Q about novelty] Yeah — everybody did this before, so they had prior art. It's not that they came in, shut their eyes, and said, let's do something new. They took best practices from previous architectures: a typical example is the zero register from MIPS — that makes your instruction set smaller — so they took that design and implemented it in the specs here. Likewise there are other things they learned, so the hope is that the ISA they come up with is basically better than what we've done in the past. And there will always be room for the future — they won't shut it down and say, this is what it is, you cannot improve it.
[Q: Won't implementations diverge?] They might, but what it gives you is a common, scalable set you can start from. It doesn't force you the way Arm did, where the ABIs ended up completely different and not like each other, nor is it like the other extreme with x86, where everything has to run at the lowest common denominator. It's a balanced approach to the design: a common set to begin with, then scale it up, with building blocks you can put together to build your logic. [Q about verification and security] Yeah, I think that's one advantage: the memory model is verifiable. It's put out there; somebody can write an emulator or simulator for it and verify it, and obviously if a glitch is found, we'll fix it quickly because we all know there's a problem. So that security-by-obscurity won't be there — and obviously there will still be holes; we can't expect an architecture to be 100% foolproof just because it is open. But the process of fixing it will be different. [Q about benchmarks] Yeah — I'm not sure whether they've published it, but SiFive does have benchmarks for — I forgot the name of that board — the HiFive1, which is a microcontroller board, against the Cortex-M0, and they claim they're a lot better in terms of power. And "a lot" means a lot.
So, um, yeah — as for the Linux-capable parts, the version one of the U540 board wouldn't be a fair comparison as of now. I'd expect a v2 to come out, and then you'll start to see real-world benchmarks — Phoronix or something — where you can start doing that kind of stuff. [Q] So I think it opens up an opportunity for people who are new designers — the young kids, for example; I'm excited about those people. India has invested something like 42 million dollars into it as their national architecture, so they're basically designing it into their school and university curriculums; China has a lot of projects in there already. I think the reason is that because it's open, there's a lot more chance for people to try it out. And if it stands the test of time, it will be an architecture to reckon with. [Q: The license?] BSD, yeah — it's from Berkeley, so what do you expect? It's commoditization, basically; this is what is required in general — otherwise you do a lot of duplication, which is not required. [Several inaudible exchanges] Yes, yes — I think that's big. The capability is there now; you could do it. All right, thank you.

[Next talk] That's better. Okay, good. Welcome to this afternoon session of the embedded track. We're going to be talking about FPGAs. How many here have heard of an FPGA? How many have used an FPGA? That's good — a good number of people.
Okay, great. In this talk we're not going to give you the complete nuts and bolts, but we are going to talk about some exciting new developments — open-source tools you can actually use on FPGAs — which means that instead of having to download five gigabytes of stuff for one particular part, you can download and build the tools yourself, and they're a lot smaller. So that'll be fun.

Initially this talk was going to be given by Merrick. He had trouble getting a visa because the embassy was slow this year, so he was not able to come, and he asked me to give this talk. I've seen him give it before, so I at least know it, and I've also been working with this myself. So thank you to Merrick for making this talk available.

We're going to cover five things: what an FPGA is, what design tools there are and what languages are available, a look at a small design, a little about debugging a design, and the conclusion. His lab exercise did not work for me, so I had to come up with my own — I've been working with a friend of mine in Phoenix who has built an interesting project on a very inexpensive board, so we'll start with that.

Okay: FPGA stands for field-programmable gate array. It's a programmable device that you can program multiple times, because it generally has an external PROM that programs it. It's all logic — there's no real CPU in it, no real program as such; you have to build up all the little blocks yourself into whatever you want it to do. A lot of them have a minimum of, say, a hundred I/Os, so you get tons of I/O. They're not all that fast compared to a modern CPU, but they give you plenty of options for I/O, and some of the more modern ones have hard endpoints for things like PCIe or gigabit Ethernet.

They're extremely parallel. If you're used to programming, things mostly go down one, two, maybe three or four threads; in an FPGA you may have thousands of things all acting on the same clock cycle — every time another clock cycle happens, a whole bunch of things happen in parallel. So you have to think differently when you're programming an FPGA. We use them a lot in DSP — digital signal processing — because one of the things they do really well is the same operation over and over and over again, very rapidly, since they can do it in parallel. If you've ever done anything that requires a bunch of parallel data processing — say you have a custom hardware interface that's different from somebody else's standard; say you have a camera that for whatever reason has 27 bits coming out of it — an FPGA is what you use to capture that data and pass it on to your computer to be processed.

ASICs — anybody heard of an ASIC? Seen an ASIC? How expensive are they? A million — that's correct. ASICs are very expensive, and by the way, if you do a spin, that's a million bucks; oops, I got it wrong in that silicon — that's another million. So rather than do that, we use an FPGA to do the prototyping. What other uses do you know of for an FPGA? High-speed? Absolutely — high-speed, and like I said, camera capture and such. Or how about this: say you've got a part that was made in the 80s, the manufacturer no longer makes it, but you happen to have a contract with a government or military that says: oops, we need you to replace this, because we've still got the system in service and we can't get these parts anymore. By knowing the data sheets, you can actually rebuild it in an FPGA — you could emulate a SPARC that way, as long as you knew its internals.
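To make the earlier point concrete — thousands of things updating on every clock edge, in parallel — here's a minimal Verilog sketch. It's my own example, not from the talk; the module and signal names are made up.

```verilog
// Two independent pieces of logic, both clocked by the same edge.
// In an FPGA these become separate groups of flip-flops updating
// in parallel, not statements executed one after another.
module parallel_demo (
    input  wire       clk,
    output reg  [7:0] count   = 8'd0,          // free-running counter
    output reg  [7:0] shifter = 8'b0000_0001   // rotating one-hot pattern
);
    // Block 1: increment a counter every clock cycle.
    always @(posedge clk)
        count <= count + 1;

    // Block 2: rotate a bit pattern every clock cycle.
    // This happens at the same time as Block 1, on the same edge.
    always @(posedge clk)
        shifter <= {shifter[6:0], shifter[7]};
endmodule
```

Note the nonblocking assignments (`<=`): both right-hand sides are sampled at the clock edge and both registers update together, so the textual order of the two always blocks doesn't matter — that's the "whole bunch of things happen in parallel" behavior described above.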
Yes Okay, here are some of our common vendors common vendors are Xilinx and Altera, which is now Intel Lattice which we'll be talking about today Microsemi those are some of the normal ones that you'll see out there in the field A lot of these guys have been around for a long time. All right. This is kind of what they look like I Say kind of because every one of them's got its own internal architecture and they're all slightly different Okay Within each FPGA there are logic elements And you can see within the logic elements. There's a logical Device and the register the lot as they call them They also have global interconnect that connects them across this thing They call it fabric That's the typical term for it is fabric and it kind of looks like a fabric because it's all interwoven together They have the blue Which is global interconnect the green which allows you to connect within Locally within that piece of the fabric And then they have a logic elements all these different logic elements that are inside of the fabric Additionally, which is not shown on this are things such as local clocks and global clocks So with these logical elements mostly turns out to be close to flip flops You can actually create most of the things that you need in an architecture and gates or gates all the other different pieces that you need Any questions about this before I move on so what things are available? Like I said, I wasn't going to do a deep dive into how an FPGA is built because every one of them is a little bit different And everyone's done as a proprietary thing by whatever whoever the vendor is and all their data sheets show their architecture and how it's done so Altera and Intel the ones that people aren't normally see. 
(Thank you for telling me that.) Cyclone IV and V, the Cyclone 10 versions, MAX 10. Anybody ever played with the hybrids, where they have a couple of hard ARM cores in them and also FPGA fabric? That's another thing that's becoming very popular, especially in the Linux community — they run Linux on the hard cores, and in the fabric they do whatever special thing they need.

Today we're going to be talking about Lattice — with two T's — and the product we're specifically interested in is the iCE40. The reason we're going to talk about the iCE40 is that's where the open-source toolchain we're covering today got started. Then there's Microsemi with their PolarFire and other products, and Xilinx with the Spartan, Artix, and Zynq — the Zynq being the hybrid here. When I say hybrid, I mean it has hard cores and soft logic: you have hard ARM processor cores, and in addition you also have FPGA fabric available — both on one die, brought out to the balls of the package. Any questions?

All right, there are other vendors, but these are probably the big four you'll see in most projects. Now we're going to start going into the tools. Each one of these vendors has its own proprietary tool: Intel has Quartus; Xilinx has Vivado and ISE; and a whole bunch more. They're proprietary, they're closed source, and they are enormous.
I just recently downloaded Vivado and it was 15 gigabytes. Now, that's for every one of their parts, and it also includes some locked-up things — IP they'll sell you and whatnot — which stays locked unless you get a license code. So you see a lot of big downloads, it's a big learning curve, and it often takes a fairly substantial machine to actually run it, partly because they've built in a lot of optimization for the big FPGAs — the ones that cost $5,000 a pop; if you blow one of those up, that's not good.

So we're going to talk mostly about the open-source tools, and specifically the ones for the iCE40. The iCE40 uses IceStorm and nextpnr — PNR meaning place and route, which is how you actually tell it which logic elements to use and how to interconnect them. The Lattice ECP5 uses the Trellis tool; for the Xilinx 7-series there's a project called X-Ray; and for Altera's original Cyclone there's Chibi. These are all tools developed by open-source people to deal with the problem of: how do I take these relatively inexpensive FPGAs and FPGA boards that are coming out and actually use a tool that I understand what it's doing — or that at least is open source, if that's important to you? These tools — Chibi, X-Ray, Trellis, and IceStorm — are all efforts at reverse engineering the fabric of whatever device they're looking at. The iCE40 probably has the most work done on it; that's Clifford, in Austria, and he has reverse engineered most of its elements. Any questions about any of this so far? Okay.
[Q: Is there vendor support?] Define vendor support — no. None of them; none that I've seen. They started with the Lattice iCE40 because the parts were inexpensive and had a lot of stuff on them. This was actually done as a university project, like a lot of other things that get done. [Q: Is it production-ready?] Absolutely not — we're definitely not there yet.

[Q: Why do this?] We're at an open-source conference here, and a lot of the people I know would rather not use somebody's proprietary tool for whatever they want to work on — that's part of the reason. The other reason is that with the closed proprietary tools, as soon as you get to a certain point, you can't use the IP blocks for the higher-end parts. For example, the PCIe block, or the high-speed serial ports available on some of these larger FPGAs — even the ones that cost twenty or thirty dollars have some of those hard blocks on there — you can't get at them unless you buy a five-hundred- or seven-hundred-dollar license: you have to have the license for the tool, and then you have to have the license to the IP block that runs that hardware. Those are the kinds of things they're working toward.

So the iCE40 came out — the HX and some of the other versions — and started to be reverse engineered; other people jumped in and said, oh, I understand this process. It's a tedious process, by the way: you put in some code using the proprietary tool, you see what comes out, you try to emulate it with yours, and say — oh yeah, I got it to do the same thing.
It's a very slow process to reverse engineer this. [Q: How mature is it?] It depends on the hardware you're using. The iCE40 is probably the most mature; the further down the list you go, the less mature things are — not the tools themselves so much as the fabric definitions; they haven't had the chance to map certain blocks yet. I'll put up a link to the SymbiFlow talk that was given at LCA — linux.conf.au — this year, in which they talk about the state of things. I saw that talk when I was down at LCA and it was a very good talk; unfortunately I wasn't able to get the slides, only a YouTube video, which is kind of a pain, and it's a long talk too. But certainly the Lattice iCE40s are the most mature in terms of being usable, and the others are starting to gain speed as people say: hey, I'd like to use a Spartan 7, or I'd like to use an Artix-7 — people are working very hard on trying to make that happen.

Okay. The flow is pretty much the same whether you use the proprietary tools or the open-source tools; they all have the same basic workflow. They take HDL — HDL stands for hardware description language. There are five or six of them available: SystemC, Verilog, and VHDL are the more popular ones, with some additional ones coming out as well. The hardware description language gives you a netlist; if you're familiar with EDA tools, you know a netlist is "how do we wire all this stuff up" — that's what it's for. From there we take the netlist and place and route it. Remember the fabric I showed you: we have to decide which element I'm going to use and how I'm going to connect it up through that fabric.
From there we go to the technology mapping — and the technology database is that reverse-engineering bit that's going on today: if I tell it this thing and put this piece of information in, what does it do inside the FPGA, and what do I get out of it? That's the next piece. Then once you're done with the place and route, you have to create a bitstream — the bitstream is the actual programmed instructions that tell the FPGA how to hook things up — and that goes into your PROM. It's almost always some kind of serial PROM; there are also parallel ones for the higher-end devices, to get them to load faster. Most of them get loaded over SPI, so if you want, you can jam the bitstream in directly over SPI and avoid using a PROM — but of course that means every time you unplug it and plug it back in, you have to reprogram it.

Additionally, there are analysis tools for timing. Remember, we're talking about defining hardware, so we define all the clock signals and make sure that the next memory cycle inside the device lines up with everything else — the analysis checks the design against your timing constraints so that things aren't skewed or off, and your memory actually gets accessed at the correct time. Then eventually there are also simulation and visualization tools, and those have been around for a long time — if you've ever used GTKWave or some of the others — taking the output of the Verilog and turning it into a waveform that can be seen. That's pretty much what we're going to talk about here. Any questions on any of this? Okay.
All right. When we write HDL and run it through the tool — usually a Verilog or VHDL compiler — you get a netlist out of it. That netlist, again, is the list of how you hook up the different pieces: logically, this is ANDs and OR gates and flip-flops and memory, what the RAM looks like — all of that as a hardware description. There's also a behavioral model you can use: say you know what it's supposed to look like as combinational logic — ANDs and ORs all put together — and that can create an actual schematic. So it's not unusual to go from HDL, this hardware description, to an output schematic that you can actually look at and say: oh yeah, that's exactly the way I wanted it to run, that's exactly what I wanted it to look like.

The analysis tools try to parse the hardware description language — this whole circuit, how it's supposed to look — and then validate it, typically for timing. Because remember, an FPGA really is a parallel universe: it does a lot of things at the same time, in parallel, in different pieces of the device. The blocks all mostly look the same — you'll notice this structure repeated over hundreds of thousands of blocks — but you program what the gates inside each one look like and how they're used, so one block in one section may be programmed completely differently from another block in another section. Yet they all use the same common clock, so when the clock ticks over, the next thing happens in every one of the blocks using that clock.

And then eventually we synthesize it, much like compiling software: GCC runs the C++ parser, lowers it, produces objects, assembles them, and eventually links everything together into your program. Same thing with this — it goes through different stages, and after it synthesizes down to a netlist, the place and route figures out how to efficiently use the logic blocks and interconnects inside the FPGA to create your project and actually put it on the device. Any questions about that?

[Q: Do I have to use the vendor tool?] My answer is: it depends on which blocks you want to use within that part. If you want to use some of the proprietary blocks, you have to use a proprietary tool; if you want the normal blocks that have been reverse engineered, you're not required to — you can use the open-source tools. [Q: Schematic entry?] You can actually do it either way. Most tools allow you to use schematic capture and turn that into HDL — that's if you come from the EE side of things; if you come from the programming side, you'll use a hardware description language like Verilog or VHDL and let it go ahead and do that for you. [Q: C?] No — well, there are C-to-HDL tools, but they're not good; there's HDL-to-C as well, but again, not great. If you want something C-like, use Verilog, which uses a lot of the C conventions; if you like Ada, like some of us do, then use VHDL, because it looks very much like Ada — it's almost all the same.

Okay, the synthesis step. We're going to call it Yosys, because that's the name of the tool we're using, and it converts the HDL into a netlist that can then be used by the next piece.
Okay, let's talk about Yosys here. It's an HDL synthesis suite: it goes from Verilog, its HDL language, to the netlist. It has logic optimization and minimization — the "ABC" mentioned here is the logic-optimization engine it uses — and it knows about the part you're actually targeting. Because of the reverse engineering that's already been done by Clifford, in this particular case it understands the way to do the best optimization for that particular part. It supports technology mapping, and you'll notice that overlaps with the place and route: it's trying to give the place and route every chance it can to make the most efficient use of the part, because you only have so many logic cells, so many clock lines, and so much memory. He's got mappings for different ASIC cell libraries — he uses little ASICs in his research — he's also done some 7-series FPGA work, and of course the most significant target is the iCE40; that's where he started and where things have gone furthest. This is where you get all the tools: clifford.at/yosys — you start there and it's all there. I'll leave this up for a minute so you can take a picture. That's where you get the actual open-source tools. Everybody got it? All right, I'll move on.

So, place and route. Place and route says: okay, we've got this intermediate step that looks like a netlist of how I want things hooked up. It's been optimized as far as it can be, because the synthesis understands what device you're actually going to target and tries to give you the most optimized netlist so it uses the chip efficiently. From there it goes on to placement.
The place and route has to know exactly which part you're going to program, and then it knows what to do with the netlist: it clumps netlist elements into larger blocks so it can be more efficient, places them in the blocks on the FPGA, and then routes the interconnect between them. So when you hear "place," it's placing the logic elements in the blocks, and the routing is how they're interconnected across the chip. You're trying to lump things together so they use the local interconnects — which there are lots of — rather than the global interconnects, which there are fewer of. Think of it like RAM and ROM: the local lines are the fast bus right next to you, and the global lines are a much slower, longer path to other blocks somewhere else. You want to use the green, local ones as much as possible, because they're faster and there are more of them right next to each element — on the same level, on the same piece of the fabric, right next to it. If you were to open the lid and take a micrograph, it would look like the pictures you've seen of RAM blocks: the address lines and data lines are right there, then beyond that there are clock lines, and it keeps going out further and further — and they may not even be on the same layer, needing interconnects that go between layers. It's all of that, all of everything: it's capacitance, it's speed-of-light stuff.
It's how much the fabric has available to it. Very often you'll see a part specified as having so much local interconnect, so much global interconnect, so many local clock lines, and so many global clock lines. [Q: Do I manage this myself?] The tool does — that's what the tool is doing for you. You can write your HDL in such a way that you help it — structure your modules so that when it does the synthesis, it attempts to lump the right things together — so you can tweak your code, but optimizing the use of the logic elements and the buses surrounding them is the tool's job. You probably couldn't do it by hand anyway, because you don't know enough about how the part is internally organized; that's what the tool is capable of doing. Yes — right, exactly.

I keep coming back to this diagram because the hard part is getting the concept: when we're writing a hardware description language, we're saying we're going to use this LUT — this logic unit — and this register, which is your I/O piece. Okay, so then they pack — again, this is all in the tool.
We're not actually going to do this by hand; we're going to try to help the tool, but the tool is going to do this itself. There are different versions of the place-and-route tool, and they keep rewriting it, because that's the tool where the real optimization takes place. They've gone through iterations of: oh, we'll start doing this — and it works in most cases, but not in this case — so they rewrite the tool again, trying to make it work better and more efficiently and get more out of the parts. Then eventually it routes all the interconnect between the placed blocks.

Okay. The current version that's out now — the best one for Lattice parts — is called nextpnr, and it depends on the technology you're targeting as to which tool is better, or more mature, I should say, for that particular part. So you'll have to do some research depending on whether you're going to use an Artix-7 or Spartan 7 versus a Lattice or some other part.

All right: the place-and-route tool is called nextpnr — that's like a generation-3 tool. It's timing-driven: remember, we're dealing with hardware, so we have to figure out what our constraint is — what's driving the whole tool — and in this case the tool is being driven by timing.
It's trying to keep things that use the same clocks and the same elements close together. Ever heard of clock skew? It happens because of this speed-of-light problem across the chip: the chip has capacitance, so the global clock lines pick up skew when you go between logic blocks. So if things need to line up perfectly, you try to keep them in the same block — that's what this tool does: it tries to keep things using the same clock in the same block, or as close as possible, so they don't end up slightly off from clock skew. It works with Yosys — again, that's the synthesis tool — and supports the Lattice parts, the ones he's worked on the most. You'll notice it says it has visualization support — visualizing what the timing looks like; we'll see a demo of that in a bit. This is his nextpnr tool; if you go to his clifford.at site and follow it down to the YosysHQ site, what you'll find is a whole list of all these tools, in order, and how to actually build them — you build them from scratch. It doesn't take long: on my laptop it took about 45 minutes for everything; on the desktop about 20.

All right. Then there's the assembler: it takes the placed-and-routed netlist and turns it into a bitstream, and that bitstream is what can then actually be programmed into the device.
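Putting the stages together, here's a sketch of the kind of simple Makefile the projects in this flow typically use (my own example, not from the talk; the file names `blink.v`/`blink.pcf` and the HX1K/TQ144 part are assumptions for a typical small iCE40 board):

```make
# Minimal iCE40 open-source flow: synthesize, place & route,
# assemble a bitstream, and (optionally) program the board.
# Assumes the IceStorm tools (yosys, nextpnr-ice40, icepack, iceprog)
# are installed, with pin assignments in blink.pcf.

blink.bin: blink.asc
	icepack blink.asc blink.bin        # textual output -> binary bitstream

blink.asc: blink.json blink.pcf
	nextpnr-ice40 --hx1k --package tq144 \
	    --json blink.json --pcf blink.pcf --asc blink.asc

blink.json: blink.v
	yosys -p "synth_ice40 -json blink.json" blink.v

prog: blink.bin
	iceprog blink.bin                  # load over SPI/USB

clean:
	rm -f blink.json blink.asc blink.bin
```

Each target corresponds to one stage of the flow just described: Yosys (synthesis) → nextpnr (place and route) → icepack (bitstream assembly) → iceprog (programming).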
Oh, okay We got a tweak it to get what we want and we figure out how it's arranged inside and how you program it as you go through this iterative process and That's what's being done by the people who are working on on this project But on all the different architectures whether it be ice 40 or Arctic 7 or whatever Any questions on any of this I know it's a lot All right in keeping with our theme they call ice pack which is the assembler It's part of this ice storm project It turns the textural output of the place in route system into the binary bit stream And that's the one that you can actually program into the device now Because of to make this easier almost every one of the projects just uses a simple make file That follows all the steps and makes it to where that it does all the steps in order And you just give it the input files that you're going to use and you create your own You create your own make file But there's a lot of places where you can use that as a starting place and it makes it easier just to tweak them They're almost always MIT licensed, so it makes it really easy to use them in any project Okay, they use them All the tools that we talked about are Complete that is to say you can start with an HDL and you can end up with a bit stream that you can run in an actual device So these aren't oh, I can only go so far and then I've got to go somewhere else. It's all self-contained I storm is primarily related to the ice 40 Simba flow is an overriding pride an overarching project that is looking to create something just like these GCC For all the different architectures So you mix and match the tools that are appropriate in order to make it so that you can effectively so that you can effectively Work with whatever parts you have I'm gonna show you a slide which I took as when I was down in at LCA recently They had just moved this over Okay, let me just show you this slide. 
This is actually a stop-motion from the talk I went to. They're using Project Trellis, they call it — I just happened to stop it here, but it shows all of the tools. In this particular case, look at these three right here: X-Ray and Trellis and the others are separate projects that people are doing on their own, looking into different architectures and reverse engineering them. This is the synthesis tool — Yosys is used by all of them. The architecture definitions are the ones that have been reverse engineered by these projects, and as they get to "oh, I understand how to make it so that if you say this, we synthesize it down to something the place-and-route understands," then we can actually create the final output with it. Arachne-pnr, the original place and route, has been replaced by the second generation, which is the one we were just talking about: nextpnr. It also has test and verification — it uses Icarus. Anybody use Icarus? It's iverilog, Icarus Verilog — and Python, which also allows you to do visualization. But this is the overall flow they're trying to do: you create an HDL, you tell it what architecture you're using, it goes ahead and uses a place-and-route, and from there it turns it, through the assembler, into the actual bitstream that's going to be used. So it's very, very similar to what we're trying to do with GCC. That is what SymbiFlow looks like — it's not as specific as IceStorm. Now, the ice tools and all these tools don't have a GUI interface to them; they're all done via the command line. Again, it hasn't been turned into an IDE yet. Okay, any questions on any of that? All right, so we're going to talk mostly about IceStorm, because the device I brought is an iCE40 chip.
Verilog to bitstream: it goes all the way from the hardware description language until you get a bitstream that you can actually program into the device. The tools are Yosys, nextpnr, and icepack — those three all work together. There's also a programming tool called iceprog, and there's a timing analysis tool called icetime, and all of these are at this location. If you're familiar with C, Verilog is the easiest to start with, because it uses a lot of the C constructs. How many of you are used to using simulation and visualization tools? The thing is, because we're using these tools for timing, you might want to look at the timing as the tools are going to create it and say, "oh, that time is off — that's why it doesn't work." This happens, this happens, this happens — oh, that one's late; that's why it doesn't work, because that one's late. So you'll have to tweak your code to make it so that that happens on a particular clock cycle. You can also apply triggers and constraints. Constraints are usually about timing: "I want this to happen 14 nanoseconds after something else happens, after the clock cycle." If you've not done it before it's Greek, but once you start playing with it, it becomes a lot easier to see. And they're trying to make it so that you can actually do this visualization before you go to hardware: you can see what's going on and make sure everything's lined up the way you want it lined up. Okay — this is only used during development. It's not actually used in real life, because it runs like every other emulator.
It does not run in real time; it runs slower than that. Okay, so let's talk about the design choices. A lot of this depends on what you're going to use, and it's going to be based on where you came from. If you come from C or Fortran, you're very likely to want to use Verilog, and the reason why is that it looks a lot like C. It has very good support in the FOSS tools — as a matter of fact, it's the best supported in the FOSS tools. If you're a VHDL person, like I am, you spend most of your time implementing things in Verilog anyway. And by the way, Verilog is not strongly typed, just like C. VHDL, however, is very strongly typed; it's influenced by Ada and Pascal and has a real type system. And then there's a whole bunch of other HDLs — there's Chisel, based on Scala, if you know Scala; it's an abstraction on top of everything else, so it's like the C++ of that world. What was that? Oh — the wireless mic is being hit again. Okay. This is the normal workflow you would have: you implement it in the HDL, you simulate it using a test bench, you put stimulus into it — "I've got a 10 megahertz clock," so you go ahead and use that as your stimulus — you iterate through getting everything timed the way you want it to be timed, and then eventually you synthesize it to hardware. In very small test code you can go ahead and eliminate the test bench stage and just see what it does, and then you keep adjusting the HDL until you get what you want. It's very similar to getting timing to work in C, right? If you've got something with very tight constraints, or you've got an ISR that's too long, you have to rewrite it until you get it to fit in the time frame. A lot of that is very similar with HDL. Remember, we're working at the hardware level.
We're not working at the software level. So we're going to look at what's going on with a particular bit, and it may be something that's inside of the part, or you may be looking at the actual I/O that's around the part. So it could be all the way down at the register level, where you're looking to see whether or not you're putting the correct data into this register so that you can take it out again and use it for something else. This is an example for iverilog. In this particular one you've got some output LEDs and an input hardware clock; you assign the LEDs based on the count, you increment the counter always at the positive edge of the hardware clock, count is defined as count plus one — and there's the end of your module. A register is a logical element; a wire is a physical element. In this particular case they call the module "top," and these are the definitions of the ports that they're using.
This is what they're looking at here. What they're trying to do is get the output of this simulation — this is a test bench piece — and send it out to an LXT file. The output of running the simulation ends up going to this LXT file, which is then going to be looked at through GTKWave. So you run it through the vvp tool so you can see what's actually going on; it converts the output to something GTKWave can use, and then it starts to look like this. This is very hard to see — if somebody could turn off the lights for a minute, I could show this a little better. No, you're going to have to use the lower right-hand one — that one right there. Okay, see it now? So you can see this is LED one, which blinks at a faster rate; LED two is at half the rate, and so on and so forth. You can look at the different signals specifically — which output you care about, and which one you turn on and off over here — so that you can visualize it. You can see: okay, this one doesn't quite get turned on at the same time, this one does, and it's only running at approximately half the clock rate; and the same thing here — it just keeps moving, and that's because they haven't lined it up very well in their code. They haven't said "on the positive edge, everything happens." So: "oh, that may not be what I want to do, I want to line everything up" — then you have to start tweaking your HDL code to make that happen, until it actually lines up the way you want it to. Go ahead and turn the lights back on. Thanks. Any questions on that?
We're going to use the IceStorm parts. Because this is a piece of hardware, we have to tell it what the pin map is — from the internal pieces to the actual physical outside I/O — so we use a PCF file to do that. You'll see that, and those are standard for whatever part you have. In addition to the pin map, this PCF file is also used for constraints. Let's say a particular pin can only go so fast: if you try to make it go faster than that, it's simply not going to. You put the constraint information — how fast that thing can go — in this file, and that makes it so that when the synthesis tool is actually trying to synthesize it, it won't try to make that pin go faster than it can go. There are all kinds of programmable-logic devices available; let me show you one that we have here. This is the iCEstick. It's $30 — $29 — it's got about a thousand logic elements on it, and of course you can see it's got an LED on it. I programmed this with the tools that we have. This particular one today has a 6502 microprocessor in it; it has 4K of ROM and 4K of RAM available to it, and it has a piece of C code running in it — compiled for the 6502 — plus some assembly language, and it was all put into this using open source tools, and that's all available out there on the net. And again, we use the single Makefile for it. So you can get a lot done with one of these little things, and there are a lot of open source processors out there if you want to do that — some of them are common, some are not. Or you could use it without having to have a processor in it at all; you can just use them as logic elements, depending on what you're trying to do with them. A lot of them have little pins on them so they'll go onto one of the common boards — use them for prototyping. This one has a ribbon cable as well, so you can go off onto something else. This one runs at about 12 megahertz — I mean, this one only runs at about 12 megahertz.
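A PCF file like the one described is just a list of `set_io` lines mapping HDL port names to package pins. A minimal sketch — the port names match the blinky example, and the pin numbers are the ones commonly seen in iCEstick examples; verify them against your own board's schematic before using:

```
# Pin constraints for an iCE40 part (PCF format, one set_io per port)
set_io clk  21   # 12 MHz oscillator input
set_io LED0 99   # green LEDs on the iCEstick
set_io LED1 98
```

nextpnr (or arachne-pnr) reads this file alongside the synthesized netlist so the router knows which physical pins your top-level ports land on.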
It only runs at about 12 megahertz, but there are inexpensive devices out there that you can start playing with today. Any questions? Yes — the HX8K is 8K logical elements; the newest version, I think, is 50K elements that you can actually use. I've seen 60 to 80 megahertz. Yes, they're in the back — hundreds of input lines. The places I've seen them used a lot are in crypto — they're used a lot in crypto — and in video, where you've got a lot of data you're trying to move back and forth. I've seen them used in other DSP applications: you may not need all of the I/O it's capable of, but you need the internal bandwidth to be able to push 256 bits across at the same time when you're doing the math — FPGA math. Hard real time: you cannot do that with an ARM; it doesn't happen. Yes, go ahead. Most of the time I'm using them as coprocessors of some kind or another. Yes, sir — not there yet; getting close. Well, you certainly could — a PCI bus is a known thing, not an unknown thing; the spec is out there, it's been out there forever. PCIe, same thing. The question is whether you can make the timing constraints or not. That's where you spend the bulk of your time: it isn't in writing the code, it's in making sure the hardware can meet the timing constraints and signal requirements of the bus you're trying to emulate. There are several of them — there's crypto, there's communications, there's all kinds of stuff. Yes, sir.
Go back — back there. Yes, that's what I was going to show you; that's what we talked about earlier. Remember I said that within this pin map — this PCF file — you can put constraints in it, and those constraints allow you to say, "oh, this particular pin isn't going to move any faster than this," or "I need this to happen 14 nanoseconds after something else happens." We can put that in there as well. Some of that goes into the simulator code — that's what Icarus and iverilog are used for: turning that into something where you can visualize what the HDL is going to do. Now, that's going to show you roughly how it's going to work — and I use the term "roughly" because it hasn't had the constraints of the actual physical device applied to it. When we write a test bench that includes the actual physical constraints of that particular device, then it's a very accurate representation of what it's actually going to do — like on an ASIC. Or maybe you're actually programming multiple FPGAs — would you compile it differently and have a different bitstream come out? Possibly, stripping things out. Well, the typical way is that you try to make it so that whatever it is you're emulating and using is the same: you put an FPGA board out there with all these little I/O pins, and you plug that into something else, which is going to be the device that all the I/O is set up on. And if you've set up the constraints files correctly — these PCF files and some of the other pieces — it will very accurately emulate the final product. Of course, as soon as you turn the crank on an ASIC, it's a million dollars, it's ten million dollars, it's five million dollars —
whatever it is, depending on which foundry you go to. Yes, sir. Pretty much. Well, okay — this looks like a hundred-pin part. It's solderable by people who understand how to solder reasonably well; you can drag-solder it. You could drag-solder this by hand, or you can get this device in a carrier. You'll notice that it also has some pin outputs over here, as well as this ribbon cable here, on tenth-inch centers, so you've got some I/O options. If you want every bit of the I/O, you're going to have to build your own carrier. But a lot of these, especially the low-end stuff, come in 44-pin or 32-pin packages that are hand-solderable. That's one of the reasons why it's changed from being "oh, it's a 484-pin ball grid array," which is impossible to solder by hand — we know you're going to have to have an oven, you're going to have to have a six- to eight-layer board, maybe a 12-layer board; that's why everybody says "oh, I can't do FPGA." You can do an FPGA with a little one. And they're fairly reasonable as far as what they have. Like I said, this one right here has 4K of RAM and 4K of ROM, which means that if you have an assembly-language program running in this processor, that's a lot of code and that's a lot of RAM. And it's cheap.
This is 29 bucks. Well, you need a clock, you need a power regulator, and you need some form of I/O and some way to actually program it, absolutely. Oftentimes the older versions of FPGAs had four or five voltages that you were required to supply to make them work; the newer ones are just a simple single one, and they have an on-board regulator to solve a lot of those problems. You don't have to have 1.2 volts for the internals and 5 volts for this and 3.3 for something else. The bitstream usually gets loaded into an SPI flash, or directly loaded into the device via SPI; that's the normal way. In this particular case — I'll let you see — this one uses an FTDI chip that emulates SPI, so it goes USB to SPI, right? Let's go back. What the bitstream does is tell the fabric, way at the beginning, when it's time to configure itself: how to configure all these blocks, how to configure all these interconnects, how to configure which block gets which clock signal. All those things come from that — it's all part of the encoded bitstream. I say "encoded" because it has the super-secret sauce that tells the chip how to configure itself. Okay, sure. This is a generic diagram, by the way — this isn't a real part. There are thousands of these logical blocks, and they have so much block RAM and so much other stuff. And block RAM is what? It's flip-flops — a block of bits. And by the way, you can configure these: 8K can be one bit times 8K, or 8 bits times 1K, or whatever configuration you want — and if you want a 256-bit-wide thing, knock yourself out, you can do that. Now, the timing gets to be a little challenging when you're trying to make that many things wide, right?
Because they're probably not in a single block — they're going to take multiple blocks in order to do that 256-bit-wide thing — so timing gets to be a little challenging at times if you need to do that. It depends on the device; every device is different, so rather than say it's this: every single family is different, every single part is different. I know it's not a standard, because this is all secret sauce for whatever the fabric vendor is. No — it depends on what family you're using and what they're doing; there is no standard. Yes, in the back, sir — the what again? The iCEstick — there's no analog on it. Most FPGAs are not mixed-signal. There are a few that are; the mixed-signal ones I've used generally started at about $1,200 apiece, so I try not to use them — they're out of my price range. Now, the one thing that's cool about this, as we talked about earlier when somebody asked me about the analog-to-digital stuff, is that an FPGA can be used to clear that high-speed ADC on a camera very, very quickly. As a matter of fact, if you take a look at all of the high-speed stuff, it's almost always paired with an FPGA or a CPLD — some kind of logic that's really quick at making a FIFO. Any other questions? Okay, thank you very much for your attention. I ran a little bit long, but that's okay. Thank you.

Oh good, even better. Yeah, it's one of those. How are you doing? Oh, I used to run a meter, okay. One two three four, okay — testing. Test, test, test, one two three, one two, okay. Good afternoon, and welcome to this afternoon's three o'clock session. We're going to be talking about the Internet of Thingies — did I say that right? Yes, perfect. We'll be talking about that this afternoon. Take it away. All right, thanks for sticking around; I hope you've had a good conference so far. So this talk is called the Internet of Thingies.
I gave it a couple of years ago here, with an older iteration of the microcontroller — we'll get into it. So I'm going to be a bad speaker and stand out here. With any electronics project like this, you need to ask yourself what your goals are for it, what the parameters are, where the limitations are — that kind of thing. So these are some general types of considerations, I guess. Power — actually, let me stop and ask: who's familiar with using microcontrollers and small electronics and things like that? Okay, so about half the room. So, voltage and current draw through these devices is a very important consideration, because you can't always mix and match; if you do, you sometimes have to do a little more design to make it all work. That's why it's at the top — and these aren't really in order. Then there's how your devices will communicate, if that's what they're going to do. Things like community are important: if it's a dead community, or it's not supported, or it's archaic, then you're kind of on your own — unless the Internet Archive has something. In terms of how to find something that fits your project, this is just an example in the direction I was going with this project: I was comparing microcontrollers that are pretty easily available, pretty inexpensive, and more and more capable.
There are a lot of options out there, so even this kind of choice is overwhelming. I'm going after the ESP32 in this case. This is an older project, and I have it up here because it's been bouncing around in a pickup truck and lots of horrible environments, partly to test the durability of these things. That's an ESP8266 controller right there, with just two temperature sensors, showing them on a little OLED display. That was one of my requirements, because I'm interested in replacing this — your basic Toyota LCD digital clock, which pretty much everything in the 80s, 90s, and 2000s had — with something better. So I'm working on kind of a multi-function display. Now I have a newer truck, and it's got even more space to put things in, so I'm kind of in over my head. For language considerations, I put up two that are pretty popular, but there are quite large differences between them, I think. Again, depending on how comfortable you are with a certain language, maybe there's a use for it and you can implement it on the project you're interested in. For me, I'm using Arduino-style C++, basically, because of library availability and flexibility within the libraries — you have many options for the same type of device. So it's a pretty flexible setup, it takes little memory, and it works fast, even though my application really doesn't need speed. Back to power requirements — I'm repeating this because I think it's a pretty important thing in the case I'm working with on this project: I'm dealing with devices at higher voltages, and converting that down to your typical hobbyist microcontroller is kind of a trick, so it was very much on my mind. There are some communication considerations — the IP addressing down at the bottom. What I'm doing is going to be used on a small university campus.
So whether you're doing automatic addressing or static, you need to manage those devices; there are lots of dependencies that way — keeping track of where things are, giving them proper names, doing everything you'd do for network or systems management. And finally, community: I touched on it, but I think it's an important piece. I'm not really a fan of needing an app to get into a community — that's why I crossed it out. The easier the community is to access, the better, I think. Some people like going through apps, so, you know, this is my opinion. Again, it's all about choosing what fits your project and you, and what you're comfortable with — that will get you started. Don't begin with barriers, I guess, would be the idea. So the chip I'm using this time is the ESP32. It's basically the next generation of the 8266 that Espressif makes, and it's a very capable chip. Here's — hopefully not a Chinese replica of an original chip, but it's sometimes hard to know. This one in particular has a few features: it's got a battery port and charging logic built in, because these chips tend to be — or can be — very efficient with power, and it's got a built-in USB connector for programming. So it's kind of an everything-on-one-board, and I think it was about six dollars — incredibly cheap.
I want to say, four or five years ago, when the 8266s came out, they were up around fifty, sixty dollars for a board, and now they're super, super inexpensive. More on the 32: I think this is impressive — the amount of options you have, the amount of connectivity to different buses and different ways of communicating, is astounding on this chip. A very common one that I'll be using is I2C, and I'm very interested in using the CAN bus to get data out of my vehicles to make use of in displays or other kinds of things, so this will interface with that. I have a lot of learning to do before I get to that point, but the possibilities are there, and they're built in, in a very inexpensive package. This chip is also a little special because it has a lot of options for power control, or power management. The one that interests me is the deep-sleep mode — it'll be on the next slide — where there's an ultra-low-power coprocessor that can, in and of itself, remain awake with the main core shut down. It can even interact with sensors connected to the board and make decisions to wake up the main processor if certain things happen, however you program it. So, possibilities again — this could get you many months on a small lithium-ion power pack, so for remote locations, or places where it's very difficult to get power, this could be the base for a nice project. There's more about the ULP, and just some ideas of current draw in the different modes and for the different resources on it. The radio chips use a lot of power, but that also depends on what type of mode you select. So, the 8266 is the older generation; there were a lot of manufacturers building different designs of boards.
It was kind of a new frontier, in a way — Arduino kind of really started it, but then this came along with built-in Wi-Fi, and that was pretty exciting. But in those days you had to do a kind of double conversion to understand which pin was which, because you had standard diagrams and then you had to go find, or test and discover, the pins to map it correctly. On this chip, all you have to do is get a standard diagram of the ESP32, and you're basically numbering the pins as labeled on the standard diagram. Here's an example of that — this is the chip I'm specifically working with, and here you see the ESP32 labels. These are all the different buses possible and the intended use of each pin, and this is the header that's on this particular board. For example, if I want to use IO4 for one of my purposes, I just define it in code as 4, and off you go. It's light-years easier than it was in the past. So the board I'm working with is a project on hackaday.io — they were here; I think they were a sponsor. I came across this somehow: a guy was working on a Power-over-Ethernet board, where the W stands for wired. It does have Wi-Fi on it, and it fully works; it has Bluetooth — it has all the standard features of the ESP32. But wired was particularly interesting because, on a campus with multiple closets and data center rooms, running power is a hassle. One wire to do everything — communication and power — was very attractive, and in a network environment you generally have a lot of switch ports available, and switches are generally PoE now. This is an older switch — just a 10/100 megabit switch, with the base-level, first iteration of Power over Ethernet — and it works perfectly fine. It's probably 20 or 30 dollars used, and there are a million of them out there. This is the basics of the chip.
It's basically a standard chip with PoE added, so a lot of the circuitry on the board is dealing with power manipulation and the Ethernet communication — and heat dissipation too, because that is a thing when you're converting between different voltages. You've got 20 pins to work with, and you've got some options on how to power it as well. You don't need to use Ethernet, but that's the idea for it — I would think that would be the use case — but you can power it off the V+ pin as well, and it can deliver some limited power to your peripherals, so it's a fairly flexible board. Here's more about Power over Ethernet — bless you. This is the standard, and the board conforms to the standards. There are some details about power delivery and the design of the board; this is all from the overview data sheet that the project publishes. Here's the use case: a typical network closet has maybe some HVAC controls, maybe security controls, other things. I don't think it's this room, but typically we would see — well, I live in Oregon, and it's wet, and we have a creek running through campus, and sometimes that causes trouble with buildings, and I'm dealing with hundred-plus-year-old buildings sometimes, so things get a little messy. So one of the use cases is: I want to provide a way to detect water on the floor, so that we can get hold of a situation before somebody loses power, things short out, and more trouble and expense occurs. These are kind of my ideas for how I want to accomplish this. This is one of the sensors I want to use — it's a temperature, humidity, and air pressure sensor. I've used it a lot.
It's a good chip. I did kind of an evaluation of a few different sensors, and I ended up liking this one because it was very regular, and the data it returned was consistent in how it changed. Some chips will sit and sit through temperature and humidity changes and then jump; this one is more of a smooth arc when things change, so I like it. It's simple to connect because it uses I2C, and that's basically four wires — data, clock, and then ground and power — so it's super easy, and it's inexpensive. One note I'll add: don't get a chip that says BMP180 — without the E, that means it's temperature and pressure only, so you won't get humidity if that's what you're after. Those were more common a couple, three years ago, and now these are more common. There are plenty of other types of chips out there, too. This one has a lot of libraries for it — a variety of libraries — so you have a few more options with it; it's not typically the one you see in Instructables or other kinds of projects. So now I'm going to run through some parts of the code. This is all in Arduino C++. This is just the sensor setup: I have to identify the libraries I need — this can be an adventure sometimes — but this is a typical kind of setup. I define the pins for the data and the clock, and again, it's that easy numbering scheme.
I initialize the Wire library this way. Different chips, different libraries — or different libraries for different chips; many approaches, so you kind of have to roll with the punches and use the one that accommodates you best, I guess. The only thing in here is that "bme" is now the reference — you can call it whatever you want, but it's the reference for accessing this sensor. Let's see — so this is initializing the chip, and basically doing a check to make sure you've wired it correctly, or that something's not wrong. Actually, today I was setting one up and I had two wires crossed, and of course this came up. I was like, "are you kidding, is something wrong with the board?" — and sure enough, the pins were swapped. So, yay for error checking. The Ethernet side is a different game. It's a bit of a hunt to understand how things are connected, I think, because this isn't as common, and the library I'm using is based on chip maker Espressif's library, so I would say it's very raw in form, to typify it. You have a lot of definitions going on here to pass along — to bring the chip, or that piece of it, up. So I'm going to go back and forth a little bit, but we'll go find some of these things, and I'll give you an example of what you may have to do sometimes to make these things work. To start with, we have this clock mode, and I've got it set to the GPIO0 input. And how do you know to do that? Because there's usually not good instructions on this. That one lives — where the heck did it go? Am I blind? — right here: clock in, IO0. And on the actual ESP32 chip, IO0 is right here, and it is IO0 — easy, right? I mean, once you find it, it's easy. Likewise, I know that this board's Ethernet is based on the LAN8720, so that's documentation.
So that's easy to set, as is the Ethernet type, and the next one, I guess, is for the LAN8720's power pin. So that's documentation as well — as is using the internal APLL clock source. That's just digging through documentation, matching things up, trying it, and then being surprised that it works. The final two are the MDIO and the MDC, and those were a little bit of a hunt as well. They are right here, and it shows 13 and 12, so you come over here and you find them right here — MDIO, MDC — so I guess the numbers don't match up on that one, and it turns into 17 and 16... I'll get this right, hang on: 16 and 17 were the ones on the diagram, and those are the two to find. That's a little bit of the effort that can be involved in doing something like this. The next thing, I guess, would be layer-2 and layer-3 networking — the logical level — and that was also a bit of a journey to understand. With the previous iteration of these, when I tried to statically assign an IP address to the chip — that was a Wi-Fi chip — it ended up being really difficult to find out how to do that, and I'm not good at reading libraries; well, I'm getting better. But this is ultimately the same setup I used for the previous generation, so IP addressing is not bad. I've got serial output in here because it acts as validation that things are set up how I want them. The other part of it, too, is that these don't really come with a MAC address — well, they do and they don't; let's pretend they don't. So what do you use for a MAC address? Because on a network you can't have two of the same MACs, or communication won't work — similar with IP addresses. So: there's a concept of private IP addresses, and there's a concept of locally administered MAC addresses, and that ended up being the result of this. The chances that you run into trouble are pretty low, but it can happen.
So it's worth it to research it and make sure you're not stepping on a server or something, your DNS server, say. This is just how to read a couple of values from the sensor, temperature and humidity. It's pretty straightforward again. Toward the start I said I just called it `bme` for the sensor, and now it's doing the functions built into the library to get the data and then print it out to the serial console. Pretty straightforward; there are many examples of that kind of thing. Yeah? Which one? Many. It's all about defining (that's what this is) how you're going to do it. I've done it in the past by building a mini web server on the chip and scraping it for data with an XML collector or an HTTP collector. My intention this time is to use MQTT to push to it, a streaming-telemetry kind of approach, so that if I chose, in a different project, to use a battery-powered one, where you'd have everything mostly asleep and maybe every 15 minutes you wake it up, send a little piece of data, and it shuts back down to extend its life, I can reuse that logic. In this case it's all wired. So these are all the devices, and they're sending to a broker; they have client libraries running on them, configured. I have two different examples here. One is to store this MQTT data in InfluxDB and graph it with Grafana. The other way is to use a network management system and, if it has streaming telemetry in it, take advantage of that: use it as the broker, put the data into whatever its time-series data store is, and send it to Grafana. Likewise, I drew this other path because I'm also interested not only in metrics but in alert conditions, and there are other mechanisms that can be involved in that. So there are a lot of options.
That's two options, and I would say there are dozens more ways you could go about it. You could do something as simple as have your chip send log data and then have Splunk or Elastic or whatever go make use of the data that way and store it as a time series. So there are lots and lots of approaches to doing this, but the chip itself lends itself to it, because you can define whatever protocol you want and you can define the message being sent pretty explicitly. So I think the sky's the limit, in some ways. Love it. That's why I went this route too, because you're not constricted to what a vendor defines. The capability of the device, too: you're kind of out on your own, because you're programming it, and therefore you're a little responsible for it, or your team is, but you can choose everything. So it's not all smooth sailing. Some general things I've learned over the years: buy multiples, because things will come dead and you will kill things, right? And things will just get flaky, or maybe your solder leaks over and causes weird behavior that you can't find. There are so many electrical things that can go on. But prepare for it, and these are at a price level where it's not super hard to have multiples: multiple boards, multiple wiring, multiple power supplies. Yeah? Nothing yet. I can tell you the 8266 is a champion. I do a lot of off-road exploration, and I'm in dusty, sea-salty, hot, cold, many different environments, like 110, 115 degrees, you know, Death Valley, or a hot day here. Yeah, yep, I mean, they all have specs, and they're built to run well within that kind of stuff. So yeah, I'm surprised the one in my old pickup is not dead yet, and happy that it's not dead yet. So far so good. I also haven't built a lot of protection for the chip: it's not enclosed, it's not in any kind of vibration-resistant setup, nothing. It's just zip-tied to the dash and off it goes, right? That's part of the testing, though.
I intentionally did that to see what it could take. I worked for the State of Oregon as a network engineer for many years, and we had a "flood router" once, we called it, where it was submerged for a day or two. We took weeks or whatever and let it dry out, cleaned it, all that kind of stuff, fired it up, and off it went. It's an ancient router now, but some things tend to be super well built. That probably gave me the idea to test this thing this way; it was past experience. So, IDEs are interesting. They're easy when you have a clean example and you go implement the steps and do your thing and it works, yay. But things get interesting when there are significant shifts in technology, or new devices come up and people want to adapt their application to the new thing so that people use it (and, you know, that's a good thing). That's when things get messy. Linux has been a better experience for me with using an IDE, because the file system, to me, is cleaner. On Windows you'll have pseudo kinds of folders, things the system references, and it causes havoc sometimes on a development environment, because assumed paths and things like that get really weird. That's just been my experience.
I've found Linux to be a much nicer environment to develop in. Then I alluded to the bottom part: when the ESP32 came out, everybody wanted to make their libraries work with it, so Adafruit, bless them, did some work and made the old libraries compatible with the new chips without a lot of hassle. But in doing that, they load on requirements: you have to use the newest version of the Arduino IDE, you have these other things, and even then you may have to do housekeeping, cleaning things up and moving things around, to make it work. I think the struggle is worth it, because you get a better understanding of how your environment works, you can maybe do more with it, and it helps for troubleshooting and debugging. So, mixed bag. Libraries are funny things, because they're made by people, and people do things for the reasons they do them, right? To address their issue and make something work in the way that they chose. They don't make it for you; they made it for them. Being aware of that helps you understand some of the difficulties you might run into in dealing with these things; that's all I'm really trying to point out here. In my case, it stopped me, for the moment, from using MQTT as a telemetry-streaming technology, because the library the board works with doesn't work with the PubSubClient library, which is the default kind of Arduino MQTT library (people have written many more). So it's going to be an adventure in learning and trying to find the right fit. I've done this in the past.
So I know it can be done, or I may need to modify a library myself, or write one, and that just means I get to learn more things to make them work the way I want them to. Other things that crop up are things that are hard-coded in libraries, which you don't expect to happen, but again, people write code, so why not? I know I've been lazy in certain instances and done things like this, so why wouldn't somebody else? You just have to be aware of it. You ask "why isn't this working?" over and over, and finally you get to the point where, okay, I need to look at everything and see if anything jumps out. Sometimes things are very well documented and it leaps out and you can deal with it and move on, but it's just one more thing to be aware of: things can get weird. This goes back to dealing with the issue I found in the ethernet libraries, and again, it just makes me more excited to work on this and gain deeper understanding. The other things I ran into were these two pins. For using the I2C bus for the sensor, the data and clock, you see I2C 0 and I2C 0. However, pins 2 and 12 are a little special, come to find out, in that they can prevent you from uploading your code to the chip if you've got something attached to them. So that's bug reports, tickets on GitHub, things like that; you investigate and you eventually find it. However, because pin assignments are so easy on the ESP32, you just pick something else, check it, make sure it works, make sure it doesn't smoke your sensor, and you're good to go. So the workaround is easy, but coming up against it again, that's another variety of ways that you can hit a bump.
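The "just pick different pins" workaround is easy on the ESP32 because I2C can be routed to almost any GPIO when you bring up the Wire library. A hedged fragment (21 and 22 are the common defaults on many boards; the point is only that you choose the pins yourself instead of landing on strapping pins like 2 or 12):

```cpp
// ESP32 Arduino: route I2C away from pins that interfere with uploads.
#include <Wire.h>

const int SDA_PIN = 21;  // example choice, deliberately not 2 or 12
const int SCL_PIN = 22;

void setup() {
  Wire.begin(SDA_PIN, SCL_PIN);  // the ESP32 core lets you pick the pins here
}

void loop() {}
```

GPIO2 and GPIO12 are boot-strapping pins on the ESP32, which is why having a sensor attached to them can interfere with flashing.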
So That's talk I have some devices I was going to do a demo, but I don't think the demo is is that exciting the I Was going to just think like breathe on it show temperature Raise and all that kind of stuff, but because I couldn't get MQTT Working I couldn't build the dashboard do all the rest of the stuff that just shows you on a screen So looking at console logs are kind of boring But it works the whole thing works We have stereo lithography Frinter at the school which I never knew anything about it's liquid resin base and it prints things upside down and builds the Software builds the structure so that as it's printing it doesn't break and there's lots of stuff to it Unfortunately after it cures it it this material that happened to be in use is very brittle So the idea was to and I will but this is the very first draft was to build a network rack Capable enclosure so you can just bolt it to the rack and run an ethernet cable and you're done So that's I'll get to that point But as I started drilling in it it things even if you do find size increment To get up to the size of hole you want it still starts cracking the edge of the bit will catch it and it'll just fracture things so You know revision But that's that's the idea with the switch But this is just showing kind of an example of a few devices in action and they're all they all work ethernet works They're all ping-a-bowl Another goal is to be able to update them over the wire and modify the code that way so there's a lot of things Details I can work out. Yeah Many Yeah, lowland Yep, partly that Aliexpress is how I tend to buy chips and components so it had a price advantage and That was about just over six dollars. You probably get breaks for every five or ten or twenty that you get and I had used previously 8266 as I used we most chips and This is that I don't know exactly how it works, but this is like the same company and Like and like I said at the start. 
Who knows if the chip I got, or the board I got, is a knockoff, but it seems to work well; there haven't been surprises yet. I've got it here, actually; you can check it out. Oh, and... where has it gone? Does anybody recognize the background? It's the cover of your program guide. Because some of us work at the last minute. So yeah, if you want to take a look, just be careful of static electricity, so probably don't touch too much. Thanks for attending. [Q&A] Moisture? It does both, yeah. And then you can put in a reference pressure and get a rough reading out. Here, in a mobile situation, yeah. Right, like what's north and what's true north. But you find that chip to be reliable? Yeah, because I have four... no, I have five nodes at home, just reporting temperature, and I have a TV with a dashboard that I feed. One of the things I wanted to know is how hot it gets in the attic, and how stable the basement is. Part of it was, the attic with the soffits: when can I go up and work in there? When is it not freezing? When is it not 120 degrees? So I've kind of enhanced mine with my family activities: hey guys, take out the trash, and we mark it up, and they get points for doing that. "I want to ask you a question: it's got five wires?" Okay, there's really only four functional. Some of those BME boards have five, and they use a different bus. "Oh, really?" Yeah, you have a choice. "And these are Adafruit?" No, no, that's Bosch; the B is Bosch, BME280. They make one that's really good as well. "Okay. And are you doing fabricated boards?"
Yeah, we actually fabricated them. You're on your way to... I forget what it was. They will be your friends forever; every time you go in they'll give you a coupon. "What's the turnaround?" Two or three weeks, it depends. Sometimes it's super fast, because they'll stock a warehouse, one here, and it's hit and miss. Basically, you can watch: by the time you click enter it'll be six weeks. With one-off parts they turn around basically whenever they fill up; it's like an optimization problem, whenever they go up on the big board. I heard the turnaround time on those is like days. [Crosstalk about a 16-week class: "We made a circuit board, and I still have it."] There are so many variations, but I'd rather get it right. [Inaudible crosstalk while the next presenter sets up.] Okay, let me see what it does. Oh, it manages... and let's say this is LibreOffice. Okay, well, that's even better; I don't even know what manages it, because I see different ones. Yeah, that's good. One, two, three, four, okay. We've still got about nine minutes; I want to see if something else will work. [Inaudible.] She's still trying to figure it out. So I just started using... I had like three days to print something for this, right? Where do I need to start, knowing nothing? So Autodesk has a nice little web design tool, and unfortunately the first thing it asks for is your first name. Oh god, LibreOffice went crazy; it won't open the slides now. Okay, this is a problem. Why isn't it loading the slides now?
It was working just a minute ago... oh, I know, I was panicking. I'm happy, yeah, good. Last time I did this talk it was with the old chip. Yeah, I remember. I'm like, I'm glad I'm using this one. Exactly; it's like, wait a minute, you know about that, and I didn't. Well, it's kind of like when you tell people that there actually are open source tools: "I thought you had to use all the proprietary ones." "Aren't you guys in trouble for doing that?" No, we reverse engineered it, who cares? That's their problem. So, awesome. You get a lot of fun out of it. [Crosstalk about 3D printing: when the quality goes down you have to balance the pull-back; the retraction is not as good as the drag, but it learns. And of course it depends on the material: with PLA you could do yellow at one speed and temperature, but you had to do pink at a different speed and purple at a different speed.] This will have to do, because theater mode did not work; it messes up everything else. Let me see if I can get rid of it... let me just use that PDF. That's why I end up using the PDF the whole time. Welcome, everybody. We're going to get started here in just a minute, as soon as he's ready. This is the final talk of the embedded track for this year. We've been doing embedded since Thursday morning, between training, IoT, and we're also doing OpenEmbedded down below (this is a build system), and also this track. Today we've talked about FPGAs, we've talked about new parts that are RISC-V, and it's coming along, and Ken just talked about the Internet of Things and all the wonderful things we can do embedded. And now, Hunyeh?
Hunyeh, I'm always getting it wrong. He's going to talk to us about machine learning, which is a new field that a lot of people are spending a lot of time in, and right now in the embedded space it's really hot. So take it away. Hello everyone, thank you for coming, and thank you for staying for the very last session of the conference. I'm going to be talking about machine learning. The reason I want to talk about this is that I've been working on several projects, mainly for a customer. I'm an independent consultant, so I work on all sorts of different things; most of what I do is on a system level, getting embedded devices working. My customers typically are the machine-learning specialists: they understand all the stuff about machine learning, but they don't know about the actual underlying hardware, and that's where we work together. But it's really interesting what they can do. They can take a sensor and all of a sudden get all sorts of information out of it. So I figured, why not learn something about that? And when I went through all this stuff, I noticed there were some difficult areas, like references that would point me to academic papers, things like that, which don't always make sense from a system standpoint. So I went through, figured out some of these things, and I'm hoping I can transfer some of this knowledge to those of you who want to do something simple. The goal, just so that I have some kind of goal, is to be able to take a sensor, measure something, and then have it tell me what it is. You might have seen something similar out there where people take a camera, aim it at something, and it identifies what it is. Similar idea, but on a much simpler level. With that in mind, this is roughly what I'll be going through. It's not exactly the exact order, since some of the things interact with each other. The goals here, just so that we have some ways to make things easy,
are that whatever tools I'm using will be C-based, and they need to be simple to build, because I don't want to waste time dealing with build issues, figuring out all the build systems, needing this library and going down the whole spiral where one library wants another library, then another, then another. Many of the tools out there are based on Python. You could use Python; the problem with that is, if you're trying to build an embedded device and you've ported something, you've got to build Python yourself, which means you've got to support all the other pieces. And most of the time with Python you use pip install to pull in these other things. That's easy on a desktop, where all the libraries are there, but pip installs sometimes invoke C compilers, and all of a sudden you need a working C compiler and a whole bunch of other tools on your embedded device. That's why I'm saying it's got to be simple: if possible, no virtual machines involved, no other weird libraries to pull in. And as I just said, the goal will be to analyze some kind of sample and be able to identify it. Machine learning can certainly do a lot more. As I was hinting earlier, you can do things with cameras and identify pictures, but that's really getting ahead of ourselves. Let's look at something very simple, where you just have a sensor and you're processing its data. To keep it finite, something that fits in the one-hour format, we won't be looking at images, though I will mention things you would do differently if it were images. Another thing is that the inputs we'll be talking about are static; they're fixed, they come from my sensor, so a reading I get now is independent of time. That's in contrast to a more dynamic input like voice or sound, where you have a time series, where time is
actually an element in your input. Another thing is that we're looking at purely software options. For the more complicated things there are certainly hardware accelerators: you could use a GPU, and some of the embedded processors have hardware that accelerates machine learning, but we're trying to figure out the basics here, so let's forget about all the acceleration options. And I'm coming from an embedded background, so I'm looking at this from the perspective of an embedded device where you have control of the whole distribution. You're not looking at a desktop where you're running, say, Ubuntu and can just apt-get whatever to install something. That feeds back into my first requirement: it needs to have relatively few dependencies, something you can actually manage. Another side concern is potentially licensing, because you're putting this on an embedded device; oftentimes, if a component has a really restrictive license, you really can't do too much with it, just because of what the license imposes. So let's start with some of the basics. How is machine learning different from all the other things you might have heard of? Traditionally there are ways to look at data and then come up with some kind of identification of things, while machine learning is basically just taking a bunch of inputs and making inferences based on something you have trained the device for. So you're looking for patterns in the things you have trained the machine with. That's, in a nutshell, what machine learning is. All the different forms, be it images, apple identification, prediction of patterns, take inputs, whatever you use to make your prediction with, and try to see how well they match the previously trained pattern. That's your training data. This is in some ways very similar to how a newborn, a young person, would
learn. When you're first learning things, you don't know what an apple is; you get shown a picture of a nice red apple labeled "apple," and after a while, once you've seen enough of that, you go to the supermarket and realize: that's an apple. Machine learning is very similar to this. Now let's look at how this is really different from something you may have done before, without machine learning. Say you have a sensor that produces some kind of numeric output, and you want to find out, based on that sensor, what the thing you just measured is. The traditional way may be something along the lines of: if the input is within a given range, you say it's this, basically, or what I'm calling outcome one here; if it's not that, you look for the reading in a different range, and repeat. In the machine-learning approach, what you do instead is create a bunch of representative samples of those things and label them, so you identify the readings to the machine. In the case of my demo, I'll be trying to identify fake sugar from regular sugar. So I made some measurements of the readings I get from sugar and the readings I get from fake sugar, and I trained the machine with that data. Then I feed a new reading in, and the machine learning goes through all of that and determines: does this match the sugar pattern better, or does it match the non-sugar pattern better? Notice one thing here: you don't actually have to figure out thresholds. Machine learning figures out where the thresholds are, how the different combinations behave, so it takes that part and simplifies it. Another thing to keep in mind is that machine learning is not a new idea; a lot of this stuff dates back to research from the late 70s and 80s. The big difference right now is that we have a lot of computing power, even in the embedded space,
which typically tends to lag a little bit behind a desktop machine, but you still have a lot of computing power. That's the main thing that makes it practical. One of the big barriers I found, as with any new topic or subject, is vocabulary. Everybody likes to use different vocabulary, and they like to have their own abbreviations for things, and it makes it very hard, because if you don't know the vocabulary and you try to read documentation, tutorials, or papers, you're just completely lost. While this is not a complete list, these are some of the more annoying terms I ran across. Neural networks: I'll explain later what they are, but they're described in different ways depending on the variation. There are artificial neural networks; the "artificial" is to show how they're different from the neurons in an actual biological subject, like your brain. There's a DNN, a deep neural network. There's a convolutional neural network, where you basically apply a convolution in front of the neural network. There's a recurrent neural network, where you take the output from the neural network and feed it back into itself, so you get a sense of time: you can look at the previous set of readings and make a decision based on what happened previously. Then there's a simple neural network, which is kind of the opposite of a deep neural network. A deep neural network is one where one neural network feeds into another, which may feed into another, so you have a cascading chain of neural networks. I'm not going to go through all of these, since I don't think I'd have time to go through all my slides and actually get to the end, so I'll point out the more important ones. The slides will be posted, either on the SCALE website or at least on my website; they'll be available one way or
another. Okay, labels: when you train a machine, you provide it with data, and you need to be able to tell the machine what that data is associated with, the name it has. That's labeling your data. And the training data is what they call the truth: this is what it is in the real world. A lot of times you might read that if you get this much data, you should divide your sample data into halves; the half you can confirm as the truth, you label and use for training, and then you can use the other half of your data for test purposes. There are other terms you should know, but I won't go into too much detail now, just in the interest of time. Okay, the overall flow, if you were to apply machine learning in general (not neural networks specifically) to a particular problem. The first thing you need to do is figure out how you want to deal with the data, how you want to deal with the problem: build a model. Are you going to be taking measurements based on some specific characteristic? You need to map it out. Ultimately a computer works with numbers, so you need to figure out what you're going to measure and how you feed it in. You can't just say "I'm going to go based on color" and call it red, yellow, blue; computers don't quite understand that. You need to look at whether you're feeding it an RGB sequence or input from a spectrometer. So you need to build a model. Same thing with the output: what are you actually going to output? Names of things, specific types, category groups? So you need to build a model there too. The next thing you need to do is have data. This could be something that you measure, and generally you would measure this kind of thing, but sometimes you may have to have subjects do things. One example: let's say you want to
figure out whether someone is running around. You might put an accelerometer on that person, take some measurements from the accelerometer, and feed them into a machine-learning system; you train with that data. But you need that data to tell the machine "this is running," and you probably also need to tell the machine "this one is not running." So you need to come up with and identify data. You can't just look at a book and say, "in theory it predicts that under these conditions you're running, or these conditions are true"; that's usually not sufficient. You want real-world data to train the machine. And the data should cover a wide range. You may have a short person running, if we're using the running example, and you may also want a really tall person running; you may want examples of a person running on a very rough road; just different examples. It's just like if you have a kid and you're trying to teach them something: you want to show them the different variations that all map to the same thing. That's why you need proper training data that covers a range; otherwise the machine will not know all these other variations. Once you have all that data, you give it to the machine-learning algorithm. It will take all of it and build its own network, build up the internal models and coefficients, depending on which variation of machine learning you use. And after you have all that, you need to somehow take the numbers and map them back to your problem; that goes back to your model. Remember, machines work with numbers, not actual text; text is for humans, so you need to do some mapping after you're done with all this, and I'll show you that in more detail later on. In machine learning there are all sorts of variations out there, but there are just two
large categories in general use. There's the concept of a neural network; this is very similar to the neurons in a biological brain, it's modeled after the same idea. There are many different variants out there, as I mentioned on the previous slide with the vocabulary; depending on how you arrange the neurons, you get different types of neural networks, and that can be very confusing if you're jumping in for the first time. Another thing to know is that neural networks are just multiplications and additions; it's pretty straightforward, you just do a lot of multiplications and additions. The other common type of machine learning is the support vector machine. This one basically says: you have data, say two sorts of inputs (I have a diagram later on), and you somehow figure out the best way to draw a line to separate the two regions. Then, when you have new data, you figure out where it sits on that whole graph, and that's your output. It's a very simple concept, and it doesn't require a lot of CPU. The hardest thing in both of these cases is actually the training part, where you have to teach the machine. Neural networks, now, let's look at those in a little more detail. A neural network is just an array of neurons, and neurons are very simple; as I keep pointing out, similar to what's in a biological brain. Depending on how the different neurons are connected, you have different types of neural networks: in a deep neural network, you have an array of neurons that connects to another array. Most common neural networks are fully connected, but sometimes, if you want to build a specialized type, you may not want full connectivity. I have a picture on a later slide where the neurons are not all connected,
because there are too many of them; that's what would drive which particular type of neural network you have. And a neuron does just this: it multiplies whatever its input is by a particular weight, usually a number less than one. That's the neuron. When it's connected with other neurons, you add up all the outputs of all the neurons feeding it, and that's your output. So it's very simple. The simplest neural network you could build would have an input layer, an inside layer that has all the different weights, and then an output layer that gives you your output, your predictions. Okay, it's worthwhile talking about this, given that I'm just worried about time right now: with neural networks, you can apply other operations to the inputs; it's not just the machine learning by itself, there are different things you can do. Let's say you're looking at images, where you have a lot of pixels these days; with a phone camera you're talking 13 megapixels, 20 megapixels, 30 megapixels, and each pixel in turn has three values, RGB. That's a lot of data to feed in. So one way of handling all that is to apply convolution. Convolution, if you're not familiar with it, is a filter; that's all it is. In the case of images, what a convolution does is try to identify features: a diagonal line, a vertical line, a particular curve. It tries to break up your image. So in a convolutional neural network, the first thing you might do is apply a filter to reduce the image down to features, and there's a way of combining all the different convolutions to reduce that, say, 20-megapixel picture down to something much smaller and more manageable. Once you have reduced it down
enough times, you can apply the result as an input to a neural network, and the neural network figures out: ah, this particular combination of features — that's what it must be. That makes it a lot more manageable. The problem with doing this is training it: you need to figure out what filters to generate, and that takes a lot of computing power. That's why you may have seen all these references — "we're training it, we need such-and-such GPU", or "we need a server array". Okay, the neurons themselves. One important thing: you have all these neurons connecting to each other, and you may chain them. The problem with chaining them is that the values may explode. Say you feed something in and wind up with a value greater than one; multiply it by something else, do that too many times, and you get much, much larger numbers. So there's the concept of a mapping function, usually a non-linear function. It may deliberately exaggerate some of the outputs — you may not want the neural network to change things slowly and linearly; you might want it to snap to a particular value. Or it may limit the range: say you limit the range to minus one to plus one, so every value is at most one in magnitude, which guarantees that when you chain the next stage onto it, the whole network won't explode. There are different functions that do this. A common one is the sigmoid — and its close relative the hyperbolic tangent, which guarantees your output is between minus one and one. Another common one is what's called ReLU, which is essentially a diode: any input below zero is mapped to zero, and anything greater than zero passes through as its own value; the clamped variants additionally map anything greater than one to exactly one. So it has that clamping effect, and it prevents your whole network from exploding. This is the picture I keep alluding to: a simple neural network with three layers. (The image is from Wikipedia, so you could take a picture of it — or just go to Wikipedia; it's from there.) You have your input layer, and basically all it does is take the values you provide and feed them in. Then you have a hidden layer, which holds all the different weights and does the multiplication, and your output layer is your output prediction. All the lines connecting the circles are the connectivity, and the lines converging on a node mean addition. So in the hidden layer, each neuron multiplies each of its inputs by a particular value — the weight — and the sum becomes that neuron's output, which feeds the next layer. Each output node's value is composed of the outputs of the hidden neurons feeding it, and if it's done right, one output will be a high number and the others low; whichever is your maximum is your predicted output. Now let's look at the other common kind of machine learning, the support vector machine. Like I said earlier, basically all you're doing is taking your data space and dividing it into different regions — that's pretty much the whole process. This runs very well on a relatively low-power CPU; I've seen it running on something like a Cortex-M0, which, if you're not familiar with it, is pretty much the simplest of the current modern Arm microcontroller CPUs out there. And here's a picture of it — another picture I borrowed from Wikipedia. It's probably not the most ideal picture, but it illustrates the concept.
You have your data space, composed of two inputs, x1 and x2, and two different types of outputs: one represented by an open circle, the other by a fully filled-in circle. The lines represent potential trainings — potential ways of breaking the space up into the two kinds of output. H1 doesn't fully separate the black dots from the white dots. H2 does separate them, but it's not ideal: you'll notice it sits much closer to the black dots than the white ones. H3 tries to keep an equal distance between the two. That way, if some new data lands in the gap, it still gets classified correctly; with H2, data landing right there would be incorrectly classified as one of the black dots. So conceptually this is very simple. Those are the basics; now let's look at some software packages out there. The first link I have up there is a list of a lot of different packages, so if you really don't care for the C stuff I'm talking about and want to go with one of the Python options, they probably have an offering — it's worth looking through to see if there's something more comfortable for you. The first group is what I'd call the commercial offerings. They're usually backed by a company; sometimes they're actually used internally by that company, or you may be able to buy a service to use them. I personally have some concerns about that category, precisely because it's backed by a company: if the company decides to stop using it, is it going to be completely abandoned? What happens to it then? Is it too specific to what they're using? Does it really fit what I'm doing? So it's really up to what you think. The first one, TensorFlow from Google, is particularly notable in that Google has been making a lot of effort to get people to use it, with kits that demonstrate TensorFlow on a target device — I think there's a Raspberry Pi kit with a camera, and you can do some basic image recognition on it; I believe there's even an onboard accelerator for machine learning, and it all runs with TensorFlow. It's common — you've probably heard a lot about it — but it pretty much forces you onto the Python track, and it also ties you to that particular package. The second group is the open source offerings. Some of them may not necessarily be commercial-friendly, so if you're doing something for a company, you need to look at the license. One particular note is SVMlight: the author has specifically mentioned that he prefers it to be used for research. Well, I didn't actually read the license text, but that's his preference, and I think it's best to respect that — it might technically be freer, but if that's what the author wants, let's respect it. From that list, there are two packages that are relatively simple. One is Darknet, which is specific to images, but it's incredibly simple and easy to build, and it doesn't require too many libraries — it can even make use of a GPU for acceleration. So if you want to look at images, that might be a simple package; but in the interest of time, and since as I said I'm not going to go into images here, I'll just leave it at that. The other package is FANN — that's the one I'll be using for examples and demos here. It's a relatively simple package, all written in C, and for some of the examples the code fits on one slide; that's how simple it is. And usually people say, "yeah, I could do the same thing with Python" — well, Python calls all these
other objects, which call other objects, and you have to use a whole system; it just becomes a lot more convoluted. Hardware acceleration — well, it's not particularly applicable to the simple stuff I'm going to be doing, but for learning it's good to keep in mind that these things exist if you want to go further, especially if you want to do something with vision. The first is the Movidius compute stick from Intel. It's an accelerator, and it's relatively closed: if you're doing something that matches what they've provided as examples, it may work — that's pretty much all there is to it. The GPU is the common thing for acceleration, especially on the training side. If you're not that familiar with it, a GPU is basically a little supercomputer: it does a lot of parallel operations very, very quickly, and that's why it's useful for this machine learning stuff. For using a GPU there are three basic approaches. If you're using NVIDIA hardware — embedded or desktop — you can use CUDA; NVIDIA has an Arm chip with a GPU on it that I believe supports CUDA. If it's not an NVIDIA GPU, there's OpenCL, which I believe is also available even on some of the Arm devices. Worst case, you can always fall back to OpenGL or OpenGL ES. That's specifically designed for graphics, but you can still use it for computation; you just need to look at things a little differently. With OpenCL and CUDA, things are pretty simple: you give it data and it does its thing. With OpenGL or GL ES, you tell it "I have a texture" — a texture is a 2D array, but your texture could really be your data itself — and you use what they call shaders: short, programmable snippets of code that look a lot like C and run, I think, on every single pixel you provide, with the output being another texture. So if you can map your problem onto something like that, you can potentially use the GPU for acceleration. And of course there's the FPGA — you can always use hardware for this kind of acceleration — and even some of the newer chips. For example, if you've seen some of the BeagleBone stuff, there's a newly announced TI-based board that repurposes the onboard video accelerator for machine learning; TI has a software kit called TIDL, and you can do a lot of the vision stuff using that. Okay, let's look at data. As I mentioned earlier, I'll be focusing mainly on static data. What that means is the data doesn't have a time element — you're not looking at a time series, so you're not looking at, for example, audio. Having said that, it doesn't necessarily preclude you from doing processing that involves time; you may just have to phrase your problem slightly differently. But to keep things simple, let's break it up into categories, because you're also looking at different algorithms for handling each one. For example, if you're looking at microphone or accelerometer output, you're probably going to use a different algorithm, like LSTM — long short-term memory — where you feed the output of the neural network back into itself as an input so that the network has some sense of history; depending on how you do the feedback, you may have a long history or a short history, and there are other variants involving the history too. And then images: images are really a special case of spatial data. If you treat things as an image, you can do certain things to it — convolution, filtering — that don't necessarily apply to just any 2D array of data, because when you do convolution, you're making the assumption that there's a pixel spatially next to you, or below you, and so forth.
Whereas if you just have 2D data, that isn't necessarily the case — that's one important thing to keep in mind. And by filtering, I'm really referring to convolution, as I mentioned earlier. All the data a computer deals with is numbers; that's all it is. Applying any meaning to them is really up to you as the programmer or designer — a human is what applies the meaning. But with these numbers there are a few things to keep in mind. We're usually dealing with small numbers here, ranging from 0 to 1, because once you multiply things you don't want them to explode. Dealing with small constants represented by a computer can cause several issues. One: zero multiplied by anything is zero, so if you multiply your number by something small and get stuck near zero, it may get rounded down or truncated to zero, which is not very useful. That's really the issue of precision. Double precision can potentially help you represent smaller numbers, but the problem with double precision — or even single-precision floating point — is that you need floating-point hardware to do a lot of multiplications, and that implies performance issues. One way embedded developers work around that is the concept of fixed point. All that is: instead of using, say, an 8-bit number as 0 to 255, you scale it by a fixed factor. Say you divide by a factor of 2 — all of a sudden the same 8 bits represent 0 to 127.5, with each step having a granularity of 0.5. By adjusting the factor you get different levels of granularity, and because it's just scaling, you can use normal integer multiplication and simply remember to rescale afterwards. That removes the requirement for a lot of processing power: it doesn't require you to have a floating-point unit. Another thing to keep in mind: to the computer, it's just a number. You may be doing a physical experiment — measuring with an accelerometer, which may measure in g's, going from 0 to some hopefully small number, or in meters per second squared, going from 0 to some much larger number. Those ranges may not be convenient for a computer. If you're looking at simple binary numbers scaled to, say, 0 to 128, but you're measuring g's, you're only using a small portion of the range. So even setting aside the fixed-point issues, it may be useful to do some scaling — the physical unit doesn't really matter once you're feeding things into a computer. The way you map all these features onto numbers can also impact your training time. If you have really large numbers, you wind up hitting the activation function and everything gets clamped, and the network has a much harder time finding appropriate weights. The range you provide can make a big difference in the overall model: you can have something that works but takes forever to train, which is not very useful. Similarly for the mapping of numbers on the output side: your inputs are obviously numbers from sensors, but a lot of times the thing you want out is not going to be a number — you'll have to map or translate that number into something appropriate for the problem; for simple identification, that may be translating it into a name. Another important thing to know: with machine learning, your inputs are generally an array of numbers, not a single number. Say you have 18 inputs, like what I'll be using for my spectrometer — that's an array. So a lot of the time you're dealing with vectors and doing vector multiplication.
If you have any kind of hardware that speeds that up, it will help a lot. Just as a side note on the name TensorFlow, the package from Google: tensors are basically arrays of numbers — that's kind of where the name came from. Pre-processing is where you massage your data a little, and that can help things. Like I said earlier, I'm going to focus mainly on static data, but you can have static data that still carries some time-dependent information: say you process a microphone input by looking at derivatives of the signal, or integrals of the signal. There's different processing you can do. You may potentially simplify your training if you just eliminate the noise; if you want to focus on particular things, you can apply a filter ahead of the input data. For training purposes — if you're doing some kind of measurement or identification — it actually matters how you present the data. A polar input, where you have, say, an angle and a distance from an origin, versus the more common x-and-y rectangular form: it comes down to which representation of your data produces the more obvious pattern. That's not something you'll always know offhand; everything I've read so far suggests it comes down to experimentation — sometimes one converges faster, sometimes it doesn't. Similarly with what I'd call space conversion — though that's a poor choice of words, since it's easily confused; domain conversion, rather. You can look at your data in the time domain, or in, say, the frequency domain, or via all sorts of other transforms. The most common would be a Fourier transform; you could also potentially apply something like a Poisson transform, depending on your type of data. Looking at different domains can help with training, and it can also help with the identification process later on. Another thing: if you have a lot of data — say a really high sample rate coming in — you can play games like averaging the inputs, combining multiple samples together. What that's really doing, when you average samples, is applying a low-pass filter, and a low-pass filter reduces the amount of noise in your data. And instead of combining discrete samples — say you have 10 samples coming in and you average every 2 — you can keep a rolling average: every 3 samples you take an average, then drop the oldest one and add a new one. There are different games you can play there, and they can improve your data. If the pattern you're after involves figuring out a maximum or a minimum, it may be useful to feed in more than just the raw sample data — you may want to feed in the first derivative of the data, or the second derivative. It really comes down to your problem, because machine learning is not magic: if a particular piece of data would help you, as a person, figure out what's going on, it may well help the machine too. Machine learning is essentially looking for patterns, so if you have some way of processing the data to make the pattern more obvious, that can reduce your computational load, both on the training side and potentially on the prediction side. The first point is very important: machine learning is not magic. If you feed it garbage — if your training data is no good, or you just don't give it enough data — you're going to get garbage out. It's not going to be able to predict things from nothing.
One way I look at this: suppose you have an expert, say in another country. If you give that expert the same data you would give the machine, over a phone line, and the expert cannot give you a reasonable or meaningful answer based on it, you probably don't have enough data. That's probably the simplest way to look at it. Another thing that may not be obvious, which I ran across in many references: with training data there's apparently a magic sweet spot. If you have too much training data, the risk is that the patterns the machine finds match your training set so precisely that it will only predict things in your training data set. On the other hand, if you don't give it enough training data, the machine doesn't know what it's going to be looking at. So there's a balancing point you need to find. The other problem with having too much data is that it can take too long to train. Now comes the more interesting part of machine learning — the training part, which I'd consider the interesting part. It all comes down to a very brute-force process. There are several different algorithms for training a machine, but when you train one, especially a neural network, all you're doing is trying to find weights for each of those neurons — and depending on how many neurons you have, that can be a lot of different weights to determine. The process is essentially: you start out with random numbers — you put random coefficients into your neural network — you run your training data through it (that could be just one sample at a time, depending on the particular algorithm you're using, or all your data at once), and you compare the results with what's expected — your labels, as I mentioned earlier. From that comparison you compute an error, and using the error you feed back and adjust those weights. And you iterate; you keep iterating until your error goes below a particular threshold, and once you have that, you're done. So it's purely a brute-force thing, and it can take a long, long time. With SVMs it's very similar, though a little closer to pure math — I won't say much more about SVMs; they get used, but they're not the more common approach. Because of this brute-force process, it sometimes helps to have something like a giant server farm: some of the more complicated machine learning algorithms can be parallelized, either over multiple CPUs or over the individual units of a GPU. As I said earlier, what a GPU does is take pretty much all the pixels it has and process them in parallel — it's a giant parallel computer doing a lot of multiplications. In graphics programming you may have to adjust shading based on particular coefficients, and that's pretty much what we're doing in machine learning, so it's very similar — it runs in parallel, and it's available at a mostly reasonable cost. The other half of machine learning is the prediction/inference phase, and what you call it depends on what you're trying to do. If you're looking at a pattern in, say, the stock market and asking what happens the next day, we refer to that as prediction; if you're looking at samples and matching them up against what you trained the machine with beforehand, that's called inference. But it's basically the same thing — it's the output of your machine learning. This part, unlike the training side, doesn't require as much CPU; it can often run on simple microcontrollers. Like I said, you can run it on something as small as a Cortex-M0, and an M4 is not unreasonable, if the model isn't overly complicated. After all, all you're doing is multiplications and additions.
And the things I meant to cover here — I went a bit out of order, so I've covered most of that already. On the output of your prediction: generally it gives you an array of numbers. You might have seen examples, especially with cameras, that say "60% prediction this is a person, 25% it may be a car" — even when it obviously isn't a car. Those numbers are the outputs of the different neurons. Generally, when you want to identify what something is, you get an output array where each value is a "probability" — that's what it's referred to as, and though it's not exactly a probability, I'll use the term anyway since that's how it usually gets discussed — a probability that this is a match for that particular thing. What you do is walk the whole array, find the highest value — the most probable item — and call that your prediction. So that's generally your output. Now, time to think back to embedded things. (Let me look at the time — okay, I may have to drop some bits, but feel free to interrupt with questions if I'm going too fast, or raise your hand.) For embedded devices — that's mainly my focus — you need to think about whether you're going to be using floating-point numbers. A lot of embedded devices, especially if you're not running Linux (even though this is a Linux conference), may not be able to handle floating-point numbers quickly; you may be emulating them in software, which might not be ideal. Even with Linux, if you don't have a hardware FPU it can be painfully slow. Or you have one, but it only does single precision — I think NEON only does single precision, so if you want double precision you need to make sure you're using a core that actually has the VFP unit. So you need to make sure your data really fits what you have in your processor, otherwise really weird things can happen to your performance. Software licenses: a lot of times when you're building an embedded device, you're probably not expecting it to be ripped apart and redone. Even if you're building an open source thing, you're still potentially combining several different pieces of software, and it's very important to know what the licenses are. If you have a library that references another library that references another library, they may be compatible when used on a desktop, but used somewhere else they might impose additional restrictions on your device. That's part of the reason I was really trying to avoid Python: Python pulls in whatever it wants, and it looks easy, but unless you know exactly what it's pulling in, it can get you into a lot of trouble later on. Okay, I'm running short on time. Now let's look at the thing I was trying to build. Just to recap: I'm trying to build a device — purely for learning, it's not really a product at all — that uses a spectrometer, looks at a sample, and tells me: is it real sugar, or a fake sugar like aspartame, Equal, any of those other things? For hardware I'm going to use a BeagleBone, just because it's a nice little Linux computer — most of what I do on a desktop is pretty much a drop-in, and it interfaces easily with the development board for the spectrometer I'm using. And I'm going to use a desktop CPU just for training. I've tried training on the BeagleBone — the same code works, but I stopped it after about two hours; it just took too long. Whereas on a laptop — actually this particular laptop I'm using for the presentation, I think it's an Intel i5 — the same training takes maybe a minute or two. And the software
package I'll be using to do this is FANN. It's reasonably mature — it's been around for a while — and it's simple to build: essentially it's CMake and then just make. It doesn't really have any major library dependencies, other than, I think, an algebraic library that's commonly included, and there are also versions of it that use fixed-point math, which is helpful if you have to run on an embedded device. FANN is written in C, but if you want to use FANN without writing C, bindings are available for other languages — I found an online tutorial for using FANN with PHP, if you like that, and I think there are also bindings for Python. So you don't really have to use C; it's just that in C it's easy to understand what the pieces are, without hidden pieces coming in. Okay, just to show you how simple FANN training is, this is the example you find on their website. What they're trying to do in this example is emulate the XOR function: you have two inputs — it's your basic gate — and one output, and this is the code that uses FANN for training. The first half, up to about here, is all declarations. The first few just give nice names to how you want to define your network: two inputs, representing your two XOR inputs; one output, for the XOR output; and the total number of layers, three — your input layer, the hidden layer, and the output layer. Then the hidden layer gets three neurons — that's a somewhat arbitrary number that seems to work well. For training we need to define how accurate we want to be. This is what I was telling you earlier, where you just iterate over and over until the error is acceptable; in this case we set the desired error threshold to .01. And what's the worst case before it gives up: it will try up to 500,000 times — epochs — and every 1,000 epochs it gives me some statistical output on where it is: what the error is, roughly how big. That gives me an idea of whether I'm getting somewhere; if it doesn't look like I am, maybe I'm doing something wrong, so it's a useful thing. The trade-off is that with too much output you waste a lot of time just printing, so you probably don't want to set that to something like one. To create the neural network, you pretty much just call the create function with those values — and the declarations really are just nice names; you could get rid of those lines and put the numbers in directly. Then we configure the network's activation function: in this case we're using the sigmoid for both the hidden layer and the output layer. Then we train it on a text file — you could also give it individual samples, but to keep this simple we give it a file to read. It runs through the training, and when it's done it saves the network to an output file so we can use it later; the rest is just cleanup. So, incredibly simple, and all written in C. Now let's look at the training data. The first line just tells FANN how many samples I have and what the samples look like: the first value is the number of samples — we're doing XOR, so there are only four possible combinations — then that each sample has two inputs, and that each sample has one output. In this data, 1 means true and -1 means false, so two falses give you a false, and so forth. That's your training data; for what I'm doing — I'll show you that later — it's very similar, a very simple text file. Then, after you train the thing, you want to be able to use it. That's also pretty simple: all you're doing is creating the neural network based on the weights you found earlier from the training.
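Recalling FANN's data format from memory (so treat the exact layout as an assumption), the XOR training file described above would look roughly like this — first line is sample count, input count, output count, then alternating input and output lines, with -1 for false and 1 for true:

```
4 2 1
-1 -1
-1
-1 1
1
1 -1
1
1 1
-1
```

The spectrometer training set later in the talk has the same shape, just with 18 input values and a larger output vector per sample.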
training there it loads that in there and then we're going to run it through we're going to set up two inputs there in this case minus one and one we run it through the network there and then we print the output and all we're doing is repeating that in this case we're just looking at raw outputs there which case it would be a number in the case of a more slightly more interesting application like what I'll be doing I'll probably be taking that number and then having a rate that maps that to a name but this is pretty simple there okay this is pretty much a black diagram of my little demo hardware that I put together there there's a 18 channel spectrometer there that's the one that measures the light it also has a little red light to reflect it off of there it all feeds into the Beagle bone and just to make things simple since the Beagle bone doesn't have a display I'm going to send it over a network and I'm going to use the standard protocol use MQTT there so I can just use like a mosquito and just dump that on the screen it makes it easier to debug and if I want to it's easier to slap a nice UI on there okay let's look at training let's actually go to a console in this case I'm doing the training on a laptop that's x861 I've tried it train before I do that I'll just show you this is pretty much the example I have earlier the biggest difference is going to be in this line there I find that it's a little bit faster if I configure to use a slightly different strategy for finding the coefficients there in the neural network there other than that this is pretty much the examples there I bumped up the number epochs there just in case I need to go through more I reduced the errors a little bit and this is pretty much the example so all it's going down is that it started with a random number there and from there it's trying to find an optimal strategy for getting to something that matches the output there the bit failure that's a number of outputs that is completely wrong 
from what was expected. The current error is the total error across all the outputs, and it iterates, each pass trying different combinations based on the previous one. One other change from the earlier example: I think it reported every 1000 epochs; right now I'm reporting about every 2000 epochs. It doesn't take much longer and it should converge pretty soon, but even for something this simple it takes a while. There, it's done. I'll show you the data set. This is the data: there are 18 samples in here, and each sample has 18 readings representing the 18 channels of my spectrometer. Then there's a 25-number output vector that basically tells me what kind of artificial sweetener it is: is it Splenda, is it Equal, and so forth; and the last bit represents whether it's sugar or not. There's also an entry I haven't actually defined, so it's mapped to false. So this is my training set. There are 18 entries in here, so it's not overly large, and yet training takes a noticeable amount of time, and it takes much longer if you run it on the target itself. Let's see if I can get back to this slide; LibreOffice wasn't behaving nicely. That's the training. Then, for the running process, what I'll be doing is collecting data, doing some pre-processing, which basically means scaling the data from the range the spectrometer provides into, I think, a range between zero and one, then feeding it to the neural network, and then mapping the output to a name. I'll show you that code later; let's try running this first. Okay, so what I've done is build a little board that adds a switch; basically it's a GPIO input that triggers the whole process, instead of logging in and running it manually. Right now I have the spectrometer sitting on a sample of Splenda, and that's the output for this run. Since it's a demo, I purposely made it verbose. This here is
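The pre-processing and training-set layout described here can be sketched briefly. FANN's plain-text training file starts with a header line of "num_pairs num_inputs num_outputs", followed by alternating lines of inputs and outputs; the scaling range below (a 16-bit sensor maximum) is a guess for illustration, not a figure from the talk:

```python
def scale(readings, lo=0.0, hi=65535.0):
    # Map raw spectrometer counts into the 0..1 range the network expects
    return [(r - lo) / (hi - lo) for r in readings]

def write_fann_data(path, samples):
    """samples: list of (inputs, outputs) pairs, written in FANN's text
    format: a header line, then alternating input/output lines."""
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    with open(path, "w") as f:
        f.write(f"{len(samples)} {n_in} {n_out}\n")
        for inputs, outputs in samples:
            f.write(" ".join(str(v) for v in inputs) + "\n")
            f.write(" ".join(str(v) for v in outputs) + "\n")

# Tiny stand-in: 2 samples of 3 channels each, with 2 output labels
samples = [(scale([1000, 30000, 65535]), [1, 0]),
           (scale([0, 500, 2000]), [0, 1])]
write_fann_data("demo.data", samples)
```

The real data set would have 18 inputs and 25 outputs per line, but the file shape is the same.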
actually the output from the spectrometer, and this is the output from the neural network. It found that the maximum for a specific identification is 0.76, roughly 76%; I believe that one is aspartame, which I think is what's in Splenda, a sugar substitute. On identifying all these different sugar substitutes: one thing I found, if you do some research on them, is that there's a very small amount of the particular sweetener and a lot of it is filler. Apparently that stuff is very potent; like 1 or 2% of it is aspartame and the rest is some kind of filler, I think dextrose seems to be a common one, and that's common to pretty much all the others. So sometimes the neural network gets confused and will misidentify the sweetener, but I find it's reasonably accurate on the sugar side. So let's try this again with actual sugar. Okay, it's misbehaving right now; I've got to spread the sugar around so the spectrometer doesn't see the tablecloth. It was working earlier. In case any of you have flights, my session is almost over. Okay, let's go back to the slides; where did it go? Oh yeah, another thing to keep in mind if you're doing machine learning: consider how you're using data and where you're getting your training set. I don't have an answer for this, and it may be more a question for a lawyer: if you find data on the internet, is using it considered fair use, or do you really need a license for it? That's something to think about, because Google has access to a lot of things; are they using the data you provided? It may have ended up in a search engine, who knows. That's something to keep in mind. Another thing is open source licensing. Your trained network, in my case that little .net file, is that part of the same copyright and license as your regular code? A lot of times those licenses don't really apply to data,
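Mapping that strongest output back to a name, as the demo does with its 0.76 reading, is just an argmax over the output vector. A minimal sketch; the label names here are made up for illustration, not the demo's actual label set:

```python
LABELS = ["aspartame", "saccharin", "stevia", "sugar"]  # hypothetical labels

def identify(outputs, labels=LABELS):
    """Return the label of the strongest output neuron,
    plus its raw value as a rough confidence figure."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best], outputs[best]

print(identify([0.76, 0.10, 0.05, 0.31]))  # → ('aspartame', 0.76)
```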
and after all, I'm loading it as data. So should my potentially GPL-licensed code apply to that particular network, or is it separate? Something to think about; again, maybe a lawyer question. Also, with machine learning things are a little imprecise, so how do you QA these things? That's another question to think about, because if you do it the traditional way you have thresholds; you could potentially iterate over all the thresholds and look at the output. But since each time you train these things you can get different weights, how do you know a network is correct? Do you just sample a few cases and hope that it works? Something to think about. So yeah, in the interest of time I'm going to skip ahead; I had planned to log in and show you the code, but I don't want to keep you here too late. Any questions? Yes? ... Yep, you could potentially do that, but the problem with doing it on typical embedded devices is that training takes so much longer; you don't have the processing power there. One way you could do it is in batches: say you have a connected device, you save the data, and if a result is wrong, or someone tells you it's wrong, you save what was wrong, add it to your data set, retrain, and periodically push updates to the network. That will improve your model. But all of this really comes down to how good your training data is: if your training data is garbage, your output is going to be garbage. Sorry, hyperparameter tuning? Let's go to... where is that... right, which parameters. I personally manage them largely by experimentation. One thing to keep in mind, for example, is that max epochs is actually an upper bound; if training keeps hitting that limit, then I start investigating whether something is wrong with the model, whether the problem really is that complicated, or whether maybe I don't have enough neurons.
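The "max epochs as an upper bound" idea mirrors the shape of FANN's training loop, which stops either when the error drops below a target or when the epoch ceiling is hit. A toy sketch; the geometric error decay here is fake, standing in for real backpropagation:

```python
def train(max_epochs, desired_error, report_every, step):
    """Run 'step' once per epoch until the error target is met or the
    epoch ceiling is reached; hitting the ceiling is the signal to go
    investigate the model, the data, or the network size."""
    error = 1.0
    for epoch in range(1, max_epochs + 1):
        error = step(error)
        if epoch % report_every == 0:
            print(f"epoch {epoch}: error {error:.6f}")
        if error <= desired_error:
            return epoch, error      # converged in time
    return None, error               # ceiling hit: tune or rethink

# Fake 0.1% error reduction per epoch instead of actual training
epoch, err = train(max_epochs=500000, desired_error=0.001,
                   report_every=2000, step=lambda e: e * 0.999)
```

When the first return value comes back as None, that is the flag the talk describes: the model, the data, or the neuron count needs a second look.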
That's a flag for me to go, okay, there's something I need to tune, or it may simply be a difficult problem. So it's mostly trial and error; there really isn't a better method, and from the research I've done, what most of us do is largely trial and error. Yes? ... Yes, actually, everything I've come across, including talking to people about this, says the way you manage that is in how you partition your data. That's why I said earlier to randomly take about 50% of your data as the training set and the rest as the test set. If you do something like a 90-10 split, the network may end up very specific to your training data, and like you pointed out, mine may be very specific to what I have. So there's definitely that problem; it's something to watch out for. ... It doesn't come over a camera; I'm actually not using a camera, I'm using a development board. What it has is 18 channels of sensors, done with filters and photodiodes. It connects over a virtual serial port, which interfaces over the USB port; that part is separate. This is a development board by AMS, who makes the sensor. I think there's another question. ... Okay, there are two ways to go about that. One is to use a different function in FANN where you feed it one sample at a time; that's one way of training, but it could be pretty painful. The other is to use that specific file format and just write a script to produce it. For example, the output from my sensor is a whole bunch of data in a different format, and I have, I'll show you later if you want, a simple awk script that reads that text file, converts it to the right format, and writes it out. In my data I actually annotate things, so there's more information in there than what FANN needs, but it tells me what I need; it also has the human-readable name, so I can always
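The random 50/50 partition described in this answer can be sketched in a few lines; the fixed seed is only there to keep the example reproducible:

```python
import random

def split(samples, train_frac=0.5, seed=42):
    """Shuffle a copy of the data and cut it into training and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

data = list(range(18))                 # stand-in for the 18 samples
train_set, test_set = split(data)
print(len(train_set), len(test_set))   # → 9 9
```

The shuffle is what matters: a split taken in file order can hide exactly the kind of overfitting the questioner is worried about.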
go back and check things. Any other questions? ... Nothing specific, but some of the stuff that I think Google is doing can be done on FPGAs, a lot of the deep learning stuff. Any other questions? Thank you for coming; I hope this was somewhat useful. ... Oh yeah, this is a development board; that's all it is. I may have had it at the wrong angle, where we're seeing through the wrapper. Yeah, it's not the best spectrometer, it's 18 channels, but it was cheap. Oh, this guy? This is a development board by AMS; they make the sensor. It normally interfaces to your PC over this, and it uses an AT command set, so as far as the board is concerned it's pretty much plug it into a PC. The one thing I really added is this switch, and I mounted it on the PVC. Yeah, PVC, because I don't want ambient light getting in; if you look, the sensor is flat, so if I hold it too close it won't get a proper view. It's just a nice cheap setup; that's all this is, and it holds all of it. This board, I think, was maybe about 90 bucks from Digi-Key. There's a project out there to build something like a 25-dollar spectrometer using the exact same chipset, except on that one you have to cut some traces to use it in UART mode, which is what I'm using; by default it comes up in I2C, which is a little more work to get working here. This guy is a development board for about 90 bucks. Yeah, there's a project that uses the same chips; I think SparkFun has a version for 50 bucks, but that one is wired up for I2C instead of UART. The chip supports both, and you can cut some traces to convert it. There's no reason you couldn't do that; the only problem is that then you'd want a rig that gives you, probably, a cuvette; that's what the labs use, it's a small glass container that light passes through, and you put the sample in it. It's just more complicated. Yeah, that's what I looked at:
YouTube, where AMS actually has a video with a nice little app they wrote that identifies three different substances that are all white. I'm basically trying to replicate what they have; I don't think they provide the sources or any information about that particular demo. Probably search for the company, AMS, they make sensors, plus "AMS spectrometer"; that might work, and then check that it's from the manufacturer. I think it's a demo video from them. ... "And is there some baseline on your computer that characterizes what sugar is supposed to look like?" It doesn't work that way. What I do is take measurements from different samples and label them sugar, then run the training, so it tries to learn whether readings match that pattern. That, in effect, becomes the baseline, and any further sample is compared back against it. Yeah, and that's the difference: traditionally you would have a baseline and ask how close a reading is to it, how far off it can be and still be called sugar. That would be the more traditional way. With machine learning you give it a whole bunch of data, it tries to figure out the pattern, and in the prediction phase it says which pattern the sample is closest to, and if so, that must be it. ... "I've been looking at this very recently: you're grouping the points, say in the upper left these are black and these are white, but with chemical sugars there could be a whole number of different characteristics involved; how do you handle that?"
It's a kind of multidimensional math, exactly; they have a line that divides the points, a concept called a hyperplane, and it gets a lot heavier when you get to multiple dimensions. "Then what was that FANN thing doing?" There are different concepts I think you're mixing up. There are neural networks, and then there's something called a support vector machine, the SVM. The SVM does what you just talked about, with the line that divides things up. With neural networks you don't really have that line concept; all you're doing is building up different weights for the neurons, they get summed up, and then you look at which output is the largest number, and that's the answer. So they're two different approaches, and FANN does neural networks. What are you looking for? "I'm looking for particles in the water." Oh, then you're basically measuring turbidity. For that it could be something as simple as a light: you measure the light that gets through, and once you've got particles they scatter it and reduce the amount of light. Conductivity would imply that you're looking for conductive ions, so if you have something like very small particles of clay, that might not be conductive enough to register; I'm not sure what the threshold would be. You could also have dissolved things that are non-conductive, but that might be a special corner case. ... "Thank you. Now, so as not to create a mess, how would we go about trying this on our own at home?"
I eventually plan to put most of this stuff online on my website; it's more of a time-constraint thing. But if you want to try FANN, there's a simple example on the FANN website; that's the most common one, and you can also find the same example written in PHP. That would be a very simple start, and once you get the concept, things kind of fall into place. How you wire up your sensors is a different topic. "Well, yeah, but that would be useful for me too. So you have a website?" Yes, it has my actual research on it. "How can we find it?" There's a chicken-and-egg problem; it's in the bottom corner of my slides, I guess I'll put that back up. I was going to show plots of the individual sample readings, but I just ran out of time. Actually, I thought this was supposed to run from 4:30 to 5:30, that it was an hour. "I thought you had extra material, so you started early; your slot started at 4." I just counted half an hour; I thought I was a little later than that. ... Yeah, that's the other way of going about it. "Here's another one: you have the last 60 or so elections, the popular vote and the electoral college vote, and they're related. You train on that, and then apply it to the most recent election, where there was a 3-million-vote difference, just to see how it would do." I would think it would be easier to simply do a statistical analysis on that. You could look for a pattern; basically you're computing the correlation between the two, but you could do that with straightforward statistics. You're just talking about two different types of data coming in.
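The statistical-analysis route suggested in that last answer, correlating popular vote with electoral vote, needs nothing more than a Pearson coefficient. A minimal sketch; the vote shares below are invented for illustration, not real election data:

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient; +1 or -1 means a perfect
    linear relationship, 0 means no linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented (popular %, electoral %) pairs for five hypothetical elections
popular   = [48.2, 51.1, 47.9, 50.7, 46.1]
electoral = [49.0, 56.5, 42.0, 61.7, 40.8]
print(round(pearson(popular, electoral), 3))
```

A high coefficient would confirm the pattern the questioner expects; outlier elections would show up as the points that pull it down, no neural network required.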