So, a quick introduction from myself. My name is Chinmay. As Harish said, I'm the CTO at Subnero. I also organize Hackware, which is a monthly electronics and hardware meetup in Singapore. We just had one two days ago at Hackerspace; the next one is next month, if you're interested. I love everything at the intersection of hardware and software, and I've been working in that intersection for 15 years. You can find me here on socials.

I want to talk about our journey at Subnero towards using the Julia language on an embedded system. It's a very strange relationship, Julia and embedded software. Usually embedded software is written in C or some other low-level language. But I'll go through the journey and tell you how we decided to go this way, why, and how it worked out.

So, a quick overview and some context on what we do at Subnero. Subnero is a local homegrown startup. We make underwater wireless devices: we build the physical devices and we write all the software that runs on them. These are what we call software-defined underwater modems. They're basically a communication system where all the communication bits are done in software instead of in hardware. So unlike what we heard earlier from the previous startups, with FPGAs and ASICs and all of those things, here we try to move all of the processing into software, because it's much easier, and you can update, configure, and change things much faster than when it's all baked into hardware. But that's a different topic.

We run a lot of this on physical devices, NVIDIA Jetsons, and we use embedded Linux as the main operating system. But we have to do all the signal processing and numerical computation required for the communication side of things on the devices, in software.
So we created a framework called UnetStack, which is a network stack, much like your standard networking stack, but designed for underwater, and it does a lot of special things for the underwater case. The whole journey was basically about how we should implement UnetStack. It has a lot of special requirements, and that's why we had to spend some time thinking about how, and in which language, it would be implemented.

So let's look at the requirements. It has to be software-defined, which means it needs to do all the processing in software. It needs to do a lot of numerical computation: error correction coding, a lot of math, basically. It needs to do signal processing: take all the data that's coming in, process it, and deal with it. It needs to work with hardware, because the data is all coming in from physical hardware. And lastly, we need it to be configurable in the field.

So what does that give us as language requirements? Software-defined means it needs to run on some kind of Linux platform; not necessarily, but it just makes life easier for us in the end. The numerical computing means we need something that's high performance and also high level, because writing very performant code at a very low level just gets very hard to maintain; I'll talk about that in a bit. We'd also like to have GPU access: the hardware we run on, the Jetsons, have GPUs, so if we can access the GPU easily, that's really good; we would really like that. The signal processing means we need something that's low latency, something that can work really fast and deal with low-latency scheduling and threading and all those kinds of things. Hardware integration means whatever language we choose needs to be able to do low-level stuff.
We need to be able to talk to GPIOs, talk to hardware buses, and all of those things. And field configuration means we need something that's scriptable and high level, so users can use it and maybe write little scripts, a bit like what we saw in the earlier talks.

So we have this really weird set of requirements: low level and high level, scriptable but also low latency, and that was hard to solve. This is what, in a lot of real-world scenarios, people call the two-language problem. The easiest solution, which is what we initially went with, is a two-language solution: you have two languages, one does all the high-level stuff, one does all the low-level stuff, you somehow glue them together, and that's your solution.

So that's what we went with. This was about 10 years ago, and the idea was to use Java for the high-level part and C for the low-level part. Java does all the high-level stuff. People might think Java on an embedded platform is strange, but I guess Android has shown that Java is actually quite performant; the JVM can run in very little memory and performs quite well on an embedded system. It's also quite scriptable: there's a JVM language called Groovy that we use, which is sort of a scriptable version of Java. And then C for anything that was low latency or needed real low-level access, like GPU access.

This worked for us for a while, but this whole two-language setup has a problem, and that's the bit where the two sides talk to each other, which in this case is JNI. That's where it gets really painful, because every time you change one side of the boundary you have to change the other side, and keeping everything in sync and working just gets hard.
So JNI was rigid, it got painful, and we basically ended up not changing the C side as much, which meant we didn't add features to the C side, which meant a lot of the product features we wanted to ship couldn't ship, or just kept getting delayed. After seeing this for a while, we thought we needed a better solution. I'd been following the Julia language for a while, and by the time we decided to look into this, it was mature enough, so we decided to look at using Julia for some of this.

So, who here has heard of Julia? Cool, that's sweet, lots of people. It's a high-level, high-performance, dynamic language. It was created at MIT, version 1.0 came out around 2018, which is about when we started looking at it, so it was stable enough, and it's open source. It may seem a strange choice, because Julia is normally used in the machine learning and scientific computing space. So why on an embedded platform? Well, it has a lot of really interesting characteristics for embedded use cases. It's super fast; it uses LLVM at the back for just-in-time compilation. It provides a lot of features for introspection and control: logging, profiling. It's dynamic, and it has some nice scriptability to it; you can write DSLs in it very well. It has a great community on the numerical computing side of things, and it has some really nice language features that make composition very easy.

But that's what the Julia website says. For me, the interesting thing was that the community was really great. Julia has a really nice community, and a lot of open source packages are available, especially on the numerical computing and signal processing side, so we could use and leverage a lot of that. And it's low level in the sense that you can very easily interact with the OS and hardware. I'll show some examples of this later, but it makes life very easy when you're trying to do low-level stuff, unlike Java.
It has great support for GPU integration, even in embedded use, which was quite surprising to me; there's an example of this coming up as well. And I think what personally took the cake was that the community really cares about speed and memory. The community continuously runs benchmarks for performance and memory use. Every package, every core feature, tends to have discussions like: hey, if you do it this way, are you going to do more allocations? Are you going to use more memory? And that entire mindset is really useful in an embedded role, because you don't have much memory and you need to do things fast. Having that mindset consistently applied and benchmarked is really what sealed the deal for me.

But, yeah, it's not that easy to learn. It has a pretty steep learning curve. There are a couple of very specific new ways of thinking that you need to pick up, and once you have those, it's straightforward, but it took me, and then the team, quite a bit of time to ramp up. But then we got going with it.

So, what worked? I'll go through a few examples of things that worked for us. First, speed of development: Julia is great for writing high-performance code quickly. This is very terse code, but let me walk you through what it's doing. Here we're reading something from an ADC into a buffer, reinterpreting it, and then sending it up via a callback to a high-level function that deals with the data. But in the middle, the reinterpreting has to do funny things like take a 32-bit integer, shift it down by 8 bits, take 24 bits of that, and convert it to a 32-bit signed integer. All of that can be done in a nice, terse one-liner. And the best part is that while this looks very high level, it compiles down to something very, very low level.
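The conversion described above looks roughly like this. This is a minimal sketch with my own names, not our actual production code, and it assumes the 24-bit sample sits in the top 24 bits of each raw 32-bit word from the ADC:

```julia
# Sketch: turn raw 32-bit ADC words into signed 32-bit samples.
# Reinterpreting each word as a signed Int32 and arithmetic-shifting
# right by 8 keeps the top 24 bits and sign-extends them. The `.=`
# fuses everything into one in-place loop with no allocations.
function convert_samples!(out::AbstractVector{Int32}, raw::AbstractVector{UInt32})
    out .= reinterpret.(Int32, raw) .>> 8
    return out
end
```

The shift on a signed integer is what does the sign extension for free, so the whole 24-bit-to-32-bit dance really is a one-liner.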
There's actually a construct that lets you look at the low-level code it generates. Unfortunately I don't have it here on this machine, otherwise I could have shown you the assembly it generates. But it's very short: it generates very short, fast code, even from something that looks high level and complex like this. Things like the dot-equals operators do in-place operations, so you don't even do allocations. This entire piece of code does no memory allocations, which means you can run it in a loop forever and it's never going to alloc, which means no GCs, which means you're probably going to get pretty good speed out of it. Stuff like this is what I think makes Julia great for high-performance code.

Next, low-level control. This is something I struggled with a lot in Java: how do you do ioctl calls? When you're talking to hardware, you've got to do ioctl calls for random stuff, and in Java you have to go through JNI, and it's painful. Julia has a `ccall` function in its base library, which basically allows you to call into any C library, or even OS calls like ioctl. All you need to do is define some data structures for the data you want to pass down into the ioctl, and then just call it. That's basically your ioctl, right in Julia. So it's very easy to write, very easy to reason about, and very easy to maintain, even for low-level code.

And then, because of the ML relationship and what people normally use Julia for, the GPU integrations are very mature, especially on the NVIDIA side. Here, for example, is a simple function that does a vector multiplication: it just multiplies x and y and puts the output in out. This code is pure Julia, and it will run in Julia as is.
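A sketch of that vector-multiply example (again, my own names, and the GPU part assumes the third-party CUDA.jl package and an NVIDIA GPU):

```julia
# Plain Julia function: nothing GPU-specific in it at all.
function vmul!(out, x, y)
    out .= x .* y    # fused, in-place elementwise multiply
    return out
end

# CPU: ordinary arrays.
x = rand(Float32, 1024)
y = rand(Float32, 1024)
out = similar(x)
vmul!(out, x, y)

# GPU (sketch, assuming CUDA.jl): hand the *same* function CuArrays
# and the broadcast dispatches to CUDA kernels instead.
#   using CUDA
#   xg, yg = CuArray(x), CuArray(y)
#   outg = similar(xg)
#   vmul!(outg, xg, yg)
```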
So if I call it like this, it's a vector multiplication that's 100% software, no GPUs involved. And all I have to do is add this one bit at the beginning, and the whole thing runs on a GPU. There's lots of magic involved to get that working, but for maintaining code where you want to say, in certain scenarios use a GPU, in other scenarios don't, this gets really, really comfortable and convenient. So we really enjoyed that.

But not everything worked fine, and here's what didn't: threading and scheduling. Julia, at least in earlier versions, didn't give much granular control over thread scheduling, and that conflicted with our requirements. We need to run that thread I was showing you earlier, the one that reads data out of the ADC, all the time, and it can't block; nothing else can be allowed to block it. Getting that to work in Julia was a struggle. Julia uses green threads, and green threads are great for computation, but they're not great for this. It's depth-first scheduling, which also means it's not great for real time. So we struggled a lot with that in older versions of Julia. This is a problem the community was aware of, and they have been trying to fix in various ways. The first solution, in Julia 1.9, which is coming up soon, is basically to allow interactive threads. We did come up with our own workaround, using some low-level stuff we had to invoke; it wasn't pretty, but it worked. With 1.9, we won't have to do that.

The other problem is that Julia doesn't really have a standard approach to bundling and deploying applications. It's designed to be used more like a REPL; that's the main use case, and there isn't really a bundling story for it. For interactive use, of course, that's fine.
But on a production device, this is something that's very critical. Again, the community knows about this: there was a keynote at JuliaCon last year that talked about exactly this and how it's a big problem, and the community wants to solve it. Hopefully we'll have a solution for that soon.

The third thing, which also bit us for a while, is that Julia takes a long time to warm up. Julia is JIT compiled: functions are compiled by LLVM into native code the first time they're called, which means on a fresh launch your application is going to take a while to compile everything, cache it, and only then run fast. So you eventually get speed, but it takes time to warm up. It's a bit like the JVM, if you've used that; it needs time to warm up. Large applications can take up to minutes to start, which is very painful, and in an embedded use case, where you want to reboot a device and have it start working, that's very difficult. There are some solutions around it: there's a concept of a system image, which basically caches that compiled set of functions. That worked with some success, and we used it initially. And with Julia 1.9, which is coming up soon, Julia will cache compiled code to disk, which means as long as you don't change the code, you should get fast startup. Again, the community knows about this, people are working on it, so hopefully we'll see fixes.

While preparing this talk, I was thinking: we did this, we went to Julia, but what else could we have used? People love Rust; everybody asks, why didn't you use Rust? Personally, I found that there wasn't much community around the signal processing and numerical computing side of things in Rust, as compared to Julia. We wouldn't have had much to leverage; we would have had to write all of those things ourselves. That was one of the reasons. We looked at Go as well, but it's even worse in Go.
Although I do like the multi-threading side of Go; there's a lot more there, and we could have done more with it. But the libraries and the community are very different in Rust and Go. Is there anything else you think we should have used? I'd love to hear about it; this is the kind of stuff I like to play around with all the time.

So where are we now? 100% of the low-level code is in Julia. Most of our shipping devices run Julia for all the low-level stuff. Most of the high level is still in Java and Groovy; the migration is slow, but whenever we write new code, it's written in Julia, and a lot of the tooling is moving to Julia as well. The really low-level firmware, for microcontrollers, is still in C. I hope we can move that to Rust at some point, but it needs a bit of time.

So, as a summary, what's the takeaway? What I really wanted to share with everyone is that Julia is not just for machine learning. Julia can work very well on small embedded systems, and by small I mean Raspberry Pi-class devices. It has some teething problems. It's new; it's only a 10-year-old language, which is young in programming-language terms. But I think it has a lot of promise, and if you're looking for a programming language that meets the sort of requirements we had, give Julia a chance. Try it out. It can do some really fun stuff; it's really powerful, and the community makes it great.

That's all. A quick shout-out: we're hiring at Subnero, so if you like this kind of stuff, come talk to me. There are stickers for both Julia and Subnero at the back. You can talk to me after; I'll be here the whole day, and I'll be happy to talk about any of these things. Thank you.

All right. Thank you, Chinmay.