Hello everyone. We are live. The cat was on my lap, and now he's getting off. If you're watching this after the fact, the stream's pretty chill, so check the notes down below for time codes to all the different stuff we end up talking about. Definitely live. Thanks, Keith. We'll get going here and have a good time. I'm going to pause my other video here; it insists on sending the video back down to me. Hey Jason. Yeah, we're hanging out. Hey Bruce S. Yeah, should be a good one. Pretty chill. Spook's in the cat cam. I almost kicked him out, but it's not as hot today, so we're going to have him. We'll get going in just a couple of minutes, for those of you wondering why I'm just talking, although I think that's more and more common. Lightning storm. Ready to chill. Yeah. Hi Mr. Certainly. Keith asks how the smoke is here. It's gone. We only had it for maybe two or three days, which is better than last year; last year we had it for a full week. But it's gone, and it's fine out now. It's cooler; it's only 72 degrees in here. We'll attempt to be wonderful. Hi Hemslab, who hopes I'm well. I am well, yes, thank you. I've been eating a lot of delicious tortilla soup that I made earlier in the week with corn, tomatoes, jalapeños, and peppers from the garden, and I'm sure I smell like tomatoes because I picked a bunch of them after lunch. Well, now I'm hungry. Sorry about that. I know for some of you it's probably closer to dinnertime, but I always eat before I stream because I'm not that pleasant when I'm hungry. I've got water and a water bottle too. Haha, yeah, Phil in the chat says happy five years: the first email to us was five years ago. The first email from you, yeah. That was so exciting. Bruce made a Spanish tortilla, which is not a tortilla; it's an omelet of sorts. Hi Dexter. Thanks, PT. It's been an awesome five years.
What really got me was realizing that this will very soon be the job I've had the longest since I graduated college, which is even more wild to me. I was at Google for almost six years, so it's pretty wild that CircuitPython is getting up there too. Hey, Doctor. Hi Anthony. All right, should we do housekeeping? David's out again this week, I think, so the notes will be a bit more limited than usual. Somebody typed something in the chat that I think is Korean, but it doesn't auto-translate; I thought YouTube might do that. Hi Paul. Hi Andrew. Okay, let's get going. It's going to be another chill week. There's a video of a bobcat kitten from Critter Care, which is definitely cute. Hi Linux 203. Hi Kiba, who asks whether I went to UW and mentions they're in Seattle. I did go to UW; I graduated with my computer engineering degree in 2009. Kiba says they first heard about MicroPython in college and CircuitPython a bit after, and it's awesome seeing its growth over time. Awesome. See, I hadn't heard of MicroPython until I was asked whether I wanted to work on it, and I was like, this is amazing. I didn't know it was a thing, but I had obviously done a lot of Python, and I had just discovered how awesome embedded is. So it was really perfect; it was a very happy right-place, right-time thing for all of us.

Okay, let's do some housekeeping, and I'll try to take time stamps like I did last week. (I forgot to write down what I was time coding.) Hello everyone. My name is Scott, and I go by tannewt online. This is a deep dive. I do these every Friday at 2 p.m. It's a chance for me to hang out and answer questions, and a chance to show the work I've been doing regardless of how deep or technical it is; that's why it's called a deep dive. I work for Adafruit. Adafruit does open source software and hardware and is based out of New York City.
I work for them remotely, so I'm in Seattle. If you want to support me, you can do that by supporting Adafruit and buying stuff from adafruit.com. I think it was last week that we noticed MacroPads were in stock, for example. Buying those supports Adafruit, and they pay me to work on this stuff and to stream as well, so feel free to do that. CircuitPython is usually what I'm working on. It's an open source version of Python designed for microcontrollers, which are tiny, inexpensive computers mounted on printed circuit boards. You'll see here there's a CLUE, and the black part is the circuit board. The microcontroller is a really tiny computer, here under this lid, running at around 100 megahertz with RAM measured in kilobytes, well, portions of a megabyte. Minnesota Mentat says it feels like Friday. Yeah, deep dives are around Friday at 2 p.m. Pacific, and we typically go for two hours or more, so if you have questions I'm happy to take them. I think we'll bounce between different topics. If you want to chat with me and a lot of others, we have an Adafruit Discord server, which you can join by going to adafru.it/discord; the middle box here is the Discord chat. And the kitty is back; last week it was so hot he wasn't up here. His name is Spook. He is epileptic, so he has had seizures during the stream, but not in a long, long time because he switched meds and is doing way better. I just like to give that caveat just in case. I think that's all the housekeeping there is. Good morning, Rod. Roger says the cat looks very relaxed. He should; he was just on my lap wanting pets, and I can stream and pet him at the same time, but it's not ideal. Hopefully he won't lick his butt, which he's definitely done on stream before.
Doctor says they wish they could have skipped college and just worked. Some people do that. Keith says there's a lot about his college life he wishes he could redo, but he was fortunate enough to be loosely aware of MicroPython, and from that CircuitPython, and that's something he loves from his time back then. Nice. I liked college, but I don't wish to go back; I'm just so happy to be done with school. I learned so much on my own. Patrick W says hey, we all made it to another deep dive. Hey Patrick. And shout out to Patrick for posting all of the notes in the repo. If you've just joined our deep dives, we have this repo; let me switch to the desktop. On GitHub we have this Deep Dive with Scott repo, and just like CircuitPython Weekly, it has all the past notes. Patrick wrote an awesome script that annotates all the time codes with a link to the YouTube video. Consider it a very basic search: if you ever want to find when I talked about something, you can clone the repo, or just search in it on GitHub. For example, if I search for USB, we can see some things we talked about along with their time codes, and if we want to watch one, we can click right through. So thanks to Patrick for that. And thank you to Dave (DCD), even though he's not here; he's a huge part of why these notes are so good, because he usually takes them. He's taking some time off, but he'll be back. It's always great to see the PRs come in. Minnesota Mentat says they sure appreciate that effort and diligence. Yeah, I take coarser notes when I'm doing it myself. Hi Nerodoc.
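If you'd rather search a local clone than use GitHub's search, a few lines of Python will do it. This is a minimal sketch that assumes the notes are Markdown (.md) files somewhere in the clone; the function name search_notes and the command-line usage are hypothetical, not something that ships in the repo.

```python
from pathlib import Path
import sys


def search_notes(repo: Path, term: str) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for each notes line containing *term*.

    Matching is case-insensitive. Assumption: the notes are Markdown (*.md)
    files somewhere under *repo*; all other files are skipped.
    """
    needle = term.lower()
    hits = []
    for path in sorted(repo.rglob("*.md")):
        # errors="replace" keeps one oddly encoded file from stopping the search.
        with path.open(encoding="utf-8", errors="replace") as notes:
            for lineno, line in enumerate(notes, start=1):
                if needle in line.lower():
                    hits.append((path, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python search_notes.py path/to/clone USB
    for path, lineno, line in search_notes(Path(sys.argv[1]), sys.argv[2]):
        print(f"{path}:{lineno}: {line}")
```

Run it with the path to your clone and a term like USB; each hit shows the file and line number, and if that line carries one of the annotated time-code links, it takes you straight to that spot in the video.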
All right, I haven't seen any questions go by, so if you have questions, feel free to ask them. Hi Patrick. I thought I would start by going over the two main things I was working on this week. Let me take a time code. Speaking of time codes, Keith asks whether I got any signals from the Wii Balance Board. I haven't poked at it; I'm kind of hoping somebody will figure it out and then email me. I've been very distracted, and I'll cover why later. It's not Adafruit related, which is why I'm trying to minimize my interest, but I have another project that has sucked me away, and I'm about to get nerd sniped again. Someone says they just poked at it and hoped. Yep. I'm not sure how to get it going either; maybe the signal needs to be inverted or something. Clearly it can be done. We were talking about this last week: the remote has UART TX and RX in the battery compartment, so we were trying to figure out how to talk to it over that TX and RX, if anybody knows about that. That reminds me, I do have a topic I want to ask people about too, but I already took a time code, so let's talk about the Unicode stuff first.

Someone asks whether I said nerd sniped. I did. Nerd sniping is when somebody brings up an interesting topic and distracts you in a nerdy way. So, Trevor, Antonio, and I met on Tuesday to talk about the apps for the BLE workflow. There's really good progress, and in fact I just got invited to the TestFlight for PyLeap, which is really exciting. I'm not sure I have it on my iPad yet, but I'd like to get it going there; I need to try it today. One thing that Antonio said
was that he had a problem with CircuitPython when trying to write a file that has Unicode in its name. One thing I like about the mobile experience is that you can use emojis in lots of ways you couldn't before. I guess I'm right on the edge of the age where you think that's cool. I showed previously that I was writing code.py files with variable names that were emojis. So I'm all for getting really good Unicode support in CircuitPython and the stuff around it. Emojis are fun, but the reality is that good Unicode support also ensures we work across languages. Someone in chat says that when they see me working on CircuitPython during the day: yep, you got nerd sniped. Yeah. So Antonio said Unicode doesn't work; I dug into it, and I fixed it, or I'm in the process of fixing it. I thought I'd do a PR review like I've done before, and we should check the CI too, to make sure it's actually working.

The PR is "Turn on Unicode for FatFs." One of the first things is that there's a switch in FatFs that makes its API use Unicode; otherwise it uses some very old, historic encodings. That was one of the things I changed. Because I was in there, I also noticed that we had exFAT support turned off. exFAT is really common on larger SD cards. It's not going to be on in all builds, but it will be on in more builds, which I'm hoping will reduce the number of people we hear from saying their SD card isn't working. If turning it on gets us better support for exFAT SD cards, that would be awesome. I actually found a number of bugs with this Unicode thing. If you don't know, we also have the Glider repo. Glider is the very basic "send a file to and from the device" app. It has nothing to do with CircuitPython; it's all about
just this BLE file transfer protocol. What we can see here is a PR I did. One of the tricky things with Unicode is getting your lengths right, because Unicode characters are variable length. ASCII is always one byte per character, but Unicode is not at all just one byte. I found this error in the Glider app; let me pull up the documented file transfer protocol, which is here in the repo. All of these commands take paths, and when you send a path, you're supposed to provide a 16-bit number giving the encoded length of the path string. There's a subtlety when you're dealing with Unicode strings: are you counting bytes or characters? In UTF-8 in particular, characters take a variable number of bytes. I tried to be specific in the docs that it's the encoded length, not the character length, which is why I had to make this fix. The bug was that path here is a string, so path.count is the number of characters. You'll notice the data section was already using path.utf8, so for all the lengths sent from the iOS app, you have to use path.utf8.count, which is the count of bytes, not characters. That was the first thing I found. The way I tested it was on the CircuitPython side: I just added a print of the path I was receiving. When I did that, I noticed it wasn't printing the correct path, which meant something was broken between the app and printing it out. I looked at the count, the length, and it was wrong. The reason it wasn't showing right is because it was
considering the path too short. So yeah, that was one of the bugs. Then I turned on Unicode in CircuitPython's FatFs; we use this library called FatFs. Someone asks whether that means the app is doing the encoding twice. Yeah, potentially, although I assumed the string is stored internally as UTF-8 anyway. That's the case in CircuitPython: a string is encoded in UTF-8, so encoding it to bytes is just a copy. This is the module; we use code derived from it. It's really well documented, and I was actually looking at the changes they've made since we last updated. Damien made an object-oriented version of this library that is used in CircuitPython and MicroPython. I flipped the switch so that all of its APIs use UTF-8. The encoding in the file system itself is a slightly different Unicode encoding, so it does have to convert a little, but it's not too bad. It does increase the code size a little, which is unfortunate. So that was one of the fixes. Good morning, Unexpected Maker, and hi Christian too; I don't think I said hi yet. Let's take a look. One of the things I had to do, and we'll see this a lot, is reduce the size of everything, and that's why it's still not merged: I'm still working through build size issues. We'll look at those too. First up, I had to add some MP_FALLTHROUGH designations. The compiler checks switch statements: if a case doesn't break and falls through to the case after it, it wants a way to confirm that's deliberate, so we have to mark it as a fall-through. I had to add these because turning on Unicode support meant compiling a new portion of the code that
it wasn't compiling before. Here's me plumbing it through: instead of always being zero, we now allow MICROPY_FATFS_LFN_UNICODE to be set. The LFN part is because FAT actually stores two copies of the file name. The original version of FAT only allowed short 8.3 file names, but then they came up with a hack for longer names that ended up getting patented, and it was so long ago that the patent has now expired, which is great. LFN stands for long file name, so in the docs you'll see SFN for short file name and LFN for long file name. It's kind of a hack because the long name gets tucked into extra directory entries. That's why this setting is called LFN Unicode. Next, you'll see I'm doing a lot of board size optimization. One code size optimization we can do in CircuitPython that's really handy, only on the SAMD, is these defines to ignore pins. The SAMD port has a 12-byte structure for every usable pin, which stores information like which SERCOMs (the peripherals that do I2C, SPI, and UART) are connected to it and which timers are on it. It's 12 bytes apiece, so if we need to save some bytes, we can use these defines to omit that struct from the code. Hi Johnny. Someone says they were told that to this day Windows is still 8.3 internally. It could be; both names are available. FAT maintains both and just prefers to show you the long file name. So these are just pin changes: I looked up the data sheets and ignored pins, which actually saved a lot of space, especially for the Datum boards, which is great. Then for this SAMD51 board that uses internal flash, we need the space, so it doesn't get exFAT. I also turned off getpass, which is a new module.
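Here's a small Python sketch that puts numbers on the name lengths discussed above: the character count, the UTF-8 byte count (what an encoded-length field has to carry, and what FatFs's UTF-8 API takes), the UTF-16 code units that FAT long file name entries store, how many LFN entries a name needs at 13 code units each, and a rough 8.3 check. The helper names are hypothetical, and fits_8_3 is deliberately simplified: it ignores case flags, OEM code pages, and numeric tails like ~1, so treat it as an illustration rather than FatFs's actual rules.

```python
import math

# Besides A-Z and 0-9, punctuation allowed in classic 8.3 names (simplified list).
SFN_EXTRA = set("!#$%&'()-@^_`{}~")


def utf8_length(name: str) -> int:
    """Bytes in the UTF-8 encoding; this is what an encoded-length field must count."""
    return len(name.encode("utf-8"))


def utf16_units(name: str) -> int:
    """UTF-16 code units; FAT long file name entries store the name this way."""
    return len(name.encode("utf-16-le")) // 2


def lfn_entries(name: str) -> int:
    """32-byte LFN directory entries needed, at 13 UTF-16 code units per entry."""
    return math.ceil(utf16_units(name) / 13)


def fits_8_3(name: str) -> bool:
    """Rough check: could *name* be stored as a classic 8.3 short name?

    Ignores case flags, OEM code pages and numeric tails such as ~1.
    """
    if not name.isascii():
        return False
    base, dot, ext = name.upper().partition(".")
    if "." in ext or (dot and not ext):
        return False
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        return False
    return all(ch.isalnum() or ch in SFN_EXTRA for ch in base + ext)


for name in ["code.py", "boot_out.txt", "hello😀.txt", "settings.toml"]:
    print(f"{name!r}: {len(name)} chars, {utf8_length(name)} UTF-8 bytes, "
          f"{utf16_units(name)} UTF-16 units, {lfn_entries(name)} LFN entries, "
          f"8.3={fits_8_3(name)}")
```

For the emoji name the three counts all differ (10 characters, 13 UTF-8 bytes, 11 UTF-16 code units), which is exactly why a character count is the wrong thing to send as a byte length.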
Generally, I try to turn off things people won't be using yet, because they're new. This one is a minor change for audiocore. For all our SAMD21 boards, we're turning exFAT off, along with the small nRF boards. exFAT support is about three or four KB, I think, and when I did these builds they had no chance of fitting with exFAT. I also added CIRCUITPY_MICROPYTHON_ADVANCED. There's a micropython module that we have enabled, with things like heap lock and unlock and stack usage statistics, and I wanted the ability to turn those off. Here are the config changes. Code pages are generally ignored, though not fully, and the API is set to use UTF-8, which is what we use internally. The advanced micropython APIs are now behind the full build, so the only thing in the micropython module in non-full builds will be const (which may not actually be needed, but we still want it). And exFAT is on for full builds by default as well, which I think people will be excited about. Here's the change I made to turn off those advanced APIs: stack use, heap lock, memory totals, optimization level, that sort of stuff. It's pretty rare that you'll need those. Someone in chat apologizes for not categorizing me as a lady or a gentleman. Oh, it's fine, I don't mind; I don't need to be a lady or a gentleman. So once I got the file name from the app into CircuitPython correctly, my debug print looked okay. I opened it up in the terminal, the file was written, and it all looked good. I showed it on CIRCUITPY as well: the name was hello with a couple of smileys or something. Scott? Yeah. What I did next was run os.listdir(), and I noticed it wasn't printing correctly. It was escaping the smiley, printing hello followed by backslash-u escape sequences
for the Unicode encoding. I was like, oh, that's weird. I tried it in CPython, and CPython printed it as I expected, meaning the printable Unicode characters got printed as-is. What I discovered by digging into the MicroPython code (this is all MicroPython code within CircuitPython) is this uni_print_quoted function, which is used for the repr version of a string. If you print a string that has Unicode in it, it looks okay, but if you just evaluate it so that the REPL shows its representation, it escaped everything, even though CPython doesn't. The only characters this escaping printed verbatim were ASCII characters between 32 and 126. I deleted that check because I wanted the last case to print the full character as-is; that's the else clause now. All of the previous clauses are now the things we need to escape instead, so this test is inverted: it now matches the ASCII characters that are considered not printable, like the return character, tab, Ctrl-C, Ctrl-D, and other control characters. Someone says it's not even ISO 8859. I'm not sure what ISO 8859 refers to; it reminds me of character encodings. This escapes those ASCII characters using the \x form, and then these are Unicode ranges that I'm also escaping using lowercase \u. Lowercase \u covers a two-byte code point, and there's a four-byte version (\U), but we don't escape anything with the four-byte form anymore. I was curious about CPython's policy for this, so let me just follow this link here, and I'll pull it up.
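To make the before and after concrete, here's a Python sketch of the two repr policies as described on stream. repr_old mirrors the old rule (only ASCII 32 to 126 printed verbatim, everything else escaped with \x, \u, or \U); repr_new approximates the new one. The exact escaped ranges in repr_new (U+2000 to U+200A, plus U+2028 and U+2029) are an assumption based on the "variable space widths and paragraph designators" description, not a copy of the C code.

```python
# Escapes shared by both policies.
_SIMPLE_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}

# Hypothetical ranges for "variable space widths and paragraph designators".
_ESCAPED_RANGES = ((0x2000, 0x200A), (0x2028, 0x2029))


def _pick_quote(s: str) -> str:
    # Like CPython: switch to double quotes only if s has ' but no ".
    return '"' if ("'" in s and '"' not in s) else "'"


def _escape_simple(ch: str, quote: str):
    # Backslash-escape the active quote; map backslash, \n, \r, \t; otherwise None.
    if ch == quote:
        return "\\" + ch
    return _SIMPLE_ESCAPES.get(ch)


def repr_old(s: str) -> str:
    """Old policy: only ASCII 32..126 is printed verbatim, everything else is escaped."""
    quote = _pick_quote(s)
    out = [quote]
    for ch in s:
        cp = ord(ch)
        simple = _escape_simple(ch, quote)
        if simple is not None:
            out.append(simple)
        elif 32 <= cp <= 126:
            out.append(ch)
        elif cp < 0x100:
            out.append(f"\\x{cp:02x}")
        elif cp < 0x10000:
            out.append(f"\\u{cp:04x}")
        else:
            out.append(f"\\U{cp:08x}")
    out.append(quote)
    return "".join(out)


def repr_new(s: str) -> str:
    """New policy (approximate): escape ASCII controls and a few space/separator ranges."""
    quote = _pick_quote(s)
    out = [quote]
    for ch in s:
        cp = ord(ch)
        simple = _escape_simple(ch, quote)
        if simple is not None:
            out.append(simple)
        elif cp < 0x20 or cp == 0x7F:
            out.append(f"\\x{cp:02x}")
        elif any(lo <= cp <= hi for lo, hi in _ESCAPED_RANGES):
            out.append(f"\\u{cp:04x}")
        else:
            # Everything else, including emoji and private use characters, prints as-is.
            out.append(ch)
    out.append(quote)
    return "".join(out)


if __name__ == "__main__":
    for sample in ["hello😀", "tab\there", "a\u2003b", "caf\xe9"]:
        print(repr(sample), repr_old(sample), repr_new(sample))
```

Comparing repr_new against CPython's built-in repr() shows where the approximation intentionally differs: a private use character such as U+E000 prints verbatim here, while CPython escapes it.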
Someone in chat says all Microsoft stuff prefers 8859. So I found this cool comment in the CPython code that says it "returns 1 for Unicode characters to be hex-escaped when repr()ed, 0 otherwise. All characters except those characters defined in the Unicode character database as following categories are considered printable." Oh, someone explains that 8859 is 8-bit ASCII, it comes in different flavors, and they use 8859-1. Yeah, I think this is important for folks who are new to computers: these sorts of standards are what convert the numbers the CPU deals with into representations of things like letters. Hi Pier, who says hi to Randall. Hi Randall and Sadie. This was really interesting; it got me trying to figure out what category every Unicode character is in, because I had a couple of ideas on how to do this. I had a test script where I thought, you know what, I can just generate all the characters and see what repr prints out. But when I gathered the non-printable ones into ranges, there were a lot of ranges of just one, two, three, or four characters, and I was not going to write the world's largest if statement. So I really wanted to figure out exactly what policy we should have instead in MicroPython. Basically, what it came down to is this. Control characters are in ASCII, and I handled those. Private use characters aren't really anything interesting, so I wanted to say you're not going to use private use characters in strings in CircuitPython or MicroPython, and we don't escape those. Then there's not assigned; these are called reserved. Let's find the Wikipedia page; Wikipedia has an awesome chart. Not this, let me find it. Unicode... oh wait, here we go, this gets me there. This is really cool. So here's the Unicode
character reference. Oh, it's on Wikibooks. Hi Fatty Do. This is really cool, and it's what I was using. All these gray boxes are the code points that are unused right now. My test script that went through everything checking whether it was printable would find a range of three characters that weren't printable, and I'm like, I can't do all those checks. You'll see that even these characters here are being rendered in a weird way by my font. This is the range I am escaping, and if you hover over one, it says three-per-em space, so these are spaces of different widths. In reprs these are generally escaped so that you can see exactly what is what. The paragraph separator is another one that gets escaped, and if you look here, that's exactly what the comment says: the line separator and paragraph separator get escaped, and the only type of space that is not escaped is the ASCII space. So my decision was to escape only the ones people are probably going to actually run into, plus the C0 control characters, which are in the lower single-byte ASCII range. If we look at this Wikibooks page, it's pretty neat. Braille, for example. There's one range of unprintable characters that has to do with Egyptian hieroglyphs, and I was like, okay, I'm not going to support that. Then you can get up to here, the Variation Selectors Supplement, and then unassigned. This is a huge range we could choose to escape, but since it's unassigned, why bother checking for it? Assuming your UTF-8 is correct, we're not going to see it. Someone says Unicode is the same as ISO 8859-1 for the first 256 code points (0 through 255). Yeah, we looked at that here: if we click this first set, see, this is
the C0 controls, so those are the things we do escape, and then it's Basic Latin and Latin-1 Supplement. These first two large chunks are the single-byte characters. Thomas says he would vote for the range from U+2800 to U+28FF, braille, to be supported. I think it should just work in CircuitPython. Someone says they just got their board supported by CircuitPython and found a whole new rabbit hole to spend their time on. Awesome. Someone mentions hieroglyphs: Egyptian and Mayan, plus cuneiform. Neat. So what this change does is that where a string's repr in CircuitPython previously showed the escaped version, it will now just show you the character instead. And Thomas, I think we just went past 2800. Wait, that's U+28000; U+2800 is here. This is a great reference; let me save it, and you know what, I'll put it in the notes too: Unicode character reference. (I'm doing a great job on time codes.) I think Unicode is really important, which is why I wanted to dig into it and make sure it worked well. That also meant I really wanted every build to have this enabled: all of our builds will support Unicode in FatFs. That's why I turned it on and had to find space for things, because I wanted it to be universal. Not exFAT, exFAT's different, but the Unicode support I wanted on for everything. Oh yeah, Thomas says for braille. Here we've got all of the braille characters, which are awesome. You can hover over one and it says BRAILLE PATTERN DOTS-1468. Johnny says now we need megabytes of flash to store the full Unicode font maps. Yeah, that would be cool. That's basically what Joey did for the Open Book; Joey did a lot of work with Unicode for the Open Book, and it's really neat. His solution was basically to have a
second flash chip just to store the fonts. Someone says they do quite a bit of transliteration and a little bit of translation. Sweet, that's awesome. So this is really important to me, and that's why I figured that if we're going to turn on Unicode in FatFs, we need it everywhere. This switch to the way we print strings will apply everywhere as well. This is the CPython version... no, that's not it; where did it go? CPython has a database of all of the characters, and it just looks up whether each one is printable, so it can be really fine-grained. I didn't want to store all that data. Instead, there's this cool CPython module, what's it called, unicodedata. It's really handy. I wrote a script with it to look at stretches of different categories, because once I read this comment I could see all the categories that are not printable, and I wanted to know how many different ranges there were. The reserved category is a lot of sporadic stuff, so first and foremost I'm not going to handle that. Then there were other characters that were often by themselves, like a few control characters for different languages, and I didn't think I needed to support those. The worst case if I choose not to escape something is that it changes the way it prints out; it doesn't remove it or anything. So this Python module is really neat. It's a standard library module; of course they have the data, so they might as well expose it. I went from reverse engineering what got printed to being able to look, character by character, at what category it is, because you
can just give it a character and ask for its general category. So that was neat. What I chose to do in CircuitPython is written up in this comment: these settings approximate CPython's printability; it's not exhaustive and may print unprintable characters. All ASCII control codes are escaped, along with variable space widths and paragraph designators. Unlike CPython, we do not escape private use codes or reserved characters. We assume you had to get the Unicode in there somehow, and if it's corrupt, you have bigger problems. You can always print the underlying bytes if you want to see the byte representation. So I had to switch this, and then os.listdir() worked, which is great; anything that uses the representation of a string will now be correct. I had to fix this test that was using invalid characters, I think, and I added a new repr test using one-, two-, and four-byte-wide characters to make sure it prints and represents them the same way CPython does. So that was one of the things. Someone asks, "Did you block my message? Rude." I don't know what you're talking about; I didn't see it. Maybe I don't see any others. Oh, cool. Someone adds: and about three Costa Rican indigenous dialects, Egyptian, and quite a bit of Sumerian. It's amazing; I'm so bad at languages. All right, let's take a look at this. The CI is still going, but it says two jobs have been failing, so I wanted to look at those, and I also had a question. That's not me blocking you, it's YouTube; sorry, not in my control. So I was having problems fitting the code size, and it looks like we're still having build size issues: 40 bytes. 40 bytes. I have one thing up my sleeve that I don't really want to do. And this one is 144 bytes over, so that's still a lot. Bummer; that was the Nano 33 IoT. Mark says he got really sick from vaccine number two, but it was better than getting COVID. Are you feeling
better, Mark? I'm wondering how sick I'll get from vaccine number three, if we get a vaccine number three. Someone says, "If you don't show us what's up your sleeve, how will we learn to do the magic tricks?" This is the challenge with code size problems: the moment you free up the space, people fill it. And the thing I have up my sleeve... Mark says he's not feeling good: sore arm and feverish. I'm sorry to hear that. How many hours after your second dose are you? I definitely did feel run down and weird. Bruce says he felt like that for a few hours: Advil and back to bed. Johnny says 24 hours for him. Someone says Twitter's message length comes to mind: if you change it, people will fill it. Someone says vaccine number two was the sickest they've ever been as an adult, still better than COVID though, and they're hoping dose three won't be as bad. Wow. Yeah, sorry to hear that. I'm going to get the flu shot next week, probably, because they just came out with it. Mark says his amazing uncle died on Tuesday from COVID; he was vaccinated but had pre-existing conditions. Oh, I'm so sorry to hear that. Mark says the side effects came on eight hours after the shot, and now he's about 36 hours out; he's hoping it won't be as bad as when he actually had COVID. Steel Blue Vision asks what hardware this is for. This Unicode stuff is for all of the CircuitPython devices, which is a lot of different stuff. If you go to circuitpython.org/downloads, FatFs Unicode support will apply to all of these devices with CircuitPython 7, and the exFAT support will apply to most of them, the ones we consider full builds, which is basically everything except the SAMD21s and a couple of others. Someone wonders if they'll combine the flu and COVID shots into one jab; I don't know. Someone apologizes for their English. No worries, no problem at all. Sorry to hear that, folks. Get vaccinated if or when you can. What was I doing? Oh, the thing up my sleeve for code size. And maybe
Let's talk code size. The thing I have up my sleeve is that whenever I build, and maybe I talked about this before... oh, that's spoilers for my non-Adafruit project. So if we do atmel-samd, make BOARD=arduino_zero, no, arduino, what is it, arduino_nano_33_iot. Yeah, so if I build this, it doesn't run out of space. Wow, five to seven days, Dylan, that sounds rough. mRNA flu shot, that might be combined. Better cell reception. Yeah, so if I do that build that failed, on mine it actually succeeds with like 144 bytes free, and Dan figured out why. It's because I don't use the version of the standard library that Arm releases; I'm using the version that Arch Linux builds, and they build it with -Os, which optimizes for size, whereas the Arm build uses -O2, which is probably faster but has a larger code size. So this is the trick I have up my sleeve. The reason I'm holding it: one, it might be a lot of work, but also the moment we open up these 1,200 bytes, people will fill them, so I'm kind of waiting as long as possible to actually set that up. I've gotten really close now. I had a question that I thought folks watching might know. So I was getting desperate for bytes. My partner was coming in, just wanting to chat, and I was like, do you have extra bytes you could lend me? I want to find extra bytes. So this is a program called Okteta. It comes with KDE and it's used for viewing binary files. So if I open recent, I have this full.bin. Now, ignore the first 2K of this; I just added 2K of zeros to get the offset to be correct so that I could compare it to the firmware map. But I was looking through this, so here's CircuitPython's core, and if I page down... basically what I was trying to look for is any places where there's a lot of zeros, so that I could figure out why they're there, delete them, and make our library smaller. And let's see, kind of curious to
see. So I found this pattern and I wondered if somebody... well, there's some... those are pretty long ranges of just zeros. Why in the world would we be storing a bunch of zeros? Not as cool. So the way we can figure it out is we pull up Sublime Text here. This full.bin is for a build of arduino_zero (I hope it's the same one, otherwise I'd have to redo this file), so if we open build-arduino_zero/firmware.elf.map... Looking for sequences of zeros, that's really scraping the optimization barrel. I tell you, we have been in this optimize-the-firmware-size business for many, many years at this point, and that's one of the reasons you'll see us be so aggressive when we move compiler versions: GCC has generally gotten better, and the code size always goes down with newer versions. If you delete the zeros, won't that change the way CircuitPython operates? I can't just delete the zeros; that would shift the location of everything. But what I'd like to figure out is why those zeros are there, right? Why are we storing this range of zeros here? There's a way to have memory zeroed in the first place. Oh, and this gives me a clue: look at this, CircuitPython HID. These might be USB descriptors with a bunch of zeros in them. I'm not actually planning on doing compression. Keithy says woo, compression, that's what I love. I'm not planning on actually doing compression. What I would like to find is places where we're storing zeros that we don't actually need to store, because there's the BSS section, which is an area of RAM that all starts at zero. So if I'm storing something that is mostly zeros, maybe what I can do is move that large thing to the zeros area and write some code to just plop in the one value that I need. Higgins says, is there a trick to getting a MacroPad to show up as a device in Linux beyond
plugging and unplugging or hitting the reset button? Make sure you're using a USB cable that is known to be good. That is almost always the problem: the USB cable is bad. Oh, it shows up in dmesg, just not as a device in my file browser as a connected storage device. You might try it in different USB ports. There is a bug on this in the silicon for the USB stuff. But yeah, this gives me a clue, this CircuitPython HID string here. So this is a C string, which is what I would expect. Hi Piata. And there's a lot of zeros here. So if I go to 2CF... 2C500... I'm cramped for space, unfortunately. Can I turn off word wrap? Yeah, that's better. It is a debug build, so there's all these debug macros, but at some point in the map file you get to the part where there's actually addresses. So if we keep going further down, we're trying to get to 2C... 2C5. I wonder if this is old. This must not match up, because 2C5 here is a pin definition. 2C5C, yeah, product name, 2C5D0. Yeah, this doesn't look like it's exactly matching up. Let me just do it again. Let's make that full.bin: rm full.bin, and I did that. Let's see if that changes it. Huh. I noticed that CircuitPython device serial numbers are nibble-swapped between 6.3 and 7.0. Interesting. Do we care? It would make sense, since a lot of the USB HID stuff, the USB descriptor stuff, changed, so I would bet that it was introduced then. Run-length encoding. I'm not actually going to do compression; there's a lot of string compression already for error messages. Open this again and let's go to 25C. Oh yeah, look at that, that's interesting actually, because this is 25A7. That still doesn't look like pin data for some reason. Oh, this is 25, not 2C. This looks like qstrs. See all of these things, like collections. So those are the strings. 16-bit strings? Yes, qstrs could use the same compression as error messages. The problem is that qstrs are also added
dynamically, so we would need to either be able to distinguish between compressed and uncompressed qstrs, or be able to compress them on the fly, which would save RAM. So a qstr is... if you look in CircuitPython's source, like if we just pull up... actually, let me take a timecode, because this is a great topic; maybe we've talked about it before, but we can talk about it again. Qstrs. So if you look in shared-bindings at a module's top-level file and look at the bottom, this is the map for _bleio. If you do import _bleio, this is the dictionary it uses to look stuff up. And we have these MP_QSTR_ names, and there's a script in the build process that looks everywhere for things of this form and mashes them all together, so it deduplicates all of these strings and then gives each one a 16-bit identifier. That 16-bit identifier is used to refer to that string everywhere in the code. So in this table, this ROM map thing, this value here is actually just going to be a 16-bit number, whereas in C it would potentially be a 32-bit pointer to some other place. So qstrs are a way to both deduplicate strings and save memory. Okay, so if we go back, you'll see there's just all these different names, so this is the qstr pool. There's data in this, and then there's also something else. You can see module names like binascii. And actually, this 25A3, let's go back to our map file and just correlate it, because it's pretty obvious what that is. It ended up getting closed; I have too many files open. It's right here, and I've got to turn word wrap off again. Let's turn it off for good. So it's actually 225... this is not right. My offsets are not right. I wonder if that 2K I added is making it worse, not better. Where is this? Qstr... 2A... there's this const
attr, and then there's this const pool. So there's all the pin names. This must be the pool, or whatever the first one is, 2A87. I don't know why these aren't matching up. Anybody know why these aren't matching up? It would be really nice if they did. The very first thing is 2000, and that's the vector table. So if we go all the way up to... why is that 800? Oh, is it not 2K? Oh jeez. This is where the vectors are. There's debug strings in one of them; are they in the other? They're not matching up because it's a live stream. You're assuming I had it matching up beforehand, which I did not. What is the offset for 800? Did I just make it... right, I think I just made a 2K file, but 0x2000 is 8K. That's why it's wrong. Okay, I need 8K, not 2K. I used dd to make the file, so now I rm full.bin and cat the 8K file onto it. All right, let's reload this file. I should have checked that way earlier. Reload. Okay, so we got more zeros, which we expect, and the non-zeros, which are the reset vector, are now in the right place. So if we check now, go back here: our qstr const attr is at 2A374. Oh, I'm so excited to get this right. 2A374, that's the attribute table. Now the pool is at 2A87, so at 2A87 we should see it. What is that? This is really weird, right? This is a whole lot of zeros in columns. So this is what I was going to ask about: const pool at A87, look at all these zeros. There's gotta be... what is that? 0 through F, that's like a hex table. See, yeah, that's right, nibble_to_hex_upper, that's what it is. The const pool is where you go to swim through your code. So I think this is matching up, because that's certainly nibble_to_hex_upper. Good night, Dave, thanks for hanging out. Plot a histogram of the bytes. Yes, I could do that. Ah, so all of these look like tables, objects, global tables. So let's look at C1E0. Is this the onewireio module? Module globals is 16 bytes, globals and globals table. All of these twos are probably
addresses. So this is probably an address, E1EC. Oh, you know, that's what all these zeros could be. Plot a histogram of the bytes. I think that must just be pointers, right? This is the 64-bit version of the pointer, but if we're storing a regular 32-bit pointer, it's going to have those top two zeros on it. Each pair of hex digits is one byte, so having a single zero byte next to a two is actually not that surprising, because those are all just pointers. So I think I answered some of my own question. One thing I found funny is that there's both padding zeros and padding spaces, which is 28 bytes that we could totally get rid of. Why in the world is that in flash? Should we save those 28 bytes? Thought my eyesight was getting worse due to the returns; I was just YouTube selecting 480. It could be padding for alignment, Christian says. That's right. So I found this spot... there's a name, but let me find what I was seeing before. This looks more like what I was thinking of. Yeah, look at all that stuff, look at all those zeros. It must just be pointers; pointer tables are pretty common. I think those strings are used in the format functions. Yeah, yeah, those two strings for padding. I found this really weird case when I was looking at this yesterday. I wonder if the things without the zeros are actually code blocks, so the stretches with lots of zeros are pointers, the code blocks are dense, and then it frees up again. But this is where I was like, this looks funny to me. This region right here is the thing I was thinking about, because look: it's C I R C U I T P Y T H, there's no "on", the word before it is "possible", and it's all zero-something, zero-something. This is like 16-bit: it's two bytes per character instead of one, which is really weird. So we're at 2F8... 2F800. Let's take a look in here. I was like, what is this, and can I make it smaller? 2F... So does somebody... yeah.
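Here's a small sketch of the zero hunt from this part of the stream, under stated assumptions. zero_runs finds long runs of 0x00 in a firmware image and reports them as flash addresses by adding a base offset instead of padding the .bin with dd; the 0x2000 base (8 KiB, not 2 KiB) is the offset found on stream, and the 16-byte minimum is an arbitrary choice. zero_fraction is the one bar of the suggested byte histogram that matters here. The pointer values are made up: they only show that 32-bit little-endian pointers into a flash smaller than 16 MiB always carry a 00 top byte, and that ASCII text stored two bytes per character (like the CIRCUITPYTH run above) is half zeros.

```python
import re
import struct
from collections import Counter

# Assumption from the stream: the vector table sits at 0x2000 in flash,
# i.e. 8 KiB in, right after the bootloader.
FLASH_BASE = 0x2000


def zero_runs(image: bytes, min_len: int = 16, base: int = FLASH_BASE):
    """Return (flash_address, length) for every run of >= min_len zero bytes.

    Adding `base` to each file offset lines results up with the addresses
    in firmware.elf.map, so the .bin never needs to be padded.
    """
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(base + m.start(), m.end() - m.start()) for m in pattern.finditer(image)]


def zero_fraction(data: bytes) -> float:
    """Share of 0x00 bytes: the tallest bar in a byte histogram."""
    return Counter(data)[0] / len(data) if data else 0.0


# Made-up flash addresses, packed as 32-bit little-endian pointers.
pointers = [0x0002A374, 0x0002A87C, 0x0002C1E0, 0x0002E1EC]
pointer_table = b"".join(struct.pack("<I", p) for p in pointers)

word = "CIRCUITPYTHON"
as_utf16 = word.encode("utf-16-le")  # two bytes per character
as_utf8 = word.encode("utf-8")       # one byte per ASCII character

if __name__ == "__main__":
    # With a real build you would use: image = open("full.bin", "rb").read()
    image = b"\x01" * 4 + bytes(20) + b"\x02" + bytes(3)
    for addr, length in zero_runs(image):
        print(f"0x{addr:05x}: {length} zero bytes")  # 0x02004: 20 zero bytes
    print(pointer_table.hex(" "))        # 74 a3 02 00 7c a8 02 00 e0 c1 02 00 ec e1 02 00
    print(zero_fraction(pointer_table))  # 0.25, the top byte of each pointer
    print(zero_fraction(as_utf16))       # 0.5, every other byte of ASCII text
    print(len(as_utf16), len(as_utf8))   # 26 13
```

Reporting addresses rather than file offsets is the design choice here: it avoids exactly the 2K-versus-8K padding mix-up from a few minutes earlier.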
I think this is the right section. Does somebody know what this is? What is this read-only data? I don't know what those are. Compressed French error messages. They're chopped pieces. Right, they are chopped pieces, and it is a French build, but why are they... Ghidra would be great for this, Peter says. I've tried Ghidra and I don't know how to use it, unfortunately. I would love to watch a live stream on how to do that; that's what live streams are really good for. So did I go past the end of it? 2F... oh, okay, 2F8 is not here, so we are past 2F... 2F8D0 is the end. What is this? words. I think you're right, this words table must be a compression thing, but they're not UTF-8 words, they're UTF-16 words. I guess that's by design. Oh, Hopper for Mac. Oh, Patrick linked me to a thing. Right, copy link. Count of words by length. Impossible. It's very painful when you first start. Yeah, I was trying to use Ghidra to reverse engineer the firmware of the Wii Balance Board; that's what I was looking at pretty recently, and I was like, this doesn't seem right. Yeah, I can drop these files in Discord. atmel-samd, CircuitPython was in that. Yeah, that's true, but why isn't it UTF-8 encoded? That's the thing. arduino_zero. It might be by accident. The map, and I'll send the UF2 as well. All right, so if you want to follow along, the files are in Discord, which you can get to at adafru.it/discord. Okay, so what is the structure of this words table? I do feel like I'm on to something still; I don't know of a reason to have it. Thanks, Bruce, for posting the link. makeqstrdata, bits per code point, substrings like the words, values, const mchar_t values... but what is mchar_t? format(values_type)? That is not what I mean by char; those are the C character types. I could put it in quotes too. I get wanting to be able to support multi-byte characters. I was looking at this; there's gotta be a better way than mchar_t's. It's lengths,
values... wait, wait, wait, let's take a look. This generates a file; let's find that file. It's in the build folder, genhdr, compression.generated.h. Yeah, so look at this. Here we do want word wrap. So we've got a words table. Oh, values is also mchar_t. typedef mchar_t, that's why I couldn't find it. mchar_t is a uint16. lengths... but these lengths could still be bytes. All right folks, does anybody disagree with me that this is a place where we can win back some bytes? Oh, interesting, PewPew M4 has typedef uint8. This could explain why the French build is so much larger: the French build is using uint16 to store everything. Compress word offset tables. Scope: is it just the error strings or does it go further? Yeah, so we only do compression on translated strings, which are almost all error messages, but also... yeah. So this is a French build. Can you tell I'm in the zone? I'm on the hunt, I'm on the hunt for some bytes now. How many bytes is this? I wish it would print it out for me. And these are two bytes apiece instead of one. I think if we try something other than Huffman we can get space. Don't knock Huffman. I don't want to change any of the compression stuff; I just want to change how the words it decompresses with are stored, right? I'm not going to redo it all; it's going to be iterative. But what I'm trying not to have is this. Let's see, how long do we think it is? It's about there, it's about that block, and it's like 496 bytes, so we'll get about 250 back. Hover over words and it might tell you how many bytes it is. That's smart. There's no reason the Python code can't print it out. So the question is, when words is read... Text has correlation Huffman ignores. Well, that's part of the reason this compression is doing it over longer words. I also get uint8 with French. French should use ISO 8859, so I see no need for 16-bit characters. Yeah, so I think it's time for us to
actually start poking at the compression script and see how it changes things. So let's go to the compression script. It's not that smart. I hope folks didn't want to get to USB, because I think I'm on to something and I want to get this PR out by the end of the week. The define for compress max length bits is eight, so why is it uint16? Okay, let's just rebuild. Actually, let me copy the build, arduino_zero firmware. Never know where details take you. It's totally true. I'll have to change the title since we're not getting to USB, but it's kind of fun to... oh, I wonder if... Good night, Mark, I hope you feel better. Yeah, VS Code does a lot of indexing. Oh look, the size changed. Are we in the wrong... So this is still uint16. Christian says he's getting uint8. I'm on arduino_zero with this branch of mine, so why is it... I find the easiest way to edit this script is... let's see, where is it? We found it in here, it's py/makeqstrdata.py. Okay, so makeqstrdata, here we go: max_ord, if max_ord is greater than 255, bytes per code point, estimated net savings. Where are we printing to? Not this file. length count... It might be in one of the other generated files, so let's open those. build... I really need to delete these build folders. genhdr, qstr... there we go. So this shows us words, 55, and these are all the words. Yeah, that looks right. So this is the Huffman table that we've got going, with the count values, right? I think it's that I turned word wrap off and now nothing is word wrapping; we're printing values twice. Oh, that makes total sense. But it doesn't make sense that we're doing it wholesale. I don't see why we should store it as uint16 everywhere; we should be able to do it as UTF-8. Man, I gotta close some of these. What do you mean by wholesale? When I said wholesale, I meant storing all of the characters as too wide, instead of only the characters that need the space. Although I fixed
that bug, I fixed this bug; this was me figuring out the USB problems. I agree about UTF-8, but as a quick fix in the meantime, maybe find the probably one code point above 255 in the French translation. How one made it... I've got time. I think I'd rather just fix it, because this should improve compression for any translation that's a mix of English phrases and not. It might be like... I think we ran into this with non-breaking space. Doesn't Huffman let you store the characters as three bits if you want, 18 bits if you want? Yes. So what the Huffman thing... and I just closed this file, but nice. So what we're looking at is this one here. It is trading these off: these are the words, the sequences that we want to store, and then these are the individual sequences and characters. So support is being encoded as... and I think the second number is the occurrence count, and then this is the actual binary representation that we're storing it in. Right, this is the decode table. Correct. The strings themselves have the Huffman-coded versions, which is what all these ones and zeros are for each character, but we need some place to look them up. You know, I was wondering when French suddenly became the problematic language instead of German. Exactly, exactly, so that might have been the tipping point, when we got some character like that. Now that we know where the print comes in, we can say max_ord... if max_ord... actually, you know what we want: if ord(c) is greater than 255, print the text. So this will just tell us what it is. And if you're working with qstrings, don't assume that it's going to... I'm just curious to see how much space we'll get back. Oh, look at that. I should probably print it as a comment, but there's the problem. The C compiler is worried about it too. What is this, octal without anything? Multi-character character constant. But yeah, when doing qstrings,
just clean every time, because I don't think we have our build rules set up quite right. Keith says, is this a fix that will go into MicroPython as well? I'm not sure, because MicroPython has done error string compression just like we have, but they've done it a different way; they don't share the same compression technique that we do, so it's up to them to see if they have this problem. LZW on qstrings? Well, these are not qstrings, these are error strings. The qstrings are still not compressed. So, one of our oldest issues: down with 8859, long live UTF-8. I agree 100% with you, Kyle. Okay, so now if we open this file again, we can see that it's this translation. I'm curious to see how much space we're losing just because of that translation, so let's go to the French file, find that translation, and just delete this line... or maybe what I do is... it looks like a fancy single quote, so let me clean now. Oh, look at that, we got like 550 bytes back. That's enough for me to pass. So now it's interesting: now it's 73 words. And the nice quote is General Punctuation. All that just to have a nice quote in one place. It shouldn't... we should be able to have it in there without costing everything else. We should be able to figure out how to just store it as UTF-8, and then I don't need to change the translation. Okay, so I was going to pull up the build file, compression.generated.h. Let's just make this print nicer first, shall we? So compression generated is here. These are file writes, but there's a missing... let's just put two, and then we could wrap this. First and foremost, I'm just trying to pretty this up. When I'm editing this script, I kind of like to put my debug output in the script itself. See, if I don't do a clean, it is redoing something, so maybe it will work. Oh yeah!
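A quick sketch of the two checks from this stretch, written in Python because the generator is Python: list every character above U+00FF (the max_ord > 255 threshold that flips the tables to 16-bit), and compare what a flat words table costs stored as 16-bit code points versus UTF-8 bytes. The sample strings, the word list, and the function names are made up for illustration; this is not the code in makeqstrdata.py, and it ignores the Huffman tables entirely.

```python
import unicodedata


def wide_chars(texts):
    """Map each character above U+00FF to (Unicode name, first text using it)."""
    found = {}
    for text in texts:
        for ch in text:
            if ord(ch) > 0xFF and ch not in found:
                found[ch] = (unicodedata.name(ch, "?"), text)
    return found


def table_bytes(words):
    """Bytes for all words as a flat table: (as uint16 code points, as UTF-8)."""
    as_uint16 = 2 * sum(len(w) for w in words)
    as_utf8 = sum(len(w.encode("utf-8")) for w in words)
    return as_uint16, as_utf8


# Hypothetical French-style messages and dictionary words, for illustration only.
texts = ["Impossible de lire le fichier", "L\u2019argument doit être un entier"]
words = ["Impossible", "argument", "fichier", "doit", "être", "L\u2019"]

for ch, (name, text) in wide_chars(texts).items():
    print(f"U+{ord(ch):04X} {name}: {text!r}")
    # U+2019 RIGHT SINGLE QUOTATION MARK: 'L’argument doit être un entier'

print(table_bytes(words))  # (70, 38)
```

With UTF-8 the fancy apostrophe still costs three bytes (and ê costs two), but only where it appears; with a 16-bit table every ASCII letter pays for it too, which is the point made above.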
So it did regenerate it, which is handy. wlencount... So now we're back to mchar_t being a uint8, which we should always want. Let me just find... C string Unicode. I was just looking at a good reference for this: C string Unicode literal. I'm in Unicode land, I'm not leaving. The py folder is shared between both, that's why I asked. Yeah, just like us, just like me, we like to do things our own way. This is what I was thinking of. Okay, so here's the syntax for string literals. What we could do, instead of having this be a uint8 table, is have it be a string literal, a u8 escape sequence, and then we'd be able to read it, which would be awesome. Being able to read the words would be awesome. And then: UTF-8 encoded string literal. The type of a u8 literal is const char[N] until C++20, and since C++20 it's const char8_t[N], where N is the size of the string in UTF-8 code units including the null terminator. A code point, right? We're not using C++11... oh, I guess that's right, I'm looking at C++ string literals, not C, aren't I? Does that mean C doesn't do that? Character literal. Maybe we do have to just encode it ourselves. String literal, literal constructor functions, that's not what I want. Oh, before C++11 there was no literal for... C++ strings. Where do I find out about C literals? Can I find a reference just for this? This is the C++ reference; is there a regular C reference? Oh, Embedded Artistry is a great blog: character arrays that are terminated by a null character are called string literals; C arrays do not track their own size. I could try to find my K&R. We could add a check to Weblate to not allow characters above 0xFF. That's not going to work; we have Korean, we have languages that are not below 0xFF. I think we have Korean. I want to support languages higher than that. Working with C strings is not
intuitive. I mean, does it work? I guess it won't, because it kind of complained to me already. Declaring strings. I mean, that's okay, we can still do it. Accessing characters. All right, I don't care. I don't think we need to do it the right way, but if we use single quotes, we could do a single character, I don't know. Okay, here's what we're going to do: values_type is going to be uint8 always. We're going to get rid of this bits per code point. Wait, where is this used? bits_per..., estimated net savings. That doesn't need to be broad, that can be... Okay, so we don't use it elsewhere. s must be the string, and 24 is the overhead, so this would really be the length of s encoded as UTF-8, right, if we're actually going to store it as UTF-8. But we're going to store it encoded without the null terminator. I have no need for Unicode in my Swedish translation, so if I used it by mistake, that would be an error. I mean, that's fine if you want to add a check for a given language, but my point is, it shouldn't cost 500 bytes to store a fancy... it shouldn't. Time to start the Rust rewrite. I'm waiting for somebody to make a Rust CircuitPython module. Rust, that would be awesome; that's where I think we should start. Okay, so here we go. That means we can get... encode doesn't include a null terminator, does it not? Python, "hello".encode(). That's an important thing to know. You are correct, thank you, Christian, for correcting me. Okay, so I can get rid of these two things, and max_ord... let's see, in this loop we don't need the max_ord stuff, but we do need... and unused. Hi John Franco. And don't forget to multiply by 8 where you removed the bits per code point. Right, I understand what you're saying, because I was multiplying it by... this is in bits. Okay, so I think we removed all that. So that's all the computation to figure out which one's best, and then we don't need the typedef; our values are always going to be uint8_t. So values must be the table of individual
characters. This is the problem: a value might be more than one character now. What is lengths? There's lengths, and then there's wlencount, the word length count. So this is printing out for each word in words... I don't think we need for c in words, because what we can do instead is for c in w.encode("utf-8"). Would the word length count be the sum of all word lengths? Like half a year ago, when I poked around here... I don't trust my initial thoughts, I'll figure it out. And then, do we have b taken? b for byte, convert it all to... And then, you know what we can do: f.write this, and join all the words together and encode them. Actually, we'll just f.write that. I guess we're printing it out in a different file, but let's see what that does. This probably won't build. Takes exactly one argument. Yeah, I'm not printing it. It is nice that it reruns. So I guess what I'm thinking is, it's treating this as two separate sections, right? I thought we put it behind a... did we print out a backslash or something? Oh, we printed a newline. Well, you know what we can do... we were just talking about this... we can refer to it, and we're gonna have to go in and... oh, you know what, let's not call it encoded_all_words. Okay, so this should fill it out so that it can tell us how long it is, which is nice and convenient. Parentheses matching. This should add... that looks better. So now what we have is a big long thing, but we're not printing a newline, so we need to print one so that our words go on a new line. And this other error is in supervisor/shared/translate.c, line 45, get_word and put_utf8: entries in the words table are guaranteed not to represent words themselves, so this adds at most one level of recursive call. What does... entries in the words table are guaranteed not to represent words. put_utf8, well, that's just because it's not... if word start... right. So put_utf8 is saying, given this character,
but if we're actually getting a word, the word itself is made up of Unicode characters. buffer is a char*, so I think... oh, we're returning size. Entries in the words table are guaranteed not to represent words themselves. How does that make sense? Hmm. get_word for a position: what does get_word do? It gets where it is, and then from that start to the end we read the value and we put it. But the value isn't in... and therefore we add it to the original buffer, which we're not mutating at all. I think word length count of i... it's just filling in more than a single character, right? I think this would all be simpler. word_start, word_end. Is this because you're going from the start to the word end, but you're looping such that the position is less than the end of the word, so you can never have the full word in that loop? The "at present" throws me for a loop, though. I mean, it's really funny that it's reading... so put_utf8 is doing a conversion from a multi... put_utf8 is converting a number that's potentially wider than eight bits, right? So this really should be... well, no, it's an int, because put_utf8 is probably also used somewhere else. put_utf8, yeah, so this is also the thing that's doing the decompression. So here's the decompression, blah blah blah. We tell it how many characters we filled in based on the value, and by calling put_utf8 again recursively it's basically saying... how does it work? I don't know. Like four... so 7F is like 128, right? So it passes the first two through, but then does it have to make it higher? That doesn't seem correct to me. What I would expect is, if we're in the regular compression, we just pass it down. I don't know why this is int. put_utf8 is doing values[searched_length + bits - max_code]. word_start is 128, word_end is 200. So we get a value out. Are there... maybe what we should do is sort this by ordinal. Ordinal, like...
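To picture the lookup being untangled here, a toy decoder helps: a decoded value at or below 0x7F is a literal ASCII byte, a value in [word_start, word_end] is expanded from the words table, and anything else is a code point written out as UTF-8, loosely following what is read out of put_utf8 on stream. Only the 128 and 200 bounds come from the generated header; the word list, the flat Python list layout, and the function names are hypothetical, and this is not the code in supervisor/shared/translate.c. is_ambiguous shows the hole being asked about: in this toy layout a genuine character such as Ç (U+00C7, 199) falls inside the word range and would be expanded as a word. Whether the real generator chooses word_start so that can't happen wasn't settled on stream.

```python
# Toy model of the value lookup; NOT supervisor/shared/translate.c.
WORD_START = 128  # bounds read off the generated header on stream
WORD_END = 200

# Hypothetical words table: WORDS[0] is what value WORD_START expands to.
WORDS = ["impossible ", "argument", "(", ")"]


def put_value(value: int, words=WORDS) -> bytes:
    """Expand one decoded value into UTF-8 bytes."""
    if value <= 0x7F:
        # Plain ASCII passes straight through as a single byte.
        return bytes([value])
    if WORD_START <= value <= WORD_END:
        # A word: expand each of its characters.  Assuming no word contains a
        # character from the word range, this recursion is at most one level.
        word = words[value - WORD_START]
        return b"".join(put_value(ord(ch), words) for ch in word)
    # Any other code point is written out as 2 to 4 bytes of UTF-8.
    return chr(value).encode("utf-8")


def decode(values) -> str:
    """Expand a sequence of decoded values into text."""
    return b"".join(put_value(v) for v in values).decode("utf-8")


def is_ambiguous(ch: str) -> bool:
    """True if a real character's code point collides with the word range."""
    return WORD_START <= ord(ch) <= WORD_END


print(decode([ord("L"), 0x2019, 129]))       # L’argument
print(decode([130, ord("x"), 131]))          # (x)
print(is_ambiguous("Ç"), is_ambiguous("é"))  # True False
```

Keeping words as plain UTF-8 bytes, as proposed above, would make the middle branch a byte copy; the open question is only how the decoder tells word codes apart from real characters.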
Okay, 173 means this word, 138 means two parentheses. I kind of want this list... this list is sorted from least to most, but I'm curious if there's actually holes. length_count, right, so there's two steps to it, is that right? I'm in the weeds. I don't think there's any reason... I don't think it's correct for words to include characters in the range between, right? Does anybody know? Couldn't we incorrectly decompress this then? Maybe I need Jeff; Jeff has looked at this too. Well, I think there's an extra indirection after. I think it's Huffman to value, value to character... or it's Huffman to value, value to something. So we have wlencount, we have words, we have values, and we have lengths. I think there's an explanation up here. I might know French, and they all look like correct words, but I have no clue how they're getting coded. Yeah, there have been a couple of folks in here since I did the initial Huffman version. Yeah: encoded bytes to Huffman... use Huffman to get values, and then the values look up in the value lookup. Yes, and then the value lookup either points to a single character, or it points to something out of the range of characters, which means it's a word, and that word might be a single character. So for text in texts, and unused, max_words, and unused minus 0x80... This feels like there's gotta be some holes in it, right? A single character based on a bit flag. Yeah, it's using the highest bit as a bit flag, essentially. codebook, bit_length, every time a word appears. What I don't get is that words changes size. Words changes size, but both values... values was always... no, there's no bit flags being set here. values, words, wlencount is where the length of the word equals l, so between minlen and maxlen. Are you ever worried that the decoder is going to run out of RAM? I don't
think so. The decoder should be pretty straightforward: when we're decoding a string, we know up front how long it is, so we can allocate the memory for the string at the start, and then we're just copying into it. I don't know. I'm over time, folks, I'm in the weeds, deeply lost, and it's four o'clock on a Friday, so I think I'm gonna wrap it up and file an issue instead. You know, maybe I'll do what Christian, I think it was Christian, suggested two hours ago: I'll just change those apostrophes and file an issue. I think I'll do that, actually. Is that bad? Is that bad of me? Sometimes you just gotta walk away and think about it. 100%, 100%. Okay, let's do that. So let me back out my changes. I'm gonna take a screenshot and... let me file the issue here first. If all the other French messages had bad apostrophes, go for it. Harold says file an issue and have a good weekend, sleep on it. Thank you. Yeah, that was the only one that had it, because I changed it and got 500 bytes back. Here's a pro tip, if you don't want to go through the form: never use the nice apostrophe. Translations with two-byte Unicode characters shouldn't use two bytes for every character. Oh, that's a long title. Thanks, Sadie, Scott, and ladies and gentlemen. Okay: when any character that is compressed is two bytes, all characters in the values and words tables will be two bytes; instead, only the multi-byte characters should take multiple bytes. Let me take a screenshot. Just changed the French version to replace the fancy apostrophes. Okay, should I call that a bug? I'll call it an enhancement, label it supervisor, and put it in long term, but it's probably worth doing sooner rather than later. All right, so we filed the issue. Now let's use Sublime Merge to unblock myself. This is what we're going to keep: this locale/fr.po change, which is what gets rid of those characters. And then all of this stuff I was doing to makeqstrdata, I'm just going to
discard. I don't know that code well enough to take a pass at it, so that's what I want, and I'll just do a quick... I mean, I wouldn't be surprised if Jeff knew about this already. "Oh, cute, the one that should be used is..." I don't know how to input that. An apostrophe unit test? I'm really bad. I would like to see, because I imagine that if we look at the table for other translations... if we look at locale, we have both Japanese and Korean, and those are both going to take... What if I make the Japanese version and then look at that file? So it's uint16 again, and yeah, sure, there's this one and this one that are high, but these, one, two, three, four, five, six, seven, eight, a number of these are not as high; a number of these are still a single byte. So it wouldn't be quite as good, but actually, how long is that? Yeah, so that's interesting: they're mostly high values, so I think this values thing must just be single characters, essentially. And I suppose maybe it's because... I don't know what length is, but maybe they don't have variable widths because then they can index into the table without having to worry about where each entry is. I don't know, it doesn't make a lot of sense to me, which is why I'm punting. Okay, well, this should free me up. "Simpler message to save space." I mean, it's pretty wild to me that the Japanese and Korean builds are okay but French is not. Now we could get into USB stuff. Yeah, we can chat more in circuitpython-dev; I'll probably be at my computer some more. I'm just ready to stop talking, but maybe we'll do USB stuff next week if people want that. USB is always around. Maybe I could quickly nerd-snipe Seattle people. The project that I'm working on, not for work, is called Campaign Funds, campaignfunds.org. We have an election here in Seattle in two months or so, and this is a reformulation of all of the campaign
contribution information for Washington State. If you want to see who the top donors to this candidate are, you can click it; it's slow, but it will give you the results. So now you can see who gave money to whom, and who gave money to somebody in the last few days. And if you want to ask, "What about this person?", you can see where they work, click their name, and pull up their most recent donations. So I'm getting very much into data-science-y land late at night. Bruce says, "Oh, now I remember, only business black funds are hidden." Yeah, it varies a little bit. Generally it's through political action committees, but donations to political action committees are generally reported as well. If you look, there's this Washington Education Association PAC. "Nice, good space for visualizations as well." Yeah, I was playing around with that. The challenge I've been thinking about so far is matching people up, because their names and addresses vary slightly. So that's what my brain has been on: how do I do data cleaning? My partner's a data scientist, and she was just like, "You're just doing data cleaning, that's all you're doing." I was calling it record linkage. So if you want to see that, it's campaignfunds.org, and the code to do it all is on GitHub at tannewt/campaignfunds.org. "If only we could get that kind of funding for the education system." Yeah, Tiger Bite. This is why I was really interested in it: I'm trying to get ahead of the game, because we have an election in two months, and also the people that are currently elected made some decisions around... yeah, sorry, I didn't mean to talk politics, but I guess I am talking politics. Someone in chat wishes we could just have people represent us for what's right and what we need,
not to have to schmooze people. Yeah, so it's really interesting: one thing Seattle is piloting is publicly funded elections. We get these democracy vouchers, we're able to give them to candidates, that gives them money, and that's accounted for in this data. So I actually want to start showing what fraction of this money is democracy voucher money too. I mean, I brought it up; it's campaign stuff, it's fine. We're not talking about individuals, we're talking about the system. But yeah, I was doing all this broadband work, right, and we saw some of the representatives in Washington make poor decisions, and so I was kind of like, why are they making those decisions? Generally the money is probably why. So I wanted to get more insight into where the money was coming from, and we're lucky: Washington State at least has really good reporting requirements, so I could just download this file. Not only does it have contributions, it also has all these expenditures. So this is a PAC; who is the PAC donating to? They're donating to Schaefer for Schools, right? Once I link this up, I'll be able to click here and say, "Oh, you got this money from this PAC," and you can do aggregations and stuff too. So that's that. It's all Datasette behind the scenes, so if you want to see the raw data, you hit "all expenditures." It's slow because it's basically one server behind it, but now it's the raw data from the state, including links to the record at the state level, and you can also do all sorts of SQL queries on it too, which I think is neat. So that's what I've been really sucked into: I want to make profiles for people and PACs and stuff so that you can... it's kind of like
Facebook for campaign finance: here's this actor in the system, where are they giving money, where are they getting money, that sort of stuff. But yeah, nothing's perfect, and in fact I have to email them about an error that I found this morning, or last night. I was trying to look at what the most popular address to get donations from is, and there's one address in Olympia that has given or gotten something like 88,000 donations, from all different names. Someone in chat: "There used to be a site which showed all the board members and how they were linked. I could see the donors linked that way." Exactly, yep, you're right there with me. I would love to be able to see all the donors that give to the same people, or gave at the same time, all that stuff. "Making this stuff more visible to common people is really honorable work." Yeah, it's really building on what the state's doing; you can dig into all of it on the state's site. There's this statewide system, but it's not particularly easy to navigate, and it's definitely not easy to just click from one thing to another. So I just downloaded it in bulk and put it in a SQLite database. On the state's site you have to find a particular campaign: not raising any money, King County council member... The data's there, it's just not that available, I guess is what I would say. If I click Kathy Lambert, I can see all the data about it, and I can see contributions, but I can't go from a contributor name to see how much total they've contributed, and to whom. So yeah, I'm playing around with that. They do have contributions by category, which is great, but I would like to break down this individual category more by how much per individual, like all these thousand-dollar people
versus, like, five-hundred-dollar people. Anyway, I'm pretty in the weeds on that, getting my data science on. I know some folks actually live in Washington like I do, so I thought I'd just try to nerd-snipe them. Talking about one other nerd-sniping thing, going back to the broadband stuff I was doing: I filed a public data request with the City of Seattle, because all of the poles are owned by the public electric utility, and that's where all of the cable companies and such put their equipment. So I filed a request for their records of everything attached to all their poles, and I'm hoping to use that to better determine availability. I paid $1.25 to get it, so they should send it to me today or, well, not tomorrow, Monday. I'm very interested to look at that data too, and as long as they don't put conditions on it, I'll throw it up on GitHub as well. "Yeah, the Freedom of Information Act is very cool, it just takes time." I'm learning that it takes more time, but I'm waiting to see what the data is before I make more requests. Okay, that's enough non-Adafruit work; I did want to plug it. Check this out and let me know what you think. Potentially, if there are folks in other states who know where to get this data, we could apply it to other states as well; a lot of the mechanics of how to aggregate stuff together will be shareable, I think. But yeah, I've been nerding out about campaign stuff, and I need to push a new version because I made these a little bit prettier. Anyway, okay, that's it, let's wrap it up. It's been one of the longer deep dives. Let me go back: "Nerd? I'm proud to be a nerd." Thank you, everybody, this has been a deep dive with Scott. We've talked about a lot of things, and I'll probably have to add more time codes here because I did a very bad job today, so maybe I'll add some more after the fact. I wish you could... I wish anybody could
add them after the fact; that would be great. If you want to support me and Adafruit, go to adafruit.com and purchase something there. They pay me to do these streams, and they pay me to work on CircuitPython and do all this stuff, so thank you to them for that. If you want to join us and chat more, the Discord's a great place to do that; you can get to it at adafru.it/discord. Check that out. Next week should be on Friday as always, I think. Doctor says, "I've spent like a third of my salary on Adafruit; am I an employee yet?" Not quite, that's not how it works. Someone says, "Me, from a grief of hard drive trouble: fun seeing you struggle, we've all been there." Yeah, no problem, I don't mind struggling. I actually had a lot of wins: this USB thing I didn't talk about, I was very worried it was going to take me forever, and then it went pretty quickly yesterday, so I'm feeling good. Anyway, let's go to cat cam since he's up here, and we'll talk to you on the Discord and see you next week. I'll just disturb him. All right, have a great weekend, everyone.
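During this stream Scott traces the compressed-translation lookup. A Huffman code decodes to a value. That value points either to a single character or, when it falls outside the character range, to a word, with the highest bit acting as the flag. He also notes that the decoder knows the string's length up front, so it can allocate once and copy into the buffer.

The Python sketch below models that idea under stated assumptions. The `decode` function, the `WORD_FLAG` constant, and the dictionary codebook are hypothetical illustrations. They are not the actual tables produced by CircuitPython's makeqstrdata.py, and they are not its C decoder.

```python
from typing import Dict, List

# Hypothetical flag: a decoded value with this bit set refers to a word.
WORD_FLAG = 0x80


def decode(bits: str, codebook: Dict[str, int], chars: List[str],
           words: List[str], out_len: int) -> str:
    """Decode a string of '0'/'1' bits into text of known length out_len.

    codebook maps each prefix-free Huffman code to a value. Values without
    WORD_FLAG index `chars`; values with it index `words` via the low 7 bits.
    """
    out = [""] * out_len  # length is known up front, so allocate once
    pos = 0
    code = ""
    for bit in bits:
        code += bit
        if code not in codebook:
            continue  # not a complete code yet; read another bit
        value = codebook[code]
        code = ""
        piece = words[value & 0x7F] if value & WORD_FLAG else chars[value]
        for ch in piece:
            out[pos] = ch  # copy into the preallocated buffer
            pos += 1
        if pos == out_len:
            break
    return "".join(out)


codebook = {"0": 0x80, "10": 0, "110": 1, "111": 2}
print(decode("010110111", codebook, ["c", "a", "t"], ["the "], 7))  # the cat
```

In this model, spending one bit of an 8-bit value on the word flag leaves seven bits for the word index. That caps the word table at 128 entries.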
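The issue filed on stream says that once any compressed character needs two bytes, every entry in the values and words tables becomes two bytes. One rough way to see the cost is to compare two layouts:

- A fixed-width table, where a single code point above 0xFF (such as the typographic apostrophe U+2019) forces 16-bit entries for everything.
- A hypothetical layout where only those characters take two bytes.

The function names are made up for illustration, and the counts cover only the table itself.

```python
def fixed_width_table_bytes(values: str) -> int:
    """Bytes for a fixed-width table: 1 byte per entry if every code point
    fits in uint8_t, otherwise 2 bytes per entry (uint16_t)."""
    widest = max((ord(c) for c in values), default=0)
    return len(values) * (1 if widest <= 0xFF else 2)


def variable_width_table_bytes(values: str) -> int:
    """Hypothetical alternative: only characters above 0xFF take two bytes."""
    return sum(1 if ord(c) <= 0xFF else 2 for c in values)


common = "etaoinsrhl"           # ten single-byte characters
with_quote = common + "\u2019"  # plus RIGHT SINGLE QUOTATION MARK (U+2019)

print(fixed_width_table_bytes(common))         # 10
print(fixed_width_table_bytes(with_quote))     # 22
print(variable_width_table_bytes(with_quote))  # 12
```

The variable-width count is optimistic. A real layout would also need a way to tell one-byte and two-byte entries apart, and it would give up direct indexing by position. Scott speculates on stream that losing direct indexing may be why the tables are fixed-width.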
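Scott describes the hard part of the campaign-finance project as record linkage: contributor names and addresses vary slightly from one filing to the next. One minimal approach is to normalize both fields and group records whose normalized keys match exactly. The sketch assumes records are dicts with hypothetical `name` and `address` keys, plus a tiny hand-written abbreviation table. It is not the campaignfunds.org code.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny abbreviation table for street addresses.
ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "north": "n", "south": "s",
    "east": "e", "west": "w", "apartment": "apt",
}


def normalize_name(name: str) -> str:
    """'SMITH, JOHN A.' and 'John A Smith' both become 'john a smith'."""
    name = name.lower().strip()
    if "," in name:  # treat 'Last, First' as 'First Last'
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    name = re.sub(r"[^a-z ]", " ", name)  # drop punctuation and digits
    return " ".join(name.split())


def normalize_address(address: str) -> str:
    """'123 N. Main St.' and '123 North Main Street' both become '123 n main st'."""
    words = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def link_records(
    records: List[Dict[str, str]],
) -> Dict[Tuple[str, str], List[Dict[str, str]]]:
    """Group records whose normalized (name, address) keys match exactly."""
    groups: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["name"]), normalize_address(rec["address"]))
        groups[key].append(rec)
    return dict(groups)
```

Exact matching on normalized keys only catches formatting differences. Typos or nicknames would need a fuzzy comparison, such as edit distance, on top of this.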
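Once the state's bulk download sits in SQLite, the questions raised on stream become short SQL queries:

- Who are the top donors to a candidate?
- Which contributor address appears most often?
- How many donors gave each total amount?

The table and column names below (`contributions`, `contributor`, `address`, `recipient`, `amount`) are hypothetical stand-ins, not the state's actual schema. The sketch uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[str, str, str, float]


def build_demo_db(rows: Iterable[Row]) -> sqlite3.Connection:
    """Load (contributor, address, recipient, amount) rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contributions "
        "(contributor TEXT, address TEXT, recipient TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO contributions VALUES (?, ?, ?, ?)", rows)
    return conn


def top_donors(conn: sqlite3.Connection, recipient: str, limit: int = 10):
    """Total given to one recipient per contributor, largest first."""
    return conn.execute(
        """SELECT contributor, SUM(amount) AS total
           FROM contributions
           WHERE recipient = ?
           GROUP BY contributor
           ORDER BY total DESC, contributor
           LIMIT ?""",
        (recipient, limit),
    ).fetchall()


def busiest_addresses(conn: sqlite3.Connection, limit: int = 5):
    """Addresses with the most contributions, and how many distinct names use each."""
    return conn.execute(
        """SELECT address, COUNT(*) AS n, COUNT(DISTINCT contributor) AS names
           FROM contributions
           GROUP BY address
           ORDER BY n DESC, address
           LIMIT ?""",
        (limit,),
    ).fetchall()


def donors_per_total(conn: sqlite3.Connection, recipient: str):
    """How many contributors gave each total amount (e.g. $1,000 vs $500 donors)."""
    return conn.execute(
        """SELECT total, COUNT(*) AS donors
           FROM (SELECT contributor, SUM(amount) AS total
                 FROM contributions
                 WHERE recipient = ?
                 GROUP BY contributor)
           GROUP BY total
           ORDER BY total DESC""",
        (recipient,),
    ).fetchall()
```

An address that shows up with a very high contribution count and many distinct names is a good candidate for the kind of data error Scott mentions finding.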