Okay, I actually have my notes here. Are we still doing the good, the bad, the ugly approach? Still on the table? Okay. So the good: I got feedback from the community, so people are actually watching these videos, and I got some compliments on them. I spent some time with one of our members who's very cognizant and up to speed on Precise, and we had a good conversation or two about wake words and training and data sets and all of that. It's refreshing to know there are people out there actively engaged and watching these videos. I also gave him a copy of the model I gave to both Gez and Josh, and he's going to test it as well, so that's good. And I created a wiki page with installation instructions for new models. That was the good.

Everything else was either bad or ugly. The first issue was the data purge. The subdirectory is so large, and it's on a NAS, so it's going to take 26 days. There are 75,000 files that have to be removed, and the rm command takes roughly half a minute per file over the network, which works out to about those 26 days. So the solution, Josh, is if you can get me access to the actual server the NAS drive lives on, rather than having to go over the network, I could probably get it done much faster. I certainly didn't want to fire it off and let it run for a month. A day or two, great; a month, no.
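A faster purge could run directly on the server that owns the disk, so each unlink is a local syscall rather than a network round trip. A minimal sketch, assuming plain files in one flat directory (the function name and layout are illustrative, not from the actual cleanup script):

```python
# Sketch: bulk-delete a huge directory locally on the server instead of
# per-file rm over the NAS mount.
import os

def purge_dir(path):
    """Remove every regular file in `path`, streaming entries with
    os.scandir() so a 75,000-entry listing is never built in memory."""
    removed = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                os.remove(entry.path)
                removed += 1
    return removed
```

Run on the box the drive is attached to, this avoids one NFS round trip per file, which is where the per-file latency comes from.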
So that was one of the bads. I also added an image to the data pipeline wiki, which I'm hoping to get feedback on. I think in pictures, so it's a picture of how the data pipeline should probably look from end to end.

The downright ugly: I got caught in a rabbit hole on Tuesday. I got stuck on a problem with the Precise Studio stuff I was building and ended up staying up until five in the morning working on it. I'm getting too old for that; I wasn't able to get up before 11 the next day, and I was shot. So I've got to try to balance that out. I know what the problem is: I want you to be able to record right from the studio, and JavaScript records in WebM format, an Ogg-style container, and I was trying to get JavaScript to convert it to a WAV before shipping it up. But I can just use ffmpeg on the server and do it there. So I know the solution; I was just being stubborn. That was ugly. And the fact that it's going to take 26 days to purge the data is ugly. I'd like to get access to that server and get that out of our hair today. That's basically it for me.

Oh, and Precise Studio: I moved it from being a CGI that required a server environment to a Python script that runs a simple HTTP server. The objective is that when you check out the code line, there's a subdirectory that has everything you need for training in it. You click on a link and it brings up a web interface, and you can create your models, test your models, add to and delete from your data sets, move stuff around, all without ever having to use the command line. Over the last month or two I've been building up a bunch of command-line scripts to simplify the process, and this is just a manifestation of that.
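The server-side conversion he describes could look roughly like this: accept the browser's WebM upload as-is and shell out to ffmpeg. The 16 kHz mono output is an assumption (Precise trains on 16 kHz mono WAV), and the function names are hypothetical:

```python
# Sketch of converting a browser WebM recording to WAV on the server
# with ffmpeg, instead of converting in JavaScript.
import subprocess

def webm_to_wav_cmd(src, dst, rate=16000):
    """Build the ffmpeg command line for the conversion."""
    return [
        "ffmpeg", "-y",    # overwrite an existing output file
        "-i", src,         # browser upload (WebM/Ogg container)
        "-ar", str(rate),  # resample to 16 kHz
        "-ac", "1",        # downmix to mono
        dst,               # .wav extension selects the WAV muxer
    ]

def webm_to_wav(src, dst, rate=16000):
    subprocess.run(webm_to_wav_cmd(src, dst, rate), check=True)
```

This keeps the JavaScript side trivial: post the blob, let the server normalize the format.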
So that's kind of what I was working on. I didn't make as much progress as I would have liked, but I did get a couple of models into people's hands, and I think I got some positive feedback, at least from Gez. Hopefully I'll get some from Josh and the other community members, so we'll see how that works. But yeah, that's my update.

It sounds like some of the stuff you're doing might overlap with some of the stuff we have planned for the community Precise work, like being able to submit wake words through the web. I want to make sure we're on the same page.

No, completely different issue. This is a personalized studio: an application you basically get when you check out Precise. At some point we could productize it if we want to, but it's a UI over the whole process, so you never have to hit the command line to create a model or train a model.

I guess my point is I don't want to duplicate a lot of code. If I'm going to write something that allows people to submit recorded samples, it sounds a lot like something you're...

I think you're misunderstanding; this has nothing to do with that. This is assuming you have nothing and you want to create a custom wake word; it's a UI for that process. You're talking about my data pipeline.

Okay, I don't know about that, because our data sets are going to be quite large, and the volume is... we can talk about that later, but that was not my intent. Sorry. Yeah, that's it for me.

I requested Charlie and I go next because Charlie has to bug out for something else. Charlie, you there?

Hey guys, can you hear me? I'm on the phone. Do you guys hear me?

Yeah, yeah, we can hear you.

Okay, so here's what I did today and what I've been doing this past week.
I've finally been able to get back into the office and work. We've been finishing the prototypes; I know we sent out four or five, I believe, and today we've been troubleshooting some of the remaining ones, whether it's the connectors or the software, trying to figure out what the issues are with the current units. We've had some problems with connectivity, sometimes they don't turn on, and we've been trying to diagnose what it is about these prototypes that isn't working. In particular, we figured out that a couple of the ports are broken, so we'll probably need to order a few more Pis. But overall, I've spent a lot of time troubleshooting the current ones.

Yes, let's keep rolling with that. So we sent the four out to the project roller team, and they all tested fine. But the other three prototypes that were ready to go exhibited some issues, so I didn't get them out the door. For one of them, I actually discovered what causes it: something we were doing was resetting the power board we're using. It has an adjustable voltage output, and we ended up frying some of the Pis with that, and I believe it actually fried the mic boards as well. We've been trying to recover from that to a certain degree, but it has revealed some needs: we need to be able to tell whether a given problem is a hardware thing or a software thing. One of those things is the audio output. We've sort of known about this, but I'm almost certain I saw it again today: the audio output doesn't work initially on a first boot, and today I think we even saw it not work on the second boot. That just makes it very difficult to determine, okay:
Is this a broken mic board, or is this an audio issue? So one idea I had, and I just created a task for it, was to create something simple: just throw a WAV file on the image. Obviously I can just download one, but including a test file in the image means we can play it outside the Mycroft software and see whether the audio output is working. Some things like that we should be thinking about, because otherwise we keep going through this whole "is it the USB cable, is it the mic board" routine.

There is a sample WAV file in mycroft-core, for the alert sound.

Oh, okay, cool. So we can just do that: play the WAV file with something like aplay and see whether the audio output works.

Cool. So in addition to those devices, we're also building an extra device for Michael, trying to get that out the door tomorrow so you can have it for next week. Outside of that, I've been doing a little bit of design work. I guess I should show you guys: I added some screenshots in Jira with updates on the industrial design of the new SJ-based prototype. If anyone wants to take a look, I've added it to HR-66, industrial design R2.

Michael here. On what Derek's doing: the fact that we can overload the voltage and burn out a board is disconcerting at best. Can't we put a varistor or something there and limit it?

Yeah, this is from the off-the-shelf design. This is a board we're literally buying on Amazon, and it has this little potentiometer
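The "test file on the image" idea can be sketched with the stdlib: generate a short beep WAV and play it with aplay, entirely outside the Mycroft stack, so a silent device points at the audio path rather than the software. The file path and function names are made up for illustration:

```python
# Sketch: generate a 1-second 440 Hz beep as 16 kHz mono 16-bit WAV,
# then play it with aplay to exercise the raw audio output path.
import math, struct, subprocess, wave

def write_beep(path, seconds=1.0, freq=440.0, rate=16000):
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 16-bit samples
        w.setframerate(rate)
        for i in range(int(seconds * rate)):
            sample = int(32000 * math.sin(2 * math.pi * freq * i / rate))
            w.writeframes(struct.pack("<h", sample))

def check_audio(path="/tmp/beep.wav"):
    write_beep(path)
    # aplay exits non-zero if it cannot open the output device
    return subprocess.run(["aplay", path]).returncode == 0
```

If no sound comes out but `aplay` exits cleanly, the suspect shifts to the amp or speaker wiring rather than ALSA configuration.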
that we have to set the voltage with. And I know that we set it correctly; something we were doing was causing it to reset. But once we get them all buttoned up and working and fully tested, I've never had an issue after that. I just discovered what we do in the process that caused a few to reset, which is kind of weird, and definitely disconcerting. But that shouldn't be an issue once we move over to the new board.

Sorry, guys. Okay, I showed this to Ken last night, I think. So this is kind of the front view. The screen is a second module that can be removed to make a screenless version, and we've got four buttons across the top: one for action, one for volume down, one for volume up, and one for muting the microphone. On the back side, we've got a little channel here; the power actually plugs in vertically. It's kind of designed so the board could maximize everything as much as possible on that same board. So yeah, there's a lot of detailing to do; this is really still pretty rough, but I thought I'd give you guys an update. The grill pattern is kind of a placeholder for now; I'd like to work on that. There's not a lot of the finishing yet, like edge detailing, all that stuff. The lights will be in this area up on top, and they'll be in the shape of the Mycroft logo, and what I'm going to try to do is put our logo, kind of embossed, in the center of that. So that's where it is right now. The screen itself is going to be inset a little bit at these edges. Size-wise, by volume it's really going to be smaller than the off-the-shelf design, but a little squatter and wider. So that's it for me.
Oh, and Kevin has been requesting something to put his first board prototypes in, so I'm also working on getting him at least something. It won't be the full form-factor design, but something he can at least throw a board into and be able to do some tests.

Okay, that was a long way to... that's me.

So my good: I spent a lot of time in Confluence documenting some things around the upcoming Precise efforts. The base page for the Precise folder now has an overview of all the different pieces, and then there are links to the specific documents that detail that information. So that's set up. And then I did some work on the design of the first part, which is what we're going to talk about later. I started to write the code a little bit too, just some framework-y stuff, getting a new Flask app up and everything that I need for Precise. So, good progress on that. If we can agree on the design, the coding shouldn't take long for this part; it's really just one endpoint and a database table, which I haven't designed yet.

That's the good. Not really any bad or ugly, just still kind of in a holding pattern here. I haven't been able to get more done recently, but my sister's here now, so she's been able to help me out a bit. Hopefully I'll get more productive, so that's good. And hopefully I'll be out of here sometime in the next week.
One thing I forgot to mention in the good: I'm reviewing a PR right now from Åke that implements a plugin system for some of our services, like the audio service and text to speech, where we have various different potential solutions for each one. The thinking is that a plugin system will help so that people don't have to change core to add a new option, for example a TTS engine or a wake word engine. I've been back and forth with Åke on that a little. It's not quite done yet, but we're close. I think that'll be a win as far as extensibility of core in the future.

Sorry, I didn't get all of that. Let me make sure I understood what you said: are you planning on changing the way that Precise is delivered in core?

Basically, Precise, and I think the other wake word engines, are right now all in the requirements for core, because they're all included in core, all the ones we say we support now. What you're going to be able to do instead is say: I want to add wake word recognizer X and have core be able to use it. With the plugin system,
you can install that plugin, and core will recognize it, using the setup.py entry points and some mechanics there; there's an auto-detect and that kind of thing. I think eventually we'll probably move some of the existing ones out as well, so that everything's a plugin, instead of having one or two included and one or two not. For now, the existing ones will continue to be included in core, and if anything additional comes along we use the plugin system; eventually we'll get to the point where everything's a plugin. I think it's wake word, STT, and TTS, those three things.

Okay, so on the wake word module that's in the hotwords file: one of the first things that Åke and I talked about was that it's just too bloated. It includes every wake word recognizer module in one big old file. At a minimum, it wouldn't take more than a couple of hours to split all of those out into separate modules and then import them. So that's where I thought you were going.

Okay, that's kind of what this is for, except instead of importing them, it's more a plugin system than an import system.

Yeah, that'd be cool.
Okay. Yeah, so at the moment, as Chris is saying, the PR is for TTS and the audio services, but I think wake word has already been flagged as another area where this would be really useful.

Yeah, another interesting possibility is skills: maybe we change how we do skills from imports to the plugin system. It's mentioned in the PR description as something we're punting on; we could possibly do it, and we can talk about that down the line. But right now we have these factories where you can include all kinds of different technologies to do these things, so this will be a nice way to not have to worry about potentially breaking core when you add a plugin.

Go ahead, get started, I guess.

Yeah, so currently it's not built into the build, Chris. We leave the core services that are actually in use in core, and then you can add in additional ones. Chris was suggesting we might change that so that all of the services are pulled out, and then you add whichever ones you do want to use. But it also means that users can add skills that have a dependency on, for example, a particular audio service being available in order to operate, and install that on the platform. So, kind of both, I guess, to answer.

And the plan is to get the base system in now, so that people can start using it before we pull things out. Pulling things out is going to be a breaking change, so we'd have to wait until 20.08. 20.08 is not too far away, so I want to think about how this one goes in the wild.

No, the plan was, well, I'm hoping we'll be able to deploy this before 20.08, but not pull out the existing services.
So you could then add in extra plugins without breaking the existing functionality, and then at 20.08 we can pull out the services that are not essential, which would be breaking for some people.

Thumbs up from Chris.

Yeah, and I'm also going to document that system a bit, so we can actually tell people what it's about before we ship it, which is a good thing. In terms of other stuff: on the TTS side we added Festival TTS support, which most people won't use, but it does provide on-device TTS for a bunch of languages that aren't currently supported. We were particularly getting pushed on that from our Catalan community. We were also working on, or may have already changed to, a neural voice with Tacotron 2, so that's pretty exciting.

I don't think I have anything too ugly, but certainly some bad stuff, well, stuff that's not progressing smoothly. The service readiness work, which will help with what Derek was talking about before: the current plan is to shift the system readiness check to the enclosure. Currently it's in the foundation service, because it's the final thing that happens. Going back and forth, we're talking about having a sort of process status object for each of the different services, so that there's a common interface for checking in with each service about what its status is, whether it's ready or not.

So yeah, Chris, could you get your eyes on that?

The skill API is something that's been around for a while but just never got merged, so I was reviewing that. The idea is that if I've got an alarm skill, I can expose one of my methods as an API for other skills on the device, so they can use an alarm without having to code their own alarm piece into their own thing. So yeah, mostly good.
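The process status object idea could be sketched like this; the class and state names are illustrative, not the actual interface under discussion:

```python
# Sketch of a common readiness interface: each service owns a ProcessStatus,
# and the enclosure polls them all the same way instead of each service
# reporting readiness differently.
from enum import Enum

class State(Enum):
    STARTED = "started"
    READY = "ready"
    STOPPING = "stopping"
    ERROR = "error"

class ProcessStatus:
    def __init__(self, name):
        self.name = name
        self.state = State.STARTED

    def set_ready(self):
        self.state = State.READY

    def check_ready(self):
        return self.state is State.READY

def all_ready(statuses):
    """What the enclosure's system readiness check would reduce to."""
    return all(s.check_ready() for s in statuses)
```

The win is uniformity: "is the device ready" becomes one loop over identical objects rather than special-casing whichever service happens to start last.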
I just found a couple of issues, so there'll be another pass on that. And I've been trying to get a new Qt image packaged up; I busted the script with the mimic2 caching, I think, so I've again been chatting with Åke, who's on holidays at the moment. I'm hoping I'll have one of those images by the end of today. The only other thing is that one of our community members has been doing some TTS training for us using the data sets; since the machine's been sitting there not doing anything, we thought he could have a go at it, so he's trying it out. We're also getting access to some of the original data from the States, so we're still pushing on that, but if that goes well, we'll have a new US female voice at some point.

Hey, did you leave the model I gave you installed, or did you go back to the old one?

Oh no, I'm all the way on the new model.

Did you change the settings, or did you leave them at their defaults, the sensitivity and trigger level?

I left them at what you suggested.

Oh, okay, so 0.1 and 7. So you are using that. Okay, thanks, I just wanted to follow up on that.

I am finding one issue. This isn't the right place for it, but: you can turn on recording of wake words locally, so you store all the recordings yourself, and for some reason they're all super truncated.
That's going to be an issue when we try to use them for training, obviously; instead of getting "Hey Mycroft" you just get "croft". I'm not sure if it's related to the new model or something in core that's changed, so I need to look into that.

Is the truncation leading or trailing?

Leading.

Leading, okay, interesting. You might be getting cropped to just the tail of the wake word, which is what went into the models. I bet that behavior will continue; I believe that was always the way it was. That's in mic.py.

Yeah, I saw that too. Okay.

Nothing significant on the rollout; people are reasonably happy. I did have some good conversations with a couple of potential customers that are interested in doing some interesting things, and hopefully the community will hear about some of them soon.

On the readiness side, I'd like to wait and get some more feedback. But maybe, I was going to say, I was going to add a wake word data button to the Selene site and let them download it.

Yeah, I was concerned about that too, because that data, or that page, probably needs to be part of the data pipeline page; I actually used some of that documentation as a starting point. But the thing with the data pipeline document is that it's huge; there's a lot in there, and I'm hoping to break it up into smaller pieces. So maybe there could be a link to this document in the data pipeline document, because the data pipeline goes all the way from collection to tagging, and I was hoping to address each stage separately as far as how we're going to implement it.

Yeah, that sounds like it'll keep that page more sane; maybe just a better link in there to that page would be good.
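One cheap mitigation for the leading truncation discussed above, while the root cause in mic.py is tracked down, would be to pad saved recordings with leading silence (a real fix would keep a rolling pre-trigger buffer of actual audio instead). A stdlib sketch, with hypothetical names:

```python
# Sketch: prepend a chunk of silence to a saved wake word WAV so a clipped
# onset at least leaves room at the front of the training sample.
import wave

def prepend_silence(src, dst, ms=250):
    with wave.open(src, "rb") as r:
        params = r.getparams()
        audio = r.readframes(r.getnframes())
    pad_frames = int(params.framerate * ms / 1000)
    # one frame = sampwidth bytes per channel; zeros are digital silence
    pad = b"\x00" * (pad_frames * params.sampwidth * params.nchannels)
    with wave.open(dst, "wb") as w:
        w.setparams(params)
        w.writeframes(pad + audio)
```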
Sure, like I did with the Precise top page, where there are links to the more detailed documents. Similar to the data pipeline thing, we just need to figure out how we want to present this information, the best way to use our tools to present what we want to present.

Yeah, I had assumed you'd already read everything I put out there and were basically restructuring it; I saw some of the edits you made. I really don't have many comments on it. I just contribute what I can and then let you guys run with it, so that's fine.

Well, I wanted it to be part of that, and to me there's a separate document for the design of the web page where we're going to collect, so it's probably going to have some sort of link to that. Really I just wanted to say: this is a tool we're going to have that allows us to collect things in a different way. But I didn't want to go into too much detail, because that was going to be a separate design. Maybe I could include more information about that in this document, because it is part of collection, but I haven't given it much thought.

Michael, speaking to your first point: the Precise roadmap, I think, is the scope of the project. There are about six or seven items on the Precise roadmap, and I think I tried to put them in priority order, or execution order.

And Chris, we were talking a little bit briefly before the meeting; now's probably the time to surface it: the index files that I alluded to, and the rationale for them. You're absolutely right that if we decide to go with the database, then they're basically kind of superfluous.
There are technical reasons why they're not entirely superfluous that we can get into, but they're kind of superfluous. You have to understand that my approach with almost everything is that if there's an existing system, I try to layer on top of it without modifying the existing project. The index files were a way to layer additional classification attributes onto the data sets without having to modify the schema, so the existing collection mechanism could stay unscathed and untouched. You'd be able to use the index files to find, say, pitch-classified versus other classifications, whatever attributes, without modifying the schema of the database.

There was one table I was going to add to the database, which would add a path to the files, since the current database doesn't track that and assumes everything is in one big old subdirectory. Right now that's what's causing us the most pain, and it has to change: the files have to be moved out into smaller directories. We can't have directories with a million and a half files in them; it's just ridiculous, and it's what causes the kind of aggravation I'm dealing with now. So the index files were a way to address that as well, along with a separate join table that would have a domain table of paths and an index key into the actual data file table. All of that would be layered on top without modifying the existing system, the intent being you could simply take it, redeploy it somewhere else, incorporate it into Selene, and bada bing bada boom, everything would work. But if we're going to go and change the schema and modify that code, then all bets are off. So that's what the index files were.

Exactly. In other words, it's part of the accountability or reproducibility trail, whatever you want to call it. When you create a model in my world, what you do is give it an index file, and there are tools to combine index files, right?
So you can say: combine these five index files, give me a big old master index file, and that's what I'm going to train off of. Now store that in the model's description and its directory structure, and you know all the files that went into the training of that model. It also helps avoid cross-contamination and so on; they serve multiple purposes. But if you don't mind modifying the schema, and you don't mind constantly updating the domain of things you're tagging, then they're superfluous. I still think they're good to have, but that's just my two cents.

Yeah. So just from a high level, my thoughts are: everything is done a certain way in Selene, as far as how endpoints are created and how the library code is used. So there are going to be some changes; in my mind this is a bit of a rearchitecture as well.

Wait, didn't you tell me that Selene was simply a repository structure, that I could just throw some code in there?

You could just take the code and throw it in there, but it would be kind of a black sheep compared to the way the other Selene APIs are coded.

Okay, I understand; you're trying to keep a synergy across all of the products and all the code across the code base, right?

Yeah. So, I mean, we talked about this a little before, and I don't know what executive management's opinion is on this, but I do want to have a better, holistic architecture and have things done the same way as much as possible, rather than having all these different things that different people did over time just mushed together. And that's kind of why I'm doing what I'm doing.

But isn't the existing tagger and the existing uploader a Flask app anyway? There's a Flask app in a separate repo; bring it into the Selene repository.
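The index-file merge described above could look something like this; the one-path-per-line file format is an assumption, as is the function name:

```python
# Sketch: combine several line-based index files into one deduplicated
# master index, stored alongside the model so the exact training set is
# reproducible later.
def combine_indexes(paths, master):
    ordered = []
    seen = set()
    for p in paths:
        with open(p) as f:
            for line in f:
                entry = line.strip()
                if entry and entry not in seen:
                    seen.add(entry)
                    ordered.append(entry)  # keep first-seen order
    with open(master, "w") as out:
        out.write("\n".join(ordered) + "\n")
    return len(ordered)
```

The deduplication is also what guards against cross-contamination: the same sample listed in two source indexes lands in the master exactly once.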
I'm missing why it would be a black sheep if you did that.

A couple of different reasons. One is that it would be using a different database. There's a whole data access layer that all the Selene APIs share, and that code doesn't use or share it right now.

Can they all use the same database?

Yes. There are different schemas, but they're all in the same database, so this would be a different schema inside the Precise database. And just the way the APIs are structured, from a coding standpoint, the file structures and so on would probably be very different.

Do the APIs take JSON as input?

There are JSON request objects.

Right. We can probably take this offline, but the overarching thing, architecturally, or just from an effort standpoint, and this is the higher-level question, we can figure out the details later: do we just want to take what we have and put it in a different repository, and deal with the fact that there are some differences between that and the way we do things elsewhere? Or do we want a real cohesive architecture? Or do we want to put off the architecture work, throw this over there in the short term, and maybe go back and do it later?

If that's the scope, then I could do exactly what Ken described: just take the code, throw it in the repository, and make it work.

It's probably a time issue, right? If you take it and drop it into the new repository and leave it as is, that would take a day. How long would it take to rearchitect? It probably isn't going to take that long, right?
There's not that much code; there aren't that many endpoints. But it's an overarching question, because we're going to run into this a lot: things were done a certain way before, and maybe we want to architect them differently going forward.

I don't think so, again; there aren't that many endpoints, so for the code itself, the biggest thing is going to be the schema, how we store the data, and how that's going to work. The actual API itself is not that large.

Yeah, and the schema, the schema that we have now: you and I didn't get a chance to talk about this, but did you have an opportunity to review it, the stuff I sent you?

Yes, a little bit. I have a thin grasp on how it's working right now, but again, the data design of the Selene database is very intentionally normal form, very database-y. The way this one is now is not what I would consider a well-designed database. So again, if we want to take what I would consider a poor design and just port it, we can do that, and maybe fix it later, maybe never fix it. But I'm never a fan of letting poor architecture live.

No, that's fine, but two quick questions. Is there something you're aware of on our roadmap that the existing schema wouldn't handle? And in what way is the existing schema poor, in your opinion?

Again, it just doesn't follow the data design methodologies I like to follow.

What does that mean?

Normalized data.

But it is normalized. Adding another table
that just has file names, linking to a table that already exists, is not normalized data.

Now I'm confused. There's a table that has the file name and all of the join values for all of the other tables: it has the index into the wake words, into the tag counts, into the final tag ID. It is a normalized database, and there's no data duplicated across it, except in the file name itself, because that's where the account ID, the model, the time, and all of that are derived from, and that's how the other tables are populated. The ones that Rose put in there are indexes, foreign keys, into every one of those tables. That's why I'm confused about what in that design is so offensive to you.

Well, there's also stuff we need to add. Maybe I need to take a closer look at it, but we also need to add whatever the new classification parameters are.

Right, which would be a new entry in those domain tables, right? But we don't even have to do that if we just use index files, which is where I was going with that. But okay; anyway, I just wanted to understand.
I'm sorry. Okay, it sounds like we're on the same page as far as the scope of the project goes; these are implementation details.

I'll ask about the whole anonymous contribution idea. It's something the community's asked for. My assumption is that, this being a Selene application, it would require authentication like the other pieces. Is it something where we could, like the old tagger, make it available so people can contribute anonymously? Or do you think it's totally outside the scope of what we're doing?

The old tagger... Sorry, go ahead.

Well, anonymous contributions for uploading raw data is probably not a great idea. I thought Gez was referring to anonymous contributions of tagging. That I don't see a problem with, although tagging wasn't completely anonymous either, because we had that leaderboard; we knew at least who tagged how many.

I'm just saying some people might... I was talking more about, like, the Rhasspy project. There are a number of people that use Mycroft without the back end at all, and there are people that have said they would be happy to upload samples of the wake word. They're probably more interested in that other stuff you're working on around being able to develop their own wake words.
So it potentially fits closer into that, which is the continual improvement of our active wake word models. So maybe... I mean, potentially it's also just not worth the effort when we're talking about a small number of users doing this other process, versus the vast majority. Well, you know, 15 percent of our users, or however many are out there. Maybe we're not talking about enough data samples to really warrant the extra effort. But they could certainly manually ship them through, and then we could manually track them.

Yeah, I think for people like the Rhasspy project, it's not that they're collecting their users' data themselves. They want to provide a mechanism for their users to individually contribute their data to Mycroft if they want to do that, for all their users, without having a Mycroft account. Because Rhasspy is a fully offline kind of thing; you don't need an account to use it for anything. But a lot of people have suggested they'd be happy to provide wake word data, and they know we want to develop new Precise models and improve the existing ones, and the way they can do that is to contribute data to us, and they're happy to do that.

Yeah, so let me weigh in on that. I hear a couple of themes in this. Number one, the first question is: do we want to abstract the data collection in a way that allows us to use this same collection mechanism for multiple different forms of data? For example, we're using it to bring in the wake word spotting stuff, but that is literally the easiest problem in machine learning: hotdog, not a hotdog. They made a whole series of episodes of Silicon Valley about it, right?
But we do have other data that we're going to want to collect. Eventually we're going to want the audio utterance, the entire command, so that we can use it to improve the speech-to-text. And we're probably going to want the text of that command as well, so that we can feed it to something that looks like Rasa, which might actually end up being Rasa, so that we can improve the natural language understanding. And then finally, we may at some point have a desire to take end-user utterances and use them to clone voices. So we may actually grab those audio samples, tied to the text, so that simply by using Mycroft over a period of time, we gain the ability, or we give you the opportunity, to replicate your voice.

All of that data really belongs, in my mind, in the same repository. I don't know how it's organized within that repository, but there should be a "your data" page inside of Mycroft, inside of Selene, where people can go and see their wake word utterances, see their commands, see any audio or text transcriptions that we have for them, and then have control of that data, to either delete it or not.

As part of that page, I think it's very appropriate for us to allow the Rhasspy community, who would indeed have to create an account to do this, but wouldn't have to use that account every day, to participate. When they decided they wanted to upload 50,000 samples, for example, they could go to the Mycroft, the Selene, website, log in, and then we would have a bulk upload tab or something where they click the button and upload 50,000 utterances. And frankly they want that, right? Because the tagging stuff can indeed be done anonymously, although we should think that through carefully: we are doing double tagging, but it's not all that hard, given the volume of users, for people to create two anonymous accounts and deliberately reach in and screw our data up.
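The "one repository for every kind of contributed data, owned and deletable by the account holder" idea above can be sketched as a small data model. All class, field, and path names here are hypothetical illustrations, not an existing Selene API.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class DataKind(Enum):
    WAKE_WORD_SAMPLE = auto()   # short wake word recording
    UTTERANCE_AUDIO = auto()    # the full spoken command
    UTTERANCE_TEXT = auto()     # transcript tied to the audio
    IMAGE = auto()              # e.g. face data from a Mark II camera

@dataclass
class DataItem:
    item_id: str
    account_id: str             # the owner, who can view and delete it
    kind: DataKind
    payload_path: str           # where the blob lives
    transcript: Optional[str] = None

class DataRepository:
    """One store for every kind of contributed data."""
    def __init__(self):
        self._items = {}

    def add(self, item):
        self._items[item.item_id] = item

    def items_for_account(self, account_id):
        # What a "your data" page would render for one user.
        return [i for i in self._items.values() if i.account_id == account_id]

    def delete(self, account_id, item_id):
        # Owners can always remove their own data; nobody else can.
        item = self._items.get(item_id)
        if item and item.account_id == account_id:
            del self._items[item_id]
            return True
        return False

repo = DataRepository()
repo.add(DataItem("u1", "acct-42", DataKind.WAKE_WORD_SAMPLE, "/data/u1.wav"))
repo.add(DataItem("u2", "acct-42", DataKind.UTTERANCE_AUDIO, "/data/u2.wav"))
```

A bulk upload is then just many `add` calls under one account, which keeps the provenance and deletion story identical to device uploads.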
Right. Always assume there's somebody out there being vindictive. So we probably want to tie the tagging activity back to an individual account, even if we don't know who's on the other end of that account. And in terms of bulk uploads, they want that, because someday they might want to withdraw that data. If we're taking anonymous data from the Rhasspy community, how do we know that data came from somebody who signed the end-user license agreement? How do we know what the permissions are? It's just a random chunk of data; we don't know what the hell to do with it, and it gets away from our core principle of giving the end user control of their data. So a bulk upload scenario is pretty similar to any of these other upload scenarios.

So I can see five separate categories of data that we're going to want to track immediately. Wake word spotting is number one. The actual audio samples is number two. The audio sample tied to the text example is number three. Bulk uploads is number four. And manual data entry is number five, and by manual data entry I mean going to the Selene website and recording your wake word recordings through the website, not through an end device. And the fifth category actually would encompass the other four: they could submit wake word samples, they could submit full speech commands, they could tie those speech commands to text. So I guess it incorporates the other three. And so that becomes a full-on application within Selene that allows end users to control their data.

So the question I heard Chris V ask is: do we want to build this now?
And my response to that is probably yes. In my view, this application, this code, goes to the very core of what we're doing as a company: deploying a useful piece of technology, using that technology to collect data in an ethical way, tagging, categorizing, and manipulating that data so we can use it to improve the technology, and then using the improved technology to further expand our reach and the technology's usefulness and grow the community. That becomes this hamster wheel that Chris G is always talking about, this self-reinforcing wheel that drives the company forward. I think we've been down this path too often with "hey, let's take on some technical debt and just bang it out." If we can build that application in a reasonable time frame, I'm a big fan of getting that done.

Yeah, so I guess, Chris V, that answers your question, as long as Michael agrees with that assessment.

That's not my call, that's Michael's call. That's just his opinion.

Yeah, I mean, right now, from what I heard from Josh, there are like five components, of which two are already done, and the trade-off is: does Chris just take the two as-is and put them in the same repository, or does he look at the entire five components and see if it makes more sense to abstract this stuff out? In other words, is it going to be better if Chris spends the time up front so the next four modules come faster? Because that's kind of where I think he's going: if he can re-architect this and make it more modular and more streamlined, he's got a chance to re-architect the first two, and the next three will be cookie-cutter, is what I think I'm hearing.

Yeah, and not only that, but the old tagger is in React, and all of our Selene stuff is in Angular.
So at the very least that will have to be reworked. And there's also stuff we need to add to it anyway, so I'm not too worried about that. And the contribution website, our contribution page, is a brand-new thing, so it doesn't really matter; that's yet to be built. So really what it comes down to is the existing things we have and what we do with them. Do we spend a little time cleaning it up, or do we just...

And let's abstract that stuff away, because there are other things. Once we have all that data tagged, for example, we can empower users to use biometrics. Once we have enough audio data for an individual end user, we can tie a specific audio sample back to that user, if they want us to. I think there are two other things we should be thinking of, because our friends at Amazon goofed this when they deployed Amazon Prime Video. First, I think we should consider family members as part of this abstraction. I have an account, but the audio samples flowing into my account probably encompass my wife and my children as well, so we probably want some mechanism of dividing those out for that end user. I'm going to throw that in the chat too.

And second, I think we want to be data-agnostic in how we deal with this stuff. Another thing we're probably going to want eventually is pictures of the end user, because the Mark II has a camera and there are scenarios where... and I want to highlight this, because across the board this is the policy: only if the end user wants to use the feature. The data belongs to them; they control it.
They consent to it. We're building features that incorporate a lot of potentially very privacy-intrusive technology, but we're doing it in a way that allows people to control whether or not they use it.

So now let me step to facial recognition. The Mark II should have a camera on it, and so there is a scenario where the end user wants to tie visual data back to their account. So that Christopher Rogers, who has a fantastic name, when he's wandering around his magic home using his wake-look technology (if you remember that demo where you can just look at the Mark 1 and it puts it in listening mode), would be able to have it not only work for whoever's in the room, but tie that individual face back to a specific account. So that when I look at it and say "play Spotify," it plays... what was I listening to the other day? Oh, George Michael. It plays whatever George Michael album I was listening to last. I did just admit that on video. Yeah, George Michael was in there.

But if my daughter looks at it, it plays whatever... I have honestly gotten so old I can no longer name any of the artists she listens to. Whatever the latest Taylor Swift album is, right? And there are probably tens, dozens, potentially hundreds of other data points that we're going to want to collect and categorize. So I think we should abstract it away in a way that allows us to tie a generic piece of data back to a specific account, and then tie validation related to that data back to specific accounts. An individual piece of data, even today with our applications, may be tied to three separate accounts, or four: I contributed it through my device, right? But it's actually data that's tied to my daughter.
So that's the second identity. Then somebody in the community, Derek, tagged it the first time, and Chris G tagged it the second time. So that individual piece of data has actually been touched by four people, and we're probably going to want to keep track of that, because if we find out that Derek is deliberately reaching in to miscategorize data, and he's in cahoots with Chris G, we want to be able to back those tags out, right?

Well, we're small; overlaps are going to happen at our scale. I mean, we haven't had any bad behavior there, but the whole reason we have security problems on the internet is that everybody who invented it assumed everybody would be on good behavior. So let's just assume that at every step of that process the end user is malicious, and build it with that in mind.

Yeah, one thing I wanted to say on this topic is that, to me, consistency and cohesion are important when we talk about having to maintain this stuff over the long term. If we can be consistent about how we do things... In the past, everyone in their own silo did things their own way, and that's why we have all these very differently architected things sitting out there. I'd like to get to a point where everything is thought through enough that when you see a piece of code in Selene, you know exactly what you're looking at and where to look for it.
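The audit-trail requirement described above, where every tag is tied to a tagging account so that tags from colluding accounts can be backed out later, might be sketched like this. The class and method names are hypothetical, not existing Mycroft code.

```python
from collections import defaultdict

class TagAudit:
    """Record every tag with the account that made it, so that tags
    from bad-faith accounts can be backed out item by item."""

    def __init__(self):
        # item_id -> list of (tagger_account, label)
        self._tags = defaultdict(list)

    def record(self, item_id, tagger_account, label):
        self._tags[item_id].append((tagger_account, label))

    def back_out(self, bad_accounts):
        """Remove every tag made by the given accounts; return the ids
        of items that lost tags and now need re-tagging."""
        reopened = []
        for item_id, tags in self._tags.items():
            kept = [(a, l) for a, l in tags if a not in bad_accounts]
            if len(kept) != len(tags):
                self._tags[item_id] = kept
                reopened.append(item_id)
        return reopened

audit = TagAudit()
audit.record("clip-1", "derek", "wake-word")
audit.record("clip-1", "chris_g", "wake-word")   # second (double) tag
audit.record("clip-2", "kathy", "not-wake-word")
reopened = audit.back_out({"derek", "chris_g"})
```

Because the accounts can still be anonymous, this preserves the policy stated later in the call: contributions are accepted anonymously, just never without an account to attribute them to.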
It'll help the maintenance cycle going forward. So that's kind of where I'm coming from.

Yeah, but Josh, you have to realize that, to the best of my knowledge, at least from the two pieces of code associated with this process that I've seen, none of that auditability currently exists in this code base.

Yeah. So if we're going to spend $150,000 doing this, which is what it'll cost, right, that's what two months of Mycroft costs, about 150 grand, let's make sure that out the other side pops something that is maintainable and doesn't carry all this technical debt. The past is history, so let's work on being consistent going forward. And you're right: the original version, which I've said several times I hacked together in an afternoon, yeah, it's broken; it was just intended to be a stopgap. Now that we're moving it into a real production piece of software and thinking it through, let's do that carefully.

Yeah, that guest thing is a really important one too. That's something the other platforms just totally don't want to touch. And I'm assuming the kid thing too: there are special requirements when you know you're collecting data from kids, and so I think they just go by the route of "no young people are using our software," which is clearly false, because it's got, like, games and stuff.

Oh, I think their attitude is probably much more along the lines of: we're so big, we don't care if the law says we can't collect kids' data. If you want to do something about it, we'll happily pay your fine as a cost of doing business, right?
But remember that the bigger platforms have also made a lot of progress at tagging by matching things biometrically, so they may be able to suss some of that stuff out.

That's the last thing. I added all the line items I talked about into the chat. The last item, if we can do it, would be great: if we have an utterance and we can figure out what actually triggered on that utterance, and grab it, obviously with permission, that would be very, very helpful in sussing out the NLU. Knowing that when I said "play Huey Lewis and the News" it triggered the news to play instead of Spotify playing the early-'80s band would be very helpful for resolving issues inside the NLU stack. So it may make sense for us to grab the resulting behavior, if we can figure that out.

This is part of the anonymous contribution question; it sort of semi kicks it off as well. It sounds like we can deprioritize it; it's not going to be built into the stack at any point in the near future. So if community members want to provide data from outside of Mycroft, they need to collect it up and do the bulk upload, or create an account.

And remember, we're not doing any account verification there. So if they want to create the account using Tor, create the username "anonymous anonymous anonymous," and submit that data through that anonymous channel, that data is still anonymous, right? It just ties back to an account. So if the data is total crap, we can reach into the system and say: delete all data that was submitted by this user, because this user is submitting a bunch of malicious data. So it's not that we won't take it anonymously; it's that we won't take it without an account.

I'm sorry, I was muted again, goddammit.
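The "anonymous, but never account-less" policy above amounts to one operation: purge everything a given account submitted. A minimal sketch, assuming a simple list-of-dicts store with a hypothetical `submitter` field:

```python
def purge_account_data(items, malicious_account):
    """Drop every item submitted by one account. This is possible only
    because even 'anonymous' uploads are tied to *some* account, even
    one created over Tor with a throwaway username."""
    return [item for item in items if item["submitter"] != malicious_account]

uploads = [
    {"id": "a1", "submitter": "anonymous-tor-1"},
    {"id": "a2", "submitter": "rhasspy-contributor"},
    {"id": "a3", "submitter": "anonymous-tor-1"},
]
cleaned = purge_account_data(uploads, "anonymous-tor-1")
```

With no account linkage at all, the bad samples would be indistinguishable from the good ones and the whole data set would be suspect.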
How do you discern that somebody is a guest, you know, that didn't sign an agreement? That's...

Oh, so they could go into their data and say, "oh, this was so-and-so," and delete it. Thanks. Okay, that's the part I was a little fuzzy on, whether we have to do that.

What, right now, if somebody walks in and says the wake word and triggers an activation, do they need to have signed an agreement? It's the same as Google Photos: if I upload a bunch of photos, they're able to tie all the pictures together and tell me that this face is in these 1,500 pictures, but they don't know who that face is until I type a name in. Eventually, if we're doing our jobs right, we should be able to go through and say: there are 15,000 utterances in your account, and 98% of them are from these four users; can you give us a name and an age for each one of these users? So that we can nuke the ones it's inappropriate for us to keep because the speaker is under age. Or so that we can tie them back to... you know, name, age, and gender would be really helpful. Actually, name, age, gender, and dialect would be an enormously helpful contribution from our community, because then we could say, "hey, this is somebody who speaks Indian English, is 41 years old, and is female," and we could use that to improve the speech recognition. So we can eventually go through those utterances, categorize them, and tie them back to a family member, and then it's up to the end user, as always, to determine what they want to do with the data, if they trust us to use it for the benefit of the overall technology stack.
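The family-member step above, where the account holder assigns a profile (name, age, gender, dialect) to each speaker and under-age samples get nuked, can be sketched as follows. The cutoff age, field names, and `partition_by_age` helper are all hypothetical; the actual legal threshold would depend on the applicable regulation.

```python
from dataclasses import dataclass

@dataclass
class SpeakerProfile:
    name: str
    age: int
    gender: str
    dialect: str      # e.g. "Indian English"

MIN_AGE = 13          # hypothetical cutoff for keeping samples

def partition_by_age(utterances, profiles):
    """Split utterances into keepable vs must-delete, based on the
    profile the account holder assigned to each speaker. Unassigned
    speakers are treated conservatively and marked for deletion."""
    keep, nuke = [], []
    for utt in utterances:
        profile = profiles.get(utt["speaker"])
        if profile is None or profile.age < MIN_AGE:
            nuke.append(utt)
        else:
            keep.append(utt)
    return keep, nuke

profiles = {
    "parent": SpeakerProfile("Parent", 41, "female", "Indian English"),
    "child": SpeakerProfile("Child", 10, "female", "Indian English"),
}
utterances = [{"id": "u1", "speaker": "parent"}, {"id": "u2", "speaker": "child"}]
keep, nuke = partition_by_age(utterances, profiles)
```

The surviving profiles then double as the demographic labels (age, gender, dialect) that make the retained samples useful for improving speech recognition.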
If they don't, they'll have the power to delete it and opt out, and then we won't even have the data in that case.

Okay, so from what I'm hearing right now: as I build it, we're going to assume that people have an account with us for this data, whether through a device or through using the Selene application?

Yes.

Okay. And this will be part of the account page they go to, and they should be able to see all the data on their account, and we should be able to provide them with the ability to tag data, right? And earn points of some kind.

Yeah, but that's further down the road. My current focus is just getting Precise data tagged and possibly contributed to, and everything we're talking about now that's specific to Precise will require an account.

Yeah, you have to have an account. We probably want to do the family-members thing now, since we're collecting the data, and then we need a tool for people to tag the data. The rest of this can probably wait, as long as when we build it we abstract it away enough that we'll be able to add other data types and other tagging workflows to the system.

The other thing you may want to do, before we go and reinvent the wheel, is spend half an hour on Google and see if somebody has already built this entire system for us, and we could just plug it the hell in. It very well may exist out there, because there are a lot of companies solving the same problem, even the wake word part. The problem of collecting data from users, tagging it with supervised learning, categorizing it, and storing it: there are tons of companies doing this out in the machine learning space.
So there may be, hell, there may be an open-source framework that has all these abstractions already thought through and is available as an API.

It would still have to be modified to fit into the Selene infrastructure, is what I'm hearing.

Yeah, it probably would, but if somebody else has a wheel, I would rather not invent it over again.

I will look. Yeah, so my approach was going to be, since the contribution web page is brand new, and part of the reason it's not sussed out in the document I wrote is because it's brand new, I was going to port the stuff we have first: get the device-based stuff working with Selene and get the tagger working in Selene, because those are known entities, and then the last thing would be building this brand-new contribution system, which is why it isn't really sussed out in the design document. I could approach it differently and just get contribution done completely, get that section finished so we can say "this is how you contribute data." That's fine too. I just want to tell you how I was approaching it, and if you want to tell me to approach it differently, I'm happy to do that.

Okay, I'll give that some thought. And maybe tomorrow we should spend some time talking about this particular document, how it differs from what you've put together so far, and what we want coming out the other end.

Sure. When it's convenient for you, ping me.

All right. Go team. All right, guys. Dr. Sam, you're with me. See you.