Okay, welcome. It is July 20th, and the Mycroft DevSync starts now. So, welcome everybody. Today, as we discussed at our last meeting, I want to try doing things a little differently and hopefully be more efficient. I'd like to start by going through the tickets, but in a very specific way. We used to do this at my last couple of companies and we called it the good, the bad, and the ugly. Rather than going ticket by ticket, everyone gives a quick update on what's going well, what's bad, and what's really ugly. The distinction between bad and ugly basically boils down to this: bad is just bad stuff, you didn't get done what you wanted to get done, or you had to run to the hospital or something like that. Ugly is stuff that is blocking you from getting work done; if there's something you can't make progress on for some reason, or you're not going to be able to finish it in time, that's the time to bring it up. So let's go through everybody. We can reference the Jira tickets, that's fine, but I don't want to go through them item by item. Just highlight what you did get done, what didn't get done, and whether there are any real problems going on. We'll keep discussion to a minimum while people give their status, and then afterwards we'll discuss the things that came up, prioritize them, and limit the scope of that discussion. That's it. So let's go with Ken first.

The good was that last Thursday I listened to my marching orders and began investigating how to start tuning our hyperparameters. That required me to figure out what they were and whether they were accessible, which they were not. I'm sure there are more in there, but the top ones to surface so far are epochs and batch size, which are kind of related, the dropout rate and recurrent dropout rate, sensitivity, and the recurrent units, which are basically the long short-term memory gates. Those all have variable values we could be using. Obviously we're trying not to enumerate every value, so I came up with what I believe are common-sense choices for some of them to get started. I can certainly see in the future maybe even using a machine learning model to control the hyperparameter tuning, so we could be AI squared. That said, I did some initial testing over the weekend, because I had to modify some of the Precise code; none of these parameters except for epochs were actually accessible. Now they are. I added some command line parsing and such to the training script. Anyway, that's where I'm at. I'm getting ready to put the code up on the Lambda server and fire off some hyperparameter tuning runs. They'll probably take many hours, maybe the better part of a day or two, and when I get some initial results I'll post them next Thursday or earlier and share them so people can maybe offer some insight into what we're seeing. Once I kick that off, like I said, I don't need to sit there and babysit it while it's running; I just need to get it started.
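For reference, here is a rough sketch of what exposing those hyperparameters on the command line and sweeping a sampled subset of the combinations might look like. The flag names, value ranges, and the random-sampling cap are illustrative assumptions, not the actual Precise training interface Ken modified; the loop below only prints the combinations it would run.

```python
"""Illustrative sketch of a wake word hyperparameter sweep (placeholder values)."""
import argparse
import itertools
import random


def parse_args():
    p = argparse.ArgumentParser(description="Sweep wake word training hyperparameters")
    p.add_argument("--epochs", type=int, nargs="+", default=[60, 120, 300])
    p.add_argument("--batch-size", type=int, nargs="+", default=[32, 64, 128])
    p.add_argument("--dropout", type=float, nargs="+", default=[0.1, 0.2, 0.4])
    p.add_argument("--recurrent-units", type=int, nargs="+", default=[20, 40, 80])
    p.add_argument("--sensitivity", type=float, nargs="+", default=[0.2, 0.5, 0.8])
    p.add_argument("--max-runs", type=int, default=100,
                   help="cap the sweep so a batch of runs finishes in under a day")
    return p.parse_args()


def main():
    args = parse_args()
    # Full enumeration explodes combinatorially, so shuffle and take a capped sample
    # (random search) instead of walking the whole grid.
    grid = list(itertools.product(args.epochs, args.batch_size, args.dropout,
                                  args.recurrent_units, args.sensitivity))
    random.shuffle(grid)
    for epochs, batch, dropout, units, sens in grid[: args.max_runs]:
        # A real version would kick off a Precise training run here.
        print(f"would train: epochs={epochs} batch={batch} dropout={dropout} "
              f"units={units} sensitivity={sens}")


if __name__ == "__main__":
    main()
```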
Then I'll get back into the pipeline process and handling the ability to delete anybody's data that we need to delete. I'll leave it with this: Gez asked a question about what that means. Does it mean we completely eradicate the data, in which case we could break some relational stuff, or does it mean we flag it as deleted? What I'm leaning towards, once I get the database back in sync with the pipeline process, is adding a new column to the database that says this person opted out and their data has been deleted. So I would physically delete the files from our system, but keep an entry in the database recording that they used to exist and have now been deleted; there'll be a column that indicates that. Gez, that's how I was leaning, unless anybody has better input. That's what I'm planning to move on to as soon as I get these hyperparameter tuning tests fired up in the morning. That's it; nothing is blocking me, I'm fine. The only bad would be that the enumeration of all these values is in the millions, and I'm trying to figure out a way to keep it reasonable, maybe a quarter of a million per run. I've introduced some parallelism to help us step through that, and I'd like to keep runs under 24 hours if possible, so I'm leaning towards that.

Alright, thanks Ken. I took down a couple of things we could talk about: the data deletion and its priority versus the hyperparameter tuning, and also the goals of the hyperparameter tuning, though maybe that's something we can do offline. How do we know when we're done is what I'm trying to get at there. Okay, let's go with Kris Gesling.

Cool. Yeah, I spent most of the last few days documenting, actually. The questions around open data re-triggered the fact that we don't really have an easy way to read what that means, and it's something I've been meaning to document for quite some time, so I've started that. I'm keen to keep discussing it and make sure it is 100% accurate before we push it out into our actual documentation.
Mostly good otherwise. The testing stuff is just continuing every time I see CI runs, and that's been going well. The other piece is looking at all of the boot-up stuff and ensuring that the device is actually ready when it says it's ready, which has also led me into starting to document the actual Mycroft Core processes a bit more, because currently the best thing we have for that is a four-year-old video from Steve about how he was going to design it. So yeah, getting into that, because it's not an area of Mycroft I've dealt with a lot before. Generally pretty good; documentation takes time.

Alright, thanks Gez. Chris, you're up. The good is that I have a new version of the Pi 4 image. Basically I took the results of our last bug fix sprint — I also completed that sprint and started bug fix sprint two — and built a new image out of it, and that should be uploading right now. I also started a new versioning scheme for the images that's more like some of our other naming conventions, so this is the first image of July 2020, 2020.7.1, and I renamed the last image to 2020.6.1. I'm also getting ready to publish release notes for each of these image releases so we can tell what went into each, and if something goes wrong, maybe backtrack and see where we screwed things up from image to image. I will publish that soon and put it in the Jira ticket I'm using to track this new release of the Pi 4 image.

The bad is that it took me a long time. Since the branch we use for this device in core is now up to date with the dev branch of core, but all the skills in the build I started with were still on 19.08, I took some time and upgraded all the skills on the device. That broke some stuff, so it took me a while to get through those upgrades, figure out what was breaking, and get it all working again. I updated the instructions for building the image to reflect this. Hopefully I'll get to a point where I can start with the last image and build upon it rather than starting from scratch, but a good part of Friday went into getting this image up and running again. It is up now, and it has everything in it.

The ugly is that I tried to update our WordPress instance last week. There's a droplet out there called maintenance-page that I figured was what we used last time to put up a maintenance page, but there doesn't appear to be anything on it, so I'm really not sure where we're keeping that maintenance page right now; I don't remember. So, Gez, if you remember anything about that, or anything that would help me get the maintenance page up so I can resize that droplet, that would be awesome. Right now that's on hold until I can figure out where that maintenance page lives. That's about it.

Okay, thanks Chris. Let's see, I guess Derek is next on my screen. Okay, so I'll let Charlie talk a little about project rollover when I get to him. All I can say on the good side is that we've been continuing to print parts and we're about 85% done with that. The ugliest thing on that side is that we haven't really done as much heat testing as I would like, so once Charlie is in the office this week we're going to do more of that. On the prototyping side of things, for the first prototype of the Mark II with Kevin: at the end of last week we posted the repo, and since then Kevin has mostly been working on quotes.
We've got several good-looking options there. Meanwhile I've been making more progress on the industrial design, so that's mostly where my time has been spent. We only had the blocking file that we shared with the community; we really didn't have the final plastic design, so I'm going through and working on the detailing of that. Hopefully I'll have something concrete to share by the next meeting on Thursday. As far as bad and ugly over there, I feel like everything is going pretty well. The one thing Josh wants to talk about is the Wi-Fi setup; there are some aspects of that to discuss. Also, I was thinking about prototypes, and I mentioned at the beginning of the call that I've been thinking about the value of making more of the off-the-shelf style prototypes while we're also moving towards the new design. I should use the correct terms here: for the new design, the whole thing is the dev kit and the board itself is the SJ201, and I've renamed the existing prototypes OTS, for off the shelf. The value of making more OTS prototypes in the short term is having devices available for the people who don't have one, especially Ken. It's still going to be several weeks, even with the best quotes, to get the SJ201, and then some time to bring it up and test it, and it may not even work the first time; that's to be expected. So that's something we should discuss. And personally, once we're done with this phase of project rollover and move to the next phase, I'd like to move to SJ201-based designs and put the form-factor version of the OTS design to rest, at least, and be done with it. If we do put that to rest, there are some floating out there we could collect parts from; something to think about.

Okay, thanks Derek. So, Charlie, Derek mentioned that you might have some updates for us. Not too much on my end. I've been doing the same stuff this past week, and Derek's been dropping off some parts for me to cut out. I still need to update the spreadsheet Derek gave me, so I'm going to be doing some counting here in a bit, but everything I've done looks good for the most part. I really need to see Derek; I'm not sure what part it is exactly, but some of the casing I mentioned snapped, and I can actually check right now to see whether the glue worked or not. That's the one thing that really went bad on my end. Derek's been bringing over the housings and I've been cutting those out, and when I get in tomorrow I imagine I'll help him with some of the heat testing. We still have not completely assembled any of the units, but I'd imagine once we get in tomorrow we'll get started on some of those. Overall, most of the stuff I need to get caught up on really starts tomorrow. I did go ahead and assemble one that we started, but we have parts to do many more; we should easily be able to do five.

And what happened with your health scare there? Everything is good, thank God. I'm glad I took the extra precautions, because where I'm at right now it's obviously not the worst thing ever, it's not like some cities, but at the same time it's just been going around a lot here and I had to take those precautions. I'm glad to see that everybody in my family and my friend group
is doing okay, and that's just something very good for me to see.

Josh, do you have anything you'd like to update us on? Not really on the software development side of things. I did go ahead and try to get that image working on my existing Mark II, and that did not go as planned. Other than that I've been on the business side of things; I did have a meeting with our friends on the rollover side, and I forwarded some notes about that to the rest of the team. Okay, alright, well, thanks everybody. I have not done any software development work, so no update from me.

So here are some notes I took on things we could talk about. Ken mentioned data deletion, so I think we should have a brief discussion about the priority of that vis-à-vis the other work he's doing. I'd also like to talk about priorities in general. Josh mentioned that he tried to update his Mark II prototype and it failed, and that set off a series of emails, shall we say. So I think we should have a discussion now about our overall priorities. We're talking about doing bug fix sprints right now, although Ken's work you could classify as either a bug fix or a performance improvement on the wake word side; it's really both. And then there's this issue that came up more recently of the boot process, Wi-Fi setup, and that sort of thing, but also the update process in general for the Mark II, which we've kind of punted down the road. I wonder if now is a good time to start talking about when we put that back on the roadmap, when we make the development of a reliable update process for the Mark II our priority. Obviously, if we're going to have these things out in the field, we need to be able to update them, and right now it's all manual: Chris makes us an image, you download it onto an SD card, stick it in the slot, and hopefully it works, and sometimes it doesn't. So we need to resolve that process. And I guess that's really two issues: being able to update reliably, and the Wi-Fi setup, which are completely different issues. The boot process is related to the update process in that if we apply a bad update we need to be able to boot anyway and have some sort of catastrophic recovery process. I think that can be a lower priority, because it's devs working here, right? If there's a catastrophic failure, the fallback plan is fine: we flash an SD card and update it that way. But I do think we should get onto the roadmap a process whereby we can update the Mark II dev kits over the internet. Does anybody have any thoughts about that process and what it's going to entail?

Yeah, I just want to footstomp that we spent a lot of time from June to December working on setting up the Mark II over and over and over again. We had Kusall come in and record the actual audio prompts so we didn't have to use speech synthesis and it could sound super natural, which of course is people's first impression of our product. We've done a lot of work around the graphics, and a lot of work around stability, and then we shifted over to the Pi 4 and just kind of set all that aside. At the end of the day, for us to ship a product that people are able to use doesn't necessarily require us to have the best music player skill or the best news skill. What we do have to have is the ability for somebody to take that device out of a box and connect it to the network.
And then on the wake word side of things, the wake word stuff needs to work fairly well, so that people can activate it both when the room is silent and through barge-in, and so that it doesn't inadvertently activate all the time, which is both annoying and a privacy risk. If we can get those things working, and get our software development process sharpened — I was talking to Michael earlier today; I was reading that at Netflix, or one of the other bigger companies, a new software developer pushes a change all the way through to production on their first day at work. Even if it's just adding a comment to one file, they're able to take that and push it all the way through to millions of users globally on day one. So if we can get the initial setup process squared away, and if we can get the wake word, and really the entire experience, working properly, we can push updates for the rest of it through an update process that's very fast and flexible, and that puts us in a position where we can really start moving and building a great product. Until we cross that line, I feel as though we're in park, and I almost feel as though we've been in park going all the way back to the original video. Derek, you remember, I had my mom set up a Mark 1, or try to, and made everybody sit there and squirm for 15 minutes while she tried to get the thing on the network before she finally gave up. Until we get that stuff squared away, we can have the best of every other piece of the stack, but if the wake word stuff and the initial setup don't work, we might as well go home.

Well, I totally agree with that sentiment. In terms of priorities, we have to look at how many people this is going to affect. We have a continuous integration test suite now, but what we don't have is a continuous deployment process, or really even a deployment process for the Mark II, and that's because there is no such thing as a Mark II right now. There's a bunch of varieties of different kinds of hardware: the ones with off-the-shelf parts, this new experimental thing we're building that will hopefully turn into the dev kit, the Frankencrofts, and whatever's running on your desktop, plus a whole bunch of varieties of software. If we go and build a deployment process that will allow us to update a Mark II, we have to decide which Mark II we're talking about: the Pi 3 Mark IIs that everyone has, the Pi 4-based Mark IIs we're moving towards, or the Pi 4-based Mark IIs with our own SJ201 board on them. Those processes are going to be different for every single one of those devices, hopefully with only minor differences between the Pi 4-based ones, but there are still going to be driver differences; they're not even using the same chipset for the barge-in stuff. So there will be differences. I don't think the work of setting up a continuous deployment process will be wasted in general; that's good work we need to do regardless. But my question is how much work is it, what are we giving up, and who does it affect if we decide to focus on it now versus in a month or two or three?
So, for example, how critical is it to our friends at rollover if they have to manually flash their devices to get updates? I think that's a question we can't answer; they should answer it. And the tradeoff is that while we're doing that work, we're not fixing whatever bugs we could fix or making whatever other process improvements we could make. So that's my thought: absolutely, before we're ready to release anything, this process needs to be ironed out. But how much is each of you relying on the Mark II prototypes in your day-to-day work? How useful would it be to you to be able to get updates easily, say every day, as they're pushed?

Yeah, I think the challenge at the moment, for me, comes back to the Kivy versus Qt split, because at the moment the Mark II prototypes we're talking about are Kivy images, and all the Kivy code is on a separate branch, effectively a fork of our main code base, and so it has to be maintained separately. The same goes for each of the skills on it, which is where Chris has spent some time manually updating all the skills rather than them just updating themselves. Now that we've shifted the Kivy branch from 19.08 to 20, theoretically keeping the devices up to date should be fairly straightforward: just keep that branch updated and do a git pull for mycroft-core. That's clearly not the long-term solution we're talking about here, but as a development process it should work. The whole thing still comes back, for me, to the fact that we're maintaining two separate images, two separate forks, at the moment, and that slows everything down.

Okay, so just to recap: the Kivy versus Qt split — Qt, or "cute" as I think they call it, is the GUI; we created our own version of that late last year because we simply couldn't get the Qt version running on the hardware we had. That was on the Pi 3, right? And it does work on the Pi 4, correct? Okay, so what are the ramifications of picking one versus the other? Do we want to have that discussion now? No, you know what, let's table that. I wonder if we need to come up with some criteria for how we're going to make that decision. My first thought was, what's the performance difference? The reason it didn't work on the Pi 3 was that the Qt version had performance repercussions; we were just overloading the Pi 3, that's my understanding. But maybe there's some hardware support in the Pi 4 that makes that easier. Either way, we should get some criteria for this. Okay, so let's take that as an action item. Kivy versus Qt is one problem, then there's the Wi-Fi setup, and then there are the update processes. I think there are a few approaches there: there's the git approach, which I don't think is the right solution; there are deb packages using apt, which is what we use for the Mark 1; and then there's this newer idea we're floating around of whether something like snap packages is the way to go. So there are three main routes that I can see. Okay, so we should come up with criteria for evaluating this as well. We've had this discussion a couple of times in the past, so it's not like this is new, but let's do it formally.
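For concreteness, the "git approach" mentioned above, where a device simply fast-forwards its checkouts and restarts the service, might look roughly like the sketch below. The repository paths, service name, and the script itself are assumptions for illustration, not an existing Mycroft updater.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the simplest git-pull update route (paths are placeholders)."""
import subprocess
from pathlib import Path

# Placeholder locations; the real layout on a Mark II prototype may differ.
CHECKOUTS = [Path("/opt/mycroft-core"), *Path("/opt/mycroft/skills").glob("*")]


def update_repo(repo: Path) -> bool:
    """Fetch and fast-forward one checkout; return True if HEAD moved."""
    def head() -> str:
        return subprocess.run(["git", "-C", str(repo), "rev-parse", "HEAD"],
                              capture_output=True, text=True, check=True).stdout.strip()

    before = head()
    subprocess.run(["git", "-C", str(repo), "pull", "--ff-only"], check=True)
    return head() != before


if __name__ == "__main__":
    changed = [update_repo(r) for r in CHECKOUTS if (r / ".git").exists()]
    if any(changed):
        # Restart only when something actually changed (the service name is a guess).
        subprocess.run(["systemctl", "restart", "mycroft.service"], check=False)
```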
And then there's the Wi-Fi setup, which we've talked about copiously over email and whatnot, so we have some decisions to make there as well. Okay, so those are the things keeping us from making the boot and update process something we can just start working on: we need to make some decisions on which path we're going to take. So who is in a good spot to write down the criteria, or at least start that list, for these three things? I could probably do some of that work. Okay, so Chris Bear will start off by creating our first list of criteria for the GUI, Wi-Fi, and update decisions. Excellent.

Let's take another easy question here: do we need more dev kits? Do we need any more Pi 4-based dev kits using the ReSpeaker array, and how useful would that be to you? I was going to say that, not having any equipment at all, I wouldn't mind waiting until we have a Mark II in what we feel is our final form factor, and I could be the first guinea pig for our install process. In other words, assuming we had the install down, and all the stuff that people have been complaining about regarding Wi-Fi setup and so on was working, I was hoping to be the first guinea pig for that process, give feedback, and go back and forth continuously until it's perfect, because that to me is really where you need perfection. If you have old Mark 1s that aren't upgrading, or desktop stuff that's not working great, I don't see that as a big problem; the bigger problem is if you're shipping a product and the product's not working, not upgrading, not installing. So I'd love to focus on that whole process as a user, if you will.

Okay, the one objection I have to that approach is that you're working on the wake word stuff, which means you're going to become very intertwined with barge-in support as soon as you've got a device. That's true, but from what I understand we're not going to have a device in that state, with the install perfect and the Wi-Fi working, for at least several weeks, am I mistaken? Oh, it could be longer than that, but we can get you a working Pi 4 before that. That's what I'm getting at. I'm sure I could go out and get a Pi 4 myself, take our schematics, and put one together, but I'm looking at this as kind of our chance to make sure everything's perfect and works as a shrink-wrapped, off-the-shelf product. I don't think you're ever going to get to experience that, but we can try it. Let me put a couple of caveats on what we're doing with Kevin: the boards are going to take three weeks, then he may have to finish them by hand, and, as we've talked about, there's a very strong possibility that they may not work or may need heavy reworking. And at that point they also won't be in a production enclosure or anything; we'll be looking at the laser-cut, what we generally call the Frankencroft, designs to start with, so that will impact performance to a certain degree. We're not going to be doing full optimization; it's not going to be what you'd consider a product for a while. So I think there's some value here. The question, I guess, would be: maybe four weeks realistically to get something in hand,
if things go well, and that is just a board and a Pi 4 in a laser-cut enclosure, versus maybe something Charlie can switch to once we're done with the project rollover prototypes, which would be based on our current stuff, maybe by next week. Those are the two comparisons, I guess. I mean, my point is that I'm a good guinea pig because I can be pretty dense when it comes to installation stuff; I'm one of these people who, if it says to do something, I do it. My concern is that I'd become too comfortable with the product and its idiosyncrasies, be able to work around them, and not catch all the gotchas during the bring-up process and the installation and upgrade process, which is really the pain we're trying to make sure our customers don't feel. And I'm flexible; you can ship me one ahead of time, I don't care.

So here's what I want you to look out for: at some point you're going to get to the point in the wake word process where we're going to want to test the barge-in support. We're not there yet, we're just testing the core Precise algorithm, but at some point we're going to need to test barge-in and the wake word working together, and there's going to be an effect there. I don't know what that effect is going to be, but that's my question: what is the effect of echo cancellation, and the efficacy of removing the speaker output from the microphone input? That whole process is going to affect the clarity of the signal being fed into Precise, so it may have a huge impact on the training necessary to get a good Precise model. The work you're doing now is with what I guess we could consider fairly clean data, but who really knows what the quality of that data is: was it recorded through a headset mic like I'm wearing, or through a speaker microphone on a desktop, or what? There could be all kinds of garbage in those recordings. In any case, I think it's going to change drastically, and it may affect what kind of model actually performs best in the real world. So once you've got a process down for identifying a well-trained model, which I think we can do, then I think we need to start considering, okay, now let's do it with barge-in, and I want you to have a device to be able to do that with. From that perspective, I would say: as soon as you have all that stuff like barge-in running, get me a unit. A lot of this is going to be ongoing background work, certainly the parameter tuning, so once I get the process locked down, and I don't anticipate that being more than another week or two, I'll have some bandwidth; I don't need to sit around and watch models build and hyperparameters sweep. So if you feel like you'll have that stuff in the next couple of weeks, then yeah, in a couple of weeks it would be good to get me a unit, even if I have to hack it together, so I can start looking at whether the barge-in is working, whether it's configured right, whether we're getting feedback through the speakers. That'd be great. Yeah, that's a better answer.

Okay, so Derek, if you can get Ken one in a couple of weeks, that would be useful to him. Who else? The other people without one are you yourself, Michael, and Josh, who does not have a Pi 4-based unit. So is there value in us building, say, three more? I would want one to be able to yell at you guys, but I know what the state of things is right now, so I don't
really need that. I will be very involved in bringing up the SJ201-based model, though, so I'll need something for that, but we're a ways away from that. Well, Josh, I can just send you the laser-cut files we're using and you can take that one apart, put a Pi 4 in it, and cut yourself a new enclosure, assuming you can get some acrylic. And Ken, we will work on one to send to you where you don't have to do much; it's just plug it in and go. That'd be great.

So I think we have a unique opportunity here: we continue to dev on these Pi 4 Frankencroft things and work out everything we can here, and I think it would be an interesting exercise, once the new prototype is ready to go, to see what the pain points are in going from one to the other. One thing we've talked about is that whatever we're doing should be as generic as possible across form factors, so what are the pain points going to be if we have everything polished up for this Frankencroft and then we move to the next device? There's definitely going to be some impact in terms of the acoustics, but it's TBD how that will affect the wake word and barge-in process. I definitely think some tuning is going to be necessary, but I don't know how involved we'll have to get. But yeah, I agree, I think it would be great to have a well-defined process for one very particular piece of hardware, including the enclosure, and then see what the difference is between that and, say, the same electronics in a completely different 3D print. Yeah, and there's definitely value in knowing, too, that although we want barge-in to be better on the new design than the Frankencroft designs, we haven't tuned the way it currently works at all, yet it does work at a reasonable volume, so we at least have a benchmark to compare the new design against. Performance-wise, we need to at least match the ReSpeaker mic array, I think. I mean, you should be able to barge in at max volume. Yeah, we should be able to do much better, but we should at least match the performance of the ReSpeaker. Great. So it sounds like the answer to the question of how many new units we need is one for Ken, and I didn't hear Josh say he could DIY one if you gave him the parts, but I think we can assume that.

One thing I'd like to add is that we originally had an abstraction called the enclosure abstraction, where the idea was that we would layer it on top of the generic software so it would customize it for whatever form factor. As we do this, let's think about it in the context that in the future we may want to support hundreds or thousands of different pieces of hardware: core stays generic, and the enclosure abstraction changes or customizes it for a smart speaker, or for your automobile, rather than redistributing core just for the Mark II and leaving the rest as an exercise for the user. No, absolutely. That's the thing I was talking about a few months ago when I talked about developing a process: the first product we're working on is Mycroft Core, this whole software stack, but the second product we need to develop is the development environment for someone to take our software and build their own hardware with it. But that's a next-year thing, right?
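As a rough illustration of that enclosure abstraction idea, something like the profile object below could carry the form-factor-specific pieces while the rest of the stack stays generic. The class, field names, and example values are invented for this sketch; this is not an existing Mycroft API.

```python
"""Sketch of an 'enclosure profile' abstraction (all names and values are illustrative)."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnclosureProfile:
    name: str
    wake_word_model: str          # Precise model trained/tuned for this enclosure
    aec_enabled: bool             # whether a front-end chip (e.g. XMOS) handles echo cancellation
    mic_device: str               # ALSA device name for the mic array
    speaker_device: str           # ALSA device name for playback
    eq_profile: Optional[str] = None  # optional frequency-response compensation curve


# Purely illustrative entries for the two prototypes discussed in this meeting.
OTS_PROTOTYPE = EnclosureProfile(
    name="mark2-ots",
    wake_word_model="hey-mycroft-respeaker.pb",
    aec_enabled=False,
    mic_device="hw:ArrayUAC10",
    speaker_device="hw:0,0",
)

SJ201_DEVKIT = EnclosureProfile(
    name="mark2-sj201",
    wake_word_model="hey-mycroft-sj201.pb",
    aec_enabled=True,  # assumes the SJ201's on-board audio front end does the cancellation
    mic_device="hw:sj201",
    speaker_device="hw:sj201",
)
```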
But there are still fundamental architecture questions that we haven't asked, I think, let alone answered. Such as: do we need to train a new wake word for every enclosure, or can we characterize the audio performance of an enclosure and capture that entirely within the front end, the XMOS acoustic echo cancellation chip? Can we push all of that work up to that chip, or do we really have to train a wake word model for every enclosure, or maybe do some other pre-processing work? I've got all kinds of ideas about ways we could try to cancel out the frequency response of the various components of the device, which is something we could potentially do in software and in an automated fashion, but we haven't even started to talk about that stuff. So we need to get to the point where we can do one, and then figure out how we're going to abstract it.

So let's see what else is on here. That was supposed to be an easy one; it wasn't quite as easy as I hoped, go figure. Oh, I guess I don't really know when the dev kits Charlie's working on are expected to be done, so we can deliver them to rollover. Do we have an answer for that? I think, since we should be back at 100% tomorrow, it's reasonable that we can get them out. Josh had requested at least five, although we promised them seven; was that a typo, or did you mean to ship five first, or ship them in two batches? Josh, five is what they needed, but if we promised them seven we might as well ship them seven, if it's just as easy. Well, I would love to be at a point where we could ship at least five by the end of the week if that's what they need immediately; hopefully we can do all of them, but minimally five by the end of the week sounds reasonable. Alright, next question.

Okay, so the data deletion issue. This is a policy issue, right? We told people that we would delete their data if they asked us to, and we have not yet done that. I know that Ken and, I forget who else, I guess it was probably Chris Bear, were looking into the account IDs and trying to match those up to see how many accounts have data that has been retained improperly. Do we have an update on that? I think the biggest question I have, just from a high level, is how many people that opted into the program have since opted out. Yeah, we don't have that right now; we're starting to capture it now, but we don't have it historically. Well, we can recreate it though, right? We can just look at who's opted in now and build the list from that. I don't know that I trust that. In other words, maybe we don't have opt-in flags for 20, or I don't know, 100 people, but the question is whether any of them ever actually opted out; the list isn't exactly confirmed, right? I think it's fair that we should decide what our policy is. Has anyone considered the case where someone creates an account, opts in, and then deletes their account? Have they opted out, or did they just delete their account? Can we retain that data or not? Is there any way they can ever delete that data, given they no longer have an account to tie it to? Actually, I think there's an existing Jira issue for this. You'll have to excuse me for jumping in, but that one's easy. So, number one: no, we can't. Unless Chris designs the database as basically an accounting database, where you keep track of the transactions instead of keeping track of the state, we can't determine whether somebody flipped a bit, because we don't have the historic state.
All we can do is give you a whitelist of the people who are currently opted in, and then compare it against the files and see if there are files in there from users who have not opted in. As simple as that: we need to do a quick comparison between one and the other. So that's the first item. The second item is easy: we always err on the side of more privacy for users, so if you delete your account, we delete your data. Simple; that's a very simple answer. Yeah, and just to tell you what is going on right now: if somebody hits delete account in the front end, we just blow everything away; there's no trace of that account left on our system. So that should make it easy. Okay, so that makes it super easy: what we're doing is looking at a whitelist. We have a whitelist of people for whom we are allowed to keep data, and if there is any data in the folder that is not associated with that whitelist, we nuke it. In the case that they opted out, they wouldn't be on the whitelist, and in the case that they deleted their account, they wouldn't be on the whitelist, so in either case we would nuke the data. Right, and that's what I did last week: I went through our database and ran scripts using some accounts that Ben gave me, and I came up with 207 accounts that either have had their accounts deleted or have opted out, from the list that Ken gave me. So that's definitely done. So, go ahead, Jeff. Yeah, the next step is to nuke all the data, and then the third step is to disclose. Let me just ask a theoretical question: has the opt-in flag existed on the site forever? The earliest users are most likely the largest contributors to the clean data set we have, which is over a year and a half old. Are you positive that all of these people had an option to set a flag to opt in when they became a customer, or when they created an account? What if there's a bunch of people whose accounts were created before the opt-in flag existed and they never bothered with it? Where does the assumption come from that if we don't have an opt-in, they're definitely opted out? That's what I'm questioning. It comes from the business decision that we're a privacy-based company, and as Josh said, we always assume the most privacy for the users. Yeah, if you haven't explicitly opted in, then you're opted out; that's our default answer to that question. I'm just saying, if somebody opted in, presumably they don't care, and then they deleted their account, and now you're going to assume they don't want you to keep their data. I don't care either way, I'm just questioning it. Yes, that is the assumption: we will always err on the side of more privacy for the users rather than less. Although it does raise the point that, as they delete their account, we could ask whether we can retain their data for training purposes. Yeah, that could be an addition to the delete-account workflow. I guess I wasn't looking at it that way; in other words, if we have a list of people we think are not opted in, why don't we shoot them an email and ask whether they're opted in or not? The problem with what you just said, Michael, is that if we delete an account, we have no record of what that account was, so we don't know how to tie any old data to an account that answered yes to that question. I can certainly go through and delete the data. And when you say disclose, Josh, that's what I was getting at:
I'm not sure what we're disclosing, and to whom. You know, what do we need to disclose to the broader public? And also, if we can identify them: in the scenario where the person just hit the flag, and I don't know how many of those there are, we need to contact that person and say, hey, we know you hit the opt-out flag, but it turns out we didn't delete your data; we've just done that now, sorry. And then, number two, in the case where they deleted the account, we no longer have that information, which means we have to put it out as an announcement to the broader community: hey, look, we fucked up, we kept a bunch of data we weren't supposed to keep, it affected something like 180 users, we don't know who those users were because of course we don't have their information anymore, we're sorry, we've nuked the data, we didn't access it in the meantime, and we'll work to do better. I mean, Chris, do you have a list of accounts of people that have opted out? It's stateful, so there's no way to do it; unless Chris develops the database so it keeps track of individual changes in the system, the only thing we're going to know is their current state. We don't have historic information about the original state, we just know what the state is today. No, I get that: do we have a list of people today whose accounts are currently opted out? Well, if they're not in the opted-in list, then yes. I think Ken wants a list of files he can go delete. Yeah, what I'm getting at is that when somebody opts out, something somewhere should change, and we should be able to differentiate between users who are currently in the opted-out state and those who are not, and I'm just asking whether we have such a list. Well, let me take a stab at it: we do not have an opt-out flag, we have an opt-in flag, which by default is set to false. If the user decides they want to opt in, we set it to true, and that's it. And I think Josh was referring to the fact that — well, actually, this is a question, Chris: we're not using a transaction-based system, right? We're not recording transactions, we're just recording state. It depends on what we're doing: the metrics are more transaction-based, but all of the reference data is point in time, Ken, except for your membership subscriptions; we do keep a history of those for billing purposes. We certainly do have a list of accounts whose current opt-in flag is set to false. So the only thing we can provide you with, Ken, is a list of the people who are currently opted in; if the data comes from somebody who is not on that list, then they are opted out and it needs to be nuked. Alright, I'll do it that way, that's fine. I can do it from either direction: you can give me a list of people who have opted out, or, with the opposite assumption, the set of all the other accounts, and I delete all their records. I'll certainly do that and let you know what we end up with. We've had 1.2 million wake word data submissions that we haven't looked at; do we lose them, and when will we look at those? I don't know. Before I delete anything, I will pass around an email to everybody on the team to let them know how this algorithm will affect our data set, and we can take it from there, and I can have it ready to turn on and delete those accounts instantaneously at that point. Excellent. I would like at least Gez or Chris to take a look at your algorithm and give it a once-over. Oh, sure. I'll write the update, the user update; we'll put up a blog post and such about what we did wrong, why it happened, and how we're going to fix it.
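A minimal sketch of the cleanup pass being described here, under two assumptions: the backend can export the currently opted-in account IDs, and each stored sample carries its account ID in the file name. The paths, file-naming convention, and dry-run flow are placeholders, not the actual script Ken is writing.

```python
"""Sketch of a whitelist-based data sweep (paths and naming convention are assumptions)."""
import re
from pathlib import Path

DATA_DIR = Path("/data/wake-words")                       # hypothetical data location
OPTED_IN = set(Path("opted_in_accounts.txt").read_text().split())

ACCOUNT_RE = re.compile(r"account-([0-9a-f-]+)")          # assumed filename convention


def sweep(dry_run: bool = True):
    """Return (and optionally delete) every file not owned by an opted-in account."""
    doomed = []
    for f in DATA_DIR.rglob("*.wav"):
        m = ACCOUNT_RE.search(f.name)
        # Files whose names don't parse are skipped here; in practice they would
        # need manual review rather than silent deletion.
        if m and m.group(1) not in OPTED_IN:
            doomed.append(f)
            if not dry_run:
                f.unlink()
    return doomed


if __name__ == "__main__":
    # Dry run first, so the report can go around by email before anything is removed.
    print(f"{len(sweep(dry_run=True))} files would be deleted")
```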
I was just going to say, really, the most important takeaway from this conversation for me is what's going to change so we don't have this problem again in a year. You're going to fix it; we need to add some code so that when we get an opt-out, it kicks off a process, and then there needs to be a periodic audit script that runs just in case the system was down or something when the person opted out. Exactly, and then a process for deleting the data, and maybe even notifying users. All I was getting at is that this sounds like a legitimate ticket, Michael. Oh, it's already in there, and there are comments. Okay, I'm at the dentist so I've gotta jet, but if there's anything else, shoot me a note; I'm happy to help. Thanks, Jeff. Ken, do you have that list of accounts? I gave it to you; it's also in that ticket. Yeah, the list of accounts you gave me, if I'm not mistaken, is the list of accounts whose files I want to delete. Those are the account IDs, yeah; any file related to those account IDs should go away. Okay, cool, so let me add some code to take an account and figure out which data sets are affected, put that out, and then we can take it from there. That list is in the Jira ticket too, for anyone else who wants to see it. Okay, thanks.

Is this problem a case of something breaking, or did a process just never exist? I don't think this process ever existed. The problem is there's no real link between the reference data in the database — which accounts we have, which have opted in and which have not — and other data stores like our wake words and such. There's no link to that stuff; that wake word database is a totally different database that Selene doesn't even know about. There are also files from our speech-to-text transcriptions, and there's no link to those in Selene either; they're just sitting out there somewhere. So at some point, part of the process of being able to do this as it happens is building some link between those things, some way to say, okay, this person deleted this account, what do I need to do to find the other data related to that account? To the best of my knowledge, from what I've seen in the code, I don't think there was ever a process in place to delete data associated with WAV files and things like that. The policy has been in place for years; maybe there was a manual process or something. I'm kind of surprised, because we've quite actively told people that we delete their data. I think that process was there on the speech-to-text side, and it probably was manual, if I recall; if somebody asked, there was a manual step where someone would go out and delete some things. This was never automated, and it's probably one of those things that never made it to the top of the queue. I'm sure it's been an outstanding to-do item forever and no one ever actually got to it. I just think people figured deleting data is easy: give me an account ID, and as long as it's in the file name — which also may be an issue, because some of those file names are hosed; I don't know how that happened and I haven't gotten to the bottom of it. But I just don't think anybody ever bothered to think about it; they figured it was a trivial issue and we could drive off that bridge when we came to it, and now we're coming to it. I think the issue this highlights for me is that if we ever collect data from a user for any reason, we need to consider our
policies, like our opt-in policy and that kind of thing. We may end up with more than one policy, for wake words versus speech to text and that sort of thing, but anytime we start to collect new data we need to consider the whole business case: not just collecting the data, but also being able to delete it, and also implementing the process we've described, wherein when we release a data set, people downstream who are using it delete the data from their copies as well. That's why we have an agreement: our data sets are available for other people to copy and use, but they're not free. You have to sign an agreement to get a copy of our database, and that agreement says you will refresh your database from our servers every 30 days; as a consequence, if we delete data from our servers, it will eventually get deleted from other people's servers. That's the process. The problem is that this, like so many things we talk about, promises to explode into a huge issue. But really, at the end of the day: what other data do we collect? Does this policy apply to metrics and things of that nature as well? If I don't want to be associated and I delete my account, are you going to drop me from historical records, values, and numbers? How can we impose the 30-day refresh policy on non-cooperative clients or users who may choose to keep that data, and what's our level of responsibility there? I just don't know. I don't know what our corporate governance is, but certainly if there are going to be issues like this, we probably need to enunciate them, document them, and back them with some sort of processes to make sure they're implemented. Well, there are lots of issues like this, and this came up when I started doing the account metrics: because of our privacy policy we need to be really explicit about what is included and what is not, because the way we're operating right now, we really have no good way of knowing what our daily user count is. Right now we only know whether you have activity if you opted in, so we know how many of our opted-in people have daily activity, but since that's only 15% of our total user base, it's a very poor number for telling who's actually using the system. One of the questions I asked when we were doing this account metrics work, and I think we've talked about this a little, is whether there should be different levels of opt-in. Right now there's one flag, opt in or opt out, that's it, and it has to cover having different agreements for different things. I think even the privacy policy says there's certain device information that we collect regardless of whether or not you're opted in. So I think we need to go through all of this and say what data we have or collect is covered by the privacy policy, what isn't, and what we really mean by the privacy policy. I don't think anyone's ever really gone through that exercise.
I agree 100%, Chris; that's kind of where I was coming from. I don't want to make a big deal where none exists, or, as we say, make a mountain out of a molehill, but the reality is that, in as much as we're doing a security audit, we also need to do a privacy policy audit: figure out what our corporate requirements are and make sure we have processes in place so that we're living up to our words. What we're doing is trying to do a better job than Google and Amazon at being transparent, and that's obvious; I mean, good luck opting out of Gmail and telling them you don't want them to use your information any longer, it's not going to happen. So we're definitely going above and beyond the call of duty here; we just need to make sure we know what policies we're committed to corporate-wide and that we have processes in place to carry them out. That's all I was getting at, and it's probably the stuff of a meeting: at some point we should have an online meeting, go through it, walk away with some to-do items, and follow through. I think we probably need to spend some time on it. I agree, and if you'd like to look at this some more, if you're curious about it, I'm fairly certain there are a number of tickets related to this in the system. This was my very first question, because I don't know where else it applies; it's what we hit with wake words, but there are probably, as Chris pointed out, two or three other areas that are tangential. Yeah, I agree. Okay, but I'll go back to the fact that we have limited staff. Some of this stuff was probably on our old ticketing system too, but at some point something else got prioritized over it and it didn't get done, and that's going to be true for a lot of the things we know need to be done. That list, for example, is a lot of work, and where does it fall in with the other stuff? There are two and a half developers on the Mycroft staff, so we have the best intentions and we want to fulfill them, but we can only do so much so fast. Yeah, but Chris, we may only be two or three developers, but we are Superman. Agreed.

So the one last issue on here that we haven't talked about for follow-up is the WordPress droplet issue. Is that something we all need to talk about, something you need to talk about with anybody in particular, or just something where you're trying to gather ideas? I was just hoping somebody remembered where the page we put up last time lives. I think it was Gez, or maybe Gez and Derek, who put together an image that we put up saying the site's not available. Last time we did this, we had that image up while we redid all the WordPress stuff, and then took the image down. I just can't remember for the life of me where that image resides and where we serve it from, so that I can put it up while I resize the droplet. Yeah, maybe I should just do a sync of our test site and then we can point to that temporarily and switch back; would that work? Yeah, it would be a short-term solution. Actually, no, it's really dangerous because of the database; I mean, you have purchases and things in there. Maybe we just spin up a droplet, throw something basic on it, and switch back when we're done. That's the thing I thought we already had, and I just can't find it. You know, we've done this
before; it should be out there, and we do have a droplet called maintenance-page or something. Alright, well, you guys can carry on; I think we're done here. And just to be clear, for my efforts this week we're still just chipping away at the bug fix sprint, right, aside from this criteria thing I just volunteered for? Right, yeah, and I think at our next meeting maybe that becomes the priority, but for the time being we're bug fixing away. Okay, should I schedule a meeting to go over these criteria? We're going to have to actually talk about this stuff eventually. I agree, same with things like the security audit. So pass it around: have a stab at it, put it in the ticket, and assign it to the next person. Yeah, that's a good idea; Chris will make the first draft and then pass it off to the next dev. Okay guys, have a good week. See you guys. Bye. Thanks for listening all the way to the end.