Yeah, hi. I'm Alex. For the last year I've been doing a fellowship with Mozilla, but I'm a neuroscientist; that's my background. I just wanted to talk a bit today about what's next after Jupyter notebooks. The spoiler is that it's more Jupyter notebooks. But first, just to give you an idea of the motivation for why I've started working on this stuff, we're going to do a little choose-your-own-adventure. I don't know if anyone did those books as a kid, where you get to choose the end of your story, you get to choose the path. Or on Netflix there was Bandersnatch, where you could do that as a show. We're going to do one quickly now, and I promise you it's relevant.

So it's a sunny day in Portland, you have just enough time, and you're walking to give a presentation you're super excited about. But the thing you're walking on is tiled. I don't know about you, but I sometimes try to step in the center of the tiles rather than on the edges. Maybe that's just me. So we're going to have a vote: should we just ignore the tiles and walk like a normal person, or should we adjust our stride so that we don't step on the edges? So who thinks, A, we should walk normally? And B, who thinks this is just... I'm stoked. This is a good audience.

So unfortunately, we didn't notice some dude looking at his phone, and he's just spilt coffee on our shirt. We're on our way to give a talk, and there's no way we can give a talk looking like this.
So we've got two choices now. We can either have a look in the shop around the corner, but it's a second-hand shop and they might not have our size, or we can sprint all the way back home. We should be able to make it there and back in time for the talk. So who thinks, A, we should go to the shop on the corner? And B, who thinks we should run back home? What do we think? I think we're going to go with A.

So we pick up the pace and walk quickly to Hats and Cats, which is your favorite clothes shop. Luckily, it's open. You have a quick look around and they have the perfect shirt, and it's going to go ace with your jacket. And so that's the story. I can tell you that that was the most boring ending you could possibly have chosen. You could have adopted a cat, or you could have met Michelle Obama. But unfortunately we've just got a nice shirt. Anyway.

How I feel (let's skip) is that doing data analysis is often a bit more like this choose-your-own-adventure thing than writing an essay. But the way we present it is as if it's just a nice story. Sometimes that's good: it's nice to have a good strong narrative to be able to explain what you're doing to people. But sometimes we want to share the whole process. We want to share these forks, these decisions we've made. And we want to be able to do that while we're developing: we want to be able to take different directions and see from a high level what we're doing. So what I'm going to do is describe a few pain points that we've observed from watching people use notebooks, and have a look at some potential new concepts for interfaces that build on the Jupyter notebook stack but allow us to have a little bit more complexity in these flows.

So, the first set of observations. If anyone says they've worked with Jupyter notebooks and they don't have a folder that looks like this, then I absolutely don't believe you. The second one is people copying and pasting cells, because you might want to make a little change to one cell, but then you want to compare how the output of the first one looks to the second one, and then you scroll up and down. We're also seeing a bunch of people going through entire cells and commenting out each line, just because they're not sure whether they're going to delete it or not. These "I'm not sure if I'll need it later" things tend to be the use case where people will say, okay, you should be using version control. And that's great, but often these small changes are not atomic things that you'd want to commit as a single thing in version control. And you also want to be really explicitly comparing what you had before to the change you made.

So we designed an interface that kind of embraces this "copy one, copy two" thing, but does it in a little bit more of a structured manner. On the basics, this is kind of like a normal notebook, maybe with some cell templating. The only thing to notice that's slightly different is that we enforce that people have some text to describe what each cell is, attached to the cell. And then we can play it. The other thing is that we pin the outputs to the bottom here. Sorry, this is a bit unclear on the screen, but I think you'll get the idea.

The key thing here is that we can then make a fork of this notebook, but have it appear visually, so that we can see exactly what changes we're going to make. What we've done there is just make a fork from that first cell, and then we make a change in that cell. We can immediately see what impact the change in that cell, which is highlighted, has had on the output. We can do the same thing again: we could make another fork from that first cell and maybe use a different style, and we can compare them all. We can collapse different things to compare back to the first one. (I don't know who's using this so slowly.) And we can fork off any other cell as well, not just the first one. So maybe we want to change the data instead, and make a fork here. And then if we decide that in fact this last one is exactly what we want, we can collapse the other ones and make this one the main branch in the end. So this is concept number one.

In terms of the implementation, this is just a collection of notebooks. Everything else is just UI on top, but we can discuss that more in a bit if you want.

So the second set of observations is hidden state bugs.
So this is when people make a change to a variable somewhere in a notebook, or in any code, and you don't realize that you've changed it somewhere else. You might be executing cells out of order, and then you end up with a bug because of that. We also see people having lots of shared cells between notebooks, or scrolling up and down. That's because you want to explore multiple different avenues for analyzing one data set. Those different avenues might share a lot of code, but you don't necessarily want them to share all the state. So we're trying to design an interface that allows us to explicitly share some stuff, but not share the state between different parts.

This is one example of what it might look like. I can't pause it. So maybe we've loaded some data in here, and then we plot it down there, because the first thing you should do when you load data is have a look at it. But when I normalize this time series here, there's absolutely no reason that any change in state here should be affecting anything I do here, because they're not part of the same flow. Here we filter the data and plot it again, and we can do some more things: we can compare the signals down here, and up here we compute the energy of the signals. But again, there's no reason that anything that happens here should be affecting this plot I made at the beginning. So this is the idea that we can separate the state out in the user interface.

Also, one of the nice things about forcing people to have the text above each cell, and not as a separate cell, is that we can do things like this: we can collapse the whole thing and see... that's exactly the right reaction. And yeah, see the whole flow there.

As for the implementation of something like this, there are a few different ways we're exploring that it could work. The first is that each separate flow has a different kernel. That's the easiest to do, but it's certainly not the most performant way of doing it. Another way, in object-oriented languages, is that we can serialize all of the objects that are in a particular cell and then do lookups whenever you execute each cell, which means you only have to execute each cell once. Or we can do something that will work with all languages, which is to capture the whole state of the process in memory at that time and serialize that to disk.

How much time? Okay. We'll just go through this last one.

If any of you were in the talk before, we know that one of the things a lot of scientists struggle with is having really complex processing pipelines and then having to change them later. One of the things I've seen in almost every single lab I've been in is that it's really difficult to onboard a new person to the lab with the whole pipeline that they have. You have a lot of debugging or reporting stages embedded in the actual pipeline, and it's really difficult to get an idea of what's going on without spending a lot of time diving into some complex code. So one of the things we've been working on is whether we can have a common graphical interface for a whole bunch of different pipelining languages. Here we might generate a shell script at the beginning; you can set it up on the right, and then we can basically do anything with this. So this is a bit like "how to draw an owl", and then we just press the button and everything comes up. Whoop, there you go.

But the cool thing about this model is that we're just building an interface. Like Jupyter notebooks, we're not building the kernels, we're just building an interface, and then we can have separate pipelining kernels in the background. A pipelining kernel could just be something like Make, or it could be the Common Workflow Language, or it could be Apache Kafka. It could be any of these amazing tools that people have already built, but where it's really hard to get a handle on exactly where the data is moving. Yeah, and that's it from me. Thanks.

All right, thank you. Okay, we have about 10 minutes for questions, so lots of question time.

I'm curious if you've thought about how this interesting interface work, particularly the forking of notebooks, might play out in a different notebook environment such as Observable notebooks, which are reactive, where you don't have the state problem, the order-of-evaluation problem.

Yeah, so I guess if anything it's almost better to have it in that environment. When I started working on this I was looking at Observable, Jupyter, and Iodide as the three "data science in the browser" things. All the work that we're going to do is just UI, so it could be put anywhere, and that would be the dream. We're starting off with Jupyter just because it's got the most users, and that's the way I think we'd have the biggest impact. But yeah, having something where you don't run the code, where it's reactive like Observable, is the dream use case for something like this. But not everyone likes to do everything in JavaScript.

Yeah, so at the moment I have a GitHub repository called graphical notebooks. The three different things at the moment have the names... it's hard to remember the name of the first one. The first one is up for grabs if anyone wants to name it. The second one is Jupyter Canvas and the third one is Jupyter Flow.
I am currently writing a job post for a developer to prototype two of them. So within the next few months we'll have prototypes, and there'll definitely be links to them on the graphical notebooks repository.

[inaudible] ... as you talk about this work, are you finding that there's an awareness and appreciation for this distinction?

Can you clarify exactly the kind of... Yeah. Yeah.

Yeah, so I think there's a really interesting conflict, because I think as humans we find it much easier to understand something as a story, and so it makes sense to present things in that way to people. But you're right, it isn't the process that actually underlies it. I think there are kind of two sides with hypothesis-driven stuff. You could still have very strong hypotheses, but there would still be branches, and you'd want to check this and this and this. There's very rarely just one plot that's going to tell you the answer to something. So I think it's always difficult to know how much of the complexity you should share, or how much we should just be telling people. This stuff nearly always starts out crazy and then you have to trim it down.

[inaudible] ... visualization system ... [inaudible]

Yeah, so definitely everything that we're going for here is piggybacking off the notebook structure. I was speaking to the team last week, and I think one of the biggest things they've done with notebooks is have a data format that's been around for five years now, that's barely changed, and it's really well structured. You can do basically anything with that. What people have done so far is just have cells in order. What we're now trying to do is ask how we can organize exactly the same data format in different ways, to fit different models of data exploration.

It's really useful.
I wonder, have you worried that by adding this new tool that could help, you could actually inadvertently make things worse?

Yeah, for sure. The idea is that we'll probably end up having exactly the same mess, but a slightly more structured mess, and for me that's better. But certainly what we're planning on doing, as soon as we've got a reasonable prototype going, is giving this to a whole bunch of people and seeing what happens. And we're totally happy to pivot if that is exactly what happens.

I see that you have, say, four different notebooks that might be referring to this choose-your-own-adventure. What are you doing under the hood that links them all together, or is there a schema that you're following that keeps it...

Yeah, so, in the forking one or in the graph one? Okay, so the forking one. The link is just that they would probably literally be notebooks in a folder with a naming scheme. In terms of the diffing, we'd just do diffing using nbdime. So there's no explicit link that isn't just enforced by the interface. For the graph one, where we have splits and things like that, it's all still fitting within the notebook schema, and it's just going to be metadata, where each cell points to its next cell and it doesn't just go to the next cell automatically. The idea is to do as little as possible, so that all of this can still be used in a regular notebook browser.

So where do we go in Portland to meet Michelle?

No, you have to turn up early for your talk, and then she ends up being there waiting for you.

We have time for one more.
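
As a rough illustration of the graph answer above (splits stored as cell metadata, with each cell pointing to its next cell), here is a minimal Python sketch. The metadata key "flow" and its "id" and "next" fields are hypothetical names invented for this example; the talk does not say which keys the prototypes use. The dictionaries are only shaped like notebook JSON, the cell sources are placeholder text, and the walk assumes the links contain no cycles.

```python
# Hypothetical layout: each cell's metadata carries a "flow" entry holding its
# own id and the ids of the cells that follow it. These key names are
# assumptions made for this sketch, not an existing Jupyter convention.
def make_cell(cell_id, source, next_ids):
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"flow": {"id": cell_id, "next": list(next_ids)}},
        "outputs": [],
        "source": source,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        make_cell("load", "data = load()", ["plot", "filter"]),
        make_cell("plot", "plot(data)", []),
        make_cell("filter", "clean = lowpass(data)", ["energy"]),
        make_cell("energy", "e = energy(clean)", []),
    ],
}


def flows_from(nb, start_id):
    """Return every path from start_id to a cell that has no successors."""
    by_id = {c["metadata"]["flow"]["id"]: c for c in nb["cells"]}
    paths = []

    def walk(cell_id, trail):
        trail = trail + [cell_id]
        next_ids = by_id[cell_id]["metadata"]["flow"]["next"]
        if not next_ids:
            paths.append(trail)
        for nxt in next_ids:
            walk(nxt, trail)

    walk(start_id, [])
    return paths


flows = flows_from(notebook, "load")
print(flows)
```

Because the links live only in metadata, a viewer that ignores the "flow" entry would still show the four cells in list order, which is the "regular notebook browser" fallback described in the answer.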
I more have a comment on why I like this. I manage teams of undergrads, and [inaudible] we talk a lot about managing teams, and one thing that keeps coming up is sanity checks. Those sanity checks, when you're working alone, are very invisible. But when I get their notebooks, I don't know what sanity checks they did, so I redundantly do the sanity checks myself, which a lot of the time is just changing a variable at this level. So I love that that would be transparent. One feature I would love for you to add is a way to mark those branches, like "reviewer checked this sanity check", or a sanity-check tag, or some sort of tagging system for the branches.

That's the only sort of comment that's allowed in the talk. Okay, that's a good one.

You might have answered this and I didn't quite understand what you were saying, but you were talking about Observable. We know that with notebooks, the non-linear flow of managing state is a challenge already, and this seems to just exponentiate that problem. I'm just curious if there are other things you can add to this to stop that, to help the novice user know that they're actually manipulating their state in a non-linear fashion.

Yeah, so there's the out-of-order bit. One of the solutions I pointed out for having these branches is caching individual cells. When we have something like this, it's actually a directed graph. So once we're caching what each cell does (let's say we know how to do that; we've got some cool ideas about how to do that), if we know that we're changing something here and none of this has changed, we can rerun that without any penalty. So we can actually have this Observable-like reactive environment, but still using Jupyter and Python. That will be one of the cool directions to go: treat the whole thing as a directed graph and cache the state of each cell, or each flow, each block or combination of cells.

It would still be this. It would just be a different kernel. [inaudible] Yeah.

So, it depends. There are two different trade-offs. If you do the hash of each cell, in terms of disk space it's the least sustainable: you're using the most space when you serialize the state of each cell. Computationally, it's the most efficient, because you never rerun anything that you don't need to rerun. If you were to do multiple kernels, it's the opposite. You don't save anything to disk, so it's more disk efficient, but you have four kernels running at the same time, which, if you're running a big JupyterHub or Binder instance, is a big issue. So those are the trade-offs. We're trying to look at which one works best.
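
The "hash of each cell" option can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the class name CachedGraph and its methods are made up for the example, the cache lives in memory rather than on disk, and cell variables are assumed to be plain data that copy.deepcopy can handle (no imported modules or open files). A cell's key is the hash of its own source plus its parents' keys, so editing one cell invalidates only that cell and whatever sits downstream of it.

```python
import copy
import hashlib


class CachedGraph:
    """Toy directed graph of cells with per-cell state caching (hypothetical)."""

    def __init__(self):
        self.cells = {}   # name -> (source, [parent names])
        self.cache = {}   # content key -> variables defined after the cell ran
        self.runs = []    # names of cells that were actually executed

    def set_cell(self, name, source, parents=()):
        self.cells[name] = (source, list(parents))

    def key(self, name):
        # Hash the cell's source together with the keys of its parents.
        source, parents = self.cells[name]
        digest = hashlib.sha256(source.encode("utf-8"))
        for parent in parents:
            digest.update(self.key(parent).encode("utf-8"))
        return digest.hexdigest()

    def run(self, name):
        source, parents = self.cells[name]
        key = self.key(name)
        if key in self.cache:
            return self.cache[key]          # nothing upstream changed: reuse
        namespace = {}
        for parent in parents:
            # Copy parent state so this cell cannot mutate the cached version.
            namespace.update(copy.deepcopy(self.run(parent)))
        exec(source, namespace)
        namespace.pop("__builtins__", None)  # added by exec, not cell state
        self.cache[key] = namespace
        self.runs.append(name)
        return namespace


g = CachedGraph()
g.set_cell("load", "data = [1, 2, 3, 4]")
g.set_cell("normalize", "total = sum(data)\nnorm = [x / total for x in data]",
           parents=["load"])
g.set_cell("energy", "energy = sum(x * x for x in data)", parents=["load"])
g.run("normalize")
first_energy = g.run("energy")["energy"]

# Edit only the "energy" cell; "load" and "normalize" stay cached.
g.set_cell("energy", "energy = sum(abs(x) for x in data)", parents=["load"])
second_energy = g.run("energy")["energy"]
print(g.runs, first_energy, second_energy)
```

The cache keeps one entry for every version of every cell that has been run, which is the disk-space cost mentioned in the answer; in exchange, "load" executes only once even though two branches and an edited cell depend on it.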