My talk today is called "Let Them Write Code", and what I'll be sharing with you is one not-so-weird trick for building awesome developer tools. That's kind of what we do, and I guess it's also something many of you do, even if you don't think of yourselves as writing developer tools. I would say we're all here because we're either writing code or we're somehow involved in some process that's about writing code. And why do we write code? Well, we mostly write code so other people can use it. Maybe it's other people on your team. Maybe the code is going to go into a product that's used by your customers. Or maybe you're actually just writing code for future you, who's going to come back to that function you wrote in a few months and hope that you've documented it properly, so you still know how to use it.
Well, I also write code that's used by other people. Some of you might know me from my work on spaCy. spaCy is an open-source library for natural language processing in Python. So basically, if you have lots of text and you want to find out more about that text in a way that goes a bit beyond just searching for keywords, then you can use spaCy. It's always a bit hard to estimate usage for open-source projects, but we think we have at least about 100,000 developers using spaCy. We also have a growing ecosystem of plugins and extensions that you can use together with spaCy to make the tool more powerful and extend it with the functionality you want.
The other piece of software that I'm writing is called Prodigy, which is an annotation tool for creating training data for machine learning models. If you're training a machine learning model, you usually want to show it at least some examples that encode the behavior you want it to learn. For that, you typically need to label some examples, try out a few things and run some experiments, and Prodigy makes that easier. We currently have over 2,500 users of the tool.
Prodigy is kind of a hybrid developer tool in that sense, because it gives you a modern web app that lets you move through the examples quickly and try lots of things, and it also gives you a scriptable back end, so you can really program custom workflows in Python and set up your annotation projects exactly the way you want to.
We do all of this as part of our company, Explosion. We're a developer tools company, so we specialize in building tools for other software developers. This year we've been able to grow our team, and we're currently six developers working on different parts of our stack. One of the questions I get a lot when I talk to people is: wow, you're such a small team (we used to be only two people, me and my co-founder Matt), so how do you get so much done? How do you build all these tools that are actually quite useful? Of course there are different sides to it, but I would say one big part of it is that when you're building software, it's not just about building lots of stuff. It's about what you build and how you choose to build it. In our case, by making our tools programmable and extensible, we're able to effectively get more done and write less code, while at the same time making the people using our software a lot happier.
The thing is, developers write code. Developer tools don't have to replace a developer. Developer tools are supposed to help a developer do their work better and be more productive. Ultimately a developer is always going to develop, but tools can help them. In a way, the worst developer experience comes from tools that try to be a fully integrated solution and try to be everything. Maybe you've experienced this at work: you had to use some tool that really tried to give you one way to do everything, and you couldn't customize it. When you needed a new feature, you had to email support and then wait two weeks for them to add a way to load from JSON files instead of CSV, even though you were thinking, if I could just write five lines of Python, I could have done it myself. But no, that wasn't possible. That type of experience usually isn't very useful if you're building software. You can write code, and writing code usually makes you happy.
What you can also see is that this type of software isn't only better for the developers, it's also significantly cheaper and easier to build. So it's a win-win for both sides: we get to build software in a way that's very efficient, even with a small team, and our developers get tools that are actually very useful to them. When you see this, you might say: well, but if it's a tool for developers, aren't all developer tools extensible? You write code with them, and if you can write code, you can extend them, right?
But the thing is: in some way, yes, but also no. Not all libraries provide you with all the composable primitives you need. You can use a library and interact with it in code and still have the same problems with extensibility and efficiency. I think it's best to show this with a small example. So here's a bit of code. Imagine you've written a library that can take a piece of text and predict the part-of-speech tags in that text, so basically what's a verb and what's a noun, and give that back to you. That's super useful for finding out more about your text. You pass it a text and what you get back are the verbs: "are", "going", "swimming", "should" and "go". That's pretty nice. But looking at this, you're also getting verbs like "are" and "should", which are called auxiliary verbs. They don't really express any action, so they're not very useful. So you might say, okay, at least I want a setting that lets people exclude these, so I only get the action verbs that really tell me what's going on. That's a bit better: now we have "going", "swimming" and "go". But it's kind of the same verb twice. Both "going" and "go" have the base form "go". So you might as well add a little flag you can set to true that resolves the words back to their base form. That's a bit more useful, and now we have "go", "swim", "go". That's pretty good, but you have the same word twice, so you might as well add a little flag that lets you exclude the duplicates if you need to. So now you have a useful function that does what you want, but let's take a step back for a second and look at the function again. Is this really the way you want to do it in your library?
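The slide itself isn't part of the transcript, so here is a minimal sketch of what such a flag-driven function might look like. Everything in it is an assumption made for illustration: the name `get_verbs`, the three keyword arguments, the example sentence, and the tiny lookup table that stands in for a real part-of-speech tagger.

```python
# Hypothetical lookup table standing in for a real tagger: word -> (base form, tag).
TOY_LEXICON = {
    "are": ("be", "AUX"),
    "going": ("go", "VERB"),
    "swimming": ("swim", "VERB"),
    "should": ("should", "AUX"),
    "go": ("go", "VERB"),
}


def get_verbs(text, exclude_auxiliaries=False, lemmatize=False, unique=False):
    """Return the verbs in `text`, with one flag per special case."""
    results = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        lemma, pos = TOY_LEXICON.get(word, (word, "X"))
        if pos not in ("VERB", "AUX"):
            continue  # not a verb at all
        if pos == "AUX" and exclude_auxiliaries:
            continue  # flag 1: drop auxiliary verbs
        item = lemma if lemmatize else word  # flag 2: use the base form
        if unique and item in results:
            continue  # flag 3: drop duplicates
        results.append(item)
    return results


TEXT = "We are going swimming and you should go too."
print(get_verbs(TEXT))
print(get_verbs(TEXT, exclude_auxiliaries=True))
print(get_verbs(TEXT, exclude_auxiliaries=True, lemmatize=True))
print(get_verbs(TEXT, exclude_auxiliaries=True, lemmatize=True, unique=True))
```

Each new requirement becomes another keyword argument on the same function, which is the pattern the talk goes on to question.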
We already have three different keyword arguments with pretty cryptic names that are also sometimes a bit hard to spell. I never remember how to spell "auxiliaries". And by committing to this sort of API, you're also committing to adding all the other methods. You're committing to adding something for all nouns, and to adding all kinds of other keyword arguments for anything else people might want to do. So maybe that's not the most efficient way to do this. You're also going to end up with super inconsistent APIs, where you have to remember which method takes which arguments, and that's usually pretty annoying.
So here's an example of how we do this in spaCy. You process a text, you get this Doc object back, and that Doc object holds all the annotations. All you have to know is how to access a lemma and how to access a part-of-speech tag, and you can write a simple list comprehension that gets you all the verbs you want. And if we want to filter out duplicates, well, that's luckily something we already have in the programming language we use: it's called a set. So all you do is call set around it, and what you get is all the unique verbs. If you need all the nouns, you change the part-of-speech tag and you get the same thing. You can customize it, and you don't have to remember a bunch of keyword arguments.
Of course, you might say: but that means I have to write a lot of the same code. If I write my software that way, I keep having to repeat myself. Why can't the library just do things like that? Well, the set of "things like that" is probably bigger than you think, and it also keeps growing and changing. You would have to think of all of it, and that's going to be pretty difficult. Here's a quick example.
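Before that next example, the list-comprehension approach just described can be sketched as follows. To keep the sketch runnable without installing a model, a small stand-in `Token` class is assumed here; its attribute names mirror spaCy's `token.lemma_` and `token.pos_`, and in real code the tokens would come from a processed `Doc` rather than a hand-written list.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Stand-in for a processed token; attribute names follow spaCy's."""
    text: str
    lemma_: str
    pos_: str


# Hand-written annotations standing in for the output of a real pipeline.
doc = [
    Token("We", "we", "PRON"),
    Token("are", "be", "AUX"),
    Token("going", "go", "VERB"),
    Token("swimming", "swim", "VERB"),
    Token("at", "at", "ADP"),
    Token("the", "the", "DET"),
    Token("lake", "lake", "NOUN"),
    Token("and", "and", "CCONJ"),
    Token("you", "you", "PRON"),
    Token("should", "should", "AUX"),
    Token("go", "go", "VERB"),
]

# All verbs, resolved to their base form.
verbs = [token.lemma_ for token in doc if token.pos_ == "VERB"]

# Duplicates are handled by the language itself.
unique_verbs = set(verbs)

# Nouns need no new API: only the tag in the condition changes.
nouns = [token.lemma_ for token in doc if token.pos_ == "NOUN"]

print(verbs, unique_verbs, nouns)
```

The library only has to expose the annotations; filtering, base forms and de-duplication are ordinary Python.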
Another thing that might be super common in your library is that you need to load data. You always need to load text and data from somewhere, so you build this load_data function. The most common formats are JSON and CSV, and you might as well do plain text, so you write the loaders for those. Pretty easy. Then, okay, many users also store text in databases, so you might as well add the most common databases people use. But as people start using your tool and it becomes more popular, inevitably you're going to get questions like: oh, does it support MongoDB? MongoDB is quite popular, and I can only use the tool if it supports Mongo. So, cool, you go and add a loader for Mongo. Now your user is happy and can use MongoDB. But of course the space is constantly changing, and maybe in a month's time you have a user asking: there's this really new framework on the block called UnicornDB. It's a whole new paradigm for using databases, it's completely different, and it's what all the cool kids use now. Could you please integrate UnicornDB? So you start reading the docs for UnicornDB and thinking about the best practices, because people who use software that promises a new paradigm are often also very opinionated about how you do things. And I feel like people who are into databases are usually quite opinionated about how they want their stuff done as well. You don't want to disappoint them. And while you're still reading the docs, there's yet another new version. There's DuocornDB now.
So you also have to support that. If we take another step back here: you're reading the docs for UnicornDB and DuocornDB, and you get this sinking feeling that sooner or later there'll be a TricornDB that you also have to support. You're wasting a lot of time writing all these integrations that your users want, and afterwards you'll probably get a few angry issues on GitHub from DuocornDB users complaining that you're not following the best practices, that they can't use their stuff and that your software sucks.
If instead of doing all of that we did something like this, it would make the developers' lives much easier and solve the problem in a much more straightforward way. Instead of your library taking care of all the loading, you can just let people pass in a function. All the very opinionated UnicornDB users can write their own function that loads their data and yields it in a format your application can consume. So it's really a win-win, and this is not the lazy way. It might feel like, oh, I'm kind of cheating here because I'm letting the user do all the work. But from the user's perspective, it's great: I don't have to trust some other person's implementation. I can just do it myself, I know how it works, I plug it in, and it immediately works for any arbitrary thing I might be using. If you think outside of the framework, developers who can help themselves are much happier than developers who have to file a support request. If you can just write a few lines of Python to get something done, that's much easier than relying on a framework that has to support everything.
That also means that the question of "what does it support, what can it do?" shifts from "does your tool integrate with something?" towards "can you do whatever you want to do in Python?" That's often what we tell people about our tools, like Prodigy. If they ask, does it support [insert some technology here], we say: well, can you do whatever you want to do in Python? And people say, yeah, of course, I do that all the time. Well, then most likely you'll be able to use it, because you can just write a function that does it and plug it in.
To put it in a slightly different way: when people think about developing new features and building new things, what they're mostly worried about is reinventing the wheel. That's a valid concern. You don't want to keep reinventing a whole thing that already exists and make it harder to use and less maintainable. But I think a much, much bigger problem is reinventing the road. You're reinventing the way everything is done, and everyone else has to adapt to how you're doing it. Every question becomes: can I use whatever I want to use on your road?
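To make the loader idea from a moment ago concrete, here is a minimal sketch of a library function that accepts a user-supplied loader instead of knowing about every storage backend. All names are hypothetical (`count_words`, `load_jsonl`, `load_from_unicorn`), and the record format, a dict with a "text" key, is an assumption for illustration.

```python
import io
import json
from functools import partial


def count_words(loader):
    """Hypothetical library function: `loader` is any callable that
    returns an iterable of dicts with a "text" key."""
    return sum(len(record["text"].split()) for record in loader())


def load_jsonl(file_):
    """One possible user-written loader: newline-delimited JSON."""
    for line in file_:
        if line.strip():
            yield json.loads(line)


def load_from_unicorn():
    """Stand-in for user code that talks to whatever store they prefer."""
    rows = ["a whole new paradigm", "for databases"]
    for row in rows:
        yield {"text": row}


jsonl = io.StringIO('{"text": "hello world"}\n{"text": "let them write code"}\n')
print(count_words(partial(load_jsonl, jsonl)))
print(count_words(load_from_unicorn))
```

The library never needs to learn about the new database; it only needs records in the agreed format.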
And you're defining the constraints. I think that's one of the much bigger concerns you should keep in mind when developing software. This might still sound a bit abstract, so I've prepared a few small examples that show how you can quite easily write your code in a way that lets other developers program your tool, which makes them happy and lets them do what they need.
The most obvious one is of course callback functions. I feel like callback functions have got a bit of a bad reputation, because it's easy to overdo it and make things way too complex. Then you end up in callback hell and you don't know what's going on anymore. But I think if it's done well and kept simple, it can be a very effective way of letting a user do custom stuff within the software you're writing. So for instance here, we have a function that takes a callback called on_update. It receives one argument, which is some status, and it lets the user execute any arbitrary code they might want to run when that happens. So that's very straightforward.
Another example is what we call function registries. That's something we've started using all across our libraries. The idea is that you can use, for example, a decorator that's provided by the library, and you can register functions, in this case a custom loader, and assign them a name, and the library knows what to do with them. If you want to load from UnicornDB, all you have to do is register your loader, and the application can just run whatever code is in it. You don't need to monkey-patch anything, or submit a pull request and hope that someone integrates your stuff. It can all happen under the hood, and it's also very clean and very independent. You can write a unit test for that function in your own code base that's pretty much completely independent of the library you use, and make sure it works.
The next one is something I've actually never used in any of our tools, but I'm keen to give it a try. I think it's a little-known concept, even though it's been around since Python 3.4. The idea is that you often have functions that take different input types.
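The two patterns described above, a callback argument and a function registry, can be sketched together in a few lines. This is not the API of spaCy or Prodigy: `register_loader`, `LOADERS` and `run` are hypothetical names, and the status passed to `on_update` is an assumed shape.

```python
# Hypothetical registry: maps a name to a user-provided loader function.
LOADERS = {}


def register_loader(name):
    """Decorator that stores a function in the registry under `name`."""
    def decorator(func):
        LOADERS[name] = func
        return func  # returned unchanged, so it stays directly testable
    return decorator


def run(loader_name, on_update=None):
    """Hypothetical library entry point: looks up the loader by name and
    calls the optional `on_update` callback with a status dict."""
    texts = []
    for record in LOADERS[loader_name]():
        texts.append(record["text"])
        if on_update is not None:
            on_update({"processed": len(texts)})
    return texts


# --- user code ---
@register_loader("unicorn")
def load_from_unicorn():
    yield {"text": "first"}
    yield {"text": "second"}


statuses = []
result = run("unicorn", on_update=statuses.append)
print(result, statuses)
```

Because the decorator returns the function unchanged, `load_from_unicorn` can be unit-tested on its own, without going through the library at all.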
An argument can be an int or a string, say. I think we've all learned that this is not good and it shouldn't be that way, but sometimes you can't avoid it. What you would usually do in your function is write some type checks, which we all know is difficult and annoying, and also something you shouldn't be doing. So the idea here is that you get to define different variations of a function for different types. In the context of extending an application, you could use this to register your own custom types. Say you have a loader and you're working with pandas, which works with DataFrames. If the tool you're using doesn't support that yet, you could just register a variation for that input type and make it load a DataFrame. Then maybe you want to submit a pull request and push that upstream, maybe you don't. Or you register your UnicornDB object. It's basically just an easy way for the user to extend a library by writing their own code, without having to hack into it.
The next one is something we use a lot, and I think it's a really great way to make tools extensible. I don't know if you've worked with entry points before, but the idea is that entry points let one Python package advertise a function to another package that's installed in the same environment.
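Stepping back to the type-based variations described a moment ago: the description matches `functools.singledispatch` from the standard library, which was added in Python 3.4, although the transcript does not name it. A minimal sketch under that assumption follows. The `load` function and the `UnicornCursor` class are hypothetical, and a plain class stands in for the pandas DataFrame mentioned in the talk so the example has no third-party dependencies.

```python
from functools import singledispatch


@singledispatch
def load(source):
    """Fallback for input types nobody has registered a variation for."""
    raise TypeError(f"Don't know how to load from {type(source).__name__}")


@load.register(str)
def _load_from_str(source):
    # A single string is treated as one record.
    yield {"text": source}


@load.register(list)
def _load_from_list(source):
    for item in source:
        yield {"text": item}


# --- user code: extend the library for a custom type, no patching needed ---
class UnicornCursor:
    """Hypothetical user-defined type wrapping some rows."""
    def __init__(self, rows):
        self.rows = rows


@load.register(UnicornCursor)
def _load_from_unicorn(source):
    for row in source.rows:
        yield {"text": row}


print(list(load("hello")))
print(list(load(["a", "b"])))
print(list(load(UnicornCursor(["x"]))))
```

Dispatch happens on the type of the first argument, so the type checks move out of the function body and into the registrations.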
We have the entry point group spacy_factories here, and in it we define a component and a function that's in our package. When spaCy loads, it checks what entry points are advertised by other packages, and if they exist, they're going to be loaded. That all happens automatically if you have these packages installed. You don't have to call into spaCy, you don't have to register anything, it just works. It's also a great way for one library to easily interoperate with a lot of other libraries. You might have come across entry points via console_scripts, which is typically how you register command-line scripts, but you can also use them for completely custom functionality, and I think that's pretty cool.
Finally, one thing we see a lot, which I think introduces a lot of unwanted complexity, is code that completely takes over the I/O or the loading of the data. For example, it takes a file path, opens the file and does everything automatically. That means if you don't want to load from a file, you have to create a temporary file and write to disk, and all of that is pretty annoying. If instead you make it take an open file, or even the loaded config itself, you can say: look, loading from a JSON file or a YAML file is a solved problem. Let the user open the file. If they don't know how to do it, they can probably find an answer online very easily, and your library doesn't have to deal with all of the things that can potentially cause problems.
To give you an example of this in action, here's a recipe script from our library Prodigy again. One thing we do here is we have a decorator that registers a function. So by just adding prodigy.recipe, we can register it with Prodigy.
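Going back to entry points for a moment, here is a rough sketch of the consuming side: how a library might discover what other installed packages advertise in a given group. It uses `importlib.metadata` from the standard library (Python 3.8+); the helper name `discover_entry_points` is hypothetical, and the package and component names in the comment are made up for illustration.

```python
from importlib.metadata import entry_points

# On the advertising side, a package would declare something like this in its
# setup.py (package and component names here are hypothetical):
#
#     entry_points={
#         "spacy_factories": ["my_component = my_package:my_component"],
#     }


def discover_entry_points(group):
    """Return {name: EntryPoint} for everything installed packages
    advertise under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a selection API.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict of group -> entry points.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


found = discover_entry_points("console_scripts")
print(sorted(found)[:5])
```

Calling `.load()` on one of the returned entry points imports the advertised function, which is the step a host library would take before using it.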
This is a recipe called "my recipe", and it will be available on the command line. You can also use entry points for that. Another thing we do is let users load their own files. Maybe they need to load from CSV, maybe they need to load from UnicornDB. We don't care. We can provide some helpers for this, but the user can load data however they want. Ultimately all we want is an iterable of dicts. It can be a list, it can be a generator, Prodigy doesn't care. You just need to provide it somehow, and how you provide it is up to you. If it's a generator function, you can use a model in the loop and you can respond to external state. As long as you yield dictionaries in the same format, you can set it up however you want. There's also a callback function in this example. That function is called whenever we receive new answers or new annotations from the server, and it can execute anything from logging state to updating a model. We just pass it in, and if it's there, Prodigy can use it.
These are aspects of the tool that I think have really made it easier to develop these things and support a variety of use cases. And I think that if you're building tools that other people use, this can really make it easier for you to get more done, focus on the actual development work and make the users happier. But of course, that's not the whole story when you're developing software.
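The "iterable of dicts" contract described above can be illustrated with a small sketch. It is not Prodigy's implementation: `get_batches` is a hypothetical consumer and `filtered_stream` a hypothetical user-written generator. The point is that a list and a generator are interchangeable, and that a generator is evaluated lazily, so it can react to state that changes while it is being consumed.

```python
from itertools import islice


def get_batches(stream, size):
    """Hypothetical consumer: accepts any iterable of dicts and serves
    it in batches of up to `size` items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def filtered_stream(texts, skip):
    """User-written generator. `skip` is checked lazily, so changes made
    while the stream is being consumed are respected."""
    for text in texts:
        if text not in skip:
            yield {"text": text}


# A plain list works.
print(list(get_batches([{"text": "x"}], size=2)))

# A generator works too, and can respond to external state.
skip = set()
batches = get_batches(filtered_stream(["a", "b", "c", "d"], skip), size=2)
first = next(batches)
skip.add("c")  # e.g. an update callback decided "c" is no longer needed
second = next(batches)
print(first, second)
```

In this sketch the second batch no longer contains "c", because the generator had not reached it yet when the external state changed.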
You're usually not the only person making the decisions. Even if you as a developer think, hey, that's how we should do it, let's spend less time trying to build every little feature and instead build something that our users can extend, program and work with, there might be other people you have to convince, and you might get some pushback. So how do you convince your company, your team or other stakeholders that you should be approaching developer tools that way?
One thing someone might say is: okay, all this code and all this writing things, it's not that much code, but it still looks really complicated. If the system is just easier and we just have a few buttons, that's so much easier to demo, whether you want to demo it internally to your team or maybe to a customer if you're selling software. So why can't we just make it easier, so it's easier to demo? The thing is, if the audience of the tool is developers and you can get the developers involved, that's really going to be a win-win situation for both sides. We see this a lot with Prodigy. When we do demos or show the tool to developers, they say: oh great, finally I can see how I can program with it and how I can use it. That's much more valuable than showing something really, really simple to management, who then think, oh yeah, our developers can click two buttons and then they're done. It's never that easy. We've heard from many developers who ended up going with Prodigy and said this is so much nicer than a lot of other solutions they were shown, which focused way too much on simplicity of usage and, from a developer's perspective, just raised the question: how do I extend this?
Do I always have to file support requests? Sounds difficult.
If you're selling software, the other thing is that you want to win customers, and you also want the customers to be happy. Your customers are always going to ask for features. Your customers want stuff, and the best way to make your customers happy is to give them the stuff they want, right? So if you build everything your customers ask for, they're going to be happy. But once you go down that path and say, okay, you want a feature, I'm going to build that feature, you're really selling an all-or-nothing approach. If the user says, well, I don't want all of it, I just want parts of it, I already have a solution for this and a solution for that, they kind of have to go for nothing. You can tell them, look, that one part is actually not so important, you don't have to use it, but that sounds pretty bad as well. What this communicates to the user is: I'm locked in here, I have to take the whole thing, and it really wants me to do everything on that platform. In the developer space, that's often not that desirable.
Finally, another thing you want if you're building software is for your tool to be easy to learn. You don't want people to have to spend forever trying to understand what you're doing. So if we have all this other Python stuff, and the user has to think about, oh, what's a decorator, and I have to return a list here, how do I do that again, doesn't this actually make it harder to learn? Shouldn't you want to avoid writing your tools that way? Well, the thing is, background knowledge is not the problem. What's hard to learn is the tool-specific stuff. Background knowledge, like the programming language we're already working in, is actually pretty easy to learn. It also generalizes well across other tasks, and there are lots of great resources. For example, I'm sure some of you in the audience have never worked with decorators before, because it was just never relevant to you. And I'm also sure there are lots of you who have never worked with any of our software, spaCy or Prodigy, but totally know what a decorator is and could explain to the other group what it is and how you use it. Same with a generator function. You can type that into Google and you'll find lots of threads on Stack Overflow that explain the concept, completely independent of the tool.
And from the developer's perspective, the burden of documentation is not on me. I don't have to explain to you how to create an object that my library needs. I can tell you, hey, it's a function that returns this, and how you put it together is up to you. If you don't understand the concept, Google it or ask someone, and you'll be able to learn about it without me having to give you every single piece and you struggling if I don't.
One thing people discuss a lot when we talk about programming, which I think is very valid, is making technology accessible and inclusive, and democratizing technology so that everyone can use it and it's not just reserved for a group of people with certain skills. So if I'm telling you here that people should write more code, and that you should let your users write code and make your users write code, isn't that exclusive? Doesn't that exclude people who can't program? What are we doing here?
And the thing is, even people who can't program can still benefit from an ecosystem of tools if those tools are programmable. When we talk about people who can't program, lots of people see this as a kind of hierarchy: some people can program and they can use all these complex tools, and then there are other people who can't, and they don't deserve powerful tools. Of course that's not true. Often the people who want to use these very advanced developer tools and haven't learned to program yet, for whatever reason, are very experienced professionals in another field, like medicine, law or digital humanities. In all these fields you have professionals who want to work with highly specialized tools and just don't happen to know Python. There might still be people in their groups and in their fields who can program, and in that sense the tool you write can still be very useful for these people in a way you could never have anticipated. You can't know every single use case and every single thing people are trying to do with your software.
Ultimately, it also comes down to realizing that you're not going to think of everything, and having the humility to accept that. Instead of trying to provide one tool to rule them all, allow people to extend it, and accept that you can either ship them an incomplete version or not. If people are using your tools, just giving them an interface they can click on does not solve any of the problems. By putting a button on top of something that's fundamentally a tool to program, you're just giving people an abstraction, and every abstraction leaks. In that sense, this type of abstraction is always going to be worse than the actual thing. You're not helping someone who can't program by giving them a leaky abstraction over something that's fundamentally programmable.
Let me give you a simple example that might even look a bit appealing at first. Here we have a machine learning model builder. Imagine you're training models, and normally you do that in code, but you want to make it a bit easier for people to use. So you've built this little UI: people can select an embedding layer and an encoding layer, upload their data, tune some hyperparameters, and then click a button and train a model. Looks useful, right? But if we take a step back again here, what we've built does not actually solve any problem on any level. If you're a developer and you're writing the code, this is not going to help you, because as soon as you want to add a print statement, log something or change something, you can't do that. You need to add another dropdown, or you need to add another field. If you could just write one line of code instead, that would be much more efficient than working with a tool like this. On the other hand, if you want people who are not machine learning developers to be able to train a machine learning model, that's a nice idea, but what you're giving them is this simple interface, which is just an abstraction over your code.
That's kind of that's almost an insult to any professional working with You know these tools because there's a lot more that you need to do in order to really make things useful And I would say that maybe you know with the technology We might not quite be quite there yet to make it You know useful enough to offer this kind of abstraction So, you know a business professional shouldn't have to care about your encoding layer A business professional cares about like very different things So this interface would this interface and this this kind of approach to things doesn't really solve Any problem and it's just strictly worse than letting people program and building systems that way and So, you know when we think about making technology accessible making Things accessible to people who aren't like you is not just trying to think of everything they might want and then giving it to them Again, that's what I mean. It's kind of you're never going to think of everything There's no way you can know what you know a specialist in a different field might want and To dividing also to dividing people up into coders and non coders really isn't that helpful There are many professional tools for example, just think of Excel people use Excel people write formulas in Excel And yeah, maybe that's not coding, but it's still something people want to use and people use that So those sorts of behaviors to get their job done people want complex Systems and people want complex behaviors and even if at some point it means yeah They have to write one or two lines of Python that's still more useful than having to rely on you to give them a checkbox They can click on so And if we look at this From you know the perspective of okay, what what does this mean for? 
software, and what lessons can we learn here: if we look at open source software, it's pretty interesting that open source tools have again and again crushed closed source software in many fields and many domains. And that's even though open source has a pretty significant disadvantage over closed source, proprietary software, which is that open source tools often have a lot less money. Sometimes they have a company behind them, like in our case. Sometimes people take donations. Many open source tools are very community-driven and can actually be quite unorganized. Sometimes it's some guy developing it in his free time, and still companies are using it. Why is that? People often say it's because it's free, and companies just like using free stuff. But I think that's not true. It can be a motivation, and it lowers the barrier to entry, but companies can totally pay for stuff. People like the fact that open source tools are programmable and extensible. That's something people really value about software in general, and I think it's a big part of what contributes to the success of open source. This is also something we can learn from the success of open source software. And I'm not saying all you need to do is make your software open source. It's completely fine to make money.
It's fine to build closed source systems. But if you want to take one lesson away from this, the next time you're sitting down and building a tool, a piece of software that other people are going to use, learn this lesson: make it programmable, make it extensible, try to focus less on making things easy, and instead just let your users write code. Thank you.
So, a question over here.
Yeah, so in your entire talk you were talking about how to make it easier for developers to add features on the go. What's your take on high-level wrappers that go on top of your APIs in this kind of environment?
Sorry, I didn't get the last part of the question.
What's your take on high-level wrappers that go around APIs, for example Keras and stuff like that? How do you think high-level wrappers play into making your APIs easier to use?
Okay, so you mean high-level wrappers like Keras, and what they contribute to making tools easier to use?
Yeah.
Okay, good, perfect. Sorry, the acoustics are a bit difficult and I keep hearing an echo, so it's not about you. So yes, actually, mentioning Keras and also TensorFlow, it's a good example.
I didn't mention that in the talk, but especially with TensorFlow 2, I think they really, just like PyTorch, went for this concept where you have the low-level primitives that you can use, but you also have a more high-level API that does the more common tasks that people might want to do. And I do think, if you can maintain it and if you can pull it off, that can be a good compromise. But you want to be exposing the primitives for people to work with, and they actually need to be usable. Then you can still say, hey, I can give you a fit method that trains your model, but I also give you all the parts you can use for your own training loop. The only thing I would say you have to consider is that a large library like TensorFlow can pull this off, but if you want to do it, both parts need to be good. If your primitives are hard to use outside of your wrappers, that's pretty difficult. But if you can do it, that's a good way, and that way, every time a user needs something custom, you can say: okay, you're past the point of using it out of the box now, here are all the primitives, now you can put them together yourself, here's the documentation, have fun. So yes, I think that's a good example.
So if I understand correctly, Prodigy is then closed source too?
Yes and no. Prodigy is a commercial tool, so it's not open source and free. But we do include the parts of the source that are not compiled with Cython with the library, and we also let you write custom Python scripts. So ideally, you can interact with the tool via these recipe scripts, use the components, and compose your workflows in code. So you still pip install the library into your environment, which means you can import its components from your script and write your own scripts using the library.
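A minimal sketch of the "fit method plus usable primitives" idea from the answer about high-level wrappers. Everything here is hypothetical and written for illustration: `LinearModel`, `gradient`, `apply_update` and `fit` are made-up names and are not taken from TensorFlow, Keras, PyTorch, spaCy or Prodigy. The point is only that the convenience function is built from the same public pieces a user can call directly.

```python
from typing import Iterable, List, Tuple

Example = Tuple[float, float]  # (input x, target y)


class LinearModel:
    """Toy model y = w * x, standing in for a real network (hypothetical)."""

    def __init__(self) -> None:
        self.w = 0.0

    def predict(self, x: float) -> float:
        return self.w * x


# Low-level primitives: public, and usable without the wrapper below.
def gradient(model: LinearModel, batch: Iterable[Example]) -> float:
    """Mean gradient of the squared error with respect to w."""
    batch = list(batch)
    return sum(2 * (model.predict(x) - y) * x for x, y in batch) / len(batch)


def apply_update(model: LinearModel, grad: float, lr: float) -> None:
    """One plain gradient descent step."""
    model.w -= lr * grad


# High-level convenience: composed only from the primitives above.
def fit(model: LinearModel, data: List[Example],
        epochs: int = 50, lr: float = 0.05) -> LinearModel:
    for _ in range(epochs):
        apply_update(model, gradient(model, data), lr)
    return model


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Out-of-the-box path.
model = fit(LinearModel(), data)

# Custom path: the same loop, but the user can log whatever they want.
custom = LinearModel()
for step in range(50):
    grad = gradient(custom, data)
    if step % 10 == 0:
        print(f"step={step} w={custom.w:.4f} grad={grad:.4f}")
    apply_update(custom, grad, lr=0.05)
```

Because `fit` contains nothing the user cannot reproduce, moving past the out-of-the-box behavior means copying a three-line loop rather than waiting for a new option to be added.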
Yes.
Yeah, thank you for a wonderful session. My question is regarding spaCy NER. If the data is in sentence form, then we can directly go ahead and recognize entities based on the context. But what if my data contains a single word in each line? How can I use NER on that, and how does spaCy NER learn context from it?
Okay, I mean, that's a bit more tangential, and we can probably also talk about more details later. But fundamentally, so you just have one single word?
Yes.
In that case, the thing is, named entity recognition by definition is the idea of recognizing names and concepts, and usually that happens in context. So for context, you need context, and that's also how the models are typically designed. They decide whether Apple is a company or a fruit based on the surrounding context. If you have no surrounding context, then NER, and that idea of modeling your task, is also not very useful.
Okay, and even in sentence format, does the number of spaces between words matter for recognizing the entities?
So you mean the number of other words around it?
Yes.
Well, yes. Usually models have different windows. In our case, it's four words on either side, so that's what the model looks at. And that's usually what you want, because that context tells you a lot more about the words you're looking for. So yes.
Okay, thank you.
Hi. So when the Explosion team was conceptualizing Prodigy as a product, and through the development process,
how do you decide what components to make open source, or to even expose, and what components to keep as the key points for it as a product?
Yeah, I mean, in our case, one part was that we compile some of the code. If it has C extensions or something and it's compiled, then that's a bit harder to just expose and have you edit. But more generally, we thought, okay, there are all the building blocks that basically modify functionality that people might want to change, or that we might want to vary. For example, how a stream of incoming examples is sorted, how certain config options are defined, or how the model is read in. That's all something that we know a user might want to customize. And then there's other stuff that just powers some of the internals. That's much less relevant to change in a sense, because that's really the core functionality of how the tool fits together. That's the stuff where we're like, okay, we don't need to expose all the details. And then there's also stuff where we feel like, look, I'm not an expert on either REST APIs or databases. I can write that stuff, and I would say we did a pretty good job in Prodigy, but I wouldn't consider myself opinionated enough to tell someone how to do it. So for the web server and the database, we have a pretty good implementation, but if you don't like it, we give that to you as open source, because I can totally imagine that someone wants to change that. And I think just admitting that is fine: look, we built something good, but I can understand you might want to do it differently, so here it is, edit it.
Hello. I wanted to know: when writing libraries, sometimes we use decorators, and sometimes we want to use some protected variable of some other class, or of some input we are getting. For example, I was writing a decorator
to print logs, one that just logs that your function was entered, and I wanted it to also log the exit of the function, the outputs, and the arguments that were passed. I needed to use protected variables, like the function name. Is it okay to use the protected variables of a class, or is it not a good practice?
Can you summarize the question again? I'm not sure I got the whole thing.
Yeah, so if you pass a class or object, is it okay to use protected variables in our library, or is it not a good practice?
What variables?
Protected variables, like the ones we define in Python with a double underscore. Is it okay to use them in our library if a user is passing them in, or is it not a good practice?
Oh, I mean, I don't know if I have an opinion on that, to be honest. In general, you'd want to avoid too many arbitrary names that you hard-code, because that's just more stuff that a user has to remember. But I don't think I have an opinion on that. It does sound a bit inconvenient to have too many of these conventions that you come up with in your own code base. Even with you telling me this, I'm trying to imagine how this would look in code and how I would interact with it, so that could be a bad sign. But honestly, I don't think I can give you good advice. Sorry.
Yeah. Also, I wanted to know, in spaCy, what is your work culture like?
How is the work divided, and how do you guys...
Oh, you mean how we divide up the work? At this point, of course, it's a bit easier because we already have a library. We've validated a lot of the ideas we've had about how to compose things. So that becomes a bit easier, because we have our roadmap and then we decide. Within our team of four developers who are on the spaCy core team, everyone has their own strengths and areas that they're working on. So that's now, I would say, relatively easy, just by talking about it. And another thing is that spaCy is driven by very few authors. While we love community contributions, especially in the area of extending the languages and really using people's expertise, we don't necessarily expect our community to work on in-depth features and contribute stuff where we have a clear idea and we think, okay, we can pull this off and develop it. So we talk about it in the open, but ultimately we figure it out, do it, and present it to the community for feedback.
What's your view on... you mentioned entry points, right?
Yeah. Can you speak up a tiny bit?
Yeah. So you mentioned entry points. So think about a command-line tool, like click. Most of the entry points go through the import path. So let's say you have nested directories, and your configuration file, or the function in the module that the entry point refers to: often it encounters an import problem.
I missed the last part. Sorry.
So if you package it, right, often it runs into this import problem through the Python path. So what are your opinions on that? We expose entry points, and then you have modules, and within modules you've got submodules, and then specific functions within those submodules. Often that causes problems.
Okay, so the question is how to deal with bugs and problems like that?
Have you encountered anything like this?
Yes. If you have entry points and everything advertises them, you kind of end up with these problems, with several layers deep of what it registers. I can see that's a valid concern. And of course, you could also just publish a package on PyPI that advertises really terrible entry points for spaCy and then mess up everyone's spaCy installation. That is true, and that's, I would say, a fault with this approach, or a potential problem with it. As a library developer, you probably want to do some scaffolding around it to make sure that you're not loading something that's terrible. But yes, that's something you have to consider, probably by writing some good error messages around that. And I'm not an expert on the development of entry points, but I could imagine that could be improved in the future, with more features added to the packaging.
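One way the "scaffolding" and "good error messages" mentioned in this answer might look, as a hedged sketch rather than a description of how spaCy actually loads its entry points. The group name `myapp_plugins` and the function `load_plugins` are hypothetical. The only real API used is `importlib.metadata.EntryPoint` and its `load()` method from the standard library; the entries are constructed by hand here so the example runs without installing any package.

```python
from importlib.metadata import EntryPoint
from typing import Callable, Dict, Iterable, List, Tuple

GROUP = "myapp_plugins"  # hypothetical entry point group name


def load_plugins(eps: Iterable[EntryPoint]) -> Tuple[Dict[str, Callable], List[str]]:
    """Load each advertised entry point, collecting readable errors
    instead of letting one broken plugin crash the whole application."""
    loaded: Dict[str, Callable] = {}
    errors: List[str] = []
    for ep in eps:
        try:
            obj = ep.load()  # imports the module and resolves the attribute
        except (ImportError, AttributeError) as err:
            errors.append(
                f"Can't load plugin '{ep.name}' (advertised as '{ep.value}'): {err}"
            )
            continue
        if not callable(obj):
            errors.append(f"Plugin '{ep.name}' is not callable, skipping it")
            continue
        loaded[ep.name] = obj
    return loaded, errors


# In an installed application on Python 3.10+, the entries would typically
# come from importlib.metadata.entry_points(group=GROUP). Here they are
# built by hand: one that resolves, and one with a broken import path.
demo_entries = [
    EntryPoint("dumps", "json:dumps", GROUP),
    EntryPoint("broken", "no_such_module_xyz:func", GROUP),
]

plugins, problems = load_plugins(demo_entries)
for message in problems:
    print(message)
```

Collecting the failures and reporting the advertised path gives the user something to act on when a nested module cannot be imported, which is the situation described in the question.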
So, yeah.
Yeah, so that was a great talk, and I loved all the designs of your slides. My question is: does Prodigy support longer texts with new lines?
Sorry?
Does Prodigy support longer texts with new lines?
Longer texts with new lines, yeah. So in general, yes, because whatever you can render on the screen, you can render. There's one thing: if you're labeling manually, new lines can be kind of tricky, because new lines are Unicode characters, so you always need to see them in your text. And one problem with visualizing new lines is that they're usually kind of invisible. They just create a new line. So we have to add a bit of hackiness to actually display new lines as new line characters, because otherwise it's very easy to accidentally highlight a new line and really mess up your training data that way. But in general, if your input text has new lines in it, a new line is at least something you can visually render and see, and highlight or avoid highlighting. So that's typically no problem. Although I would often recommend, if your texts are long and have lots of new lines: you don't always benefit from looking at the whole text, unless you really know what you're doing and your model implementation is actually sensitive to the whole text. Otherwise you might as well cut it up and use shorter pieces. But we can also talk about some of the details later.
So this is the last question.
Yeah, thanks for the talk. I have a question specifically on API design. One headache that happens is that sometimes your API is too low-level, where you can do everything with it but you have to do everything, or it's too high-level, where you don't have to do everything but you can't do anything. So do you have any thoughts on finding the right
abstraction? That's the first part. The second part is: suppose you make a mistake. How do you do version 2 of the API without screwing over all your customers?
Those are both really good questions. The first one is kind of the trade-off I tried to cover in the talk. Of course, it helps to have users. It's always better to start maybe a bit too low-level, and if users want more, you can gradually add something more. Once you have all of these helpers that do everything, rolling that back into more low-level functionality is often a bit too difficult. So I do think having people use it and then slowly moving towards a good compromise works, but that's definitely one of the challenges, and if you can solve that, I think you have a good tool. Second, backwards compatibility. That's difficult, and it's not the best advice, but try to make good decisions and try to make as few mistakes as possible. You need to be right. But also, now when I'm writing APIs, I'm already trying to think about what I'm going to do if I ever want to roll this back. Can I easily do that? And often I feel like if I come up with a design where it's easy to manage the backwards compatibility, it also means that it's overall a better API design. Once you end up with random combinations of keyword arguments that might not be valid anymore in the future, that's, I would say, often already a very questionable design that you want to avoid. Or if you rename something or want to change something: can I just raise an error message here, direct the users to something else, and it's done? But yeah, I can definitely relate to all of these questions, of course.
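A small sketch of the "raise an error message and direct the users to something else" approach from the last answer. The functions `train` and `train_model` and the rename from `n_iter` to `epochs` are invented for illustration and do not describe any real library's API. Only the standard library `warnings` module is used.

```python
import warnings
from typing import List, Optional


def train(data: List[float], epochs: int = 10,
          n_iter: Optional[int] = None) -> int:
    """Hypothetical v2 function in which `n_iter` was renamed to `epochs`.

    The old keyword still works for now, but tells the caller what to use.
    """
    if n_iter is not None:
        warnings.warn(
            "'n_iter' is deprecated, use 'epochs' instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        epochs = n_iter
    # Stand-in for real work: number of update steps performed.
    return epochs * len(data)


def train_model(*args, **kwargs):
    """Removed in the hypothetical v2. The stub stays so that old code fails
    with a pointer to the replacement rather than a bare AttributeError."""
    raise RuntimeError(
        "train_model() was removed in v2. Use train(data, epochs=...) instead."
    )


# New-style call.
steps = train([0.1, 0.2, 0.3], epochs=5)

# Old-style call: still works, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_steps = train([0.1, 0.2, 0.3], n_iter=5)
```

A single renamed argument is cheap to support this way. The "random combinations of keyword arguments" case mentioned in the answer is harder, because every old combination would need its own check and message.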