Hello everyone. I'm Teddy. I currently work as a software developer at OpenMetadata, where we're building an open-source metadata catalog to help you discover your datasets, apply governance, and so on. I previously worked in different data roles: data analyst, data engineer. It's my first time ever speaking at a Python conference, so I'm a bit nervous. A small disclaimer: I'm not directly involved with the Pydantic library, but I've used it in different contexts, I think it's an amazing library for a specific use case, and I just want to share the love and my experience with everyone. First we'll go over why Pydantic exists, go through some basics and elements of Pydantic, see some cool features, and then we'll go over two use cases from my own experience.

So first of all, why do we need Pydantic? As most of you probably know, Python is a dynamically typed language, which means that a function expecting two integers can be successfully run by passing two strings. What you end up with is the concatenation of two strings, a new string, when you actually expected an integer back. A few years back, Python introduced type hints, and static type checkers such as mypy allow you to run your entire code base through type checking and make sure you don't ship incorrect code. In that example, if we run mypy on our small application, it reports that our second call is not correct, as we're passing two strings instead of two integers.

In certain contexts, that dynamic nature of Python is really great, because it allows quick experimentation and so on. But in other contexts it can be dangerous, as you can pass unexpected data into your application and process it in ways you never intended. That's where Pydantic comes to the rescue: it gives you a way to type check your code at runtime and stop bad data from being processed any further down your application. Pydantic lets you enforce type hints at runtime. It's super easy to define your data representation using simple Python code. You can also build more complex, modular structures by using other Pydantic models as types, which we'll see a little further into the presentation. And you can define your own types and your own custom validation, if your application needs to validate against certain business logic.

If we were to rewrite our previous function using Pydantic, we would simply define what is called a model, by subclassing BaseModel. Here it's represented by MyData, where we declare the types of our two class attributes, num1 and num2. Then we pass the values from our class instances, my_data and my_other_data, to our function. As we can see, when we try to create an instance of MyData with two strings for those attributes, Pydantic raises a validation error and stops the application right there: we don't go any further, and we're sure we're not writing bad data anywhere. So next we'll go over the basics of Pydantic models, some interesting features, and a small caveat with Pydantic models.
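To make that concrete, here is a minimal sketch of the progression just described. The names are illustrative, and the code assumes Pydantic v1, whose API this talk uses throughout:

```python
from pydantic import BaseModel, ValidationError


def add(num1: int, num2: int) -> int:
    # mypy would flag add("1", "2"), but at runtime
    # Python happily concatenates the two strings.
    return num1 + num2


class MyData(BaseModel):
    num1: int
    num2: int


my_data = MyData(num1=1, num2=2)
print(add(my_data.num1, my_data.num2))  # 3

try:
    MyData(num1="one", num2="two")  # not coercible to int
except ValidationError as err:
    print(err)  # Pydantic stops the bad data at the boundary
```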
So as we saw, to define a Pydantic model you simply inherit from the BaseModel class: you create a new class, define some class attributes, and add type hints to those attributes.

Models themselves can be used as types in other Pydantic models, which allows you to build more modular models. A good example: if you have employees that belong to a department, and departments that belong to an organization, you can create three different Pydantic models, and in your employee model you simply declare a department attribute whose type is your Department model. In the example here, we're doing it with a DataProperties model that we pass to MyData: MyData has a properties class attribute whose type is DataProperties, which is itself a Pydantic model. That allows you to define complex models that are modular and can be reused in different parts of your code base. And as we saw earlier, you can easily access the values of your object instance by simply accessing the attribute.

One thing you need to be aware of is data casting. By default, Pydantic will cast values as it sees fit. In our first example, we define a new instance of MyData, the model we defined earlier, and instead of passing two integers we pass a string containing an integer and an actual integer. When we look at the type of the num1 attribute, we can see that Pydantic cast that value from a string to an integer. That's something you need to be aware of, and if it's something you really don't want, you can use StrictInt in that example. Pydantic has a bunch of strict data types that you can use to enforce strict type checking and prevent Pydantic from casting your types on the fly.

Something else that is pretty interesting: so far we've defined our data models ahead of time, by declaring subclasses of BaseModel. But sometimes you may need to create dynamic models, models that are defined somewhat on the fly, and Pydantic allows you to do that with the create_model function. You pass the name of your model as the first argument, and then the name and type of each field. In this case, we can imagine reading some input from a YAML file that we convert into a Python dictionary, defining our dynamic model from that, and then passing the data into an instance of it. That's a pretty interesting feature if you don't necessarily know ahead of time what your data representation will look like.

A feature we've used often as well is the ability to instantiate your class by parsing an object. It's a similar scenario: you can have a YAML file, for example, holding some configuration, which you read and transform into a Python dictionary; then, by calling the parse_obj method on your Pydantic model and passing that data, you create a new instance of that object that you can use throughout your code.
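Here are a few minimal sketches of the features above, again assuming Pydantic v1 and illustrative names. First, a model nested inside another model:

```python
from pydantic import BaseModel


class DataProperties(BaseModel):
    source: str
    encrypted: bool = False


class MyData(BaseModel):
    num1: int
    num2: int
    properties: DataProperties  # a Pydantic model used as a type


my_data = MyData(
    num1=1,
    num2=2,
    properties={"source": "s3"},  # nested dicts are validated recursively
)
print(my_data.properties.source)  # "s3"
```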
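Next, the default casting behaviour and the strict types that switch it off:

```python
from pydantic import BaseModel, StrictInt, ValidationError


class LaxData(BaseModel):
    num1: int


class StrictData(BaseModel):
    num1: StrictInt


print(type(LaxData(num1="1").num1))  # <class 'int'>: "1" was cast silently

try:
    StrictData(num1="1")  # StrictInt refuses to cast the string
except ValidationError as err:
    print(err)
```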
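And a sketch combining create_model and parse_obj, with a hypothetical dictionary standing in for the parsed YAML file:

```python
from pydantic import create_model

# Imagine this came from yaml.safe_load() on a user-provided file.
config = {"name": "daily_job", "retries": 3}

# Build a model on the fly: each field is a (type, default) pair.
DynamicConfig = create_model(
    "DynamicConfig",
    name=(str, ...),   # ... marks the field as required
    retries=(int, 0),
)

# parse_obj builds a validated instance straight from the dictionary.
settings = DynamicConfig.parse_obj(config)
print(settings.retries)  # 3
```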
One cool feature is the ability to define custom data types. The types you assign to your attributes don't necessarily need to be Python primitives; they can carry your own logic, and it's fairly simple. You just need to define a validation method. In the example, we simply define what a correct French phone number should look like, and in our __get_validators__ method we yield that validator. What's pretty cool is that you're not restricted to a single validator: if you want, you can define multiple validators and yield them all from __get_validators__. Now, when we define our Pydantic model, we can simply use that FrenchPhone class as the type of our phone attribute.

That brings us to our second big topic: Pydantic validators. Validators are a way for you to validate against specific business logic. So far, what we've seen is validation against types, which is basically making sure that, when you instantiate your object, the values you pass match the types you've defined. But sometimes, in your data processing or other applications, you may want to validate against business logic that pertains to your use case. A good example could be making sure that an age is greater than zero, or that the amount of an order is greater than zero, so that you're not ingesting bad data. To do so, you simply add the validator decorator to a method inside your model. In the example, we've created a Pydantic User model with a few type-annotated class attributes, and what we want to make sure is that age is always greater than zero, which seems like a fair thing to assume. When we create our object instance, if age is less than zero, Pydantic raises an error, which you can then handle as you wish.

One thing to note with validators: when one of your class attributes is a list, dictionary, or set type, if you don't set the each_item argument of the validator decorator to True, Pydantic will validate the container object itself, not the items inside it. If you set it to True, Pydantic will run the validation against each item of your container. It's a small detail that might be important to keep in mind.

The last interesting feature of Pydantic is settings management. It allows you to read specific values from environment variables and to set up your configuration in an easy manner, which is generally interesting when you want to override your configuration in your CI. It's very similar to defining a regular model, but in this case the class inherits from BaseSettings. You still define your types as you would on a regular model, and you assign specific default values. The two interesting things sit under the inner Config class, where you can define an environment prefix and whether lookups are case sensitive. The environment prefix will be prepended to all of the class attributes you define on your object. In the example, we have a port class attribute, and when we define the actual environment variable we prefix its name with PYDANTIC_, since we set that as the env_prefix; you can see that the value is overwritten once we instantiate our settings again after defining that environment variable. If you use a .env file, you can also declare it in the Config class, and Pydantic will know where your .env file is located. One important thing is value priority, that is, how Pydantic determines which value takes precedence over which: whatever is passed as an init argument takes precedence over everything else, which is pretty interesting when you have some sort of CI that needs to set specific configuration values.
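To make these features concrete, here are three minimal sketches in the Pydantic v1 style described above; names and the simplified phone-number pattern are assumptions. First, a custom type:

```python
import re

from pydantic import BaseModel


class FrenchPhone(str):
    # Pydantic v1 collects validators for custom types from this hook;
    # you can yield several validators here, run in order.
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, value):
        if not isinstance(value, str):
            raise TypeError("string required")
        # Simplified pattern: 0 or +33, then nine more digits.
        if not re.fullmatch(r"(\+33|0)[1-9]\d{8}", value):
            raise ValueError("not a valid French phone number")
        return cls(value)


class Contact(BaseModel):
    phone: FrenchPhone


print(Contact(phone="0612345678").phone)  # validated at instantiation
```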
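Next, the validator decorator and the each_item flag (the fields are illustrative):

```python
from typing import List

from pydantic import BaseModel, ValidationError, validator


class User(BaseModel):
    name: str
    age: int
    scores: List[int] = []

    @validator("age")
    def age_must_be_positive(cls, value):
        if value <= 0:
            raise ValueError("age must be greater than zero")
        return value

    # Without each_item=True, this would receive the whole list instead.
    @validator("scores", each_item=True)
    def score_must_be_positive(cls, value):
        if value < 0:
            raise ValueError("scores must be positive")
        return value


try:
    User(name="Ada", age=-1)
except ValidationError as err:
    print(err)  # raised at instantiation; handle it as you wish
```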
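And finally, settings management with BaseSettings, sketching the prefix and priority behaviour described above:

```python
import os

from pydantic import BaseSettings


class AppSettings(BaseSettings):
    port: int = 8080
    debug: bool = False

    class Config:
        env_prefix = "PYDANTIC_"  # PYDANTIC_PORT will feed `port`
        case_sensitive = False
        # env_file = ".env"       # uncomment to read a dotenv file


print(AppSettings().port)  # 8080, the default value

os.environ["PYDANTIC_PORT"] = "9000"
print(AppSettings().port)  # 9000, read from the environment

# An init argument outranks both the environment and the defaults.
print(AppSettings(port=5432).port)  # 5432
```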
Okay, so now the two use cases. The first use case I encountered is validating Airflow DAG configuration files. For those of you who don't know, DAGs are just directed acyclic graphs that define the tasks to be performed within a data pipeline. What we tried to do was give people really close to the business domain the ability to produce their own data pipelines to extract and transform data. The way we did that was by building a pretty big overlay that abstracts away a lot of the steps required to create Airflow DAGs, so that users just need to fill out a YAML configuration file and don't have to worry about how to write the DAG file, or even about what Python is. They just need to know how to fill out that configuration file and they're good to go.

The only drawback, or rather the risk, is that if a user creates a YAML configuration file that is not configured correctly, either their pipeline doesn't work, and now you have to debug it and understand what's going on, or, worst-case scenario, it breaks your entire Airflow instance. And that's not great, because then all your data pipelines stop and your stakeholders are not very happy. So we created data contracts, defined as Pydantic models, representing what a configuration file should be and which values are allowed. Whenever a user makes a pull request, we can validate their configuration file against what we expect a correct YAML configuration to be, and we don't need to worry about checking it manually or about their YAML breaking our Airflow instance.
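A toy sketch of that data-contract idea: the pipeline fields are hypothetical, and PyYAML is assumed for parsing.

```python
from typing import List

import yaml  # PyYAML, assumed available
from pydantic import BaseModel, ValidationError, validator


class PipelineConfig(BaseModel):
    name: str
    schedule: str
    tasks: List[str]

    @validator("tasks")
    def must_have_tasks(cls, value):
        if not value:
            raise ValueError("a pipeline needs at least one task")
        return value


raw = yaml.safe_load("""
name: sales_daily
schedule: "0 2 * * *"
tasks: [extract, transform, load]
""")

# Run in CI on every pull request: fail before anything reaches Airflow.
try:
    PipelineConfig.parse_obj(raw)
except ValidationError as err:
    print(err)
```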
The second use case, which is what we currently use at OpenMetadata, is defining Pydantic models from JSON Schemas. For those of you who don't know, JSON Schemas are basically JSON files that you annotate with specific values. They have the benefit of being language agnostic and super easy to read, and they allow you to run validation and testing against them. The reason we use this kind of approach is that we're defining a standard for metadata entities: we define what a table entity should be, what kind of attributes it should have, what kind of characteristics it should have, and those entities are implemented across three code bases. We have the backend in Java, the frontend in TypeScript, and the ingestion framework written in Python. So the big question is: how can we ensure that all of these implementations follow the same data structure? The answer, as you guessed, is JSON Schemas. The way we generate those Pydantic models is with a library called datamodel-code-generator: we run it against our JSON Schemas, and it outputs Pydantic models that we can then use to validate different aspects of our code base, be it an API payload or what a table entity representation should look like, all that kind of thing. We write one centralized file, and it gets propagated across three different code bases.
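As an illustration, here is a tiny JSON Schema and, roughly, the kind of model datamodel-code-generator emits for it; the schema is made up and the exact output shape is an assumption, while the CLI flags are from the library's documented interface:

```python
# Given a schema like schemas/table.json:
#
#   {
#     "title": "Table",
#     "type": "object",
#     "properties": {
#       "name": {"type": "string"},
#       "columns": {"type": "array", "items": {"type": "string"}}
#     },
#     "required": ["name"]
#   }
#
# running
#
#   datamodel-codegen --input schemas/table.json \
#       --input-file-type jsonschema --output table.py
#
# produces, roughly, a Pydantic model along these lines:

from typing import List, Optional

from pydantic import BaseModel


class Table(BaseModel):
    name: str
    columns: Optional[List[str]] = None
```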
And yeah, that's pretty much it for me.

Thank you very much for the interesting talk. Are there any questions? If so, could you please come to the front and ask through the microphone, so that the online audience can hear it as well. And to the online audience: if you have any questions, please ask them; we want to include you too. Thank you.

Hi. You've shown us how validators work when you instantiate a new object. Can you also run validators when you change an attribute of an existing instance? If you have an object and you do object.attribute = something, can you make sure that this also runs the validator?

I see, that's a very good question. I'll have to double check and then get back to you. So you mean changing the value of an attribute by simply assigning to it on the object, not instantiating the object itself? Yeah, I'll have to confirm. Thanks.

Any other questions? Okay, I have a question for you. If I have a custom type hint, like a nested type, would that also work with Pydantic?

What do you mean?

Say I have a dictionary of strings to some custom type, something nested, like a JSON structure, for example?

Yes, that's something we use, and you can also define optional type hints and it will work.

No one else? There we go.

Most of the use cases you showed take external data into your code and validate it. I know FastAPI also uses it internally to make sure everything is working inside the code. What advantages does Pydantic give you internally, rather than something like a dataclass?

Yeah, so something we currently use that involves more internal data is defining the payloads we pass to our different API endpoints: we validate that each payload is correctly formatted and can be sent to our backend API to write specific entities. That's the internal use case we have right now.

Thank you. I'm very new to Pydantic, but about your last slide, writing your schema in a JSON file and loading it into different programming languages: what would be the advantage of using Pydantic over something like Protobuf?

Thank you for the question. I've mainly experimented with plain dataclasses and Pydantic, so I'm not super familiar with that library. The advantage for us currently is that the code generator is able to output those Pydantic classes, and that's mainly why we went in that direction.

Thanks for the talk. Would you use Pydantic with data frames, like Pandas?

Let me think. I haven't personally had that use case. Generally, the way we've been using it is when consuming data from a specific endpoint or from a config file, to make sure that the data representation is what we expect; if we then need to do some processing, we can transform that data into a pandas DataFrame. But using it directly with Pandas is not a use case I've come across. Thank you for the question.

Thanks for the talk. Maybe just a quick one: in your experience, is there any time when a dataclass would be better than Pydantic?

Yeah, that's a good question. A nice thing about Pydantic is the way it handles mutable default types. If I remember correctly, with dataclasses, when you define a mutable default for an attribute, it throws an error; Pydantic handles that automatically by creating deep copies of the mutable default on each instantiation of the object. That's the biggest advantage I've found over dataclasses.

We would have time for one more question. All right, then thank you very much for the talk. It was...