So, good morning to the people who are hungover. Sorry that it's so early in the morning. I think this is going to be more of a friendly talk, because I know nearly all of the faces here, and that's really great. What I'm going to talk about is Plone 5. Who doesn't know what Plone is? One person. Well, that's great: there's at least one person, so let's try to do this. Who am I? I'm Ramon Navarro Bosch. I'm a Plone Foundation member and I'm on the Framework Team in Plone, the core Plone team, which meets every week. It's really boring. And then, I'm the CTO of a company in Catalonia. Yeah, I'm Catalan; you can see it there, that's in Catalan. So that's who I am. How am I going to organize this talk? First I'm going to talk about Plone 5, because we are at a good moment in Plone: we are about to release Plone 5. I'll try to explain what is going to change and how you are going to program with it, and then I'm going to talk about machine learning, because otherwise you cannot get your talk into EuroPython. So let's talk about Plone 5. From the user's point of view, the developer's point of view, the CTO's point of view and the business point of view, it's mostly about quality: user interface and testing. It's really old software; it has been around for nearly 12 years. We grew really fast, and we have a lot of sites around the world, some of them for governments, for example the Brazilian government, and it's used a lot by big companies: ABB and some other big companies use it. Why do they use it? Because it has good quality.
It's really stable. You can start from a really small site, just for your personal use or for the shop under your flat, and use the same software to build a really big site, an intranet or a big content management system. Everybody knows Plone, so this part is going to be boring, because I'm going to talk about what Plone is from the user's point of view. From the user's point of view, Plone is content. Content is king. What does it mean that everything is content? You have pages, documents, folders, events, whatever you want. For example, there is a hospital in my city that wants to store patient sheets: this person is ill, he has these problems, this is how we solved them. That is content. Anything you could have in a Word document, an Excel sheet or a piece of text is a document. So content is king; content is the center of Plone. And we have full multilingual support, in Catalan too, so that's great. Plone is translated into more than 50 languages. I mention this because most of the CMSs written in other languages, like Java, don't have full multilingual support, meaning that all content can be translated into all languages and each translation is linked to the others. One of the most important things about Plone is security. Why is it used by the FBI? Why is it used by the Brazilian government?
Why is it used by ABB? Because they know it's really secure. All the content, all the fields, everything that's there will not be seen by anybody you don't want to see it. Why do we have this security granularity? Because we have Zope and the ZODB, which I'll talk about a bit later, and we have workflows: content goes through different states, and each state has different permissions for different users, groups and so on. Theming. If you were using Plone before Plone 5, in Plone 4, you probably quit using Plone because theming was really difficult; the learning curve was steep. I've been asking some of the EuroPython people, "Have you used Plone in the past?" "Yeah, but I quit because it was so difficult to understand." So we solved most of the problems we had with theming. For example, we now use rules, with something called Diazo. It's designed so you can theme your site without knowing anything about Python; it uses rules to move things from one place to another. We also integrated RequireJS and LESS, so if you want to build a nice front end with Angular or whatever, Plone directly provides the tools you need. We have inline, through-the-web editing: you can create content and move content around. This is Mosaic, a spin-off project of Plone that is still a work in progress.
It's really cool because you can create content, upload an image through the web with drag and drop, and put it wherever you want on the page. If you want, we can try it. This is the new Plone 5 front end. It's really cool because it uses a theme called Barceloneta, from Barcelona, Catalonia again. Thank you. As you can see, I think it's in English. There are some cool things here: you can browse all your content. This resolution isn't really good for this kind of thing, but you can go to the folders, go to the content and see whatever you have. You can create a new page wherever you want, with TinyMCE, which is really cool: you can insert images, files, whatever. There are a lot of good features; if I showed them all we would be here until tomorrow. Sharing: you can decide who can edit. The history button shows who edited what and the differences between versions. Okay, since this is a technical conference, I'm going to talk a bit more about how Plone works technically, because that was the commercial part. As you can see, Plone is based on a hierarchical database, because users are used to having folders with content inside them on their desktops. We use a specific database called the ZODB. I think the ZODB is now more than 10 years old, maybe 12. At that time nobody was talking about NoSQL, and the ZODB is a NoSQL database: you store objects inside objects, and it stores the pickles directly, in a really great structure. So even if you are not going to use Plone, if you need a database to store a tree of objects that have some kind of relations between them, the ZODB is a really great database on its own. This concept of storing everything in a hierarchical model is really unusual.
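To make the idea of an object database concrete, here is a minimal, stdlib-only sketch of hierarchical object storage. It is not the ZODB API: the TinyObjectStore class, the Folder and Document types and the integer object ids are hypothetical. It only imitates the general idea that each object is pickled as its own record and that containers refer to their children by id rather than by copy; the real ZODB adds persistence, transactions and lazy loading on top of that.

```python
import itertools
import pickle


class Folder:
    """A container: children are kept by id, like folders in a Plone site."""

    def __init__(self, title):
        self.title = title
        self.children = {}


class Document:
    """A leaf content object."""

    def __init__(self, title, text):
        self.title = title
        self.text = text


CLASSES = {"Folder": Folder, "Document": Document}  # class name -> class, used when loading


class TinyObjectStore:
    """Hypothetical store: each object is pickled as its own record under an oid.
    A folder's record holds its children's oids, not copies of the children."""

    def __init__(self):
        self.records = {}  # oid -> pickled (class name, state) bytes
        self._next_oid = itertools.count(1)

    def store(self, obj):
        state = dict(vars(obj))
        if isinstance(obj, Folder):
            # Store the children first, then keep only references (oids) to them.
            state["children"] = {name: self.store(child)
                                 for name, child in obj.children.items()}
        oid = next(self._next_oid)
        self.records[oid] = pickle.dumps((type(obj).__name__, state))
        return oid

    def load(self, oid):
        class_name, state = pickle.loads(self.records[oid])
        cls = CLASSES[class_name]
        obj = cls.__new__(cls)  # rebuild the object without calling __init__
        if cls is Folder:
            state["children"] = {name: self.load(child_oid)
                                 for name, child_oid in state["children"].items()}
        obj.__dict__.update(state)
        return obj


site = Folder("My site")
site.children["news"] = Folder("News")
site.children["news"].children["welcome"] = Document("Welcome", "<p>Hello!</p>")

db = TinyObjectStore()
root_oid = db.store(site)
restored = db.load(root_oid)
print(restored.children["news"].children["welcome"].title)  # Welcome
print(len(db.records))  # 3 records: site, news, welcome
```

Storing references instead of nested copies is what lets an object database load or update one object in a big tree without touching the rest.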
No other CMS stores content hierarchically like this; everybody else stores content in a SQL database. But content management is not relational. If you go to the university and say, "I'm going to store an object and put all its HTML in a field of my table," they will say: sorry, you need to structure it, you need semantics, you need to be able to have children of your content. You have a folder, then content inside it, then images inside that. That's one of the main technical points, but really, content is king, and we just have this folder structure. So what do we use to define content? A project called Dexterity. In Plone 5 we moved away from Archetypes; if you ever used Archetypes, I'm really sorry for you. Dexterity is a really cool piece of software because it lets you define content types with a simple interface. You decide, for example, to create a content type called Sponsor. Then you can have a Choice field with a vocabulary that is shown in the UI; a rich text field, which renders TinyMCE so you can write; a URI field; and other kinds of fields, such as files and images. You can also define permissions for specific fields: "I don't want Timo to see this field," so I write that Timo cannot see that field. That's really cool. So everything is defined in one simple file.
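As a rough illustration of what a schema-defined content type gives you, the sketch below models a hypothetical Sponsor type as a plain list of field descriptions, with a vocabulary-restricted choice field and a field guarded by a read permission. This is not the Dexterity or zope.schema API; Field, SPONSOR, validate, visible and the "Manage portal" permission string are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Field:
    name: str
    kind: str                                   # e.g. "text", "richtext", "choice", "uri", "image"
    vocabulary: Optional[Sequence[str]] = None  # allowed values for a "choice" field
    read_permission: Optional[str] = None       # users without it never see the field


# Hypothetical "Sponsor" content type, described only by its fields.
SPONSOR = (
    Field("title", "text"),
    Field("level", "choice", vocabulary=("gold", "silver", "bronze")),
    Field("body", "richtext"),  # would be edited with TinyMCE in the UI
    Field("website", "uri"),
    Field("logo", "image"),
    Field("notes", "text", read_permission="Manage portal"),
)


def validate(schema, data):
    """Return error messages for values outside a choice field's vocabulary."""
    errors = []
    for f in schema:
        value = data.get(f.name)
        if f.vocabulary is not None and value is not None and value not in f.vocabulary:
            errors.append(f"{f.name}: {value!r} is not allowed")
    return errors


def visible(schema, data, user_permissions):
    """Only the fields this user may read, as a field name -> value dict."""
    return {
        f.name: data.get(f.name)
        for f in schema
        if f.read_permission is None or f.read_permission in user_permissions
    }


item = {"title": "ACME", "level": "gold", "notes": "pays late"}
print(validate(SPONSOR, item))                # []
print(sorted(visible(SPONSOR, item, set())))  # ['body', 'level', 'logo', 'title', 'website']
```

Because the schema is just data, the same description can drive the edit form, validation and field-level security.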
That file has none of the strange machinery we used in Archetypes: it only defines the interface, the fields you want in your content and their semantic meaning, and with it you define your own content type. This content type is then mapped to URLs. One of the good things we have in Zope and Plone is traversal. All these folders I explained (a folder, then a content item, then an image, for example) can be accessed through the URL: you write the folder, the content item and so on, and then the view, a specific browser view that renders it using a template or whatever you want to use. Here we have an example of a view. One of the cool things we added is the JavaScript and CSS resource registration system: you register CSS and JavaScript, and they automatically appear on your page when they are needed. For example, you can register a bundle for jQuery DataTables, and if a view renders a page that needs those elements, you just call add_resource_on_request with the name of the resource, and the JavaScript and CSS are automatically included when the view is rendered. There are a lot of technical details; I don't want to go into them too much. Here is how we register a view. I've been asking a lot of people here at EuroPython what the worst thing about Plone is, and they say ZCML, because it's XML. XML is not bad. Sorry, okay, it could be better. This is ZCML.
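Traversal can be sketched in a few lines of plain Python. In this hypothetical model, containers are dicts, each URL segment selects a child, and a final segment naming a view (optionally spelled with Zope's explicit @@ prefix) renders the object reached so far. Folder, Document, VIEWS and traverse are illustrative names, not Zope's publisher.

```python
class Folder(dict):
    """Container: its items are reachable by name, like folders in a site."""

    def __init__(self, title, items=None):
        super().__init__(items or {})
        self.title = title


class Document:
    def __init__(self, title, body):
        self.title = title
        self.body = body


def document_view(context):
    """A 'browser view': renders the object it was traversed to."""
    return f"<h1>{context.title}</h1>{context.body}"


VIEWS = {"view": document_view}  # hypothetical view registry: name -> callable


def traverse(root, path):
    """Walk the URL path segment by segment; a final view name renders the object."""
    segments = [s for s in path.split("/") if s]
    obj = root
    for i, name in enumerate(segments):
        last = i == len(segments) - 1
        if last and name.startswith("@@"):  # explicit view lookup, as in Zope
            return VIEWS[name[2:]](obj)
        if isinstance(obj, dict) and name in obj:
            obj = obj[name]  # step into a child object
        elif last and name in VIEWS:
            return VIEWS[name](obj)  # implicit view lookup
        else:
            raise KeyError(f"cannot traverse to {name!r}")
    return obj


site = Folder("Site", {"news": Folder("News", {"welcome": Document("Welcome", "<p>Hi</p>")})})
print(traverse(site, "/news/welcome/@@view"))  # <h1>Welcome</h1><p>Hi</p>
```

The URL mirrors the object tree, so no routing table has to be maintained by hand.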
ZCML is really great because it gives us a clear list of the views, the content types and all the definitions we have, so I really think this way of defining views and content is good. Then there is templating. You can create a class for your content type, but you need some kind of template to render it on the view page, and one of the cool things we added at the end of Plone 4 is that we can use Chameleon. So no more TAL, no more defining variables with a specific, crappy language that's only used by Zope: you can define variables and use them with this ${...} syntax. That's really interesting. It's Pythonic. I think most of you know that it's the most Pythonic framework for content management: you can access the objects of your site, and everything is mapped to the database. You go to an attribute of your site called folder, then to an attribute of the folder called document, and then you can access an attribute of the document. So you really have granularity and a Pythonic way of doing content management. I also wanted to talk about the Zope Component Architecture. There haven't been many talks about it (the last one I saw published on the internet was from 2005), but it's a really interesting infrastructure and I think it's misused. We have zope.interface and zope.component, which are really good packages for managing good programming patterns. Normally, when you write some Python code, you write a small program, but if your program grows you will need patterns for designing that software, and there are really good libraries that help you develop big software.
zope.component is a really good one; I really love it. Where do we store our components, our adapters, our patterns? In registries: a global registry and local registries. In Zope there is a global registry for your whole environment, your process, your threads, and a specific local registry for each site. First, what's an adapter? It's one of the patterns zope.component is really good at. In the past we used subclassing, and we ended up with Archetypes: if you looked at how many classes a type subclassed, it was maybe 30 different classes, and if you went up through the parents of the class you never knew what was going to be there. It was a mess. So the community decided to use this adapter pattern. You define an interface, in this case a person: a normal interface, like in Java, where you define the functions and attributes. Then you have implementations of that interface. For example, we have an interface called IPerson that defines a function, which T-shirt am I wearing, and an adapter called CatalanGuy that returns an estelada. Another implementation, BasqueGuy, returns a different T-shirt. You can use any of these adapters to adapt an object and get the methods you need. The subscriber is another good pattern: the observer, or subscriber, pattern from software engineering. Say you want to watch a kind of object: if an IPerson object is modified, you subscribe to the modified event for IPerson and decide which function is executed when that happens.
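Here is a minimal sketch of the adapter and subscriber patterns, using the IPerson example from the talk. The registries are plain dicts and the "interface" is an ordinary class, so this is only an analogue of the real machinery (zope.component lookups such as getAdapter, and events fired with zope.event.notify). register_adapter, get_adapter, subscribe and notify are hypothetical helpers.

```python
class IPerson:
    """Stand-in for an interface (a plain class here, not zope.interface)."""


class Person:
    provides = IPerson  # the "interface" this object provides

    def __init__(self, name):
        self.name = name


_adapters = {}     # (interface, adapter name) -> adapter factory
_subscribers = {}  # (interface, event class) -> list of handlers


def register_adapter(iface, name, factory):
    _adapters[(iface, name)] = factory


def get_adapter(obj, name):
    """Look up the named adapter for what obj provides and wrap obj with it."""
    return _adapters[(obj.provides, name)](obj)


class CatalanGuy:
    def __init__(self, context):
        self.context = context

    def which_tshirt(self):
        return "estelada"


class BasqueGuy:
    def __init__(self, context):
        self.context = context

    def which_tshirt(self):
        return "another T-shirt"


register_adapter(IPerson, "catalan", CatalanGuy)
register_adapter(IPerson, "basque", BasqueGuy)


class ObjectModifiedEvent:
    def __init__(self, obj):
        self.object = obj


def subscribe(iface, event_class, handler):
    _subscribers.setdefault((iface, event_class), []).append(handler)


def notify(event):
    """Call every handler registered for (what the object provides, event type)."""
    for handler in _subscribers.get((event.object.provides, type(event)), []):
        handler(event.object, event)


log = []
subscribe(IPerson, ObjectModifiedEvent, lambda obj, event: log.append(obj.name))

ramon = Person("Ramon")
print(get_adapter(ramon, "catalan").which_tshirt())  # estelada
notify(ObjectModifiedEvent(ramon))
print(log)  # ['Ramon']
```

Swapping the adapter name changes the behaviour without touching the Person class, which is the point the talk makes against deep subclassing.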
Then there are utilities. Say you want to store a list of good alcohol: you define an interface for good alcohol, and then you have an object that returns the whole list of good alcohol, because it's a utility. That's really useful. Finally, in this more technical part about Plone, I'm going to talk about the JavaScript and CSS integration we did. We created our own JavaScript framework. I think that was maybe the worst mistake, because it's going to be hard to maintain; at the time, the solutions we have now in the JavaScript community didn't exist. But it was really great. It uses patterns: you mark HTML elements, configure them with attributes on the element, and JavaScript automatically runs on top of them and renders the correct widget for you. For example, this is the date picker, the pickadate widget we normally have in Plone, and you can see its configuration: you define the date format and so on. Everything is integrated with RequireJS and LESS. That means that if you want, you can start your Plone in debug mode and everything is compiled on the fly in the browser, so you see the source code of Select2 there and can debug whatever you need. Plone has nearly a million lines of JavaScript with all the libraries it includes (TinyMCE alone is big), so if you want to debug it you need the option to have it uncompressed, without having to take care of loading everything yourself. So we created a Python module that takes care of that: it creates the RequireJS configuration and compiles the LESS.
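The pattern mechanism itself is JavaScript running in the browser, but the markup convention it relies on can be illustrated with a small Python scanner. The sketch assumes a pattern is declared with a pat-&lt;name&gt; class and configured with JSON in a data-pat-&lt;name&gt; attribute; treat both the attribute format and the PatternScanner class as assumptions for illustration.

```python
import json
from html.parser import HTMLParser


class PatternScanner(HTMLParser):
    """Collect (tag, pattern name, options) for every element with a pat-* class."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for css_class in (attrs.get("class") or "").split():
            if css_class.startswith("pat-"):
                name = css_class[len("pat-"):]
                raw = attrs.get("data-pat-" + name)  # assumed: options as JSON
                options = json.loads(raw) if raw else {}
                self.found.append((tag, name, options))


page = """
<form>
  <input class="pat-pickadate" name="start"
         data-pat-pickadate='{"date": {"format": "yyyy-mm-dd"}}' />
</form>
"""

scanner = PatternScanner()
scanner.feed(page)
print(scanner.found)  # [('input', 'pickadate', {'date': {'format': 'yyyy-mm-dd'}})]
```

Keeping the configuration in the markup means a template author can add or tune a widget without writing any JavaScript.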
With that module, you get the real source code in the browser. Diazo: I'm not going to spend much time on it; it's just a way of defining themes. You go to your designer and say, "I need this web page." The designer designs it and sends it to the people who write the HTML. Once the HTML is done, you give it to the Plone people: "This is the HTML of my site." With Diazo rules you say this div of Plone goes here and this piece goes there, and you can create your theme without a hard integration. That's also a really cool thing. The new thing we are working on for the next months is, I hope, the Plone REST API. It's really clear that web applications are moving to JavaScript and all the UI needs to be done in JavaScript, so what we really need is a really good REST API to interact with Plone. So we created plone.rest. I think it's not released yet, but it will be really soon; you can try it on GitHub. It's really great: it defines a way to bind HTTP verbs like PUT, GET and DELETE to a specific content type. I have an IPerson and I want to delete the person, so I define a specific HTTP verb for deleting on this object and which function is executed. Here you see, for example, the implementation of PUT. Testing: we improved testing a lot; we really did a lot of work there. On jenkins.plone.org, the Plone 5 job has 101 acceptance tests and, I think, more than 9,000 integration and unit tests. We really test everything: if you commit something and break the build, this guy comes to you and starts yelling at you, and you never go to sleep before Jenkins is green again.
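Going back to the plone.rest idea described above, binding a function to an HTTP verb on a content type, here is a hypothetical, framework-free sketch of that dispatch. The service decorator, the dispatch function and the IPerson class are illustrative only; plone.rest wires its services up through its own configuration, so none of these names come from it.

```python
class IPerson:
    """Stand-in for a content-type interface."""


class Person:
    provides = IPerson

    def __init__(self, name):
        self.name = name
        self.deleted = False


_services = {}  # (HTTP verb, interface) -> handler


def service(verb, iface):
    """Decorator binding a handler to an HTTP verb on a content type."""
    def register(func):
        _services[(verb.upper(), iface)] = func
        return func
    return register


@service("GET", IPerson)
def get_person(context, body):
    return {"name": context.name}


@service("PUT", IPerson)
def put_person(context, body):
    context.name = body["name"]
    return {"name": context.name}


@service("DELETE", IPerson)
def delete_person(context, body):
    context.deleted = True
    return {}


def dispatch(verb, context, body=None):
    """Return (status, JSON-serializable body); 405 if no service is registered."""
    handler = _services.get((verb.upper(), context.provides))
    if handler is None:
        return 405, {"error": "Method Not Allowed"}
    return 200, handler(context, body or {})


person = Person("Ramon")
print(dispatch("PUT", person, {"name": "Ramon Navarro"}))  # (200, {'name': 'Ramon Navarro'})
print(dispatch("POST", person))  # (405, {'error': 'Method Not Allowed'})
```

Keying on (verb, interface) means each content type can expose a different set of operations on the same URL.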
So our testing is really good. I really want to thank Timo and the testing team, because they did a great job. That also makes companies rely on us: when we create a new release of Plone, you know it's going to work, because we have tested everything. I changed the name of a button in a control panel and ten tests failed. That's really great for companies that rely on Plone and want to invest money to get their sites up and have them still there in five years; that's really important. We have now released Plone 5 beta 3, and we created a specific repository so anybody can try it. Trying it on your own may be difficult, because it's complex software, but it's four lines in the console: you clone the repository from GitHub and use Python 2.7 (the ZODB is already running on Python 3, so we think someday we'll have the option of Python 3 across the whole stack; it's really big, so we will try), then you run buildout, then you run bin/instance fg, and you have your Plone site running. It's only four lines. You'll need to take a coffee between buildout and starting the instance, because buildout might take 10 or 15 minutes depending on the machine and the network. We have really good documentation: there is training.plone.org and docs.plone.org, hundreds of documents written for developers and for users, and lots of training that helps you understand Plone. The future: we are maybe going to move more towards the REST API and a JSON front end. We are going to have some kind of asyncio back end, maybe with Substance D, maybe Pyramid, who knows; we will see, and we are going to talk about that.
I love that this talk also has machine learning at the end, and they gave me 10 minutes for it, so I'm also going to talk about machine learning. What we did in Plone is create a proof of concept. These were our main goals in using machine learning. Related content: users use a CMS to reach the content they want to see, for example the related content on the site, but people don't go and mark "this is related to this one, and this is related to that one," because nobody does that in a content management system. So we wanted a smart way to relate everything automatically. Classification of the current content: if people have already tagged the content on your Plone site, you may just want to run classification algorithms on top of it to classify what is new or added. Recommendation of subjects and tags: when you are creating or editing content, an interface that suggests "this content talks about this," asking the user whether they want to tag the content with that subject. And semantic search on the live search: we have a really nice live search, so you can search like in Google and get the content that's there, but it's text only, looking for text in all the files and content we have, so we were trying to make it semantic, relating things, so it's more useful. So I started trying to understand what machine learning is. It's a really big subject. I found one really good picture that I really love, and I wanted to show it to you.
This is the scikit-learn map. You start here and ask: what do I need to do? Do I have more than 50 samples? Yes; in Plone you normally have more than 50 samples. Do I want to predict a category? Yes, I want to predict what the content is talking about. Do I have labeled data? No, because people haven't labeled it. Then: do we know how many categories we have? We don't know how many groups of content there are going to be in a Plone site, but that needs to be defined by the administrator, because we are not going to try to predict it. So then we use k-means, one of the algorithms we have implemented, which I'm going to show you. In the case where we have labeled information, we take another branch. This scikit-learn cheat sheet helps you a lot in deciding which algorithm to use for different kinds of usage. Scikit-learn may not be the best option, but we wanted something integrated into Plone that doesn't need anything external to run. We also did an implementation with an external tool; it works fine, but you need an external REST API and you need to export the content, and our clients are really strict about security: they don't want us to export content to any external application. So we need everything embedded within Plone's security. We implemented this as a collective package for machine learning on our GitHub. The first thing we implemented is clustering. First we created an adapter that gives you the learning string. You say this content type is an IPerson and you create this adapter: for this person, which text do I want to use for machine learning? Maybe you want to join the name, the first name and the birthday, or whatever, and you create a line of text that is going to be used for the analysis. Then we normalize it.
We vectorize it with NLTK and a feature hasher, getting the corpus of each document, and we store that corpus in a pickle in the database so we can reuse it later without it being expensive. Then we cluster: with all the corpora we create a big matrix of everything on the site. We tried with more than 150,000 documents; you need 32 or 64 gigabytes of RAM to run that in memory. This runs in a single process and just uses memory, but if you want, there is an algorithm that lets you work in batches: mini-batch k-means. So we cluster: we define the number of clusters we want in the front end, use the k-means algorithm to decide the groups of content, and store the resulting model in a pickle in the database. Then we use that model to predict which cluster the next content will be in: you create new content, and it automatically says, "this content belongs to this cluster." The good thing is that this whole implementation has security implicitly, because the ZODB stores the objects and the catalog is secure, so nobody gets any information they shouldn't see. In the future, we are now working with naive Bayes for classification, then recommendation and semantic search, and externalizing: we want to be able to push the computation outside of Plone, but it's a bit difficult because of security. And I wanted to show you a demo, because people told me:
"Maybe I won't believe what you're saying." Okay, so this site is in Catalan, because I was setting it up this morning and was a bit sleepy. Here I created some content, a lot of copy-pasted content, and if I go to any of it, I see that it belongs to cluster one. Let me see, where is my pointer? There is an automatic sitemap that classifies all my content into different clusters. I create content, and it automatically defines groups of content that are semantically close. How is it configured? This new control panel is really nice; Victor did that. Let's see: machine learning settings. Here you first define where the pickles are going to be stored, whether you want to use the NLTK stemmer, which stop words you want to use, and some other options, such as whether to keep the hashing of the strings so they can be restored. Then you define how many clusters you want (the maximum number of clusters) and the name of the pickle used to store the clustering. Then you press this compute button, and it gets all the content, processes it, stems it, and creates specific indexes in the catalog that let us see in the browser which cluster each object belongs to. We created this nice view, called the auto sitemap, which automatically shows you the content grouped by cluster. Then it's a matter of defining what each cluster is about. With real content that's really easy, because you see the titles and think, okay, this cluster is talking about this kind of stuff, so you can name that cluster and people will see all the content automatically placed there. Now I'm going back to the talk.
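The clustering pipeline described in this talk (build a text per object, normalize it, hash it into a vector, run k-means, pickle the model, then predict a cluster for new content) can be sketched with the standard library alone. This is an assumption-laden stand-in, not the add-on's code: it replaces NLTK stemming and scikit-learn's hasher and KMeans with tiny hand-written versions, uses a deterministic farthest-point initialization instead of scikit-learn's default k-means++, and the stop-word list, feature count and sample documents are made up.

```python
import hashlib
import math
import pickle
import re

# Hypothetical settings, mirroring the control-panel options described above.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}
N_FEATURES = 2 ** 10  # size of the hashed feature space


def normalize(text):
    """Lowercase, split into words and drop stop words (no stemming here)."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def hash_vector(tokens, n_features=N_FEATURES):
    """Hashing trick: each token bumps one bucket; the result is L2-normalized."""
    vec = [0.0] * n_features
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()  # stable across runs
        vec[int.from_bytes(digest[:4], "little") % n_features] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids, vector):
    """Index of the nearest centroid: the cluster a (new) document belongs to."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], vector))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[predict(centroids, v)].append(v)
        for i, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Batch job: vectorize every document, cluster, and pickle the model.
docs = {
    "pasta": "Recipe: boil the pasta and add tomato sauce",
    "pizza": "Recipe: bake the pizza dough with tomato sauce",
    "python": "Python code: install the package and run the tests",
    "plone": "Python code: run buildout and start the Plone instance",
}
vectors = {uid: hash_vector(normalize(text)) for uid, text in docs.items()}
stored_model = pickle.dumps(kmeans(list(vectors.values()), k=2))

# Later, when new content is created: load the model and assign a cluster.
model = pickle.loads(stored_model)
new_doc = hash_vector(normalize("A recipe with tomato sauce"))
print(predict(model, new_doc) == predict(model, vectors["pasta"]))  # True
```

With 150,000 documents a dense pure-Python version like this would be far too slow and memory-hungry; that is where sparse matrices and mini-batch k-means, as mentioned in the talk, come in.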
There is a future for this. It's a proof of concept; we are using it in production on some sites because it was needed, but we are working a lot on it and sprinting on it a lot. So if there are any data science people who want to help us understand what's going on behind all this, we'll be really happy to have their help. The Plone community is a cool community. If anybody wants to join the Plone community and try it, now that it's much easier than in the past, you're welcome; we are really cool. Are there any questions?
Thank you for the presentation. Just one question: how does this handle content in different languages?
Right now the stemming we use is English, and we need to add a drop-down to the control panel that lets you decide which language to use. Stemming is implemented for most languages, so it's just a matter of adding this option to the control panel.
Could you give us a few numbers? How long does the computation take if you have, say, 100,000 or a million objects?
We do it at night. It starts at one and finishes at four, so three hours for 150,000 objects.
Okay, so you most likely have a specific instance for that?
We have a specific instance. The only problem is that when the computation is finished, it needs to write the cluster down on all the objects, and that's what takes the most time: writing on every object which cluster it belongs to.
What's the size of the pickle once the computation is done on 150,000 documents? And is it read into memory by the sitemap generator on every page request, or do you cache it somehow?
Sorry, I didn't understand.
Sorry: what is the size of the pickle generated at the end of the computation, and is it read into memory for the auto sitemap on every page request?
No, no, no. We store the model, and we store the vectors: the transformed matrix of the corpus. We store the model because when you create new content, you want to ask the model which cluster it belongs to and store that as an attribute. And we store the matrix because if you want to compute again, you would otherwise need to recalculate everything. Everything can be cached as it normally is. When you are using the site, no request computes anything, because everything is stored in the catalog and in attributes on the objects.
Oh, okay, so it's not real-time machine learning.
No, it's more that we run the algorithms and integrate the results with Plone. Do you have more questions?
Is there something that can do real-time machine learning?
You should talk with some data science people about that.
Okay, because I know about Solr or Elasticsearch and things like that, and I don't think it's possible there, because it takes some time to build all the indexes. I guess the same is true for machine learning: you need incremental indexes and algorithms, so that might be hard.
I just discovered, talking with some data scientists here at EuroPython, that there is a line of research called online learning: algorithms that grow incrementally and are designed more for online applications. So we need to investigate that line.
Any more questions? If not, it's lunch time. Let's thank the speaker.