So, first thing: who am I? I have been using PostgreSQL since around 1996. I first used it with PHP — I was young and I needed the money. I've essentially been using PostgreSQL longer than I've used Python. I joined the Python community around 2003, at the EuroPython in Charleroi, and I formed my own company querying databases other than PostgreSQL and putting the results into Excel. Due to my involvement in the Python community, and later in the PostgreSQL community, I was spotted by 2ndQuadrant, a global PostgreSQL support and development company, and invited to run the German affiliate of 2ndQuadrant. Now, we're talking about PostgreSQL as a database for the Internet of Things and Industry 4.0. While people are still streaming in, I will spend some time discussing what's behind those terms, give a little context, and make clear what we are focusing on. Internet of Things: I found a very interesting graphic, which is much better than those bullet points. It started out with tracking everything — the big, big craze: don't have a plain entrance badge, have an RFID entrance badge so you can scan it. The idea is you put computers into everything, everywhere. It started out with really tech-y stuff that's being moved around, and more and more intelligence — whatever you call intelligence — is put into these things. It's a big, big, big buzzword. You can formulate it another way: computer chips became cheaper, so it became economically viable to put them into shoes or drinks or whatever. Which part of the Internet of Things should we focus on in this talk? In the media there is often something about household appliances, because everybody wants to get in on this buzzword. So your fridge will order the milk for you.
I don't think so. Amazon had those Dash buttons where you could order a specific product, at an undisclosed price, from Amazon and have it delivered. Even washing machine companies tell you that the washing machine will be updateable with new washing programs. To be a little bit serious: they're turning left and turning right at different speeds — how much can you really update, and how much is a gimmick? Anyway, those connected appliances push us into a very, very important discussion: who owns the data? Who shall have access to the data? If they see that Harald is always washing white gloves at a high temperature, and he always uses the special program to clean out blood, maybe they start to get suspicious: does he just have nosebleeds, or is he a serial killer? So even with the household appliances, there's a discussion. I want to avoid this discussion of the social impacts in this talk — we had a very interesting talk this morning about the ethics of IT, and there is an open space for it. So: important topic, not the focus here. The other term in the title is Industry 4.0. The interesting thing is, that's a German thing — a German government thing. The German government set up a working group and called it Industrie 4.0 — of course they called it that, because it's the German government and they talk German. Anyway, there are multiple aspects of Industry 4.0. One thing is that there is more sensor data to be collected; they hope that the machines will communicate with each other. And there is a very fascinating sentence in the final report of the working group: humans and machines communicate and cooperate. So if you have ever shouted at your printer — "you shitty beast, why don't you print?" — anyway. Do we really need release numbers on our industrial revolutions?
I found a very interesting graphic that tries to explain what the 4.0 means. First, we switched from horses pulling things to water power and steam, mainly in Great Britain. Second, mass production, which started mainly with the butchers in the United States and was then copied by the automotive industry. Third, computers and automation. And the fourth part: cyber-physical systems. I think that's the part we can focus on, and that's the stuff some of our customers at 2ndQuadrant are really talking to us about. All those production machines produce a lot, a lot, a lot of data, and usually they write it to something that essentially evaluates to /dev/null: maybe it's written to some memory chip on the machine, or to some log file which gets deleted. More often than not, all the data is thrown away and never used again. The idea — the part of Industry 4.0 I want to focus on in this talk — is the interconnection of these systems and what to do with it. So, finding the focus for this talk, because the Internet of Things and Industry 4.0 are big, big, big topics, with a lot of people making a lot of money talking a lot about them. We know there will be more sensors and more computing devices in more objects, and they will produce more data. Watching and analyzing log files would be a good thing to do — I think that's a statement most could subscribe to. It would be good if we looked at the warnings in our log files and took, or deliberately didn't take, action. And finding your stuff is something that would really be helpful in a professional setting. There are ethical problems if you tag your own stuff and everybody in the world can track it. But just in a professional environment: think what happens if every worker spends just five minutes searching for something — and we're only talking about personal office supplies or tools.
We are not even talking about the production materials flowing through the company. How much productivity is lost — and it's not lost to fun activities like playing Minesweeper, it's lost to sad activities like searching for stuff. One part of those sensor-data applications we are really discussing with our customers, and which they want to implement, is the maintenance aspect. If you want an air conditioner that keeps running with very high probability, or a heating system that keeps running, you have to do regular maintenance. That regular maintenance is set at specific time intervals, and then you have to change specific parts much earlier than needed — and sometimes, with a fixed interval, it will be too late and the machine breaks down anyway. So regular intervals give you additional costs, because you have to do the changing, and maybe you do it too soon or too late because the intervals are fixed. There is another way: the fix-when-broken maintenance method. You just wait until your air conditioner breaks down and then you start fixing it. The good thing is you only change parts when they're really broken — you don't change them too soon. The bad thing is the interruption of service. Now, that's the part of sensor data, Industry 4.0, and the Internet of Things I really buy into: it's really nice if you grab the sensor data from your machines that are actually working — air conditioners, heating, machines on the production floor, on the shop floor — analyze the data, see what happens or doesn't happen, and do dynamic maintenance. You save a lot of money and you keep the stuff running. Now, why should you ship that logging data over the internet and consolidate it in a big database, or somewhere else? You could always handle it locally. We all know those jokes about the check-engine light.
You can fix it with a Post-it note stuck over the check-engine light. I know it from developing software for end users: all the warnings — I really saw how the training for the second or third generation of users of my software was "oh, this warning always comes, just press OK". So the local handling of error messages, of warnings, of check-engine lights leaves a lot to be desired. And those local check-engine lights, those local things giving information, rely on a specific set of parameters that is frozen when the system is developed, when it's distributed, when it's rolled out. Maybe in newer generations of the machine there are new consumables, new filter materials that are more thorough — they take more dust out of the air, in the air-conditioner example — or they last longer. Your given set of parameters is very static. If you can communicate with a central system, you can update it, and you can also add new analyzers. So, our focus — I have been working backwards towards it: more data has to be stored somewhere, and more data has to be analyzed. Now, why should you store that data in PostgreSQL? Something about PostgreSQL. PostgreSQL is an open-source object-relational database system. Very important: it's not just a relational database system, it's an object-relational database system, so there are more features in it than just tables. It has more than 15 years of active development. We conform strongly to the official ANSI SQL standard — you never hit 100% of a standard, but we try very hard to stay very close to it. It runs on all major operating systems; currently we are phasing out OS/2 and some very exotic HP-UX version for PostgreSQL 10. And I put one thing in bold letters: PostgreSQL was built to be extended — getting new components into PostgreSQL was planned in the design of PostgreSQL. Some specifications are very good.
The maximum database size: you can put all your Twitter tweets into PostgreSQL — it's unlimited. The maximum number of rows per table is also unlimited. Currently we have a maximum table size of 32 terabytes and a maximum field size of 1 gigabyte, and those are sizes that are really tested, which are supported and which work in practice. A very, very important part — and the speaker before me was talking about licenses — is the PostgreSQL licence. It's a kind of MIT licence. That's the legalese; I've shortened it: do what you want with the software, and if it breaks, don't come crying. So you can do what you want with the software — you can bundle it, you can resell it. This gives us some very nice properties. First: within open source, not every project is equal. Some open-source software is owned by a company, and companies are sometimes very good for society — think of Kiwi.com, they threw an awesome party last night. Sometimes they have "don't be evil" in their slogan for a while, but sometimes they become evil, or sometimes they vanish. I'm a company owner myself, so this is no critique of capitalism; I just know that computer companies are not here forever, and their current policy is not here forever. So it's very important that PostgreSQL is owned by the PostgreSQL Global Development Group. It's like Python, which is owned by the Python Software Foundation — just safer, because the PostgreSQL Global Development Group is not even a legal entity that you could sue or remove from the planet. The development of PostgreSQL is very, very open. We have open discussion about new architecture, new features, new commits.
It's all open on the mailing lists. The discussions that led one way or the other are always retraceable — and if we are talking about complex technology, the things the database is deliberately not doing are important too. The development history, what led to doing it this way — it's all out in the open. We have a strong resilience against being bought by some company, which has happened to other open-source projects. We at 2ndQuadrant are one of the biggest contributors to the current PostgreSQL code. We have not been there for all of those 15-plus years, but currently we still contribute less than 50 percent of the new code — the current figure is 30 to 40 percent — and we have four of the roughly 22 committers on our staff. But even if somebody bought us, 2ndQuadrant, one of the big contributors to PostgreSQL, it would not really matter to the project. There are still a lot more contributors, and if they bought us, our people would not have to stay within the company — they could go and work as developers somewhere else. That gives a very, very strong resilience to the PostgreSQL project, and that gives the trust — which also led me to trust putting my own money into PostgreSQL. The licence of PostgreSQL is open to various business models, which is very important, because you can't survive by selling stuffed animals with a logo on them. It's not possible; you need to try different business models. We at 2ndQuadrant have the business model of selling support, development, and early access. Other companies, like our main competitor EnterpriseDB, sell a proprietary fork of PostgreSQL with some additional features; Greenplum and TransLattice/StormDB did the same thing — they made their own versions.
We know that many contributors to PostgreSQL have a kind of Einstein model of employment: there are government employees, in Vienna or somewhere in the United States, and in their day job as a DBA they also do development for PostgreSQL. And we have a very specific culture of PostgreSQL development. There are some big companies — EDB, 2ndQuadrant, Credativ, Cybertec, Crunchy Data Solutions, and some more all over the world. We compete for the same customers while contributing to the same code base. So it's a very interesting and not conflict-free environment, but it's also very forward-moving. Now: why store all that data in a database, and why in a relational database? The idea is, if you get sensor data, log files, position tracking from your stuff and put it into a database, you have a common ground to put your analyzers on. That analysis can be machine learning, or programs you write to analyze the data; it can also be data scientists. And that's the big benefit if it's a relational database: there are many, many tools for non-programmers to do data analytics on relational data. So having PostgreSQL as the intermediary for all the data of your Internet of Things and Industry 4.0 application gives people from other fields, outside of computer science and programming, the possibility to come in and analyze the data with their tools and with their methods. Why should you store document data — like sensor data, log files, and such — in PostgreSQL?
You could use a document database. PostgreSQL, for three versions now, has had the JSONB data type to store documents in, and PostgreSQL still has a proven query language: SQL, the Structured Query Language, proven by decades of experience and designed so that non-computer-scientists can write queries. JSONB data can be queried with SQL. JSONB data has full index support, so searching for attributes within your JSONB objects can be index-supported, and the PostgreSQL optimizer knows how and when to use which index to fetch just the right rows — you don't have to program that yourself. We can — later I'll call it the Hannah Montana/Miley Cyrus pattern — have the best of both worlds: pull the regularly queried attributes out of your JSONB objects into columns. And what is seldom known when you just read the current blog posts about PostgreSQL features: those JSONB data types are founded on multiple generations of code. It's not brand-new fresh code — of course it's fresh, but it was used and used and used before. It started out in 2003 with Teodor Sigaev and the hstore data type, which was made to store something like Perl hashes. It was great — if you were a highly trained Russian astrophysicist — for storing data very efficiently. The challenge was: if you were not a highly trained Russian astrophysicist, it was very challenging to deal with hstore. So it took off in specialized communities, like astrophysics and the biological-data community, but for the general public it was never really graspable what you do with hstore and how to put data in. Now, on that foundation of hstore — all the experience of indexing non-relational data stored in these fields — the JSON and JSONB data types were integrated into PostgreSQL.
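To make "JSONB data may be queried by SQL" concrete, here is a minimal sketch — the `sensor_readings` table and its attribute names are invented for illustration:

```sql
-- Plain SQL over JSONB: ->> extracts a value as text,
-- @> tests containment, ? tests key existence.
SELECT payload ->> 'temperature' AS temperature
FROM   sensor_readings
WHERE  payload @> '{"device": "ac-unit-7"}'
  AND  payload ?  'temperature';
```

With a GIN index on `payload`, the `@>` and `?` conditions can be answered from the index instead of a full scan.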
They share a big common code base within PostgreSQL: the long experience with indexing this kind of data, very good indexes for it, optimization for it. And the JSONB data type is JSON — JSON runs the web; everybody knows how to do JSON. Here is a book database, very short: you have a title, which is citext — case-insensitive text; you have an ISBN, which has its own ISBN data type; and the pub_info field can be a JSONB object with any number of attributes. And just by creating one index, you can search on any attribute of all the JSONB objects, index-supported. You can search on them anyway, without writing your own program, just by querying — but by setting up an index, it's index-supported. "Index-supported" — just a short explanation: if you read through all the rows in a database, you have to pull a lot of data, which takes a lot of I/O. If you have an index, it's a shortcut — it's like searching for a word by reading the whole book versus just going to the index in the back of the book. So that's the kind of query you can do on JSONB data. I have one example of that partial normalization I suggested. When I'm developing database applications, I currently always have — the Germans will know it — a zbV field, "zur besonderen Verwendung", for special use. That is, I always have one column in my tables for the specifications my customers did not tell me about when they asked for a new table, and this column is created as JSONB. Every attribute that belongs in that table but was not discussed when the specifications were written, I just put into that JSONB field. So I have a user table without a Twitter-handle column — I put the Twitter handle into the JSONB field. And if there's a next cool thing, like Flipchat or Twitchat — which is Twitter, but for chatting and with dogs, or whatever — I can put the new handle into that zbV field too. Now, if that thing really takes off…
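A sketch of that book table in SQL — the exact names are my own illustration, not the ones from the slide (the dedicated ISBN type lives in the `isn` contrib extension; plain `text` stands in for it here):

```sql
-- citext provides case-insensitive text comparison.
CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE books (
    isbn     text   PRIMARY KEY,   -- or the isbn type from the "isn" extension
    title    citext NOT NULL,
    pub_info jsonb                 -- any number of publisher attributes
);

-- One GIN index makes every attribute inside pub_info searchable.
CREATE INDEX books_pub_info_idx ON books USING gin (pub_info);

-- Containment query, answered via the index: books by a given publisher.
SELECT isbn, title
FROM   books
WHERE  pub_info @> '{"publisher": "ACME Press"}';
```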
…you want to move it into the relational world. You can add a column for that attribute, transform all your stored data from the JSONB — and you don't even have to adjust your applications immediately: you can write a trigger which, every time a row is written, looks into the JSONB field and copies the value into the regular column. So you have the Hannah Montana/Miley Cyrus thing — the best of both worlds: you have the JSONB, which can still be used, and you have the column data, which can be used by your analytics or reporting stuff. And once you have migrated all your applications to use the column, you can drop the JSONB attribute from storage for good. Why columns, and why in PostgreSQL? Why not just keep a primary index and the JSONB — why do all this dance to get the best of both worlds? We get documentation, which is given by the column names; we get additional optimizations that are even stronger than the GIN index we can have on JSONB; and we have a lot of analytic tools which deal really well with tabular, columnar data. Now, another part which makes PostgreSQL a really, really good choice for all those new data types from new sensors is extensibility. Extending PostgreSQL with a new data type — one which can be used from any application and can be index-supported — is something that can be done without a PhD; a master's thesis is roughly the level of effort it takes, and a lot of people do it.
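The promotion-plus-trigger dance above can be sketched like this — the `users` table, its `misc` JSONB column, and the attribute name are invented for illustration:

```sql
-- 1. Add the real column.
ALTER TABLE users ADD COLUMN twitter_handle text;

-- 2. Backfill it from the existing JSONB data.
UPDATE users
SET    twitter_handle = misc ->> 'twitter_handle'
WHERE  misc ? 'twitter_handle';

-- 3. Keep the column in sync while old applications still write only JSONB.
CREATE OR REPLACE FUNCTION sync_twitter_handle() RETURNS trigger AS $$
BEGIN
    NEW.twitter_handle := NEW.misc ->> 'twitter_handle';
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_sync_twitter_handle
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE sync_twitter_handle();
```

Once every application reads the column, the trigger can be dropped and the attribute removed from the JSONB.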
I have one example: LiDAR data, one of the big things with all those self-driving cars coming up. Currently it's used for geographical work — planes fly over a city and do a LiDAR scan, and you get lots and lots and lots of data points. Storing them takes a specific form, because if you just put every coordinate as an integer into its own column, the overhead with those millions and millions of points would make it infeasible to search anything. So one guy — he's with the Canadian government, doing natural-resources research — created the pgPointcloud data type. He extended PostgreSQL with a data type specific to LiDAR, where he found specific ways of storing optimized LiDAR data: he compresses it, he makes it specifically searchable, and he implements just those features that are specific to that LiDAR type. Such optimized data types have been in PostgreSQL from the beginning — it was designed for them — and with all those new sensors reporting stuff we hadn't even thought of last year, PostgreSQL is prepared to take those extensions in. One thing my colleague Álvaro and the other PostgreSQL developers put into the last few versions is very cool. Say you have a lot of data with a kind of natural ordering. What's a natural ordering? If you have time-based data, like log files, the lines in the log file usually carry an increasing timestamp — you don't jump around within the time coordinates, it's slowly growing. If you do a LiDAR scan from a plane, the plane usually moves in one direction.
So you have a natural order in your data. The BRIN index — block range index — allows you, for every block of data on the hard drive in your system, to store just the beginning and the end of the values in that block. Instead of having an index entry for every row in your data, you just have the beginning and the end of each data block. It's similar to those old libraries, or old companies, where the physical storage of customer files goes from A to B, from C to E, from F to whatever — and that's what we store in the BRIN index. What's good about it? It's a very, very small index for a lot of data. The bad thing is that the index doesn't take you directly to the row. But if you have something like temporal data, which is strictly ordered by time, or geographical data which comes from a plane scan, and you put a BRIN index on it, you have a very small index. A very small index means a lot of the index fits in your cache — maybe in main memory, maybe even in the level-3 or level-2 cache of your processor — so it can be used very quickly and you get to the right data very fast. One thing that will be in PostgreSQL 10, which is scheduled to come out in autumn this year, is declarative partitioning. What is partitioning, and why should we care? If you store your data in a table, it keeps coming in and is appended at the end — we're talking about append-only workloads here, always at the end, the end, the end. Now, if you have log files of robots in a factory, you are not really interested in what a robot was doing two years ago. That's why Skynet came to power —
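A minimal BRIN sketch for such naturally ordered data — table and column names invented for illustration:

```sql
-- Log lines arrive in (roughly) timestamp order,
-- so each disk block covers a narrow [min, max] time range.
CREATE TABLE robot_log (
    logged_at timestamptz NOT NULL,
    robot_id  integer     NOT NULL,
    payload   jsonb
);

-- The BRIN index stores only one summary per block range: tiny on disk.
CREATE INDEX robot_log_logged_at_brin
    ON robot_log USING brin (logged_at);

-- The planner can skip every block whose range misses the filter.
SELECT count(*)
FROM   robot_log
WHERE  logged_at >= now() - interval '6 months';
```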
nobody looked at what it did two years ago. But in a practical environment you don't look at data from two years ago; you need just the last six, four, or three months, because earlier data is no longer relevant for today's actions. So you need an efficient way to delete the data that was produced more than six months ago, more than seven months ago, more than eight months ago. If you have everything in one big file — January, February, March, April — and now in May you decide to delete January, you get a hole in your database file which will be filled up; and if the May data that gets put into that free space is longer than January's, some of it ends up here and some there, which makes it hard to search. Also, deleting data from a database takes a lot of I/O, because every record has to be declared invalid — that's how PostgreSQL works. If you have a partition for every month, you just tell the system: drop all the data from January. You have a partition for January, and you can simply drop that partition — I've drawn it out here, with data for February and data for March. In earlier versions of PostgreSQL you had to work very hard to do this partitioning: you had to write trigger or rule code to move the rows into the right partition. Now, with 10.0 — and that's one thing Robert Haas and the other guys from EDB really pushed forward — you have declarative partitioning, which lets you say, in a declarative way: put this data there — and then do the dropping. I'm really looking forward to this feature. Another thing: we have a lot of features in PostgreSQL, and on top there are special extensions.
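The monthly-partition scheme described above looks roughly like this in PostgreSQL 10 syntax — table and partition names are invented for illustration:

```sql
-- Declarative range partitioning by month (PostgreSQL 10+).
CREATE TABLE sensor_data (
    recorded_at timestamptz NOT NULL,
    device_id   integer     NOT NULL,
    reading     jsonb
) PARTITION BY RANGE (recorded_at);

CREATE TABLE sensor_data_2017_01 PARTITION OF sensor_data
    FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');
CREATE TABLE sensor_data_2017_02 PARTITION OF sensor_data
    FOR VALUES FROM ('2017-02-01') TO ('2017-03-01');

-- Retiring January is a cheap metadata operation,
-- not a row-by-row DELETE that marks every record invalid.
DROP TABLE sensor_data_2017_01;
```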
I told you about pgPointcloud before, and I also briefly mentioned — not really talking it through, but it was on a slide — PostGIS for geographical data. There are a lot more extensions out there, and extensions are something like packages. We all love a specific programming language which currently has, I think, five to seven packaging systems — which is great for variety and challenging for whoever has to deal with them. In PostgreSQL it used to be worse: we didn't have any structure for extensions at all, so people distributed Makefiles and instructions on how to install them, which kind of worked but was not really scalable. Since 9.1 — my former colleague Dimitri Fontaine was the lead developer of this extension infrastructure — we have a great extension mechanism, with extension naming and automatic extension updates, so the whole life cycle of extended functionality in a PostgreSQL database can be managed with commands: create an extension, update an extension, modify an extension, and so on. So we have a nice way of packaging and distributing that functionality. Now, another thing to look out for in those Internet of Things and Industry 4.0 applications: you might want to communicate with devices somewhere out in the world, while you have a master database where all your company data lives. In the internet-exposed database you only want to keep the data that is really, really necessary to communicate with one specific device — because any data that's not in that database cannot be compromised or stolen.
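The extension life cycle mentioned a moment ago boils down to a handful of commands — PostGIS here is just an example of an extension already packaged for the server:

```sql
-- Install an extension that is packaged for this server.
CREATE EXTENSION postgis;

-- See what is installed, and at which version.
SELECT extname, extversion FROM pg_extension;

-- Upgrade in place when a newer packaged version is available.
ALTER EXTENSION postgis UPDATE;

-- Remove it again, together with the objects it created.
DROP EXTENSION postgis;
```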
So, here's a solution. With 9.4 we invented logical decoding in PostgreSQL. Logical decoding gives you all the changes that happen to your database in a logical format — kind of human-readable, but also machine-readable. Every change that happens to any table in your database is written by PostgreSQL into a binary log, the write-ahead log, anyway; and with logical decoding we found a way to extract information from that binary log and use it for different purposes. With pglogical — that's an open-source extension distributed by 2ndQuadrant, based on that logical decoding within PostgreSQL — you can do the following. Say you have two tables that are relevant for the satellite database supporting your microservice, and you just want to replicate those two tables to the outside: you use pglogical to stream that data out, transcoding the write-ahead log into a readable form. And that's essentially it: you set up the node — your master database; you create a replication set, which says which tables should be replicated; you add the tables you want replicated to that set; and on the smaller database — which sits with your microservice, or out on the internet, or wherever — you create a subscription to that replication set on your master database, and the changes stream out. The good thing is that the streamed data does not really put any strain on your transactional processing. You can do the same thing the other way around: you have data collected by various databases, each only for specific devices, and you can stream those changes — with pglogical on top of logical decoding, which is included in PostgreSQL; pglogical itself is a free and open-source extension — from multiple databases into one master database, collecting the information of the various branches and putting it together in one big database. We found a way to do it online, without impacting the actual processing on the databases, and without writing ETL
scripts or anything. Another feature that's getting better and better in PostgreSQL comes from the realization that it would be nice to have all data in PostgreSQL, but it's unrealistic. You have legacy systems in other databases, you have new legacy data in humongous databases, or you have very strange data sources, and you want to consolidate them all in one database, because you just want one report across the production data, the financial data, and some marketing data. PostgreSQL has foreign data wrappers — an implementation of the SQL/MED (Management of External Data) standard, which has been in the SQL standard since 2003. We implemented it in 9.1 for reading external data, and in 9.3 for reading and writing non-PostgreSQL databases from PostgreSQL; and the integration of external data, and the features we can push down into those foreign data wrappers, get better with every version. There is one very fascinating thing: there's the Multicorn library, which allows you to write foreign data wrappers in Python. Why would you do that? They have one example on their site: you can create a foreign table which reads your IMAP account. So any kind of data that's somehow accessible from Python, you can surface as a PostgreSQL table. You can join your production data with your Twitter feed or with your IMAP inbox. Is it really useful? I don't know. Now the little stuff. Who here has ever been trampled by an elephant? Oh — give this guy a towel. Anyway: has anybody — I put it here — been bitten by a bug or by a fly? Yeah. So it's the little stuff that counts. One major nuisance when sorting and ordering data is collations, because if you're still in ASCII, A to Z, it's fine — but if you need to order Cyrillic characters… or for the people from Miley Cyrus —
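To sketch the Multicorn IMAP example mentioned above — the connection details are placeholders, and the wrapper path and column names follow the Multicorn documentation as I recall them, so check the docs before relying on this:

```sql
CREATE EXTENSION multicorn;

-- A foreign server backed by Multicorn's bundled IMAP wrapper.
CREATE SERVER imap_srv FOREIGN DATA WRAPPER multicorn
    OPTIONS (wrapper 'multicorn.imapfdw.ImapFdw');

-- The mailbox, exposed as an ordinary-looking table.
CREATE FOREIGN TABLE my_inbox (
    "Message-ID" character varying,
    "From"       character varying,
    "Subject"    character varying,
    payload      character varying
) SERVER imap_srv OPTIONS (
    host     'imap.example.com',
    port     '993',
    login    'me@example.com',
    password 'secret',
    ssl      'True'
);

-- Plain SQL now works against the mailbox — joins included.
SELECT "From", "Subject" FROM my_inbox LIMIT 10;
```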
Yeah, cool — if you use JSONB, you can also have Hannah Montana. Very good. Anyway: if you sort data and you have people with Croatian or Slavic names, or Turkish names, it's very challenging what gets sorted where and why. Even within Germany there are different conventions about when and how to sort an umlaut. So, collations. The funny thing about collations is that the linguists and the computer scientists are still updating them, and PostgreSQL does not reinvent the wheel here: we were using the operating system's facilities, like glibc, to do the sorting. Now, from how a database works: you have indexes which point to records, and the ordering inside those indexes is based on how you sort values — and looking up a value in an index needs the same sort order as when the index was created. Just imagine the index is sorted one way — from Z to A — and you expect A to Z: you'll never find anything. Now, the operating-system people, in minor security releases of their operating systems, change the sort order.
So you had A–B in version one, and B–A as the correct sort order in version two — not with those letters, but with Cyrillic or whatever. Now, with 10.0, my colleague Peter Eisentraut integrated ICU — the International Components for Unicode library — and you can optionally base your collations on ICU, which additionally gives you a correct sort order for emoji. So with PostgreSQL 10.0, not only can you make your users happy with a stable collation and not corrupt your index on an operating-system change — you can also sort emoji correctly. Peter wrote a blog post about it, and there is even a standard for how to sort emoji. Another piece of good news: one of the challenges of recent PostgreSQL releases was the admin interface. We had pgAdmin 3, which kind of worked, which was nice; then came pgAdmin 4, and we got some challenging discussions at the booth when we were at conferences. Customers came up to me — after they got to know that I'm not a bad guy — and asked: "I had real challenges when I tried this pgAdmin, is it me?" And I know: it's not you. So we at 2ndQuadrant hired two guys in Brazil, William and Rafael. They had written OmniDB — they were young, they were not with us yet, and they wrote it in ASP.NET. The good news — and that's the final present from this talk, a EuroPython 2017 preview: they rewrote it in Django and Python, and it's available at this URL. I have two screenshots: we have auto-completion, and — it's OmniDB, it connects to many, many different databases, and on their wish list is even support for moving data, or structure, between those databases. So it's free, and it's available as a preview. I asked those guys to get it ready for EuroPython, so I'd have something to give away at the end besides my towels. That's the present — and that gives us two minutes for questions.
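The ICU collations from PostgreSQL 10 look like this — collation, table, and column names invented for illustration:

```sql
-- Define a collation backed by ICU instead of the C library (glibc),
-- so its sort order stays stable across operating-system upgrades.
CREATE COLLATION german_icu (provider = icu, locale = 'de-DE');

-- Use it per column …
CREATE TABLE people (
    name text COLLATE german_icu
);

-- … or per query.
SELECT name FROM people ORDER BY name COLLATE german_icu;
```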
Thank you very much. Well, okay, one question. — "You showed us the example of how you make an index on the JSONB field. I was wondering: is it possible to build the index only on specific keys in the JSONB field, in order to improve performance?" — Yes, yes, yes, that's possible — sorry, I skipped over that; it has been possible for a long time. In PostgreSQL we have partial and functional indexes; you can create an index on a part of — sorry, that's not the JSONB slide, that's the other one… okay, it crashes, anyway: you can create an index on just one attribute. You can create a functional index — you can even do that with normal columns, like an index on upper(first_name); functional indexes have been in PostgreSQL nearly forever — and they also work with JSONB, so you can have an index on just one attribute. Yes. But we're very proud that you can have one index and have everything indexed, which is cool. There are tests, and compared to the loudest document database on the market, we get similar or faster numbers. All right, that was it. — There's another question — time is up? Time's up, sorry, so you have to find me at the booth. I'll be at our booth or somewhere around. And the guy who got trampled by an elephant: there's a towel for you. Thank you very much — don't forget that — and don't forget to rate the talk.
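The partial and functional indexes from that answer can be sketched like this — the `users` table, its `first_name` column, and the `misc` JSONB column are invented for illustration:

```sql
-- Functional index: index an expression rather than a bare column.
CREATE INDEX users_upper_name_idx ON users (upper(first_name));

-- Functional index on a single JSONB attribute, extracted as text.
CREATE INDEX users_twitter_idx ON users ((misc ->> 'twitter_handle'));

-- Partial index: cover only the rows that actually carry the attribute.
CREATE INDEX users_twitter_partial_idx
    ON users ((misc ->> 'twitter_handle'))
    WHERE misc ? 'twitter_handle';
```

These are much smaller than a GIN index over the whole JSONB column, at the price of covering only the one key they name.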