Okay, thank you, Cynthia. Welcome, everyone. Before we start: this nice picture I found is from the Metropolitan Museum here in New York. They recently released about 350,000 pictures into the public domain, so you can go and search the entire catalog and download whatever you want.

My name is Andreas. I'm actually from Germany, came over from Europe. I'm part of PostgreSQL Europe; we run conferences all over the continent. I work for a company called Pivotal, which is a spin-off from EMC, and we do Postgres and Greenplum data warehousing.

Okay, do I have to tell you about Postgres? No? Good. And yes, as mentioned, please switch off your WhatsApp.

Quick question, since we're talking about data types: how many data types do we have in Postgres? Any idea? Good. When I looked into my database, I found 361. That's a lot — that's too many. Well, actually we get one data type for every table we have, so there are table types as well; if I leave out the table types, I end up with 82. Still a lot. Luckily, we have the nice Postgres documentation with its long list of the data types you will see in every Postgres database. This is 9.6, and they are all in there. How many of them do you know? In the documentation you'll find this list: it's still 41 data types that come with every Postgres database, and we're going to talk about each and every one. No — I understand I'm standing between you and the afternoon party, so let's make it quick.

Okay, how many data types in Postgres are you actually using? Two? Five? Give me some answers, please. Five? Ten? Ten is a lot already. Anyone more than ten? That's the highest number I've heard in this talk so far. Okay, we'll look into some basic data types, starting with text and numeric types, then dates, JSON, XML, boolean, and also binary data and network data types. Let's start with text.
How many text data types do we have? We have three: varchar, char, and text. Internally it's all the same; Postgres doesn't differentiate. The only option you have for varchar and char is that you can specify a length, which is optional for both of them; for text you cannot specify a length.

Keep in mind that text data types in Postgres are case sensitive. If you want to compare case-insensitively, you have to use ILIKE instead of LIKE. If you come from other databases like MySQL, that can be a problem, because there comparisons are case-insensitive by default.

Any text data type in Postgres can store up to roughly a gigabyte of data. If you specify a length for varchar, it will store up to the length you specify. If you do not specify a length for char, it will be a one-character char(1) — which is probably not what you want, so do specify a length for char. The real difference between varchar, char, and text is all about white-space handling in the end.

This looks a bit complicated, so let me walk you through it. At the top we have three varchar columns with lengths of one, five, and ten, and I always insert five white spaces. At the bottom you can see that for varchar I always get my five white spaces back, except where I specified a length of one — there it's cut off. For char I always get exactly the length I specified: it's padded up to the declared length. In the last line I specified char(10) but only inserted five white spaces, so Postgres pads it with another five white spaces to make it a char(10).

Okay, how is this handled internally? Postgres by default uses pages of eight kilobytes, and everything must fit into an eight-kilobyte page. So how can I store one gigabyte of text in eight kilobytes?
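The padding behavior from the slide can be sketched directly in psql; this is a minimal, hedged reconstruction of the example, not the original slide:

```sql
-- five spaces cast to the three text types:
SELECT '     '::varchar(1)  AS v1,   -- trailing spaces beyond the length are cut off
       '     '::varchar(5)  AS v5,   -- kept as-is, five spaces
       '     '::char(10)    AS c10;  -- padded by Postgres out to ten characters
```

Note that varchar(n) normally raises an error for over-long input; it only silently truncates when the excess characters are all spaces, as here.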
Any ideas? Well, I can compress it. What actually happens is this: I have rows, and for every row there is a page header and the row itself, and everything with a variable length — like text or varchar — is not really stored in the table. Instead there is just a pointer to another table, which is called the TOAST table. So the row holds a forward pointer into the TOAST table, all handled internally by Postgres, and in the TOAST table I can have as many rows as needed, up to the one-gigabyte limit. If you run a vacuum on a Postgres database, you will occasionally see in the output that it also vacuums something called TOAST tables — that's where Postgres keeps all the variable-length data.

What about char(255)? Anyone here with a char(255) in their database? Why, what's the reason? Yeah — Access, or maybe coming from MySQL, same problem. I looked up how many projects are using char(255) or varchar(255): roughly half a million hits, and on searchcode I see a bit more than 10,000 projects using it. It really doesn't apply to Postgres: there is no 255 length limit in Postgres. If you find this in a Postgres schema, there must be a very good reason, like compatibility with another database — or maybe the developer just didn't know what they were doing. Where it does matter is MySQL: there, an index key can only have up to 767 bytes, so if you index a column holding 10,000 bytes, it will actually only use the first 700-something bytes for the index. And that's roughly three times 255 — MySQL handles its UTF-8 characters as up to three bytes each — which is where their 255 limit comes from.

By the way, if you have any questions, please just ask. Numeric data types: we have integer types, floating-point types, numeric, and sequences. Let's have a look.
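You can see the TOAST mechanism for yourself; this is a small sketch (the table name is invented) showing the companion TOAST table Postgres creates for variable-length columns:

```sql
-- table name invented for this sketch
CREATE TABLE toast_demo (id integer, body text);

-- every table with variable-length columns gets a companion TOAST table:
SELECT reltoastrelid::regclass
FROM pg_class
WHERE relname = 'toast_demo';
-- returns something like pg_toast.pg_toast_<oid>
```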
So for integer data types we have smallint using two bytes, integer using four bytes, and bigint using eight bytes. I haven't really found a use case for smallint so far — maybe you have one. Keep in mind that if you mix smallint and integer columns in a table, you can actually lose space on disk. Let's say you have smallint, integer, smallint, integer. You'd think that's 12 bytes; it's actually 16 bytes, because Postgres aligns to four bytes. After the first smallint it inserts two unused padding bytes, and after the second smallint it again wastes two bytes, because it needs to align the following integer to four bytes. If you just rearrange it to smallint, smallint, integer, integer, you have four bytes for the two smallints and then four bytes each for the integers: 12 bytes total. Yes, please? On disk, yes.

And to the question about reordering: you have to recreate the table. You cannot really reorder columns in Postgres — there's no "ALTER TABLE ... move column one to position three". You have to do a CREATE TABLE AS SELECT into a new table, which is basically a new table with the new disk layout.

Okay, floating point. We have real using four bytes and double precision using eight bytes. Keep in mind: even if Postgres shows you an exact-looking number as output, it might be inaccurate internally. Some examples: at the top I have six digits cast to real — fine. The moment I use seven digits, you see it starts rounding the number, because it doesn't fit into four bytes anymore. Another example with seven digits at the bottom: it rounds my number to one million two, even though I specified one million one. It's simply inaccurate. So if you want to store money or any exact numeric data whatsoever, please don't use floating point. The same example works for double precision.
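Going back to the column-order trick for a moment, the alignment padding can be made visible with `pg_column_size` on whole rows; this is a sketch with invented table names:

```sql
-- table and column names invented for this sketch
CREATE TABLE mixed   (a smallint, b integer, c smallint, d integer);
CREATE TABLE grouped (a smallint, c smallint, b integer, d integer);

INSERT INTO mixed   VALUES (1, 2, 3, 4);
INSERT INTO grouped VALUES (1, 3, 2, 4);

-- pg_column_size on a whole row includes the tuple header, but the
-- difference between the two tables exposes the four padding bytes
SELECT pg_column_size(m.*) AS mixed_row   FROM mixed   m;
SELECT pg_column_size(g.*) AS grouped_row FROM grouped g;
```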
So as long as I use 15 digits it seems to be fine; the moment I exceed 15 digits and go to 16, it starts rounding my output. And internally it's already rounded, because it's always stored as a mantissa and an exponent. Yes — even though your output seems to look correct, it might already have been changed internally. So if your numbers are fuzzy anyway, like incoming data from your network monitoring, it's okay, you can use floating-point numbers. For any kind of monetary values, please don't use floating-point numbers.

Okay, we do have a money data type in Postgres. Please don't use it either. There are like two users in the world using the money type; it was declared dead two or three times and was resurrected each time because, well, of those two users. It can store a huge amount of money — it's a numeric type internally and can handle up to around 92 quadrillion. So whatever the national debt here in the US is — about 19 trillion last time I checked — it still fits into a money data type. But don't bet on it. One of the problems is that money always comes with a currency in its representation: whatever your locale says, dollar or whatever, is always in your output and you have to deal with it, and you cannot really convert between different currencies.

So if you want to store money, use numeric. Numeric has a precision of up to 1,000 digits. If you ever have a use case where you have to store more than 1,000 digits, please let me know — I haven't found one so far. The way you use it is to specify the total number of digits, like 10 here, and the number of digits after the decimal point. So I get seven digits before my decimal point and three after — ten in total, three of them behind the point. One of the downsides is that numeric is handled in software.
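A minimal sketch of the precision contrast discussed above — a numeric declared with explicit precision versus a value a four-byte float cannot represent exactly:

```sql
-- numeric(10,3): ten digits total, three of them after the decimal point
SELECT 1234567.891::numeric(10,3);  -- fits exactly: 7 digits before, 3 after

-- a value a 4-byte real cannot represent (2^24 + 1):
SELECT 16777217::real;              -- comes back rounded to 16777216
```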
So there's no hardware support. Unlike an integer, which just loads into a CPU register and can be processed very fast, numeric is handled in software, so it will be a bit slower.

Anyone know this guy? Anyone playing chess? You've heard the story: someone put one rice grain on the first square of the chessboard and then just kept doubling the number, square by square. So how much rice are we talking about? Any idea? Careful — we are talking about two to the power of 64, minus one. If you use double precision, you see it already becomes very inaccurate. But if you use numeric — two to the power of 64, minus one, computed in numeric — you get the exact value, using about 20 digits, and we still have 980 digits left in numeric. Any time you need precise numbers, you must use numeric. The other option, for money, is to store pennies instead of dollars and put everything into a bigint, as long as it fits into a bigint — but then you have to do all the conversion from pennies to dollars and back yourself.
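The chessboard number can be computed both ways in one query each; a small sketch of the contrast:

```sql
-- the rice-grain total, 2^64 - 1, in double precision:
SELECT 2::double precision ^ 64 - 1;  -- rounded: subtracting one changes nothing

-- the same in numeric:
SELECT 2::numeric ^ 64 - 1;           -- exact, all twenty digits of 18446744073709551615
```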
Yeah, it's a bit slower. Okay, then we have sequences. A sequence is basically what you know as auto-increment from other databases: you declare your column as serial instead of integer, and Postgres will increase the number every time you insert something. By default, sequences start at one and increase by one, in the positive direction, and they do not cycle — if a sequence comes to its end, it will just stop and raise an error when you ask for the next value. But you can change all of this: you can have a sequence going backwards, a sequence cycling around, a sequence going five steps forward at a time — all of this works.

One exception: sequences are not transactional. Whatever value you take from a sequence will not be given back if you roll back the transaction. So if you insert a thousand rows into a table with a sequence, you consume 1,000 sequence values; if you roll back, those values are gone, and the next insert will use 1,001.

You can ask for the current value of a sequence using currval, but it only works if you have already consumed a value in your current session — you cannot ask for the value that was inserted by another session; it always has to be your own session. With nextval you can fetch and consume the next value of a sequence, and then you can ask for that same value again using currval. You can also set a sequence to a new value using setval. Some people don't like big gaps in sequences after rolled-back transactions, so they just set the sequence to the current maximum value in the table — easy to do.

Okay, there's one function you should keep in mind: pg_get_serial_sequence. By default, a sequence name is composed of the table name,
the column name, and a suffix Postgres appends, as you see here. But don't rely on that: pg_get_serial_sequence will always tell you the name of the sequence attached to a column. It's very good for automation if you need to find out the sequence name for a specific table.

A sequence is really just another table in Postgres, so you can select from a sequence. If you do, you get a number of columns: the sequence name, the last value I used, and all the default settings — I can see it's incrementing by one, the max value shows me it's really a bigint sequence, and so on.

Going from numbers to dates and times. What is this? Any idea? South Pole? Any other options? I agree: we have 90 degrees south latitude at the bottom left, which gives it away. What time is it at the South Pole right now? Well, if you use Postgres, it can answer that.

Okay, date and time types. We have timestamp with and without time zone, which store a date and a time. A timestamp without time zone just stores the date and time, no matter which time zone you are in right now. If you store a time zone as well, Postgres will always convert the value back to UTC and store the UTC value — we'll see in a moment why. The same is true for time with and without time zone. Date stores only a date, without any time zone, obviously, and interval is the difference between two times.
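The sequence machinery from the last few paragraphs, sketched end to end; the table name is invented, and the exact columns you see when selecting from a sequence depend on the Postgres version:

```sql
-- minimal sketch; table name invented
CREATE TABLE seq_demo (id serial, note text);

INSERT INTO seq_demo (note) VALUES ('first');     -- consumes sequence value 1
SELECT currval('seq_demo_id_seq');                -- only valid after nextval in this session
SELECT nextval('seq_demo_id_seq');                -- fetch and consume the next value
SELECT setval('seq_demo_id_seq',
              (SELECT max(id) FROM seq_demo));    -- close a gap after a rollback

SELECT pg_get_serial_sequence('seq_demo', 'id');  -- e.g. public.seq_demo_id_seq
SELECT * FROM seq_demo_id_seq;                    -- a sequence is selectable like a table
```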
I'll show you a few examples in a moment. Postgres will deal with any kind of daylight saving time if you specify a correct time zone. It will not deal with leap seconds: it knows about leap days, but not leap seconds, because they are not in the time zone database.

A few examples. At the top I specify an ISO date, 2016-10-11, and cast it to a timestamp, and I get back the same date at midnight — it takes what looks like a date, makes it a timestamp, and I get midnight back. It can also handle other date formats: if you spell out month names, like here in the American format — month first, then day, then year — you see it recognizes the format and returns January 5th, 2017.

You can specify time zones as part of your timestamp. Here I specify 3:25 UTC, and it changed my output to 5:25. What happened? It also changed my offset to +2: my database server here is set to Central European Time, which is +2 in summer, and if I don't specify otherwise, the output is always in the time zone where my database server is running. So no matter which time zone I specify on input, the output is in my server's time zone. I get two hours of difference here because it's August and we have two hours of summer-time offset. Same in the other example: it's a timestamp in December, again UTC, and I only get one hour of difference to UTC, because it's winter time.

You can also specify time zone names, like Europe/Moscow here. If you specify a time zone name rather than only a numeric offset, Postgres knows about all the daylight saving rules — like Russia moving to summer time and not moving back to winter time, and so on. If you just specify a number, it obviously cannot know about any daylight saving time; it just adds four hours, that's it.

So how can I change my output? In this example I start a transaction, because time stands still inside a transaction in Postgres. You see I have my timestamp here, 1:55 and 57 seconds.
I temporarily set a new time zone, Europe/Moscow, and do a select now() again: you see it changed to 3:55. So it does the transformation into the new time zone I specified. That's neat, but a bit clumsy, because I'd have to change my time zone setting all the time. We can do better: you can also say select now() at a specific time zone. In that case only this one now() is transformed into the time zone I specify; everything else, even in the same query, keeps using the time zone my database server is running in. Yes, please? Yes, you can also set the time zone on a per-connection basis.

I can also do calculations between timestamps. Up here I have January 5th, 2000 minus January 1st, 2000, and I get four days of difference — that's an interval. Down here I just switched the dates around, and I get minus three days as an interval. I cannot only use dates; I can also use timestamps with times in them — then I get 10 days, 17 hours and so on, because Postgres does the whole calculation for me when I use timestamps. And if you want the interval just in days, don't use timestamp but date — like here. Well, here I used timestamp, but it's all at midnight because I haven't specified any time, so it just produces days; if you cast it to date, it will just return days.

Okay, what do we have here? February 28th; I add an interval of one day and two hours and get back February 29th — so obviously it's a leap year. Anyone questioning that 2000 is a leap year? I've seen that before. Or the other way around: January 1st, 2001 minus January 1st, 2000 returns 366 days — 2000 is a leap year. Here I have dates, and it just returns the day count. Yes — the result is always an interval if you do any kind of timestamp addition, subtraction, whatever; it's always an interval result. And down here, the next year, 2002 minus 2001, I get 365 — no leap year.
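The time-zone conversion and date arithmetic from the slides can be condensed into a few queries; a minimal sketch:

```sql
-- convert just this one expression, without changing the session time zone:
SELECT now() AT TIME ZONE 'Europe/Moscow';

-- timestamp minus timestamp yields an interval:
SELECT timestamp '2000-01-05' - timestamp '2000-01-01';  -- 4 days

-- date minus date yields a plain day count:
SELECT date '2001-01-01' - date '2000-01-01';            -- 366: 2000 is a leap year

-- adding an interval knows about leap days:
SELECT date '2000-02-28' + interval '1 day 2 hours';     -- 2000-02-29 02:00:00
```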
Okay, back to my initial question: what time is it at the South Pole? The South Pole runs the Amundsen-Scott station. It belongs to the United States, but it's supplied from New Zealand, and for convenience they just use the New Zealand time zone down there. So if you want to know what time it is at the South Pole, you can just use the New Zealand time zone: select now() at time zone New Zealand. Well, that's a bit of cheating, because that's really New Zealand, not Antarctica, right? But the operating system knows about Antarctica as well: the time zone database has an entry for the South Pole, so you can really ask what time it is at the South Pole right now.

Okay, let's move on. Anyone still using XML? Two, three, four — wow. It was really big back in the '90s, before we had JSON. Postgres can store any kind of XML, both full documents and fragments. It's a bit complicated regarding encoding handling: any time you don't specify a client encoding, it will just assume UTF-8 — except when it's binary, and then it's not.

So here we have a full document: we use XMLPARSE with DOCUMENT and pass the entire XML document as a string, and it's transformed into an XML type internally and stored. It's also validated — that means it will actually verify that the opening and closing tags match and everything — but it will not validate against a DTD. I can do the same for just a fragment of XML; you see it returns the XML type as output. What you can't really do in Postgres is search in it. You can store it, you can retrieve it, but not search in it — well, you can, by transforming it back to text and doing a LIKE or such, but you cannot have Postgres look for one specific tag somewhere in the tree. That doesn't work.

Okay, how many of you are using JSON? One more — and JSONB? That's a few more.
Yes, please? Well, XML was really heavily used back in the '90s and 2000s, before JSON came around — we have it because everyone was using it. What you do get: Postgres ensures that whatever you store as an XML type is valid, so the opening and closing tags and everything are there. That's for sure. Have fun.

So these days almost everyone is using JSON, and we have JSON in two implementations. The old format is called simply json: it stores the raw data, and every time you access it as a JSON value, it needs to do the parsing again — that's overhead. And then the Russians came along and created jsonb, which does all the parsing the moment you store the JSON in your database. Then you can do all the fancy stuff, like searching in it and selecting just parts of your JSON; you can put an index on it; you can drink vodka — and all of this works inside regular transactions in Postgres. So unlike some other JSON databases, where things can go wrong or values just vanish because two different connections update the same value, in Postgres it is all handled in transactions. And it's actually faster.

So here we create a jsonb value. You see I have double quotes inside: my single quotes are for Postgres, everything inside the single quotes is for JSON, and I need to quote my JSON strings as well — that's why there are double quotes. I can create arrays in JSON: again the single quotes at the beginning and the end, all the strings inside quoted with double quotes, and I get back my array as a JSON array. Or a dictionary with key-value pairs — all of this works.

And now the nice part: I can actually look into those key-value pairs. In this case I just want the value stored under key2, and it returns just that single part. Or I can check whether the value on the right side is contained in the array on the left — I don't have to extract all my JSON data from the database to find out whether a specific value is in there.
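A condensed sketch of the jsonb operators just described, plus the GIN index mentioned a bit later (table and index names invented):

```sql
SELECT '{"key1": "a", "key2": "b"}'::jsonb -> 'key2';  -- extract one value: "b"
SELECT '["x", "y", "z"]'::jsonb ? 'y';                 -- element in array: true
SELECT '{"key1": 1, "key2": 2}'::jsonb ? 'key2';       -- key exists: true

-- and a GIN index to back such queries:
CREATE TABLE json_demo (data jsonb);
CREATE INDEX json_demo_gin ON json_demo USING gin (data);
```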
I can have Postgres do that for me. Same here: is this key in my key list? key2 is actually a key in my dictionary here, so it returns true.

One of the problems with NoSQL databases and NoSQL applications is that they have no schema. So occasionally you run into applications that expanded their schema over time — because in the end they do have a schema — and older versions of a data row might not have a field that newer versions have. So you need to find out whether specific keys are actually present in your data or not; that's one way to do it. And then, of course, you can create an index on top of any jsonb value by using a GIN index. In this example we create a GIN index on my table: I have to specify which field goes into the index, and when I then use the same JSON field in the WHERE condition of a query, it will use the index.

Okay, boolean. If you ever need to store boolean values, please don't use a smallint or a char or whatever — please use the boolean type in Postgres. I see people coming from databases like MySQL, where there's no real boolean type, start using smallint: zero and one. And over time it's two, five, seven. What is seven in a boolean? Seven is more true than one? Cool.

Okay, so we have different input values for a boolean in Postgres: 'true', uppercase or lowercase 't', 'y', 'yes', and so on — you can use any of them, and in the end you will always get back a 't' or an 'f' for true or false. It's just convenient to use whatever input value you have. So it looks like this: 'true' cast to boolean returns a 't', 'f' returns an 'f', 'false' returns an 'f'. Very convenient. And you can put an index on it. Just to give you a use case: I have a table —
let's say a large log table, millions of entries, and occasionally there's an error. What happens if you put a plain index on this error column to find all the rows that have an error? You end up having 98, 99 percent of the rows in your index that will never ever be used by the query. Instead, I can use a partial index in Postgres.

Here I create a table and insert a million rows using the random() function, so about two percent of the rows have error set to true and everything else is false. Let's look into it. If I select the count from this table to see how many rows have error set to true, you see I get a sequential scan. Now I create an index on top of it with WHERE error IS TRUE: it only stores the rows where this error boolean is actually set to true; everything where it's false is ignored, because I don't care about those 98 percent. And now the query really starts using an index-only scan for the rare condition error IS TRUE.

And look at the sizes. I create another, full index on this column as well. The table has about 6,300 pages on disk; my full index has 2,700 pages; and my partial index only has 57 pages. That's 2,690 pages of full index I would never use, because that's the other 98 percent. It makes things so much faster.

Okay, anyone using bit types? For what — what's the use case? A bitmap? Okay. We can store bits in Postgres as well: we have a bit data type, with fixed or varying length. You can specify how many bits you want to store, and then you can do all the fancy bit operations on it, like AND, OR, exclusive OR, and so on. You declare a column using the bit data type, specify your length, and then you can have Postgres insert bit values into your table — I don't have to convert my decimal values to binary and back myself; I just let Postgres do it.
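The partial-index experiment above, reconstructed as a sketch (table, column, and index names invented):

```sql
-- names invented for this sketch
CREATE TABLE log_demo (id serial, error boolean);

-- ~2% of a million rows get error = true
INSERT INTO log_demo (error)
SELECT random() < 0.02 FROM generate_series(1, 1000000);

-- index only the rare true rows
CREATE INDEX log_demo_error_idx ON log_demo (error) WHERE error IS TRUE;

EXPLAIN SELECT count(*) FROM log_demo WHERE error IS TRUE;  -- should use the partial index
SELECT pg_relation_size('log_demo_error_idx');              -- tiny compared to a full index
```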
That's what I show here: I just put a B in front of my bit string, and Postgres transforms it into bits; if I select from my table, I have 10101. Let's keep that in mind.

Question? Yes, I could store several flags in it. However, stored as separate boolean columns, every flag costs me one byte, even though I only use one bit. If you have, say, a thousand flags in a table, packing them into a bit column is a bit more efficient.

So, at the top here I do an AND operation on my bits: I have my data column, and I only want to see the bits where both the bit in my data and the bit in my search value are set — no, actually, that top one is an XOR. Here's my AND, sorry: I only want to see bits where the last bit on the right is set. It's set in my input value, because I have 10101, and it's set in my search value, which is only the rightmost bit. I get back all bits cleared except the one on the right. The same for OR: all positions where at least one of the bits is set come back as one, so I get 11111. And XOR flips around the value wherever a one is set in the mask, so I end up with 01010.

And of course I can also shift to the left, which is a multiplication: my original value was 21, and if I shift everything by one bit to the left, I multiply by two and end up with 42 as a result. I can also search in my bits.
So here I just want to search for this one bit, and I get back my data value, because the rightmost bit is set. In the bottom example I shift the search bit: I want to find any value where the second bit from the right is set, and obviously no data value matches, so I get nothing back.

As I mentioned before, I can have Postgres do all the bit conversion for me, which is very nice and convenient. My integer here I can cast to a bit and end up with a bit value; or, a bit more extreme, I start with a bit value, cast it to bit, cast it to integer, and get my 21 back. I can do the same with hex values: I start with X'FE', which is one below 255, and tell Postgres that this is really a hex value and not a numeric one, and Postgres does the transformation for me — I don't have to do it manually.

Okay, anyone storing binary data in Postgres? Why, oh why? So you moved away from it — you learned your lesson. Well, if you want to do it, there's a bytea data type in Postgres which stores binary data. The problem with storing binary data in a database is that occasionally there's a byte in your data stream where all bits are cleared: a zero byte. Unfortunately, Postgres is written in C, and a byte with all bits cleared means end of string. If you send a query string to the database where one of these bytes appears —
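The bit operations and casts just walked through can be sketched in a few one-liners:

```sql
SELECT B'10101' & B'00001';           -- AND keeps only the rightmost bit: 00001
SELECT B'10101' | B'01010';           -- OR: 11111
SELECT B'10101' # B'11111';           -- XOR flips every bit: 01010

SELECT B'10101'::bit(5)::integer;     -- back to decimal: 21
SELECT (21::bit(8) << 1)::integer;    -- shift left = multiply by two: 42
SELECT X'FE'::integer;                -- hex bit-string literal: 254
```

Note the shift example uses bit(8) so the leading bit has room to move; shifting within bit(5) would drop it off the end.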
well, for Postgres it means: here your query string ends, and you end up with a syntax error. You can't really do it that way. The way you want to store binary data in Postgres is by using whatever your programming language gives you: Java, Python, whatever — they all have functions to deal with binary data. Please don't write your own wrapper; use what's there, it works. I've seen people doing their own stuff, escaping all the data for Postgres back and forth — don't; you end up with more problems.

The output in Postgres can come in two different formats. At the top is the newer format, called hex: you switch it with bytea_output to hex — it's the default by now — and every byte is escaped as a hex value. The old format at the bottom is the escape format, where only the problematic characters are escaped, using a backslash and one to three digits. That is a bit problematic, because you have to parse how many digits come after the backslash — that's why they switched to hex, where you always have two digits. But then again, your programming language takes care of this problem and does the transformation for you.

Network types. We can store all kinds of fancy network information in Postgres: we have an inet type and a cidr type, which can store network masks, and one type for MAC addresses. This is really good for storing any kind of log file information in Postgres — and please don't use char for this. At the top I store a network: it's 192.168.0.1/24, which covers 256 IP addresses; you see it stores this entire value in my inet column. In the middle I just specify an IP address, and Postgres assumes a netmask for me — in this case /32, a single IP address. Here I only specify part of an IP address, and Postgres again assumes a network: it's an old class C network, /24. The beauty of it: you can put an index on it.
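The three inet cases just described, sketched as casts, plus the containment operator used in the query on the next slide:

```sql
SELECT '192.168.0.1/24'::inet;   -- host address together with its /24 network mask
SELECT '192.168.0.1'::inet;      -- no mask given: /32 (a single address) is assumed

-- "is this address inside this network?" — usable in WHERE clauses, index-supported:
SELECT '192.168.0.1'::inet <<= '192.168.0.0/24'::cidr;  -- true
```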
You can really speed up your queries. I have a demo table here; I insert some IP addresses and create an index on it. So I have two addresses in 192.168 and one in 10. In my query I want to find any IP address within this /24 range, and you see it returns those two IP addresses and leaves out the 10.x address — all supported by an index, fast. Here's your index scan.

Okay, last but not least, we can create our own data types in Postgres. Yes, please? Yes, IPv6 is supported as well; it works with both IPv4 and IPv6.

You can have composite types in Postgres, range types, enumerations — you can really write your own data types if there's a need. We won't go into all of that here, but what we do have is enumerations: basically a list of values. Other people use a one-to-n lookup table for this; you can take a shortcut and just create an enum. Keep in mind that for an enum this list should not change — well, in the end it can, but it should not.

Question: how many gender types do you know? How many? Four? More? Five? Well, our lovely Facebook knows 56, and there's a whole ISO standard just dealing with this problem. So if you want to handle gender in an enum, be prepared to extend your enum again and again. Anyone know this card game? That's a perfect example for an enum, because these four values never change. Create an enum out of it — diamonds, hearts, spades, and clubs — and then you can use this data type like any other data type in Postgres. Makes life easy. I can insert any of these values from my enum into my table. If I select from my table, Postgres does the lookup for me and replaces the numbers it uses internally with the values from my enum. I can even use these values in a search: I can search for everything which is hearts. I can put an index on it — things are fast. Okay, what's missing? That's the next talk, next year.
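The card-suit enum from the slide, reconstructed as a sketch (type and table names invented):

```sql
-- names invented for this sketch
CREATE TYPE suit AS ENUM ('diamonds', 'hearts', 'spades', 'clubs');
CREATE TABLE cards (card_rank integer, card_suit suit);

INSERT INTO cards VALUES (10, 'hearts');
SELECT * FROM cards WHERE card_suit = 'hearts';  -- enum values work like any other value
```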
Please come back. So: we have arrays in Postgres, which are not covered here; we have composite types and range types; we have a whole range of geometric types — think of PostGIS; we have UUID types, OIDs, and so on. And as I mentioned before, you can really create your own data types — but that's probably yet another talk, just on how to create data types.

So, any questions? I know a party is coming up. Yes, please? Yes, XML is a whole different data type; it has nothing to do with json or jsonb. And json versus jsonb: json stores the raw data, and jsonb parses it on input, yes. So jsonb is like XML, just with more functionality.

Yes, please? We can have indexes on jsonb columns, yes — you have to specify what exactly you are looking for inside the JSON data. Very well. We actually have customers moving from MongoDB to Postgres, because jsonb in Postgres is faster than what MongoDB can do.

Any other questions? Oh, two more. Yes, please? I'm a bit more pragmatic: if I know I will have to change an enum, I'm going to use a table anyway. An enum is really for when you don't expect your values to change, ever. I mean, you can do it, but then you have to alter your database all the time, because changing an enum is a DDL statement, while updating a lookup table is just a DML statement.

Yes, please? Well, first of all, you can put an index on your network type, so any kind of operation on it will be faster than storing it in a char. If you just want to store something without ever searching in it, char is probably faster. But if you want to do any kind of operation — like "is this IP address in this network", or looking for a specific range — inet will be faster. Okay, thank you.