I also wrote Django migrations. I'm a senior software engineer at Eventbrite — we have a booth just down the hall, come visit, very lovely people over there — and I have a small hatred of MySQL, which is somewhat relevant here, but this is actually a little less bashing on MySQL than I used to do, so if you're looking for that, look out for it. But what I really want to address here is a common thing I found, especially when I was a younger, more junior developer in the industry: this idea of people just saying, "just do this, we do things this way", and often there's no explanation why. People are like, "well, you should just do this pattern". At university this is often quite common — for a place of learning it's much more dictatorial: we just design things like this, we normalise like this — and sometimes you don't quite get the backstory. So what I'm going to try to do here today is show you why, by counterexample. I'm going to take eight or nine different bad ideas and just run through them until we get to the point where it actually turns out what the problem with them is, rather than just saying they're bad. And this is all part of what I call learning from failure. If you saw my talk at PyCon this year, it was called "What Can Programmers Learn From Pilots?", and a big point of that talk is that failure is an expected thing.
And in particular, one of the things with both aviation and, in some ways, with this, is that behind every rule something has usually happened to cause that rule. In aviation, behind every law an accident has usually happened to bring that law into effect: every time a crash or incident happens, we look at the results and institute a new rule, and that goes into the books to stop it happening again. In software development it's not quite as coherent — people have their own little specialties — but a lot of the accepted knowledge is there for a good reason, because people decades and decades before us have been through the same stuff; they've been around the same wheels, they've reinvented them. So let's come to the first example here, which is a site called Spacelog. Spacelog is a way of digitising the written transcriptions of the radio audio from NASA's Apollo and Mercury missions into a sort of website format. The only format these were in beforehand was a scanned, angled, badly typewritten PDF. They typewrote it, stuck it in a folder somewhere at NASA, and then about 20 years later someone comes along and scans them — they don't scan well, they're at an angle, they're hard to read, you can't search through them, the OCR doesn't work. So we took all that data, we cleaned it up, and we put it into a nice website. Now, there were two key factors with Spacelog that are relevant. The first one — kind of an excuse for myself — is that Spacelog was written in a week, here. This is Fort Clonque on the island of Alderney in the UK. Fort Clonque is a little unique: it's a Napoleonic-era sea fort. This causeway here floods at high tide, isolating it from the rest of the island, and the island is very small — you can walk end to end in about 20 minutes. And even then, the whole thing isolates off at high tide. It's a wonderful place; it was occupied by the Nazis in World War II as well.
But the key thing is that it's part of a thing called /dev/fort, where you go somewhere for a week — a very intense week — of writing a single thing and launching it at the end of the week, which is my excuse for the very bad architecture I'm about to show you. Now, the nice thing about Spacelog is that it's all very old data. This isn't going to change — Apollo 11 isn't still going on — so we can be pretty sure that all this data is read-only forever, because once a mission has happened the transcripts don't change, and there are no user features. I highly recommend you make a site with no user features; it's great, because there are no logins, no moderation, no spam — it just sits there and presents to the world. And naturally, in our somewhat locked-away panic state, we went: well, it's read-only; what's really good at read-only stuff? That's it — Redis is great at this. Redis is fantastically fast at read operations, because you can say, hey, here's a key, give me a value. So we took all the data, we split it into different chapters — here's take-off, here's re-entry, here's going around the Moon — and we put them into Redis keys. Great, done, ship it. Well, not quite. The problem, of course, is that you need pagination. So, okay: we'll put every entry into its own key, and then we can do a multi-get on the keys — still very fast — and load 50 of them at a time and do pagination that way. Perfect. And we assumed the entry numbers just incremented, and didn't stop to check things like that — so, okay, cool. So now we have pages that do one multi-get against Redis. Still pretty fast. That's fantastic.
Then you have the problem that, actually, we don't have purely numerical identifiers on these. So we need to give each of them a score, which is its position in the chapter, then query for the key names we want with a range-by-score, then take the range we get back, feed that into the multi-get, and do that. And then we also need the details and the pictures for each speaker, so we take all the speaker names in that result set, make them into a list, query that list against another collection in Redis, and get the details back from there. And what we've reinvented at that point, of course, is SELECT. This is a very roundabout way of doing a SELECT with a JOIN onto another table. And of course it was much slower than a real database would have been, because it can't do the optimisations in memory. To this day, Spacelog still runs on this truly special Redis installation that we wrote. And it's certainly fine — there's a Varnish cache in front of it, because it's not necessarily great at things like showing you random pages — but it does tend to work. But this is a place where a normal relational database would have been a much better solution, and we were blind to that reasoning until we got to the end and went: oh, we obviously should have picked this four days ago, but it's now day seven, we have to leave in like three hours, we can't fix it. So it's a somewhat unique situation there. And this kind of leads on to the second part of this: people like to ignore joins. This was a bigger thing when MongoDB and friends were much more popular. People have this impression that joins are really bad. If you ask anyone who does relational database work, they'll say no, no, joins are a very important part of the whole thing. But no, no: joins are bad and slow — and, to be fair, joins can be slow. So let's assume that joins are really slow. Cool. So, well, how do we do stuff?
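To make the comparison concrete, here's a small sketch of the single statement that replaces the whole Redis dance. The table and column names are my own invention, not Spacelog's actual schema, and SQLite stands in for a real relational database:

```python
import sqlite3

# In-memory schema standing in for Spacelog-style data: one row per
# transcript entry, plus a speakers table with names and photos.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, chapter TEXT,
                          position INTEGER, speaker_id INTEGER, text TEXT);
    CREATE TABLE speakers (id INTEGER PRIMARY KEY, name TEXT, photo TEXT);
""")
db.execute("INSERT INTO speakers VALUES (1, 'CDR', 'cdr.jpg')")
db.execute("INSERT INTO entries VALUES (1, 'launch', 1, 1, 'Ignition sequence start.')")

# The Redis version needed three round trips:
#   1. a range-by-score on the chapter  -> which entry keys are on this page?
#   2. a multi-get on those keys        -> the entries themselves
#   3. another lookup                   -> the speaker details
# A relational database does all of that in one statement:
rows = db.execute("""
    SELECT e.text, s.name, s.photo
    FROM entries e JOIN speakers s ON s.id = e.speaker_id
    WHERE e.chapter = 'launch' AND e.position BETWEEN 1 AND 50
    ORDER BY e.position
""").fetchall()
```

The pagination, the join onto speakers, and the ordering all happen inside the database, with one network round trip instead of three.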
So we have some documents, let's call them: I have some forum posts by myself, we have a few other authors, and we have some author records. Imagine a forum: you have avatars, you have real names, you have date joined, that kind of stuff. And what we want to do is show a page in the forum. So we want to take all those forum posts and, for each one of them, find the corresponding author document and put it in place. The naive way to do this is, for every single post, to scan the entire author collection, find the one, put it in, then for the next post scan again — and that is very slow. That's n times n operations. Now, you're not stupid developers — I know this, because you're all here, and you're wonderful people — so you would of course go: well, Andrew, it's obvious. You can take all of the authors as a collected set, like I said in the last example, and just do one query with them. That's great. So what we can do now is scan all the posts, build a dictionary that says this post has this author ID, then do one big author lookup, take the bulk result, and put all those details back onto the posts. Pretty simple. You've reinvented the hash join — well done. This is what JOIN does, and JOIN is better at this, because it's not having to round-trip across the network. So when I see people reinventing this, I'm like: that's literally what a join is. But of course, some people go: no, Andrew, you're doing it wrong. In document stores, you embed things — like other documents. Okay, let's do that. Let's put the authors inside the forum posts. This is kind of the recommended way of doing stuff. So now you can just fetch the post, and everything is great. That's fine until... well, a common feature you see on forums is a date — "this person was last seen at". So let's put that into our author record.
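The batched-lookup version described above really is a hand-rolled hash join. A minimal sketch, with made-up post and author data (a dictionary stands in for the one bulk query you'd send to the document store):

```python
# The "smart" document-store way to render a forum page:
# batch up the author IDs and do one lookup instead of one per post.
posts = [
    {"id": 1, "author_id": 10, "body": "First!"},
    {"id": 2, "author_id": 11, "body": "Hello"},
    {"id": 3, "author_id": 10, "body": "Another post"},
]
author_store = {10: {"name": "andrew", "avatar": "a.png"},
                11: {"name": "alice", "avatar": "b.png"}}

# Phase 1: collect the keys we need (the "build" input of a hash join).
wanted = {p["author_id"] for p in posts}
# Phase 2: one bulk fetch instead of a query per post.
fetched = {aid: author_store[aid] for aid in wanted}  # stands in for one bulk query
# Phase 3: probe the hash table to stitch authors onto posts.
for p in posts:
    p["author"] = fetched[p["author_id"]]
```

That build/probe structure is exactly what the database's hash-join operator does, except it does it without a network round trip per phase.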
Now, this means that whenever I log into your site, you have to scan every single post in the entire database, and on every single post I've ever written you have to update that record to be the right thing. So now you've made the sign-in process potentially 30 seconds to a minute long if I've been around the forum a lot. And this is why normalisation exists, right? It's one of those things where, if you've been down this rabbit hole and come across this particular problem — like I have in the past — you go: ah, that's why we do it that way. And it's a very important observation, because you might think, oh, people change their usernames once in a blue moon, it can't be that bad. But there are a lot of other things on a user that can change very rapidly, and you've made every single change on that document — on that table, essentially — very, very slow. Another one that is definitely somewhat my own fault: I used to work at Lanyrd, before we were acquired by Eventbrite. And one of the things we had at Lanyrd — and still have — is a very wonderful set of cached Redis values. This is better than the first example: these are things that should be in Redis. Basically, we take our data set, we work out a few key things — these are speaker scores, this is some clustering information — and we put it into Redis for later use on the site, so we can get it out quickly. But the unfortunate thing is that Redis, at least in the past — and MongoDB in the past as well, when this was happening — had a saving system, and neither of them was particularly great. Redis is the one in particular. Redis is generally an in-memory data store; if you want to persist, it would write a brand-new save file out to disk — okay, I've finished writing it, looks good — then delete the old save file and move the new one into place.
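That write-new-then-swap pattern is a standard way to save safely, and it's also where the disk-space requirement comes from: the old file stays on disk until the new one is complete. A generic sketch of the idea (this is not Redis's actual code, just the same shape in Python):

```python
import json
import os
import tempfile

def atomic_save(path, data):
    """Write a whole new file, then rename it over the old one.

    Roughly the shape of a snapshot-style save: the old file survives
    until the new one is fully written, so you briefly need free space
    for BOTH copies -- hence needing ~2x the file's size free on disk.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())   # make sure bytes really hit the disk
        os.replace(tmp, path)      # atomic swap; old file only now goes away
    except BaseException:
        os.unlink(tmp)             # clean up the partial file on failure
        raise
```

The upside is you never see a half-written save file; the downside is exactly the failure mode coming up next — if there isn't room for the second copy, the save can't happen at all.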
Fantastic. The problem is when you run out of disk space — or rather, when you have less than twice the disk space for this file — because you try to write the new file, but you run out of space to write a whole new version of it, and Redis just doesn't write it. And in the past, when this was happening to us, it would fail to write the file but keep going: it kept accepting requests, it kept serving things from memory, and so all this data was just going into memory and not being persisted, and you had no idea. We didn't have monitoring for this at the time — we do now. And so, of course, one day that server reboots, and we lose six months of cache data, because that data was only in memory; it was never persisted. Thankfully it was all rebuildable in a couple of hours from our main data store, but if something more valuable had been in there, it would have been gone. And then the same thing happened on a personal project a year later — I should have learnt my lesson, but I clearly didn't. Another thing I learnt in the hallway this morning was that MongoDB used to do this as well: if it ran out of space, it would just corrupt its own save files. It wouldn't even try and keep it... So, you know, I hope that's been fixed now as well, but there are a lot of problems with running out of disk space. Give your database servers a lot of room, because you do not want them stopping serving requests, or corrupting themselves, or — even worse — failing silently, which is keeping on going while you think it's fine when it's just eating all your data, essentially. So be aware of that. A small example of how bad it could be: say Eventbrite pays out money to organisers. Now, our real system is much more robust than this — this is not how we do it, I might point that out — but if we were young and naive, we could do it like this. We could do a query to find our unpaid organisers.
We could write a row to say, okay, this one's being processed — because obviously, if it's a cron job, we don't want two of them running at once and both paying the same person — so we lock the row and mark it as processing. Then we call our bank's API and say: hey, can you send account number so-and-so this many dollars? And when that call comes back as a success, we mark them as paid. But what happens if your database is accepting writes but not saving them? Well, you never mark them as processing, and you never mark them as paid. And so your cron job continuously finds all the clients who were ever unpaid and continuously sends them money — which, as you can imagine, is not great. This is why we have proper locks in place in our real payment systems, and things like that. But if you were doing it naively: be very careful about money. So, another thing — and now we come to the other end of the scale. Previously we've had people who were like, well, this normalisation database stuff, it's a bit much. Now we come to the person who goes: I really love normalisation. I love tables. I love them so much we're going to make a table for everything. This was me three years ago. So, at Lanyrd — when I joined, it was just Twitter authentication — we were like, okay: we're going to add emails and passwords, we're going to add LinkedIn, we're going to add Facebook, we're going to add all of the socials, and just have this thing where you can sign in with any number of them. And I went: that's great. Now, these different backends are different things. For emails, have a table with the email and that kind of stuff in it; for Facebook, the Facebook tokens; for LinkedIn, its own identifiers. And so we made a table for every single one. Sounds fine.
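Going back to the payout cron job for a moment, the claim-then-pay flow might be sketched like this — table names and the bank-API stand-in are hypothetical, and SQLite's `BEGIN IMMEDIATE` stands in for a real database's `SELECT ... FOR UPDATE` row locking:

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
db.execute("CREATE TABLE payouts (id INTEGER PRIMARY KEY, amount INTEGER, state TEXT)")
db.execute("INSERT INTO payouts VALUES (1, 500, 'unpaid')")

bank_calls = []  # stand-in for the bank's API

def pay_unpaid():
    # Claim rows atomically, so two concurrent cron runs can't both
    # grab the same organiser.
    db.execute("BEGIN IMMEDIATE")
    rows = db.execute(
        "SELECT id, amount FROM payouts WHERE state = 'unpaid'").fetchall()
    db.execute("UPDATE payouts SET state = 'processing' WHERE state = 'unpaid'")
    db.execute("COMMIT")
    for pid, amount in rows:
        bank_calls.append((pid, amount))   # bank_api.send(...) in real life
        db.execute("UPDATE payouts SET state = 'paid' WHERE id = ?", (pid,))
    # The failure mode from the talk: if those state UPDATEs are accepted
    # but silently never persisted, every run re-finds the same 'unpaid'
    # rows and pays them again.

pay_unpaid()
pay_unpaid()  # a second run finds nothing left to pay
```

With the writes actually persisted, the second run is a no-op; with a silently failing store, every run would append to `bank_calls` again.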
Now, the initial problem with this is that when you try to show somebody a user profile page with all their things set, you have to go and query every single table and say: okay, you have zero LinkedIn accounts, you have three GitHub accounts, you have two Facebook accounts, and some Twitter accounts. That's not too bad — that's like six or seven queries on a single page. Login is fine, because at login you know the method, so you just hit the one table. But the problem comes with this page — this is a speaker directory page on Lanyrd. If you notice, there are some icons below the people. Those icons show all the accounts they have attached to Lanyrd — in particular that they have at least one Twitter account, or a LinkedIn account. That means that for every row here we're doing something like seven SELECT queries. And so this quickly spirals into: oh my goodness, we have to do seven queries per row on this page. We can kind of aggregate them, ish, but not really, and it quickly turns what could be a simple single SELECT into a much worse prospect. This one in particular took a while to find, fix and clean up. And the key thing here is that the database isn't magic. If you give it more tables, you're kind of promising the database that you're going to use them sensibly. You're saying: I am a competent developer; I understand that tables cost space and time, and that they're meant for querying things that are genuinely separate. The correct solution here is to have a single table with a type column in it — type email, type LinkedIn, type Facebook — and an identifier column holding the one key thing you look accounts up by, and then it's a single query. But we didn't know that at the time, necessarily. And following on from this, you have the person who really, really, really loves tables.
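The single-table design described above might look like this — the table and column names are my own sketch, not Lanyrd's actual schema — with one GROUP BY query replacing a query per backend per user:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One auth table with a type column, instead of a table per backend.
db.execute("""CREATE TABLE auth_accounts (
    user_id INTEGER, type TEXT, identifier TEXT)""")
db.executemany("INSERT INTO auth_accounts VALUES (?, ?, ?)", [
    (1, "twitter", "@andrew"),
    (1, "github",  "andrewgodwin"),
    (1, "github",  "agodwin-alt"),
    (1, "email",   "a@example.com"),
])

# The speaker-directory icons become one query per user (or one per
# page, with user_id IN (...)), instead of ~seven queries per row:
counts = dict(db.execute(
    "SELECT type, COUNT(*) FROM auth_accounts WHERE user_id = 1 GROUP BY type"
).fetchall())
```

Login still works the same way — `WHERE type = 'email' AND identifier = ?` hits one index on one table — so nothing is lost by collapsing the per-backend tables.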
So, I have been developing South and migrations for too many years — I think it's about eight now — and I get a lot of emails. A lot of the emails are like: Andrew, South is terrible, it doesn't work. That's okay, we'll fix it. And some are like: Andrew, how do I do crazy thing X? And one of those crazy things is: I want to make tables at runtime. My reaction to this is this picture. In particular, i18n is a very common request here. People say: hey, I want to make a column per language — like django-modeltranslation, for example. You have a multi-language site, people have to insert content in multiple languages, and naturally people's first instinct is to store one language per column. Or one per table, in some cases: oh, this is the English table, this is the French table. Sometimes I had people come to me and go: well, we're doing a kind of white-label hosted solution, but we want a whole set of tables per customer — at which point my reaction is: how many customers are you planning on having, exactly? And some of the even worse ones: we're writing a CMS, and we want random users in the organisation to be able to add columns by clicking buttons in the back end of the CMS. I was just sitting there clutching my head going: why? So let's take one of these examples, the language one. Say you want a column per language, and say you become a site like Facebook that has global reach. There are approximately 300 to 400 language variants — usually one per country, but often with regional variants too. For example, there's British English and American English, and they're very different: you spell colour differently, in British English you say "bin" where in American English you say "trash", and there are a lot of other words that you want to make sure are translated properly.
So you've not just got language changes, you've got locale changes — and imagine you have a couple of those columns per table. You're looking at a table with over a thousand, possibly 2,000 columns. That's not sensible. Your database isn't going to be happy about that. In particular, if you just select from that table without any column restriction, it's going to try to send you back every single column every time, and that's going to really, really hurt your network connection and slow you down a lot. So don't do that. If you do want this model, be very wary of it, and make sure you use .values and pick out just the columns you actually want, rather than requesting everything. But even then, I'd really rather you didn't do this, because even just adding columns is a problem if you have the wrong database — MySQL — or if you have a default value on the column: adding a column is very expensive, because it has to rewrite the storage for every row. Some tables at Eventbrite can take days to add columns to, because we have so much data in those tables, and we use MySQL, much to my dismay. So you've got to consider that: sure, you can add a column, but if that's an interactive thing — a web page where you click a button and it adds a column — be prepared for that web page to take potentially hours to return, because it won't cut off at a time limit. Adding columns isn't necessarily cheap and fast; there can be table locks in place, all that kind of stuff. Postgres, as usual, does a lot better at this and will let you get away with it now and again, but it's still tricky. So what I'd recommend instead is that you store your translated stuff as JSON blobs, if you can retrieve from within a JSON blob by language — for example, in Postgres you can select just certain keys from a JSON blob.
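As a sketch of the JSON-blob approach: in Postgres you'd use a `jsonb` column and pull one key out with `->>`; here SQLite's `json_extract` (from its JSON1 extension, bundled with modern Python builds) stands in to show the same idea. Table and column names are made up:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# One text column holding a JSON blob of translations,
# instead of a column per language.
db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO pages VALUES (1, ?)",
           (json.dumps({"en-GB": "Colour", "en-US": "Color", "fr-FR": "Couleur"}),))

# Pull out just the locale you need; in Postgres this would be:
#   SELECT title ->> 'en-GB' FROM pages WHERE id = 1;
row = db.execute(
    "SELECT json_extract(title, '$.\"en-GB\"') FROM pages WHERE id = 1"
).fetchone()
```

Adding a 401st language is now just another key in the blob — no schema change, no table rewrite, and the schema itself stays readable.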
hstore is similar in Postgres: you can say, okay, this title field is just an hstore where en-GB is this, en-US is this, fr-FR is this, and then select just the language you want out of it. Or the generic way is to have an entity-attribute-value style table — that's what EAV stands for — a table that's just: this is the column, this is the language, this is the value. Joining against that is something the database is going to do pretty well, and it will save you a lot of headache over looking at a schema that scrolls off the side of the screen and three miles down the street.

Another fun one — this didn't happen to me, thankfully — is the cold boot. This is a fun one because it gets really good engineers. So: you've written a site, and the site has a very good cache hit rate — like 75, 80, 90% of the pages on the site are hitting the cache — and you're feeling pretty smug. Even better, you've pared down your app servers so they're running at, say, 70% capacity. You're not wasting money, you've got a good cache hit rate, you're doing fantastically well, good engineering, good job. However, I'm evil, and I'm going to come in and make those cache servers go away. They could have a failure, they could just stop working — but let's say the data in them is lost permanently. In the case I'm talking about, a power failure hit servers running memcache, so they lost the data. Now you have this problem: you're running at 70% capacity with a 90% cache hit rate. That means that, without the cache, you've got 10 times the traffic coming to those back-end servers. What's going to happen is that your throughput plummets when the cache drops, and then, as soon as people can get back in, they just overwhelm your servers with 10 times the normal traffic, and you're drowned. Even worse, as soon as you hit the cap on those servers, things start backlogging and filling up a backlog, or the
sockets will sit there waiting, open, and it becomes this immense firefighting situation that's just really hard to recover from — it can take hours to repopulate that cache. One of the things we do at Eventbrite when we put builds out is that the builds actually pre-warm the cache before they go fully live, just to prevent a smaller version of this kind of thing. But it's a very nasty failure, because it can happen to a really well-planned, perfectly done system. You have to think: can your system either scale up in a few minutes, if that's something you can do on a cloud provider, or do you have spare capacity, or can you turn off features for that recovery period? One of those things. So it's something to bear in mind.

The next thing is something I call the optimist. The optimist is a very enthusiastic developer, and the optimist goes: okay, I've been through a few companies now, I know what I'm doing, we have some great stuff, and we're at a brand-new start-up — small company, agency, whatever — and we have to build a new project. So I'm going to take all the things I learned previously, and we can use all of them at once. We can have Postgres; we can shard it, because we want to scale in the future; we can have Elasticsearch for search; we can have Riak for stuff where we think eventual consistency is fine; we have Redis as a sort of cache-slash-data-store thing; we can have some flat files logging big blobs and stuff — and do all of these things at once. And it seems very attractive. It's a very tempting situation where you're like: ooh, they're all shiny. As a developer — I am, at least — I'm drawn like a moth to a flame: it's a new data store, let's wander towards it, we can put this into the site and install it, it'd be great. One of the rules we had at Lanyrd, in particular, was that we weren't allowed to add a data store without removing a previous one, and you'll see why in a second. Because, sure,
you have these, let's say, five or six different data stores. For all of those you need redundancy, and you need backups, so your ops workload is already much higher than it was. And then, what happens if any one of them dies? Likely, if it's a new project, you haven't really built the site to handle them going away. For example, what should happen if Elasticsearch goes down? Your search stops working — and just your search. But of course, what actually happens is that Elasticsearch is quite good at a lot of things — it's good at doing big aggregate pages — so you start going: oh, we'll build the homepage out of search as well, and then all the intermediate pages out of search as well, and just have the main pages be the main view. Suddenly you've made Elasticsearch a key part of the site that, if it goes down, you'd consider a major loss. Similarly with Redis: we use Redis for our sessions, and if it goes away, that's a crucial part of the site gone. What you've done, unless you're very careful, is made every single one of those back ends into a single point of failure — because if that thing goes away, some crucial part of the site dies. And even if you don't think it's crucial, it might be crucial to someone else — their pet project, a very important marketing push, whatever. So you have to weigh this up. Sure, add another data store if your problem isn't shaped well for an existing store. Say you're trying to put a graph — a proper, heavily connected graph — into Postgres: it can do that; it's not particularly great at it, but it can do it. So if it's a really big part of the site — hey, we want performance improvements here, we want a proper graph store — that's fine. But it's not free. Every store you add is extra redundancy work, extra backups; you've got to make sure it's working, not running out of disk space like I said before; you've got to make sure
that it never goes down. It's just making your surface area bigger. So if you're a small project or company, really try to use just one or two things. It's a lot easier when it's three in the morning and one of them goes down — which happens much, much less often when there are only two of them — and if it's Redis or Postgres or something, those things are pretty solid, so you're relying on fewer, more reliable stores. MySQL? It's fine — the one thing it's good at is staying up, generally.

And then this comes to a different kind of optimism, which I've seen quite a lot. Django kind of encourages this: Django has a primary key field that is an auto-incrementing integer. In particular, it has this sort of wonderful property that as you add more records it increments, and so people go: that's great, we have this number that goes up, which means the highest-value primary key is the most recent one. That's not true, right? First of all, if you're relying on that for precise ordering: different pages can go through to the back end at different times, so you should put timestamps on them, for example. And if you're running a cluster, it's definitely not true — and clustering is the problem here, because this works on a single instance, but you quickly realise that auto-increment is a really nasty feature to scale. To the point where Twitter wrote entire software stacks just to make sure all their numbers auto-incremented roughly correctly. Because if you think about it, if you've got a cluster of, like, 20 database storage servers, they all have to agree on what the next number is, and they have to not pick the same number. And so you end up with all these solutions: okay, we'll assign each server in the ring a number modulo 20, so this one only does 1, this one does 2, this one does 3; or we'll have them skip around. And it becomes very difficult to have numbers that even scale roughly
correctly. So assuming this is kind of a bad thing. And the other one that particularly annoys me is "IDs are numbers, we can do maths on them". No, they are not. They are opaque things you should never treat as numbers. I have this big thing — we've been doing this internally at Eventbrite for a little while — of making IDs strings, even if they are numbers, because they are not numeric. You can't do maths on them, right? That doesn't make any sense. I can't add 3 to an ID to go somewhere; it's not even like a pointer, where I might do something useful. If you try to do pagination by adding numbers to IDs — what are you doing? I've seen occasions where people want to inflate their IDs to look good. For example, we might start a brand-new site and go: we don't want this customer's page to be ID 1, because it looks really bad to our customers, so we'll start at number 3804 — but we'll do it all in the front end, so we'll subtract that number whenever a request comes in and add it back when it goes out. You see where that ends up, right? So if you want IDs that don't give away how many rows you have, use UUIDs: they're built into databases, they're basically guaranteed not to collide, and there's native support for them in Django 1.8. It's great. Don't sit there and be that person who does this.

But of course, we come to my favourite part of the presentation. I was at PyCon Ukraine many years ago — three or four years ago — and I saw a wonderful talk, and the crux of the talk was: it's Postgres, and Postgres can run Python, so why are we wasting network overhead querying to and from the database, doing views, when we can just run it right there and save the overhead? So, Postgres has functions. Now, I have some problems with functions even in standard usage. A lot of enterprise software in particular will write all of their complex operations as functions, so to add a page you just call SELECT add_page(...) — and the idea
is that it's meant to encode all the complex logic into the database, so it happens there. But you can't version-control that code: that code isn't in version control, it's in the database somewhere. You need to put it in a migration system to have it version-controlled, and even then it's hard to update properly, and it becomes a whole mess. But that's not even what this is about — and I will say I've never seen this in production, but it's such a great thing I had to show you anyway. They're Python functions; you can import stuff in them; in particular, you can do templating in these Python functions. So if you were, say, particularly enthusiastic, you could go: well, we can render the template straight in the database — we could just use Jinja in a Python function. And at this point you're thinking: that's a silly idea, no one would ever do that. Well, here is the code that does this, if you want to go to it — if you have internet right now — and I'm going to show it to you on the screen. That SQL file, bit.ly/whynot-pg, contains a schema and a function that renders HTML directly from the database. In particular, you just do SELECT render(template_name, page_name), and here we are: you get a column back which is the rendered HTML of the page. Even better, because Postgres has regular expressions, you could put the URL routing into a SELECT query that matches the regular expression from the page. It's awful. That's the code — the code is not to be used in production; this is a very explicit warning from me — but here's the example of it, and, you know, it's pretty simple. import jinja2 — PL/Python will just import things from your standard system Python path. We can execute a SELECT query against another table in PL/Python — it's pretty great, it comes back as a dictionary of column names to values — then we select the template string from another table, and then just pass that to Jinja. Jinja will take the dictionary, it
will take the template name and just render the HTML; it doesn't know it's doing it in a database. And you could do more: you could do views in here, you could do custom saving stuff, you could do all manner of things. Don't. But you could. So yeah, that's my thing. The presentation I saw ended with this "why? why is this a thing?" reaction; it didn't come with quite as strong a warning as this one does. So it's possible. Don't do this. Do it in Django; Django is written by mostly sensible people, myself excluded.

There's a reason behind every rule. There's a reason why people generally stay away from stored procedures and functions. There's a reason why we don't generally make a table per individual type. And all of these have exceptions; nothing in programming is a hard and fast rule. In particular, a lot of people just won't believe things, or technology changes and gets faster and we get better CPUs. So for every piece of information or recommendation you get, ask why. If somebody tells you "don't do this", be like a small child and go "why?", because often there's reasoning there, and it's very sensible reasoning. If you come into a company that's been around for many years, they'll have certain internal processes and design patterns, and often there's a good reason why things are written like they are. But as a new engineer your instinct is to go in and say, "This is all terrible, let's rewrite it," and you're going to hit that sort of pothole: there are always hidden requirements that you never realised, and then it'll take you a year to rewrite it and you'll end up with the same complex code as before. So just take a moment to ask why. Or, if it's a simple thing, just as I've done here, take the thing you think is good and try it. See what happens; see what the eventual result is. Sometimes that's not possible because it requires scale to break, but you can do thought experiments and just work through it in your head. Just don't write things off without context, going "I don't know what they're doing." Don't dismiss 40 years of database research; don't dismiss people who know what they're doing. There are many engineers at Eventbrite who are much better than I am, who have been around the industry and programming for a lot longer, and they're full of really interesting and useful information from very esoteric parts of programming. There's very good reasoning behind everything they say, and sometimes there isn't time to explain why the thing is like that, but sometimes there is. So just ask why, basically. Thank you.

So, I believe we have some time for questions.

The question is basically: going into a company, how do we get the knowledge I've talked about here in the first place? How do we get that embedded knowledge of "don't make tables for every translation"? It's a difficult problem. Part of it is doing this kind of thing, talking to people and giving explanations, but I think part of it is also going into a company and being a positive influence. You can be that person who asks why, and occasionally the answer will be "we don't know why", and then they can go and ask why themselves, and you'll discover the thing is a terrible idea. If you leave something better than when you joined it, that's the cleaning-up rule. So you should ask why, and you should try to contribute, because there are things that you as a developer will know better than other developers. Every one of us has our specialty; we gain it slowly over the years, but it does happen, because you'll be better at something. So don't be afraid to ask why and then offer, "Well, I do it this way because of this." I think slowly but surely you can improve institutional knowledge that way and make things better.

The question is: what's my feeling on column types like JSON and hstore, schemaless data types in columns in relational databases? I really like them. I'm a big proponent of mixing schemas. In particular, one of the big things I like to do on new projects, and anywhere I can, is this: in any
given table, there are certain columns that are core, that are important, like the id column, the name column, the title column, the created date; and there are columns that aren't as important, things that could be stored in JSON. So what I like doing is: if you're going to query on a column, make it a full column; if it's just a set of user-configurable things, put it in JSON. A CMS is a great example of this. You can have a CMS where every CMS page has a URL slug, it has a created date, it has an author, it has all this stuff, but pages might also have a bunch of configurable things. Put those in the JSON blob and put the core fields in proper table columns. Then you've got this good combination: all the complex, changes-a-lot-of-the-time stuff is in a nice blob that doesn't need schema updates, and the core fields that you're querying and ordering against are all proper table columns. I think that's a great combination. And even without a proper JSON type you can do it, like in MySQL, as long as you're not querying into the blob. You can't necessarily query into them as efficiently as you otherwise could, but it's still a perfectly valid schema design, I think. Thank you very much.
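The "native support in Django 1.8" for collision-free, opaque IDs mentioned in the talk is UUIDs. A minimal stdlib sketch of treating IDs as opaque strings rather than numbers (the `Page` model in the comment is invented for illustration; `UUIDField` itself is real Django 1.8+ API):

```python
import uuid

# Treat IDs as opaque strings: generate them, compare them, never do maths on them.
def new_id() -> str:
    # uuid4 is random; collisions are astronomically unlikely, and the value
    # leaks nothing about how many rows the table has.
    return str(uuid.uuid4())

a, b = new_id(), new_id()
assert a != b       # effectively guaranteed unique
assert len(a) == 36  # canonical 8-4-4-4-12 hex form

# The Django 1.8+ equivalent, as a comment so this sketch stays stdlib-only:
#   class Page(models.Model):
#       id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
```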
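The render-in-the-database trick from bit.ly/whynot-pg needs Postgres with PL/Python installed, so here is only a rough, runnable simulation of what that `render(template_name, page_name)` function does: fetch a page row as a dict, fetch a template string from another table, and render one against the other. sqlite3 stands in for Postgres, `string.Template` stands in for Jinja, and all table and column names are invented:

```python
import sqlite3
from string import Template

# In-memory database standing in for Postgres.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.executescript("""
    CREATE TABLE templates (name TEXT PRIMARY KEY, body TEXT);
    CREATE TABLE pages (name TEXT PRIMARY KEY, title TEXT, content TEXT);
    INSERT INTO templates VALUES ('page', '<h1>$title</h1><p>$content</p>');
    INSERT INTO pages VALUES ('home', 'Hello', 'Rendered from the database!');
""")

def render(template_name: str, page_name: str) -> str:
    # In PL/Python the query result likewise comes back as column-name -> value.
    page = dict(db.execute(
        "SELECT * FROM pages WHERE name = ?", (page_name,)).fetchone())
    body = db.execute(
        "SELECT body FROM templates WHERE name = ?", (template_name,)).fetchone()["body"]
    # The template engine neither knows nor cares that it is "inside" a database.
    return Template(body).substitute(page)

print(render("page", "home"))  # <h1>Hello</h1><p>Rendered from the database!</p>
```

Seeing it laid out this way makes the objection concrete: the template, the data, and the rendering logic all live inside the database, outside version control. Don't do this in production.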
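The mixed-schema idea from the last answer, core fields as real columns plus one JSON blob for the configurable extras, can be sketched with the stdlib (sqlite3 has no JSON column type, just like older MySQL, so the blob is stored as text; all names here are invented):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE cms_page (
        slug    TEXT PRIMARY KEY,  -- core: queried directly
        author  TEXT NOT NULL,     -- core: queried directly
        created TEXT NOT NULL,     -- core: ordered on
        config  TEXT NOT NULL      -- JSON blob: never queried into
    )
""")
db.execute(
    "INSERT INTO cms_page VALUES (?, ?, ?, ?)",
    ("home", "andrew", "2015-01-01",
     json.dumps({"sidebar": True, "theme": "dark", "widgets": ["news", "weather"]})),
)

# Filter and order only on the real columns; decode the blob after fetching.
row = db.execute(
    "SELECT config FROM cms_page WHERE slug = ?", ("home",)).fetchone()
config = json.loads(row[0])
print(config["theme"])  # dark
```

Adding a new configurable key is just a change to the JSON, no schema migration; adding a new field you want to query on is a proper `ALTER TABLE`.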