Hi, hi, hi, can you guys hear me well? Hi, I'm Mark, I'm currently working at Viki, basically streaming Korean drama. And today I'm going to talk about database/sql. The subtitle of the talk is "there and back again", a nod to the Hobbit movie, because today we're going to take a look at how the SQL code in Go works. It's a journey from the code to the backend database, which will be Postgres, and back. So hence that title, right?

So the motivation for the talk comes from the stuff that I currently do. We stream movies at Viki and we have around 8 to 9 million users, and a lot of our code paths touch the database. Earlier this year we had a quite severe outage where, due to high demand, a lot of our services ran out of DB connections, causing the site to go down. We were able to mitigate it, but as part of that we realized we didn't understand the code we use to interact with the database well enough to prevent certain outages from happening in the future. So reliability was a concern, and hence I did a bit of investigation into how the Go database code actually works, and I'm going to share some of those findings today. Please note that most of this is based on Go 1.7. Go 1.8 has a lot of changes, positive changes, and I think one of the talks later will address that.

So the basic API for working with SQL in Go is pretty simple, like what you see on the left side here: you basically just open what is supposedly a connection to the DB, run some queries, scan the result. It looks incredibly simple, but if you dig deeper into the source code you see that there are many things abstracted away from you. For example, you created a pointer to that DB struct and it's going to be shared across multiple goroutines. Is there any kind of synchronization so that these goroutines can make use of the database? That's one. How does your code go from taking the arguments, which are data represented as Go types in memory, to translating them so it can speak with the database? And then other things: what kind of resources does this code actually use? How much memory does it consume? How many DB connections does it consume? And how does it behave at scale? All these details are abstracted away behind Go's incredibly simple API. Most of the time, that's all you need to know. But when you start to run things at scale, and when you start demanding reliability, then you need to come back to these questions. So we're going to go with the most straightforward approach: tracing the code and trying to see how Go does it.

So as a starter, I have a basic database with one table that I'm going to be writing to, a gdp table, pretty lame name. Oh, sorry, right, the screen. Okay, let me switch. Okay, all right, so this is basically the database, just one table. The first column, ID, is of an enum type. I'm going to use Postgres as the backend here since it's one of the popular ones. So let's see what that entails: it's just an enum with two values. And then the table has three columns. One is that enum type, one is just an integer that can be null, and another one is a timestamp that is generated at insertion time. And then the code that I'm going to use for demonstration today is just like this. Just ignore the goto thing, that's my note.
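The slides are not reproduced in this transcript, so here is a rough sketch of the kind of schema and demo program described above. The table, column and enum names, the connection string and the exact query are assumptions for illustration, not the speaker's actual code.

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver; registers itself as "postgres"
)

// Assumed schema, roughly as described in the talk:
//   CREATE TYPE kind AS ENUM ('first', 'second');
//   CREATE TABLE gdp (
//       id         kind NOT NULL,              -- the enum-typed column
//       amount     integer,                    -- nullable integer
//       created_at timestamptz DEFAULT now()   -- generated at insertion time
//   );

func main() {
	// sql.Open does not dial Postgres yet; it only returns a *sql.DB handle.
	db, err := sql.Open("postgres", "postgres://localhost/demo?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Insert a row and read back the generated timestamp in one round trip.
	var createdAt time.Time
	err = db.QueryRow(
		`INSERT INTO gdp (id, amount) VALUES ($1, $2) RETURNING created_at`,
		"first", 42,
	).Scan(&createdAt)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("inserted at", createdAt)
}
```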
So it's just simple: open the connection, and I'm going to write some entries into the DB. We're going to use this query. It will insert an entry, and at the same time it's going to do a select and return the inserted value, and then we're going to scan it. So again, a very simple thing. But once you go deep into the source code, it turns out quite complicated.

So we'll start. Okay, change back. At a high level, this is, let's see, okay. So we'll start here. We open what is supposedly a connection to Postgres with this data source, and let's dive deeper into the code. Tell me if it's hard to read so that I can change it. For this sample code, what I have done is make a copy of the database/sql source, put it into the vendor folder and make some modifications to it so that we get some logging information out, as opposed to the normal one.

So we'll start from here. The Open call that you see here will go to this place, can you see? It does a few checks, but ultimately it returns you a struct with some information, like what the driver is and where the data source is. There are two important things here, and one is that it starts something called a connection opener. If you look at this, it is simply just a loop on a channel: whenever there's a signal sent to the channel, it opens a new DB connection. So what we believed to be connecting to Postgres actually doesn't do anything at all, it just returns you the struct, and the connection only happens much later.

So let's go down further. We get to here, where we execute a query. This is where the meat of the database/sql package is. Let's go to line 1258: this is where it starts, and there is a lot more that you'll see later. It calls the query function; the query function calls something called queryContext with a background context, so basically that's just a normal query. And then this comes down to here, line 1263, this part right here. What it does is, for maxBadConnRetries, that's the value of this variable, which is set to 2 as a constant, it will retry the query. So interestingly, what we perceive as one query may actually get executed twice. And then it calls this small query function which goes to here, and here, on line 1263, we actually get a DB connection and execute the query on it. So again, the details are a lot fuzzier than the simple API suggests.

And how does this work? Let's go deeper into that function. Feel free to stop me anytime if you have questions. Basically, it tries to get a database connection, and at some point later it will call driver.Open. Maybe I'm jumping a bit, but ultimately, if you trace the code, it comes back to here, driver.Open. And this is where it calls the actual driver. The standard library database code doesn't really understand how to talk to any particular database at all; it relies on drivers like lib/pq to interact with those. So again, we need to go into that to understand it.

And we are here. Coming to here, following it here: it's basically trying to dial an actual connection to Postgres. The way Postgres works is that you have multiple clients, which is basically your Go code here, that connect to the database through a persistent TCP connection or through a Unix socket.
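Stepping back to the connection opener mentioned above: in the Go 1.7 era database/sql it is essentially a goroutine draining a channel. Here is a simplified sketch of that pattern; the type, field and method names are approximations for illustration, not the actual standard library source.

```go
package pool

// pool stands in for sql.DB here; only the opener machinery is sketched.
type pool struct {
	openerCh chan struct{} // one token per connection that should be opened
}

// connectionOpener runs in its own goroutine, started when the pool is
// created. Whenever a signal arrives on the channel, a real connection
// gets dialed.
func (p *pool) connectionOpener() {
	for range p.openerCh {
		p.openNewConnection()
	}
}

// openNewConnection is where the registered driver's Open(dsn) would be
// called, which for lib/pq means dialing TCP or a Unix socket to Postgres.
func (p *pool) openNewConnection() {
	// driver.Open(dsn) ...
}

// maybeOpenNewConnections mirrors how database/sql wakes the opener:
// callers that could not get a free connection ask for n new ones.
func (p *pool) maybeOpenNewConnections(n int) {
	for i := 0; i < n; i++ {
		p.openerCh <- struct{}{}
	}
}
```

The openNewConnection step is where the dial just described, a TCP or Unix socket connection to Postgres, actually happens.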
And then you start the connection. It starts with the client streaming data to the backend, and it does a few things: establish the connection, set up SSL, the server authenticates, and then it says that it's ready for commands, for queries, and then the client sends commands. So this is where, back in the code, we actually execute anything.

Right, so we are back here, back on this line: we establish a connection and then execute the query. So what does this do? A few important parts. Postgres doesn't understand the data types that you have in your Go program. For example, in our program we insert something like the ID and the number. The first one's type is a custom type which wraps around an integer, and the other one is a sql.NullInt64. These are not things that Postgres understands, so the driver has to serialize them. It does that in this driverArgs function, which I'll go into in detail later, so we'll skip it for now. Then it runs the query, Postgres processes it, and it returns some information. The particular way Postgres works is that it's going to return a few pieces of data to you; think of it like a table. It first needs to describe the table, then send the data over, then say that the command is complete and that it's ready for more queries.

And in particular, for Go: in the code we run the query and get the rows returned. So, sorry, right, here. What Go stores, well, it stores a few more things, but the important part is the driver rows, and in the Postgres driver these store exactly the information that we have seen earlier: what the columns are, what their names are, what their types are (there's something called the OID of the type, the type identifier in Postgres), and the format, which might be text or binary. And then we call the Scan function on our rows to parse this information out.

Right. So back here, we are somewhere around line 57: finish the query, handle some errors, close the connection and so on. Then we try to scan the data returned by Postgres into our struct. Again, what Postgres returns is not understandable by Go, so there needs to be some conversion. Unlike the normal code that you would see, I don't really need to do a typecast here; I can scan directly into the custom type that I have declared. This is because we have declared the conversion elsewhere, and I'm going to go through that later. But ultimately, there are four steps: serialize, send the query, get the result in the Postgres format, and then deserialize that into the Go types.

So let's talk about how those things actually happen. There's one very useful package if you write a lot of SQL code, which is the driver package included with database/sql. This allows you to define your own conversion of a custom Go type into the form understood by the DB. So for example, I start with the Value method here, implementing the Valuer interface, and what it does is convert your custom type into the wire format. So this goes here: earlier I showed you the function called driverArgs, which converts the Go types into the wire types. Eventually it leads to here, where it tries to detect whether your custom type implements the Valuer interface, which we did by declaring this, and then does the conversion.
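Here is a minimal sketch of both directions, the Value side just described plus the Scan side that comes up just below, using a made-up Kind type and enum labels rather than the speaker's actual code.

```go
package model

import (
	"database/sql/driver"
	"fmt"
)

// Kind is a hypothetical custom Go type backed by an int in memory but
// stored in Postgres as an enum with two labels.
type Kind int

const (
	KindFirst Kind = iota
	KindSecond
)

// Value implements driver.Valuer: it serializes the Go value into
// something the driver can put on the wire, here the enum label as a
// string. This is the method the driverArgs path looks for.
func (k Kind) Value() (driver.Value, error) {
	switch k {
	case KindFirst:
		return "first", nil
	case KindSecond:
		return "second", nil
	}
	return nil, fmt.Errorf("invalid Kind: %d", int(k))
}

// Scan implements sql.Scanner, the reverse direction: rows.Scan can write
// a Postgres enum value straight into a *Kind when convertAssign does not
// recognize the destination as one of its basic types. The label may
// arrive as []byte or string, so both are handled.
func (k *Kind) Scan(src interface{}) error {
	var label string
	switch v := src.(type) {
	case string:
		label = v
	case []byte:
		label = string(v)
	default:
		return fmt.Errorf("cannot scan %T into Kind", src)
	}
	switch label {
	case "first":
		*k = KindFirst
	case "second":
		*k = KindSecond
	default:
		return fmt.Errorf("unknown Kind label %q", label)
	}
	return nil
}
```

With both methods in place, a db.QueryRow(...).Scan(&k) call can read and write Kind values without any manual casting.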
So without this, you would have to write your own casting functions for your custom types, and it would be pretty ugly for your database code. So that's one direction. The other one is when you scan data. So again, let me go back to the connection for just a second. The Scan function here eventually reaches this: it reads the data, does some checks again, and then finally the important part is the call to convertAssign. This transforms the Postgres data back into Go. This ultimately goes back to here, where it does a lot of checks. In simple terms, what it's trying to do is detect the type that Postgres sent, and if it's one of the basic types, it tries to convert it back to Go directly; if not, it utilizes things like the Scanner interface that we declared here. So this would be somewhere here. We go down to line 241, there are actually a lot of checks, and down to here where, if you implement a Scanner, which we did on line 14 over here, it will call Scan and pass the data into it. For us, because the type we have is pretty simple, the data in Go is an integer and the data Postgres returns for this particular column is a string representing the enum, so I just cast it to a string and map it to the corresponding value. But if you have more complex data types, it's possible to convert a map, a slice, anything you can think of, into something that Postgres has. This is very useful if you're making custom use of your database; I'm making use of the enum here, which you don't usually get with raw SQL types. In the end, it helps you clean up the code and utilize the things that the database provides.

So far, we have talked about what happens when you do a query, and we have seen the conversation over one particular TCP connection to Postgres. What happens when you run concurrent workloads, for example a web server that receives lots of requests and they all interact with the DB? As I mentioned earlier, the connection to Postgres only gets created when you run the query. What you get from the sql.DB is actually a connection pool, and whenever it detects that you don't have enough connections, it sends a message to this thing, the opener channel, and that triggers a function to establish a real TCP connection. Code-wise here, I don't think it's... I mean, you can go through the code, I'm going to share the slides later.

Does the pool ever shrink? Yes. The pool in database/sql works with a cleaning-up mechanism, a bit like GC. When you have finished using a connection, it gets marked as unused and returned to the pool. And then you can configure how many idle connections you keep at one time. Is this configured at the driver level or at the database/sql level? The database/sql level. And yeah, if you have configured that, then when a request for a connection comes in and there's a free one, you get it; otherwise it will try to create a new one.
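Those knobs sit on the *sql.DB handle itself, that is, at the database/sql level rather than in the driver, as just discussed. A tiny sketch with arbitrary numbers, not the speaker's settings:

```go
package dbsetup

import "database/sql"

// ConfigurePool sets the pool limits on the *sql.DB handle returned by
// sql.Open. Both settings belong to database/sql itself, not to lib/pq.
func ConfigurePool(db *sql.DB) {
	// How many idle connections to keep in the free pool
	// (database/sql keeps 2 by default).
	db.SetMaxIdleConns(10)

	// Hard cap on open connections; 0, the default, means no limit.
	db.SetMaxOpenConns(50)
}
```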
So I mean, this is just theory, and it's not very interesting. What I have tried is making a workload simulator to send different workloads to the database. There's a program, but I don't think you're really going to be interested in the source code. Because I modified database/sql, I was able to get logs out of it. So I was able to log things like when we release a connection, when we set the max idle connections or the max open connections, when we actually open a new connection, when we mark one for deletion, and so on. So when you run the program, you get some kind of analytic log that looks like this. It's a bit tedious, but basically this is a Unix timestamp with nanosecond precision, and then the events, which I'm going to list here.

For now, we'll try to get some information on certain events. The first one is doWork. This is where, in the main program, I ask it to try to insert an entry. doWork is basically this function: it just inserts an entry into the table, and when it's done, it says it's finished, as simple as that. So when it's finished, you will see something like this. And then in the middle, it needs to open a connection, so we'll track something particular here: pq opening a direct TCP connection. And then there's this "put connection". I see two pq opens there, are those concurrent accesses? No. Because I run the workload with concurrent goroutines, it definitely won't have enough connections, so it will open more, and hence you will see multiple pq opens. Anytime we open a TCP connection, you will see that. And then when we put something back into the free pool, we track this event where we append to the free connection pool.

In the interest of time, I have pre-generated a few sets of workloads, and I'm going to show the graphs first; then we're going to try to run some live thing later. Just take note of the filename, it tells you what the workload does. So for example, here we ran a sample one-time workload, basically spinning up 1,000 goroutines and sending 1,000 doWork requests to the DB, so you see a big spike at the beginning. I have a few more similar ones. Let me try the same thing, but sending 3,000 at the beginning; this looks clearer. Right, so the blue is the doWork event, which is when we ask the DB to insert. When it's finished, you see an orange line, which is not very prominent; you can see some orange here, not a lot. Then the red, pink-coloured line is when we open a DB connection, and here we open a lot of them. Because when I started this program I set the open connection limit to zero, basically no limit, it just ran free range, and because it ran free range, the DB was not able to cope with it. Hence there is an event here that I also tracked: it just says it can't insert because there are no more connections. We track that with the black line, and you see it rising somewhere at the bottom, towards the end, and it keeps going. So after the initial phase where the inserts succeed, they stop, and then it starts yielding connection errors. And this is particularly bad after you get the big spike. So this is one. This is the same graph at a different resolution, just to see it better. I mean, this is pretty expected, because we don't have any limit.

A more useful one, this would be more like a standard workload. I ran a sustained workload with a limit of 20 open connections, and there's this multiplier, basically how many goroutines I'm running with respect to the open connection limit. So 20 times 0.5 is 10 goroutines. This will send a write request every 50 milliseconds, over a duration of 30 seconds. How are you throttling, are you throttling the interval or throttling the number of goroutines? Both. Yeah.
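The simulator itself is not in the transcript; a minimal sketch of the kind of program described, N goroutines each issuing a write every 50 ms for 30 seconds against configurable pool limits, might look like this. The table name, connection string and exact numbers are assumptions.

```go
package main

import (
	"database/sql"
	"log"
	"sync"
	"time"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/demo?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pool limits under test; 0 for max open would mean "no limit" (free range).
	db.SetMaxOpenConns(20)
	db.SetMaxIdleConns(10)

	const (
		workers  = 10 // e.g. the 0.5 multiplier on a limit of 20
		interval = 50 * time.Millisecond
		duration = 30 * time.Second
	)

	var wg sync.WaitGroup
	deadline := time.Now().Add(duration)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for time.Now().Before(deadline) {
				// doWork: one insert per tick; errors are logged, not fatal,
				// so connection exhaustion shows up as a stream of log lines.
				if _, err := db.Exec(
					`INSERT INTO gdp (id, amount) VALUES ($1, $2)`, "first", 1,
				); err != nil {
					log.Println("insert failed:", err)
				}
				time.Sleep(interval)
			}
		}()
	}
	wg.Wait()
}
```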
So I can configure that. I mean, this would be very similar to something that you run with a web server: you get a constant stream of requests. As you can see, the pink line is again the number of connections we open. We only open a lot at the beginning, but still within our limit; I think it's not even at 20 yet. So later, it opens a bit more, and then that's all it's going to open, and there are no errors. So again, just the standard stuff that you would expect. This works fine even when I crank it up to two times the number of goroutines, so I'm running 40 goroutines in this case, again at a 15 millisecond interval and over 30 seconds, just pumping write requests. It's still pretty safe.

So I mean, this is pretty basic; I won't draw too many conclusions from it, and the conclusion would be obvious anyway: you don't want to let it run free range, always limit the connections. But the technique, modifying the driver and logging this data, can be used to understand the behaviour in different scenarios. You can track memory usage, maybe (I mean, there's already a profiler for that), but also other things like how often it cleans up connections, so you can tune your idle limit, for example.

So let's try to run some real workload here. Right, we're going to remove everything. We'll run a sustained workload of, let's say, 50 connections with 10 idle connections, over 30 seconds, with an interval of, again, 50 milliseconds between database writes. And then let's try to crank it up to four times the workload, so it's going to churn out data that we can look at. Let me make sure that I don't kill my laptop. So it's going to take around 30 seconds to run this thing. And then let's draw some charts on this. This is a bit hard to look at, so let's try a different resolution; we're going to use 500 milliseconds as the aggregation. So this is the same thing at a different resolution. So again, we did open quite a few connections, but at the same time we are also freeing them. And if we look at the stats for that (I took the liberty of grepping out the stats log), database/sql also provides you with a DB.Stats function that tells you how many connections it's using. So we never really exceed 50. But looking at the graph, you can see that we open and free a lot of connections. In practice, this behaviour might not be good for your application, because every time you open a connection, that is latency. So running some analysis on that can turn out to be helpful.

So far, we have seen the behaviour at a bigger scale. I'm sure you guys have higher scale than this, and it would behave differently. But the rule of thumb I would draw from this, at least, is: always set the max open connections, so you don't run into the case we had, failures because the DB cannot cope. And you can use this kind of technique to help with capacity planning. So max open connections is not set by default? No, yeah, it's unlimited by default. And then some tuning will help with the application.
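On the DB.Stats call mentioned above: in Go 1.7 it reports only the number of open connections, so a periodic log line is about all you can get from it. A small sketch, with a package and function name of my own choosing:

```go
package dbmon

import (
	"database/sql"
	"log"
	"time"
)

// LogPoolStats periodically logs the pool size, so the open/free churn
// described above shows up alongside the workload logs. In Go 1.7,
// sql.DBStats exposes only OpenConnections; later releases add more fields.
func LogPoolStats(db *sql.DB, every time.Duration) {
	for range time.Tick(every) {
		log.Printf("open connections: %d", db.Stats().OpenConnections)
	}
}
```

You would run this in its own goroutine next to the simulator, for example go dbmon.LogPoolStats(db, 500*time.Millisecond).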
So earlier, we covered writing your own conversion functions for your custom types. This is it again; it just helps clean up the code base. And then I think another question that usually comes up when we talk about SQL code in Go is whether to use an ORM or not, so I'm going to try to address that today too. I started with Ruby and Rails, so one of the very natural first things I did in Go was to look for an ORM, and to try to write my own ORM. It didn't turn out well; I think over time I reverted back.

I think the basic problem that triggers people to ask for an ORM is that you don't want to write lots of code. One piece is marshalling data back and forth; the other part is that you need to generate queries that look almost the same but just operate on a different table or different data types. The first one is pretty much solved with the custom conversion functions: you just shove them into some other file and forget about them. For the second one, and this is my personal opinion, when you use an ORM you're effectively trying to generate SQL using some syntactic rules, and that is very hard, because SQL is a very expressive language that you can't really cover by enumeration in a short amount of time. At most you will end up with something like Rails, where you have certain query patterns, and if you keep using them you end up following the same mental path and you will not be able to make use of the functionality of the database. For example, with some of the ORMs out there right now for Go, you might not be able to make use of the enum feature in Postgres, and that might be helpful if you are sharing your data with other people. And then there are other things like materialized views, joins, multiple levels of joins and so on. You can optimize a lot within SQL itself, and using an ORM takes away that chance. So for me, I would say you probably use only a few query patterns: try to optimize those, run the raw queries, and you can prepare the queries for efficiency. But don't take my word for it; experiment and try.

All right, so I think that's it for me. There are a lot of things we haven't covered: we haven't really walked through the whole database/sql source code, and the fact is it is incredibly complex inside. We haven't really looked at performance, like memory usage and so on, beside just the DB connection thing. And we haven't really seen how the DB connections get freed, as well as the failure handling. I don't think we have enough time for that, but I'll leave a few links. If you want to dig, there's always the source code, and then there are a few links that you can use. For me, I started by writing my own kind of DB driver to understand it a bit more. And then there's some interesting stuff, like this sqlmock from DATA-DOG, so you can effectively mock out the DB so that you can test. Then you can write your own driver, for example to a different kind of database that may be similar to SQL, because the interface of database/sql is just queries and arguments. So maybe even a graph database might be possible, or you can try to write one for something like Google Cloud Spanner, which just came out, for example. And you can use the facilities in go-sql-test, which is developed by one of the Go team members, to test out your driver. And then, as I mentioned, you need to understand the protocol that you use to connect with the backend database.
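As a quick illustration of the sqlmock idea mentioned above, here is a sketch against the documented DATA-DOG/go-sqlmock API; the query, table and column names are assumptions, and details may differ between versions of the library.

```go
package handler_test

import (
	"testing"

	"github.com/DATA-DOG/go-sqlmock"
)

func TestInsertGDP(t *testing.T) {
	// sqlmock.New returns a *sql.DB backed by the mock driver, so the code
	// under test can run without a real Postgres instance.
	db, mock, err := sqlmock.New()
	if err != nil {
		t.Fatal(err)
	}
	defer db.Close()

	// Expect the INSERT ... RETURNING query and fake its result row.
	rows := sqlmock.NewRows([]string{"created_at"}).AddRow("2017-01-01T00:00:00Z")
	mock.ExpectQuery("INSERT INTO gdp").WillReturnRows(rows)

	var createdAt string
	if err := db.QueryRow(
		`INSERT INTO gdp (id, amount) VALUES ($1, $2) RETURNING created_at`,
		"first", 42,
	).Scan(&createdAt); err != nil {
		t.Fatal(err)
	}

	if err := mock.ExpectationsWereMet(); err != nil {
		t.Errorf("unmet expectations: %v", err)
	}
}
```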
I'm actually more interested in how you handle a request spike. Do you have a circuit breaker mechanism in front of your database, or do you set max connections and use the errors as a circuit breaker mechanism? I think... Let's say you have a huge spike, and you know the database can't handle it. Right. So you limit the spike on your database: set max connections, and the errors tell you to circuit break. Okay, I think it depends on how you write the code, too. If your code doesn't know how to handle it, like it always assumes the good path and doesn't know how to handle a spike of errors, then you had better add some circuit breaker in front of that code. I mean, if you are fine with returning 500s or 400s from the service when you see a database failure, then I think that's fine.

I understand the motivation behind this, because I do a lot of database queries as well. My services are very small, so they come nowhere near hitting the limits of the database, so I never actually face these issues in production, but I like to gain insights from people like you who run it at scale. How do you handle problems like this? Do you circuit break at the database level or circuit break at the application level? Right, I think we do both. We usually try to rate limit at the API, the front-end server level. That's the first line of defense. In the worst case, if it flows through and you see a lot of errors, maybe you can circuit break within the service itself: drop the health check and stop serving requests.

To understand your workload better, is it a mainly write workload or mainly read workload? We have both. Because of the way we write the app, some of the read paths are really heavy, but they still go to the DB, and it's not possible to cache that yet because we need consistent data. So I mean, it's just because we have some design problems; it's not ideal, but it leads to this. So you need an ACID database? Yeah, something that provides consistency, like a hundred percent guarantee. Since we need that, the simplest solution for us right now is going directly to the DB. But it's not ideal. You have a single Postgres database? Yeah, unfortunately. At the heart of the operation? Yeah. Yeah. Sorry.

Right, so I think... anyway, that's all. I think later on there might be some details on the Go 1.8 changes to the SQL code, which you should hear about. I'm not sure, I'm not doing that one. Oh, it doesn't break my code. But do check it out, there are a lot of positive changes there. And with that, I would like to end my talk.