Sorry, I just decided to put in some pictures of girls. This is actually the company I work for right now. We sell women's clothes online. You can think of us as a Forever 21 that is an online e-commerce store. We're literally trying to compete with them.

Sorry, I forgot to introduce myself. My name is Jason. For those of you who cannot see at the back, I used to organize the Singapore Ruby Brigade a long time ago. And Winston... Yeah, he only talked to you. No, no, no. I'm here because of Winston. I moved to Silicon Valley about two and a half years ago. There were a couple of companies, so now I'm at my third company in Silicon Valley. Currently I'm working at Tobi. As I said, we sell women's clothes. Contrary to popular belief, I'm not surrounded by girls in the office. In fact, I'm surrounded by all guys, because we see ourselves more as an engineering company.

We think of this as something that needs to be optimized. Think about a retail store: how they sell clothes, how they put all the inventory in every single retail outlet. It's super un-automated and inefficient. So we decided, hey, why don't we have everything in one fulfillment center? We design our own clothes, we make our own clothes, we ship them to our fulfillment center, and we ship them out when you make an order. That's so much more efficient than, say, opening up a thousand stores and trying to manage all that crazy inventory. What that allows us to do is refresh our inventory, our SKUs, every single day. Every single day we have fresh new arrivals coming in. I don't want to talk too much about the business side of the company; today what I want to talk about is more the engineering side.
I've been a Ruby engineer for about eight years, and I've done quite a fair bit of Rails applications in my short span of software engineering years. So I've seen quite a fair bit of web apps from a Rails perspective. Joining Tobi, some of the things they did, I was like, this is really unusual. Some of the things were like, hmm, that makes sense. Some things were like, what? Why? So I just want to share with you some of the things that I found. There's no right or wrong. I don't even know if some of the things they do are completely correct. So if you think that there's a better way of doing things, feel free to just shout it out. I'm here to learn as well.

Let me start with our database. For web apps, the basis of it is data, capturing data. At Tobi we have a lot of tables. I think we have about 60-plus tables, coming to 100, and all of these accumulated over the years. In order to make sense of all these tables in our system, we decided to split them into different schemas. I don't know if you're familiar with this: in Postgres, you can actually have schemas for your tables. For example, the visits table belongs to the analytics schema. It's a good way to organize your tables into a particular namespace that makes sense for you.

I think we are doing this in preparation for splitting up our app. We are one monolithic Rails app as of now. We are in the transition to split it up into multiple smaller applications so that we can iterate much faster, and so that it's much easier to manage. The first step for us is to make sure our tables are right and that there's some sort of ownership. I'm on the marketing team at Tobi, and we usually handle all the tables that have to do with the analytics and marketing schemas.

So what is the implication of this? We use ActiveRecord as well. I forgot to ask: who here is using Rails? Doing Rails.
Quite a lot. I'm not talking to M movie people. I'm not seeing it, right? Sorry.

So, the model, the ActiveRecord model. Usually, by Rails convention, you map your model name one to one with your table, right? But once you do all this crazy stuff, you can't do that anymore. So what do you do? ActiveRecord has this method called set_table_name. That's why all of our models have this irritating line down here: set_table_name, then the actual table name.

The next thing that we are doing while we're trying to split up this humongous Rails app is to namespace our folders so that they make sense as well. This is kind of a stopgap solution, compared to completely starting over and building a fresh new app. What we did was namespace all our folders in such a way that eventually it's going to be split up. For example, you can see that for marketing we have an admin app, and it's going to be handling email schedules. So we namespace our folders, which is very nice, because it's pointing in that direction.

The result of that is that the actual file itself looks something like this. It's namespaced by module Admin, then the submodule Marketing, and then EmailSchedule. And you can still inherit from ActiveRecord::Base.

There's one gotcha with this that we found: once you start namespacing, you have to be careful about namespace clashes. Sometimes when you refer to a constant like this, it might be resolved relative to a parent namespace. If you want to make sure that this guy here is the actual full path, you have to put a colon colon in front of it. That means it's resolved from the global namespace.
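The lookup rule behind that gotcha can be sketched in plain Ruby, with no Rails required. The class names below are hypothetical (the top-level EmailSchedule and the Report class are not from the talk); the point is only that an unqualified constant is resolved against the enclosing namespaces first, while a leading :: forces lookup from the top level.

```ruby
# Hypothetical classes for illustration only.
class EmailSchedule
  def self.origin
    "top-level EmailSchedule"
  end
end

module Admin
  module Marketing
    class EmailSchedule
      def self.origin
        "Admin::Marketing::EmailSchedule"
      end
    end

    class Report
      # Relative lookup: Ruby searches the enclosing namespaces first,
      # so this resolves to Admin::Marketing::EmailSchedule.
      def self.relative
        EmailSchedule.origin
      end

      # A leading :: starts the lookup at the top level instead.
      def self.absolute
        ::EmailSchedule.origin
      end
    end
  end
end

puts Admin::Marketing::Report.relative # prints "Admin::Marketing::EmailSchedule"
puts Admin::Marketing::Report.absolute # prints "top-level EmailSchedule"
```

In a codebase that mixes namespaced and top-level models with the same name, writing the leading :: whenever the top-level class is meant avoids silently picking up the namespaced one.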
So that's one of the gotchas that we are mindful of when we write our code. Is this too small? Can you all see?

The next thing that I found, which I wasn't really used to: for me, ActiveRecord is fantastic. You can use validates_presence_of; you can do all your crazy validation using ActiveRecord, right? At Tobi, we see ActiveRecord validation more as something that you do for your form validations. To ensure data integrity, we try to push as much of that as possible down to the database level. For example, adding the right constraints to your table columns. Say a customer table has a name column: that means adding a NOT NULL constraint to that column, and adding any other constraints that make sense for the integrity of the data.

The good thing about this is that you don't rely only on ActiveRecord to make sure that the data is safe and valid. If you write a SQL query that inserts some data that is missing something, the database will reject it and throw an error back at you, and then you know what to do. Without that in place, things can fail and you don't even know about it. And we write a lot of SQL queries. We don't rely on ActiveRecord as much as normal Rails apps do, so that is doubly important for us.

Oh, sorry. This actually should be out there. This is what we use. Oh, I know why I put it there. In the past, I used to use schema.rb, db/schema.rb, to check what tables I have, what columns I have, what types the columns are, and what the constraints are. I used to use that a lot. But at Tobi, we don't use that. At Tobi, we go down to Postgres, to psql, and run the actual query to double-check what is in the table.
It has become a norm for me these days; it's become a habit already. If I want to know what tables I have inside, say, the marketing schema, I just issue this command: \dt, which means describe tables, and marketing.* means give me all the tables that are in the marketing schema. I do that a lot, because there are so many tables that I sometimes totally forget what tables I have. And if I want to know, within a table, what the columns and column types are, I issue \d, which is describe, followed by the table name.

Another thing that Tobi does, which is kind of interesting, has to do with data like page visits. We have this table called analytics.visits. Normally people would just dump the visits straight into the table, right? But not us. The reason we don't do that is that we're dealing with millions and millions of inserts. We have a very high volume, so there's no way we can do straight-up inserts like that without affecting performance. We needed a way to capture this data in such a way that it doesn't affect performance.

So one of our DB architects, who was a PhD, suggested: OK, let's insert the visits data, but let's insert it into a raw visits table. On this raw visits table there's a database trigger that cleans the table up by itself. We insert into raw visits. The insert fires a Postgres trigger, which then inserts the data into analytics.visits, the visits table, which holds the actual data that we want for reporting purposes. After it inserts that, it deletes the raw visits record.
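The raw-table flow described here could look roughly like the following. This is a sketch under assumptions, not Tobi's actual schema: the column names, the function name move_raw_visit and the trigger name are hypothetical, and analytics.visits is assumed to already exist with matching columns. The SQL is kept in a Ruby heredoc, the way a SQL-only migration file might hold it.

```ruby
# Hypothetical sketch: columns, function name and trigger name are
# assumptions for illustration, not the schema from the talk.
# Assumes analytics.visits(customer_id, url, visited_at) already exists.
RAW_VISITS_SQL = <<~'SQL'
  CREATE TABLE analytics.raw_visits (
    id          bigserial   PRIMARY KEY,
    customer_id bigint,
    url         text        NOT NULL,
    visited_at  timestamptz NOT NULL DEFAULT now()
  );

  -- Copy each new raw row into the reporting table, then remove the raw row.
  CREATE FUNCTION analytics.move_raw_visit() RETURNS trigger AS $$
  BEGIN
    INSERT INTO analytics.visits (customer_id, url, visited_at)
    VALUES (NEW.customer_id, NEW.url, NEW.visited_at);
    DELETE FROM analytics.raw_visits WHERE id = NEW.id;
    RETURN NULL; -- the return value of an AFTER ROW trigger is ignored
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER raw_visits_after_insert
    AFTER INSERT ON analytics.raw_visits
    FOR EACH ROW EXECUTE PROCEDURE analytics.move_raw_visit();
SQL

# In a Rails migration this string would be handed to `execute`;
# here it is only printed so the snippet runs without a database.
puts RAW_VISITS_SQL
```

Application code would then only ever issue an INSERT INTO analytics.raw_visits; the copy into analytics.visits and the cleanup happen inside the database.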
So we always keep this raw table at a very small size so that every time we insert, it's fast. Shoot: fire and forget, fire and forget, fire and forget. That way we can handle really huge throughput.

This next one is my biggest one, which is that I'm so used to using migrations in Rails, right? With migrations, you can migrate up, you can migrate down, you can migrate to a version. At Tobi, we can't do that, because for us, we can only migrate up. There is no migrating down. You can't migrate down at all. The reason is that we do not use Ruby code at all in the migration file. I'll give you an example. I think I set up some examples here. OK. This is our typical migration. Here I'm creating some new tables in some schema, and we write SQL code like that. Just SQL code. And one migration file can create quite a lot of stuff, like this.

So think about that for a complex system. Can you actually migrate down without destroying data? Probably not, right? So I think it makes sense that we do not allow migrating down. The drawback is that there's no concept of a rollback. Once you deploy to production, there is no turning back. If anything goes wrong, you will have to do patches. You'll have to do fixes. That is the drawback. Maybe there's a way to roll back that we haven't really thought through enough. I don't know. But as of now, we make it an engineering-wide policy: let's just go up and never go down, and just keep it simple.

We use migrations for data releases as well as schema changes. We release migration files into production that do things like backfilling data. So, let me give you an example.
So, this guy here: what this guy does is backfill some of the data from our email bounces into our system. We release a SQL query like this. And this is our own custom batch framework. Because the data is millions and millions of bounces, we have to batch it and do all the inserts in batches. That's why we have our own custom batch framework. This takes about an hour to two hours. About two hours.

Huh? No. That's a good question, because we ensure that none of the migrations that we run will lock up a table. Before we even deploy to production, we test it out on staging and make sure that none of the tables actually get locked. So we try our best. Of course, there's no guarantee that someone won't screw up and deploy some nonsense. But that's why we have deploy engineers who make sure that, when there's a deploy release, they watch like a hawk all the processes, all the DB processes, the locks, and make sure that nothing goes wrong.

It's something I'm not used to, because I'm used to agile, right? Tests pass, build, push, deploy. That's the agile way of doing things. But I guess we haven't come to a stage where we know how to be agile with big data yet. Maybe in future, once we have locked down our processes, we can be more agile. Or maybe the first step is to split up into smaller apps so that deployment is much more sane.

Yeah. Sorry?

Sorry, I'm just curious, why do you have the one equals one in the SQL?
Yeah, I think it's kind of a habit. The WHERE 1 = 1 is there because sometimes you construct a SQL query where you have certain conditions and you want to build them into one SELECT statement, and it's easier to write WHERE 1 = 1 AND this AND that AND this AND that. It's about the AND, because you want to prefix every condition with AND. So WHERE 1 = 1 AND something is much easier than writing code to generate the SQL that has to skip the AND for the first condition. That's how I understand it.

For a simple migration, no, no benefits. It's actually easier to just use the Rails way. But for a complex migration it's very difficult; you have to go down to SQL. Backfilling is very complicated. There's no way around it; you have to use SQL to do backfilling.

Are your deployment engineers comfortable with it?

No, they are actually comfortable with it. I think it's just the Tobi way of dealing with deployments, because we have a big team. Honestly, we're not super happy with it. The deployment engineers know it themselves, because they usually feel very bottlenecked. Everybody wants to deploy, and there are only this many people who can actually deploy for you, right? So then it's like, oh, all the shit work is coming in. There's this queue of shit work that I've got to handle. We're still trying to figure it out. We are a growing company. We still haven't sorted out everything that needs to be sorted out.

So, next one: seed data. In development you need some seed data to be around.
In the past I used to use seed-fu or something simple in order to seed some data. But our app is pretty big. Even though what you see on our website is just a web store, we have a huge system to handle the warehousing, because we run our own warehouse, 24/7. So we have a huge system with different modules, and this is our stopgap measure for seeding data: we need to seed data for the entire platform, for all the engineers. That's why we have this custom rake task that gets a database dump of our production data, production data that is maybe two or three days old. We grab the dump and then we just restore it into our local environment. That is not very efficient, because we have such huge data. Every time I do a database restore I've got to wait an hour or so. It's really not optimized, but it is really just our stopgap measure, because our ultimate goal is to split up our app into different smaller applications. With smaller applications, seeding the right data for each small app will be much, much easier.

Another thing is testing. I'm an RSpec guy. I haven't used Test::Unit in years. But when I joined Tobi, everything was Test::Unit. What year are you living in, you know? I can understand that for legacy reasons people keep writing tests in Test::Unit. At least they write tests, so that's good. But we're slowly trying to introduce RSpec into our test suite.
So right now what we have is Test::Unit and RSpec at the same time, and our CI builder runs both the Test::Unit tests and the specs. Eventually all these Test::Unit tests will probably be deprecated. As and when it's suitable, we'll deprecate those tests and port them over to RSpec, especially when we start splitting up the apps; it will be much easier then. Now it's just one big mess, so I'll just write the new tests in RSpec.

Yeah, that's a good question. It's actually Rails 2.3. Terrible, right? In my last company I wanted to do a full upgrade, so that's what I wanted. We've been wanting to move to Rails 4 for some time, but it's a running business, so it's a tough endeavor. I think our strategy is that we'll do it when we split up the apps. We'll just write those apps in the new version of Rails. I think that makes more sense. If we do it right now on the monolith, it's going to be too tough, and a lot of gems will not work well. So yes, there are some gems which are on very old versions, which is not optimal. Actually, one of the reasons why Test::Unit is still around is that the RSpec we support is an old version of RSpec, on which I can't run controller tests.

So anyway, no questions? OK, yes.

But Tobi has been around for a couple of years, so I can understand it from a business point of view. Theoretically that's always the case. Theoretically you want to say, oh, we need to scale, so we need to refactor now. But the reality is that the business is always running, and you often don't have time to do that. So you always push it back, and then you let it become technical debt. It's also a resource problem, right? If you can throw enough engineers at a problem, maybe that can be achieved. But if you have only this many engineers, you can't do everything. I would say that for most businesses, business always trumps; business is always the priority. But at least the good news is that now we have
come to a stage where these things matter. We are trying to scale our engineering a lot. Currently we are at 30 engineers, and our goal is to reach 50 to 100 in two years' time. With the current system like that, it's going to be difficult to scale the engineering side.

Does that scare you, by the way? I can understand that not everybody would like that. For us, this is the reality of it. If you can't accept that, it's not the right fit, not the right culture fit. It's just as simple as that. We are looking for people who are team players. We are looking for engineers who can really work well with others. The good side of Tobi is that we have a good culture and we are a mature company. We are not going to close down tomorrow. We can offer you something which a lot of other startups can't offer. I guess that's a draw for some of the engineers. Plus, we are working on things that are at scale, problems that a lot of the newer technology can solve, and we are open to most technologies to solve them. Rails may not be the right fit for some of the problems. It depends on the team. Like I just mentioned, for the analytics piece I don't think Rails would be good. So that's really up for discussion within the team itself: what do you guys think, what do you guys want to build it on? No, we are not Rails Nazis.

Transactions? Well, you mean on a migration level or on a code level? Yes. I think for us, we look at what things really need a transaction. Not everything really needs a transaction. For example, money is a good candidate for a transaction, because you don't want to lose people's money, so you have a transaction block in place for those calls. But for other things, like click visits, you don't need it. There's no need to lock up your system just for those actions. So I think for us, we try to use this
question really well. What problems do we have in migrations themselves? We actually have had some problems with transactions, but I cannot remember them right now. Yeah, sorry. Last question.

So you talked a little bit about scale. Can you give us a little bit of a sense of what that is? What setup do you have?

Our setup is fairly straightforward. The only thing is that we really go all out on hardware. I think it was like 64 gigs of RAM or something, so our servers are over capacity. There's no way we hit a limit on that. Scale-wise, I can't reveal actual numbers, but I will say that we are one of the top e-commerce sites for women's fashion. We're one of the top. I don't know if that makes any sense to you. We have enough traffic and data coming in that it's not trivial anymore.

I was thinking, you're using database triggers to handle the raw data coming in and then writing the data around. Have you guys considered using things like Redis to store the updates, versus running triggers?

If you ask me, I'd do that, but I wasn't the one who designed that, so it has been a problem. Yeah. I would say that we try to stick to what we already have as much as possible until we really, really need those tools. The more tools you use, the more complicated the system can become, especially for a monolithic Rails app. If it's separated out already, then that's fine. If you separate it out into an analytics app, then analytics is like its own service, and it's a non-issue. We can just use whatever tools make sense.

This is a side story. I don't know if I should say it because it's recorded, but in the past a lot of the architectural decisions were made by a certain person, and he was actually quite a bottleneck to the
growth of engineering in terms of the kind of things we could implement. But now that he's no longer there, I think we have a lot more leeway in how we want to design this architecture of ours.

Sorry, next speaker. So if you have more questions for Jason, you can ask after.