Okay, welcome to the third Meteor meetup of the year, and the first joint Meteor and Mongo session this year. Like I said, it's a joint meetup without joins. Bad joke, but anyway. I just want to find out who signed up from the Meteor side of this meetup. Okay, so there's more than I thought, because we were a bit worried that there were only Mongo people here. There's going to be one talk on Meteor dynamic collections from Ken, and then two interesting talks from the Mongo side: Lawrence with schema design, and Stephen, who has come all the way from Australia, to talk about using Mongo in production. So it should be really interesting. If you don't know much about Meteor, we're not really going to cover the basics tonight, but I recommend checking out meteor.com. There's a very simple tutorial there to learn about, you know, a very easy way to build full-stack applications in JavaScript. We also have a meetup group, which is not linked to Mongo, if you want to learn more and come to our meetups. There's a Facebook page, a Facebook group, and we have a very quiet Slack team, so you're welcome to join that too. Anyway, first talk is Lawrence, MongoDB schema design.

Okay, so most of you probably will have heard that Lee Kuan Yew died this week. So as with most other meetups happening in Singapore this week, let's start with a moment of silence.

So yeah, today I'll be talking about MongoDB schema design. It'll be a really short talk. I looked at how the MongoDB evangelists talk about schema design, and they can give hour-long talks, and I have no idea how they do it. I just remember the simple things, so I'll make it short and sweet, and if you have any questions just raise your hand and ask, don't be shy. So yeah, what are we talking about? Quick show of hands, how many of you guys here have used MongoDB before?
Okay, so I don't have to talk anymore. Yes, so we'll be talking about documents and strategies: how you create them, what schema you have for each document, and how you define relationships between different documents. Because much as you hate joins, you will have documents that are related to each other in one way or another.

So yeah, terminology first. Since you guys all use Mongo here, I don't need to go through this crap, but this is basically a one-to-one mapping between the normal RDBMS terms and the MongoDB terminology.

So what are documents? Documents were made popular by MongoDB, but they are basically JSON objects. They call it binary JSON (BSON) in MongoDB, and they allow you a lot of flexibility in defining what your data looks like. It's pretty much a standard JSON object: you can put in a new key, you can put an array anywhere, you can put in a new object, so it's really very flexible.

So let's take a look at normalized data versus denormalized data. For your regular RDBMS, how many of you guys here have actually used an RDBMS before? Okay, so you guys have used both, that's good. So let's take a blog for example, right? If you are creating a blog for yourself, you probably need like five different tables: one table for your articles, one for your users, one for your comments, one for categories and one for tags. And what happens is that you need foreign keys to join all these tables together, so that when you need a blog post, the database can work its magic, join everything together and you get your data out. Is that right? Or is there any other way of doing it? No one has any other ways of doing it? Okay, I'm not wrong. So yeah, you can have multiple users, actually you have one user per comment and one user per article, and then an article can have multiple comments and stuff like that.
So this is how a normalized database will look in a regular RDBMS. And this is how the denormalized set of data will look. In the article object, you have an array of comments, you have an array of tags, you have an array of categories. And then in each of these comments you reference a user, and in each of the articles you reference a user as well. So it's just the user's ID, yeah.

So can someone tell me what's the pros of this first? Like why would you want to use this and not the other way? No duplication. Okay, no duplication. But this has no duplication too, right? They have the most efforts up, so can you get something that... But in MongoDB you cannot get those parts too, right? All of our consistency. But you can do updates all right. In MongoDB you just update one object and the object is updated. In the RDBMS you start a transaction and you update all the stuff in one transaction. So both have atomicity, right? What's your... Something that is guaranteed. What do I guarantee? Guaranteed. Guaranteed what? Can I make you a guest? Okay. Okay, so yeah.

So what's the pattern? Basically, long story short, right? The MongoDB guys actually spend like 10 minutes talking about this. I'm going to spend one minute. In an RDBMS, when you look at how you structure a database, you look at how you store the data. The main thing you ask yourself is, you know, can I join stuff together to store less data? So as you can see from here, right? Each of the articles has to store the user ID, each of the comments has to store the user ID, which takes up quite a bit of space if you have a lot of applications. Whereas in an RDBMS, like Postgres for example, if you have a foreign key, they actually optimize that and you store less data as a result. Yeah, they kind of have ways to minimize the footprint. So you look at how you store data in an RDBMS. Whereas from the perspective of a document schema, you look at how you want to retrieve the data.
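As a rough illustration of the document shape described above, here is a minimal sketch in plain Python dictionaries, with no database driver involved. All field names and values (`author_id`, `user_id`, the sample users) are assumptions made for illustration and do not come from the speaker's slides.

```python
# Hypothetical data: field names are illustrative, not taken from the talk.
users = {
    "u1": {"_id": "u1", "name": "Alice"},
    "u2": {"_id": "u2", "name": "Bob"},
}

article = {
    "_id": "a1",
    "title": "Schema design notes",
    "author_id": "u1",                # reference: only the user's ID is stored
    "categories": ["databases"],      # embedded array
    "tags": ["mongodb", "schema"],    # embedded array
    "comments": [                     # embedded array of sub-documents
        {"user_id": "u2", "text": "Nice post"},
        {"user_id": "u1", "text": "Thanks"},
    ],
}


def render_article(article, users):
    """Build a page view: embedded parts come with the article, user
    references have to be resolved by a separate lookup."""
    author = users[article["author_id"]]["name"]
    comments = [
        f'{users[c["user_id"]]["name"]}: {c["text"]}'
        for c in article["comments"]
    ]
    return {"title": article["title"], "author": author, "comments": comments}


page = render_article(article, users)
```

The comments, tags and categories travel with the article, while each user is stored once and only referenced by ID, so renaming a user touches a single document.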
So yeah, that is the main thing I guess.

So yeah, embedding versus reference, how many of you are familiar with this terminology? So who wants to tell me the difference between the two? Come on guys. Okay. One is embedding, one is reference. What is embedding? Keep it in the same document. Okay. And what's referencing? Keep it in a separate document. Yeah, correct. So who here has tried doing references in MongoDB? How was the experience? Okay. Do you think it almost won? Okay.

So yeah, I actually came across a startup, I'm not going to name who they are, but they actually have references everywhere. They use MongoDB like an RDBMS. So they had like thousands of references per page load, and it took like 30 seconds for them to load a single page, because they had so much processing to handle all the references, because MongoDB doesn't have foreign keys. So if you do references, it's quite expensive. So be aware of when you use them.

So yeah, in a blog post, right? Like in this case for example, when your comment is outside the article, it's more of a reference kind of model, because you need the comments to refer to which article they belong to. The main benefit of using references is that it gives you more flexibility when you query the objects. You can filter them easily. Whereas if you put things in an array, MongoDB doesn't really have many query operators for you to do things the way you like. Like for example, if you want to do a regex search, in MongoDB that will be very expensive because they don't index the arrays very well. So yeah, references give you more flexibility. On the other hand, if you embed, you get much better speeds when you query for data, because when you query for the thing, you get the whole chunk out in one shot.
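The cost difference described above can be sketched with a small in-memory stand-in that counts lookups. `FakeCollection` is a hypothetical helper written for this example, not a MongoDB API, and the referenced version deliberately resolves one reference at a time, which is the pattern the anecdote describes.

```python
class FakeCollection:
    """In-memory stand-in for a collection that counts lookups.
    Hypothetical helper for illustration only."""

    def __init__(self, docs):
        self.docs = {d["_id"]: d for d in docs}
        self.lookups = 0

    def find_one(self, _id):
        self.lookups += 1
        return self.docs.get(_id)


# Referenced model: the article stores only comment IDs.
comments = FakeCollection(
    [{"_id": f"c{i}", "text": f"comment {i}"} for i in range(5)]
)
articles_ref = FakeCollection(
    [{"_id": "a1", "comment_ids": [f"c{i}" for i in range(5)]}]
)

# Embedded model: the comments live inside the article document.
articles_emb = FakeCollection(
    [{"_id": "a1", "comments": [{"text": f"comment {i}"} for i in range(5)]}]
)


def load_referenced(article_id):
    article = articles_ref.find_one(article_id)
    # One extra lookup per referenced comment.
    return [comments.find_one(cid)["text"] for cid in article["comment_ids"]]


def load_embedded(article_id):
    article = articles_emb.find_one(article_id)
    # Everything arrives with the single article lookup.
    return [c["text"] for c in article["comments"]]


ref_texts = load_referenced("a1")
emb_texts = load_embedded("a1")
ref_lookups = articles_ref.lookups + comments.lookups  # 1 article + 5 comments
emb_lookups = articles_emb.lookups                      # 1 article
```

Both functions return the same comment texts; only the number of round trips differs. In a real application the references could be fetched in one batched query (for example with the `$in` operator), which narrows the gap but still costs at least one extra query.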
And one of the things that I've heard people mention to me when they say that embedding is bad is, what if you have so many comments that it's more than 16 megabytes? So my response to that, all the time, is that the whole of Shakespeare's works is about 5 megabytes. So it's not too much data. Text is not a lot of data in the first place. So yeah, I guess always figure out your use case. If you are going to grab the data individually a lot, use references. If you are going to grab the data as a whole all the time, use embedding. And a general rule of thumb: if there are a lot of the things, keep them in a separate collection. If you know there's a small amount, like maybe 10 of these in every object, you can just embed them and it shouldn't be much of a problem. And the great thing about MongoDB is you can always change it later on. Just write a couple of scripts to do a migration and that's it. So yeah, I've hit 8 minutes, 30 seconds. I've finished my talk, kind of. So any questions?

Yeah, you can do a hybrid. So for example, for the case of the blog post that I use here, as you can see, these are all embedded. But this is a reference. It doesn't make sense to store the user in each of these comments and articles, because it appears too many times. If you want to update the user, you have to update it everywhere. So in this case, it makes more sense to use a reference. But for the comments themselves, each of them is unique. So in this case, embedding works better.

Yes. Ah, yes, that is correct. So MongoDB by itself doesn't actually enforce the integrity of the data. Most of the time people do that using the ORMs that they use. So for example, in Node.js, there's this MongoDB library that allows you to enforce integrity between the references. So if you delete one of the things, and there are a lot of other things referring to it, it will show you an error. Yeah. What is the problem?
It needs to create a set of categories. Okay. Yeah, so that is a lot more difficult. You need to remove it from each of them. So you need to think about your use case, right? In this case, for the blog use case, you don't remove categories all the time. Yeah. It doesn't do it well. Yes. Correct. Correct. Yeah. Yes.

Okay, so if you are building a financial system or a PayPal, you don't want to be using MongoDB to store your users' transactions, because you can't enforce the transactions and you can't do rollbacks. I mean, you can do it using a two-phase commit, but it's too much work. So you'd rather pay to use Oracle or some enterprise-level database to handle all this for you.

MongoDB is good. I mean, one of the reasons why I use MongoDB a lot is because there are a lot of platforms that give you free MongoDB databases, 500 megabytes. And then you put a sharding instance in front of them and you get as much as you want. So yeah, that's one of the main reasons why I use MongoDB. But one of the good things I find about MongoDB is for applications where you have not clearly defined what the data looks like. In that case, you can just dump stuff in easily.

And another good use for MongoDB is logging. At PayPal, we're actually currently looking at using MongoDB to handle all the logs for our applications. The good thing about that is that you can store the logs as full text, and then you can just do your processing on the logs and store the result in some random data format. If it works, it works. If it doesn't, just throw it away and try again. And you don't have to do ALTER TABLEs and stuff like that.

So I'll just share one of the stories that I heard from the MongoDB evangelists. So Craigslist, no, yeah, Craigslist. They were actually using MySQL a long time ago. They have a lot of entries coming in every day, so it reached a point where doing ALTER TABLEs would take months.
So every time they wanted to add a new field to their system, it was a couple of months of ALTER TABLEs. So they had this archive database and they had a production database. Every three months or so they were moving data from production to the archive, and then you start a new fresh database, kind of thing. So it reached a point where doing the ALTER TABLEs on the archive database would take so many months that the main database was getting too slow. And that's why they switched to MongoDB. So that's the kind of case, I guess: if you need flexibility and you don't really need strict integrity, MongoDB is a good choice. Yeah. Any more questions? No?

Yes, the issue is integrity and transactions. MongoDB doesn't give you the ACID guarantees that an RDBMS has. Correct. But you can do two-phase commits and ensure integrity. But you have to do your own rollbacks and stuff. Is that handled by MongoDB? No, you have to write it yourself, which is quite painful. So you basically record all the operations you want to do in the two-phase commit, and then you perform those operations, and then you have a way, another worker, to roll back if something goes wrong there. So you have to spend time on this stuff.

So as long as you're able to do that, then there must be cases where it makes sense to actually have the same data in two different collections, where in one collection it is embedded in the document, and in another collection you've got a full-performance lookup table. Yes. So another way I have used MongoDB before is actually caching. I have cases where I need some data to have a certain level of integrity, so I store that data in an RDBMS. But when I'm fetching the data out, I actually cache it in MongoDB in the exact structures that my front end needs to serve the data. So basically a compound version of the data. And then I'll just flush the cache once in a while, and it will fetch from the RDBMS again and put the data in there.
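A minimal sketch of the caching pattern just described, under stated assumptions: `sqlite3` stands in for the relational database, and a plain dictionary stands in for the MongoDB collection that holds the front-end-ready documents. The table layout and the function names (`build_view`, `get_article_view`, `flush_cache`) are hypothetical.

```python
import sqlite3

# Relational source of truth (sqlite3 used here as a stand-in RDBMS).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        title TEXT,
        user_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'Alice');
    INSERT INTO articles VALUES (10, 'Schema design notes', 1);
""")

# Stand-in for a MongoDB collection of ready-to-serve documents.
cache = {}


def build_view(article_id):
    """Join the normalized tables into the document shape the front end wants."""
    row = db.execute(
        "SELECT a.id, a.title, u.name "
        "FROM articles a JOIN users u ON u.id = a.user_id "
        "WHERE a.id = ?",
        (article_id,),
    ).fetchone()
    return {"_id": row[0], "title": row[1], "author": {"name": row[2]}}


def get_article_view(article_id):
    """Serve from the cache; rebuild from the RDBMS only on a miss."""
    if article_id not in cache:
        cache[article_id] = build_view(article_id)
    return cache[article_id]


def flush_cache():
    """Drop all cached documents so the next read goes back to the RDBMS."""
    cache.clear()


first = get_article_view(10)
db.execute("UPDATE users SET name = 'Alicia' WHERE id = 1")
stale = get_article_view(10)   # still the cached copy
flush_cache()
fresh = get_article_view(10)   # rebuilt from the RDBMS
```

The trade-off is visible in the last few lines: reads between flushes can be stale, which is acceptable only when the relational store remains the authority for anything that needs integrity.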
So yeah, that's another way I use MongoDB. Any more questions?

So for the GeekcampSG page, but that was two years ago, I actually benchmarked MongoDB versus MySQL plus Memcached. On the whole, if you can fit your entire data set in memory, MongoDB is almost on par with Memcached. Yeah. But once it goes to disk, you start swapping stuff out. And MongoDB isn't that good at handling memory, because you can't limit how much memory it can use. It just keeps absorbing and absorbing until the whole system's memory runs out. Yeah. Which is kind of sad. I'm not sure if they fixed it in 3.0 though, but as of 2.8, you still have the problem. Whereas, you know, in MySQL, you can say, I give MySQL 256 megabytes of RAM, and it will never go above that. In MongoDB, it just absorbs, absorbs, absorbs. So if you run an app server beside it, you're okay. Yeah.

First question. Knowing that MongoDB is not very safe for transactions, are there any big players who use MongoDB for... Craigslist is a rather big player that's using MongoDB. For transactions. No, not transactions. I mean, which big companies already use MongoDB for transactions? What do you mean, transactions? I mean, because usually, you know, if you use an RDBMS, you have to check whether, how to say, it's two-way between something. Yeah. A two-phase commit, yeah. And are there any big players who use MongoDB purely to do transactions online? You mean financial transactions. Yes. As of now, I don't think there's anyone doing that yet. What about someone like Gilt Groupe? Is that Gilt Groupe? Because the guy behind Gilt Groupe is there? I'm not too sure about that. I'm not. I haven't... I'm not sure about that. Yeah. Ah, but they... So they probably have a payment gateway backing up the data, right? Yeah. Well, at least the guys here can start the conference. Ah, okay. I'm not too sure about that. I'm not sure about that.

Yes. So I've actually used CouchDB before.
The way you think about it is actually quite different. So CouchDB... I'm not too sure, I might be missing out on that one. CouchDB does versioning, right? Does it do versioning? Okay, I know that. But I remember one of the ways... So MongoDB, the way you scale up is you scale up how it's done today. Whereas for CouchDB, you scale up by adding... by making a server model. The sharding part of CouchDB is that. But I don't know enough about CouchDB.

Just one more question, you know. Okay. For now, in the market, there are a lot of databases we can use, like PostgreSQL, NoSQL, and now the newest one, MariaDB, from the guy who created MySQL and then went on to create MariaDB. So, in your opinion, which database should we use? Let's use the one that's the cheapest. Okay, if it's for your own personal app, just find the one that's free. But if you're doing it for work, like if you're intending to run a proper company around it, go for the one with the best enterprise support. Which one has the best enterprise support? For the best enterprise support, it's probably Oracle. Oracle is good. Their speeds are pretty good. Their NoSQL speeds are actually pretty good. I've used Oracle's NoSQL before. They are up there. But, yeah, you need to pay for that thing. If you don't want to pay for it, Postgres is probably one of the better options to go with. There's a lot of documentation online.

Yes. I think on the referencing, is there something about one-way and two-way referencing? So, it's basically just adding one more ID on the other side to reference back, right? Is there a reason why you do that? So, let's say you need to get to a comment from the article, and you also need to get to the article from the comment. Then you need it both ways. If you need it both ways, you need it both ways. If you don't, you don't. Any more questions? No? Okay. That's it. Thanks. Steve.