Good morning everyone. My name is Ben Butler. I'm from Amazon Web Services, and it's my pleasure to be here at NoSQL Now. Thank you for coming to learn about analytics running on DynamoDB, our managed NoSQL offering. DynamoDB is a fully managed NoSQL database service — not an actual database application that you install, but a service managed by Amazon that incorporates a lot of our best practices and lessons learned in developing the Dynamo NoSQL technology. That's just a screenshot of the cover page of the Dynamo white paper from our Amazon.com CTO, Werner Vogels, which described the lessons from building the high-scale e-commerce platform on key-value pairs. We took those lessons, plus what we learned developing Amazon Web Services, our cloud computing platform, and put them together to come up with a managed NoSQL offering.

DynamoDB is a regional service. If you're familiar with Amazon Web Services, we have the concept of regions, which are basically independent clouds throughout the world. We have nine regions, and when DynamoDB is deployed, it's deployed in a particular region, which is then divided into multiple availability zones — physical data centers. So when you use the DynamoDB service, you have a database that spans multiple physical data centers on your behalf, not just multiple instances. When we do consistent writes, we make sure the data is in two physical data centers before we send back a successful commit. The reason we want to showcase DynamoDB is so you can focus on your applications or analytics platforms while the NoSQL part is managed for you. So we'll go into that.

It's a managed service, so you can store and retrieve any amount of data — there's no limit on total aggregate size — and the service handles any level of read and write traffic, all without operational burden. We'll talk about that by going through some of the things DynamoDB handles. The software: there's no software to install, manage, patch, upgrade, or purchase; it's all part of the service. It scales without downtime: if you need to increase or decrease throughput capacity, that happens without taking the service offline. And it automatically shards, creating more and more partitions depending on the amount of throughput you need. The greater the read and write throughput you need, the more partitions it will automatically create to handle that sustained load. Then automatic handling of hardware failure: you don't have to worry about a particular node going down, because the data is always redundant on multiple servers in multiple data centers. I already talked about the multi-AZ replication. The hardware is configured and designed specifically for providing this service, and we do the performance tuning. The idea here is that you need a place to put your data and take it out, at any scale and any size you want, and we take care of the operational burden of making that happen. So basically it's a dial: you turn it up or you turn it down depending on what you need, and you pay for the throughput going in and out as well as the amount of storage you allocate for the index. It's built for consistent, predictable performance.
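To make that dial concrete, here is a minimal sketch of adjusting a table's provisioned throughput — using today's boto3 Python SDK rather than anything shown in the talk, and with a hypothetical table name:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Turn the dial: raise (or lower) a live table's provisioned
# read/write capacity without taking the service offline.
dynamodb.update_table(
    TableName="ProductCatalog",  # hypothetical table name
    ProvisionedThroughput={
        "ReadCapacityUnits": 100,
        "WriteCapacityUnits": 50,
    },
)
```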
We get single-digit-millisecond latencies — less than five milliseconds for reads and less than ten milliseconds for writes — and it's backed by solid-state drives. It's a flexible data model: key-value pairs, key-attribute pairs, no schema required, and it's easy to create and easy to adjust. Seamless scalability means you don't have to think ahead of time about the size of your data; you just start putting it in, and you're charged per gigabyte per month on the actual size. Storage is unlimited — we'll get to some limits on the individual items you're putting in, but in terms of aggregate storage it's unlimited. And it's durable: consistent, disk-only writes, not in memory. Once you get a commit that the write has happened, it's in multiple data centers, on disk, and the replication is taken care of. So when you put an item in — and we'll go through the nomenclature of DynamoDB — and you get a positive response back, it's already done for you. So it's durable.

Okay, so we try to take that operational burden off your hands. Take a look at an example of Amazon.com traffic in November — that's when Black Friday happens, right after Thanksgiving. You get this spike in the database capacity you need, but when you aggregate the usage per day, we were only using 24% of the capacity we had. This gives you an example, pre-DynamoDB, of the elasticity problem: what you end up doing is over-provisioning — you provision a little above peak. This is much the same philosophy we talk about with our cloud computing services. Now with Dynamo, you can provision throughput and size requirements dynamically to follow your actual traffic.

Another example, other than Amazon.com retail: when we were in beta with DynamoDB, we had Amazon Cloud Drive, which stores all the movies, videos, and MP3s you buy off the Amazon.com store. We store all the metadata in DynamoDB, and the actual video files are stored in S3. That's a common design pattern: when you have large amounts of data, you put the objects or blobs into S3 — durable storage with eleven nines of durability — and the metadata about each file (the file name and any other attributes you want to know about it) goes into DynamoDB for fast lookups and retrievals.

So: three decisions and five clicks. We'll go through the wizard — I've got some screenshots of how you would create a table — and we'll talk about the three decisions you have to make when you want to use Dynamo. Let's assume you've already decided to try Dynamo; what do you need to get started? We'll talk about the primary keys, then the optional local secondary indexes, which we just released a few months ago, and then the provisioned level of throughput for reads and writes. As an example: you have what we call a DynamoDB table, and each row is called an item. You can have an unlimited number of items in a table, and each item has a series of key-value pairs.
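As a rough illustration of that nomenclature — a table holding items, each item a bag of key-value pairs — here's a minimal boto3 sketch; the table and attribute names are hypothetical:

```python
import boto3

table = boto3.resource("dynamodb").Table("Photos")  # hypothetical table

# An item is just a collection of key-value attributes; only the
# primary key ("Id" here) is required -- no schema beyond that.
table.put_item(Item={
    "Id": "photo-123",
    "Owner": "ben",
    "SizeBytes": 48213,
})

# A strongly consistent read returns the latest committed write.
resp = table.get_item(Key={"Id": "photo-123"}, ConsistentRead=True)
print(resp["Item"])
```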
The primary key, what the table is indexed on, is called the hash key. You can have a hash key as a single attribute — one key-value pair — or you can use a composite primary key, which we call a hash-and-range key, so you can subdivide. Then you have zero to many other key-value attributes, as many as you need; that's the flexible part. The data types are strings or numbers, or sets of strings and sets of numbers. So that's pretty much it for a DynamoDB table: you've got a table, a bunch of items — and we'll go through some of the steps — and a primary key. That primary key can be either just a hash, one key-value pair, or a composite, where the hash key plus the range key together form your primary key. If you decide to do the composite, you get access to the query API, and you can do some additional filtering and conditional predicates on the data. So that's it in a nutshell. And then you have items, which can be one to many, and we'll have more examples. So the table is indexed by the primary key, with single hash or composite keys.

When you think about translating database operations to DynamoDB, it's just a bunch of reads and writes. A normal read is one read from Dynamo. An update is a read plus a write. An insert is one write.

Then there's the provisioned throughput. What we offer is basically a pipe — what we call read and write capacity units, and I'll get into that — and we allow up to that amount for you to push data into Dynamo or retrieve it out. That's the dial: the more writes you want, the higher you turn that value, or you can turn it down. We'll talk about ways you can have multiple tables with different values per table based on your workload. It's priced per hour based on the provisioned throughput you require, with writes and reads separated. A write of up to one kilobyte per second is one write unit, and you can have millions of write units, depending on your application's needs. For that we charge $0.0065 per hour for 10 write capacity units. For reads — we'll talk about the different types of reads — a strongly consistent read is $0.0065 per hour for 10 read capacity units, where a read capacity unit covers up to four kilobytes.

Okay, so a pricing example: if you want to do a million writes per day, you do all the math and get about 12 write units. Assuming your writes and reads are spread throughout the day, and with, say, three gigabytes of storage, you can get that provisioned DynamoDB capacity, managed for you, for about $7.50 a month. It changes if you have spikes in reads and need to provision your throughput higher or lower, but that gives you a bit of an example.

And the writes are consistent. We also offer atomic increment and decrement, so you don't have to keep doing sums or counts; you can atomically increment or decrement a particular value. We also support optimistic concurrency control: we won't do a write unless the condition you specify is met, and this means you never have to lock any cells or fields.
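Here's a hedged sketch of both of those features — an atomic counter and a conditional (optimistic) write — in boto3; the table, attribute names, and version values are all hypothetical:

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Demo")  # hypothetical table

# Atomic increment: the service does the addition server-side,
# so there's no read-modify-write loop and no sums to recompute.
table.update_item(
    Key={"Id": "page-views"},
    UpdateExpression="ADD #v :inc",
    ExpressionAttributeNames={"#v": "Views"},
    ExpressionAttributeValues={":inc": 1},
)

# Optimistic concurrency control: the write only commits if the
# stored version still matches the one we read, so nothing locks.
try:
    table.update_item(
        Key={"Id": "profile-42"},
        UpdateExpression="SET #n = :name, #ver = :new",
        ConditionExpression="#ver = :old",
        ExpressionAttributeNames={"#n": "DisplayName", "#ver": "Version"},
        ExpressionAttributeValues={":name": "Ben", ":new": 2, ":old": 1},
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("someone else wrote first; re-read and retry")
    else:
        raise
```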
And then transactions. Transactions are at the item level: the puts, updates, and deletes are atomic, consistent, and durable.

Then the read throughput. We have the concept of strongly consistent and eventually consistent reads. Because we're in multiple data centers, when you do a write, we make the two writes and commit. On a read, if you need a strongly consistent read, we make sure all the data has been updated first. But if it's something like a Twitter post you're retrieving — something where you don't strictly need the very latest write back within that second while the writes propagate — you can do an eventually consistent read and pay half as much for your read capacity. So you effectively get double the read capacity for the same price if eventual consistency is acceptable.

Okay, so: the three decisions, and then five clicks and you're ready to use it. You create a table — this is an example of creating, say, a Photos table — and here you decide on the hash key, or hash and range. Here we'll just create a hash key, give it type string (you can also do number or binary), and call the hash attribute Id. Then we click through, and we get to the newest option we've offered, the secondary indexes. By default the table is indexed by the primary key, but there may be a non-primary-key attribute you want to index on. For that we allow five local indexes, and behind the scenes we maintain another series of tables to support them; you have to create the secondary indexes at table-creation time, and then you can key off them. So say you have orders, with the order ID as the hash key and maybe the date as the range key, and another attribute for the items purchased in that order. If you wanted to query "how many orders was this item in?", you'd use a secondary index on the item ID — there's a code sketch of that just after this section.

Then the provisioned throughput capacity: you set the read capacity and the write capacity, and you can change them. The way that works is that you can raise your provisioned capacity as many times as you want, but you can decrease it only once per day, and that resets at midnight GMT, at which point you can do another decrease.

Then throughput alarms: you can get a notification when your capacity reaches a certain level, say 80% of my provisioned throughput. So if I asked for 100 reads per second, and the reads are hitting 80 or 90 over a 60-minute period, I get an email. It uses our Simple Notification Service, so those alerts can go to a person, but they can also go to a machine or to auto scaling, so you can bump up your provisioned capacity when you start running close to the limit. This is where you can do a lot of automation: based on your business logic and rules, you increase capacity as you see fit, driven by metrics and throughput.

Then you review your table, hit create, and you're ready to go. The table is created, and you've made the three decisions we talked about: the provisioned capacity, your keys, and whether you wanted local indexes.
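Here is that order-table example as an API call instead of the wizard — boto3 with hypothetical names — showing why the local secondary index has to be declared when the table is created:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Local secondary indexes can only be declared at table-creation time.
dynamodb.create_table(
    TableName="Orders",  # hypothetical
    AttributeDefinitions=[
        {"AttributeName": "OrderId", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
        {"AttributeName": "ItemId", "AttributeType": "S"},
    ],
    KeySchema=[  # composite primary key: hash + range
        {"AttributeName": "OrderId", "KeyType": "HASH"},
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[{  # same hash key, alternate range key
        "IndexName": "ItemIdIndex",
        "KeySchema": [
            {"AttributeName": "OrderId", "KeyType": "HASH"},
            {"AttributeName": "ItemId", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
```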
And you can do it with the wizard in the five clicks, or you can do it with an API call, and you're ready for use. This gives you a little PHP example of creating a table called ProductCatalog, with a hash key whose attribute name is Id and whose type is number — the type can be number, binary, or string — and then your provisioned throughput: say, 10 reads per second and five writes per second. With that, you're ready for development really quickly, and you're ready for production and for scale: if you need to jump up in numbers, you just change those reads and writes to higher numbers, and with that API call it's updated. This is going through the wizard and saying, instead of 10, let's throw in 20 for each.

Then authentication: we do session-based authentication to minimize latency. It uses our Amazon Security Token Service. When you get a credential, it's ephemeral for a certain amount of time, and it has a policy describing what you're able to do in terms of the API. We'll go through the API — it's a pretty simple API — but all of that is handled behind the scenes if you use our Amazon SDKs, which grab the token using your credential and provide the token you use to request an action on Dynamo. It also integrates with IAM, so if you already have policies defined in your IAM service, you can apply those to the STS tokens.

And monitoring: that leverages CloudWatch, our durable, rolling 14-day store of metrics. With Dynamo you can check how much of your read capacity and write capacity has been utilized, or set up alarms so that, based on whatever you'd like to trigger on, you get notified or take automated actions. For example, you get notified if you're being throttled: maybe you set 10 or 20 reads per second, but based on the amount of data you're retrieving you're now asking for more than that, and because of that Dynamo will emit throttling metrics you can key off of.

And then there's this whole ecosystem. Amazon is a very open platform — everything is RESTful web services — so there are all these other frameworks that leverage our SDKs or create their own libraries. It's hard to see there; I'll post these slides. That's just a shortcut to Jeff Barr, our Chief Technical Evangelist. He has a blog, aws.typepad.com, and in it there's a post that shows a lot of these different libraries. And if I have good internet and I don't screw up, I'll have a very simple demo of using the Python boto library to do DynamoDB actions.

All right, so: no schema. Tables don't need a formal schema. Items are arbitrarily sized hashes — we'll talk about limits — and you just need to specify what the primary key is and whether it's a hash or a hash-and-range. All right, so programming DynamoDB: the API is small, but it's very easy. The whole programming interface can be described in one slide. That's it: 13 API calls. These here are for the tables — creating, updating, deleting, and describing the tables that are there. And then here you can query specific items within a table, or scan the full table, but you only get back a maximum of one megabyte per call, so you have to page through multiple calls for a large scan. When you do a scan, you also want to be careful that you don't max out your read throughput.
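A minimal sketch of that paging loop in boto3 (hypothetical table name), following LastEvaluatedKey until the scan is exhausted:

```python
import boto3

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical

# Each Scan response is capped at 1 MB; LastEvaluatedKey marks where
# the next page starts. A small Limit also spreads out read usage
# so a big scan doesn't eat the table's provisioned throughput.
items, start_key = [], None
while True:
    kwargs = {"ExclusiveStartKey": start_key} if start_key else {}
    page = table.scan(Limit=100, **kwargs)
    items.extend(page["Items"])
    start_key = page.get("LastEvaluatedKey")
    if start_key is None:
        break
```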
Then the item-level calls: putting, getting, updating, and deleting items. And you can do batch operations, also up to one megabyte in size, to do bulk updates.

Then query patterns. If you use the hash-and-range composite key as your primary key, you can apply predicates: equals, less than, greater than, greater than or equal to, begins-with, between. You can do counts, you can retrieve the top N or bottom N values, and you can do paged responses — you basically get a page token so you can go retrieve the next set of results and iterate from there.

And there are some modeling patterns when using Dynamo. You can map relationships with range keys. There are no cross-table joins — that's what you lose with a managed NoSQL service; you're not doing complex SQL transactions. It's a series of key-value pairs, but you can create as many tables with as many keys as you need, and you can model those kinds of relationships. So if you have an order ID as the hash key and the date as the range key, those taken together are your primary key, and then you can have zero or more of the other attributes filled in. Here, this particular order had two items; here, there was only one.

Then handling large items. There are some best practices if you want to handle large items — maybe you're logging a bunch of emails or some other messages into a DynamoDB table. You can have unlimited attributes in a given item and an unlimited number of items, but if you add up all the key-value pairs plus 100 bytes of overhead per item, the total size has to be 64 kilobytes or less. So you get one item with as many key-value pairs as you want, as long as it all adds up to 64 kilobytes or less. Here's what you do if you have objects larger than 64K: you use a hash and range. Your hash key would be, say, your message ID, and your range key is the part number, and taken together you can reconstruct all the parts. So for the message here, you ran out of your 64K, so you create a new item: the hash is still the same message ID, but the range has been incremented, and it holds the next part of the body, and so on until you're done. So you can break the message down. (I always forget to turn off my Evernote when I give a presentation. Sorry.) Here's another example where you have multiple tables, you hash on the unique identifier, and the range is the part number as the body continues.

And I also talked before about using S3 and Dynamo together. If you've got a large file — one megabyte, or even a three-terabyte file — you don't have to break it all out and put it in Dynamo. You can just put a pointer in Dynamo to the rest of the object in S3. S3 is also a key-value store: the key is your bucket and object name, and the value is your object blob.
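That pointer pattern might look roughly like this in boto3 — the bucket, table, and key names are all hypothetical:

```python
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("Messages")  # hypothetical

# The large blob lives in S3; DynamoDB keeps the metadata plus a pointer.
s3.put_object(Bucket="my-message-bodies", Key="msg-1001/body",
              Body=b"...the large message body...")
table.put_item(Item={
    "MessageId": "msg-1001",
    "Subject": "Q3 report",
    "BodyLocation": "s3://my-message-bodies/msg-1001/body",  # the pointer
})
```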
So you're basically using a combination of a fast, seamlessly scalable NoSQL metadata store and the highly available, highly durable object store for the rest of the information.

Then — this is brand new — local secondary indexes. You can have five of them per table. You come up with a primary key of a hash and a range, and then for each local secondary index you pick another non-key attribute to be the new range to index on. That gives you fast queries, and I gave you one of those examples earlier.

Then time-series data. Based on, say, logging, website click-throughs, ad views, gameplay — anything you want to do analytics on — you may have non-uniform access patterns. So you have hot and cold tables: maybe a table for January, eight months ago, and a table for now. You want to turn the throughput way up on the table for today, here in August, where the high reads are happening, and reduce it to very minimal levels for January — a rolling log — and archive the old tables out to S3.

What I'd like to get to real quick, since I only have five minutes left, is that you can use DynamoDB as part of your big-data pipeline strategy. You've got your metadata here, and you can use any kind of app to access DynamoDB, putting in all those key-value pairs. You can use Elastic MapReduce, our managed Hadoop offering on EC2, between DynamoDB and S3 for ETL or analytics or anything else, pumping data into Dynamo and putting the resulting information into S3. You can then use Redshift, our managed data warehouse service: a cluster of instances that can hold up to 1.6 petabytes of data, but SQL-compliant, which fits your reporting and business intelligence tools, because most BI and analytics tools today expect SQL endpoints. So you can have a workflow that uses DynamoDB in NoSQL form, does ETL, analytics, or pre-staging into Amazon S3, and then Redshift copies either from Dynamo or from S3 and provides the staging area for reporting. It's pretty easy: you basically copy into the Redshift table from the DynamoDB table and provide your credentials.

We have a lot of different users, and DynamoDB is our second-fastest-growing service. It was our fastest-growing service until we launched Redshift, and Redshift became highly popular. But DynamoDB has been very popular and well used by large Fortune 500 enterprises as well as government.

I'll show you just a very simple example. I'm logged into an EC2 instance, and I have a DynamoDB table called Hotels. We'll explore this table: I've got three hotels — the Hilton New York, the Aloft Cupertino, and the Sheraton Fisherman's Wharf Hotel (if you can't read it, I'll read it out) — and then I've got a range key of the main phone number, plus city and state. So what I'm going to do is show you really quickly — I'm using boto, which is an AWS library for Python — I create a connection object; I've already created the table, so all I want to do is add a new entry. I'm going to add the local hotel here, the Marriott.
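The demo describes boto 2's flow (a connection object, a table item, then item.put); here is roughly the same add reconstructed as a sketch in today's boto3 — the Marriott's phone number is invented for illustration:

```python
import boto3

table = boto3.resource("dynamodb").Table("Hotels")

# Hash key = hotel name, range key = main phone number (per the demo);
# city and state ride along as plain attributes.
table.put_item(Item={
    "name": "San Jose Marriott",
    "phone": "408-555-0100",  # invented for illustration
    "city": "San Jose",
    "state": "CA",
})

# Re-scan the table: the new hotel shows up alongside the other three.
for hotel in table.scan()["Items"]:
    print(hotel["name"])
```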
So the table is called Hotels, and we've got San Jose as the city and California as the state. I create this table item, and I say the hash key — which is already known to be the name — is the San Jose Marriott, the range key is the phone number, and the attributes are the city and state. Then I do item.put as the commit; that does the add. So I'll run the Python code, and when I run it, I'll go over here, hit go, and now you can see where the San Jose Marriott went in. You can extrapolate from there: you've got lots of key-value pairs, and you can automate putting them in and taking them out as you see fit. Amazon is 100% API-driven, so what you see me do in the console you can automate in scripts in Python, PHP, Ruby, Perl, or .NET.

Okay, I think that hits our time. I'll take any questions out in the hallway — I think we have a coffee break right now — but I appreciate your time and interest in learning about Amazon DynamoDB. Thank you.