Hey everybody, we're back at .NET Conf 2019. I have a new co-host. Myra, how's it going? Good, excited. Yeah, I know, thank you so much for joining us. So as you probably noticed through the entire conference, we've shifted co-hosts just because we're doing this 24 hours. Yes, it's amazing. And it gets pretty tiring after a while. So thank you to all of our current and future co-hosts for taking the time to do this. In particular, I would like to thank our speaker, Santosh. How's it going? Hey, I'm doing good, how are you? We're doing great, thank you. So you're here to talk about Cosmos DB for ASP.NET and SQL Server developers. Take it away, sir. Let me share my screen. All right, there we go, perfect. All right, are you guys able to see this? Yep, it all looks good. All right, perfect. So I'm going to go ahead and get started. Go for it.

Hello and welcome to .NET Conf. My name is Santosh, and I will be talking about Cosmos DB today. In particular, what I'll be focusing on is introducing Cosmos DB to the ASP.NET and SQL Server developer. A little bit about myself before we get started: I am a Microsoft MVP in Azure, and I'm also an Azure consultant at New Signature.

So today, I want to talk about Cosmos DB, which is, as we'll see, Microsoft's database as a service for a variety of models. My whole approach to this is based around the fact that as a consultant, I go out into the field and talk to customers, and I often see that people who have been working for a long time in the ASP.NET and SQL Server world, especially in the relational database world, have a little bit of difficulty transitioning over to the more schema-agnostic world that Cosmos DB brings. So I will be talking about some of the aspects of Cosmos DB that I've learned along the way that I think are important. Also, I was looking at the schedule and there is a data modeling talk a couple of talks after mine. I highly recommend that you listen to that one too, because it's complementary to this one. So with that, I'll get started.

So what is Cosmos DB? Cosmos DB is Microsoft Azure's database as a service. It is proprietary to Azure, which means that if you go to AWS or any other provider, you will not find Cosmos DB there. So let's get that right off the bat. It's a horizontally scalable database. It is schema agnostic, which means that you can save data in a wide variety of schemas and it will allow you to do that. It's a globally distributed database: looking at this map, you can click on different Microsoft regions all over the world and it'll automatically spin up a Cosmos DB instance for you over there. So it's very easy to distribute globally.

It's a multi-model database, which means it accommodates different types of data. The SQL API and MongoDB, and I'll be talking about this shortly, are document-oriented. Cassandra is a column-oriented database. And then you have the Table API, and Gremlin, which is a graph database. You can elastically scale throughput and storage; hypothetically speaking, there is no limit to the amount of throughput and storage that you can provision for your Cosmos instance. And you can do this across the world in different Azure regions with the click of a button. And Cosmos DB is super fast; you've probably heard about this by now, it's single-digit millisecond latency, and we'll talk about all of that.
But the most important thing about Cosmos DB is that unlike SQL Server, you're not connecting over TCP. You could do that, but it's a bunch of cloud-based REST APIs, and the data is encrypted at rest. So you can connect to this as you would connect to any REST API, and a lot of the SDKs and everything else are built around that. Let's keep that in mind as we move along.

Let's talk about the multi-model aspects. The SQL API is a document-oriented database. It stores data in JSON format, which as you know is the most widely used format, and it provides SQL-like query capabilities; here's an example of the type of data it stores. Mongo is similar to the SQL API in that it's a document-oriented database. I would say the distinguishing factor is that the Mongo API supports the MongoDB wire protocol, which means that if you write your code geared towards MongoDB, chances are high that you can point it at the Mongo API in Cosmos and it would just work. So you can simply move from a hosted instance on-prem to the Azure cloud by pointing your connection string at Cosmos.

The Table API: we have heard about Azure Table Storage previously. I call the Table API a premium version of Azure Table Storage. You can store exactly the same type of data with the same code, except that you get much better throughput and you can leverage the global distribution of Cosmos DB. With Azure Table Storage, we have read-access and zone redundancy and all of that; with the Table API, we can easily enable global distribution by clicking on the map. And I'm kind of moving fast because we're running behind, so bear with me.

The Gremlin API: data in the real world is often hard to describe in relational databases. The Gremlin API is a graph database and it's super relational, as I call it, which means that you can easily spin up vertices, which are the round entities, and edges, which are the relationships shown by the lines. And you can spin these up real quick and attach them real quick, which means you can do multiple levels of nesting of relationships, something relational databases find hard to handle.

The Cassandra API is a column-oriented database. By grouping columns together, you can often load an entire set of columns into memory for super fast calculations. A great use case for this is time series data.

Talking of the different models, use cases for these are usually found in industries like retail, IoT and gaming. The screen we are seeing right here is a great example of a use of Cosmos, because it uses the change feed, which we'll be talking about later in this talk, and it leverages microservices with the change feed to handle different functions in the retail industry. So you can really power up your applications by using Cosmos DB.

Along the way, through my talk, I'll be giving different developer tips. For this particular section, I'll say: leverage the appropriate data model based on the scenario. So if you have relational, somewhat super relational data, you can use the graph API. Or if you have tabular data, you can use the Table API. And finally, these models are meant to complement each other, not replace each other. Speaking of that, I wanna give you a thought experiment. Let's say I was building my own LinkedIn; this is how I would do the MVP.
I would use the SQL API to do the profile pages and the posts. To do "people I may know", I may use the Gremlin API to traverse the graph. To log the visits, I may use the Table API. To run summary calculations on years of experience, I may use Cassandra. And finally, for the signup and billing modules, because I want them to be transactional, I may actually use SQL Server. So there's no good or bad answer. What I'm saying is that you should use Cosmos DB and complementary technologies in the appropriate scenarios. Hopefully, if you take one thing out of the talk, let this be it, but I have other great stuff in store for you. And a quick note: from this point forward, I will not be talking about the other models, I'll be sticking to the SQL API in Cosmos.

Speaking of global distribution, Cosmos DB provides turnkey global distribution. You can easily spin up replicas by clicking on the map, so you can spin up different instances. If you have your Cosmos DB hosted in only one region, you get a four-nines SLA, which is 99.99%. If you have more than one region, you get five nines, which is obviously better. And then you also wanna decide whether you want single-region writes or multi-region writes, and that may vary based on your scenario. If you have a read-heavy application, it's easy to ingest your data with a single-region write. So for instance, your writer may be located in East US, so you ingest in East US, and then you distribute all over the world, so that when someone in Australia tries to read the data, it uses this feature called multi-homing APIs for Cosmos DB and connects to the nearest instance, which is in Australia, to read the data. So that makes it super fast. Multi-region writes I would use in a scenario where you have clients all over the world trying to write data: instead of sending someone in Australia to East US, I would enable multi-region writes, which means that they write to the Australian instance and it syncs up over time. Cosmos DB provides low latency for reads and writes, and you can see that they're in single-digit milliseconds.

Cosmos DB provides five well-defined consistency models. Most competing databases often provide only two, strong and eventual, but Cosmos DB provides five. I will say that strong is very similar to the ACID-compliant relational databases, but it only works in one region. So if you expand beyond one region, you'd have to use one of the other four. Bounded staleness: your reads and writes are never out of order, but the data lags by a certain interval or prefix; within this interval, it's strongly consistent. Session consistency provides strong consistency within a particular session that's connected to Cosmos DB. Consistent prefix makes sure that your reads are always in order with the writes, but there's no strong consistency anywhere. And eventual means that your reads could be out of order with your writes.

Throughput. This is one of the most important parts; we're getting to some of the parts which impact performance. Throughput is measured in request units, and a request unit is a combination of memory plus CPU plus IOPS. One request unit is the equivalent of reading a one-kilobyte document.
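To make that request-unit idea concrete, here's a rough sketch, not from the demo, of checking the request charge of a point read with the v3 .NET SDK; the database, container, item id, and partition key value are all placeholders:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class RequestChargeDemo
{
    // Placeholder names; in a real app these come from your Cosmos account.
    static async Task ShowChargeAsync(CosmosClient client)
    {
        Container container = client.GetDatabase("demo-db").GetContainer("people");

        // A point read of a small (~1 KB) item should cost roughly 1 RU.
        ItemResponse<dynamic> response = await container.ReadItemAsync<dynamic>(
            id: "12345", partitionKey: new PartitionKey("Anderson"));

        Console.WriteLine($"Request charge: {response.RequestCharge} RUs");
    }
}
```

A small point read like this should come back at roughly one request unit; larger documents, queries, and writes cost more, which is where the next point comes in.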
I will say that writes obviously consume more throughput because of indexing, and also, depending on the consistency level, the same write operation may consume a different amount of RUs if you're strongly consistent versus eventually consistent.

Partitioning: there are two types of partitioning, logical and physical. Logical partitioning is controlled by the user by providing a partition key. Physical partitioning, because the data is horizontally partitioned and stored on disks, is completely handled by the Cosmos DB engine and is transparent to the user. The choice of partition key can make or break your database performance, which is why I will reiterate that you should attend the data modeling session after mine.

This is what a Cosmos DB instance looks like. You start with an account; you just create an account. An account can have zero or more databases. These databases have containers of data, not to be confused with Docker containers. And these containers can have different elements like stored procedures, triggers, user-defined functions and items; items are the actual data. Now, depending on the model of Cosmos, for instance if you are in the SQL API, you would call your container a collection and you would call your item a document. So that's what this diagram represents. Obviously, once you create an account, you get an endpoint and connection keys. One additional note on this: there are read-write keys and read-only keys. Use these judiciously when you're designing a system; if you're doing a CQRS system, for example, you could use the read-write keys on the write side and read-only keys on the read side.

The database is the unit under which containers of data are stored, but you can provision throughput at the database level. Now, if you have multiple containers under the database, the throughput you provision here is the cap. Collections in the SQL API are the containers that store the data. You do not incur any charges until you create a collection, so you can create as many databases as you want with no charge. At the collection level, you can also provision throughput, so you can cap an individual collection at a certain throughput. Otherwise, its usage may vary based on what's provisioned at the database level and how many other collections there are.

Documents: this is the actual record, an example of a record that may be stored in a SQL API collection. So, for instance, if you take a JSON document that looks like the one on the left and store it, you'll end up with one that looks like the one on the right. Even though we say that the data is schema agnostic, Cosmos DB adds some fields, as you can see at the bottom. The id represents a unique name within a logical partition. It can be system generated or user defined; if the user doesn't provide an id, it'll automatically generate one. The _etag is used for optimistic concurrency control, which means that if there are multiple clients writing the same record to the database, it may use the _etag to resolve concurrency issues. The _ts is the timestamp, and _self is the actual URI for the item on the internet.

So how do I develop against Cosmos locally? This is where you can Google Cosmos DB Emulator; it's a simple Windows installer and it runs as a service on your computer. So this loads the emulator on your computer, and you get the connection URI and the primary key for the emulator.
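Not from the slides, but as a minimal sketch of what connecting to that local emulator looks like with the v3 SDK: the endpoint below is the emulator's usual default, and the key is a placeholder you'd copy from the emulator's own page.

```csharp
using Microsoft.Azure.Cosmos;

class EmulatorConnection
{
    // The emulator typically listens on https://localhost:8081.
    private const string EmulatorEndpoint = "https://localhost:8081";

    // Placeholder: copy the primary key that the emulator shows you locally.
    private const string EmulatorKey = "<primary key from the emulator>";

    static CosmosClient CreateClient()
    {
        // Same client type you'd use against an Azure-hosted account;
        // only the endpoint and key differ.
        return new CosmosClient(EmulatorEndpoint, EmulatorKey);
    }
}
```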
Now, the one thing to keep in mind is that the local emulator supports the multiple models. For instance, you also see the MongoDB connection string, and you see a Data Explorer that will show you the data, but unfortunately this Data Explorer, if I remember correctly, is not available for Cassandra, Graph, and Table, and that may change.

The next thing I wanna talk about, before we dive into code, is the Azure Cosmos DB .NET SDK; since we're talking about .NET Core 3, this goes very well with that. This is the latest version of the SDK, it has some improvements in it, and I'll dive into this shortly. So definitely, if you're using Cosmos DB, use the .NET SDK v3, and with that I will jump into some code.

I'll quickly show you the difference between the Cosmos DB .NET SDK version two and version three. This is an example of SDK version two. Generally what you do is instantiate a client with the key and the URI that you get once you create a Cosmos account; you instantiate a DocumentClient and then you create a database. In the cloud, it's always good to use defensive programming techniques, which means that you don't assume something exists. You always create it and then use it, or you plan for the case that it doesn't exist. So in this instance, it goes through, reads the list of databases and creates the database. Now, that is version two. In version three, the DocumentClient has been replaced by a CosmosClient. That's cool, because obviously you want a more generic one, and the CreateDatabaseIfNotExists call and similar calls at the database and collection level have been made much more stable, where you don't have to go through and catch exceptions before you create the database. So it's a more intuitive developer experience. There are definitely improvements on that side.

The other thing that the .NET SDK v3 has introduced is streaming APIs. The advantage of using streaming APIs is that previous versions always did serializing and deserializing of the data each time you requested data, and that incurs overhead; the streaming APIs cut down on that because they hand you the stream straight off the wire. So if you want to get data from the container and pass it on to something else, you can use the streaming API and not have the overhead of serializing and deserializing in between.

The next thing I want to do is jump into this. Cosmos DB obviously provides SQL-style queries. So we will see if we can find a collection that has data in it. And obviously I have a collection, and this collection has some data in it, so we'll run some queries in here. Let me look at a data record and see what I can query. I'll query for all the records that have day of week Friday and see what comes up. So I'm gonna say select where c dot dayOfWeek equals Friday. Oh, it's case sensitive, so you gotta watch out for that. So this query ran, and you can see that it's only showing 100 records, because when you get your records back, if you don't want to display all of them, you can paginate them, and that's always a good practice. And for these 400 records, it consumed about 12 request units.
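As a rough sketch, not the exact demo code, here is how that same day-of-week query and the pagination might look from the v3 .NET SDK instead of the Data Explorer; the `dayOfWeek` property name and the page size are assumptions based on what was shown:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class FridayQuery
{
    // Cosmos SQL is case sensitive, so the property name and value
    // have to match exactly what's stored in the documents.
    static async Task RunAsync(Container container)
    {
        var query = new QueryDefinition("SELECT * FROM c WHERE c.dayOfWeek = 'Friday'");

        // MaxItemCount caps the page size; results come back a page at a time.
        var options = new QueryRequestOptions { MaxItemCount = 100 };

        FeedIterator<dynamic> iterator =
            container.GetItemQueryIterator<dynamic>(query, requestOptions: options);

        while (iterator.HasMoreResults)
        {
            FeedResponse<dynamic> page = await iterator.ReadNextAsync();
            Console.WriteLine($"{page.Count} items, {page.RequestCharge} RUs");
        }
    }
}
```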
So as you can see, you can run SQL-style queries, and the one thing I would watch out for is: every time you run a query, always keep tabs on the request charge. You want to constantly monitor your request charge, because that may decide how many request units you provision for your collections.

This is an example of how I would run a query from a C# client. In this instance, I'm getting a query stream iterator and I'm passing in the actual query. And I'm using the partition key, which in this case is the last name; I'm providing the family name Anderson, and that will return some data. One thing to keep in mind: when you run your queries, if you have a large amount of data and you want the best efficiency, try to include the partition key in the query.

Indexing. This is another component that impacts query performance. Let's take a look at this instance. If I run a query on locations where the city is Berlin, it'll go through my collection and filter down to all the records that have a location field, where that location field has a city subfield and the city is Berlin. So if you have multiple schemas, let's say I have other records in my collection that have headquarters instead, and these don't have location or city, then in this case it will completely ignore those records. So this not only helps with filtering out data and making a query more efficient, it also helps with schema agnosticism. Other things you want to think about here: you want to measure your request units per query and ensure you have provisioned enough throughput when you run your queries. You want to always use the partition key, like I was saying earlier. You want to follow SDK best practices like direct connectivity, and all of these are listed in the documentation. And also try to run your queries within the same region, to account for network overhead when you run them.

Server-side programming. This is important because often when you get data, you want to validate and transform it before you store it, or right after you store it. In that case, you use something like a trigger. There are also stored procedures and user-defined functions that Cosmos DB runs. These run server-side, and they are JavaScript code. They can basically map to the JSON data and perform optimizations like lazy materialization. The other thing about server-side programming is that you can guarantee that the database operations within stored procedures and triggers in particular are atomic. So you get some amount of the A in ACID transactions.

Let's talk about the change feed processor, which is one of the best features of Cosmos DB. It's basically a persistent log of documents. When Cosmos DB ingests or updates data, it maintains a persistent log, and you can connect multiple clients to that log. These clients use another Cosmos collection called leases; that way, even if they drop their connection, they can come back and resume from a checkpoint. So this makes it really resilient, and it's used in scenarios like event sourcing and even near real-time migration.
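A hedged sketch of that stream-iterator query scoped to a partition key, in the spirit of what was described above; the container shape, the `lastName` property, and the "Anderson" value are illustrative, not the exact demo code:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class PartitionScopedQuery
{
    static async Task QueryByLastNameAsync(Container container)
    {
        var query = new QueryDefinition("SELECT * FROM c WHERE c.lastName = @lastName")
            .WithParameter("@lastName", "Anderson");

        // Scoping the query to a single partition keeps it cheap and fast.
        var options = new QueryRequestOptions { PartitionKey = new PartitionKey("Anderson") };

        FeedIterator iterator = container.GetItemQueryStreamIterator(query, requestOptions: options);
        while (iterator.HasMoreResults)
        {
            using ResponseMessage page = await iterator.ReadNextAsync();

            // The stream is handed back as-is: no serialize/deserialize overhead.
            using var reader = new StreamReader(page.Content);
            Console.WriteLine(await reader.ReadToEndAsync());
            Console.WriteLine($"Page charge: {page.Headers.RequestCharge} RUs");
        }
    }
}
```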
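And here's a minimal sketch of wiring up a change feed processor with the v3 SDK, under the assumption of a monitored container plus a separate lease container; the processor name and document shape are placeholders for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class ChangeFeedSketch
{
    // Placeholder document shape for whatever the monitored collection holds.
    class Person { public string id { get; set; } public string lastName { get; set; } }

    static async Task<ChangeFeedProcessor> StartAsync(Container monitored, Container leases)
    {
        ChangeFeedProcessor processor = monitored
            .GetChangeFeedProcessorBuilder<Person>("demoProcessor", HandleChangesAsync)
            .WithInstanceName(Environment.MachineName) // identifies this client among many
            .WithLeaseContainer(leases)                // checkpoints live in the lease container
            .Build();

        await processor.StartAsync();
        return processor;
    }

    static Task HandleChangesAsync(IReadOnlyCollection<Person> changes, CancellationToken ct)
    {
        // React to inserts and updates here, e.g. copy items into a re-partitioned collection.
        foreach (Person p in changes)
            Console.WriteLine($"Changed: {p.id}");
        return Task.CompletedTask;
    }
}
```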
So for instance, if you ever picked the wrong partition key, which happens more often than we would like to admit, you can perform a near real-time migration by updating the records in your collection and then reading them through the change feed and copying them to a different collection. So you can actually use the change feed in a whole bunch of cool scenarios. Definitely something that you should read more into.

Entity Framework Core 3: obviously, any discussion of Cosmos at a .NET conference would be incomplete without Entity Framework Core 3. Because I'm running short on time, I'll go through this quickly. In your context, you would basically override your OnConfiguring method to use Cosmos, and in this case it uses the local connection string, but obviously if you're deploying this to Azure, you would replace it with the Azure connection string. Once you do that, your context can use the Cosmos client, it can access the database and the container, and it can just work like any other Cosmos client. I see that I'm running close on time, so I'll keep this moving. I was going to say, we're right on time.

And finally, DevOps: no discussion of Cosmos is complete without DevOps. I want to talk about a couple of things. Hey Santosh, I was going to say we're right on time. Okay, I'll quickly wrap this up. Cosmos DB provides an emulator that you can use, and you can actually connect your tests to that emulator, so you can perform integration testing in your DevOps pipeline with Cosmos.

And I will wrap up with a couple of thoughts, and after that, if time permits, I'll take questions. So, my parting thoughts: get started right away, download the emulator; it has sample projects you can get started with. Think polyglot data, like the LinkedIn example I gave; that's an interesting thought experiment to get you started, but you can use a whole bunch of different scenarios. Shift your focus from thinking of cost to thinking of what value it adds, and of TCO, the total cost of ownership, or non-ownership, since you don't have to host anything; that's really important. Then understand partitioning and throughput and how they impact performance; attend the data modeling talk that's an hour from now. Learn to leverage server-side code like triggers and stored procedures. Use the change feed; make sure you find a good use case for that. And finally, use good coding and DevOps practices. These are some resources, and that's my contact info if you ever want to get in touch with me; the best way to do it is Twitter. And I'll take any questions, or if we're over on time, then I'll just wrap up.

Yeah, we're over on time. So everybody, if you have questions, you can use that Twitter slide there, buddy. Santosh, can you bring that up for us? Minimize to Skype and show that quick, so people can see that. So anybody, if you have any questions, go ahead and put them there, and we will get started. Thank you so much, Santosh, for taking the time to talk to us. And we will get going here with Steve, talking about the eShop. Okay. Thank you so much, everybody. Thank you.