When I've given this talk before it ran about 40 minutes, so I'll try to get it all into 30. I'm Robert Stewart, Chief Architect at Castlight, and what I'm going to talk about today is how Castlight Health uses MongoDB to do geospatial searches over healthcare pricing data. I'll start with a little background on Castlight Health to give some context for the problems we needed to solve, then go a little deeper into those problems, both from the business perspective and from the technical side. As is often the case, the solution was iterative: you take a shot at it, do something simple, it works great, and then things change — the data grows dramatically and you have to adjust and scale up. We went through a couple of iterations like that, and I'll talk mostly about the final solution we came up with using MongoDB and geospatial indexes. Even there we iterated on which indexes to use, and then on running the database on SSDs, and I'll explain why there's huge value in that for our use case. Finally I'll talk about an operational aspect: taking advantage of MongoDB replica sets, which made it very simple to flip between big bulk data sets with essentially no impact on our production system. It's nothing earth-shattering — not a patentable one-click flip or anything — but for a little extra cost in SSDs it gave us incredible operational flexibility, so if you can do it, I definitely recommend it.

Castlight is a growing startup — we're at about 270 people now, we're in San Francisco, and we've been around about five years. We develop hosted web applications that give consumers unbiased information on healthcare cost and quality, so you can effectively shop for healthcare based on cost, quality, and convenience. What we sell to, though, is employers and health plans — typically medium to large to mega-large employers who are self-insured, which basically means that rather than the insurance company taking the risk, the employer takes the risk in exchange for reduced premiums. So they have a lot of skin in the game around how much is spent on healthcare and how healthy their employees are. So do the consumers, because a lot of these companies are switching to high-deductible health plans, which makes it very important for the employees and adult dependents we support to be able to search for high-quality, low-cost healthcare providers. We've taken in about $181 million in funding — fortunately we still have most of it — and things are going pretty well: we've gotten some good recognition for what we're doing, we're growing fast and selling to huge customers, which of course has made my database grow enormously, and we're hiring, as always.

If you're an employee of a Castlight customer, this is what you see when you log into our application. We have a web application, a mobile web application, and native mobile applications on iOS, Android, and Windows Phone. This is the main web application, which an employee or one of their adult dependents can access.
In addition to seeing information about your past care — we have nice charts and graphs so you can see how you and your family are using healthcare, what things cost, and which providers you're going to — you can see information about your plan in real time. We get real-time information about where you are in your deductible, and it can be very complicated: there can be an individual deductible and a family deductible, these things called sandwich deductibles, plus co-insurance, copays, and a lot of complicated provisions. Rather than making you run those numbers in your head to figure out how they'd affect the cost of going to a particular provider, we have the computer do it, because it's fast, it's accurate, and it doesn't mind. So we give the user personalized out-of-pocket estimates for a particular procedure or service before they actually go to that doctor, we can show comparisons, and we have a lot of quality data to show as well.

In the middle of the screen is where you search — the big red search button. Above it there's a zip code right now, but you could enter a street address or a city and state, and we geocode that to a latitude and longitude (we just use Google for the geocoding). That becomes the search origin — the latitude and longitude we'll search around. The radius defaults to 25 miles, but you can set it anywhere from 2 to 100 miles. We then find all of the providers who perform that particular procedure or service and are in network for you, and we show, as far as we can, estimates of your out-of-pocket cost, which are very specific to you. If somebody else logs in and they're somewhere else in their deductible, they'll see different price estimates.
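Just to illustrate that geocoding step — this is a hedged sketch, not Castlight's actual code, and the address and API key are placeholders — Google's Geocoding API returns the latitude/longitude we'd use as the search origin:

    // Hypothetical sketch (not production code): geocode the user's search origin.
    // The address and apiKey values are placeholders.
    async function geocodeOrigin(address, apiKey) {
      const url = "https://maps.googleapis.com/maps/api/geocode/json" +
                  "?address=" + encodeURIComponent(address) + "&key=" + apiKey;
      const body = await (await fetch(url)).json();
      const { lat, lng } = body.results[0].geometry.location;
      return [lng, lat];  // longitude first, the order MongoDB expects later
    }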
So let's say you searched for primary care for adults — a pretty common search, you're looking for a primary care physician. Working left to right: on the left you can adjust the search radius, and there are a lot of filters. The initial visit is typically a little longer, and the cost of these office visits is based on time, so to give you an accurate estimate we need to know a little about the visit. Then there are more filters — we'll show how many male and female providers there are, and the various specialties like internal medicine, with the actual counts for each, so you get a sense of how much a filter would reduce the result set before you apply it, and the counts update as you click. At the top center we show the price range — the min and max prices across all the providers for whom we can provide estimates. We have to go get all of that data, and in some cases — say you search for adult primary care in a major metropolitan area like downtown San Francisco — you might get thousands of results back. That's a lot of data to pull back and run out-of-pocket estimates on, all within that location, so you can imagine why a geospatial index becomes very important for this to perform well. Then, if you click on an individual provider, we show you quality information about that provider as well as their cost for other procedures — maybe you see a great price on one procedure and think you'll go there, but this gives you a sense of that provider's overall cost across other procedures.

So there are two main things we needed to do. The first is the search I just showed: find the prices for a particular procedure or service performed by any of the providers who are in network for you, within a constrained geographical area. That's the main thing I'll talk about today. The second is the lookup of all the prices for all procedures by a single provider, but that's obviously a very simple thing to do — you can do it with basically any kind of database and an ordinary index. We also want sub-second response. We want this to be super snappy, like any consumer shopping website, so we demand of ourselves that we keep the search time under one second even when we're returning thousands of prices and doing all of these out-of-pocket calculations. As I mentioned, that obviously requires a very fast geospatial index.

This is the actual curve of the rate count in our database right now. On the far right, in August, we hit a billion rates. A rate is a procedure performed by a provider at a particular location as part of an insurance network — the price may be different if you go to the doctor at her clinic versus at a hospital where she has privileges, and we have that information. The way we do this is by taking in medical claims: right now we've pulled in over a billion de-identified medical claims.
We have a team we call the "revenge" team — they do reverse engineering. They take in these medical claims and reverse-engineer the negotiated rates between the insurance companies and the medical providers; we have a Greenplum cluster we use for that. From that we've generated a billion of these rate estimates for healthcare pricing all over the US, and every month the numbers increase: we do another one of these runs, we get more claims to analyze, and just going from July to August we went from 750 million rates to a billion rates. That's obviously a pretty big jump, and we needed to be able to handle it and get that data into the app with minimal impact — ideally no impact — on the actual users.

Another thing about this data is that it's difficult to index it so that you always get sequential reads. With databases, the most common type of index is a B-tree — really, specifically, a B+ tree — and that goes back to the nature of block devices and sequential versus random access. There are a lot of reasons to create an index on a table or a collection of documents, but two of the main things you're trying to do are: first, reduce the items in the collection or table to something as close as possible to the result set before you return it; and second, make sure the items you do look at have locality on the physical device they're stored on. That locality really matters on a hard drive because, while sequential read time is really fast — basically the bandwidth of the drive — random read time is dominated by seek time, maybe 10 milliseconds or so per seek, and if you do a lot of random reads it really adds up. As it turns out, SSDs totally change the story: random read speed is much, much closer to sequential read speed, so you don't have to worry as much about data locality, and there isn't a big penalty for doing more random reads. The payoff is that you don't need as complex an index. In my case I would have needed a very complex, big index that takes a long time to build and a long time to maintain — and big is bad on an SSD because it costs more, but the maintenance time matters too. So you can cheat a little and get away with a simpler index if it's on an SSD, and SSD prices have come way, way down, so it's definitely worth looking into.

Our system has a lot more moving pieces than this, but as far as pricing retrieval goes, this is what it looks like. At the top is the user, coming in through a web browser, a mobile web browser, or one of the three native mobile apps. However they come in, it all funnels down through a Ruby on Rails app — plus client-side single-page apps written in JavaScript with Angular — and eventually into our service layer, which is written in Java. The pricing service is the layer that interfaces down to the prices. One nice thing, as I'll explain, is that we started with MySQL, and I was able to transition to MongoDB just in that layer without affecting anything above it.
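Before getting into the storage engines, it may help to picture roughly what one of these rate records looks like. This is an illustrative sketch, not the actual Castlight schema; the field names (loc, pm, pn) are chosen to match the index keys that come up later in the talk:

    // Illustrative sketch of one rate document (field names and values are placeholders).
    // loc: [longitude, latitude] of the provider's location
    // pm:  procedure mapping ID, pn: provider network ID
    {
      provider_id: 987654,          // hypothetical provider identifier
      loc: [-122.4194, 37.7749],    // longitude first, then latitude
      pm: 12345,                    // procedure mapping ID
      pn: 111,                      // provider network ID
      price: 254.00                 // estimated negotiated rate in dollars
    }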
What we started with, back when we had maybe two million of these rates — two million prices — was to store it all in MySQL and pull it out of MySQL when the pricing service starts up. Once that hit about 150 million rates it started to become problematic, because there's no geospatial index built into MySQL, so I had to build at least the basic part of one in Java, inside the pricing service, and that takes time to build. I eventually had a 55-gigabyte JVM heap, which is rather large, and I was spending a lot of time on GC tuning — the CMS collector is your friend when you're doing things like this, way better than the parallel throughput collector. So I spent a lot of time working around this, and it was taking 20 minutes for the pricing service to start up, because I didn't want to slam the database while reading in all that data, and then I still wanted to background-cache the additional data, which took about three more hours. When we did a rolling restart on the pricing service it would be pretty slow until a lot of the data was cached. That's fine if you do your rolling restarts late at night, but if for some reason you had to do one during the day — not so good. And by slow I mean that if the data wasn't in memory in my heap, and the InnoDB buffer pool didn't have it cached on the MySQL side, a query could take up to two minutes. Obviously that's pretty bad; I couldn't afford that.

So, enter MongoDB. I'd been working with MongoDB for quite a while and I was aware of its geospatial indexes, so I thought we should take a look — it would probably fit our use case — and fortunately it worked out really well and solved the problem for us. MongoDB has several different types of geospatial indexes. I'm not going to go into a lot of detail about MongoDB generally — there's a lot of information out there, and a lot at this conference — so I'll mainly focus on the parts we used. There was the standard 2d index; when I started looking at this, around MongoDB 2.2, it was really just too slow for my use case. But there was another one called the geoHaystack index, and it turned out to be conceptually very similar to what we were already doing in Java: it basically takes a grid — an x/y rectilinear coordinate system — and slaps it down on whatever you're working with, say the earth with locations as latitude and longitude, and stores things inside those grid cells. That actually worked pretty well. But just in the last couple of weeks I've switched over to MongoDB 2.4, which has a 2dsphere index that's a lot more flexible, and my testing shows it's actually even faster, so we'll be switching over to that in our September release.
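To make that grid idea concrete — this is a tiny, purely illustrative sketch, not the Java code we actually ran — you can bucket every location into a rectilinear grid cell and then only examine the handful of cells that overlap the search radius:

    // Illustrative sketch of the grid-bucketing idea behind geoHaystack (and our old Java index).
    // bucketSize is in degrees; every location maps to one grid-cell key.
    function bucketKey(lon, lat, bucketSize) {
      return Math.floor(lon / bucketSize) + ":" + Math.floor(lat / bucketSize);
    }

    // Prices are grouped by cell, and a search only scans the cells whose key falls
    // inside the (padded) rectangle around the search origin.
    bucketKey(-122.4194, 37.7749, 1);  // "-123:37"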
A quick aside on mappings and projections, and the effect of using a non-spherical index. When you look at something like Google Maps you typically see what's called a Mercator projection — a cylindrical projection, as if the earth were splattered out onto a cylinder — and there's definitely distortion in that. As you go from the equator toward a pole, 10 degrees of latitude stays the same — roughly 690 miles for every 10 degrees — but longitude collapses down to zero at the pole, so the farther north you go, the more distortion you get. You have to be very careful with the haystack index, or any other rectilinear index that doesn't take the spherical nature of the earth into account: the farther north you go, the more inaccurate your results get, and you have to account for that. At the southern tip of the US, 10 degrees of longitude is down to about 650 miles, but by the time you're well up into Canada, well above Montana and North Dakota, it's down to about 240 miles, so there's quite a bit of distortion happening. It also makes those states look bigger — I mean, Canada is big, but it's not as big as it looks on Google Maps, and Montana and North Dakota are actually not that big. Texas, however, actually is that big.

For the haystack index we use longitude/latitude as the x/y coordinate system, and you have to keep that in mind: longitude is the x. We normally say "latitude, longitude," but you have to reverse that, and if you don't, you will get crazy results — I can speak from experience. Also, our default search radius is 25 miles, which at about the midpoint of the US is roughly 0.5 degrees. With the haystack index, the radius, the bucket size, all of that is in the same units as your coordinate system, which is a little funky — our users aren't thinking "I want to find a doctor within 0.4 degrees of longitude of my house" — so you have to accommodate that. And because of the distortion, you end up with an elliptical search area that gets really elliptical in the northern part of the US, so you overcompensate for it — that's what we did — and then go back and prune all the results that fall outside, the ones that are too far away in latitude.

This is what it looks like in MongoDB — these are the JavaScript commands. The first one creates the geoHaystack index. With a haystack index you can have one additional filter, and in my case that was the procedure mapping ID — the thing that provided the most selectivity (for me, anyway) was the location, and then the procedure mapping. The second thing is that to query a haystack index you have to use a database command (geoSearch). In that command — I've broken all of my pricing data into natural buckets, which are a simple way of partitioning the system based on insurance companies and employers — I give the search origin, the max distance in degrees, and then I can specify one value for the filter. This is a bit of a downside of the haystack index: the second part of the index has to be a literal value, like a number or a string; it can't be an array, which is too bad, because that would really have helped me. So those are some of its drawbacks, but it's still pretty fast. The third thing is a small caveat: there's a bug in the haystack index where, if you search on the second part of your composite index and you don't have a separate index on that field by itself, you get an error — but if you're going to search on that field, you probably want an index on it anyway.
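Concretely, the index creation and the geoSearch command described above look something like this in the mongo shell. This is a hedged sketch: the collection name, bucket size, coordinates, and IDs are placeholders rather than our exact production settings.

    // Hypothetical sketch of the geoHaystack setup (MongoDB 2.2-era shell).
    db.priceables_1.ensureIndex(
      { loc: "geoHaystack", pm: 1 },   // loc is [longitude, latitude]; pm is the one extra filter
      { bucketSize: 1 }                // grid cell size, in coordinate-system units (degrees)
    );

    // Haystack indexes are queried through the geoSearch database command, not find().
    db.runCommand({
      geoSearch: "priceables_1",
      near: [-122.4194, 37.7749],      // search origin: longitude, latitude
      maxDistance: 0.5,                // also in degrees (~25 miles at mid-US latitudes)
      search: { pm: 12345 },           // must be a single literal value, not an array
      limit: 5000
    });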
That worked for us pretty well up until now, but then — again, your data changes — we started getting buckets of data where, in one of them, we have a hundred and fifty million prices, and a lot of the prices in there are out of network for the user, because there are a lot of provider networks and the user only has access to a small number of them. So we were looking at a lot of data and then filtering on the provider network on the application side. I took a look at the 2dsphere index because it would let me have additional parts in the composite index, and it also supports earth-like spherical geometries, which is great — the default spherical geometry it supports is, in fact, the earth. When you specify the location, I was still using x/y pairs, which works fine, but you can use GeoJSON to specify the location of the document you're inserting, and it indexes that — it also supports LineString and Polygon, so there's pretty good GeoJSON support if you're already using it. The other thing is that with a haystack index, essentially the only query you can do is inclusion in a flat circle. With 2dsphere, besides inclusion in a circle, there's inclusion in a polygon or other shapes, which can be quite nice, as well as intersection — give me all the documents that intersect this shape — and proximity to a point — give me the hundred documents closest to this particular point. So it's much more flexible.

For comparison, here's how you actually use it — again, this is the JavaScript you'd run in the mongo shell. The first command: inside my pricing database I have a collection named priceables_1 (there are a bunch of these), and I tell it to ensure an index exists — if it already exists this is ignored, otherwise it's created. The first part of the index is a 2dsphere index on the loc key, which in my case is an array of [longitude, latitude]; the second part is the procedure mapping ID, a key on the document named pm; and the third part is the provider network, a key named pn. It doesn't have to be in this order: if something else gave you more selectivity, and the geospatial part provided some additional selectivity but wasn't the most important, you could put the geospatial part later. So 2dsphere is a really flexible index. The second command does the search. I tell MongoDB to find all the documents where the loc key — again, just what I named the key inside the document — is $geoWithin (that's a MongoDB operator specifying inclusion) a particular shape. The shape I'm searching in is, if you imagine somebody dropping a gigantic, 125-mile-radius yarmulke onto the earth, that spherical cap sitting on the earth. I specify the longitude and latitude of its center and then the radius, and this time, nicely, the radius is in radians, because now we're in spherical coordinates, so I don't have to worry about distortion. Finally I specify a particular procedure mapping, and then there's another MongoDB operator, $in — very much like IN in SQL — so I'm also saying: only where the documents have this particular procedure mapping ID and a provider network ID that is one of these values. Return those.
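In shell form, those two commands look something like this — again a hedged sketch with placeholder names and values; the 25-mile radius is converted to radians by dividing by the earth's radius in miles:

    // Hypothetical sketch of the 2dsphere setup (MongoDB 2.4-era shell).
    db.priceables_1.ensureIndex({ loc: "2dsphere", pm: 1, pn: 1 });

    // Find prices for one procedure, in the user's networks, within 25 miles of the origin.
    // $centerSphere takes [ [longitude, latitude], radius-in-radians ].
    var earthRadiusMiles = 3959;
    db.priceables_1.find({
      loc: { $geoWithin: { $centerSphere: [ [-122.4194, 37.7749], 25 / earthRadiusMiles ] } },
      pm: 12345,                 // procedure mapping ID
      pn: { $in: [111, 222] }    // any of the user's in-network provider networks
    });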
My results: it's highly geospatially accurate — I didn't have to go through and prune any of the results that came back — and it was even faster than the haystack index in almost all cases. The main reason it was faster for me is that I really benefited from that third part of the composite index; that made a big difference.

So now I had these geospatial indexes and I could search for prices by location, but at this point I was still using hard drives. The good news was that the MongoDB geo index was twice as fast as Java on top of MySQL — but twice as fast just meant one minute instead of two when the data wasn't cached, which is still pretty bad. I spent a little quality time with iostat and vmstat, and it became pretty clear I was suffering death by random read: just way too many random reads to find all the data. So I went out and got a $200 Samsung 256 GB SSD, stuck it in a Linux desktop, and my typical query time went from a minute to well under a hundred milliseconds for most queries; even the big ones, where I'm pulling back tens of thousands of documents, took only about 150 milliseconds. Really, really fast.

MongoDB comes with a nice, generically useful tool for this. There are a ton of tools out there for measuring disk I/O performance, like bonnie and so on, but mongoperf is great for it, and it has a mode that gives you a pretty good sense of how a particular device will behave with MongoDB specifically, because of the way MongoDB uses memory-mapped files. To compare a few things: I have enterprise-grade SSDs in my production and QA systems, and in QA we're on VMware VMs. On the production system I'm getting 74,000 IOPS, which is pretty great — with a hard drive, on a RAID 0+1 (or 1+0) array, I was getting about 300 IOPS, so this is a huge performance boost. There is a drawback to virtualized I/O with VMware: it knocked that down to about 30k IOPS. Still pretty great, but a definitely noticeable cost. Now, this is me slamming the drive, so it's not quite the normal mode of operation, but you definitely want to keep in mind that the percentage you lose by running VMs on top of an SSD is not insignificant — definitely consider that. And a consumer-grade MLC drive, the 256 GB Samsung 830 I had just sitting in my Linux desktop box, was getting 47,000 IOPS. That's pretty great — put your whole system on that and you'll be happy.
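For reference, mongoperf reads a small JSON-style config on stdin. This is a hedged sketch — the option names may vary by MongoDB version, so check mongoperf --help, and the comments are for explanation only (strip them before piping the config in):

    // Hedged sketch of a mongoperf config. Run as:   mongoperf < read-test.json
    {
      nThreads: 16,       // concurrent reader threads
      fileSizeMB: 10000,  // larger than RAM, so reads actually hit the device
      r: true,            // issue random reads
      mmf: false          // false = direct random I/O; true = read through memory-mapped files, like mongod
    }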
So now I've got these geospatial indexes, I can look the data up really fast based on them, and it's fast on these SSDs. The next question is how to get this data into the system with as little impact as possible on users, because I want to be able to change this data on the fly and have it be totally transparent to the user when I put in new pricing data. We do major price updates monthly, but I also wanted the flexibility to do it whenever — if we found an error in the data, or we just want to refresh the pricing data, I want to be able to do that during the day. The main thing to be aware of here, for our use case anyway, is that we're I/O bound, not CPU bound: these MongoDB queries have almost no CPU impact — typically I see about 0.1% CPU use across six cores — but the I/O load on the SSDs can be pretty heavy.

So what I did was set up two MongoDB replica sets, put multiple SSDs in the servers, and run multiple mongod daemons on each server. This picture depicts what we did: the ellipses represent the replica sets, prodPricing1 and prodPricing2. In MongoDB, the minimal replica set has three members. Two members hold all the data; at any given time one behaves as the primary and one as the secondary. The third — if you're fine with just two data-bearing members — is an arbiter: it stores no data and has very little impact on the server it runs on; all it does is heartbeat with the other instances and participate in quorum voting. If the primary were to go down — and that actually did happen to us recently — the arbiter and the secondary vote, see that they still have a quorum, and the secondary is promoted to primary. If instead it's a network partition, the isolated primary says "I'm all by myself now" and stops responding to requests. In my case I've never had that problem with MongoDB itself, but I literally had the SSD die — on the primary, of course — and MongoDB very nicely failed over to the secondary within milliseconds, which was pretty awesome.

Then there's the second replica set. The instances in the first replica set all listen on port 28001, the other set on 28002 — just what I chose to keep myself sane, ones and twos everywhere, easy to remember. It runs on the same systems; its data directory is just on a different SSD, so I can load it at will with really no impact on the active replica set, and that turned out to work fantastically. What I actually do is load all the data on our QA systems, and because the data files are portable, I just zip them up with pigz — super awesome if you've got a lot of cores — and transfer them up to the SSD on the passive replica set in production. I put the data on the primary and the secondary; I don't even use replication, I just literally replace the data directories wholesale, restart those mongod instances, and they're ready to go with the new data. That's actually pretty fast.
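For anyone who hasn't set one up, a minimal three-member replica set of the kind described here looks roughly like this from the mongo shell — the hostnames, ports, and set name are placeholders, not our actual topology:

    // Hedged sketch of a minimal replica set: two data-bearing members plus an arbiter.
    // Run against one of the mongod instances that will join the set.
    rs.initiate({
      _id: "prodPricing1",                                    // replica set name (placeholder)
      members: [
        { _id: 0, host: "app-db-1.example.com:28001" },       // data-bearing; eligible primary
        { _id: 1, host: "app-db-2.example.com:28001" },       // data-bearing; eligible primary
        { _id: 2, host: "app-db-3.example.com:28001", arbiterOnly: true }  // votes only, stores no data
      ]
    });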
MongoDB has another really nice thing here. If you use MySQL — probably a lot of other databases too — you've probably done a SELECT COUNT(*) on a transactional table just to get the database daemon to page the data in. With MongoDB and its memory-mapped files, you get really great performance if your working set is in memory, and there's a touch command that very efficiently reads one byte from every page. For a particular collection — in this case priceables_1 — it pages in all the indexes, so my whole geospatial index and all its parts, and if I specify data: true it then pages in all the data as well. However much memory you've got, you definitely want to pull the data into it.

Then, on the pricing service, I just have an operation where I can hit the pricing service instances across all of our app servers and have them flip: essentially I'm telling them to connect to the other replica set. When you do that, you want to give it all of the servers that make up the replica set. If you give it just one, it will find that one and use it to discover all the other members — but if your pricing service restarts and has to look that information up again, and that one host happens to be down, it won't find the set. So it's definitely important to list them all.

The drawbacks of doing this: there's a little extra cost, but it's only the extra SSDs — I didn't need separate servers. We run on managed servers at Rackspace, and cost is a big factor; these are pretty big machines, 128 GB of RAM with six cores, plus the SSDs, so I wanted to minimize the number of machines. And I really only need one replica set to have its data in memory at a time — if I had the passive replica set on other servers, those servers would just sit there totally unused. All of this is read-only data, so this works really well if you have a lot of read-only data. We did start adding caching as well, though: we cache data from remote pricing lookups into these mongod instances, and of course I lose that when I flip. MongoDB has a really nice feature for this called TTL collections — we also have memcached in our architecture, and it's very similar functionality, items are evicted after a time you set — and it was just nice to keep everything in Mongo. We typically do this flipping late at night, and it's only a three-hour cache, so there's really not much cost to effectively flushing the cache when I flip over.

The results we end up getting: with a cold cache, instead of two minutes, or even one minute, it's now totally acceptable — maybe a hundred to two hundred milliseconds worst case, just hitting things off the SSD. With a warm cache, performance is really awesome — it gets down into tens of milliseconds to look up all of this data. Pricing service startup, which used to make a rolling restart of my app servers take over an hour, is now just a couple of seconds per instance, because the performance is so good that I don't do any caching in the service layer at all — I just don't have to; Mongo does it all for me. And now I can do these major rate updates with no production impact. If I actually wanted to do one during the day, that would be totally fine: I can load the passive replica set during the day without impacting production and then flip over, which also lets us do minor rate updates — something we could never do before.
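Both the warm-up and the cache expiry mentioned above are essentially one-liners from the shell — a hedged sketch, with the collection and field names illustrative rather than our actual ones:

    // Page the freshly loaded collection's indexes and data into memory before flipping traffic.
    db.runCommand({ touch: "priceables_1", index: true, data: true });

    // TTL index for the cache collection: documents expire about three hours
    // after their createdAt timestamp.
    db.pricing_cache.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3 * 60 * 60 });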
So then, in summary. The geoHaystack index works well if you're retrieving a lot of documents in what I'd call a constrained search area — though we were searching out to a hundred miles and performance was still pretty good — your geospatial searches are very simple, because it only lets you ask for everything inside a circle on a flat 2D coordinate system, and you only have a single secondary filter. If that's your case — and initially it was exactly ours — it works really well and it's pretty easy to use; there isn't a ton of documentation, but it's not hard to figure out. If, however, you need more complex searches, or more complex indexing — if that secondary part of the composite index doesn't give you enough selectivity — then you definitely want to look at 2dsphere. In general I'd say look at 2dsphere and, at this point, probably don't bother with the haystack index at all. SSDs: if you've got a lot of random reads, they're just a massive, massive win — the value is enormous, you get a huge performance boost at fairly low cost, and it definitely reduces your need for complex indexes. And replica-set flipping works great if you need to do these fairly instantaneous swaps of large amounts of data and your data is primarily read-only — maybe with a cache you don't mind losing. It's a fantastic trade of a little cost for operational flexibility. So — any questions? Yes?

Good question — about where the reads and writes go. The writes always go to the primary, and they get replicated: when I'm writing to the cache, it gets replicated to the secondary. For the reads, the MongoDB drivers have read preferences and write concerns. Write concerns are the main thing on the write side, but with read preferences you can say things like "I'm fine reading from the secondaries," or "read from the nearest member." Nearest is really good if some members of your replica set are remote, or even just in a different rack — the driver keeps track of ping times and prefers the instances closest to it. That's what I use, because the data isn't changing very much; between the big bulk data and the cache, if I don't find something in the cache even though it's really there on the primary, that's not a big deal — I'm fine with that. And if the primary got really heavily loaded, the reads would start going to the secondary and I'd get some load balancing; if I eventually had a lot more load, I could just start adding more secondaries and more of the reads would automatically get offloaded to them.
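From the shell, the read preference described here looks roughly like this — a hedged sketch; in our case the equivalent setting would normally be made through the Java driver's connection options:

    // Hedged sketch: prefer the lowest-latency replica set member for reads.
    db.getMongo().setReadPref("nearest");
    db.priceables_1.find({ pm: 12345 }).count();  // this read may now be served by a secondary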
Next question — yes? So the question is whether you could use polygons for searching, rather than that spherical cap, and absolutely. When you do the search with $geoWithin, you're saying "I want to find things included in this shape." In my case the shape operator I specified was $centerSphere — that projected circle — but I could specify a polygon at that point instead: you give it the GeoJSON points that make up the polygon, and it returns just the documents included in that polygonal shape. Okay — anything else? All right, thank you very much.