My name is Niklas. I'm the CTO of the Swedish company AdToma. I'm going to talk about how we built our software called Fusion, what decisions we took and why. Hopefully I can give you some advice on what to think about when building new applications and selecting new technologies. And in the end I hope to answer whether you can make more money faster by selecting a NoSQL database rather than a traditional SQL database. So, we're working with ad-serving. So what's up with that? Isn't that just showing a randomly selected static image on a web page? Well, that may have been true in the beginning of the internet era, but it's not so today. Why is building an ad-serving application costly and complex? Let's look at a typical workflow to start with. You have a publisher, in this case I used the Boston Post as an example, that publishes their media content online and shows ad messages on those pages. Then we have the advertisers that want to buy inventory from the publisher to show their ad message on those pages. They can buy directly from the publisher or they can use an agency; agencies often bulk-buy inventory from the publisher to get better offers. The publisher then uses one or several systems to publish the ads on their web pages. And the number of visitors can be quite substantial. It's not uncommon that a publisher has up to 1,000 visitors per second on their home page, and that gives you 2.6 billion page views per month. And if you have 5 or 10 ad spaces, you multiply that and get 10 to 20 billion ad impressions per month on a single site. And for each and every one of those ad impressions, we try to do as much investigation about the viewer as possible. We try to find out where the visitor has been prior to entering this page. We try to find out which third-party plugins are supported. And if possible, we try to profile the user to find out gender and so on.
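The arithmetic behind those traffic figures is straightforward; a quick sketch (the visitor rate and ad-space counts are just the example numbers from above):

```java
public class TrafficMath {
    // Page views per month from a sustained visitors-per-second rate,
    // assuming a 30-day month for simplicity.
    static long monthlyPageViews(long visitorsPerSecond) {
        long secondsPerMonth = 60L * 60 * 24 * 30; // 2,592,000 seconds
        return visitorsPerSecond * secondsPerMonth;
    }

    // Ad impressions: every page view fills each ad space on the page once.
    static long monthlyImpressions(long visitorsPerSecond, int adSpacesPerPage) {
        return monthlyPageViews(visitorsPerSecond) * adSpacesPerPage;
    }

    public static void main(String[] args) {
        System.out.println(monthlyPageViews(1000));      // about 2.6 billion
        System.out.println(monthlyImpressions(1000, 5)); // about 13 billion
    }
}
```

With 10 ad spaces instead of 5 you land at the upper end of the 10 to 20 billion range the talk mentions.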
I'm just going to give you a quick history of the company and myself so you will understand why we took certain decisions. AdToma is a provider of a complete ad-serving solution, founded by alumni from DoubleClick. DoubleClick was later bought by Google, and Google is, as you are all aware, a small and tender competitor. In the beginning, AdToma offered an ad-serving solution from a company called Checkmate, but we started to develop our own product in 2007, and that's when I was hired as CTO of the company to build this new application. We released the pilot in the same year. We have customers worldwide, and we have served more than 18 billion ad impressions through the system. That's a rather old figure; I think we're around 20 to 25 billion ad impressions today. But I know about building complex applications that use data persistence. I've been a developer since the mid-90s. I've always worked with object-oriented design and persistent storage in SQL in some way. I've developed complex business intelligence systems, warehouse systems and retail systems prior to entering the ad-serving business, and I've worked with traditional solutions such as SQL Server, Sybase, Watcom and so on. I also have experience from using data mining databases like Vertica and QlikView. And lately I've gained experience in NoSQL and NewSQL databases. Okay, so back to the product. What was it that we set out to do back in 2007? We wanted to replace the current ad-serving solution with a much faster and much easier to use application. We wanted to be able to use high-quality algorithms not just to select the perfect ad for the current visitor, but also to forecast the future, and that means we need to use a huge amount of historic data to project into the future. We wanted to deliver at least 15 billion ad impressions per month per server. And we wanted to simplify the user interface to make the complex easier for the user to understand.
As I said, a normal publisher uses five to ten different systems to do the whole workflow for delivering ads online, and we wanted to make that possible with just one single system. That means the system needs to handle customers, orders, proposals, inventory reservations, finance like invoices and so on, and also reporting and business intelligence. And we wanted to do this with a limited time to market, which was really crucial because we had a very small budget to start with. The most important part of an ad-serving solution is of course delivering the ads. If you don't deliver an ad, the publisher loses their revenue stream, and that means we get no money either. And there are a lot of requirements to consider when delivering ads, not just that you cannot accept any downtime. We need to try to find out who the current viewer is. Let's keep the diaper commercials away from the bachelor sitting at home eating pizza and watching NBA sports. Has this ad been shown before? Maybe it's time to show the same editorial message but with another image than we've shown prior to this view. What technology is supported? Can we show Flash ads, or do we need to stick to static images? Should we apply frequency capping? We might only be allowed to show this ad twice an hour, or a maximum of 10 times a day, for a unique visitor. And we might also need to apply retargeting, which means we need to try to find out the browsing history of the current user. So if you browse a lot of sports pages, we will increase the probability of showing you sports-related ads. And most important to the publisher: which ad brings the most cost-effective view at any time? Let's try to keep the margin as high as possible. And to be able to deliver and make reservations in the future, we need to forecast and try to predict how many visitors there will be at any time, and try to figure out how many page views a page will have in the future.
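Frequency capping as described above, for example at most twice per hour and ten times per day per unique visitor, boils down to per-visitor counters over time windows. A minimal in-memory sketch; the class and method names are illustrative, not the actual Fusion code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class FrequencyCap {
    private final int perHour;
    private final int perDay;
    // Per-visitor timestamps (milliseconds) of recent impressions for one ad.
    private final Map<String, Deque<Long>> seen = new HashMap<>();

    FrequencyCap(int perHour, int perDay) {
        this.perHour = perHour;
        this.perDay = perDay;
    }

    // True if the ad may be shown to this visitor at time 'now'; records the
    // impression when the answer is yes.
    boolean tryShow(String visitorId, long now) {
        Deque<Long> ts = seen.computeIfAbsent(visitorId, k -> new ArrayDeque<>());
        // Drop impressions older than one day; they no longer count.
        while (!ts.isEmpty() && now - ts.peekFirst() > 86_400_000L) ts.pollFirst();
        long hourAgo = now - 3_600_000L;
        long inLastHour = ts.stream().filter(t -> t >= hourAgo).count();
        if (inLastHour >= perHour || ts.size() >= perDay) return false;
        ts.addLast(now);
        return true;
    }
}
```

A real ad server would keep these counters in shared storage rather than one process's heap, but the windowed-counter logic is the same.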
Finding out how many times a page will be viewed by any user at all, that's kind of easy. But we need to be able to filter down using key values. So to start with, we project the future using a huge amount of historic data. On top of that we apply curves to adjust to the season of the year: adjust to Christmas time or holidays or whatever they might be. We might also need to adjust to a specific event, like the finals of American Idol or something like that. And then we need to be able to find out how much inventory there will be for specific key values. That means applying the frequency cap I spoke about. Try to find out how many unique visitors there will be in the future; we might only get paid per unique visitor. How many females will there be visiting the pages in the future? We might want to target a specific gender. We might also need to target an ad to someone who searched for the word iPhone on eBay; how many of those will there be in the future? We also need to try to find out how many will support third-party plugins like Flash. And then we need to be able to target an ad to a specific location: maybe we want to show it in California or San Jose only. And not only that, there are close to infinite ways to combine these key values, so there's a huge amount of data to investigate at all times. I went deep into the hard drive on my computer and found the very earliest sketches set out by the founders. They wanted us to implement a heat map to show what inventory would be available in the future and also view the current orders that were in the system. The competitor's system showed the same information using a list of data tables that you need to look through, finding values that could range from 10 to 10 billion in the same column, which makes it very hard to find the information you need. And there was a really tough requirement for us.
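The project-then-adjust approach described above can be sketched like this. The simple mean baseline and the multiplier values are illustrative assumptions for the sketch, not AdToma's actual forecasting algorithm:

```java
public class Forecast {
    // Baseline projection: here simply the mean of historic daily counts.
    // A real system would fit a trend rather than take a flat average.
    static double baseline(long[] historicDaily) {
        double sum = 0;
        for (long v : historicDaily) sum += v;
        return sum / historicDaily.length;
    }

    // Apply multiplicative curves on top of the baseline: e.g. 1.3 for a
    // Christmas week, 2.0 for an "American Idol finals" style event,
    // 1.0 for a normal day.
    static double projected(double baseline, double[] curves) {
        double p = baseline;
        for (double c : curves) p *= c;
        return p;
    }

    public static void main(String[] args) {
        double base = baseline(new long[] {900, 1000, 1100});
        System.out.println(projected(base, new double[] {1.3})); // seasonal uplift
    }
}
```

Filtering by key values (gender, location, plugin support) then means running this projection per counter, which is why the number of counters explodes.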
You should be able to load this heat map in just a few seconds. So we started to work with that and tried to pack as much information as we could on the X and Y axes, and then use the coloring of the heat map as a third dimension. One big problem, though, is that each and every cell in this heat map depends on thousands and thousands of counters on which we need to apply forecasting. So could we do this in just a few seconds? We could, if we could find a database vendor that would make our server side really fast. And as I said, we had very limited funding to start with, so we needed very cost-effective development, a short timeline to market, and as few developers as possible. That means we needed very easy-to-learn APIs, a very uncomplex development environment, and as few points of failure as possible, which basically means as few lines of code as possible. We wanted to be able to offer the product as an ASP solution as well as under license models, and we wanted to be able to run the complete solution on a single developer's computer. We also wanted to use a fine-grained object model, so we needed performance for that, and using a traditional SQL database definitely leads to negative performance with fine-grained object models. We looked at the options. We could try to use a traditional SQL database. They have their advantages: they are well tested and well verified, there won't be that many surprises during the development, and we would be able to set a very accurate timeline for the project. It would probably not be a short timeline, but it would be pretty accurate. The disadvantage is that it would be costly: on top of licenses we would need to hire a database administrator, and the development time would be longer because we need to implement source code in more than one area. We need to implement database schemas, application code, and then some layer in between to translate between the different layers.
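Returning to the heat map for a moment: the third dimension, the cell color, is derived from booked inventory versus forecast capacity. A minimal sketch with illustrative thresholds; the talk does not describe the real system's bucketing:

```java
public class HeatMap {
    // Color bucket for one cell: fraction of forecast inventory already booked.
    // The 0.5 and 0.9 thresholds are assumptions for this sketch.
    static String color(long booked, long forecastCapacity) {
        double fill = forecastCapacity == 0 ? 1.0 : (double) booked / forecastCapacity;
        if (fill < 0.5) return "green";
        if (fill < 0.9) return "yellow";
        return "red";
    }

    public static void main(String[] args) {
        System.out.println(color(400, 1000)); // plenty of inventory left
        System.out.println(color(950, 1000)); // nearly sold out
    }
}
```

The hard part is not the coloring but producing the forecastCapacity number for every cell within the few-seconds load budget.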
We could also try to find a mix of technologies: one vendor best suited for the ad deliveries, another vendor best suited for the back-office functionality, and maybe a third to support the data mining and reporting. Or we could try to find one vendor that would suit all our needs. If we went the traditional route, we would need to start by implementing the object model in .NET or Java, define the database schemas, make a layer in between to translate between the different layers, and try to get performance by scaling up the hardware. And in each and every upgrade and patch we would need to upgrade both the database and the application code in parallel. Educating developers would take more time, which makes scaling the development department much harder. Let's look into some example code. I've used MySQL and Hibernate for the traditional solution. We would need to define the database schema, where we define table names, property names, relations, data types and so on. Then we need a layer to translate between the database and the application. And of course we need the application itself, where we would once again define property names, data types, relations and so on, and hopefully we will use a good pattern to implement them pretty similarly in the database schema and the application. That's a lot of code just to do a simple task in the system.
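To make the "three source codes for the same task" point concrete, here is a deliberately stripped-down plain-Java version of what the traditional route forces you to write: a schema, an application class, and a mapping layer in between. Real Hibernate would use annotations or XML mappings instead of a hand-written mapper; the names here are illustrative, not the talk's actual example:

```java
import java.util.Map;

public class TraditionalRoute {
    // 1. The database schema, maintained separately from the code.
    static final String SCHEMA =
        "CREATE TABLE campaign (id BIGINT PRIMARY KEY, "
        + "name VARCHAR(255), impressions BIGINT)";

    // 2. The application class, repeating the same names and types.
    static class Campaign {
        long id;
        String name;
        long impressions;
    }

    // 3. The translation layer between the two, repeating them a third time.
    static Campaign fromRow(Map<String, Object> row) {
        Campaign c = new Campaign();
        c.id = (Long) row.get("id");
        c.name = (String) row.get("name");
        c.impressions = (Long) row.get("impressions");
        return c;
    }
}
```

Every rename or added field now has to be applied in all three places, and in every upgrade the schema and the code must move in lockstep.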
In our previous lives we had all used traditional SQL databases, so we disregarded that solution and wanted to try to find a NoSQL or NewSQL solution to use in our application. We were looking for one database that would support both the performance of the ad deliveries and the performance of the back-office functionality with inventory, forecasting and so forth. We wanted as little database work as possible during development; we would like to avoid database schemas altogether. And of course we wanted scalability and performance, even though we would implement using a fine-grained object model. Now it was time to try to find a vendor that could hold our hands during this project. We were looking for a scalable and reliable database: even if the performance is fantastic, we cannot lose data at any moment, and as I said, we cannot accept downtime in the ad delivery at any time. We wanted the database to update its schemas from the programmed classes, properties and fields, and of course an easy-to-use API and a huge amount of performance. We looked into the market and there are quite a few good ones out there. Among the notable ones we tested were Matisse, FastObjects, db4objects, StarCounter and quite a few others. We ended up selecting StarCounter, mainly because StarCounter offers really high performance for transactional data. It can handle more than 500,000 transactions per second per CPU core. It scales really well. You can run the same server on your developer's computer as you do on your main server. It has full ACID support, so you don't risk losing any data. It works in memory, but it secures persistence by writing transaction logs to disk. And there is no database schema: you just inherit one class in StarCounter, and that's where your database programming ends. So let's look at the same example as we did with MySQL and Hibernate, using the StarCounter API instead. How would that look? To start with, we have no separate database schema at all.
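By contrast, the StarCounter-style version needs only the application class. StarCounter itself is a .NET product, so this Java sketch fakes its persistent base class with a trivial in-memory stand-in just to show the shape: inherit once, and the class itself is the schema. None of the names below are StarCounter's real API:

```java
import java.util.ArrayList;
import java.util.List;

public class NoSchemaRoute {
    // Stand-in for a persistent base class. In the real product, inheriting
    // the database base class is essentially all the database programming.
    static class Entity {
        static final List<Entity> DB = new ArrayList<>();
        Entity() { DB.add(this); } // constructing an object "persists" it
    }

    // The class is the schema: no DDL, no mapping layer, one definition
    // of every name and type.
    static class Campaign extends Entity {
        String name;
        long impressions;
        Campaign(String name, long impressions) {
            this.name = name;
            this.impressions = impressions;
        }
    }
}
```

Compared with the three-artifact version, a rename or a new field happens in exactly one place.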
Hence we need no translation layer. You just write your application code, and it's a very neat and tidy solution. Even the best of developers tend to add more bugs the more lines of code there are. Or that might just be me. Sorry. With the technology and vendor selected, we set out the following timeline for the project. We wanted to spend as long on specifying and experimenting with the technology as on the actual implementation and testing itself. We gave ourselves six months to implement and test before the pilot, and looking at the actual timeline, it pretty much matched what we set out to do. Finding four skilled developers proved to be much harder than we expected, but educating them went really fast. We then used the Agile Scrum methodology to reach the pilot, and we released the pilot in December of 2007. That pilot had both ad deliveries and back-office functionality. We made one crazy decision: we had the opportunity to run one of the largest media houses in Sweden as the pilot. I would recommend that you use a smaller customer than the largest one. They were willing to take a small risk to get a better solution for the ad deliveries and for the sales people, and in the end we succeeded with the implementation. We worked 36 hours straight, tweaking and fixing the algorithms for ad delivery, fixing bugs and so on. The infrastructure that we used for the pilot is basically the same as we use today. We have the back office, where we handle inventory, forecasting, orders, invoices and so on, and that service is available as an ASP service. System users and administrators can access the back-office service using HTML5 (well, not in 2007, because we didn't have HTML5 then) or they can use WPF. And then we have the front-end services that handle all the ad deliveries. They are multiplied, but not because StarCounter can't handle the pressure.
It's because Windows and the network connections can fail, so we need several fallbacks. These servers serve the ads to the actual web browser users, and on their side we run scripts and cookies and try to find out as much as possible about them. And of course they are just happy to see all the ads. Did we reach the goal? We deliver more than 12 billion ads per month for a customer. We can run all our customers on a single server. We store counter data per hour on multiple levels throughout the system. We have a very user-friendly interface and very low server hardware requirements. And we had a short timeline to market and used very few developers. So I would say that we succeeded and exceeded expectations. We implemented the heat map and added a few add-ons. The user uses the same component to see what's been occupied in the inventory and also to make new reservations. The salesperson can use the system while talking to a customer on the phone, make the reservation and put the ad online while the customer is still on the line. And to my knowledge there's no other competitor today that does that. Stability and uptime, that's really important. We have more than 99.999% uptime on the front-end servers, and it could have been 100% if we hadn't had a major power blackout during a thunderstorm at the hosting facility. And of course the UPS failed at that very moment. So the lesson learned was to have a UPS for that UPS. And we have a service level of more than 99.99% on the back-office server; the only downtime is during major upgrades and patches. We serve all kinds of ads, as you can imagine: the traditional static placements, corner-folding ads, or other interactive ads where the browser user needs to interact with the editorial content to see the message. Transparent layers can encapsulate the whole page, and of course you can embed the editorial message in the video player. We serve any device that's connected to the internet.
PCs, anything on the internet; we even serve in-app editorials. I think I did this in record time. Do you have any questions?

Question: Can you show a sample of the code? Answer: Which one, the Hibernate one or the StarCounter one? There are actually very few lines of code.

Question: You said you had no schema, but don't you need some kind of index? Answer: You can hint to StarCounter directly in the code that you want an index on a specific property, or you can add an index definition and it will be loaded with the next reload of the database. So you can give StarCounter hints on which indexes to use to get performance, but you don't define any separate schema; the indexes are just for performance. Because we don't implement a database schema and the code in between to translate between the layers, instead of implementing three pieces of source code for the same task, we do it just once. It's just inheriting a base class in StarCounter, and that's basically it. And it's really easy to educate developers this way, since they don't have to learn any specific database technology. Well, it's good if they know SQL, but that's it.

Question: Does each class have a corresponding physical file? Answer: No, everything is loaded into the database, and as I said, it works fully in memory. So you don't get a single file per class; you get one database that's loaded into memory.

Question: You say you just need a class, and you create new objects from it. How are those objects stored? Answer: StarCounter actually uses the virtual objects that are created by the .NET engine, so it doesn't create separate objects. It's the same object as in your program that's stored in the database.
You can get much better information from StarCounter themselves; they're up in the exhibition, so if you want to go in depth, their CTO is really good.

Question: How do you fetch information that spans more than one object? Answer: They support SQL as a query language to find information, so you can use SQL and you get a chunk of objects back from the SQL engine.

Question: Is that just a read-only view of the data? Answer: No, it's the true objects that you get. You can query the objects and then modify the properties and fields directly in the result set. The SQL language is just used to fetch the information; yes, it's embedded. I don't have sample code to show that, but it's really simple. And the good thing is that you can query both on actual fields and on properties that have get and set accessors in .NET, so you can query either one of them.

Okay, thank you for listening. I will be around the exhibition if you want to talk some more, and I will be up with the StarCounter guys. Thank you.