Hi everybody, we're back. This is theCUBE, SiliconANGLE's continuous production of MongoDB Days. We're here live in New York City. I'm Dave Vellante and I'm with my co-host, Jeff Kelly. Julien Simon is here, he's the Vice President of Engineering at Criteo. And we're going to talk about how they're using MongoDB, how they're adding value. Welcome to theCUBE. Thank you. Thank you for inviting me. It's a pleasure. Good to see you. So tell us a little bit about Criteo, what you guys are doing and what you're all about. So Criteo is a French company. It was started in 2005. And we work in the online advertising space. And we're a global leader in what we call performance display. So in a nutshell, what we do is for our customers, who are the main e-commerce websites and retailers and brands, we build and serve advertising banners on the internet. And they're really personalized in real time for every single display. Okay, so let's talk a little bit more about how you do that and what you do. Obviously you're using Mongo as part of that infrastructure. So we'll talk about that. How do you use Mongo? Well, we use MongoDB to store what we call the product catalogs. And as you would expect, those are the product catalogs of our customers. And this is really the starting point of our platform because at the end of the day, we need to show product images and product information in the banners. So we need to ingest this product information into our platform. So we have product feeds coming from our customers. So over 3,000 major e-commerce companies worldwide. We work in over 30 markets. So we ingest this information into our platform. And we use MongoDB to do that. And then it will be fed to our web servers and it will end up in our banners. Okay, so how does the system figure out what banners to serve and how frequently does it update and refresh that model? So we have our own algorithms. 
We have what we call prediction algorithms, which are used to decide whether we should buy some advertising space in real time for a given user. So the decision is really, is the price of that space compatible with the chance of generating a click? Because that's what we want to build, right? Clicks. And so we have those algorithms and if we decide to buy that space, then we need to recommend products. So what are the products that we want to show at this given time to this given user for the best chance of success? So really our technology relies on algorithms and data. And part of that data is product data that we store in MongoDB. Okay, so you get paid for the software to be able to do that. Your customers get paid for the clicks and your customers' customers get paid for conversion. All right, so at the end of the day, the better you are at your job, the more conversion that occurs and the more value gets created. Exactly, and that's why we're calling that performance advertising because to be very clear, we only get paid, Criteo only makes money if we deliver clicks. If we buy advertising space to display banners that don't generate any clicks, then our customers, our advertisers, don't pay us. So I keep saying we need to be very smart or we'll be very dead very quickly. Yeah, so you're arbitraging that opportunity in a way that allows you to be profitable. So as the VP of Engineering, your role is to work on the product, the architecture, all of the above? Well, I'm the plumber. Okay. So my guys and myself, we're trying to build a highly scalable platform that can serve over one billion ads every day in over 30 countries. And it keeps growing. You know, we're still a fairly young company and the growth has been spectacular and it's a constant challenge to scale both the infrastructure and the applications to keep up with the business. So certainly there's two real critical parts here. 
I mean, certainly there's scale, but it's the analysis to determine what to display. And then of course the actual content that you're going to do, the data associated with what you're going to display, which you're storing in Mongo. So I was told you've had a growth rate of more than 200% in five years. Now, I don't know if that's accurate. That's a big number. It sounds crazy, but yes, the company has a storage experience. No, no, this is revenue. Oh wow, that's even better. 200,000% and you know, it's difficult for me to even comprehend that number. But yeah, as I've said, the growth of the company has been spectacular. We have now over 700 employees. We have offices in 15 countries. Europe, the US, South America, Japan, Korea, Australia, et cetera. So we have a global presence. And so to handle that, we need to have global infrastructure. And you know, that's a large part of my job. Again, making sure we have proper resources in Europe, in the Americas, and in APAC, to keep growing and keep scaling our technical capabilities and eventually the business. Right, so are you running in the cloud or are you running? So no, we're like bare metal. So we have, today we have seven data centers. Three in Europe, two in the US and two in Japan. We rent hosting space and we buy power and we hope it never goes out. And we do everything else ourselves. So buy the hardware, deploy it, maintain it. Part of the team is on duty 24/7 because you know, there are no business hours for us. Yeah, yeah, yeah. Yeah, so it's really a Criteo platform built from scratch and operated by us. So let's talk about Mongo a little bit. I mean, as I mentioned, Mongo plays a critical role here in storing the data that allows you to serve up these ads. So walk us through a little bit how, or why you kind of went with Mongo and what are some of the attributes specifically about Mongo that are well suited to the workload you're doing. 
So initially we were using Microsoft SQL Server and it was fine for a while, and in early 2011, the growth of the company, the number of customers, the size of product feeds, just the traffic was growing very, very fast and we had a number of technical difficulties with SQL Server and that was the end of the road for that technology with us. And so we looked at alternatives and we were very, very keen on using open-source software. We're strong believers in open source, especially myself. And well, we had our evaluation matrix and MongoDB came out on top, and what we really like about MongoDB is how easy it is. Everybody today, you know, repeated that and I can only confirm it's easy to use, easy to deploy, easy to manage and it has built-in scalability and high availability, which is very, very important for us: replica sets, sharding, et cetera. So yeah, a lot of the nice attributes that we were looking for. So yeah, you mentioned scaling. Let's talk about the scaling a little bit. How does MongoDB handle that problem? And again, I'm sure as your business continues to grow, hopefully at this remarkable pace, scaling is going to be a major issue. It's always the number one issue as far as I'm concerned. Regarding MongoDB's scalability, the key thing is really shard, shard, shard, because that will allow you to spread the traffic over multiple masters for writes and multiple slaves for reads, and it's really the number one thing for us. So yeah, we do have quite a few servers to do that. And now in terms of the, talk a little bit about the analytic part of the equation, where you've got to do these analytics in real time. I mean, we're talking sub-milliseconds. You've got to make a decision on what's the best ad to display based on who the user is and what your inventory of ads is. So what's the kind of infrastructure and databases you use to support that? 
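The "shard, shard, shard" point above, spreading writes over several masters and reads over their slaves, can be illustrated with a minimal Python sketch. This is a generic hash-based router written for illustration only; it is not MongoDB's sharding implementation and not Criteo's code, and the names `Shard`, `shard_for`, and `route` are hypothetical.

```python
import hashlib


class Shard:
    """Hypothetical shard: one primary (takes writes) plus read-only secondaries."""

    def __init__(self, name, secondary_count=2):
        self.name = name
        self.primary = f"{name}-primary"
        self.secondaries = [f"{name}-secondary-{i}" for i in range(secondary_count)]


def _stable_hash(key):
    # md5 is used only to get a hash that is stable across processes,
    # not for any security purpose.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def shard_for(key, shards):
    """Pick the shard that owns this key."""
    return shards[_stable_hash(key) % len(shards)]


def route(key, shards, operation):
    """Send writes to the owning shard's primary, reads to one of its secondaries."""
    shard = shard_for(key, shards)
    if operation == "write":
        return shard.primary
    return shard.secondaries[_stable_hash(key) % len(shard.secondaries)]


shards = [Shard(f"shard{i}") for i in range(3)]
print(route("product-42", shards, "write"))
print(route("product-42", shards, "read"))
```

Adding a shard to the list increases write capacity, and adding secondaries to a shard increases read capacity, which is the property the answer above relies on. A real deployment also has to handle rebalancing and failover, which this sketch ignores.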
No single technology can do the big data, if you want to call it that way, analytics and the millisecond-scale processing. So basically the heavy lifting is done in our Hadoop clusters. We have multi-petabyte clusters to crunch the data. We get about 20 terabytes of additional data every day. So we need some storage and some processing power to do that. The results of those jobs, of those prediction and recommendation jobs done in Hadoop, are then fed to a caching layer. So we use memcached and the typical technologies to do that. And those cache servers will be queried in real time by the web servers. So obviously when you have to take decisions in a few milliseconds, you have no time to query any kind of database. So you have to access data directly in cache. So it's a multi-tier infrastructure. So Hadoop for the really, really heavy lifting and crunching, MongoDB for product information that is fed into Hadoop. And then caches and logs and lots of commodity servers and scale-out architectures everywhere. So there's a lot of discussions going on about sort of ad tech. And I want to come back to what you guys were just talking about, you know, Jeff Hammerbacher's famous quote, the best minds of my generation spending their time trying to figure out how to get people to click on ads. So that's a compliment to you because you're one of the best minds of his generation. I want to know about that. I want to know about that. Hammerbacher is such a buzz kill. No, he's kidding, we love Jeff, he's been on theCUBE. But one of the discussions, and you guys were just touching on it, is this notion of bringing together analytical and transaction systems into a single database. And you alluded, if I understand you, that there's really no system that can do that today. But there seem to be a lot of attempts to do that or discussions about doing that. Do you see that as something that is folly or is that actually near-term going to be a reality? 
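The multi-tier flow described above (batch jobs precompute predictions, a cache holds the results, and the web tier only ever reads the cache when deciding whether to buy an impression) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Criteo's code: `batch_precompute` stands in for the Hadoop jobs, `InMemoryCache` is a plain dictionary standing in for a memcached-style layer, and the bidding rule in `should_bid` (expected revenue must cover the price) is one assumed reading of "is the price of that space compatible with the chance of generating a click".

```python
def batch_precompute(click_log):
    """Offline step: estimate click probability per (user, product).

    click_log is an iterable of (user_id, product_id, was_clicked) rows.
    """
    shown = {}
    clicked = {}
    for user_id, product_id, was_clicked in click_log:
        key = (user_id, product_id)
        shown[key] = shown.get(key, 0) + 1
        clicked[key] = clicked.get(key, 0) + (1 if was_clicked else 0)
    return {key: clicked[key] / shown[key] for key in shown}


class InMemoryCache:
    """Stand-in for the caching layer: lookups never touch a database."""

    def __init__(self):
        self._data = {}

    def load(self, predictions):
        self._data.update(predictions)

    def get(self, key, default=None):
        return self._data.get(key, default)


def should_bid(cache, user_id, product_id, price, revenue_per_click):
    """Real-time step: buy the impression only if expected revenue covers the price."""
    click_probability = cache.get((user_id, product_id), 0.0)
    return click_probability * revenue_per_click >= price


# Made-up log: the banner was shown four times and clicked once.
log = [("u1", "p1", True), ("u1", "p1", False),
       ("u1", "p1", False), ("u1", "p1", False)]
cache = InMemoryCache()
cache.load(batch_precompute(log))
print(should_bid(cache, "u1", "p1", price=0.2, revenue_per_click=1.0))
```

The design point is that the expensive work happens offline and the request path is a single key lookup plus a comparison, which is what makes a decision within a few milliseconds plausible.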
Well, there's a strong push to reconcile the big data batch processing with slightly more real-time constraints. So Hadoop is totally batch processing, but then you've got this Storm extension that's been released. And it allows you to do stream processing on your data. That's very interesting. But this is still very far away from millisecond processing. MongoDB had a MapReduce framework from day one. Not very good on all accounts. Now they've deployed their aggregation framework, which is similar, and it's a massive improvement. But again, at the scale at which we're running, it's just impossible to use one single data store. So yeah, everybody wants big results in very little time. So, and there's a lot of interest. I'm sure technology and startups will catch up with that. But as of today, we don't see any silver bullet, and we have to use this multi-tier architecture. How often are you able to, let's call it reprice, change the pricing in the system? All the time. Every single impression is evaluated independently. And what I mean by that is if you look at two web pages in quick succession, and you have banners on both pages, then every single display will be evaluated independently. So our arbitrage process is really real-time. And our models are refreshed multiple times per day, but every single display modifies the state of the model. And if we show you the same banner 20 times and you never click, well, we have to learn from that very quickly and stop buying that space because, you know, we're just losing money. So it's a combination of heavy lifting, heavy crunching to pre-compute, I would say, 99.9% of the problem. And then the last tiny part of the decision needs to be made in real time because this is where we know exactly what we could show you in what context, and we want to do that in real time because that's the most relevant time. It's a very competitive space, you know, the whole ad tech. Obviously, Mongo, you guys do well there. 
We just interviewed another company yesterday, Velocity Aerospike is doing some stuff. And as well, IBM Labs invented this technology. I want to ask you about streaming technology. And then there's actually been a spin-out called HStreaming, I don't know if you've heard of HStreaming, it's a company out in California. And my understanding is essentially what they do is they allow you to make decisions as the data is ingested before you persist it. So I'm wondering what you think of that approach, that technology, is it something that you've looked at? Do you have experience with it? Do you think it has merit? Well, the technology that Criteo uses is in-house technology. The code for our business apps is our own code and we don't rely internally on any third-party solutions. So people always ask me, are you using some kind of BI software or analytics module or anything like that? And no, we don't. So that explains how you're able to reprice so quickly. Yeah, we do use third-party technology like Hadoop, MongoDB and some more. But it's really a technology, we're not about to rewrite all of that, thank God, the file systems and Linux and... No problem there. But the Criteo technology is really our technology. So, and I think that's a strong advantage. And this is also the reason why we build our infrastructure because we think we can tweak every single part of the platform and extract value and save milliseconds, et cetera. So there's plenty of different products and solutions, but we're a technology company, we have 170 people in R&D, and if you include all the engineers in the company, it's over 300 engineers working on the product, working on R&D, on infrastructure, on QA, et cetera. So pretty much 40% of the company is engineers. So we build it, we run it, and we like to have full control of everything. Are you building that type of technology? 
That, what I described as a stream, in other words, the ability to make decisions prior to persisting that data, or is that something that we do? Yeah, part of the data we use and part of the algorithms we use are really, again, strictly real-time. So if you saw something a few seconds ago, we'll know because we'll have that information in a cache somewhere, and we can use it immediately. And eventually this gets persisted to logs and then ends up in Hadoop, et cetera, but every real-time action by a user may be used very quickly to modify the profile and take more intelligent decisions. So you've obviously got a lot of smart people working on this problem, so what's the next wave of innovation in the ad tech business? What are we going to see next? Is it just serving the ads up faster? Is it just making them more personalized, or is there something else? Another way you kind of innovate in this area? Sure, of course, we always strive to improve our existing model, try new variables, and inject new data. And so you can tweak that thing endlessly, and we do a lot of A/B testing to prove it, or prove it was a bad idea sometimes. And so there's an endless stream of improvements. And the two things we're trying to improve are the click-through rate and the conversion rate, because at the end of the day, advertisers want sales. They're happy to get clicks, but those clicks need to become sales. So conversion rate optimization is an important thing for us. Then there are all the products we could work on. Mobile comes to mind. It's a very interesting challenge. Yeah, so there's plenty of different topics to be explored. We're not out of ideas, that's for sure. Awesome, Julien. Well, I really appreciate the information and you stopping by theCUBE. You've got a great story, doing some really leading-edge stuff. Thank you very much. And appreciate the collaboration with Mongo. So keep it right there everybody, we'll be right back with our next guest. 
We're winding down here, this is a full day of wall-to-wall coverage of MongoDB Days. This is theCUBE, right back, this is Dave Vellante with Jeff Kelly. See you in a minute.
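The interview names click-through rate and conversion rate as the two metrics being improved, with A/B testing used to check whether a change helped. A minimal Python sketch of that comparison follows. The function name `rates` and all the numbers are made up for illustration, and a real A/B test would also need a statistical significance check, which is omitted here.

```python
def rates(impressions, clicks, sales):
    """Return (click-through rate, conversion rate) for one variant.

    Click-through rate is clicks per impression; conversion rate is
    sales per click. Empty denominators yield 0.0 rather than an error.
    """
    ctr = clicks / impressions if impressions else 0.0
    conversion = sales / clicks if clicks else 0.0
    return ctr, conversion


# Made-up counts for two banner variants in an A/B test.
variant_a = rates(impressions=1000, clicks=20, sales=2)
variant_b = rates(impressions=1000, clicks=30, sales=2)

for name, (ctr, conversion) in (("A", variant_a), ("B", variant_b)):
    print(f"variant {name}: CTR={ctr:.3f}, conversion={conversion:.3f}")
```

In this made-up data variant B gets more clicks but no more sales, so its conversion rate is lower, which is the situation the guest describes: clicks only matter to the advertiser if they become sales.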