Welcome back amigos. Great to see you here. My name is Usheen Lonnie. I'm delighted to have been your host here this morning on Track 2 at the Big Things conference. Now remember we have a digital treasure hunt and you are collecting points just by logging in and just by joining us for our amazing next presentation. Now I promised you at the beginning of the day a power trio of sessions, and so here is the grand finale here on Track 2. I'm delighted to be welcoming two gentlemen who are going to talk about implementing a multi-cloud customer data platform, or CDP, in Google Cloud Platform and the mighty Salesforce. This is actually a production use case of a multi-cloud CDP which elegantly walks the line between personalization and anonymization. So prepare to get yourself connected with Making Science's design approach, and you too can deploy an advanced and agile CDP in a cloud-agnostic manner. Do remember to get your questions in as the presentation is happening in the chat window. We'll do our best to ask these to our rock star speakers. Do remember to share the love using the hashtag BigTH21. Okay, so it's my pleasure to welcome to this stage Kevin Daly, who is the first party data director at Making Science. Hola Kevin, he is joining us via the magic of Zoom, and Juan Manuel Pozo, who is the senior data ops engineer at Making Science. So Kevin, Manuel, join us on the virtual stage. Thank you. Okay, now using the magic of technology, we have teleported Juan Manuel into the studio, into the mothership here at Big Things Conference, and we have Kevin joining us remotely. Bienvenidos, welcome Kevin. How are you? Fantastic. Good afternoon. Muy bien. Okay, bienvenidos. Right, so I'm going to step aside. Kevin, we're going to hand over to yourself for the intro, and then Juan is going to take us on a deep dive of the tech. So Kevin, welcome. Thank you for joining us. The stage is yours. All right, thank you.
So as indicated, I will start off a bit fluffy. I'll give a bit of background on the drivers and positioning, and why some of our customers at Making Science are requesting a multi-cloud approach to CDP. Manuel, let's go forward one slide please, since you're driving. There you go. Actually, we've gone a little too far. In any case, not all of our customers are demanding a multi-cloud CDP, but year over year, because we've been in this space now for quite some years, we are getting more and more interest, and it fits in with the noise, the drivers, the movement around data democratization. I'm an old guy. I go back in time pre-internet, if you can believe that. I'm sure most of you are post-internet. But I remember what it was like to have to look for data, to look for information: having to go to a bookstore, go to the libraries, talk to friends, talk to colleagues, to pull information together. Now with the internet, we've all got it within easy reach, and that's part of the data democratization process that's starting to occur with businesses. They're saying, why are we just leaving the data sitting with the BI team or the IT team and letting them farm or mine that data? Let's get it out to all the business units. As we know, there are more and more tech-savvy people coming out into the industry, so pushing that data out into the business units makes a lot of sense as a businessman. We're seeing that tools are also getting easier to use. Every year, all of the BI tools are getting easier to use, so your analysts are finding it far, far easier to work with the data. These are the drivers in any business that wants to become data-driven. But we can't forget about an age-old problem, and that's data governance. Data governance is one of those things that still has to be in play. It's all about making sure your metrics are calculated consistently for the business, and that there's a clear definition of how those metrics are calculated.
You've got the controls in place to make sure you've got no holes in your data, your data's current, you've got no duplicates, all of the normal things. So we can't forget about the unglamorous part of data democratization and self-service analytics. These are the underlying elements, okay? Forces moving us in this area of a multi-cloud CDP. But before I get there, you may ask yourself, well, what's the real value in data democratization? Between Manu and me, looking at the same data set, we can each identify and potentially bring to the business new insights, very different insights, over the same data. So it's making the business more efficient, but it's also helping the business make far better discoveries over the same sets of data. Let's go forward, Manu. As we talk about a customer data platform, we talk about first making that data available, getting the data governance there, making my tools available, making sure all my metrics are calculated accurately. But at Making Science, it's not just about having the data and the insights; it's the next step that really gets interesting and brings value to the business, and that's activation, okay? Activating the data, taking action with those insights. And this is where for many of our customers, and it doesn't matter whether we're talking about finance, telco, healthcare, or insurance, there are some types of actions that we may be taking with data that sits on top of sensitive accounts or sensitive information. And that data is typically going to be sitting within what we've called over the years a walled garden, okay?
In the customer that we're going to look at today, it's an environment where we've got a Salesforce marketing cloud, a Salesforce service cloud, and a customer data cloud, which consolidates information from all the sources: websites, back office CRM, back office marketing, bringing it all together into that customer data cloud. But it's all still sitting within the walled garden, because in this customer there's a lot of sensitive information. Sitting just outside of that walled garden, in our CDP as a concept and architecture, we've got the analytics area. And this is where we really turn the data scientists and the business analysts loose to do their magic over the top of the data. When they do their magic, they make those discoveries, and they want to push back into the customer environment, maybe a marketing campaign, a cross-selling idea, an upselling idea, maybe an adjustment on an account. That information is going to be communicated back into the walled garden. And today, in our discussion with Manu, we're going to take a look at how we move data from the walled garden into the marketing analytics part of the CDP, and then how we take the insights that might come out of that, an offline scenario or a continuous analytics scenario, and push them back into the walled garden. It's simple enough to understand conceptually; technically it's a bit challenging. Most customers are focused on just getting a customer data platform in place to enable the marketing teams and the analytics. But here we've found a nice balance to keep the IT community happy and the marketing community happy within the same customer. And Manu, if you go forward a slide. If we take a look at what some of the characteristics or differences between those two clouds are, this slide for me sums it up pretty well. The customer data cloud is an IT space. It's typically managed and run by the IT departments.
I'm an IT guy by education, background, and experience. IT personnel typically are very focused on security, stability, reliability, and controls. The customer data cloud, the Salesforce platforms, your online transaction processing systems, your accounting systems all fall into that domain. On the other hand, the marketing analytics cloud, a piece of your CDP, that's all about agility, experimentation, speed first, fail fast. I'll say anything goes when you're trying to make those discoveries and insights. And again, what we've done with our multi-cloud CDP approach is make sure that we are serving the governance requirements and the legal requirements: keeping the private attributes very confidential, very controlled, with a limited set of accesses and well-known rules in terms of how the data can be utilized. And on the other side, we're making the anonymized data available to the marketing teams and analysts to work very fast, to try to look at new flows, new discoveries, new insights, and then push that information back into the customer cloud for the route of activation. And on that, I'd like to hand it over. We're going to take you through it. I've got one more slide, Manu. This slide starts to peel the onion a bit deeper. We're not going to hit every one of these boxes today. Some of this stuff is very, very traditional in terms of, I'll call it ETL, so you've got your transmission, your events processing. Where we're going to focus today is the middle area of how we move the data between the two clouds. So again, we're not going to go too deep on the far left or the far right, because we're somewhat limited on time.
We're going to look at how we move that data between the two clouds: taking the data in transparent form into the customer data cloud, anonymizing certain attributes of that data, pushing the anonymized data down into the marketing analytics cloud, storing it, allowing the insights and analysis to occur, and then showing how the anonymized data comes back up into the customer cloud for de-anonymization and activation over the given channels. So on that, I'd like to turn the floor over to Manu to take you on a bit of a deep dive into some of these components and how we make this happen in a real client environment, following this philosophy. So before starting the presentation, I want you to think about batch-loading data: receiving the files in CSV and processing many millions of records in batch. First of all, we need to focus on one part of the flow. In the customer cloud, you can see there the API mapper. It's called the API mapper, but it's really an App Engine application which holds the mappers, to be able to recognize the file origin and how we have to process it, okay? So we are going to go deep into it, and we are going to see an example. You can see there how we are mapping the CSV file. We map the fields needed, so we can also check whether the CSV file is correct or not before starting the processing. Also, for each field, we have a property that tells us whether we have to anonymize it or not. For example, in this case, the only field that has to be anonymized is the NIF, which is the client identification. With this mapper, we can generate a new file with the sensitive information of the client anonymized, and we can move it to the marketing cloud. You can see that there are a lot of other parameters in the JSON file that we get from the App Engine, but those parameters are used in the marketing cloud just to transform, for example, in this case, the dates, to have them in a correct format for us and for the data science team.
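To make the mapper idea concrete, here is a minimal Python sketch of what such a per-origin mapper configuration and CSV pre-check might look like. The field names (`nif`, `purchase_date`, `amount`), the `anonymize` flag, and the helper functions are all illustrative assumptions, not Making Science's actual App Engine code.

```python
import csv
import io

# Hypothetical mapper config, modeled on the "API mapper" described in the
# talk: the expected fields, their target types, and a per-field anonymize flag.
MAPPER = {
    "origin": "crm_purchases",
    "fields": [
        {"name": "nif",           "type": "string", "anonymize": True},
        {"name": "purchase_date", "type": "date",   "format": "%d/%m/%Y"},
        {"name": "amount",        "type": "number"},
    ],
}

def validate_csv(raw: str, mapper: dict) -> bool:
    """Check that the CSV header contains every field the mapper expects,
    so a malformed file is rejected before processing starts."""
    header = next(csv.reader(io.StringIO(raw)))
    expected = {f["name"] for f in mapper["fields"]}
    return expected.issubset(set(header))

def fields_to_anonymize(mapper: dict) -> list:
    """Return the names of the fields flagged as sensitive."""
    return [f["name"] for f in mapper["fields"] if f.get("anonymize")]
```

With this shape, the same pipeline code can handle any file origin just by selecting the right mapper, which is the role the App Engine service plays in the architecture.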
The system that receives the information runs on Kubernetes, where we deploy our code. We call the API mapper to recognize the file and to know which fields we have to anonymize, and we anonymize them with a hash. It's a little more complicated than that, because hash values cannot be de-anonymized on their own, but we can store the anonymized value alongside the real value so we can look it up afterwards. This is a commonly implemented method using a public library, so it's not very complicated, and the next thing is to store the results to be checked afterwards. We use Firebase because we want a really fast answer when we want to recover the values. You can see an example in the slide where we store the hashed value as a document, to take advantage of the speed of Firebase. Okay, once the file is anonymized, we create a new one in Google Storage, and a Cloud Function recognizes this save and informs our marketing platform to ingest the file. This is also a very common Cloud Function, triggered by a change in Google Storage, so it's very simple. We just give the marketing platform the file name and the storage bucket where it has been saved. Now, in the marketing cloud, we just have to do three tasks. One is to save the original file in Google Storage, just for security matters. Another is to load the information into BigQuery raw, without any modification, just saving the timestamp and so on. The third is to load the information into BigQuery applying the transformations from the API mapper which we saw before: changing the type of the dates, transforming strings to numbers, strings to dates, and so on. On the other side, once we have the data loaded, we have a team of data scientists that use Dataflow and Cloud Functions to compile information and to generate a response.
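A rough sketch of that anonymization step, assuming SHA-256 with a per-client salt. A plain dict stands in here for the Firebase document store, and the Cloud Function below is only a skeleton of a storage-triggered handler; salt, store, and function names are all illustrative, not the production code.

```python
import hashlib

SALT = b"per-client-secret"   # assumed: secret keying material, one per client
hash_store = {}               # stand-in for the Firebase document store

def anonymize(value: str) -> str:
    """Hash a sensitive field (e.g. the NIF) and remember the mapping so
    the customer cloud can de-anonymize it later. The hash itself cannot
    be reversed; only this stored mapping makes the round trip possible."""
    hashed = hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()
    hash_store[hashed] = value
    return hashed

def on_file_finalized(event: dict) -> dict:
    """Skeleton of the Cloud Function fired when the anonymized file lands
    in Google Cloud Storage: it only forwards the bucket and file name so
    the marketing platform knows what to ingest."""
    return {"bucket": event["bucket"], "file": event["name"]}
```

Because the hash is deterministic, the same NIF always maps to the same anonymized ID, so the marketing cloud can still group all of one customer's purchases without ever seeing who the customer is.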
For example, think of a common case that I'm sure you've already lived: you buy something in a shop, and then an email reaches you offering another product. In the previous slide, we have seen how we load the information about your purchase. The marketing cloud is going to take your identification, search your purchases, and generate a response to the customer cloud to send an email offering you products related to your previous purchase. We don't know your personal information, so it's the customer cloud which is going to change that anonymized information back to the real information. It is going to use Firebase, as I said before, and it's a very simple de-anonymization. This is another architecture that we are working on right now. It's going to be deployed, we hope, in the first quarter of next year. Basically, the end of the architecture is the same, but we are trying to focus on event-oriented ingestion, receiving events instead of batch files, and the different modules are independent in Google Cloud. They are self-contained, they are serverless, and it's more comfortable for us. For example, Kubernetes is very powerful, but we want to offer an architecture that is serverless. This is a more serverless approach. As I said, a more event-focused approach, and that's it. If you have any questions about the architecture, I'll be pleased to answer them. That was very interesting. Thank you so much for taking us through that, the brilliant Juan Manuel, and thank you so much, Kevin, for beaming in. I would just take issue with one of your remarks, being the old guy. I think you're spectacularly young. I think it's all relative. I remember being introduced to a group of young folks as, this is Usheen and he's from the days before the internet, and I did indeed feel a little bit prehistoric, but what the hell, we've got to own these things. Thank you, that was absolutely brilliant.
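The purchase-to-offer round trip Manu described can be sketched end to end. Everything here (the product names, the related-products table, the identity store) is an illustrative stand-in for BigQuery and Firebase, not the client's actual code.

```python
# Marketing cloud side: knows only anonymized IDs and purchase history.
purchases_by_hash = {"3f7a9c": ["running shoes"]}  # stand-in for BigQuery
RELATED = {"running shoes": "running socks"}       # toy recommendation table

def build_offer(hashed_id: str) -> dict:
    """Generate a response for the customer cloud, still fully anonymized."""
    last_purchase = purchases_by_hash[hashed_id][-1]
    return {"customer": hashed_id, "offer": RELATED[last_purchase]}

# Customer cloud side: holds the hash -> identity store (Firebase in the talk).
identity_store = {"3f7a9c": "maria@example.com"}

def send_offer(response: dict) -> str:
    """De-anonymize the customer ID and address the email."""
    email = identity_store[response["customer"]]
    return f"To {email}: you might also like {response['offer']}"
```

The key property is the split of knowledge: the marketing cloud never holds the real identity, and the customer cloud never needs the analytics logic, so each side satisfies its own governance rules.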
Folks watching at home, don't forget, you can get your questions in on the chat function and they will be beamed to our magical iPad here, but I just have some questions myself. We've seen some innovative data architecture there. This is a really interesting new platform, very innovative, and it's about building new connections between data. You spoke about how this is going to be useful for folks in marketing, for folks in data analytics, for folks in IT. I'm wondering, and this question is to both of you, are there any use cases or uses of the platform or technology that companies are missing out on? Can it do some wonderful things that maybe many people don't know about, or that more companies should be using because it's a competitive superpower? Any insider tips for the folks watching at home? I'll start, and Manu, you can come in wherever you want. In our space, which is very focused on digital marketing, there's been a bit of a paradigm shift. Again, that customer data cloud, the CRM systems, it's the area that's known as first party data, and marketing teams until recently didn't really have access to first party data because it was confidential, it's very private. Again, it follows the IT rules of the business. So what we're doing is showing some very large customers across sectors the ability to move that data and take advantage of that data. They're the custodian of that data; they don't own it, the consumer owns their data. The bank or the insurance company is just the custodian. But we're allowing them to leverage that data to serve their customers better. In some cases, in this example, we were allowing better targeting of medical doctors based on certain conversations that were happening, applying natural language processing behind the scenes.
The data scientists that built the models and all that intelligence and recommendations in terms of the best doctor, they didn't know whether it was Kevin's or Manu's conversation that was occurring; it was all anonymized. But they saw the text of the conversation and they could make a recommendation, so as opposed to using a generalist, they could get to the specific doctor needed. That was a real big benefit that they were able to take advantage of in that first version of the system as of a year ago. Absolutely, I love the AI, machine learning, and NLP under the hood to help businesses really super-scale. In this world of interconnected ecosystems, as I was saying a bit earlier, you do need those connections, and you do need that help from machine learning and AI to really bring your business forward. A similar question now for you both. Are there any new business models that you think could evolve from this technology? You're joining the dots, you're bringing in big data and machine learning and natural language processing. Do you think this could give birth to new businesses, and are there any sectors or areas that you see could be a weak point as to... This one excites me because, like I said, I'm an IT guy, and I've got a lot of responsibility for hiring and all of that. We know, depending on who you are, if you're a manager and you're hiring, you're looking for talent, and IT talent is very difficult to come by. If you're an IT guy, you want new projects, new opportunities. And there was a time in which, again, with the data in the walled garden, only certain people could have access to that data. Now we're putting it into an area that's governed, it's anonymized. As a businessman, I can now reach out to experts out in the field. I can open up my door to new IT partners to leverage the data and come back to me with insights. I've now just taken my IT team and made it virtual. I can reach out to anywhere. I'm sitting here in the north of Spain.
You could be in the south of Spain. You could be in California. You really can take advantage of making your data available to the best resources wherever they're located in the world. Amazing. Absolutely. I think this is going to excite IT staff and excite folks who like building new stuff. This can be an amazing tool for both recruitment and also retention. Does anybody really want to work so much with siloed data, where stuff is complicated and difficult to connect? Having this kind of access to open data and these connections, and plugging in stuff like machine learning, NLP, and AI, can really just make things more fun for brilliant scientists like yourselves. I like what you said in the intro there: you are turning the data scientists loose to really build the future. Folks, I'm afraid we have wrapped up our time. We've just got a little bit of time left until our amazing keynote presentation over on the main stage. I just wanted to say thank you so much to Kevin and Juan Manuel from Making Science for bringing some magic here to Big Things Conference 2021. Thank you. Muchísimas gracias. Thank you for joining us remotely, Kevin. Thank you very much. Muy bien. Gracias. Folks watching at home, don't forget to share the love using the hashtag BigTH21. If you learned something new about data science and data connectivity, and I certainly did during that last presentation, please share your thoughts. Please feel free to tag me and Big Things Conference and use the hashtag BigTH21. And don't forget, there's still loads of time for you to win that brand new iPhone 13. Just take part in the gamification section of our website, even while you're watching the speakers. In fact, watching us right now, you're collecting points. How cool is that? Make sure to network, make sure to apply for some job opportunities, and make sure just to get involved.
Everything you do pretty much is going to give you extra big points to bring you closer to vouchers, books, and that iPhone 13. Okay, so that's it for Track 2 this morning. Make sure to share your thoughts on what you thought was a great session, like the one we just watched, like all of them to be honest, this morning here on Track 2. Share your feedback using the evaluation tool. Tell us what you liked, tell us what you want more of, and we will make sure that next year's event is even more awesome. It really does help us to improve, and we appreciate your time in sharing the feedback with us. Okay, and if you do win the iPhone 13, if you win any of the prizes, make sure to take a selfie and share it using the hashtag BigTH21. We're going to move over to the main track shortly. We're kicking off at 1320, 20 minutes past one here in Madrid, and we're going to welcome a real rock star speaker. He is the global CIO of Zoom, one of the companies that really has helped us all get through the pandemic and through the lockdowns. He's got some amazing, amazing things to share with you, and he's going to put some big ideas in your head and show you some big things for the tech awakening and for the future. So, my name is Usheen Lune. It's been an honor to be here this morning and to connect with you. Feel free to connect with me on any of the socials, and I'll be checking the hashtag BigTH21. Tech is Awakening, and the future looks amazing, and I will see you there.