Thank you for joining this very interesting session today, which will be presented by Balvinder Kaur Khurana and Sushant Zoshi. They will be talking about real-time insights for products, customer experience and a resilient platform. Without any further delay, I'll hand it over to Balvinder and Sushant.

Thank you, Agile India, and everyone for giving us this opportunity to talk. And thanks to everyone who decided to spend their Saturday evening listening to this topic. I was listening to Dr. Pramod Varma's session before this, and he said: build patient capital as you work. The journey we're going to share with you has evolved over the last three years, so we have really seen, if not all of it, at least some part of that patience, and we hope we'll be able to share our learnings with you. With that: I'm Sushant Zoshi. I work as a Product Principal with ThoughtWorks. Product is all I do. Earlier, I used to be a product developer, and then I was part of a team that created platforms and products. And I have Balvinder with me.

Hello everyone, and thanks for coming here and listening to us. I'm a data architect, and I also work as the lead for our global data community. I stumbled into big data accidentally, and since then I've started loving the work that I do. I love how we can help enterprises with the magic of data.

This is a simple line of wisdom which everyone has shared with us: start your day with your business dashboard. Whatever business you have, whether it is a small shop or a large enterprise like Reliance, every executive wants a meaningful dashboard presented to them. And it is true even for us in normal life; I usually start my day with Strava, to see what's planned, which route, and so on. To take a simple example, imagine you want to go to the airport. What Google Maps serves you is everything from wherever you are, your starting point, to your destination: how do you get there, what alternate routes are available, what modes of transport are available, how long it is going to take, whether there are any traffic bottlenecks, and, if you want to grab something on the way, it will probably serve that too. That's Google Maps for us, part of our life day in, day out. What we're going to talk about today is something synonymous with Google Maps, but for the platform our client was building; we created something similar for them.

The environment I'm talking about is that of a retail bank based in India, in the pre-COVID days. Digital was kicking in. The payment infrastructure was mature: UPI, IMPS and NEFT were there, and transactions were happening seamlessly. Millennials and millennial thinking were taking off in a frenzy. What that meant is that everyone was thinking more about customer experience, and for that, they needed to know their customer. Until then, everyone had done KYC for compliance; all of us have submitted documents just for regulatory purposes, right? But this was a time when people were looking to literally know their customers, so that they could serve them better. And with that vision, a front-end engagement platform was being built.
And in order to realize this vision, the ask that came to us was: can you create something, or can we use the data we already have, to know our customers better? In a literal sense: can I know my customer while that customer is on my platform? And to know my customer, it is not enough to just know who that person is; I also need to know what that person needs, and which of my systems today are helping me cater to that need. That was the broader meaning of "know your customer" with which we started our journey. Balvinder, would you like to add something on the tech side of it?

Sure, definitely. I think this is a very interesting way of putting it: knowing your customer not just for compliance, but knowing your customer holistically and at a deeper level, to serve them better. And to do this, you need to capture a whole lot of information, as Sushant said. I don't just want the data about the customer; I also want the data about my systems, and I want the data about when these two interact with each other. So technologically, if I break it down, the first ask was to build a data platform, and to do it on cloud. Essentially, every organization today wants to be a data-driven organization. For us too, the bank wanted to take all of this valuable data it had collected and use it to build AI-driven systems and serve its customers even better. So, in short, the next ask was to create a data strategy for our client, and to do both of these, the data platform and the data strategy, in a way that they grow along with the business and the organization.

Yeah. And those of us who are part of technology would know that any customer-facing application has multiple elements: coming from business, from customer care, from support. Different types of questions are asked by different people. Simply put: when I'm creating a product, what is attractive to my customers, what do they want? And as I go and serve them with those products, more detailed, granular-level questions come in. In our case, say: do we know what's happening in Tamil Nadu? What sort of customer behavior might a paddy farmer have, and what sort of products might I have to build for them? What does the agrarian economy need, what sort of loans, what buying behavior do these customers exhibit, and how do they interact with the platforms? And remember, India is a diverse country. We have digitally savvy people, and at the same time we still have millions of feature-phone users who are also accessing all the digital infrastructure we are creating. To serve all of them, what was needed was actionable insight, arriving just in time, for everyone who is going to serve these people. The bank is a single entity which is going to cater to everyone, which also meant: can we cut down the time required to collect the information and serve it the way it is required? It had to be more of a self-service thing. A branch working in Chennai would want to know: how did I do yesterday, what sort of loans are there, what was happening the day before, what was happening a month before? And at the same time, the same information is also needed at an aggregated level.
So all of that was there. But it is not enough, because this is powered by a platform that is built technically; it has APIs and infrastructure, all of it, and it is growing more complex day by day. So you also need to know its health. What has happened with the digital advent is that what was earlier a simple system has become complex. There is specialized technology, and there are specialized roles serving different needs in the technology landscape. Everyone has a very specific ask and a very specific focus; from business to developers to security, all of them work through their own lens. A few years back, the data generated was in ledgers or in paper form. Today, the data generated is immense. I don't know the exact amount, but a reasonably sized bank's website and mobile app must be generating gigabytes of data daily, from their transactions and from what customers do on the website. That also means your data scientists and data analysts have to operate on that data and make it available for everyone to consume. The same goes for customer-facing executives. There is a well-known bank in India today where, if you go and ask for a historical statement, they just take the request; they don't serve it immediately. They build it in the backend and then give it to you. That's the sort of data volume we are talking about. And imagine a customer sitting in front of you in the branch; you can't say, hey, wait, let me get it, it's a batch job. So the technology had to power all of them. That's the explosion of personas and the explosion of requirements that happened in front of us. And the pandemic just expedited it like anything.

Yeah, I think Sushant talked about complex enterprises and how big they can be, and we'll probably touch a bit more on that in subsequent slides. What you see on the screen right now is just the tip of the iceberg, only as many users as we could list on one slide. There are many more, and they have a lot more questions. These questions, and the answers to them, help these people do their day-to-day jobs. Not only that: all of these questions and answers, when collected together, achieve higher-level organizational goals. But the problem is that all of these people are working in isolation; they don't even talk to each other, and maybe they don't even know who works in which department. That's the complexity of enterprises. All of these people have only a limited set of resources and a limited functional space in which they operate. So they attempt to answer these questions in that limited space, they try to solve these problems in that limited space, and they create a lot of isolated solutions.

So let's look at some of the isolated solutions these personas could create. We start with the business-level or C-level executives, who are concerned with how their organization is performing overall: how is the business doing, are there any leakages, and just the high-level business numbers. They could also have a lot of ad hoc requirements. So the data served to them is mostly manually synthesized, collated and shared with them over email in the form of Excel reports. If we go a level deeper, we have product owners. These folks own individual business units and want to understand how their business units are performing.
And the details they need are probably a level deeper than what a C-level executive would need. So they have some periodic data delivered to them about the KPIs of their own business unit. They could still have some manual requests, and data is pulled out of tools like Google Tag Manager and Google Analytics and shared with them on Excel and email, again. Another level deeper are the developers and the IT support folks, the people who actually interact with the systems, the infrastructure components and the services day in and day out. These are the components hosting my business functionality. Now, the developers and IT support need real-time monitoring of these infrastructure components, because they cannot afford any downtime; it would mean the business is not available for that much time. So they use a whole different set of tools to answer their questions, like: how is the health of my system, how is the health of my infrastructure components? They could create Grafana charts on top of their data, or use tools like the EFK stack. And they need all of this information at their fingertips, in real time.

So what we realized is that each stakeholder is using an isolated type of solution, built from isolated tools. And while they're doing this, in this entire organization, with all of these unique data points, even the data points and their utilization get isolated. Even though each stakeholder got some answers, the bigger picture went missing. There are still a lot of silos, still a lot of fragmentation. Between the three pillars of any big enterprise, the business, the data and the technology, there is a rift; they don't operate on the same plane. Essentially what ends up happening is: I have systems which are available all the time, Six Sigma availability or 99.99% availability, and everything is working well, but nobody knows how my customer is responding to it. I may still have a really frustrated, pissed-off customer. And there are longer-term impacts of these piecemeal solutions. As we already saw, there is no consolidation, no coordination, and no one is connecting all of these different dots. Even if someone could do that at some level, it still requires a lot of manual synthesis, and as we know, manual work is prone to errors. So it generates very low confidence in the stakeholders.

And what Balvinder explained is at a conceptual level. Now imagine this situation in a large organization which operates pan-India, has more than 5,000 branches, and is spread across all the states. It is not only operating its own business, it is also part of networks, the ecosystems which are very much there today. And it has its own verticals and sub-verticals doing business, each running just like an independent business. The size and the scale increase the complexity exponentially. And while in most places we see that the technology department is horizontal, cutting across everyone, we have seen time and again that the technology department also gets aligned with the individual business units, as it is driven by their business needs.
And Anisha, in the afternoon, gave a very nice analogy: if data is the new fuel, there needs to be a refinery which keeps extracting it for betterment. We wanted to create something like that, something that would bring all of this together, that would take all the data points we talk about, how my channels are working, what my regulator is asking, what my employees are doing, what my ecosystems are asking, the health of my API endpoints, put everything together, and create something which is useful for everyone who needs that data. And when they look at that data, it doesn't look fragmented; it reads like a chain of events as it has unfolded. I'm not saying this doesn't happen today, but today, as Balvinder said earlier, it takes a lot of manual synthesis. If I need something today, I might have had to request it a month back, and for that the event had to happen probably a few weeks before that. So our goal was to make it easy, make it quick, make it available in real time, and make it available in a way that is consumable by anyone and everyone. In a lot of places the tech people know big data and have tools to access it, but for business people, who are very comfortable with Excel, how would they access this data? How would they make use of it? Can there be proactive alerting? And if I have a thought in my mind, if I observe something in a branch, is there a way to validate it? That was something we wanted to bring out through the solution. It essentially was a platform which would help you ask questions and get your answers yourself.

Now, when we are operating at that level, it wasn't possible to just build a point solution. We needed to think deep, we needed to think long-term, and that's where we needed to see things through with patience. We needed to talk to a lot of people: what are their needs, what are their wants? And while they expressed their immediate frustrations, we needed to see beyond those, to how they would actually be served. It also meant they would get something they had never experienced, so we needed to validate those concepts; we needed to bring in that mindset of validation and a consumption orientation for whatever product we built. And one challenge we struggled with throughout is that the data we're talking about has different dimensions: Grafana or Prometheus giving a time series, a business application giving its business data, an API giving its standard logs. How do you make sure all of this data is available to the people who need it? For that, it needed to be discoverable, so that anyone on the platform can figure out which data sets they are interested in and easily reach them. And when they get there, they shouldn't find Greek and Latin; they should find something they can understand. So presentability was also a key aspect. Putting business and tech together holistically was something we started thinking about from day one. And one more aspect: while we built patient capital, we always had scale in mind. By that we mean that while we started with one business product, we knew the bank offered probably 30 other business products.
So we had to make sure that whatever we built was scalable, easily portable, and to an extent interoperable with the other business offerings as well.

I think Sushant spoke about a lot of important principles we had to keep in mind while building this future-ready data platform. Let's look at the technical architecture and how we actually operationalized the platform. We start with the source systems. These are the source systems which were earlier isolated, with all of those isolated solutions. With the data platform we were creating, we wanted to cater to all of these source systems, and also to the source systems which would come in the future. If the bank expands its business and gets something new, that source system should be treated in the same way. Hence the data ingestion, or data integration, layer was created with all of this in mind. What we did was, instead of catering to each individual type of source system, we catered to patterns: we would identify that a source system provides data in a specific way, and we would have an ingestion component for that pattern, and so on and so forth (a small illustrative sketch of this follows below). These source systems could have different tech stacks and different architectures; they could provide data at different volumes and velocities, and they could provide different varieties of data, as Sushant spoke about earlier. I could get quite semi-structured data from social media, while my banking applications could give me very structured information, but my ingestion layer could cater to all of them holistically.

Once the data was consumed by the ingestion layer, it was stored as raw data in the data lake. A lot of you would be aware of the data lake philosophy of creating the bronze, silver and gold layers, which is what we also followed. Storing the data as raw data ensures that we serve the use cases required today, but are also able to serve the use cases that will be required in the future on the same set of data; we're not manipulating the data as we get it from the source system. This also lets us do any kind of reprocessing or rollback that's required. On top of that we had a data transformation layer: batch ETLs and ELTs, plus a real-time stream-processing engine, which was the core of this platform, because we wanted to provide all of this information in real time. The transformation layer takes data from the raw layer and transforms it based on how my consumers need it. Where this transformed data is stored is also based on my consumers. If I have data-as-a-service APIs consuming my data, it makes more sense to keep that data in an operational data store; whereas if I have dashboards and reports as consumers, with some ad hoc querying and ad hoc reports to be generated, something like a data warehouse is a better choice. This entire data platform was operationalized using sound DataOps practices. Everything you would normally do in a web application, we followed in our data platform as well: the entire infrastructure was written as code; there was a testing pyramid which would test my units, my integrations, and my end-to-end data pipeline; we developed against master; and each commit was deployed to the environments and tested.
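To make the pattern-based ingestion idea above concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the bank's actual code: the class names, the `rest_api` and `event_stream` patterns, and the `write_bronze` helper are all hypothetical. The point it shows is that each ingestion component handles a source pattern rather than a single source system, every new source is onboarded with nothing more than a config entry, and records land untouched in the bronze layer so they can be reprocessed later.

```python
# Minimal sketch of pattern-based ingestion into a bronze (raw) layer.
# All names (patterns, domains, helpers) are hypothetical illustrations.
from abc import ABC, abstractmethod
from datetime import datetime, timezone


class Ingestor(ABC):
    """One ingestion component per source *pattern*, not per source system."""
    pattern: str

    def can_handle(self, cfg: dict) -> bool:
        return cfg["pattern"] == self.pattern

    @abstractmethod
    def fetch(self, cfg: dict) -> list:
        ...


class RestApiIngestor(Ingestor):
    pattern = "rest_api"

    def fetch(self, cfg: dict) -> list:
        # Stand-in for an HTTP pull from cfg["endpoint"].
        return [{"source": cfg["endpoint"], "payload": "..."}]


class EventStreamIngestor(Ingestor):
    pattern = "event_stream"

    def fetch(self, cfg: dict) -> list:
        # Stand-in for consuming a micro-batch from cfg["topic"].
        return [{"source": cfg["topic"], "payload": "..."}]


REGISTRY = [RestApiIngestor(), EventStreamIngestor()]


def write_bronze(domain: str, records: list) -> None:
    # Land records untouched, partitioned by domain and arrival time, so any
    # future use case can reprocess or roll back from the raw data.
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    print(f"bronze/{domain}/{ts}: landed {len(records)} raw records")


def onboard(cfg: dict) -> None:
    """A new source system only needs a config entry naming its pattern."""
    ingestor = next(i for i in REGISTRY if i.can_handle(cfg))
    write_bronze(cfg["domain"], ingestor.fetch(cfg))


onboard({"pattern": "rest_api", "domain": "auto_loan",
         "endpoint": "https://bank.example/api/loans"})
```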
Now, there was a lot of thought on the non-functional and cross-functional requirements as well. The entire platform was written in a very self-service, configurable way. What I mean is: looking at the source systems, if a new source system comes into the picture today and they want the data within that source system to be observable and onboarded into the data platform, the data team is not a bottleneck. You don't essentially have to write any code; there is just a bunch of configuration, which the domain team can do themselves, and the data gets onboarded onto the platform. Similarly, on the consumer end, a lot of the consuming applications had self-service capabilities. Business users could decide what KPIs or reports they wanted and create them themselves; they could decide what kind of data they wanted in their mailbox and set up scheduled emails for themselves. Essentially, we also turned business users into tech-savvy users; they were all aware of how big data platforms work. This is just a quick view of the tech stack we used for operationalizing all of this.

We were on cloud, and when I start speaking about cloud, there is one more important aspect I want to touch upon: the security of the data. As we were going onto the cloud, and banking has much higher security requirements than other domains, it was very important to take that into consideration as well. Just like everything else that we made configuration-based and self-service, security was also configuration-based. When the domain teams were onboarding their data, they could configure what kind of security they needed. Banks always operate on zero-trust policies and whitelisting rather than blacklisting, and all of this was configurable; domain teams could choose the security configuration for themselves. The same went for data quality. Data quality is really important for achieving the right numbers at the consumer end, and it too was configurable: domain teams could say what counts as good data for them and what doesn't.
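As a sketch of what that configuration-driven self-service could look like, here is a hypothetical example; the config keys, column names and rule names are assumptions for illustration, not the platform's actual schema. The domain team declares its own security and data-quality configuration, and the platform applies it without the data team writing any code:

```python
# Hypothetical onboarding configuration a domain team might supply.
# Keys, column names and rule names are illustrative assumptions.
ONBOARDING_CONFIG = {
    "source": "credit_card_transactions",
    "pattern": "event_stream",
    "security": {
        # Zero-trust default: only whitelisted consumers, PII masked early.
        "pii_columns": ["card_number", "customer_name"],
        "allowed_consumers": ["credit_card_domain_team"],
    },
    "quality": {
        # The domain team, not the platform team, defines "good data".
        "txn_id": {"not_null": True},
        "amount": {"not_null": True, "min": 0},
    },
}


def passes_quality(record: dict, rules: dict) -> bool:
    """Apply the domain team's configured quality rules to one record."""
    for column, rule in rules.items():
        value = record.get(column)
        if rule.get("not_null") and value is None:
            return False
        if "min" in rule and value is not None and value < rule["min"]:
            return False
    return True


good = {"txn_id": "T1001", "amount": 499.0}
bad = {"txn_id": None, "amount": -10}
print(passes_quality(good, ONBOARDING_CONFIG["quality"]))  # True
print(passes_quality(bad, ONBOARDING_CONFIG["quality"]))   # False
```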
Another important aspect that needs discussion and mention is thinking of this entire data platform in terms of domain-driven data boundaries: essentially breaking down all the data I'm getting on my data platform, right from ingestion to consumption, and decomposing it along domain-driven data boundaries. We could draw an analogy from the web services world, and how microservices replaced big monolithic web apps. We had domain boundaries on those microservices, we created small architectural quanta, and we distributed the ownership to each team; each microservice had a specific owner and could evolve independently. A domain-driven data boundary, and the term used for it is a "data product", does the same thing for your data platform: it essentially creates the architectural quantum of your data platform. In our case you could see data products like auto loan, personal loan, credit card, etc., and the entire ownership lives with the domain team.

So the data platform team doesn't even decide how the data is stored or which KPIs are needed; all of that lives with the domain team. The platform team helps them with the infrastructure required, and with how to actually operationalize it, rather than doing any of the domain-specific handling. Essentially, you could also compose these data products to create composite data products. It's not that whatever you ingest is delivered as-is to the consumers; you can compose data products for more evolved use cases and more evolved consumers. And it was interesting, because while we were working on this architecture and this data platform, ThoughtWorks was also coming up with the novel approach of creating data platforms using data mesh, and a data mesh is essentially a network of these different data products. The important thing to note here is that we're not going back to the silos again. Some of you might feel that we came from collecting all of the data together in one place and are now going back to distributing the data. But by domain boundaries we don't mean plain data sets; what you want to create is products, looking at data as a product with certain attributes. Those attributes are expected of these data products: things like discoverability and traceability. If I come to my data platform and I want to understand where some data X comes from, I should be able to do that using the discoverability attributes of my data products. There's a lot of literature available around data mesh and data products on the net, and you can go ahead and look at it.
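Here is a toy sketch of that data-product idea, with assumed product names and owners; it only illustrates the attributes discussed above: discoverability via a catalog entry with a plain-language description, traceability via upstream lineage, and composition of domain products into a composite data product.

```python
# Toy sketch of data products carrying discoverability and traceability
# attributes; product names and owners are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str            # e.g. "auto_loan"; ownership lives with the domain team
    owner: str
    description: str     # discoverability: what this data is, in plain words
    upstream: list = field(default_factory=list)  # traceability: where it comes from


CATALOG = {}


def register(product: DataProduct) -> None:
    CATALOG[product.name] = product


def lineage(name: str) -> list:
    """Traceability: answer 'where does data X come from?' by walking upstream."""
    result = []
    for up in CATALOG[name].upstream:
        result.append(up)
        result.extend(lineage(up))
    return result


register(DataProduct("auto_loan", "auto-loan domain team",
                     "Auto loan applications, disbursals and KPIs"))
register(DataProduct("credit_card", "cards domain team",
                     "Card transactions and spend KPIs"))
# A composite data product composed from the two domain products:
register(DataProduct("retail_lending_view", "lending domain team",
                     "Cross-product view composed for more evolved consumers",
                     upstream=["auto_loan", "credit_card"]))

print(lineage("retail_lending_view"))  # ['auto_loan', 'credit_card']
```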
So, what next? I think we are at a very exciting juncture. We've done a lot of this base work, the platform and the framework are ready, and now it's time to do more exciting things. We want to increase the self-service aspect of the platform even more; the data governance, authentication and authorization aspects will also be made self-service for the domain teams. We want to move in the AI direction as well, as I spoke about earlier in the ask: anomaly detection and other things, so that if something is not going right, instead of reacting to the problem, can I proactively alert the relevant stakeholders that this is abnormal, not something expected of our system, and can we look at it? Then, adaptive journey completions. Sushant also spoke about India's huge demography: we have people who are very tech-savvy and people who are still using feature phones. So how do we create journeys which are adaptive? For one type of customer, can I have a completely digital journey, versus, for someone who's not very tech-savvy, some sort of assisted journey, and can I do it in real time? Extending that further, can I do more personalization, can I reach a one-to-one level, can I do nudges? And, as I spoke about domain-driven boundaries, can we make those even better? That is what we're planning to do next.

Coming to impacts and learnings: whatever work we do, we come out of it with certain learnings. It's not that we were able to do a hundred percent perfect job, and that's needed too, so that we can do even better in the future. Some of the learnings we got from this data platform. First, treat data as a first-class citizen: you lose very important time when you don't, from day one itself, start looking at what data is being captured and how that data will be utilized; that was something we realized while working on this. Staying source-agnostic, as I said, really helped in scaling to a lot of different consumers and sources, and in bootstrapping our systems really quickly; we reached a stage where we were able to onboard new products and new features in a matter of two to three days. And timely scaling consideration, I think this is really important. The way we started this data platform, you could say that because I'm working on a big data platform, I'll go big bang and have a really big infrastructure which can cater to really large data. That is not what we did; we scaled just in time. We avoided the problems you would have with late scaling, and we also avoided the unnecessary cost of early scaling. We would regularly take stock of what kind of scaling capabilities we needed, and we were able to automate a lot of that, so we would get an alert that there was a scale requirement and a specific component needed to be scaled. All of that was done just in time.

Yeah, and just on data as a first-class citizen: obviously, data as a first-class citizen is a given for a data platform, but we're also talking about the source systems, the platforms which are actually generating the data. There is still a lot of learning required there to start treating data as a first-class citizen. The reason we say that, and it is the last impact listed here, is the standardization of the source platform. Through these findings, we could feed back the ways different teams had done the same thing differently on one platform. Source platforms have hundreds of developers working on them, and it's very difficult for all of them to do exactly the same thing, but there are certain base principles everyone wants them to follow. Through this data platform, we could pass that information back very quickly, and that could drive the standardization of the source platform. And so far, we've spoken extensively, or rather casually, about different people using the data platform in their day-to-day jobs. This meant a lot of data democratization happened in the organization; different people were able to just pull up the data. One thing which caught my attention: earlier, when we started, people would say, hey, can we ask the data team or someone else for this data? Over a period of time, they just said, let's just pull up this data. And now it could be anyone, whether we're talking about a business person, a developer, or an intermediate leader who is looking to take some decisions: they had the platform data at their fingertips, which they could just pull up, self-service, and get it done. That naturally led to data-driven planning, data-driven product planning.
And this product planning is not only in terms of technology products, what to build next; it also started giving inputs to business planning. And we're talking about a digital business, right, not like anything else. So where should you put your next focus, what should be the theme of the month: all of that could start coming through real-time data, as we spoke about. And one more thing to mention: when we started, we would have to ask for the previous month's data just to make sense of it and see where we were going. Today, people observing the dashboard can spot if something hasn't changed for an hour, and they say, hey, something is not right, the numbers are not getting updated, or, we've been seeing the same dashboard for the last two hours. That's the kind of mindset and cultural shift I am seeing at the organization where this has been put in place. So, just to close: if you want to go somewhere, you go to Google Maps to see the traffic jams, the different paths you could take, the different options you have. Through this data platform, today we are able to give the business and the technology folks that same sort of map, to visualize what's happening and make the timely decisions which help them do better. That's all from us. Thank you for giving us the opportunity to present our work to you. Thank you so much, everyone.

Yeah, thank you Balvinder and thank you Sushant for that wonderful and insightful session.