Yeah, okay. Today's case study is about the Innovation Policy Platform. We'll discuss how we went about implementing it in Drupal and how we improved the data discovery process. Before getting started: this session was supposed to have two speakers, Venkatesh Goteti and myself, but he was not able to make it due to unforeseen reasons. He's a co-founder of Azadeh Solutions, and he's also the co-founder of an initiative called DevTon, where we try to have college students discover things by doing them instead of reading the theory. You give them a problem statement, and they come up with a solution on their own. Unlike a normal hackathon, where people compete, this is more of a collaboration idea: people come together, whether as one developer or one team, sit together and work on an idea for two days, and come out with a solution they can pursue over time. He was also a member of the screening committee for Vodafone Mobile for Good, he was part of the Hacking for Governance workshop as a co-anchor, and he was a session facilitator for the Curious Design Yatra, an annual event; the last one happened in Goa.

About myself: I'm Gokul. I graduated from RBC in Bangalore. I started my career with Java and worked with it for my first two years. I started with Drupal as a hobby, then saw its promise and moved to Drupal; it's been around four years since then.
I've been working with Azadeh Solutions since then. I'm also the lead maintainer of a module called Automator. The Automator module is a 360-degree integration between Drupal websites and marketing automation tools. Generally, the data a site captures is not used enough to personalize the experience on the website. With the Automator module, that data is available to you through an integration, so you can make your site dynamic and personalize content without learning much new technology. We built integrations with Rules, Tokens and Webform, modules that everybody in Drupal knows, so users don't have to get used to a new interface: if they know Drupal, they're good to go with marketing automation. I'm also a co-founder of a small startup called As It Is, a custom merchandise startup that is also built in Drupal. And I'm a bowling addict, so if anybody is interested in a game, just let me know.

Today's overview: we'll discuss the background of the Innovation Policy Platform and look at the problem statement and the goal we were trying to achieve. I'll give you highlights of what we did and the value additions, then a brief look at the technology and solution stack we used. Finally, I'll break the problem statement down into parts and cover how we addressed each of them.

First, the background. The Innovation Policy Platform is a partnership between the World Bank and the OECD, both of which have a lot of data about innovation. Innovation policies vary from nation to nation; for example, what works in a developed nation may not work in a developing nation. That's
one challenge. The other challenge is that they have a lot of data, and each of those policy documents is huge. With so much data, people generally don't know exactly what they're looking at. For example, someone with a particular problem statement might want to know about the innovation policy of a particular country in a particular time frame. It becomes very difficult to find that, and since these documents are so huge, you never know which part of a document is relevant to you. These were a few of the challenges, and to address them, the two organizations came together to implement the Innovation Policy Platform. This is the description given by the Innovation Policy Platform itself: a joint effort by the World Bank and the OECD to facilitate knowledge exchange and peer-to-peer learning on innovation policies specific to developed and developing countries, by providing open data access through a rich knowledge discovery layer on top of a data warehouse of all OECD and World Bank data, aimed at transforming available data into strategic intelligence.
There are too many keywords in that, so let's skip it; I'll tell you what it means in one line. It's basically a one-stop shop for all policy practitioners, and it should make their life simple. This is the website we came up with for them. It was basically a consulting process: we had to sit with them, understand their challenges, and then suggest various alternatives based on those. So this is the website, and we have a video; let me just play it. That was an intro to IPP.

These are the problem statements they had. The first is the large number of documents related to innovation policy in developing and developed nations, and the major challenge was that they were difficult both to search and to filter. The other major challenge was integration between various portals and stakeholders. No single organization holds all these documents; since they span countries and organizations, it was difficult for the parties to communicate with each other and bring everything into a single portal where anybody can access all the documents at once.

This was the goal: the IPP portal will facilitate collective learning of both the conceptual and the how-to aspects of innovation policy, tailored to the needs of developing and developed countries. So the major challenge is: when a person comes to your site with all these innovation policies, how do you help them reach the content they want, and how do you personalize the experience for them?
It was basically targeting peer-to-peer learning, both north-south and south-south. It has to mobilize global resources, knowledge and expertise to help policy practitioners across the world learn about innovation policy design, implementation, measurement and evaluation, and it will also identify the good-practice solutions that are most appropriate for a specific user context.

The technology stack we suggested was, obviously, Drupal; Drupal was a suggestion that we had made. The other parts of the stack were EasyRdf; Raphael and js-mindmap for visualizations; Apache Solr for search; and Webform for surveys. Another part was bookmarks. We'll get into the details of each of these and how we used these modules.

So let us break down the problem first. The first one is being unable to locate the exact policies that are relevant. The first part of the solution was improving the navigation. So how do we do that? Different people have different styles of searching for data. For example, one person might use search, while another would feel better with visualizations. So instead of deciding on one type of navigation, we came up with alternative navigations, so users can start with whatever they are comfortable with. Some users come to your website for a particular need. Let's say a practitioner from Karnataka, India wants to know something very specific. When he comes to the website, he knows what he wants, so we need to give him a way to get directly to the document that matters most to him. But at times you have somebody who is looking for general data. They just want to know the current scenario: what are the policies, and how do they affect them?
How can they make decisions based on that data? This person also needs a way to start with the general picture and then go step by step: once he makes a decision, he goes to the next step and gets more and more specific.

So let's see what we did. The first was browse by topic, the next was browse by country, and then we had a hierarchical navigation and a visual navigation. When I say browse by topic, you can see the topics here. For example, some people concentrate only on measurement: when you have a policy, how do you measure its effect? Some people are particularly concerned with governance, public policy governance, and how to go about it. Some concentrate on the ecosystem. Let's say somebody is looking only at financing innovation: in developing countries there is an innovation policy, so how do you finance it? If somebody has a plan, how do they go about financing it? This person knows what he wants, so he can go straight to that topic and select it. That was one way of navigating.

Another is by country. Most of these policies are specific to countries; what is good in one country may not hold in others. So when somebody comes to the site, they want to see the policies continent-wise and then country-wise.
What we had here was a map-based navigation: users first filter by continent, then they see the individual countries, and when they click on a particular country, they get a country landing page. The country landing page again links to all the topics and everything else, so there are different entry points here.

Then there are the policy documents themselves. Most of them are not single-piece documents; they have hierarchical structures, for example policy goals and means, or the policy-making context. It's a huge amount of data, so once users have an entry point, we made sure they get a hierarchical navigation: there is a basic description here and a book-like navigation on the right, so they can go to a particular topic, get an idea of all the contents of that policy, and then start drilling down. Even with these two navigations, the client felt it might not be sufficient: the number of clicks would increase, and users may not know what they're actually looking for. So this was a suggestion that we made for them.
It was a graphical navigation: you start with a particular topic, and all its subtopics branch out from it. For this we used Raphael.js. One of the challenges was that there were more than a thousand topics already tagged on the website, and the approach we initially used fetched all the tags at once. That overloaded the visualization and took a lot of time, so we broke it down into layers using Ajax fetches: when you are at a particular topic, it fetches only the topics at the next level, and when you click on one of those, it goes one level further and fetches its topics.

We made that suggestion and did the implementation, but we still had a challenge: a single level could have around 50 to 60 subtopics, which is very confusing for users. We tried various visual navigations, but none of them worked in this context. So we sat with the client and suggested that, for ease of navigation, each level should have only around 10 to 20 topics at most. Wherever something was not specific enough, we created a new topic and moved the related items into it. That way each level of the navigation had only around 20 to 25 topics at most, which looks good visually and is easier to navigate.

Another issue, as I told you, is that there are many policies, the documents are huge, and it's difficult to get the context right. This was very critical for them.
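The layered topic loading described above can be sketched in Python, independent of the JavaScript visualization the site actually used. This is a minimal sketch under assumptions: `TOPIC_CHILDREN`, `fetch_children` and `LazyTopicTree` are hypothetical names, a dict stands in for the Ajax endpoint, and the cap of 20 children per level is only a default taken from the range mentioned in the talk. It shows two ideas: expand one level per click and cache it, and flag levels whose fan-out exceeds the agreed limit so they can be split into new topics.

```python
# Hypothetical topic store: parent topic ID -> IDs of its direct children.
# On the real site this sat behind an Ajax endpoint; a dict stands in for it here.
TOPIC_CHILDREN = {
    "root": ["measurement", "governance", "ecosystem"],
    "ecosystem": ["financing-innovation", "skills"],
}

FETCH_LOG = []  # records every simulated Ajax request, to show what was loaded


def fetch_children(topic_id):
    """Simulate one Ajax request that returns only the direct children of a topic."""
    FETCH_LOG.append(topic_id)
    return list(TOPIC_CHILDREN.get(topic_id, []))


class LazyTopicTree:
    """Load the topic tree one level at a time instead of all topics up front."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: topic_id -> list of child topic IDs
        self._cache = {}     # topic_id -> children already loaded

    def expand(self, topic_id):
        # Only hit the backend the first time a node is expanded; later clicks reuse the cache.
        if topic_id not in self._cache:
            self._cache[topic_id] = self._fetch(topic_id)
        return self._cache[topic_id]


def oversized_levels(children_by_topic, max_children=20):
    """Return the topics whose direct children exceed the per-level cap, sorted by ID."""
    return sorted(
        topic for topic, kids in children_by_topic.items() if len(kids) > max_children
    )
```

Running `oversized_levels` over the full topic map before each content release would list the levels that still need to be reorganized into new topics.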
There was already a search in place. You get a document, but it's huge, and you don't know where the relevant part is. Let's say you are searching for India: you get a document, and the details about India are on page 232. How would you know that? That was the challenge. So the first thing we did was improve the search, in particular with what we called extract fragments. We suggested that instead of a document showing up as a single search result, they create copies of it with context. For example, you have a single document, but the details about India start on page 2. So they create a new content item with a start page and an end page: they tag it with India and say that the details about India run from page 2 to page 3. Similarly, they create multiple content items for other countries or other contexts. In search, you then know that the relevant part starts at one location and ends at another, so we were able to surface more meaningful results. We also had an in-page document viewer, so instead of asking them to download the whole document, we could take them directly to the page that had the context they needed.

That was one part. Another part was that, with so many documents, the data can be tagged in multiple ways. For example, a document has a publisher, but the actor working on that innovation policy can be different from the publisher. Then there are various types of documents, for example working papers and handbooks.
There are also consultancy reports and case studies, and then there are publication dates. When we speak about dates, there are three things. One is the publication date. Another is the year the document covers, since a document can speak about a particular time frame in the past or in the future. The third is shelf life: a policy that was available to you and made sense at one point may not make sense today, so we also had to record until when a particular policy was valid. In the search, we made sure all these options were available. Over time, we looked at what people were searching for and, based on that, placed the filters people use most at the top, while still offering all the filtering options so they can filter by any of them. Another general challenge with lots of documents is that the data inside the documents also has to be indexed. We used Apache Solr for the general search integration, and we also needed to crawl through the documents.
We used the Tika integration for that, so that people search not only the actual content on the website but also what is inside the documents, and that shows up in the search results too. This was the fragments part I was just discussing: you have a fragment start page and a fragment end page, and for the same document they can create multiple fragments. We made sure the document itself is not duplicated, while they can create multiple versions of it based on the context.

We did all this, but after testing it for some time, they felt the search results were still not specific enough. So we suggested: think like an explorer. We told them, let's say you're searching for a particular string; which documents would you like to come up in your results? We gave them a text field in the content type and said: fill it with the keywords you would search for if you wanted to reach this document. They could add multiple entries there, thinking like an end user about which keywords they would use. They started populating it, and we made sure that in Apache Solr this field gets the highest priority, followed by the title and then the other fields.

Another particular scenario in this project was that people looking for policy-related documents generally have a fixed requirement: they are looking for a particular policy, and they keep coming back to refer to the same things.
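Going back to the search setup described above (fragments with start and end pages, curated keywords ranked first, and filters that include a policy's validity), here is a minimal sketch of how fragments might be indexed and queried in Solr. Assumptions: the field names `search_keywords`, `title`, `body` and `valid_until`, the boost values, and the use of the edismax query parser are illustrative; the talk does not say how the Drupal Solr integration was configured. `qf`, `fq` and the `[NOW TO *]` date range are standard Solr query syntax.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Fragment:
    """A context-specific slice of a larger document; the file itself is not copied."""
    document_id: str
    context: str      # e.g. the country this slice is tagged with
    start_page: int
    end_page: int


def to_solr_doc(fragment, text):
    """Turn one fragment into its own Solr document that points back to the shared file."""
    return {
        "id": f"{fragment.document_id}:{fragment.start_page}-{fragment.end_page}",
        "document_id": fragment.document_id,
        "context": fragment.context,
        "start_page": fragment.start_page,  # lets the in-page viewer open at the right page
        "end_page": fragment.end_page,
        "body": text,                       # only the text of these pages, e.g. as extracted by Tika
    }


def build_query(user_query, filters, only_current=True):
    """Build edismax parameters: curated keywords outrank the title, which outranks the body."""
    params = [
        ("defType", "edismax"),
        ("q", user_query),
        ("qf", "search_keywords^10 title^5 body"),
    ]
    for field, value in filters.items():
        params.append(("fq", f'{field}:"{value}"'))
    if only_current:
        # Hide policies whose shelf life has ended.
        params.append(("fq", "valid_until:[NOW TO *]"))
    return urlencode(params)
```

Indexing each fragment as its own Solr document is what lets a hit land on the relevant pages rather than the whole file, while every fragment still references the single shared upload through `document_id`.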
For these repeat visits, we did two things. First, since they keep searching for the same keywords, we gave them an option to bookmark a particular search query. They know which queries they generally use, so they can bookmark them, and tomorrow, even if the data changes, they still have the query and can quickly rerun it while the results keep updating. We also had bookmarking options within the search results themselves: on the results listing page, instead of going to each result, if three interesting results show up for a query, they can bookmark all of them and revisit them later.

Second, if a particular policy document you have drilled down to makes sense for you, it might make sense for many more people in similar scenarios. So we had a bookmark manager for organizing that data. It's not just bookmarking: users can create folders, move bookmarks between them, and make them private or public. If they feel something can be useful for others, they can make their bookmarks public, so other users can visit their profile and see which policies might make sense for them as well.

One of the challenges with bookmarking in Drupal was this: we used the existing bookmark module, which could bookmark particular content items and entities, but, as I said, we also wanted to bookmark search results. So bookmarks could not be based on a particular content item or entity; they had to be based on pages.
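Because bookmarks had to point at pages rather than entities, keying them by a normalized URL is one way to let a saved search query and a saved document share the same mechanism. This is a minimal sketch, not the patched module itself: `BookmarkManager`, `normalize` and the folder and visibility model are hypothetical and only mirror the features described (folders, moving bookmarks, private or public lists).

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url):
    """Sort query parameters so the same search saved twice maps to one bookmark."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


@dataclass
class Bookmark:
    url: str
    title: str
    folder: str = "Unsorted"


@dataclass
class BookmarkManager:
    owner: str
    public: bool = False                           # owner chooses private or public
    bookmarks: dict = field(default_factory=dict)  # normalized URL -> Bookmark

    def add(self, url, title, folder="Unsorted"):
        # A page URL works for nodes and search result pages alike.
        key = normalize(url)
        self.bookmarks[key] = Bookmark(key, title, folder)
        return key

    def move(self, url, folder):
        self.bookmarks[normalize(url)].folder = folder

    def folders(self):
        """Group bookmark titles by folder."""
        result = {}
        for bookmark in self.bookmarks.values():
            result.setdefault(bookmark.folder, []).append(bookmark.title)
        return result

    def visible_to(self, viewer):
        # Other users see the list only if the owner has made it public.
        if viewer == self.owner or self.public:
            return list(self.bookmarks.values())
        return []
```

Sorting the query parameters is an assumption on top of what the talk describes; it simply avoids storing the same saved search twice when parameters arrive in a different order.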
So we made some alterations to the module, so that when you bookmark, you bookmark the page rather than a particular content item, and we contributed that back.

The most interesting part is this: the integration between various portals. The World Bank and the OECD were the major partners, but they also interact with many other organizations that may not know the platform we're using. The project ran for around four months, and for almost three of those months we did not have the content available. Most of the work had to be done on the assumption that content would come in a particular format, so we had to suggest the format they would use to import and export data. We wanted a common format, that's the first thing, and it had to be future-ready: today it could be one platform, tomorrow it might be something else, and we might have to talk to other formats. So we had to decide on a format that would make sense, and we chose RDF for site building. You might ask: why go for RDF?
Why not stick to the CSV format we generally use? Before deciding on RDF, we considered two things: one is the data import and export itself, and the other is that even if the data is not imported into any platform, it should still make sense on its own. RDF, the Resource Description Framework, is not just data; it also lets you express the relationships between data. For example, if you have content and a particular vocabulary, you can state that this vocabulary is used to tag the content, and in what way. That's why we chose RDF.

One interesting thing is that we used RDF not just for the content of the website but also for the complete settings of the website. For example, there were various search filters, and even which filters should be available, and in what position, came from the RDF itself. So somebody who doesn't know Drupal can still change the settings without knowing the Drupal UI: if they know the RDF format, they can change the entries, re-import the RDF, and the configuration on the Drupal website changes. Specifically, we used Turtle; this is the interface we used. When you're building the whole site out of Drupal, the first thing is to decide the content types and vocabularies, so you import the settings related to those first, and then, based on that, you import the content. So the site is dynamic at two levels.
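To illustrate settings as data, here is a sketch that derives the search filter list and its order purely from subject-predicate-object triples, the kind a Turtle statement such as `ipp:field_country ipp:filterPosition 1 .` would yield once parsed (assuming an `ipp:` prefix is declared). The `ipp` namespace URI and the `useInFilter` and `filterPosition` predicates are hypothetical; the actual IPP schema vocabulary is not given in the talk, and a real importer would parse the Turtle with an RDF library (the project used EasyRdf) rather than hard-code tuples.

```python
# Hypothetical namespace for the site's own schema terms.
IPP = "http://example.org/ipp/schema#"

# Parsed triples: (subject URI, predicate URI, object literal).
TRIPLES = [
    (IPP + "field_country", IPP + "useInFilter", "true"),
    (IPP + "field_country", IPP + "filterPosition", "1"),
    (IPP + "field_topic", IPP + "useInFilter", "true"),
    (IPP + "field_topic", IPP + "filterPosition", "2"),
    (IPP + "field_publisher", IPP + "useInFilter", "false"),
]


def objects(triples, subject, predicate):
    """All objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def search_filters(triples):
    """Return filter field names in display order, derived only from the triples."""
    subjects = {s for s, p, _ in triples if p == IPP + "useInFilter"}
    enabled = [s for s in subjects if objects(triples, s, IPP + "useInFilter") == ["true"]]

    def position(subject):
        values = objects(triples, subject, IPP + "filterPosition")
        return int(values[0]) if values else 10**6  # unpositioned filters go last

    ordered = sorted(enabled, key=lambda s: (position(s), s))
    return [s.removeprefix(IPP) for s in ordered]
```

Swapping two `filterPosition` values in the RDF and re-importing is enough to reorder the filters, which is the point made above: the configuration changes without anyone touching the Drupal UI.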
It's not just one level: you define the content structure, and you import content based on those definitions. So we separated out a schema.rdf, which contains all the details about the entities in the Drupal site, including content types, vocabularies and search settings. After that import is done, users can import the content based on those settings. Within RDF we chose Turtle in particular, and the major reason was that it's very human-friendly and readable. When you speak, you use a subject, a predicate and an object, and Turtle uses a very similar syntax, so you can see the relationship between two things. Also, the subject and the predicate, and often the object as well, are expressed as web URIs, so each can be accessed from outside as an independent entity.

One challenge was deciding on the format of the RDF files; I'll run you through a few sample RDF files we used and explain the format. Another was that if something went wrong, it was very difficult to identify. So logging was very important: we made sure there was a separate interface where they could check the logs of all imports, because any problem while importing the schema RDF in particular would affect the content imports as well. Let me just open a few RDF files. As you can see here, the first thing was to give them some common terminology, so that even somebody who doesn't know the terms used in Drupal can still create RDF files that we can import.
Initially you have the definitions of all the terms we use, so that somebody who doesn't know Drupal can look at this RDF and understand the various things here. For example, you have node, which is a class, and vocabulary, and for each of them we had to state whether it's a class or a property, so that it becomes very clear. Then you can look at each of the fields: whatever you have as a property here is described to Drupal, so that on import it knows what those are. Field name is a property, field type is a property, vocabulary is a property, and whether a particular field is used in a filter is a property. Then there are a few properties used to describe the nodes. We had common terms like publication date; each of the things you see here is a field that defines the nodes. Publication date is one. Then there is whether the item is new in the IPP portal: since data comes from various other portals, a record may be new in IPP or may have been imported from elsewhere, so that had to be discussed. Then there is, for example, a document ID. These are the various fields defined in schema.rdf.
While importing schema.rdf, there are various things to consider. The first is the field being imported: it may or may not already exist on the site. If it doesn't exist, we create the field; if it already exists, we make sure we attach it to the particular content type where it is going to be used. The major challenge we faced was related to references. Sometimes the referenced content is imported later, the same problem you generally face in migrations, so we followed a similar strategy and created placeholders during import. Let's say a vocabulary term is referenced here but hasn't been imported yet: a placeholder is created. It doesn't show up in the user interface, but when you import the dependent data, it starts showing up.

Regarding the content: they previously had a static site, mostly HTML, so we had to scrape the data from the HTML and import it into Drupal. The content contained internal and external links, and identifying them was another challenge. An internal link has to be entered as a reference, so that it becomes easier to navigate and filter. So we looked at each URL and checked whether it was already on the site; if it was, we ran additional code to convert it into a reference, and if it wasn't, we kept it as a link, just like normal external links. These are a few other RDF files, for example our document RDF.
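The two reference problems above, terms referenced before they are imported and scraped links that may point back into the site, can be sketched as follows. This assumes a term registry keyed by the client's unique key and a lookup table of already-imported paths; `TermRegistry`, `classify_link` and `SITE_HOST` are hypothetical names, and the host is a placeholder, not the platform's real domain.

```python
from urllib.parse import urlsplit


class TermRegistry:
    """Track taxonomy terms by unique key, creating hidden placeholders for forward references."""

    def __init__(self):
        self.terms = {}  # key -> {"name": str or None, "placeholder": bool}

    def reference(self, key):
        # Content may point at a term that has not been imported yet: reserve it.
        if key not in self.terms:
            self.terms[key] = {"name": None, "placeholder": True}
        return key

    def import_term(self, key, name):
        # Importing the real term fills the placeholder in place, so existing references stay valid.
        self.terms[key] = {"name": name, "placeholder": False}

    def visible(self):
        # Placeholders never appear in the UI until their data arrives.
        return sorted(t["name"] for t in self.terms.values() if not t["placeholder"])


SITE_HOST = "ipp.example.org"  # hypothetical host of the new site


def classify_link(href, known_paths):
    """Decide whether a scraped link becomes an entity reference or stays a plain link.

    known_paths maps already-imported paths to their node IDs.
    """
    parts = urlsplit(href)
    internal = parts.netloc in ("", SITE_HOST)
    if internal and parts.path in known_paths:
        return ("reference", known_paths[parts.path])
    # External links, and internal pages not on the site, stay as ordinary links.
    return ("link", href)
```

Keeping the placeholder and the real term under the same key is what lets content imported early keep its references once the dependent data arrives later.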
As I mentioned, everything you see in the front end, such as the fragment start page and fragment end page, whether it is a setting or content, came from these RDF files. As you can see, these are all the fields related to documents, and they also cover the file import: we had a standard folder where all the files were put, and then we imported all those documents and referenced them in the content. Let me open the source code we used for RDF imports. Sorry, the schema file. Yes, that's right. The schema holds all the settings we have in Drupal, in a form that tells you what to create when you go through these RDF files. For example, you need to differentiate whether something is going to be a content type or a vocabulary, and then which particular fields it will use. For taxonomy terms, you need to maintain the hierarchy as well, that is, whether particular terms have children or not. For all of this, instead of the standard Drupal terminology, we used RDF, so that somebody who doesn't know Drupal can take a look and get an idea of what the content is and what the references between items are.
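Here is a sketch of how one schema entry might be turned into site structure: decide whether the entry is a content type (`node`) or a vocabulary, map each declared field type to a Drupal field type, create a field only if it does not already exist, and collect the fields flagged for the search filter. The dictionary layout of `entry`, the `FIELD_TYPES` mapping and the `Site` model are assumptions for illustration; the target names are Drupal-style labels, and the real field type IDs depend on the Drupal version.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from the schema's "field type" values to Drupal-style field types.
FIELD_TYPES = {
    "text": "string",
    "long_text": "text_long",
    "date": "datetime",
    "term": "entity_reference",
    "file": "file",
    "boolean": "boolean",
}


@dataclass
class Site:
    bundles: dict = field(default_factory=dict)        # bundle -> {"kind": ..., "fields": set()}
    field_storage: dict = field(default_factory=dict)  # field name -> field type


def apply_schema_entry(site, entry):
    """Create or update one bundle from a schema entry; return the fields used as search filters."""
    kind = entry["class"]  # top-level class from the schema: "node" or "vocabulary"
    if kind not in ("node", "vocabulary"):
        raise ValueError(f"unknown class {kind!r}")
    bundle = site.bundles.setdefault(entry["bundle"], {"kind": kind, "fields": set()})
    filters = []
    for spec in entry.get("fields", []):
        name = spec["field_name"]
        ftype = FIELD_TYPES[spec["field_type"]]
        # Create the field only if it does not exist yet; otherwise just attach it to this bundle.
        existing = site.field_storage.setdefault(name, ftype)
        if existing != ftype:
            raise ValueError(f"{name} already exists as {existing}, not {ftype}")
        bundle["fields"].add(name)
        if spec.get("use_in_filter"):
            filters.append(name)
    return filters
```

Raising on a type mismatch, rather than silently reusing the field, is a design choice that surfaces schema errors early, which matters given how hard the talk says import failures were to diagnose.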
Yeah. When you're using RDF, there are various standards you can use, either the basic ones or others. For example, prefLabel (preferred label) comes with SKOS. Instead of creating your own terminology, you can use existing terms, so that somebody who is comfortable with SKOS knows what they mean: in Drupal terms we would call it the title, but SKOS calls it the preferred label, and since it's widely used, you know it's the label for that particular content. This particular field is "is new in IPP", and as you can see, it's pretty readable: you define what it is and then its field name, which the client can enter, and then its field type. For field types, we gave them the list of entries they could specify, matching the fields available in Drupal, and each is imported as that particular field. For example, if they set the field type to date, it will be imported as a date field in Drupal when the settings are imported. We also have descriptions, for example the possible values for a date field, such as start date and end date. Then there is whether this particular field is going to be used in the search filter or not. This is the high-level description of what is used: whenever you're importing an entity, what is it going to be? In our case it was one of these classes. You have node, which internally means content types, but we call it node. Then we have content pages.
These content pages had HTML content that had to be imported. Then we had documents. For topics, they had top-level topics and their sub-children, which had to be differentiated, and then there are sector and region. These are the top-level classes we used, and within them you have the fields, defined here as information architecture properties: for example, has sector and has country, and each of these again has its own definition. One more thing about the settings: they didn't have to be in a single schema file; they could be imported in parts. The advantage is that you can break them down, and tomorrow, if there is a new iteration, you can import just whatever has changed instead of re-importing everything.

We have the same definitions, but at the schema level we just define what the vocabularies are, so there are two entries: one for sectors and one for topics. Whatever you see here are imports into that particular vocabulary. For sectors you have this import: for example, agriculture and innovation would be one term, and these are the other terms. Then you have the topic imports here, with properties such as whether it's a top-level topic or not; these are converted into Drupal fields and terms. If you have a lower-level term that needs to be imported, you state, for example, that it is a subtopic of Public R&D, which is referenced here.
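Cross-referencing by a client-supplied unique key, with re-imports overriding rather than duplicating, can be sketched by deriving a deterministic UUID from that key. The talk only says UUIDs were used internally and that a repeated key overrides the earlier entry; deriving them with `uuid.uuid5` under a hypothetical namespace, and the `TopicStore` class, are assumptions. A side effect of the deterministic scheme is that a subtopic can reference its parent (for example, a subtopic of Public R&D) before the parent itself has been imported, which also suits importing the schema in parts.

```python
import uuid

# Hypothetical namespace used to turn client-supplied keys into stable UUIDs.
IPP_NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/ipp/")


def key_to_uuid(key):
    """Same client key -> same UUID, on every import run."""
    return uuid.uuid5(IPP_NS, key)


class TopicStore:
    def __init__(self):
        self.by_uuid = {}  # UUID -> {"key", "label", "parent"}

    def upsert(self, key, label, parent_key=None):
        """Insert or override a topic; the parent is referenced by its key, imported or not."""
        tid = key_to_uuid(key)
        parent = key_to_uuid(parent_key) if parent_key else None
        self.by_uuid[tid] = {"key": key, "label": label, "parent": parent}
        return tid

    def children(self, parent_key=None):
        """Labels of the direct children of a topic (top-level topics when parent_key is None)."""
        parent = key_to_uuid(parent_key) if parent_key else None
        return sorted(t["label"] for t in self.by_uuid.values() if t["parent"] == parent)
```

Because the UUID depends only on the key, re-importing a file that reuses a key updates the existing topic in place, matching the override behavior described in the talk.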
Based on that, we cross-referenced them, using UUIDs to keep the IDs unique. Internally we used UUIDs, but when the client provided data, this was the format they used, so we told them which keys had to be unique, because the references are made based on those keys. For example, if you have another entry with the same ID, it overrides the existing one. For each entity, schema.rdf itself defines which unique key is used, and based on that they generated this content. As I told you, we also had content coming from outside, so when they passed an RDF file, we also had to identify where the content had to be fetched from. It could be one website, cable.com, and there were similar other websites the content was fetched from. We took this in dynamically as well, so that we could directly ping that URL, fetch the HTML, process it, and re-import it.

This was the team that worked on the project: Mohan Sunkara, who is also the founder of our company, was the account manager; Venkatesh Goteti was the digital head; I was the project manager and technical architect; Chakrapani was a technical architect; and the developers were Shyam Kumar and Shri Hirasham. That's it from my side. If you have any questions, we can take them. Okay, thank you, guys.