Hi guys, welcome to our session. I hope you guys are having a good time at DrupalCon. There's still about a couple of days left, so make the most of it. Okay, so jumping right in, I'm going to quickly introduce ourselves. I'm Devanju, I head business at OpenSense Labs. You guys can call me Dev, but I am not Dev, I am Biz; he's Dev. Okay, so a few of you laughed, but I had this slide just to empower me. So, handing it off now. Hi, I head architecture at OpenSense Labs, and one of our key focus areas is to bring in technology that is outside the periphery of PHP and Drupal, and use it to make better digital experiences. That's where NLP, and that's where IPFS, was something we picked up, right? Awesome. Anyone who can take a picture, beer is on me. So the agenda of the session will be very simple. We'll first understand the current problems the internet is dealing with, and what the common thread connecting those problems is; then we'll put a possible solution out there via IPFS, and then connect Drupal and IPFS and think about that. So just to be clear, the agenda of this session is to bring the community together to be more accepting of technologies that are outside Drupal as well. Okay, so starting off with a quote: "There is more than one way to burn a book, and the world is full of people running about with lit matches." That was said by Ray Bradbury, author of Fahrenheit 451. He meant it in both a literal and a metaphorical sense: there will always be people who are resistant to accepting new technologies, but we as an open source community should be more accepting, and see what the possibilities outside of Drupal are and how to amalgamate them. Okay, so these are the three network topologies that we essentially deal with. How the internet started out was essentially a distributed system itself, on the right-most side, where the client and the server were handled by a group of people or a single person.
So it made more sense to share data and files in a much more seamless way. How the web has progressed is towards a centralized system, where a few organizations or corporations control how we access our information, which basically means that if that central node were to go down, all that access to information would go away. And considering we are now at that centralized system, we want to shift back to either a distributed or a decentralized model. So what are the current problems with the centralized system? One is of course a single point of failure. To give you perspective: Google, about three to four years back, went down for about five minutes, which brought down web traffic by about 40 percent. In Jan 2011, Egypt was able to shut down 88 percent of its web traffic, because they control the logistics behind it. That's the sort of power those central nodes have. Censorship, again: Russia has anti-extremist laws covering what you can post or share, and there are a lot of countries, like China, Bahrain, Israel, that control the sort of information you can access. There are banned keywords in China. So censorship is another thing pulling us away from the centralized system. Impermanence: everyone's aware of error 404. It's fine if you are putting away and archiving information with consent, but what happens if those central nodes go down and your users are served with this massive error message instead? Bandwidth and latency, multiple times over. According to me, this is one of the most important failure points of the way the current internet has panned out. With the advent of low-cost smartphones, the number of people who can access the internet is growing exponentially, but the logistics behind it are not growing at the same rate, which is causing a lot of latency.
For example, if I was to ping a server in the Netherlands from New Delhi, it takes about 200 milliseconds. If I was to do it to Iowa from New Delhi, it's about 350 milliseconds, and that time is eventually going to rise, because the number of people accessing that kind of information will grow at an exponential rate. Which brings us to another problem: bandwidth. For example, I have to download a one GB movie file, and Vid here has to do the same. So we are collectively using two GB, and if all of you want to do the same, multiply the bandwidth accordingly. Now, if I already have that file on me, he can just access it from me, versus doing multiple network hops to get that information in. And if multiple devices are, let's say, accessing that central node to get that information, well, the channel is restricted, so it becomes slower for you to access those files. That's another problem we're dealing with: offline. We're still living in a world of online and offline at the end of the day. That central node, and I keep repeating the words central node because it's a centralized system we're dealing with, if that goes offline, you're not able to access that information. For example, if that one GB file was on that server, I'm not able to access it; but if he has it, why can't I use that? Why can't we use it together? Security: we think of security as encryption of data, but what we're actually doing is encrypting the route to access that data, not the data itself. If we were to encrypt the data itself, that basically means either we'd have to be security experts or we'd have to hire one. We have to give that power to ourselves, so we are able to hide the kind of information we want to hide and share whatever is relevant to us. So those are the problems that we're dealing with. Okay, so what's the common thread that connects all of these problems?
It's location-based addressing. If I was to access a file on my own website, I'd have to do multiple network hops. I'm sitting in New Delhi, okay? So this does a network hop to Europe first, then somewhere in the US, and a couple of cities in the US itself, to access that file, which means I'm accessing a physical location. Why can't it be something much simpler? So what's the solution? Content-based addressing. It basically means taking that file and passing it through a hash function, which returns the hash of the file itself; it's almost like a digital fingerprint of the file. This is essentially what Git uses to track changes in repositories as well. We address the content itself: it's content-addressable, right? Which brings us to IPFS. IPFS is essentially a protocol and a peer-to-peer method of storing and sharing hypermedia in a distributed file system. It started out with a mission, a vision, to replace HTTP itself, which I personally, and that's my personal opinion, think will happen, because the internet has evolved to a certain extent, but HTTP and HTTPS have not, from a content perspective. So, advantages. No content duplication: if I'm, let's say, uploading a file to IPFS, it returns a hash, right? And if he was, let's say, sitting in another part of the country and he uploads the same file, it will return the same hash value, which basically means I cannot have duplicate data on the IPFS network. Integrity: because the data structure behind it, which we will cover later, is the Merkle DAG. If, let's say, a pixel in that image file was to change, it would change the entire hash value itself, which basically means I will be able to detect if my data has been tampered with.
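The idea of content-based addressing can be sketched in a few lines of Python. This is a deliberately simplified illustration: real IPFS wraps the digest in a multihash and encodes it (for example in base58) to produce a CID, but the principle is the same — the address is derived from the content, so identical content gets an identical address and any tampering changes it.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Toy content address: the SHA-256 digest of the bytes themselves.

    Real IPFS uses a multihash-wrapped, base-encoded CID rather than a raw
    hex digest, but the property illustrated here is identical.
    """
    return hashlib.sha256(data).hexdigest()

a = content_address(b"the same movie file")
b = content_address(b"the same movie file")   # same content, uploaded anywhere
c = content_address(b"the same movie file!")  # one byte changed

print(a == b)  # True  -> no duplicates: same content, same address
print(a == c)  # False -> any tampering yields a different address
```

This is exactly why two people in different parts of the country uploading the same file get the same hash back, and why a single changed pixel is detectable.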
There's, of course, high performance, because now you don't have a dependency on central nodes, but on nodes around you, which means you will be able to access that information much faster. Cheaper hosting: people who host a lot of data have to pay thousands of dollars; IPFS becomes a more cost-effective way to distribute data internally as well. Censorship resistance, again: there's no central dependency, so nobody can block keywords. You can post as much content as you want, and keep it censorship-free. Access to offline data: you're not dependent on central nodes that can go down; you will always have nodes around you. As of now, there are a couple of thousand, maybe 3,000, IPFS nodes that are active, and you can host data all over the world with those. So yeah, okay, Jonah Hill always saves the world. Okay, so who else is using IPFS? uPort is basically an identity system that uses IPFS to create an identity for you. The Akasha Project is a social network that again uses IPFS for censorship-free content posting. OpenBazaar is essentially a marketplace where buyers and traders can transact online without any platform fees; they use IPFS for the distribution itself. Argo, to put it simply, is Dropbox on IPFS, yeah? Okay, so this is where Drupal comes in. Hi. So we have seen what IPFS is about: it's a distributed file system built over the core technology that blockchain has, but it is meant for sharing files on a decentralized system. And there's a common philosophy: Drupal is all about openness and content and giving editors power, right? And IPFS is about maintaining the integrity of that content and keeping it immutable, right? It appeared very natural that these two things have to come together. It might be very early stage at this point in time, but they will come together to probably solve bigger problems, right?
So let me quickly go through the internal data structure behind IPFS. This is the Merkle DAG; it's an acyclic graph structure where each file is split into multiple blocks, each block has its own hash, and the combination of the child hashes is used to generate the parent's hash. So in this example, as they've mentioned, if you change one of the blocks at the last level, try to modify it, it will actually change the topmost parent's hash itself. That ensures integrity, and it also handles redundancy of files. This is similar to behavior you may notice on S3: if you upload two exact files, S3 on AWS handles the redundancy internally. Not sure if they're using a Merkle tree or a Merkle DAG, but something like that happens. Yes, so then we started brainstorming what the possible use cases of IPFS around Drupal, or websites in general, would be. First, as a backend storage system: in the Drupal file system, if we create two file entities with the same image and upload them into different media entities, we have duplicate copies on the file system, right? With something like IPFS, or the algorithm that the Merkle DAG provides, we could avoid that. This is not a problem for small sites, but when you're storing large amounts of content behind a DAM, it actually is a problem. That's where these technologies inspire us to use this method of creating signatures using hashes to handle redundancy. So that could be one use case in the space of the internet and websites, right, applications. Then version control: we already mentioned Git uses a similar, say, Merkle tree algorithm, right? But think of a version control system for images and large video files behind a DAM: if you upload similar video files, we could use a versioning system which does not actually create two full copies of the same video file that has slight changes, right?
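The "change one block at the bottom, the top hash changes" behavior described above can be sketched with a tiny Merkle tree. This is an illustration, not IPFS's actual chunking: a real IPFS DAG uses variable-size chunks, links with names, and protobuf-encoded nodes, while this sketch just hashes leaf blocks and then hashes concatenated child hashes upward.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(blocks):
    """Hash each leaf block, then hash pairs of child hashes up to one root."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]                # a trailing node may be unpaired
            parents.append(h("".join(pair).encode()))
        level = parents
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
root1 = merkle_root(blocks)

blocks[3] = b"block-3 with one pixel changed"    # modify a single leaf block
root2 = merkle_root(blocks)

print(root1 != root2)  # True: any leaf change propagates to the topmost hash
```

That propagation is what gives both integrity (tampering is detectable at the root) and deduplication (identical blocks hash to identical identifiers).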
So that could save on storage and give us a better versioning system for large media assets. And CDN: this looks to be the most promising and interesting one. We had this problem of network hops: accessing a server based in the US while we are accessing from India or some remote location. CDNs solved that problem up to a point, by replicating the assets to, say, eight or ten data centers so you can access the nearest one. IPFS can take the solution to the next level, where it's not only eight or ten data centers but a full IPFS network of 2,000 or 5,000 nodes across the world, right? So I think that's one of the very interesting possibilities for the very near future. Then we brainstormed some more: okay, how do we give the power that Drupal provides to content editors? IPFS is still a technology which only very tech-savvy people can actually leverage, and that's the beauty of Drupal: giving editors and content creators the power to manage and handle content. So this is a small blueprint that we made, and there's a small demo based on it just after this. It's really simple: editors create content on Drupal, we push the node object as JSON to IPFS, and we maintain the mapping of entity ID to the hash returned by IPFS within Drupal, okay? And that mapping is then used to fetch the content for users. Now there are a lot of problems with it. This is not something which will solve a serious major business problem right away, but it's a good starting point. So yeah, I have a small demo right here. We made a small module which interacts with IPFS. What we're doing is uploading three images on three different nodes; two of them are the exact same image, and they are sent to IPFS. I'll skip the upload of the three images and just show you the final result.
So yeah, this is where we uploaded two exact images: the entity IDs are different, but the hash returned by IPFS is exactly the same. If you notice the 30th and 32nd ones, yeah. So this is what I was referring to, okay? This is a small demo where we're trying to use this with the media system of Drupal. And the challenges: to make this actually usable in real-world scenarios, there are certain problems which need to be addressed; as I said, it's at an early stage. Searching in IPFS: the only way to address the content is using the hash that it generates, right? It's not a full-fledged database, it's a distributed file system in the end, right? So we cannot actually search based on a keyword or access the content in any way apart from, say, the hash as a primary key; that's it. So that's a problem. Second is complex content relationships: the flexibility that Drupal provides to reference content, to create a complex hierarchy of content pieces, right? That cannot be handled very easily on IPFS, but that is something that might be solved moving forward. IPFS as a Views backend: we love Views, everyone does, and that is something which Drupal provides to end users to build things quickly. But again, the same problem we have with searching: the only way to access files on IPFS is the hash. I don't see a straightforward way of doing it, but again, this is a problem which needs to be resolved. And orphaning of content on IPFS is very easy. For example, in the previous blueprint that we discussed, if we somehow lose the mapping of entity IDs to hashes, the content which is already on IPFS is orphaned. There's no way to find it, unless someone else uploads it again and gets the same hash, right?
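The entity-ID-to-hash mapping and the duplicate detection shown in the demo can be sketched as follows. This is a toy, assumption-laden illustration: `ToyIpfs`, `entity_to_hash`, and the byte strings are all hypothetical stand-ins (the real module would call the IPFS HTTP API, and the mapping would live in Drupal's database), but the dedup behavior — entities 30 and 32 resolving to one stored blob — is exactly the property demonstrated.

```python
import hashlib

class ToyIpfs:
    """Stand-in for an IPFS node: a content-addressed blob store (illustration)."""
    def __init__(self):
        self.blobs = {}

    def add(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self.blobs[cid] = data        # same content always lands on the same key
        return cid

    def cat(self, cid: str) -> bytes:
        return self.blobs[cid]

ipfs = ToyIpfs()
entity_to_hash = {}  # the entity-ID -> hash mapping Drupal would maintain

# Three "nodes": entities 30 and 32 carry the exact same image bytes.
for entity_id, image in [(30, b"photo.jpg bytes"),
                         (31, b"other.jpg bytes"),
                         (32, b"photo.jpg bytes")]:
    entity_to_hash[entity_id] = ipfs.add(image)

print(entity_to_hash[30] == entity_to_hash[32])  # True: duplicate detected
print(len(ipfs.blobs))                           # 2: only two blobs stored
```

Two entity IDs, one stored blob: the storage layer deduplicates for free, while Drupal keeps its own IDs and relationships on top.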
So these are a few of the problems that need to be addressed before it becomes a real problem solver. But yeah, other use cases, not necessarily around Drupal but in general: for archivists, if you want to store a large amount of data and make it permanently available, yes, IPFS is a good solution; it's cheap to store on, and the performance is pretty good. If you're delivering large amounts of content or media to users, as in the CDN example we discussed, IPFS is a good solution; it's cheap, and you actually save a lot of money on bandwidth. Researchers are always running short of funds, and hosting a large amount of research data is a problem; they need to access it really fast and in a cost-effective way, so for them it's a good approach, I think. And content creators, in the spirit of the web, want to keep content out there, uncensored and at a low cost. I think for them it's a good start. Yeah, so, Dev. Okay, so now we've understood how Drupal can amalgamate with IPFS itself. We're running at almost a threshold where the cost of producing content will go much higher than the cost of delivering content, and Drupal is content-centric. So if there's an alternative to avoid that, I think IPFS becomes inevitable in that sense. Again, like we said, for content creators, you get to empower yourself about the sort of content you want to share, the sort of content you want to talk about. Because it's content-addressable and we're using content signing, DDoS attacks become impossible, because there's no physical location to target with those attacks. So you save money on security measures as well, because there is no origin server. And there's a lot of internet penetration happening in developing countries.
For them, it becomes much easier and faster to access the web itself, okay? So now we as a community need to come together and think about what other possible use cases we can add on, okay? And that's about it from us. I think we wrapped up exactly in time. Yeah, okay, awesome. Thanks. Just thanking a couple more people who are not here but helped us put this demo and this stuff together; they are four people from our team. And I just met this gentleman from Wipro; they're actually working on putting Drupal and Hyperledger together. They're having a BOF, at BOF room 4 at 3:45 PM, I guess. So if you're interested in taking that discussion further, that is a good point to start. Okay, join us for the contribution sprints on Friday. Yes, definitely. And you can always give us feedback. Yes, we can open up to Q&A. Yeah, okay, awesome. Guys, any questions? Or we can, okay, oh yeah, okay. Nice work guys, thanks for presenting this. Is this module available for the community to try? Yes, it is. Okay, could you mention that on your slides somewhere when you post them on your session page? All right, perfect. So when we upload the deck, we'll upload it with the module. Perfect, I have a question. So I think you guys are describing this file system from a public standpoint; things are open. How would something like this work when there needs to be access control? I'm a researcher, so this is very interesting, right? How can something like this work with access control? In some ways you have done a CDN, right, with hashing, right? But what about access control? Because that's the next level. So the signature function, the function that actually generates the hash when you upload a file or any content to IPFS — you can actually use a secret key to encrypt that. IPFS has that possibility. That's encryption, not necessarily access control. So you are saying, oh, our stuff is open, but nobody can get to it.
That's, you know, I don't think many of the PHI data people would like that kind of encryption-level access control. Access control means nobody can get to the stuff at all, right? So it's different. The content is out there; if you keep it non-encrypted, the only way to access it is using the hash key, right? If you encrypt it, then you additionally need the secret key. But ACL is more on the application side, right? And that's where systems like Drupal need to be able to talk to IPFS in a way that provides that flexibility. When I was talking about complex entity relationships, this gets into that. Okay, perfect, thank you. Thanks. Hey guys, thanks. So since you're investigating technologies that are not really business-ready yet, right, like you're looking very far forward: do you have a roadmap, or maybe some next stages? When are you going to build that connection, so that maybe we have something like a CDN or something we can use on our sites in the next six months, a year, or two years? So the demo that we showed was just something we put together two days before DrupalCon, okay? But we have been researching and doing POCs around other blockchain technologies apart from IPFS. IPFS was the one we figured is closest to solving business problems, right? And for example, one of the examples mentioned was something like Dropbox built on IPFS. As for a CDN, we are not actively working on building something like that right now; it is more POC stuff. But I think CDNs should be coming up; someone should be doing it very soon. Okay, great. All right, thanks. I think with the media module, okay, implementing simple versioning of, let's say, the hashes themselves, so you can avoid duplicate large files — that should be the first thing we actually get to immediately after.
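The encrypt-before-you-add idea from this exchange can be sketched as follows. Everything here is illustrative: the keystream is a toy hash-counter construction that is NOT secure cryptography (a real system would use a vetted cipher such as AES-GCM), and the addresses are plain SHA-256 digests rather than real CIDs. The point it demonstrates is the speaker's: the hash then addresses the ciphertext, so anyone can fetch the bytes, but only key-holders can read them — which is encryption, not application-level access control.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream via hashing a counter -- illustration only, NOT secure."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; applying it twice with the same key round-trips."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

plaintext = b"PHI record: patient 1234"
ciphertext = xor_crypt(b"secret-key", plaintext)

addr_plain = hashlib.sha256(plaintext).hexdigest()
addr_cipher = hashlib.sha256(ciphertext).hexdigest()

print(addr_plain != addr_cipher)                           # ciphertext gets its own address
print(xor_crypt(b"secret-key", ciphertext) == plaintext)   # key-holders recover the data
```

Note the limitation raised by the questioner is visible here too: the ciphertext's address is public, so fetching is open to everyone; only readability is restricted, and fine-grained ACLs would still live in the application layer.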
Nice, thanks. Hey, thanks for your good presentation. I had two related questions about the way hashes work in IPFS. The first one is, I've heard a lot of criticism around, for instance, Facebook and other social networks that use hashing for files, because when you delete something, it isn't necessarily ever deleted from the servers: the pointer is deleted, but because they store hashes of everything to prevent duplication, the file is actually still out there. So I guess my question is, one, what do you think about that? And the related one is, if all the files are being hashed, it seems like that, while it would make tampering with the files easy to detect, would also make surveillance of who's receiving those files, or of the file traffic, much easier, because all you have to do is look for the hashes of particular files that you as a government or an agency or an internet company doesn't want someone to access, and all of a sudden the metadata has it all, right? So I just wanted to hear your thoughts on those two. So about the first one, sorry, what was the first one? Like, if you delete something, it's not really deleted. So the guys behind IPFS — currently there is a mechanism where content which is not being accessed ages out; it's still out there, but eventually it ages out and is overwritten at some point. I think they are working on improving this aging-out and deletion process, but it is not something where you can actively say, okay, I want to delete this; you cannot do that. It will age out because it is not being accessed. That's how the IPFS guys are handling it. And in the end, when we say it's permanent, it's permanent, but it could be overwritten, because the number of nodes is limited and the amount of storage each node is giving is limited, currently, right?
When the number of nodes increases, the possibilities might change. So that's the current state. And about the hash itself, like surveilling the hash: if, for example, we are in this room and we have five nodes here, and we are communicating and you have a file copy which I need, it's just between me and you. There's no central point where the transactions are being logged. Gotcha. Yeah, so, yes, like the standard blockchain setup, where you can actually access the full ledger, right? There is, but I'm not sure how it is being handled right now; it's not absolutely like blockchain. So the problem you are seeing persists, yes. Okay, thanks. Hi, so my question is, it looks like you're still tied at a base level to the centralized system: I build a site and I have my images or PDFs or whatever hosted on IPFS on the back end, but the front end is still a physical location that somebody goes to, at least to get the base links. Do you have any sort of plan to offer — because a hash is not a very distributable piece of information, you know, it's not very memorable; I personally can't memorize 32- or 64-character strings. Anyway, do you have any sort of implementation model for providing a truly decentralized — let's say I want to publish a newsletter that would be disapproved of by certain governments. Not saying I want to, just picking an example. Nobody here wants to. Is there a way there could be, for instance, an address that then goes to the decentralized content, a simplified one, almost like a bit.ly, for lack of a better example? Yes, so the IPFS organization itself provides IPFS.io, so that becomes your central entry point to the files stored on the IPFS network, but you can actually put up your own DNS server which talks directly to the IPFS system behind it, without depending on the IPFS guys themselves. Okay, that is one.
Second, if you look at this blueprint, there's the right-hand part, which I missed earlier. This was something I was also thinking about: in the end, we are putting a centralized Drupal application in front of all the content, right? We have all the content on IPFS; how do we promote distribution of that? In that case, we can think of a prepackaged Drupal profile which comes with the current mapping of the hashes and the entity IDs. So all you need is a Drupal application which understands how to talk to the IPFS behind it, and has a starting point of the existing mapping. I think that is a workaround; I don't know if it's a full-fledged solution, that's why I mentioned it. With public repositories of those sites, anyone can deploy one and access all the content. Now, this mapping of entities and hashes can itself be put onto IPFS, the full mapping, and then that can be accessed — so you basically just distribute one key, it fetches all the mapping, and then, as a next step, you get to the content. Kind of recursive, right? Yeah, so these are all workarounds which could be thought about, but... Because Drupal is all about dynamic content, so if I edit something, if I post a new article, and I want it decentralized across the IPFS network, that's the ideal. That would be the beautiful thing. So Drupal's structure is pretty complex, but there are a few frameworks: there are browsers that utilize IPFS, so all you do is install that browser, it creates an IPFS node, and it also lets you publish web pages on it. But again, the power available is very limited compared to Drupal from an editorial perspective. All right, thank you very much. Last question. Great presentation, first of all. What I see is, this is going to revolutionize the internet, I'd say in about five to ten years.
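The "distribute one key that fetches all the mapping" idea described above can be sketched like this. All names are hypothetical (the `store` dict stands in for the IPFS network, and `add`/`cat` mirror the shape of `ipfs add`/`ipfs cat` without using the real API): the site publishes each piece of content, then publishes the entity-ID-to-hash mapping itself as JSON, and a fresh instance needs only that single root hash to bootstrap.

```python
import hashlib
import json

store = {}  # stand-in for the IPFS network (illustration only)

def add(data: bytes) -> str:
    """Store bytes under their content hash and return the hash."""
    cid = hashlib.sha256(data).hexdigest()
    store[cid] = data
    return cid

def cat(cid: str) -> bytes:
    """Fetch bytes back by their content hash."""
    return store[cid]

# Site A publishes its content, then its entity-ID -> hash mapping as JSON.
mapping = {"30": add(b"article one body"),
           "31": add(b"article two body")}
root = add(json.dumps(mapping).encode())  # the one hash you distribute

# A fresh instance bootstraps from that single root hash...
fetched = json.loads(cat(root))
# ...and can then resolve any entity's content.
print(cat(fetched["30"]))  # b'article one body'
```

One caveat the speakers raise applies here as well: every edit produces new content hashes and therefore a new mapping and a new root hash, so dynamic sites would need a way to announce the latest root.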
Oh yeah, absolutely. I mean, you see all this stuff happen in third-world countries, not only from an honest, open interpretation of all the data, but also financially. So it's a two-part question, business side and tech. Number one, I see this as potentially being the next Bitcoin-style wave — of data, not only currency; you're talking about data here. What kind of investment opportunities do you see in the future for something like this, business-wise? That's the first part of the question. And the second part is, how do you foresee governments and other agencies trying to circumvent this new wave of information that's going to be freely spreading through the internet? So I'm going to answer the latter part first. Open source is free, but... not always, right? Yeah, so information will always be available freely in that sense, okay? Whether Drupal jumps on the bandwagon or not, it's going to happen either way. People will be publishing content that'll be censorship-free, okay? It's about how we as Drupalers utilize this new technology. Juan Benet, who is the guy behind IPFS, started this with the clear vision of replacing HTTP. So yeah, of course, there's no governmental control; there's no central control, but that's going to happen either way, because we are moving to a world where information should be available more freely, okay? And there should be no central organizations or corporations controlling it at the end of the day. We as an open source community understand the value of that, and that's why it only makes sense to bring them together. Could you give the first question again? Oh, just the business opportunity behind it. Yeah, investment. I mean, cryptocurrency was kind of a thing, and I see this being similar, but with different intentions. Okay, so, definitely not equivalent to cryptocurrency at all. That's one use case of blockchain.
IPFS is another use case altogether. So if you're thinking in terms of, do you want to invest in something here: that's cryptocurrency, and this is totally separate from that perspective. There's a lot of mining, a lot of proof of work that goes on there; nothing of that here. This is basically a protocol to share and store files across a distributed file system. Right. Okay, so yeah, does that answer it? Yeah, I don't know if that makes sense. Yeah, yeah, yeah. Okay, awesome, awesome. Thanks guys.