Essentially, the Ethereum ecosystem has been growing at a very steady pace over the last couple of years. But if you look at some of these websites, like Ethernodes, which track the number of full nodes, you'll see that that number is kind of constant, or maybe even decreasing. And why is that? Well, first up, because running a full node sucks. And even though we are trying to solve that, the issue is that people will always prefer running a light client over running a full client, or using Infura over running a light client, or even just using a RESTful API over actually connecting to Infura. And this is kind of fine. People are more or less aware of the security implications: okay, if you use a light client, the security is not the same. If you use Infura, you kind of trust them. You can maybe check a few proofs, but you still trust them. And if you use something like Etherscan, then you know that you cannot trust them, but yeah, you kind of trust them anyway. So people can make balanced decisions.

However, one thing that people don't really talk much about is what happens, for example, to metadata. What happens when you go from running a full node down to using a RESTful service, and how does that change the behaviors or the traces that you leave behind you? That's exactly what I would like to talk about in this talk.

Specifically, let's start with web 2.0 interfaces. Now, my talk is relevant for almost any dapp that uses a RESTful service as a backend. The reason I'm picking Etherscan is not to pick on them, but rather because everybody knows what Etherscan is and how Etherscan works. So what is Etherscan? Just a quick two-word summary: basically, it allows you to check your balances, your tokens, transactions, events, etc. And apart from that, it also allows you to comment on certain accounts if you like. Now, this is the surface. What happens behind the scenes?
Well, whenever you want to check your balance on Etherscan, for example, you try to load the Etherscan website, which goes through Cloudflare, because Cloudflare is Etherscan's accelerator. Etherscan returns the balance and everything, plus it also returns a tracker code for Google Analytics, plus the embedded Disqus forum. And this might sound like a really nice design, really elegant, not too many external things included. But if we start digging a bit into the HTTP protocol, we'll see a few surprises. Notably, HTTP has the concept of a referrer: every time you visit a web page from another web page, the request gets a tiny extra header, Referer, saying that you came from that website. The same happens if you embed something: every time you embed an iframe into your own little website, that iframe will get a ping that you were actually coming from another website.

And why is this bad? Well, when you load, for example, this page, it's etherscan.io/address/0x whatever. When you load that, Google Analytics will actually get a ping that this particular IP address loaded that particular Ethereum address. Similarly, Disqus also gets a ping that this IP address loaded that particular Ethereum address. And is that bad? Well, if we explore what Disqus loads, things start to get a bit weird. In particular, Disqus has three types of integrations. It has social logins to Facebook, Google, and Twitter, which means that Disqus actually reveals the IP-to-Ethereum-address mapping to Facebook, Twitter, and Google. Plus, it also has these other weird trackers, all kinds of weird ones. I can't read the names, it's a bit tiny here, but you get the picture. So it loads a lot of trackers, and not only that, but I think Disqus also has 11 integrations with YouTube, Vimeo, and all kinds of other services. And this is an issue, because you are essentially taking your IP-to-Ethereum-address mapping and you're revealing that
to a whole lot of external services. And again, this is not to pick on Etherscan here; any dapp is vulnerable to the same thing. It's just a bit clearer to show with Etherscan.

And okay, now we know that there's a problem, but what can we do to actually fix it? Well, first of all, if you're a provider, any kind of dapp or service provider: first up, do not integrate legacy web 2.0 services. This might sound like a no-brainer, but if you think about it, Etherscan has this possibility to comment on accounts. Is that genuinely a useful feature for a block explorer? If you look through the comments, it's pretty noisy, it's pretty spammy, and there are a few scams going on. So it's not the best feature, and maybe it would be worth cutting it out.

The second thing is that even if you don't integrate external services, or you try to keep them to a minimum, you should always wonder whether it is worthwhile to reveal identifying information in the URL. You should always assume that anyone can access the URL. Let's suppose anyone can access the IP-to-URL mapping: do you want your customers' data to be leaked that way or not? For a block explorer, maybe this cannot be circumvented, but for another dapp, maybe it should be.

And last but not least, please do use HTTP referrer policy restrictions. Etherscan, for example, uses a policy whereby HTTPS-to-HTTP downgrades forbid the Referer header from being forwarded.
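To make the referrer policy point concrete, here is a minimal sketch in Python of what a browser would put into the Referer header under a few of the standard Referrer-Policy values. The function name `referer_to_send` and the example URLs are assumptions made up for illustration, only a subset of the policies is modeled, and a "downgrade" is simplified to mean HTTPS to anything that is not HTTPS.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return the scheme://host[:port] part of a URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def referer_to_send(policy: str, page_url: str, target_url: str):
    """Return the Referer value sent to target_url, or None for no header.

    page_url is the page the user is on (it may contain an Ethereum address),
    target_url is the embedded tracker, iframe, or link being requested.
    """
    # Simplification: treat any https -> non-https request as a downgrade.
    downgrade = (urlsplit(page_url).scheme == "https"
                 and urlsplit(target_url).scheme != "https")
    same_origin = origin_of(page_url) == origin_of(target_url)

    if policy == "no-referrer":
        return None  # never reveal anything
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else page_url  # full URL unless downgrading
    if policy == "same-origin":
        return page_url if same_origin else None  # third parties get nothing
    if policy == "strict-origin":
        return None if downgrade else origin_of(page_url) + "/"
    if policy == "origin":
        return origin_of(page_url) + "/"  # site name only, no path
    raise ValueError(f"policy not modeled in this sketch: {policy}")


page = "https://explorer.example/address/0xabc"
widget = "https://comments.example/embed"
for policy in ("no-referrer-when-downgrade", "strict-origin", "no-referrer"):
    print(policy, "->", referer_to_send(policy, page, widget))
```

Under the downgrade-only policy the embedded widget still receives the full page URL, address included, as long as both sides use HTTPS. The stricter values send only the site origin or nothing at all.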
There are much stricter referrer policies, which actually forbid everything from being forwarded. Okay, we can do that, and Etherscan is actually trying to fix these issues. Originally Etherscan integrated an external ad network, which they replaced with an internal one. Currently they are using Google Analytics, but they want to replace that. So they are really open to fixing issues.

However, the issue is that providers fixing it is not really enough, because we can get Etherscan to fix it, but can we get random dapp number 2000 to fix it? Probably not. So users need to protect themselves too. The obvious choices here are browser extensions that block all kinds of trackers or, if you're on a mobile device, you could for example use the Brave browser. But these can only block so many things. For example, if I have YouTube integrated, none of the extensions or browsers will block YouTube.

Which gets us to the second point. Let's suppose we manage to block all the external services, even YouTube. Let's suppose we handle HTTP referrers correctly and everything. Even then we have this fairly complex flow of information, whereby whenever we want to access anything related to our data, either via MetaMask, a dapp, or via MyCrypto or MyEtherWallet, we still reach out to Etherscan or Infura, with Cloudflare in between. What this means is that Etherscan, Infura, and Cloudflare still have access to the exact same information, the exact same IP-address-to-Ethereum-account mappings. Now, you may trust them or you may not trust them, that is up to you, but it's an issue. And unfortunately, the only current solution is to use an anonymizer service, kind of like Tor, which hides your IP address behind a mixing network. That's kind of your best chance currently.

So the bottom line is that using RESTful services is not the best thing. And if you think that now my conclusion would be to please use light clients: yeah, that's going to get even more
messy.

So let's dig into light clients. Let's forget about RESTful services altogether, all kinds of dapps and web stuff. Light clients are supposed to be the way to access Ethereum so that you don't have to run a full node. You can just run this really nice little thing, which kind of feels like a full node, it's just not. And it actually has two significant problems.

One of them is in the discovery protocol. I'm not sure how many of you are aware of how the discovery layer works in Ethereum, or in peer-to-peer networks in general, but in short: you have a few boot nodes hard-coded into the client. When you boot up your node, it reaches out to those few seed nodes, gets a whole bunch of peers in the network, and then boom, you're connected. Now, in reality, what happens under the hood is that these boot nodes aren't some magical special databases that track the entire network. Rather, the network itself, boot nodes included and all the other peers included, maintains a so-called routing table in a Kademlia DHT, which is kind of a fancy way of saying that everyone knows a little piece of the network. So nobody knows the entire network, but everyone knows little pieces of it, and if you reach out to enough of them, then eventually you can discover enough nodes to have good connections and stable connectivity. The issue here is that, in order to do that, a routing table needs to be able to tell that this particular machine, this particular node ID, is at this particular IP address.
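The "everyone knows a little piece" idea can be sketched with a toy Kademlia-style routing table in Python. Kademlia measures how close two IDs are by XOR-ing them and groups peers into size-limited buckets by that distance. The class and function names and the bucket size of 16 are assumptions for illustration. Real discovery implementations add pinging, eviction, and other details that are not modeled here.

```python
from collections import defaultdict

BUCKET_SIZE = 16  # assumed per-bucket limit for this sketch


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: the two IDs XOR-ed and read as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def bucket_index(local_id: bytes, peer_id: bytes) -> int:
    """Index of the highest differing bit; far peers land in high buckets."""
    return xor_distance(local_id, peer_id).bit_length() - 1


class RoutingTable:
    """Toy table mapping node IDs to IP addresses, bucketed by distance."""

    def __init__(self, local_id: bytes):
        self.local_id = local_id
        self.buckets = defaultdict(dict)  # bucket index -> {node ID: IP}

    def add(self, peer_id: bytes, ip: str) -> bool:
        if peer_id == self.local_id:
            return False  # never store ourselves
        bucket = self.buckets[bucket_index(self.local_id, peer_id)]
        if peer_id not in bucket and len(bucket) >= BUCKET_SIZE:
            return False  # bucket full, so we only ever know part of the network
        bucket[peer_id] = ip  # the latest claim for this ID overwrites the old IP
        return True

    def closest(self, target: bytes, count: int):
        """Return up to count known (node ID, IP) pairs nearest to target."""
        peers = [(pid, ip) for b in self.buckets.values() for pid, ip in b.items()]
        peers.sort(key=lambda entry: xor_distance(entry[0], target))
        return peers[:count]


table = RoutingTable(b"\x00")
table.add(b"\x01", "203.0.113.5")
table.add(b"\x80", "198.51.100.7")
print(table.closest(b"\x03", 1))
```

The one-byte IDs keep the example readable. The part that matters for privacy is the value stored in each bucket: a node ID paired with the IP address it was last seen at.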
Mapping IDs to IP addresses is what the routing table does. The problem is: what happens if my IP address is changing? What happens if I'm not running Ethereum on my home server, but rather on my laptop, and I keep connecting to the Ethereum network from all over the world? So, for example, last month I was in San Francisco, last week I was in Berlin, now I'm in Prague, next week I'll be in, I don't know, London, the month afterwards in Shanghai, and I keep connecting to the Ethereum network via a light client, be that on a mobile device, a laptop, or whatever. Or I can even run a full node on my laptop and do the same thing; that's completely equivalent. The result will be that every time I connect to the network, I am actually revealing to the network that this machine, which last week was in Berlin, this week is in Prague. And next month I will reveal that this machine, which last month was in Prague, this month is in Shanghai.

And this is an issue, because this is public information. Anyone in the Ethereum network can at any moment in time see where certain IDs are. And not only that, but if you are willing to do this, for example, every day, just scan the network every day, then you can actually create an extremely accurate history of where each individual Ethereum node was moving over time, nicely plotted on a world map. Now, I'm not sure most people here would be comfortable having three years of their usage history mapped out on a world map. But how can we solve this?
Again, there's no obvious solution. One of the simplest things: as I said, the main vulnerability is that machines have a fixed ID assigned to them, and if you move the machine around, then that ID is essentially a global tracker. So we kind of made cookie tracking on top of the discovery protocol. So the obvious solution: let's get rid of the IDs. Just bin them. Now, we cannot really bin them, because they're part of the Ethereum protocol. But what we can do is make them ephemeral, so that every time we restart the machine and get a new IP, we also start with a new ID.

And this has certain implications. One of them is that, obviously, if I'm running a boot node, the ID cannot change, so boot nodes need special functionality whereby the ID remains the same. What happens, however, if I run a full node and I just restarted it? I did an update, I restarted it. Normally, if the ID changes, that's not such a big problem. In practice, though, a minute ago we told the network that this IP address belongs to a certain ID, and now we're telling it conflicting information, and currently the Ethereum discovery protocol cannot handle that. So if you restart your machine four times, and you send four completely conflicting pieces of information into the discovery protocol, then the discovery protocol will just go haywire and you will have a very hard time connecting to the network. Yes, eventually, maybe after five or ten minutes, everything will settle down, but it is an issue. And last but not least, if we actually do the same thing with light clients, then we're breaking projects that rely on this behavior.
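The ephemeral ID idea can be sketched in a few lines of Python. In Ethereum the node ID is derived from the node's secp256k1 key, so a fresh key on every start means a fresh ID, while a boot node keeps loading the same key from disk. The function names and the key file handling here are hypothetical and do not reflect any client's actual API; deriving the ID from the key is left out because it needs an elliptic curve library.

```python
import secrets
from pathlib import Path
from typing import Optional

# Order of the secp256k1 curve; valid private keys lie in [1, order - 1].
SECP256K1_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def new_private_key() -> bytes:
    """Generate a random 32-byte private key in the valid range."""
    value = secrets.randbelow(SECP256K1_ORDER - 1) + 1
    return value.to_bytes(32, "big")


def load_node_key(key_file: Optional[Path]) -> bytes:
    """Return the key that the node ID would be derived from.

    key_file=None: ephemeral mode, a new key (and so a new ID) on every start.
    key_file set:  persistent mode for nodes such as boot nodes, whose ID
                   must survive restarts.
    """
    if key_file is None:
        return new_private_key()  # nothing is written to disk
    if key_file.exists():
        return bytes.fromhex(key_file.read_text().strip())
    key = new_private_key()
    key_file.write_text(key.hex())  # first start: remember the key
    return key


# A laptop or light client would start in ephemeral mode:
print(load_node_key(None).hex() != load_node_key(None).hex())
```

A real client would also need to protect the persisted key file, which this sketch ignores.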
One project that relies on stable node IDs is VIPnode. They are essentially trying to incentivize running full nodes by providing light servers for payment: light clients can pay a bit, and then they get the light server's services. The issue here is that the authentication mechanism they use is that the payment is associated with the node ID. So take away the node ID and we just blew up the VIPnode business model. I'm not saying that this cannot be solved. What I'm trying to say is that these really trivial issues and trivial fixes can have very, very heavy implications for certain players in the network, and often we as client developers don't want to ruin people's days and people's projects.

Anyway, so that's discovery. It is an interesting problem to solve. Now, if we go deeper into light clients: how do light clients work? Let's suppose we forget about discovery; let's suppose we somehow fixed discovery. Well, light clients essentially connect to full nodes, light servers, which actually have all the blocks, and instead of downloading the blocks and processing them, they just download the block headers, which are a nice enough proof that the chain is valid. But they do not download the state. And then when we start Mist, or any other dapp that we really care about, the light client will actually fetch the balances. For example, if I have three accounts in Mist, then every time a new block arrives, Mist will fetch the balances for those three accounts I care about. And again, this seems completely logical. This is the most optimal way to reduce latency, reduce bandwidth, and make all the traffic useful. Except now, all of a sudden, the light server knows that this IP address is interested in that particular account.
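A rough sketch of why this matters, in Python: a light server only has to count which accounts each IP address keeps asking about. Everything here, including the `RequestLog` class, the threshold, and the addresses, is a made-up illustration of the correlation and not code from any real light server.

```python
from collections import Counter, defaultdict


class RequestLog:
    """What a curious light server could record about balance requests."""

    def __init__(self):
        self.by_ip = defaultdict(Counter)  # IP -> how often each account was requested

    def record(self, ip: str, account: str) -> None:
        self.by_ip[ip][account] += 1

    def likely_accounts(self, ip: str, min_requests: int):
        """Accounts this IP asked about at least min_requests times."""
        counts = self.by_ip.get(ip, Counter())
        return sorted(acct for acct, n in counts.items() if n >= min_requests)


log = RequestLog()
wallet = ["0xaaa", "0xbbb", "0xccc"]  # three accounts held in one wallet

# The wallet re-fetches all three balances on every new block.
for _block in range(100):
    for account in wallet:
        log.record("203.0.113.5", account)

# A one-off lookup of somebody else's account hardly registers.
log.record("203.0.113.5", "0xddd")

print(log.likely_accounts("203.0.113.5", min_requests=50))
```

The repeated per-block polling is what separates the accounts a user owns from the ones they merely looked at once.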
Oh, and one more thing. I think I went a bit ahead, so there are two more things that I wanted to say about how light clients work. One of them was that they are obviously synchronizing headers and then retrieving data on demand, only the interesting data. Another really interesting aspect of the light client, which I think most people apart from protocol developers are unaware of, is that light clients cannot verify the state. They verify the proof of work, but they blindly trust the rest. So every time you get a new block, you just assume that if the proof of work is valid, then whatever state that block is associated with will be valid too. And this works because if you have good connectivity to the network, and you have many people sending you blocks, then it's really hard for somebody to forge a block. Yeah, they could forge a block, but the rest of the network won't accept it, so after a few more blocks it will just get reorged out of your chain. So nobody can really keep attacking you as long as your connectivity is good.

So what happens? Yeah, I got a bit ahead of myself previously, but essentially, with light clients and on-demand retrieval, the issue is that if I have Mist connected, and whenever there's a new block it keeps requesting the same balance for the same account over and over and over again, then light servers will be able to statistically map out that this particular IP address is interested in one particular address. And that's a problem, because we got back to the exact same issue and the exact same behavior that we had with the discovery protocol. Only now we don't have a world map of moving IDs; now we have a world map of moving Ethereum addresses. And again, similarly to the discovery protocol, this can be done exactly the same way, publicly, by everyone. It's a bit more expensive, because you need to run a light server.
So at least you're helping the network. Please, if you want to attack it, do this one and not the other one.

So how can we solve this? And again, it's a hard question to answer, because this is kind of how the protocol was meant to be. This is the optimization, so we're trying to somehow obfuscate the optimization. One option, again, similarly to how we tried to hide our IP address from Etherscan or from centralized RESTful services, is to try to do the same thing and maybe run light clients over Tor. But what happens if I have an embedded device? About two weeks ago, no, sorry, two years ago, we already had Raspberry Pi Zeros as light clients on mainnet. Somebody even had these little Intel embedded chips on mainnet, and they were really happy that they could use the light client. But these devices aren't really compatible with Tor, because Tor has a huge overhead, a cryptographic overhead. That's one problem.

The other problem is that if we start adding these IP mixers into the soup, and try to tunnel the light client traffic all over the place, you will have connectivity issues. So maybe you will find fewer peers, maybe the latency will go up, the bandwidth will go down, you will keep dropping peers. And as I said previously, light clients depend on relatively good connectivity. If you have an obfuscation layer that essentially kills your connectivity, then all of a sudden you raise your vulnerability to these kinds of eclipse attacks. So again, it seems that there would be a simple enough solution.
In practice, it's not so trivial. So with that said, I think this was the first talk I did where I raised more questions than solutions. But I still figured that I would like to formulate a few takeaways for you, just so that you don't leave with the idea that, okay, Ethereum sucks.

And essentially, the three things that I would really like to highlight as a summary of this talk are these. First, although people don't like full nodes, full nodes are actually the best anonymizers in the Ethereum ecosystem. If you are running a full node, then nobody knows what data you're interested in. You can poke at whatever contract you want, you can check your balance as many times per second as you want, and nobody from the outside will know that you're actually doing it.

Secondly, privacy on Ethereum is bad, really, really bad. But this doesn't mean that it's an impossible task to solve. There are existing projects, at least two major ones: the Tor network, the onion routing network I mean, and I2P. Both of these networks try to solve this exact problem of how to anonymize data so that you don't reveal too much about yourself. Now, I'm not saying that we should all of a sudden put Ethereum on top of these networks. These networks have a very, very broad scope, and it might be too much. But nonetheless, there have been 20 years of research going into how to do this properly, so let's at least try to learn from their results and fix it.

And last but not least, I've seen it many times in the community, and I see it many times even with myself, that as a developer it's all too easy to just say, well, the users are doing it wrong. For example, it would be nice to say, well, just use Tor, figure it out. And that's a problem, because most users don't know what Tor is, and most users will probably never in their lives use Tor, even though that would be the obvious solution. And it's kind of up to us, as dapp and
platform developers, to figure it out and to fix it. And we don't only want to fix it to protect the users from external attacks. I think it's really important to also highlight that we want to protect users from ourselves too. Just to give you a really nice example: I don't think Facebook was created to gather user data. It wasn't created to abuse elections. That one just kind of happened when there was too much.

So with that, I'd like to conclude my talk. Thank you very much. And I guess we have a few minutes for questions, if anybody has any.

Hey, could you talk more about how full nodes are the most powerful anonymizers in the network, that takeaway?

So essentially, all these metadata leaks are possible because you are optimizing your traffic in the network. The only way to optimize your traffic is to request only the data that interests you. So every time you are requesting only the data that interests you, it means that the network itself knows that that particular data is, for some reason, interesting to that particular IP address. And this you can completely hide with a full node, because a full node essentially gets everything.

Yes, so that's again another interesting question. While we are just reading the chain, a full node completely solves it. But when we are trying to transact with the chain, then if, let's say, I run a thousand full nodes and I start monitoring where certain transactions are coming from, that can be used to eventually, statistically, home in on a certain IP address, that this IP address is usually originating some transactions. And here, my solution... I think, for example, IPFS is also looking into exactly the same problem, and their solutions, or let's say their ideas, all revolve around the I2P protocol, because it's essentially a message-based protocol that tries to somehow mix the
transaction a bit in the network before popping it out somewhere. And as far as I know, Monero does the same thing for the same reasons. So that's again something that many people did before us, and that people are doing concurrently with us. So it's not an unsolvable research problem. It's just an engineering problem that we have to pay attention to.

Thanks for your talk, it was really good. I think probably most people in this room have done all the bad things that you mentioned throughout the talk. So is there anything you recommend, aside from running a full node, to either go forward or perhaps anonymize things more with these services?

Yeah. Well, I guess I can only recommend what I already recommended in the slides. If you are accessing block explorers, I think browser extensions are the single most important thing. If you are really paranoid, then the Tor network is also an essential thing; with the Tor network you can completely protect yourself against typical browser-based dapps tracking you. If you are actually willing to run a light client, that's a hard problem. That's up to the client developers.

Would you recommend, as a general rule, to regularly clean out the node key?

So as long as your server is stable, the node key is fine. I mean, yeah, if you are on a laptop, I think that would be the first no-brainer solution that we have to do. So, for example, every time you run a light client, we should reset the node key; we should use ephemeral node keys. Even better,
I would suggest that maybe even for full nodes we can use ephemeral node keys, and if we restart them, yeah, maybe they will have a bit harder time connecting to the network for the first couple of minutes. But I think apart from boot nodes, nothing in the network really requires stable keys.

Okay, can we please have another round of applause for Peter? Thank you very much.