Hi, I'm Olivier Levillain. I work at ANSSI, the French Network and Information Security Agency, and I'm going to present some work we did on the analysis of SSL/TLS datasets.
So first, just the standard figure to show you what SSL/TLS is. You already know that it's a protocol that aims at providing server authentication, optionally client authentication, and data confidentiality and integrity. So it's a basic building block of Internet security today. We are going to look specifically at the first messages of the protocol: the ClientHello, where the client initiates the connection and proposes the different cryptographic features that it supports, and then the answer of the server: the ServerHello, where the choices are made (which cipher suite will be used, which version), and the message called Certificate, where the server presents itself with its certificate chain and which we will look at afterwards.
To collect data, there are traditionally three types of methodologies. You can do a full IPv4 scan, which allows you to see a lot of things. You can also do a scan based on a list of domain names, which allows you to use the SNI extension and look at virtual hosting. And if you have access to a lot of willing users, you can do passive observation.
So in our case, we launched several full IPv4 scans in 2010, 2011 and 2014. And what's interesting is that with the first two methods, you can choose how you want to stimulate the servers.
Okay, so this was about data collection, and now I'm going to talk about data analysis. The motivation for the concerto toolkit that I'm going to present is that when we first worked on these data, we used Parsifal, a homemade parser generator, to parse the answers and the certificates, plus various scripts that were mostly undocumented and unversioned. That was fine to publish an article, but three years later, when I wanted to write my PhD manuscript, I wanted to do some work on more recent data, and I had some problems with the scripts, but also with the criteria, which had evolved: for example, the notion of what a weak cipher suite was had evolved a little between 2012 and 2015. We also wanted to include new external datasets, like the ones you can get from scans.io.
So the way we built concerto is a way to start again from the raw data and, essentially, metadata like the ClientHello you used, if you have it, and to automate the whole process. The process is, first, to prepare the context: the context is injecting the stimulus you used (the ClientHello, if you have it) and the trust store, the certificate store you want to use to say which roots are considered trusted; we used NSS in our examples. Then we have to inject the answers and parse them, extract the certificates, parse the certificates, build all the possible chains from the Certificate messages, and then produce some statistics.
Okay, just to look at the data. It's always interesting to know that SSL/TLS data is not always that clean: what you get from a measurement campaign can surprise you. If you send a ClientHello proposing two cipher suites, the blue one and the green one, you expect to get either the blue one, the green one, or an alert, which would mean that the server doesn't understand what you wanted to negotiate. But sometimes you find something else. Sadly, this can be explained, and what's even worse than, I don't know, RC4-MD5, is that some servers are willing to negotiate the null cipher suite. The problem is: what does it mean? Because it's not specified.
We also had a case where our parsers would choke on a ServerHello missing two bytes. That is, if you look at the message, it's just a good message, but with two bytes missing, which might make you look at your parser and think: is it me, or is it the message that's incorrect? So that's why the parsing phase is interesting, and we are now confident that our tool, using Parsifal, is robust. The other thing that we find interesting in this phase is to keep the metadata, and in particular the stimulus used, to know which one was sent when you encounter an inconsistency.
Okay, so just some first simple results about what we can get. This is data coming from our campaigns from 2011 and 2014, and three other ones: the full IPv4 scan from 2015 is from scans.io, and the two regarding the Alexa Top 1 Million are also from scans.io. You can see that TLS 1.2 is getting some attention and is finally being deployed. Just don't forget that it's a specification that was published in 2008, so we could have better results, but at least it's starting to be used.
Now I'm going to speak mostly about certificates. I told you that when the server answers, it presents a Certificate message that contains a list of certificates. It should be in order: first the certificate of the server, and then each certificate authority that signed the previous certificate. In reality, you will find unordered messages. You will find repeated certificates.
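
The ordering rule just described for the Certificate message can be checked mechanically. Here is a minimal Python sketch, not concerto's actual code: the `Cert` class and both helper functions are hypothetical, and the check only compares issuer and subject names, whereas a real analysis would parse X.509 and verify signatures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cert:
    # Hypothetical minimal view of a certificate: names only, no keys or signatures.
    subject: str
    issuer: str


def is_strictly_ordered(message: List[Cert]) -> bool:
    """True if every certificate is issued by the one right after it (by name)."""
    return all(cur.issuer == nxt.subject for cur, nxt in zip(message, message[1:]))


def has_repeats(message: List[Cert]) -> bool:
    """True if the same certificate appears more than once in the message."""
    return len(set(message)) != len(message)


server = Cert(subject="www.example.com", issuer="Intermediate CA")
intermediate = Cert(subject="Intermediate CA", issuer="Root CA")
root = Cert(subject="Root CA", issuer="Root CA")

print(is_strictly_ordered([server, intermediate, root]))  # server first, then each issuer
print(is_strictly_ordered([server, root, intermediate]))  # same certificates, shuffled
print(has_repeats([server, intermediate, intermediate]))  # intermediate sent twice
```

A message can fail the strict ordering test and still contain everything needed to build a valid chain, which is why counting such cases separately is useful.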
You will also have useless certificates, and sometimes missing certificates: what the EFF called transvalid chains in their first publication in 2010 or 2011. What's interesting is that TLS 1.3 relaxes this constraint on the order, which is a good thing, because if you look at the quality of certificate chains, you see that fewer and fewer Certificate messages are in fact RFC compliant, because they are not necessarily ordered, for various reasons. Okay, so that was just another result concerning certificates.
Here is another thing you can get with concerto. It's an example of a Certificate message that is not trivial. I will first explain what each item is. Gray boxes are the certificates that were sent by the server. So the server sent all these certificates, and in particular this one is the server certificate. We also consider three certificates as trust roots; these ones were sent, and we consider them as trust roots. What concerto gives you is that it can build a lot of possible certificate chains. This one is a good one, and is essentially the best one you can get. But it's interesting to see that in this case, you have a lot of certificates that have nothing to do with the server certificate. So you can find very diverse stuff in the Certificate messages that are sent.
Building all the possible certificate chains is in fact a little complex, because you have two problems that will lead to combinatorial explosion. The first one is X.509 v1 certificates: as they do not have any extensions, you don't know if they are certificate authorities, and up until recently, they could be considered as certificate authorities.
So if you have, let's say, appliances that all generate the same certificate, same subject, same issuer, differing only in the public key, you should in practice try all the possible combinations if you want to be exhaustive, and in fact all the signature checks will fail. For this reason, we didn't want to look into X.509 v1 certificates too much, so we only considered them as certificate authorities when they were in our trust store, which is what browsers do.
The other thing is crazy cross-certification. That is, you have a lot of certificate authorities that do cross-certification, and sometimes you have mutual cross-certification, which leads to cycles. That is not a problem per se, but when you add the fact that some authorities will issue multiple certificates with the same public key and the same subject, but with different validity dates, you have a lot of possibilities to choose from when you want to build all the possible certificate chains. There were different possible approaches, but we chose to limit the number of certificates we took from outside the Certificate message when trying to build all the certificate chains.
I still have some figures about certificates. I will not speak about all of them, but this one is interesting: in the EFF data from 2010, some servers would send you a very, very large Certificate message, apparently including their whole certificate trust store.
And this is to answer some questions we had on Wednesday about today's use of MD5 and X.509 v1. I computed some stuff with concerto yesterday on a recent Alexa Top 1 Million scan from scans.io. So we still have a lot of SHA-1 certificates. We also have several MD5 certificates.
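
Going back to chain building: the idea of enumerating every candidate chain while capping how many certificates are taken from outside the Certificate message can be sketched as below. This is only an illustration under stated assumptions, not concerto's implementation: `Cert`, `build_chains` and `max_extra` are hypothetical names, links are matched on issuer and subject names only (no signature check), and trust roots are assumed not to count toward the cap.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    serial: int = 0  # hypothetical field to tell apart certificates sharing names


def build_chains(leaf: Cert, sent: List[Cert], extra: List[Cert],
                 roots: Set[Cert], max_extra: int = 1) -> List[List[Cert]]:
    """Enumerate chains from leaf up to a trusted root.

    sent:      certificates found in the Certificate message
    extra:     certificates known from elsewhere (other answers, caches)
    roots:     the trust store
    max_extra: cap on certificates taken from outside the message (assumption)
    """
    sent_set = set(sent)
    # De-duplicate while keeping a stable order.
    pool = list(dict.fromkeys(list(sent) + list(extra) + list(roots)))
    chains: List[List[Cert]] = []

    def extend(path: List[Cert], used_extra: int) -> None:
        last = path[-1]
        if last in roots:
            chains.append(path)
            return
        for cand in pool:
            if cand in path:
                continue  # never reuse a certificate: this breaks cross-certification cycles
            if cand.subject != last.issuer:
                continue  # name-based link only; a real tool would verify the signature
            cost = 0 if (cand in sent_set or cand in roots) else 1
            if used_extra + cost <= max_extra:
                extend(path + [cand], used_extra + cost)

    extend([leaf], 0)
    return chains


leaf = Cert("www.example.com", "Intermediate CA")
inter = Cert("Intermediate CA", "Root CA")
root = Cert("Root CA", "Root CA")

# The server forgot its intermediate: no chain unless one outside certificate is allowed.
print(build_chains(leaf, [leaf], [inter], {root}, max_extra=0))
print(build_chains(leaf, [leaf], [inter], {root}, max_extra=1))
```

Refusing to revisit a certificate keeps mutual cross-certification from looping forever, and the cap bounds the search when many same-name certificates exist outside the message.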
We also have, back from the future, some X.509 v4 certificates.
Okay, this is about server behavior. In our campaigns, we used multiple stimuli, that is, we probed the same servers with different stimuli, which allows us to grasp something about server behavior. It allowed us to look at intolerance on the full IPv4 space, and also, afterwards, to look at SSLv2 support, which led us to compute a lower bound concerning the DROWN vulnerability, because 40 percent of all HTTPS servers in our 2014 campaign would answer with the same certificate to an SSLv2 ClientHello.
Okay, this is just a way to say that concerto is, today, mostly stupid. It's just a bunch of CSV tables, and the really smart parts are in the parsers and in the way we build the certificate chains. So we would like to improve some of this: the back ends, more reporting tools, and including other data sources, like Certificate Transparency for example.
Okay, in conclusion, if we want to analyze SSL/TLS data, we believe that it's important to have good-quality data, which I believe we have today, or can easily reproduce today with the tools at hand, and we should use methodologies that allow for reproducible analysis. That's where we try to propose something with concerto. The source code is online on GitHub, there is a little documentation, and don't hesitate to send me a mail if you're interested in this. Thank you for your attention.
We have time for one question. It'd be better than zero questions. Oh, now you've got a race.
On the SSLv2 slide: I know there are a lot of clients out there who will send an SSLv2-formatted ClientHello even though they don't support SSLv2; they actually want to negotiate SSLv3. For the servers here, is the statistic that they were accepting those SSLv2-formatted ClientHellos and negotiating up to something better, or were they actually negotiating SSLv2?
We had two stimuli with SSLv2: one which was just a way to engage the conversation and which would eventually lead to SSLv3 or TLS, and one which was pure SSLv2. And when I say 40%, that was servers that were accepting... Yes, they were accepting and answering with an SSLv2 ServerHello. We didn't lead the handshake to its end, because we only looked at the first server flight. But yes, it was a real ServerHello.
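
The distinction made in this answer, a real SSLv2 ServerHello versus an SSLv3/TLS answer, is visible in the first bytes of the server flight. A rough Python sketch follows; the function name and return labels are made up for illustration, and it is a simplification: it only handles the 2-byte SSLv2 record header and looks at a single record.

```python
def classify_first_answer(data: bytes) -> str:
    """Guess what kind of answer the first bytes of a server flight contain."""
    # SSLv2 record with a 2-byte header: the high bit of the first byte is set,
    # and the third byte is the message type.
    if len(data) >= 3 and data[0] & 0x80:
        msg_type = data[2]
        if msg_type == 4:   # SSLv2 SERVER-HELLO
            return "sslv2-server-hello"
        if msg_type == 0:   # SSLv2 ERROR
            return "sslv2-error"
        return "sslv2-other"
    # SSLv3/TLS record: content type, then major version 3.
    if len(data) >= 6 and data[1] == 3:
        if data[0] == 0x16 and data[5] == 2:  # handshake record carrying a ServerHello
            return "tls-server-hello"
        if data[0] == 0x15:                   # alert record
            return "tls-alert"
    return "unknown"


sslv2_answer = bytes([0x80, 0x10, 0x04]) + bytes(16)
tls_answer = bytes([0x16, 0x03, 0x01, 0x00, 0x30, 0x02]) + bytes(48)
tls_alert = bytes([0x15, 0x03, 0x00, 0x00, 0x02, 0x02, 0x28])

for answer in (sslv2_answer, tls_answer, tls_alert):
    print(classify_first_answer(answer))
```

Keeping the stimulus next to each answer, as described earlier in the talk, is what lets you tell whether an SSLv2 ServerHello came back from the pure SSLv2 probe or from the compatibility one.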