Hey everyone, thank you for coming to the last session of the day in the best room, room 2. Hopefully you're here for the right session. This is Drupal and the open web in the Australian government, so the .gov.au space. This conference talk was something I'd been wanting to do for a while; it just needed a lot of work. So when the talk was accepted, I was like, yeah.

A bit about me: I'm a technical account manager and I lead a team of excellent people as well. What that basically means is I help solve problems for large customers. Their problems aren't tiny bite-sized problems; they're, you know, just slightly bigger, or more fun. So that keeps me busy. You can find me on Twitter and drupal.org and everywhere else as WIFM. I'm from Wellington, New Zealand, and fun fact: we've got pukekos, and over here you've got bin chickens. So that's a little bit of trivia.

So here's my problem statement. I wanted to measure popularity, which is tricky. There's not exactly an easy way to do this. And I sure as anything didn't want to pay anyone any money; that doesn't sound very fun. You can probably go pay someone a bunch of money and get a report that's probably wrong immediately. So I wanted something a bit different. I wanted it to be open source and hackable; I wanted to see under the hood. And as it turns out, this is pretty cool.

Hands up if you've heard of Wappalyzer before? And Puppeteer? Cover your ears if you're into PHP, because they're all Node.js. Let's just say I'm not the best person when it comes to writing JavaScript; I wouldn't hire me for that. But I was able to hack together a couple of things and get this working.

Problem number one: where is the list? Can someone just show me the list? Anyone? Yeah, it's tricky. There are a few places you can grab it from, some more interesting than others. So there's a GitHub repo with them.
There's Let's Encrypt, another fun place to look for them. SecurityTrails, there's all sorts of fun things in there. Or you could just start from a seed site like australia.gov.au and directory.gov.au. But the main thing a list doesn't solve is the importance of those sites, because not every site is created equal. I didn't really mean to pick on the Mount Alexander Shire Council, but they're not quite as important as health.gov.au. So I wanted a way to quantify that, to measure it.

This actually came about because of Toby. I don't think he's here, but shout out to Toby, because he was like, "Hey, I found this thing that has got this thing called PageRank, and you can download the top 10 million sites on the internet." And then I just used a grep, and bada bing, you've got 5,700-and-something government sites.

Then the next question is: what the heck is PageRank? For those that are a bit older in the room, you probably had the toolbar installed, and that was what you did back in the day: you tried to get a bigger number. It was made by Google. All you really need to know is that zero means not that important and 10 means Wikipedia, basically. And it's logarithmic, so think earthquakes and the Richter scale: an earthquake of size seven is a lot more powerful than a six, and then a five, and so on.

Probably a big problem though, and there are probably a few people in the room impacted by this, is MoGs.
So just when you think you've got it right, you've got it wrong. I didn't know websites could do this, but websites close. They're like shops. "This website is closed, you can't go here, but you can go to these three other websites that were spawned in its place." I think it's just procreated. Dizzy doesn't exist; instead you've got some other sites. And these sites just hang around, and again, this complicates the whole process of measuring popularity.

And let's just say I had some fun times crawling. I managed to work out ways to allow me to do this, and you can chat to me afterwards if you want to work out what those things are. But for those in the room, this is where we start to see some pretty graphs. Honestly, I got told I pretty much just talk and make pretty graphs for a living. So here I am.

Okay, so caveats. This is based on September 22 data. It is a snapshot. It's not perfect. It doesn't have all the new shiny sites you've just launched. It takes a while for PageRank to start working, because it's based on incoming links to your site. So if you've got a brand new greenfield website, it's probably not going to be here.

I decided to do it by state to begin with, so it would lead up to the bigger graph. This is the weighted score of PageRank for the sites ending in .vic.gov.au. And I've taken some liberties, let's just say. Anyone here from DPC, with Ripple and Tide? I've counted your sites as Drupal, because that is the CMS technically under the hood. And when it says unknown, that could mean anything. It could mean they've custom written it.
It could mean that Wappalyzer hasn't got the right sniffs in place, so I hope to try and reduce that over time. And there's a typo there with Squiz, but that's all good. What I want you to look at here is the proportion and how that changes, because it's quite interesting. There are a few CMSs I'd never heard of, like OpenCities. And I learned that a whole bunch of CMSs keep getting traded like Pokémon cards. It used to be called Lotus Notes. It's not called Lotus Notes any more, team. It's called HCL Notes, in case you're wondering, because HCL bought all of IBM's products, as you do.

New South Wales. A bit different here, but Drupal's still in the lead. Squiz 16 percent, a higher amount of Adobe, and for some reason there are a lot of schools in New South Wales. They need a DXP, so there you go. And a pretty long tail here, actually.

South Australia, and this is where things get kind of interesting. This graph looks a bit different from New South Wales. Isn't it fun, states and territories? So Squiz here is 37 percent, Drupal is way down at 6.9, and there's actually a healthy Craft community. Craft is another quasi-open-source PHP CMS.

WA. Anyone from WA in the house? You do things very differently in WA. 53 percent of you, I don't know, aren't running anything normal. 18 percent of you are running Drupal. And then a fairly long tail, including DotNetNuke. That's a little blast from the past. Joomla, can't kill it.

Tassie. So Squiz here also leading the pack, then WordPress, and unknown, and SharePoint, and Drupal barely there.

Queensland. Spicy topic in Brisbane. So, yep, 53 percent I don't know, but Squiz 14, WordPress 10, and Sanity, which is kind of like Contentful, a cloud-based content-as-a-service thing. And I put ACT in for completeness, but it's pretty much all Squiz, and I don't think they're supposed to be 37 percent.
So yeah, I'll fix that up in post. Don't worry about that. But yeah, Dreamweaver, you can just spot that there if you've got a keen eye.

ACT. Good stuff. You can't be held back.

Northern Territory. Here we just see a bit of a smattering, but again, Squiz definitely in the clear.

So this is probably more interesting. This is excluding state-based sites and including the .gov.au, kind of top level. I thought this was quite neat, and probably largely due to the woman sitting in the second row, but we've got 41 percent sitting with Drupal, and Squiz and SharePoint roughly bringing up the rear.

And this is every single site bashed together in one super graph. There's an extremely long tail here, but these are the numbers.

So, top 10 sites, for those that like stats. health.gov.au came out at number one. BoM, or the Bureau of Meteorology, as I read in the news this morning. ABS, ATO, DFAT, et cetera. You can see the PageRank there, and the score is the translation to a number. So if you take five to the power of 5.73, you get about ten thousand. That's how you can compare and graph what you've just seen. So even though 5.73 looks just a little bit higher than 5.5, it's actually not.

Okay, here's some fun stuff. I thought the key takeaway was definitely that Drupal powers about a quarter of the websites that you use. So that's pretty cool, and maybe it gives you a sense of impact. If you are writing modules and themes and doing all that work, you now know where it's actually going. Squiz is actually a top contender at 12.1 percent, and you can see in four states and territories I've got a pretty clear mandate to adopt more.

Bonus graph: this is Drupal by version, where I could find the version, and I wasn't too over the top on this.
So this is just major version only. But yeah, roughly a bunch more than half on Drupal 9, which is pretty good. Drupal 7, can't kill it either. Drupal 8, not in support any more: 4.6 percent of you.

And breaking the CMS landscape down by open source versus proprietary. You can come chat to me if you want to know how that's done, but if you could download the source code and run it yourself, and it had a license that indicated you could modify it, I counted that as open source. And if you had to pay someone money, or there's a big "contact sales" link on the website, then I counted it as proprietary. But yeah, the cool thing is that 58 percent of all the CMSs identified were open source.

Other fun and scary things. There are 129 sites found with no TLS, and the top one is BoM. Good times. And "major projects dot planning" didn't do very much planning. This is literally BoM on HTTPS: you go here and you get a frowny face. I think you get redirected back to HTTP, nice and safe. This is "disaster dot Townsville dot Queensland". It is a disaster. They're not even listening on 443. So how secure is that?

Another little fun thing I found was there are heaps of subdomains out there where people were just like, "I've run out." And then someone's like, "You know what? They've got numbers too, bro." So add a one. "I've run out." Two. So, yeah, we found 19 subdomains here. And there's www9.health.gov.au. Where are the other eight? This is www9, by the way. Hello.

And 15 sites still run Dreamweaver. Like I say, you just can't kill it. If it works, you know.

So, taking this further in the future. Obviously we can go crawl New Zealand and see what that looks like, and see how that works out. But I really want to publish this data on a quarterly basis. I think having the point-in-time data is kind of fun and cute and all, but I want to see trends over time.
I want to see if Northern Territory can change their ways or, you know, whether we just give up. So, yeah, I haven't done this yet, but I will be making a site. If you've got a cool name for the domain, because that's where it starts from, come tell me. We'll go buy one and then we'll make it happen.

To get Wappalyzer to do its job better, I got a few PRs upstreamed. I think this list is not complete; there's a few more. But yeah, it's pretty cool: you can find a library out there that someone's already started, and you just help make it better. Really cool.

And that's me. Sweet. If you've got any questions or heckles, now's your time.

I might be guilty of some of that.

You are.

My question was, well, firstly, I will talk to Health about, I'll follow up on that one. Services Australia weren't represented in that list of the main sites, and my expectation would be that they would be very high up there.

They're just out of the top 10, but yeah, it's because of MoGs, like Human Services and Services Australia in there. You can't win. I was like, what do I do? Do I add them together? So I treat them as independent sites.

It's very useful for me to know which states to go and visit.

So Wappalyzer determines it by heaps of things. It runs Puppeteer, so it actually loads the site in a Chrome browser, executes all the JavaScript, et cetera. And you can write the sniffs to read anything, from headers to meta tags to regex over the actual body content to a JavaScript variable to a script source to a cookie. The world's your oyster, really. For Drupal there are several, not just the meta generator, but I think the giveaway typically is Drupal.settings. It's just that you can't get the major version very easily if you don't have the meta tag.

Yeah, yeah, I was going to write a blog post about this, but I'm lazy.
So, yeah, I'll write one. I've got the full CSV of the database, so you can have a look and find your sites in there and just make sure it's reporting correctly. If it's reporting incorrectly, let me know; I'll write a couple more PRs, get it fixed up, and make the data better.

Oh, there's just so much guff everywhere. I actually started writing this as a pure crawler. I started on australia.gov.au and I was going to count all the links between all the sites and implement my own PageRank. I got to 1,200 sites and I had 8,000 or something left to crawl, and I was like, I don't think this is ever going to end. So the data is not perfect, but I think at least having the 10 million list gives you focus and clarity, and you at least have a fairly well-defined box to operate within. I think if I was to go down the crawling route and identify my own list, I'd still probably be doing that and not be here.

And there are just so many CMSs out there. I think I wrote integrations for MODX and the outcome, whatever that is, and a few others, where I was just kind of interested: what are these things? So it's just a huge smorgasbord of technologies.

Yep. So through the power of grep, we can grep that 10 million domain list. There's some magic to turn that into sites, because it just gives you the top-level domain.
So you need to do some testing to see if the DNS is actually still real, which in some cases it wasn't. And then test for www or apex domains, HTTPS, HTTP, and eventually crawl there with Wappalyzer and see how you go.

Let's just say that the crawling can be interesting. If someone does Node.js development and knows how to keep memory leaks under control, come talk to me, because I actually killed one of the nodes running in Kubernetes because it used 23 gigs of RAM or something. Whoops.

Cool, okay. Well, come and see me afterwards if you want to have a private chat, but yeah, thank you all for coming.
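
The list-to-sites step described in the last answer (grep the domain list for government sites, then try www and apex hosts over HTTPS and HTTP) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code used for the talk: the function names `is_gov_au` and `candidate_urls` are hypothetical, the input is assumed to be plain domain strings, and the DNS check and the actual crawl are left out.

```python
from itertools import product


def is_gov_au(domain: str) -> bool:
    """Rough equivalent of grepping a domain list for .gov.au entries."""
    cleaned = domain.strip().lower().rstrip(".")
    return cleaned.endswith(".gov.au")


def candidate_urls(domain: str) -> list[str]:
    """Build the URLs worth probing for one domain.

    Tries HTTPS before HTTP, and the host as listed before its www variant.
    A domain that already starts with "www." is not prefixed again.
    """
    cleaned = domain.strip().lower().rstrip(".")
    hosts = [cleaned] if cleaned.startswith("www.") else [cleaned, f"www.{cleaned}"]
    return [f"{scheme}://{host}/" for scheme, host in product(("https", "http"), hosts)]


if __name__ == "__main__":
    # Hypothetical sample standing in for the downloaded domain list.
    sample = ["health.gov.au", "example.com", "bom.gov.au"]
    for domain in filter(is_gov_au, sample):
        for url in candidate_urls(domain):
            print(url)
```

Each candidate would then be checked for a live DNS record and a response before being handed to the crawler; which of the four variants wins for a given site is something only the probe can tell you.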