Now, software as a service: a good example is Google Drive, which is actually what I'm using to give this presentation. I just log in through my web browser and access it. Everything's there; I don't have to worry about anything, because Google takes care of it all for me. I'm just there to give it my input and get value out of the software. Here we go, I put in a slide going through some of these examples. It's a bit dated, so you'll have to forgive me that Docs & Spreadsheets beta doesn't exist anymore, but Salesforce.com, NetSuite, and IBM LotusLive are all good examples. For platform as a service we have Google App Engine, which they've since reworked into a container platform. Under infrastructure as a service we see Amazon AWS and VMware. Actually, this is a bad slide, and I apologize, because it lumps everything together and oversimplifies things: AWS has elements of all three of these categories. At this point we can look at how AWS came to be. Being the first to get to market, they control, I'd say, about 60 to 70 percent of the market share. Their only main competitor eating up market share right now is Azure, which isn't surprising, because Microsoft is everywhere and their cloud is directly integrated with their software offerings. AWS started from the realization that Amazon had a lot of extra capacity: they buy all these servers for the Christmas rush and then don't use 80 percent of that capacity two months later. So the bright engineers at Amazon said, hey, we should figure out a way to provision some of our private resources outbound. Let's take the internal IT practices we've developed and start monetizing and profiting from them. And they could do it: they're at scale, they're an international company.
At first the offerings were very limited: we only had EC2, EBS, and S3. EC2, if you're not familiar with it, is their virtual machine platform. Just an interesting tidbit: when it first came out, you couldn't actually restart an EC2 instance. You turned it on, put your application on it, and if it turned off, it was gone. The onus was on you, as the administrator, as the user, to make sure you could protect your data. We've come a long way since then: you can restart EC2 instances now, and Amazon has over 800 offerings at this point. Here are just some of the applications AWS has, and you'll forgive me that you can't really read this slide. From the management console, which you can access from the web, you can go in right away. And by the way, if you haven't signed up for AWS, there's a free tier: you never have to pay, as long as you're consuming under a certain amount of resources. That's usually ideal for someone in school who's just learning, who wants to develop a little and get their hands dirty; you don't have to pay out of pocket, which is phenomenal. I myself use it quite a bit. I still use the free tier; I've used it for years, and even today, since I do experiment quite a bit with some of the other technologies, my monthly bill is only about $10.
So it's a very small price to pay for the amount of technology I have access to. These are just some of the services, and I'd love to go through all of them, but we don't have enough time. I encourage you to go online and check out their services; Amazon does a great job of teaching you a lot about them. I threw this slide in just to talk about some of the companies that really wouldn't exist without the services AWS provided: Netflix, Reddit, Imgur, SoundCloud, and especially Dropbox were founded on AWS. They're cloud-native websites. Think of Reddit and Imgur alone: without Amazon, they wouldn't be able to scale. Dropbox was one of the first providers to take Amazon's S3 object-store service and make it a lot easier for folks to upload and share files without having to muck around with APIs and do things programmatically. And we're getting to the point where more and more businesses and companies are using it. But it's not fair to just talk about AWS. I love AWS, but there are other competitors, and competition is very healthy in this space because it forces everybody to innovate, and when everyone innovates, we win. So we have others like Google Cloud and Microsoft Azure. Google Cloud actually controls a very small share, I think less than 10 percent, while Microsoft Azure is growing rapidly.
They went from something like 12 percent a year or two ago to almost 30 percent today, and you can see why: in newer versions of Windows Server, 2012 R2 and 2016, Azure is built right in. Even in Windows 10, you log in and see the little OneDrive thing popping up. Microsoft wants you to start using their cloud; it's already there, it's native. To get other vendors' stuff you have to go to their websites and download it. If you want to use your Google Drive, you have to go download their app. Now, this is just a small slide I threw in to demonstrate the differences in pricing between these instance types. This isn't really relevant until you're consuming so many resources that it's becoming very difficult and you're looking at re-architecting your actual application, but you can see that when it comes down to it, AWS is still the cheapest at certain tiers. Again, this is just looking at list prices. There are spot instances, and there are reserved instances where you pay monthly if you know you'll be using the capacity. In fact, Amazon's engineers will just work with you to reduce your AWS bill, so there's quite a bit of complexity even in the billing aspect of this. With this slide I want to talk about the previous way of building an application. Say you were an app developer working for a company, and the company said: we have to develop this application. You either go to the infrastructure guy, or you manage the system yourself, which you probably hate because you don't like managing systems.
You're a programmer. So you go and install Windows, do all the updates, install your database, take backups, and then you do all this development, and only at the top do you get your result, your business value. This process is slow, cumbersome, and difficult to manage, and really it's only easy for those who have the infrastructure to begin with. What we have today with the cloud is that anybody can start that amazing business, anybody can build that amazing technology they want to build. They just need to focus on the development and consume everything else from the cloud as a service. Everything is a service: you can consume database as a service, you can do big data as a service with Hadoop, you can set up a messaging queue without worrying about installing RabbitMQ or about Linux dependencies; just go ahead and start consuming it right away. Same with orchestration and monitoring. And at the bottom, the infrastructure hasn't gone anywhere; it's still there, but your cloud provider is doing it all for you. So what you have here is an evolution of scale. You now have the capability to rapidly prototype an idea that you have, and that in my opinion is really cool. Everyone is now empowered; they have the power of the world's greatest data centers and the smartest minds backing them. Once things take off, you can look back and maybe grow your development team, but anybody at this point can go ahead and start consuming things without mucking around with all the technologies I was just describing. At some point you look at a basic offering, and a lot of businesses are pretty much still stuck there to this day; like I said, a lot of businesses still don't really grasp the concept of cloud.
They still look at things from a physical-server point of view. Eventually we continue to move up this spectrum: we basically start with VMware, then maybe move toward our own private cloud, which could be proprietary. At some point we want to actually have something on-prem, which lets us save money while keeping the same capabilities we had in the public cloud. Actually, I just went through these points with you: you can rapidly prototype any application. So, after your software takes off, after you've built that amazing web app, this is what it always ends up looking like (there's a marker up here). You start small, maybe you get a few users, and all of a sudden something happens. Your software takes off; maybe someone wrote a blog post about it, or you're mentioned in the news, who knows. All of a sudden you're getting all these users, but now you're going, oh my god, my cost just quadrupled overnight. You're panicking. At that point you can do a few things. You could look at developing more of a hybrid strategy, where you move some of those things on-prem, or you can actually go in and change the way your application works: consume some services a bit less, put up more load balancers, et cetera. Stratoscale, the firm I work for, maintains and sells that sort of infrastructure so you can have a hybrid cloud of your own. A lot of people do that. They'll come to us and say: listen, my app took off, my developers were rapidly prototyping our application, we don't have time to go back and optimize our code, and now our AWS expenses have quadrupled in the last three months. So we say: okay, that's fine.
We'll move that onto our platform for now, and then we'll figure out a way to make your developers consume less and less. That's kind of an unfortunate side effect of all the frameworks that are out there: you have developers who write very sloppy code that consumes way too much memory, way too many CPU cycles, maybe too much disk space, and they don't care. They're like, hey, it works, let's keep going. Customers end up being your beta testers in this day and age. But again, at the end of the day you still have to go back and optimize once you get to scale, because poor code is not going to do very well. Now, the old way of scaling is vertical scaling. Okay, you have your web server, you're getting bombarded with traffic requests, so it's very easy to say: let's just get a faster server. Call up Dell, buy some more CPUs, put some more RAM in the thing. That works, but it's not a good practice to fall into, because you're not considering the actual core of the problem, which is that you need to start considering horizontal scale. Stop building a monolithic application that works like a black box; consider separating everything into a microservices kind of model, where services communicate with each other using a messaging queue. What this allows you to do is consider a horizontal scaling method. Instead of saying, okay, we need more resources, go buy more RAM, or shut the VM down, add more RAM, and turn it back on, you can say: it's fine.
We'll just spawn more web servers, add those IP addresses to our load balancer, and at the end of the day the load balancer takes care of it; the extra resources you spawn will be utilized. It also allows you to decouple your application into separate services that you can manage, or have teams manage. It's a much more effective methodology for scaling your app. As an example, I put a little diagram up here, which I think uses Amazon's Elastic Load Balancing service. We have Route 53 here, which is their DNS service, handling, say, mywebsite.com. Requests come into the ELB, and you can set up web servers across both availability zones. The way Amazon works, it's split into all these regions, and each region has at least two availability zones. Now, when Amazon does maintenance, they won't migrate your virtual machines or your data for you; they're too big, and as a customer of Amazon, unless you're Steve Jobs reincarnated, they're probably not going to listen to you and make exceptions. But they'll do maintenance on one availability zone at a time; they'll never bring down a whole region, that would be ridiculous. The onus is on you, as the cloud administrator, to ensure you have an Elastic Load Balancer set up providing uptime across both availability zones. And like I said, you can configure scaling against application metrics like response time. Say a user is coming to my site.
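That metric-driven scale-out loop, adding instances while a latency threshold is exceeded, can be simulated in a few lines of plain Python. The 150-millisecond threshold matches the talk; the latency model, function names, and the instance cap are illustrative assumptions, not any real AWS API:

```python
# Toy simulation of threshold-based horizontal scaling.
# Assumptions: latency is modelled naively as load / instance count,
# and we add one instance at a time while over the threshold.

def scale_out(latency_ms: float, instances: int,
              threshold_ms: float = 150.0, max_instances: int = 10) -> int:
    """Return the new instance count: add one instance while the
    observed latency is above the threshold (up to a cap)."""
    if latency_ms > threshold_ms and instances < max_instances:
        return instances + 1
    return instances

def simulate(load: float, instances: int = 1) -> int:
    """Keep scaling until per-instance latency drops to the threshold."""
    while True:
        latency = load / instances          # pretend metric from the LB
        new_count = scale_out(latency, instances)
        if new_count == instances:          # threshold met, or at the cap
            return instances
        instances = new_count

print(simulate(600))  # load of 600 needs 4 instances: 600/4 = 150 ms
```

In real AWS this logic lives in an Auto Scaling policy tied to a CloudWatch alarm rather than a loop you write yourself; the sketch just shows the feedback idea.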
It's too slow. If we hit a threshold of over 150 milliseconds, spawn another instance, and keep spawning instances until the metric comes back under the threshold. That's basically how scaling behind an ELB works, and you can configure other application metrics too: disk latency, CPU ready time, et cetera. So let's jump into the emerging trends, which are actually very interesting; I myself am curious to see where this is going to go. One of them is AI as a service. We're seeing it more and more. Our benevolent cloud providers, as they may be, like Facebook and Google, obviously monitor our communication, and what we're doing is actually feeding them: we're training their AI. All these AI engineers are working together, because this is considered the next revolution, in my opinion, in computer science and in humanity in general, where we can have computers start replacing human beings for mundane tasks, and that way we can focus on innovating and continually creating. Now, I'm not going to get into each one of these, but Amazon lets you go ahead and start consuming some of their own services. Spark is a very interesting one; it allows you to process massive data sets, along with EMR. Amazon Machine Learning is also a very interesting product, where you can feed it your own data and actually set up predictive patterns. For example, you could analyze the chats your users are having with your customer support techs and train a robotic tech to help them out. And of course there are other AI engines: Apache MXNet, TensorFlow, Torch, Theano.
These are all free and open source, and there's quite a bit of buzz and research in this field. Another trend I'm seeing a lot more of is people using VDI, which just stands for virtual desktop infrastructure. This whole time I've been talking about servers and application servers, but we've ignored the fact that people still consume all these services through a computer, and for a lot of organizations that's difficult to manage at scale. Say you have tens of thousands of employees; that's a lot of computers to buy and maintain. It's a lot easier to consider buying thin clients and letting people connect to your VDI infrastructure. VMware and Citrix have been doing this for years already: VMware has Horizon View, and Citrix has XenDesktop. There's also application virtualization, where they just deliver the application to you remotely. Amazon WorkSpaces is an offering where you consume a desktop literally as a service. You can go to a website, log in, and set this up right now for yourself if you're curious: spin up a little Windows VM that you can log in and out of, connecting through RDP or through their website. But really the fun part is when you start mixing and melding this with the other AWS technologies. Depending on the size of your company and the security policies you have, you can't just have people logging in and having that information go over the open web. So you start taking advantage of some of their other services. Say you have your AD infrastructure, which controls your users and your IAM services; you can manage this through your own private VPC, which is basically a private network in the AWS cloud. You can connect everything through endpoints and lock it down to your own network, so you can have a connection
between your private on-prem systems and your AWS-managed systems. Another big innovation in the field, which I've been seeing more and more of in the last two years, is network virtualization. With traditional infrastructure as a service with VMware, people came in and virtualized just their servers. We're at the point now where we can virtualize routers, firewalls, load balancers, everything. What we're doing in AWS with this VPC is a form of network virtualization. But the big proponents of this technology are actually telecom companies, because they're all trying to figure out how to get more gigabytes into our phones, and if you consider the challenges they have, you can understand why this is continuously evolving. The ability to rapidly provision segmented networks is a very strong feature. So with that said, we can decouple VLANs, firewalls, and load balancers, and we can forward packets. We're basically at the point where, for that rack I just described, the only thing you have to go out and buy, if you want to set up your own cloud, is servers. Buy some servers and a couple of switches; those switches connect to your uplink, and from there a virtual router and virtual firewalls take care of everything else for you. One of the most popular products here is VMware's NSX. NSX is supposed to be their new ESX, and the reason it's gaining a lot of ground
is that it's getting a lot of market share because it works natively with people's existing VMware infrastructure. I don't know if you've heard, but VMware is actually coming to AWS now as well: you can have VMware instances running on AWS. You have no idea how big a deal that is for major industries that had no intention of ever using AWS, because they just invested a ton of money in training their infrastructure teams on VMware. Now they can leverage their existing VMware services on AWS, with NSX there in the background controlling things like security policies and where traffic should flow, and you can even rate-limit certain VMs. There's quite a bit of capability in this area; I could spend literally hours talking about just network virtualization and its applications. But yes, this is also very big. Again, we started virtualizing servers around eight years ago, and that really took off once people started to trust it. Right now a lot of people are still hesitant to go this route, because they say: you know what, I really like my firewall; it works, I can look at it, it blinks, I can log into it. I don't know if I want to start virtualizing that too. But once you start showing them what they're capable of doing, how they can have a completely separate, isolated development environment without having to go buy more hardware, that's very powerful. I could go on forever; to me the cloud is a fascinating topic, and it's going to continue to evolve. And again, that timescale of evolution is continuously shrinking. A good example is our phones themselves: look at a phone built today versus one from six years ago. It's ten times better, but does that make sense? Should it only be six times better?
No, the timescale of evolution is shrinking and shrinking. We're seeing so many applications and services coming out that developers are scratching their heads. A lot of developers who go to AWS don't even bother with half the services I showed you; they just spawn an EC2 instance and start installing their MySQL database or their Apache web server on it, because that's the way they've thought for years. But you're in school, and I want you to start thinking cloud-natively. You don't even need EC2; you don't need a virtual machine. You can consume services like AWS Lambda, which is serverless: you just give them your code, your Python code, any code you want, and Amazon will take care of running it for you, distributing it across their own cluster. Again, this allows you to focus on providing that value. Instead of focusing on being an IT guy, you can focus on being an entrepreneur, an inventor, a creator, an artist. So that's it for me. I tried to keep it short; I hope I didn't go over, because Peter's going to give a presentation after this. I think this will be available online, and you're always welcome to reach out to me. I love answering questions and I love talking about this, as I'm sure you can tell. I just want to thank you all; I appreciate it.
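One quick footnote on the Lambda point before questions: "just give them your code" means you hand Amazon a single handler function and they run it for you. Here is a minimal sketch; `lambda_handler(event, context)` is the standard Python naming convention for Lambda, but the event shape used here is made up for illustration:

```python
import json

# Minimal AWS Lambda handler. Lambda invokes this function with the
# triggering event (a dict) and a context object; you never provision
# or manage the server it runs on.
def lambda_handler(event, context):
    name = event.get("name", "world")   # illustrative event field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Locally it is just a function you can call directly:
print(lambda_handler({"name": "Aries"}, None))
```

In AWS you would upload this (via the console or CLI) and wire it to a trigger such as an API Gateway request or an S3 event.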
I think we'll do Q&A now. Sure, I can take some questions if anybody has any.

Audience: What are the major security challenges in cloud computing, and which is the most challenging? I'm a cryptographer and we work on projects related to cloud services, so we can imagine some of them, but maybe you can tell us. As I understand it, the major problem is that in the cloud you keep everything encrypted, because you don't trust anything, but then you want to manipulate your files, like a database, while they're encrypted. So what kinds of problems do you think are the major ones here?

I think that's a good question. What ends up happening is that when a lot of people move to the cloud, they don't even consider how their networking works at all. Hopefully, God willing, the applications are communicating amongst each other using some encrypted methodology, some sort of SSL in there, but most of the time they're not. Unfortunately that's the case, especially when you're doing this rapid prototyping of an application. A lot of people don't understand that they can provision their own internal networks that are completely internal to the cloud. So the tools are there for you to make things secure. Again, the onus is on you to know how to use them, because it's unlike what you'd be doing in your own data center: you can't see or touch or feel anything. It's all conceptual.
It's all presented through their dashboard. Now, the second part is not so much about the actual security, the data integrity, or making sure nobody's snooping on you, but rather the fact that you've just given your whole application, your business, over to a cloud provider. Who here is familiar with Amazon Prime? I have Amazon Prime; I use it quite a bit back home, it's got two-day shipping, that's why I bought it. One of the things I found is that they now have a Prime Video and a Prime Music service. Spotify, Netflix, Dropbox: they all use AWS. So what was AWS able to do? They literally went and copied their most successful customers' businesses, and now they're providing those services themselves. So Netflix is wondering: why did we come to you guys? Why did we give you all of our application logic? And how can they even prove anything? If you give me all of your most private photos to keep on my hard drive, and I promise I will never look at them, but then next week you see me at some party wearing the exact same shirt you know I saw in one of your photos, you'll be wondering: is there any chance you looked through my files? Maybe. I don't know. So that's the other challenge, which is very difficult, and that's also why companies like Stratoscale exist: customers want to consume things cloud-natively, but they just don't trust Amazon, period. They don't trust having their data off-site in a different data center. That's where the hybrid cloud comes in: you keep everything on premises, but maybe consume the CPU and memory model of AWS. We'll just take one more question and then we'll get to you.

Audience: Yeah, so the best thing is to keep your files encrypted, in AWS for example, right?
So AWS can't look at them. That's not the problem; the problem is when you want to manipulate your files, when you want to do some computations, because if your file is encrypted, those computations have an overhead, right?

Yes, and you'd understand this coming from a cryptography background. If you want to do any kind of computation on your encrypted files in the cloud, you have to provide the private key, right? So the theory is that no matter what, in the end, that private key is somewhere in memory on a server. That's where the gray area is. And for many purposes, nobody's concerned about the security of, say, their website; it depends on what kind of data you're handling. If we're talking about something like TinyURL, a lot of the data out there just isn't sensitive. Any other questions for Aries? Okay, sure.

Awesome, so I'm going to talk about ZFS. We've talked about all this data; it has to live somewhere. It's stored somewhere in the end, no matter what, unless we're talking about AWS ten years ago, where you'd restart your machine and lose everything. A little bit about me: I've been doing freelance system administration for a long time, and I went to school for a Bachelor of Applied Information Science, majoring in information systems security; it's quite a mouthful. How many people are familiar with VMware? I know we've talked about it a lot.
I worked for VMware for four years; they sent me to VMworld. VMware is so large, with so many different departments, that I could only be familiar with so many of them, but Aries and I both actually worked out of Burlington, Ontario. VMware has so many different things they're growing into, so many different fields they're attacking head-on, so it's very exciting. My job was basically designing and fixing thousands of customer environments, and in September 2013 I started my own company, Secure Information Systems, where I'm working with dozens of environments now. So we're going to talk about file systems. Have you taken any security courses yet in school? What we learned back home about security is that it's a triangle: you have confidentiality, integrity, and availability. So you're dealing with confidentiality, yes; availability, not so much. What most people don't really think about is the availability aspect of security: making sure that your resources are there when you need them, that they're intact, available, and ready to go. That's where file systems are very important. You can't store anything without a file system. It basically translates your meaningful data, your photos, your passwords, your databases, into ones and zeros so we can store it and get it back later. File systems provide organization: a hierarchy, a way to search, a way to link to things. They provide access security: ownership, who's allowed to see what and who isn't. And they provide consistency: if I take a memory card out of my phone, I
I Should expect that the same data will be the same on another phone You'd hope so that's where the file systems come in So they're optimized for what media they're being stored on so like tapes very old technology still in use because of Cost per terabyte is very low, but there's a file system specifically for storing something on a tape Optical like CDs. There's a specific type of file system for that Clustering and transactional file systems where you're sharing You're distributing something. There's a very specific way to access that kind of data And I'm gonna focus more on random access. This is what we're more familiar with you know an SSD a hard drive a memory card It's random access. You can just pick a file and it's ready to go whereas on a tape It's got a fast forward find the spot read it in maybe go to a different spot on the tape read that in not very random. It takes too long So it seems like simple like Anyone can do it right? Well, no not really it's pretty complicated if you get into any kind of coding with file systems It's just There's just a lot to consider a lot to think about and it's not fun to work with as a programmer So I don't want to deal with that. I'm gonna use ZFS Where someone else has done all the thinking for me And there's a lot on the line like if you get it wrong, you're gonna lose your data That's not something you want you have a horror story recently a customer just There was a problem and they lost all their data and that's that's catastrophic that can shut a business down So a lot on the line There's a lot of limitations. 
I'll talk about some of those. Typically you and I might not hit these limitations, but as things grow and scale in the cloud, limitations become a big factor, a big element. And a file system has to put up with a lot of problems. My files are on a little SD card in my phone, but if there's any problem with that SD card, or with the connection to the SD card, or with the phone itself, how am I going to get my files? With hard drives there's firmware on the drive; I'll get into all the things that connect to a hard drive later, but the file system has to put up with a lot of problems. So let's talk about some file systems. Before I go on: do you know what file system is on your laptop? Anyone? It depends on whether you're running Linux, Apple, or Windows. One of the first ones, and you've probably seen it before, is FAT16, developed by Microsoft in coordination with a lot of other companies. Here are some of its limitations: your file name could only be eight characters, a dot, and then three more characters, always uppercase. Two gigabytes was the maximum file size; that's not bad for 1984, when two gigabytes was also the maximum size of your drive, and nothing in 1984 was two gigabytes in size anyway. But it could only store 65,000 files, and then you're done. That's it; it doesn't scale very well. Come Windows 95 in 1996, we got FAT32. This one is insanely popular; most memory cards come formatted FAT32 these days. Your maximum volume size is two terabytes. Okay, that's better, and a lot of these limits are a lot more reasonable now, so you won't really hit big problems. NTFS we've seen since Windows XP and the old Windows NT machines. If you're running a Windows machine, Windows 10 or Windows 7, you're running NTFS, and its limit is a big number: 16 million terabytes. I
I don't think we have any single drive that can do that yet, but at least it's ready for the future. On the other side of things we've got UFS, which fed into Apple, Solaris, and FreeBSD. People back home would appreciate its earlier name, FFS; you can guess what they'd say the acronym stands for, so maybe it's good they changed it. It's again a very big number, 8 million terabytes, in a design dating back to 1977. Cool, you're ready for the future; let's hang on to this one. And the goals UFS was designed around are still alive today: like I said, Apple's file systems descend from it, and Solaris and BSD machines still use it. It's probably not something you see every day, because it's more of a server thing, a cloud thing you don't have to think about. Now this one's kind of interesting: Minix FS was designed as a learning file system. The code is very simple, and it was essentially a school project that caught on, because it helps you learn the computer science behind it. Very simple, very easy to use, but here are our limitations: 64 megabytes and 14-character filenames. That doesn't work so well for modern needs, but it led into ext. Anyone use Linux? Yeah, ext and Linux go hand in hand, and it's been that way for a very long time. The first ext, the extended file system, came out in 1992, and its limits are fairly reasonable. VFS is the concept that everything is a file: you can mount another file system and access it like a file. That was the first time we really saw that in the wild, and it's a very cool feature, though if you were designing a file system from scratch, VFS wouldn't be the first thing on your mind. It does have problems, though.
That's why we don't see ext1 anymore. It had problems like fragmentation, where files get broken up and scattered across the physical disk, which makes things slower, and deleting files didn't properly clean up inodes, so your disk could fill up and you wouldn't know why. But ext2 came along very shortly after: we went from ext1 in 1992 to ext2 one year later. The thinking was that ext1 was good, but we need to fix some of these problems, so ext2 is very extensible, very flexible. It's also the first time we see 32-bit timestamps, which means a modification date can only fall in a fixed window, and that becomes a problem in about 20 years; we'll see what happens then. But we can fit a lot of files on this one, which is much better, and your maximums just depend on how it's formatted. These are all very reasonable numbers that are future-proof, in a way. Anyone heard of this one? ReiserFS; some say "riser," I say "reiser." It started around 2001, developed by a guy in California, and it set out to fix the problems with ext2. Here we start seeing things like journaling, and I'll get into what that means later. You can also grow it online: if you have a virtual disk in the cloud, an iSCSI LUN for example, and you add more space to it, you can click a button and take up that space. That's the kind of feature that's very important in the cloud these days, and we were seeing it back in 2001. It's also fast with small blocks and very small files.
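A quick aside to make that 32-bit timestamp limit concrete: if a file system stores time as a signed 32-bit count of seconds since 1970, you can compute exactly when it runs out. A minimal Python sketch (just illustrating the arithmetic, not any particular file system's on-disk format):

```python
from datetime import datetime, timezone

# A signed 32-bit counter of seconds since the Unix epoch (1970-01-01)
# tops out at 2**31 - 1 seconds. That is the moment 32-bit timestamps
# stop working: January 2038.
max_ts = 2**31 - 1
overflow_moment = datetime.fromtimestamp(max_ts, tz=timezone.utc)
print(overflow_moment)  # 2038-01-19 03:14:07+00:00
```

One second after that moment, a signed 32-bit counter wraps around to 1901, which is the "what happens in 20 years" problem.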
It can deal with those small files easily, where other file systems couldn't. B-tree search, which you'll learn about in your computer science courses, is a way of searching that's much faster than classic file systems, which literally had to walk an entire list to find what you were looking for. It did have a lot of consistency check issues and odd behavior, but it was in active development. Then the author got accused of murdering his wife, and that effectively stopped development. So the lesson of the day is: don't kill your wife, because this could have gone somewhere. Ext3 is probably the most widespread with Linux these days. Journaling again; we'll get into that. It's backwards compatible with ext2, which means you can take an ext2 file system and upgrade it to ext3 in place, just like that. Done, cool. Performance is a little slower than ext2, but it's ACID compliant. Has anyone come across that yet? It's another computer science concept: atomicity, consistency, isolation, and, it's been a while, what's the last one? Oh, durability, yes, thank you.
Thank you And this is just the concept of like and think of a database like if you have Three or four web servers talking to the same database and they want to update something at the same time if your database isn't acid compliant Weird things are going to happen with that data, but if it is acid compliant It's going to ensure oh hold on you can't touch this because this server is working with that right now So it's just a way of extra consistency And to end across the board But like before we still have fragmentation problems which affects performance Interestingly, I don't think many people know this but it can only have 32,000 directories Kind of weird and it's still based on that old EXT classic design It's just that and they just keep adding to it and keep adding to it and keep extending it But it's still fundamentally has those same drawbacks In the code that the original version did so this is where we're talking about generally. Yeah The UFS yeah, UFS is still like Apple's still using it BSD and Solaris Solaris is another story, but You UFS is very simple and very straightforward and there's new versions of it like Apple is using it primarily these days. I think Awesome and It's all based on UFS like it's it came from that if you you can look up the Wikipedia article and learn a lot about it It's maybe it's interesting. Maybe it's not But there is no source code of that No, so HFS is specific to Apple. I think it's closed source, but it's based on based on UFS. Yeah Did anyone use Windows like 15 years ago ever seen any of these this is old scan disk on like a fat 32 or let's say you plug a memory card into your Computer and it says oh we need to scan it something's wrong That has to do with journaling. 
It's a consistency thing. If you pull the power on your machine while it was in the middle of something, it knows it wasn't shut down cleanly, so it runs a consistency check, because without a journal it doesn't really know where things left off. With a journal, we don't tell the application we're done until the operation is written to the journal, and then after a crash we can replay whatever actions were in flight on the file system. NTFS has that now, which is why you'll never see that scan on Windows these days: it doesn't need it, it has a journal. Ext4 is the latest for Linux, and just to give you an idea, it has a good feature list; it's much more future-proof. It's interesting because you can take an ext3 file system, mount it as ext4 and take advantage of all this, and then mount that same file system as ext3 again and it still works. Kind of interesting. Performance is better, the journal is checksummed now, which helps consistency a lot, and we have nanosecond resolution on timestamps, so we can tell when something was modified down to the nanosecond; might be useful in the future. The dates go up to 2514, so we're future-proof for a little while. But it's still ext, it's still the old design. It still has fragmentation problems, performance could still be better; there are problems fundamentally baked into the code. So we expect a lot from a file system. We want it to survive power outages and system crashes; when Windows blue-screens, you don't want to find out you just lost all your data because of some stupid problem or some terrible driver. And geometry changes are very important these days with cloud technology, where you're expanding the size of a disk, or maybe shrinking it.
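Before moving on, the journal replay idea described above can be sketched in a few lines of Python. Everything here (the dict standing in for on-disk storage, the entry format, the function names) is invented purely for illustration; it is not how any real file system lays out its journal.

```python
# Minimal write-ahead journal sketch (illustrative only).
# Intent is recorded before data is modified, so a crash mid-write can be
# detected and the operation replayed on the next mount.

journal = []   # ordered log of {"op": ..., "committed": ...} entries
storage = {}   # stands in for on-disk data blocks

def journaled_write(name, data):
    entry = {"op": ("write", name, data), "committed": False}
    journal.append(entry)        # 1. record intent first
    storage[name] = data         # 2. apply the change to "disk"
    entry["committed"] = True    # 3. mark the entry complete

def recover():
    # Replay every entry that never got its commit marker.
    for entry in journal:
        if not entry["committed"]:
            op, name, data = entry["op"]
            storage[name] = data
            entry["committed"] = True

journaled_write("a.txt", b"hello")
# Simulate a crash between steps 1 and 2: intent logged, data never written.
journal.append({"op": ("write", "b.txt", b"world"), "committed": False})
recover()
print(storage["b.txt"])  # b'world' -- the interrupted write was replayed
```

The key property is the ordering: because intent hits the journal before the data moves, recovery never has to guess where things left off, which is exactly why a journaled NTFS volume skips the ScanDisk pass.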
We need to be able to do that on the fly, without downtime. We're expecting a lot here. Consistency checks: we don't want to have to reboot or shut things down to run one; it's all about availability. We want ACID compliance. We want it fast, always fast, never slow. It has to be efficient; we want to make the best possible use of our storage. And we still want it cheap, even though we're asking for a lot, and that usually comes at a cost. So what can we do? Let's talk about cost and complexity. At the very bottom layer is your actual storage: your hard drive, your SSD, your memory card. There's firmware on that device, and it talks to an interface. You're probably familiar with SATA, or the old IDE standard; SAS is very popular in servers, and SCSI is older and now folded into SAS. Either way we're talking to a controller somewhere else, which is another layer where something can go wrong, and these controllers usually have their own firmware too. Then there's RAID. Anybody know what RAID is? It's where, say, you have two hard drives mirrored, so one can crash and the other keeps going. Or you can have five hard drives with the data split across all of them, with parity and checksums so you can rebuild your data in case a drive fails. That's a lot of software and firmware, and a lot of things can go wrong there too. Then come your logical structures: the file system itself, and on Linux you might be familiar with LVM, plus GPT or MBR, which hold the boot code and describe how the disk is partitioned. Again, lots of software involved, and lots of places for things to go wrong. So these days, across these layers, this is what we're seeing: SSDs, and they're expensive.
We're seeing storage arrays with hundreds of SSDs just for speed, and it's not cheap. A SAS backplane is about as good as it gets these days. Multiple RAID controllers: you'll have two or three controllers in a server all talking to the same drives, just so that if one fails your data is still there. Proprietary operating systems, closed-source file systems, closed-source software; this one's open, that one's closed; everybody has their own way of tackling the problem. Here's another one, from 2009, that was supposed to be the savior of file systems. It's been developed by IBM and Oracle, and they've been developing it like crazy, but almost ten years in it's still considered unstable; it's just not trustworthy yet. And it has everything we want. Look at the feature list: there isn't even a journal anymore, they designed around that; we can defragment on the fly; we can grow it and shrink it; we can RAID it; we can add and remove devices; we can check it for consistency; we can compress everything; there are snapshots. It's crazy, it can do everything, but it's unstable, so no one wants to use it. So now we have ZFS. This has been developed for some time with Solaris, and it fixes problems you might never have thought of. Bit rot: consider that your hard drive is magnetic. It's ones and zeros, and if one of those bits flips, which is easier than you'd think anywhere along those layers, suddenly your data is inconsistent. ZFS has ways of detecting and correcting that. Block sizes: this is a performance thing, where a small file can take up a small amount of physical space on the drive and a large one takes just what it needs, which gives us good performance. And the CPU, which usually sits idle in a storage server, gets put to work: ZFS will make great use of your CPU.
It'll compress your data before saving it to the drive, with little impact on performance and latency: we're spending CPU and saving space. Pretty cool. It makes very good use of memory as well, caching data in RAM, and management is simplified, though that depends on which vendor you're talking to. If you're familiar with RAID, there's a concept called the RAID 5 write hole, which can have disastrous effects if the timing is just wrong when your computer crashes. If the system crashes while writing the last bit of a RAID 5 stripe, you won't even know there was a problem until the array gets rebuilt; then it's reading data that was inconsistently written, without knowing that happened, and it gets rebuilt wrong. Whatever data was there is now useless. Is that what bit rot is about? Not necessarily. When ZFS stores a file, it also stores a checksum, and this is block level, by the way, not file level. True, a CRC code alone isn't going to help you rebuild; you can't rebuild the broken bit with a CRC, or with an XOR hash. The checksum detects the problem, and the redundancy corrects it. I don't think NTFS keeps a checksum of the blocks; I don't think NTFS or FAT keep any record of what the data is supposed to be. Now, a RAID will try, but that's back to the RAID 5 write hole: any time you're doing RAID 5 or RAID 6 across devices you're exposed. RAID 1 won't suffer from that exact problem, because it's two drives that are supposed to be exactly the same, but it won't be able to tell which copy is correct. It'll have no idea; it'll just know they're inconsistent, and you can still get data loss.
We don't want that. By default ZFS uses a checksum to detect problems, and no, SHA-1 isn't encryption, it's a hashing algorithm. You can select SHA-256 or SHA-512, depending how far you want to go, and that all lands on the CPU, which comes at essentially no cost because CPUs are so much faster than storage. It's really nice, and when it detects a problem, it goes down to the RAID layer that's built into ZFS to figure out how to correct it. The file system and the device management are all integrated, so it can talk to the RAID and say: this isn't right, we detected a problem with a file, now let's correct it. Whereas with NTFS, even if NTFS is on RAID, all Windows can do is say something's wrong, the CRC doesn't match; it has no way to reach down to the RAID level and try to correct it. And a CRC isn't very thorough: with two flipped bits, it might not even detect the problem if the wrong two bits flipped. A SHA hash, on the other hand, is extremely robust; the probability of a hash collision is insanely, insanely low. So, some facts about ZFS. Sun Microsystems started it around 2005. The maximum size of one volume is a huge number; I'm sure we'll get there someday, but I don't think we're there yet. Compare it against the estimated number of terabytes across every server, laptop, and computer in the entire world today, or maybe a year ago: we've still got a ways to go. Your maximum file size is also quite massive, and then there's how many devices you can have in a pool and how many pools you can have in one system. Earlier we talked about vertical scaling versus horizontal; this is some insane vertical scaling. But in a storage server you can't always scale horizontally. You're forced to scale vertically, and that gives us a lot of options.
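The earlier point about strong checksums is easy to demonstrate: flip a single bit in a block, the way bit rot would, and a SHA-256 digest of the block no longer matches. A minimal Python sketch (using the standard library's `hashlib`; this only shows the detection idea, not how ZFS stores its checksums in the block tree):

```python
import hashlib

block = bytearray(b"family photos from seven years ago")
original = hashlib.sha256(block).hexdigest()

# Flip a single bit, the way bit rot would on a magnetic platter.
block[0] ^= 0b00000001
corrupted = hashlib.sha256(block).hexdigest()

print(original != corrupted)  # True: the stored checksum no longer matches
```

Detection is only half the story: once the mismatch is found, it's the integrated redundancy layer (a mirror or parity copy) that supplies the correct data back.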
I think the zeros keep going; I'm not sure, but either way that's going to work for a while. And it's open source, depending on who you talk to. There was a whole rift within Sun Microsystems: Oracle bought Sun and then closed off OpenSolaris, and yeah, it's a whole thing, but it's still in active development. When that happened they lost a lot of developers, who just forked it and started their own projects, and those are still very much alive today. So, we just had a great discussion about the data integrity that's built into this; it's great. Inline deduplication: say you save the same file twice, it can detect that on the fly and only store it physically on the drive once. Native inline compression, which we talked about a little: let's compress the data before we store it on disk. Why not? CPU is cheap. Copy-on-write is a type of journaling: when we want to write a file, or a change to a file, instead of modifying the original in place we copy the block to a new place, write the changes there, and sort out the bookkeeping later. That's great, because if the system crashes while it's doing that write, you still have the old file there, unmodified. The variable block size we talked about is just for efficiency. And I mentioned how it uses RAM: the deduplication table, if you learn about that, takes a lot of RAM, but it can all be cached in memory, and ZFS is smart about it, which helps performance. And there's a kind of write cache: it's a bit like a journal, except it still follows the copy-on-write policy of writing a new block. Say your disks are very slow, because we want cheap, but you put one very expensive, very small SSD that's nice and fast into the server. ZFS will write the changes there first, and it won't confirm back to the application, won't say okay, it's done, it's safe, until it knows the data is on stable storage, which can be that very fast
SSD, and we've just saved a lot of money. That matters, because if your application thinks it's safe to move on to the next thing, and then something happens, your application is now inconsistent and maybe there's data loss. Results may vary, but it's not something we want to have to worry about. So, the checksums again: SHA-256 is the default, I think, and you can do SHA-512 now. Instead of RAID 5 and RAID 6, ZFS calls it RAID-Z; we're from Canada, so we say Zed. We can do mirroring, and we can do striping with parity, the equivalent of RAID 5. RAID 5 has one disk's worth of parity and RAID 6 has two, and ZFS can do three disks of parity, which I just like to call RAID 7. RAID 7 isn't actually a thing; ZFS calls it RAID-Z3, for three disks of parity. That's a lot of redundancy and a lot of safety in case one of your hard drives fails, and hard drives do fail, especially the old-style HDDs with spinning and moving parts; those fail like crazy. It's also very easy to say: I want to keep three copies of this directory on ZFS. At the lower layers it will automatically keep that data in three different places on the disks, just for extra safety, if you're really, really concerned about the integrity of your data. Resilvering is ZFS's version of rebuilding your RAID, and it does it very intelligently and very quickly. Classic RAID has to go through the entire hard disk from start to finish, because it's rebuilding a lost drive. But ZFS has visibility into the file system: it knows where the data is, so it knows it doesn't have to rebuild the areas of the disk that were never even used in the first place.
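Back to parity for a moment: the single-parity idea behind RAID 5 (and RAID-Z1) comes down to XOR. The parity stripe is the XOR of the data stripes, so any one lost drive can be rebuilt from the survivors. A toy Python sketch with two tiny "drives" (real arrays work on much larger stripes, and RAID-Z adds its own layout on top):

```python
# Single-parity sketch: parity = XOR of the data stripes.
d1 = bytes([0x0A, 0xFF])   # stripe on drive 1
d2 = bytes([0x33, 0x00])   # stripe on drive 2
parity = bytes(a ^ b for a, b in zip(d1, d2))  # stripe on the parity drive

# "Lose" drive 1, then rebuild its stripe from drive 2 and the parity.
rebuilt = bytes(p ^ b for p, b in zip(parity, d2))
print(rebuilt == d1)  # True
```

Double and triple parity (RAID 6, RAID-Z2/Z3) use more elaborate codes than plain XOR so that two or three simultaneous failures can be survived, but the rebuild-from-survivors principle is the same.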
So resilvering goes very quickly. Scrubbing is a process that walks the disks and verifies the checksums of every file to make sure everything is consistent. For a SATA disk, which is your typical off-the-shelf hard drive, the advice is to scrub every week, because they're more susceptible to problems. And it's just a great way to know that those family photos from seven years ago that you haven't even looked at are still consistent, because there's nothing worse than going into that album from seven years ago and finding some of the pictures corrupt when you had no idea. Your backups might only go back a couple of weeks, but the corruption happened years ago, so you'd have no idea. Because of the scrubbing process, we can prevent that bit rot and sleep better at night, knowing that even the old data we haven't looked at in a long time is still going to be there. And we can do things like snapshots and cloning. We can take the entire drive and snapshot it, just to keep in our back pocket in case we need some old files later. We can literally clone the entire system over to another array; it's built in. If you have two of the same server, we can say: copy it over to this one, go, done. And it's quick, because it knows where the files are. The performance is awesome: the CPU is essentially never used in file system storage servers, so we're going to make use of it with ZFS. Deduplication is something a lot of people are demanding these days; it's very efficient and ZFS does it quickly. The ARC is the adaptive replacement cache.
It keeps track of which files are accessed the most and which files were accessed most recently, and it's effectively a combination of those two queuing styles. An L2ARC is a way of storing more of that cache on a disk: maybe you don't have enough memory, but we can put another SSD in the machine and say, use this for more adaptive replacement cache. It makes reads go super fast, especially for things that are accessed all the time, just because of the style of queue: it knows the hot data is right here, here you go. And on the write side, ZFS can worry about flushing things to the main disks later, as soon as it knows the data is on that fast SSD, which is essentially a write cache, but one where we know the data is there and safe. If the disks in your array are busy doing something more important, the data gets written to them later, because we already know we have it and it's safe. Whereas a classic file server might keep that data in memory, and memory is volatile: pull the power on that server and everything in memory is gone, including your data that wasn't safely written to disk yet. Then there's variable stripe size, which has to do with RAID again.
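Going back to the ARC for a second, the "recency plus frequency" blend can be sketched in a few lines. This toy cache is not the real ARC algorithm (the real one maintains four lists and adapts their sizes on the fly); the class name and eviction rule here are invented purely to show the flavour of weighing both signals:

```python
from collections import OrderedDict

class TinyArc:
    """Toy cache that considers both recency and frequency when evicting."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # key -> hit count; dict order = recency

    def get(self, key):
        if key in self.items:
            self.items[key] += 1          # bump frequency
            self.items.move_to_end(key)   # refresh recency
            return True                   # cache hit
        if len(self.items) >= self.capacity:
            # Evict the least recently used among the least frequently used.
            victim = min(self.items, key=lambda k: self.items[k])
            del self.items[victim]
        self.items[key] = 1
        return False                      # cache miss

cache = TinyArc(2)
cache.get("a"); cache.get("a")   # "a" is accessed twice: hot
cache.get("b")
cache.get("c")                   # evicts "b", keeps the frequently used "a"
print("a" in cache.items)        # True
```

A pure LRU cache would have evicted "a" just for being older; factoring in frequency is what keeps the always-hot blocks pinned, which is exactly the behaviour you want for a storage server's read cache.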
It's like the block size story. In old-style RAID, your stripe size is fixed when you build the array, and that might not be efficient for the kinds of files, and the different demands on those files, that end up stored on the server. With a variable stripe size we can keep things more consistent and more efficient down on the base disks. Administering a ZFS file system, which I mostly do from the command line, is fairly straightforward. Vdevs are essentially your disks. The ZFS pool is the construct on top of them: we've taken these four disks and this SSD and handed them over to ZFS, and there's your pool. The datasets are the virtual file systems you build on top of that, and datasets are where you get to explore the options. Maybe you don't want to deduplicate all your data, because you don't have a lot of RAM on the server, or whatever. You can pick different compression options: quick compression, or very heavy compression at the expense of more CPU. Lots of flexibility, lots of options. It looks something like this: this machine has eight drives RAIDed together, an SSD mirror of two drives for writes, because they're very worried about consistency, and two more drives just for read caching. Then the different datasets each have their own options: maybe this one is deduplicated, this one isn't compressed, and this one has snapshots taken every week. There are some drawbacks to ZFS today, especially after everything that happened with Oracle and the forking of the ZFS codebase. You can't reshape a pool: once you've set it up in this kind of layout, it can't be changed, whereas with a Linux LVM-style RAID you can work with it and change it, maybe add another disk to the array. And we can't shrink this.
It only grows, and with cloud workloads you sometimes want to be able to shrink things. There are extra performance considerations too. If you're turning on deduplication, you need to think about how much memory that will take for the amount of data you want to store, and whether your data actually deduplicates well: how much duplication is really in there? Maybe it's not enough for you to want to turn it on. And the design considerations matter up front, just because we can't really change things later. We can add to it and grow it: I can take another four drives, build them into another RAID, add that in, and have all that extra space; it'll take it and work with it. So you do have some flexibility, but it's always good to get it right from the start. Now, about what happened with Oracle and Sun Microsystems. Sun started Java, and they supported open source; I'm sure you've programmed in Java and encountered it at some point. When Oracle came in and bought Sun, they killed off all that open source goodwill Sun had. They closed-sourced Solaris, OpenSolaris just got killed, and that caused problems with licensing. So what happened is that a very significant number of very significant developers just said: I quit, I'm done, I'm out. New companies were started, and they created illumos, which is kind of the Linux of Solaris. It's pretty cool. It's a fork that goes down its own path, but it's all the people who basically designed and created ZFS, now working on it in the open. Under the license, they're allowed to take the source code from 2010, when it was still open source, and continue developing it. So illumos is an open-source kind of Solaris with a lot of active development. And sure enough, Oracle has now actually shut down its Solaris project. It's done.
It's over, not supported anymore. I guess they just lost all their developers; they're not very smart about managing their people and making good decisions. And the best part is that all the work being put into open ZFS today is work Oracle can't touch, because they would have to open-source all of Solaris to even use that code under the license. Not that it matters now, because they killed Solaris. Good job, guys. So if you want to play with ZFS: ZFS on Linux is now considered stable, and Ubuntu is considering making ZFS a primary file system in its next release. Illumos is the open-source Solaris, and just like Linux there are lots and lots of distributions and flavors; it depends what you like. FUSE drivers aren't even necessary these days, because ZFS on Linux is stable: that's a kernel module, where the user-space driver just doesn't have the same abilities. There's a ZFS for the Mac as well; I don't know much about it, I'm not a Mac person. I think that's it, so if you have any questions, please feel free. To the question about file systems in the cloud: well, the best thing about the cloud is you don't have to worry about it, and it's hard to know for sure what's underneath. It depends which layer of the cloud you're getting into: infrastructure as a service, platform as a service, application as a service, pizza as a service. The file system is such a low layer that if you're subscribing to something like application as a service, you're just giving them your source code; you don't have to deal with it.
They're already dealing with it in the background. But if you're doing infrastructure as a service, this is very much infrastructure, so you'd have more selection, more to think about, and more to deal with in how you want your files kept, or if you're building your own file server. But yes, the best thing about the cloud is that you usually don't have to think or worry about those kinds of things. Now, it's true that different file systems consume different amounts of computational power. Performance-wise there are benchmarks: copy this big file from one place to another on each file system and see which one is fastest. ZFS does pretty well; ext is a little slower in some regards and a little faster in others. There are always benefits and drawbacks to every file system, and for me, my customers, and the work that I do, the integrity of the data is far more important than the performance. It's a bonus that ZFS performs very well for what it does. Because it's copy-on-write there is a lot of fragmentation, but because of the read cache you always have copies of the most frequently accessed files ready to go, maybe on an SSD or in memory.
Because it's a read cache, we don't care what happens to it. It could crash and burn and that's fine, because everything is still safe on the base disks. That's why you can just throw in a cheap SSD, or a lot of RAM, and your reads will be very, very quick. The writes do get very fragmented, but you don't even have to think about it. Question from the audience: about those benchmarks, say I'm working with big files like videos, always editing them; how much CPU and RAM can I expect that to consume with, say, a half-terabyte file? What I would do in that case is not deduplicate it. It's a video file, and chances are there isn't a duplicate of it anywhere on your file server; if ZFS tries to chunk through a terabyte file looking for duplication, it's going to take a while and add a lot of overhead. So you'd store it on a dataset with deduplication turned off. You might even consider turning compression off, because a movie file probably has its own compression already, depending on the format you're working with. In that case we're talking essentially straight down to the disk, and you get the same performance you'd get from those disks under a different file system. The main uses of CPU and RAM are deduplication and caching. On the CPU side, compressing a file is CPU-heavy, and so are the consistency checks, like computing the SHA-256 value of the data so it can be stored. But the disks are always slower than the CPU, by thousands of times. CPU latency is very, very low: even an SSD might have a one-millisecond latency, while for the CPU we're talking microseconds. The CPU can do a million things in the time it takes the disk to do one thing.
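To make the deduplication discussion above concrete, here's a toy content-addressed block store in Python: each block is keyed by its SHA-256 checksum, so saving identical data twice only stores it physically once. All of the names and the tiny block size here are made up for illustration; real ZFS dedup runs on its own block and checksum machinery, with the dedup table held in RAM as discussed.

```python
import hashlib

# Content-addressed block store: a block's checksum is its key, so saving
# the same data twice stores it physically only once (inline dedup sketch).
blocks = {}   # checksum -> block data (the "physical" storage)
files = {}    # filename -> ordered list of block checksums

def save(name, data, block_size=4):
    refs = []
    for i in range(0, len(data), block_size):
        chunk = data[i:i + block_size]
        key = hashlib.sha256(chunk).hexdigest()
        blocks.setdefault(key, chunk)   # stored only if not seen before
        refs.append(key)
    files[name] = refs

save("a.txt", b"same bytes")
save("b.txt", b"same bytes")   # a duplicate file: no new blocks stored
print(len(blocks))             # 3 blocks total, shared by both files
```

This also shows why dedup costs RAM and CPU: every write has to be checksummed and looked up in that `blocks` table, which is exactly the overhead you'd want to avoid for a half-terabyte video file with no duplicates in it.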