It's going to sort the MySQL user table by the host value from most specific to least specific, and then it's going to take the connection you're coming from and match it up against the most specific entry. So if you have a bob who's able to connect from anywhere, he has the wildcard hostname. If he's connecting from localhost, MySQL is still going to look at localhost first, and then it's only going to look at the usernames under localhost. That's it. So bob-from-wildcard has the potential to connect as the anonymous user from localhost, because the more specific localhost value matched first. Understand that gotcha.

Okay. So once it figures out what user account it's going to match up, from the host value to the username you're connecting as, it's then going to ask you... oh, I totally screwed that up. This is the user account information; that was what I was talking about earlier with the hostnames and the usernames. This is how you create them. Specifically, you're going to notice over here in the CREATE the sha256@localhost: those are the user account names that you're actually connecting under. So when I said earlier you're going to connect under the hostname, that's the localhost you're going to be connecting under, and then it's going to look at that sha256 username to try to match it.

Quick things you need to know about user accounts. If you're on 5.7, you want to use CREATE USER and ALTER USER to manage the user accounts. If you're on an older version, you could use CREATE USER; ALTER USER did exist at the time, but it had very limited functionality prior to 5.7.
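As a sketch of what those account-creation commands can look like (the account names and passwords here are illustrative, not the ones from the slide; the `IDENTIFIED WITH ... BY` form shown is the 5.7 syntax):

```sql
-- Create an account authenticated with the sha256_password plugin,
-- connectable only from localhost:
CREATE USER 'sha256user'@'localhost'
  IDENTIFIED WITH sha256_password BY 'S3cr3t!Pass';

-- A wildcard-host account: bob can connect from anywhere.
CREATE USER 'bob'@'%' IDENTIFIED BY 'An0ther!Pass';

-- In 5.7, ALTER USER handles account changes that older versions
-- pushed through other commands:
ALTER USER 'bob'@'%' IDENTIFIED BY 'N3w!Pass';
```

Note that `'bob'@'%'` and `'bob'@'localhost'` are two entirely separate accounts, which is exactly the sorting gotcha described above.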
You had to use another command that we're going to be getting into real soon. See, this is what happens when I'm not able to look at my presenter's desktop; I forget which slides are coming up next.

Anyway, as I was saying: once it figures out what user account you're connecting as, it's going to ask you to prove it, and that's where we're talking about your passwords. In MySQL, a blank password for a user account does not mean that any password will work; you literally have to give it an empty password string.

Also, in MySQL we now have the ability for you to expire passwords. The expiration of passwords is available in 5.6 and 5.7. In 5.6 you have to do it manually, so you have to explicitly say "expire this password for this user." In 5.7 you have the ability to set a policy for passwords to expire after, say, six months, 180 days. That forces your users to change their password, because the next time they connect after their password expires, they have to change it. MySQL is not very smart about this; it's still a relatively new feature. So if they change their password, there's nothing right now to prevent them from changing their old password to the same old password. It makes them change it, but not really.

MySQL also hashes the passwords, and we have multiple authentication plugins that you can use for the hashing: you have the original native hashing algorithm, you have sha256_password, we have PAM available now, and Windows native authentication is now available for you to use with your MySQL user accounts. We also have a password policy that you're allowed to use now, using the validate_password plugin, which is in the Community version.
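A sketch of the password-expiration behavior just described; the account name is illustrative:

```sql
-- Manually expire one account's password (5.6 and 5.7):
ALTER USER 'bob'@'%' PASSWORD EXPIRE;

-- 5.7 only: set a global expiration policy, e.g. 180 days:
SET GLOBAL default_password_lifetime = 180;

-- 5.7 only: per-account override of that policy:
ALTER USER 'bob'@'%' PASSWORD EXPIRE INTERVAL 90 DAY;
```

Once expired, the user's next connection is put into a "sandbox" state where almost nothing works until they set a new password.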
You can totally use it. Basically, if you use that with your native passwords, supplying the password in cleartext so it can be checked, it allows you to set the strength of the password that you require the user to have. It has three levels of password checking that can be modified, and a whole bunch of settings: how many lowercase letters you require, how many uppercase letters, whether you require numbers to be used, whether you require special characters to be used, whether you require a specific length. All of that can be set by you, to force your users into using a strong password. See, I told you we'd be moving fast, didn't I?

Okay. Next, on access control, we're talking about stage two. This is where the client comes in now that it's matched up the user account and verified that yes, they are that user. Now: what do you want to do? Do you want to select something, do you want to create something, whatever it is, what are you doing, and are you allowed to do that? This is where MySQL grants come in. A MySQL grant defines the privileges and account characteristics for the user accounts within MySQL. MySQL does not have the concept of ownership that other databases may have. Everything lives in a global scope, and the individual users have individual permissions on those database objects. So it's a little different conceptually from a lot of other databases, if you're coming from those.

In MySQL you have multiple privileges that can be given to a user. You can give them the CREATE privilege, the ALTER privilege, the SELECT privilege, the SUPER privilege, the INSERT privilege... whatever you can think of, there's probably a privilege for it. So you can be very specific about what individual users are able to do, and you can do it at multiple levels.
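The validate_password setup described above could look something like this; the plugin and variable names are the standard 5.6/5.7 ones, and the specific values are just examples:

```sql
-- Load the validate_password plugin (library name varies by OS;
-- .so here assumes Linux):
INSTALL PLUGIN validate_password SONAME 'validate_password.so';

-- Pick one of the three checking levels: LOW, MEDIUM, or STRONG.
SET GLOBAL validate_password_policy = 'MEDIUM';

-- Tune the individual requirements:
SET GLOBAL validate_password_length = 12;            -- minimum length
SET GLOBAL validate_password_mixed_case_count = 1;   -- upper + lower
SET GLOBAL validate_password_number_count = 1;       -- digits required
SET GLOBAL validate_password_special_char_count = 1; -- punctuation
```

After this, a `CREATE USER` or `SET PASSWORD` with a weak cleartext password is rejected with an error.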
You could do it at the global level, where they can do it to anything that's in the system; you can do it at the database level; you can do it at the table level; you can even do it at the individual column level, if that's all you want to give them, for least privilege.

You also can give them... this was, remember when I was talking about CREATE USER and the older versions? You used a secondary command to give them the account characteristics and things like that: you did it with GRANT. In 5.7 we're trying to move it all over to CREATE USER, but if you're on an earlier version, you do that with GRANT. That's where you can say you require the client to connect using SSL. You can also set resource restrictions on those clients: you are only allowed to query this database X number of times per hour. That way you can prevent flooding from a user. There are various other characteristics you can set; it's all available in the manual. There are a number of pages going over the GRANT command specifically, so if you're starting to handle user accounts and access control and things like that, those are pages you're going to want to read really early on, to get a grasp of what's going on.

Here are some examples of GRANT commands. SHOW GRANTS will show you the grants for the user that is connected; in that case it was root, and it gives you all the grant commands they have. So for root it has GRANT ALL PRIVILEGES, which means I have access to everything, ON *.*, which means I have global privileges on everything, WITH GRANT OPTION, which means I can now give somebody else permissions, because that's another one of those permissions. Down here I'm asking for the grants of a specific user, in this case.
It's test@localhost, whose usage is totally different from root's, because if you look, they have GRANT ALL PRIVILEGES ON test. So in the test database, the test user can do whatever they want, but they have nothing available on any other database.

Now, if you're dealing with grants, you've got to be able to revoke those privileges as well. That's what the REVOKE command is for: to take away the grants that you gave. The big thing about REVOKE that you have to remember is that it is not smart. It is absolutely stupid. It will not be able to extrapolate out any of the permissions that you're trying to revoke; I'll go over that in a little bit so you can understand what that means. REVOKE also does not remove a user from the user table; you actually have to DROP USER to remove them. And if you use REVOKE without giving a host value, it will always use the wildcard host value. That's what I mean when I say it won't extrapolate: if you have bob@localhost and you say "revoke permissions for bob" with no host value, bob@localhost will still be there, because bob@localhost is not the same as bob@'%'.

Here's a little example for REVOKE. We have our test user again. I try to revoke the DELETE permission on test.t1, the test database, table t1, for test@localhost, and I get an error kicked back. Remember I said it's not smart: because my privilege up there says GRANT ALL, it will not be able to extract the DELETE out of the GRANT ALL. You literally have to revoke the ALL and then grant back all the permissions except DELETE. Told you it's not smart. This is kind of a sucky part about the user accounts, but there you go; it's something to help you understand, if you end up doing these kinds of things.

Quick note on USAGE: it just means that you're allowed to connect.
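A sketch of the grant levels and the REVOKE gotcha just described; the user, database, table, and column names are illustrative:

```sql
-- Grants at different levels, for least privilege:
GRANT SELECT ON *.* TO 'bob'@'%';                      -- global
GRANT ALL PRIVILEGES ON test.* TO 'test'@'localhost';  -- database
GRANT INSERT ON test.t1 TO 'bob'@'%';                  -- table
GRANT SELECT (c1) ON test.t1 TO 'bob'@'%';             -- one column

-- The REVOKE gotcha: this fails with "no such grant defined",
-- because there is no table-level DELETE grant to remove --
-- only the database-level GRANT ALL above:
REVOKE DELETE ON test.t1 FROM 'test'@'localhost';

-- Instead, revoke everything and grant back what you do want:
REVOKE ALL PRIVILEGES ON test.* FROM 'test'@'localhost';
GRANT SELECT, INSERT, UPDATE, CREATE, ALTER
  ON test.* TO 'test'@'localhost';
```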
It doesn't give you any access to anything, but it allows you to connect with the account.

Next: diagnostic information. As a DBA, you're going to want to collect diagnostic information about your system. MySQL has what's called the SHOW command, and it is a MySQL-specific command to get information about the MySQL system. You have commands using SHOW to collect diagnostic information, sorry, metadata information, as well as server status information, and there are a metric crap-ton of them. There are commands for just about anything. If you want to know what databases are available within your MySQL instance, using the SHOW command you would say SHOW DATABASES. You want to know what tables are in a specific database? You USE that database and SHOW TABLES. Let's see, we've got SHOW DATABASES, SHOW TRIGGERS, SHOW PLUGINS, SHOW CREATE PROCEDURE; if you want to see the create statement for a specific table, SHOW CREATE TABLE. SHOW GRANTS, we showed that earlier. You want to know the indexes that are on a table? SHOW INDEXES FROM the table. That's why I said there's a crap-ton of them. You have SHOW STATUS, SHOW SLAVE STATUS if you're using replication, SHOW OPEN TABLES. SHOW TABLE STATUS tells you things like how many rows are in the table,
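A handful of the SHOW commands just mentioned, run against a hypothetical test database and table t1:

```sql
SHOW DATABASES;
USE test;
SHOW TABLES;
SHOW CREATE TABLE t1;
SHOW INDEXES FROM t1;
SHOW TABLE STATUS LIKE 't1';               -- row count, size, engine...
SHOW GLOBAL STATUS LIKE 'Threads%';        -- server status counters
SHOW SLAVE STATUS;                         -- replication health
```

Each of these has its own page in the reference manual with the full output column descriptions.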
How big the table is, things of that nature. All of these commands have their own manual pages that'll give you information about what the different commands do. The reason why I'm telling you about the SHOW command is so that you, as a future administrator, can go and say, "there's probably a SHOW command for that," and hit the manual and go look it up.

Now, along with the MySQL-specific SHOW commands, we also have the Information Schema. If you're familiar with other databases, you're probably used to the concept of these things. The Information Schema provides mostly metadata about your system, so you can get things like the process list; the variables, which are your server settings; and the global status. MySQL keeps counters of everything that happens in the system, and that's part of the global status. Files, tablespaces, data files, all of that is available as part of the metadata. But inside the Information Schema there are also a couple of other tables that are very interesting, specifically the INNODB_TRX table, the INNODB_LOCKS table, and the INNODB_LOCK_WAITS table. Those are big ones if you're having problems with your transactions or your locking; you're going to want to look at those, as well as INNODB_TEMP_TABLE_INFO if you're on 5.7 and using InnoDB for your temporary tables.

As a beginning DBA, I highly, highly, highly, can't stress this enough, highly recommend you take a look at the sys schema. The sys schema comes installed by default in 5.7. You can, however, manually install it in 5.6 if you have that. It was originally known as ps_helper, and it was originally created by an old boss of mine, Mark Leith, who still works at MySQL. Here's his website where the information was; there's the link for the downloads if you want to install it in 5.6, and you do want to install it in 5.6. The really cool thing about the sys schema is that it takes the information out of the Information Schema, and out of something I'm going to talk about in a little bit called the Performance Schema, and it actually turns it into something human-readable and easy for you to use.

Now, the sys schema has what are called paired views. For example, one of the views available in the sys schema is host_summary_by_file_io; that lets you find out which hosts are hitting your IO, your disk, most often. You also have the same view name prefixed with x$. The one without the x$ prefix presents values that are easy for humans to read and comprehend; the ones that start with x$ hold the raw data, where timers are actually in picoseconds. I want something in seconds or minutes, something that I understand, so I always use the unprefixed names. But if you have some kind of tooling or automation or monitoring that you want to throw at it, you'd want to hit the raw data, so that you can use that.

Examples of the views available within the sys schema that you as a DBA would be interested in: statements_with_full_table_scans, so you can look at your queries to find out which ones are doing full table scans, for query optimization. statements_with_runtimes_in_95th_percentile, that is, queries that are running slow, the longest-running queries. io_by_thread_by_latency: which connections are hitting the disk the longest, and why? That would be something for you as a DBA to investigate. memory_by_user_by_current_bytes: which user accounts are using the most memory, and are they something you need to investigate? And another one, schema_redundant_indexes: if you are changing your tables over time, tuning your queries, you're probably going to accumulate redundant indexes along the way. Why take up the extra space? Why have the extra maintenance involved with a secondary index that you're not using, or that is redundant?
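A few queries against the tables and views named above; all of these names are the standard 5.6/5.7 ones, and the LIMITs are just to keep the output short:

```sql
-- information_schema: who is blocking whom right now?
SELECT r.trx_id AS waiting_trx, b.trx_id AS blocking_trx
  FROM information_schema.INNODB_LOCK_WAITS w
  JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
  JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;

-- sys schema, human-readable views:
SELECT * FROM sys.statements_with_full_table_scans LIMIT 5;
SELECT * FROM sys.statements_with_runtimes_in_95th_percentile LIMIT 5;
SELECT * FROM sys.io_by_thread_by_latency LIMIT 5;
SELECT * FROM sys.memory_by_user_by_current_bytes;
SELECT * FROM sys.schema_redundant_indexes;

-- Same data, raw paired view (picosecond timers) for tooling:
SELECT * FROM sys.x$io_by_thread_by_latency LIMIT 5;
```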
This will tell you which ones are, so you can remove them and help optimize the use of your resources. All of the sys schema is available through the command line, just like any other database. If you prefer a GUI kind of interface, MySQL Workbench has an interface, and the Community Edition, it doesn't even have to be the Enterprise Edition, can install and then look at the sys schema views and tables. It'll list all the various views and reports it has available, and you can just click on them and you have the information right there. Cool. That's why I said sys is fantastic for beginning, intermediate, and even advanced-level DBAs: it's so easy to use, and it's just built in, right there.

Now, if you want to be a hardcore DBA, you want to look at the Performance Schema. Generally speaking, this is for your knowledge, because as an entry-level DBA you're not going to use a whole lot of this unless your back is totally against the wall and you're totally screwed and you have no other options. The Performance Schema allows for the monitoring of the MySQL server at an extremely low level, which is good, because it means you have access to all kinds of timings and information, but by its nature it is extremely complex. The Performance Schema uses the PERFORMANCE_SCHEMA storage engine. It has information available on the current events that are occurring in the server, and by current events, that could be anything from a statement running to how a mutex is doing. You have event histories, summations, all of that is available through there. The configuration is dynamic. There is no change in behavior from turning the Performance Schema on; it's all built into the server itself, and you can query the Performance Schema using SQL.

For the Performance Schema, we have an entire manual section, of course, available for you to read and learn from. There's your link for the general manual; there's a manual page on diagnosing problems using the Performance Schema; and if you want to profile a query, for whatever reason you have a query that's running slow and you don't know why, and you want the nitty-gritty of how the query is executing and the times involved, that's when you want to do query profiling to get that information. There are a number of blog posts available on the Performance Schema through the MySQL Server Blog, and many, many presentations and webinars available. If I remember correctly, there is right now, on MySQL's on-demand webinars, a webinar covering at least in part the Performance Schema, if you want to get an idea. Again, the Performance Schema is more for the advanced database administrator, but as an entry-level database administrator you need to know about it, if for no other reason than to have at least a general idea of what's built on top of it. So that's where you find diagnostic information.

Next: log files. As a DBA, you had better know where your MySQL system's error log is. If you remember nothing else about this entire session, the single thing you need to know as a DBA is that you need to set your error log to a specific location, and you need to know exactly where that is. I can't tell you the number of times I have had customers come to me and say, "I have problem X, how do I fix it?", and I say "I need your error log," and they say "where's that?" Okay: single greatest takeaway. Set your error log; know where it is. By default, the error log is located in the data directory. Just out of curiosity, how many people even know where their data directory is in their MySQL instance? A lot of you people are going to be looking things up, aren't you?
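If you do need to look it up, the server will tell you directly; these are standard server variables:

```sql
-- Where is my data directory, and where is my error log?
SELECT @@datadir, @@log_error;

-- Or, equivalently:
SHOW GLOBAL VARIABLES LIKE 'datadir';
SHOW GLOBAL VARIABLES LIKE 'log_error';
```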
Just saying. So by default it's in your data directory. The type of information that is logged in your MySQL error log includes things like certain starts and stops of your server; if you have a server crash, that will be logged in there; if you have critical errors on your server, that'll be logged in there. And if you have MyISAM tables... who here is still using MyISAM? Oh, come on, fess up, fess up. Who's using MyISAM? I saw one person. Okay, if you're using MyISAM tables and they crash, the only place it is noted is in the MySQL error log. It is not noted anywhere else. And if you have a server crash, MySQL will always try to provide you with a stack trace for the server. The reason why I note the stack trace is that if you want to know whether it was a bug that crashed your server, you can take your stack trace, go look through the various known bugs, and try to match it up. It's just a way to find out if you hit a bug.

Now, some people don't like having their MySQL error log in its own little file; they want to keep it in the syslog, maybe for company purposes, whatever. As of 5.7 you can automatically turn that on with log_syslog, to send your MySQL error log to your syslog. And in 5.7 you can also now set the verbosity of your error log. As a DBA, you will always want to set that almost as high as you can possibly take it, because there's nothing worse than a problem with your system and zero information on the problem. So you want it as verbose as you can possibly take it. The error log: if you remember nothing else, remember this.

Okay, the next thing that's good for a DBA to know about is the slow query log. The slow query log is your first line of defense in tuning your server.
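A my.cnf sketch of the 5.7 error-log settings just discussed; the file path is illustrative:

```ini
# Error-log settings (MySQL 5.7)
[mysqld]
log_error           = /var/log/mysql/mysqld.err  # pick a known spot
log_error_verbosity = 3     # 1=errors, 2=+warnings, 3=+notes
log_syslog          = ON    # mirror the error log to syslog
```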
I don't care how much tweaking and tuning you do with your MySQL configuration: tuning your queries will give you the single greatest performance boost on your server, more than anything else. Think of it this way: if you have a race car with a driver that is afraid to go over 35 miles an hour, no matter how fast that car can go, guess how fast it's going to go? No more than 35 miles an hour. Your queries are the driver.

Okay, so why would you turn on the slow query log? Again, it's usually done for performance issues. Generally speaking, as a DBA you're going to want to be ahead of the problems; you want to be proactive, not reactive. So it's good to occasionally, especially after you have new functionality applied to your application, or you... I'm totally missing that word... refactor, there you go, refactor your code or something like that. You're going to want to watch your query logs after those things, to see if anything has changed and whether you have to tune anything.

You can turn on the slow query log dynamically, or you can use it on the command line when you start your server. The default location is within the data directory, again. Another cool thing you can do with the slow query log now: instead of having it be a log file, you can actually make a table out of it, so that you can run SQL queries against the table to find these things. You have multiple options for controlling it, everything from "queries that do full table scans automatically get written to the slow query log" to queries that take longer than ten seconds to run, two seconds to run, 0.2 seconds to run, get written to the slow query log. That's all available for you. Now, when you're working with the slow query log, your best friend, well, officially according to MySQL policy your best friend, is mysqldumpslow. That's the utility we tell you to use to aggregate the information in your slow query log, to actually find information instead of just raw data. Now, since I don't have time...
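The options just described could look like this when set dynamically (values illustrative):

```sql
-- Turn the slow query log on dynamically:
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;                 -- seconds; 0.2 works too
SET GLOBAL log_queries_not_using_indexes = ON;  -- catch full table scans

-- Write it to a table instead of a file, so you can query it:
SET GLOBAL log_output = 'TABLE';
SELECT start_time, query_time, sql_text
  FROM mysql.slow_log
 ORDER BY query_time DESC
 LIMIT 10;
```

And for the file form, `mysqldumpslow -s t /path/to/slow.log` is the usual way to aggregate entries, sorted by time.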
I can't show you that here, but we can always talk about it later, back at the table.

The next thing I want to talk about is the general query log. This is MySQL's general record of activity, and for the Community version it's the closest thing you have to an audit log. Why would you turn on the general query log? If order is important: the general query log logs everything as it comes into the server. Not as it executes, but as it comes into the server. So if you have deadlocks occurring and you don't know why, maybe the two queries that are deadlocking are coming in in a different order than you thought they were; the general query log would help you figure something like that out. If you have a query that's throwing a syntax error and you don't know why, but it's dynamically generated by your application, this will catch the exact query that's coming into the server, so you can take a look at it from there and work backwards into your application. It also provides a minimal audit, to find out what the various connection IDs did on the server. It's enabled dynamically, again; you can set it at the server level; and there are multiple options for controlling it. The biggest problem people have with the general query log is that, because it logs every single thing that comes into the server, it can grow very fast on even a moderately busy server. So that's something to be aware of: you can potentially hit space problems really quickly. Generally speaking, most people, if they know they're having a problem, will turn it on for a few minutes, run the problem, and turn it off.

Okay, the next log you need to know about is the binary log. The binary log is different from the general query log because it only logs the change events that occur on the server. The general query log records everything; the binary log is change events. So the binary log is not going to have your SHOWs, it's not going to have your SELECTs or anything like that.
It's going to show your INSERTs, your UPDATEs, your DELETEs, your CREATEs, things of that nature. Why would you turn on the binary log? If you use replication, you use your binary log. And the important one: data recovery. The binary log should be a part of your backup strategy; we'll go over that in a little bit. You enable your binary logs using log_bin, and it has a lot of options for using it and controlling it. And when I say a lot, I mean the manual page is, like, mondo long. So take a look at that. If you need to read your binary logs for whatever reason, you're going to use the mysqlbinlog utility. You can also disable binary logging for your current session using sql_log_bin. So, for example, you're running a long-running query that's a one-off for a report or something and you don't care about it, or you're testing something and you don't want it on the slave: you can turn it off for that individual session.

Any questions so far? Yes, sir. Oh, wow, you guys do microphones.

"Sorry, I was wondering if there is a way to get cumulative reports of the connections that led up to a database crash. To give you an instance: we had a surge of connections, and it went all the way to the max connection limit easily."

No, not unless you have your general query log. MySQL does not remember things unless you have a log to remember them with, and even if you have the general query log enabled, it doesn't give you any reports or cumulative stats or anything like that. It shows you the individual connections.

"Okay, and what is the performance overhead of having, I mean, pushing your log verbosity all the way up? Let's say I'm logging to a table."

Just whatever your IO would be for writing it. So that relates to the disk: if you have your log file, or any of these log files, on a separate disk, separate from your data directory, your IO impact would
I'm not done yet, just to let you guys know; I still have more stuff. I just wanted to catch any quick questions in here. So the performance implication would be whatever it is for you to be dealing with on that disk.

"Okay, so my last question is: you said the general query log is my only option. So would the event histories in the Performance Schema help me in debugging crashes?"

Catch me later, because that's a yes-and-no answer. Okay. Sorry about that, I just wanted to see if anybody had any quick questions.

The next big thing: we have a whopping eleven minutes to cover backups. And backups are the single greatest thing that you as a DBA have to implement. And when I say you have to implement it: you have to have a rock-solid backup. Because I can't tell you the number of times people have said, "my system has crashed, I have corruption in my files," or whatever, "and I can't get it back up." First question: do you have a backup? No? Stop your server, pull your disk; depending on how bad it is, that's your only option. You have to have backups. People get fired for not having a good backup.

Now, there are two types of backups: you have a logical backup and you have a physical backup. We're going to go over logical backups first. Logical backups save the logical structure of your system and its content. By logical structure I mean things like your tables, your triggers, your views, your procedures, all of that stuff. Logical backups are machine-independent, so you can put them on any system and rebuild your system from them, but they are, generally speaking, slower to generate. They usually require the server to be up and/or warm while you are making those backups, but they give you full granularity if you ever have to restore anything from them. An example of something you can use to make a logical backup is mysqldump.
This thing has been around forever; it is rock solid. What it does is make the logical backup from a command-line client, and it generates text files of your data. Most people use mysqldump to generate SQL files, so that they can load them into any system, although if I remember right you can also output in XML and tab-delimited formats as well. I can't quite remember, but I think you can. It's very, very flexible: because it's a text file, you can edit it any way you want. You can copy and paste sections out if you only want to restore individual things. If you want to switch the storage engine for a specific table from MyISAM to InnoDB, it's edit, change, done.

It has questionable scalability, though, because if you have a large system, mysqldump literally queries the system, grabs everything from the table, sends it to the client, and the client then has to write it to a text file on the system. That all takes time. So on larger systems it's going to take longer to do these, but again, these are rock solid; you will always be able to restore from one of these. If you are one of those people that likes GUI interfaces instead of the command line, Workbench actually has a Data Export interface for you to do backups using mysqldump. Some people like that; I want to make sure you know it's available.

If you have a larger system, you might want to consider using mysqlpump. mysqlpump is only available in 5.7, and it requires a 5.7 server to use it, at least the official MySQL version,
I should say. It's similar to mysqldump; however, it allows for parallel processing of the backup, rather than the single-threaded nature of mysqldump. It dumps the user accounts as CREATE USER and GRANT statements, instead of dumping the mysql user tables, which is kind of nice. By default, the Information Schema, the Performance Schema, ndbinfo if you're using MySQL Cluster, and the sys schema are not included in your dump, because it's assumed they're transient or ephemeral; they're not going to be around very long, and you're not going to need them when you restore. And on reloading, it has faster loading of the secondary indexes. So if you're on 5.7, that might be a better option than mysqldump.

If you only need to do a logical backup of your content rather than your structure, you might want to check out SELECT ... INTO OUTFILE and LOAD DATA INFILE. SELECT INTO OUTFILE allows you to put your information into a text file, wherever the heck you want it, but it's data only: you're not going to have your CREATE TABLEs, you're not going to have your procedures, you're not going to have your triggers or any of that stuff. But if you need to do a dump of, or a backup of, just the data itself,
this is actually a relatively quick way to back up your information. The big thing you have to remember about it, though, is that it will not, by default, give you a consistent backup, because, depending upon how you do it, you can have one table in a different state than another at the time. So if you use it, pay attention and make sure you get a consistent backup. You can specify the line and column terminators however you want, and there are lots and lots of details; again, see the manual if you want to use that option.

Physical backups. Physical backups are raw copies of the actual files on the server. They're much, much faster than logical backups, and they can be extremely compact, depending upon the file system you use. The big thing to remember, though, is that it's file-based granularity, which may not be the level of granularity you need, and the physical backup is usually taken while the server is either down or locked.

So what types of physical backups can you take? You can take a file system snapshot, if that's available for your file system or your OS. You basically do a FLUSH TABLES WITH READ LOCK, so that you lock your tables in one consistent state; take your snapshot, so that everything is consistent; and then unlock your tables. It's pretty good; I know a lot of people that use it. The biggest drawback is that if the actual files themselves are corrupted, you have a corrupted backup.

Since I work for MySQL, I have to mention MySQL Enterprise Backup. Anybody here a MySQL customer? Okay, get with me later if you want to know about MySQL Enterprise Backup, because we're running out of time, but it's our official backup solution, and I have to admit it's actually very, very good. So if you're a customer, or if that might be enough to make you a customer, I don't know. I'm a fan; it's actually a very, very good product.
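The snapshot dance just described is short enough to sketch in full; the snapshot step itself depends on your file system (LVM, ZFS, etc.):

```sql
-- Quiesce writes; keep this session open while the snapshot runs:
FLUSH TABLES WITH READ LOCK;

-- ... take your LVM/ZFS/SAN snapshot from another shell ...

-- Release the lock the moment the snapshot completes:
UNLOCK TABLES;
```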
So that lets us skip two slides. Oh, if you have MySQL Enterprise Backup as a customer, you also have Enterprise Edition and Workbench, and you have a hook into it, a GUI interface to use it, which is actually quite nice. Now, part of your backup procedure, and this is actually my last slide: whether you use logical backups, physical backups, or a combination of both, you should also be backing up your binary logs. The reason you want to back up your binary logs is that after you make a backup of your whole system, there's a window between that backup and when your server crashes, and that window is how much data you can lose. By backing up your binary logs, you're getting the change events from the time of that backup to whenever you want, so you can do a point-in-time recovery. You could literally reload your backup, whatever type it is, and then roll your binary logs forward to right before the crash, and your bosses will love you for that. That's why I said earlier, when we were talking about the binary logs, you always want to have them turned on, even if you only have one server, if for no other reason than data recovery; it's part of your backup system. You can do a physical copy of your binary logs: you just rotate your binary logs and take a copy of the individual files. Or you can do a logical copy to a remote server using mysqlbinlog, that utility I mentioned earlier.
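The point-in-time recovery described above comes down to two steps: reload the last full backup, then replay the binary logs up to just before the failure. A sketch, with made-up file names and timestamp:

```shell
# 1. Restore the most recent full logical backup.
mysql --user=root --password < full_dump.sql

# 2. Replay the change events recorded since that backup, stopping
#    just before the crash (or before the statement that did damage).
mysqlbinlog --stop-datetime="2017-05-05 09:59:00" \
  binlog.000042 binlog.000043 | mysql --user=root --password
```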
You can now either do a streaming backup of your binary logs to a remote server, or a static copy whenever you want them. The big potential gotcha with a streaming backup of your binary logs is that mysqlbinlog was never originally designed to do this, so if for whatever reason you have a break in that streaming, it won't automatically restart. You're going to have to regularly check that it's still running; how often depends on your requirements. So, that was all my slides. I told you it was a lot of information really fast. We're running out of time; any quick questions? [Audience] Hi, I have two questions. There's an option called innodb_force_recovery, and that has six settings, from one to six. [Speaker] That's going to take too long for the time we have, okay? Hit me up at the booth. Am I totally out of time for the next person? Yes, and actually, yes, I am. Okay, I am sorry about that. If you have questions, I apologize; hit me up at my booth and I will answer any questions that you have up there. I don't want to take time from the next person. Thank you very much, and you guys have a good day. Oh, and don't forget, I have another talk later this afternoon. It's called MySQL Troubleshooting, TL;DR (too long, didn't read). It's basically going to say: if you have a problem, what information should you collect to try to figure out what your problem was? That's what that's going to be this afternoon. Thank you very much. [Host] Thank you, Legger. That was a great talk; learned a lot. While we wait for the next speaker to set up, I'll introduce her quickly to save time. Our next speaker is perhaps the youngest speaker in the whole lineup. That's Ruchi Singh, and she works for Snapdeal. She's been working at Snapdeal right out of college.
She has been experimenting with programming languages since her college days, but she found a very interesting thing called cloud as soon as she joined Snapdeal, two years back or so, and in this day of microservices she is going to tell us her story about how she helped migrate around 300 microservices from AWS to Snapdeal's cloud. [Ruchi] Check, am I audible? Yeah. So thanks, Pradipto. All right, let's get started. Thank you everyone for having me here. My name is Ruchi Singh. I work at Snapdeal as a software engineer on the cloud and infrastructure team, and in this presentation I'm going to talk about the journey Snapdeal took in moving from a public cloud to its own private cloud, which is based on OpenStack. So what are we going to cover in the next 15 minutes? I'll tell you who we are and what we do, to give you an idea of why we migrated our complete infrastructure from a public cloud to our own private cloud: why we migrated our 300 microservices, our more than 200 data stores, and their dependent applications. Then I'll tell you how we planned for that migration, what runtime gotchas we found, and how we fixed them. Then we'll talk about some rollback strategies we figured out in case of failure, and after that I'll share some key learnings. So, Snapdeal: we are an e-commerce company, one of the largest e-commerce marketplaces in India. We have more than 30 million products in our catalog, and more than 300,000 sellers who come to our platform to offer their products to our customer base. In the last few years we have seen phenomenal growth in our business and in the number of transactions on our platform. So that's who we are. What did we do with regard to OpenStack? We have built our own private cloud.
We call it Cirrus. So Cirrus is our private cloud, based on OpenStack. We have more than 16 petabytes of storage that we have built for our infrastructure. We are running in two different regions with more than 100,000 cores of capacity. Our networking is 40 gig from the servers up to the spine nodes and 100 gig on top of that, so we have very fast networking. And we are doing all of this with one hundred percent automation, deploying everything through Ansible and some automation scripts we have written on our own. When we launched our private cloud, we got to know that we are in the top four percent of global OpenStack deployments in the world, so we were very happy to know that. We are running very high core-density racks of more than 3,500 cores per rack, in a completely broad-based architecture. So this is our cloud; we call it Cirrus. And here's where the story begins. I'll make sure that you guys don't sleep, because I'm going to tell you our story of how we migrated our complete infrastructure from
AWS, which we used to use earlier, to our private cloud. Before starting anything, we always do this: we always make a checklist before starting anything, big or small. The first item on our checklist was: understand why you are migrating. We figured out why we were migrating, what made us start such a big project at such a big scale. We looked at where we were lagging behind; we also looked, from a business point of view, at where we were spending a lot in terms of our infrastructure needs, and at whether we were having any performance issues, any security issues, and more. After that, having strong reasons and factors, we were sure we had to migrate our infrastructure. Strong planning was needed for this project, as it includes lots of risks and failures, because this is a project where you can fail. When you are migrating all your services to a new data center, you have to ensure compatibility between the applications and the new data center, so strong planning was needed. After that, gaining knowledge about the infrastructure was a huge task for us; some applications may be a decade old. So we decided to get complete knowledge about our infrastructure first. We figured out how many services are currently running in our existing infrastructure,
how many data stores they are using, and how they communicate with each other. So we figured out how we could learn everything about our existing environment before migrating from one cloud to another. After that, risk reduction and ensuring compatibility were also main concerns before starting. If I talk about risks, extended or unplanned downtimes were the main risk we had to recognize and try to mitigate, and the extent to which your business can tolerate these risks depends on the importance of the application you are migrating at a particular point in time. Similar calculations needed to be applied to the risk of data loss: the more important or sensitive the data, the more safeguards needed to be in place to prevent its loss during migration. So we thought of some disaster recovery processes, some extensive backups, that could work during migration. Then, for ensuring compatibility, we decided to try the migration on our staging environment before doing it directly on production, so that we could ensure compatibility between all our applications and the new data center, including networking and all those things. The fourth item on the list was network particularities and limiting latencies. In terms of network, we had to ensure that each service would have a pre-determined place in the new network, and for this it is important to consider all aspects of firewall settings, domains, and trust certificates, to ensure full compatibility with the new network. As for limiting latencies: controlling latency was an important concern before starting, especially for those services which were business critical. Knowing exactly which services work together, and the frequency of their communication,
can help in terms of controlling latency. And last but not least, getting everyone on board was also a huge task for us, because we are a big company: we have more than 5,000 employees and more than 70 components running in our company. So we had to tell everyone that we were starting such a big project, and that you all had to be with us during this project. So getting everyone on board was also a huge task. Those were the main items on the checklist we listed down before starting such a big project. After that: why did we build our private cloud? What were the main reasons that made us start such a big project? Cost was one of the major factors. Like I said before, we were running in hyper-growth mode, and as our business grew, our infrastructure requirements also continued to grow, and our bill on the public cloud was phenomenal. So we had to find ways to control that bill, or to reduce it. It was very clear to us that at a certain scale, a public cloud stops being cost effective. They are okay when you are just starting up, but at an inflection point you need to start looking at alternatives, and for us that was building our own cloud. But how did we make it cost effective?
Snapdeal's cloud is running completely on open-source technologies. We did some analysis of enterprise technologies like VMware, and we also calculated the operational cost of managing our own data center, and we came up with the plan that we could do this. We also have a small in-house team to build, automate, control, and manage our data center platform. The next big reason for us, besides cost, was performance and security. We wanted to get more performance out of our infrastructure, and in a public cloud you are restricted, because you are in a kind of shared-tenant architecture and there's only so much performance a public cloud can offer; they can offer you more, but again at a very high cost, and it is restrictive for the large scale we were looking at. By building our own cloud, we are now able to optimize it for our own use. We are able to put advanced security appliances, DDoS prevention, and intrusion detection into our data center, so it's definitely a step up from the security that a public cloud offers. And lastly, data sovereignty. As I said, we are an e-commerce system and an ecosystem, so we also have a digital wallet, which required us to make sure that all the money-related data we store in that application remains within the boundaries of India. At that time, when we were using our public cloud, they did not have any region in India. So, at least for that particular application to be hosted within the boundaries of India, we had to look for alternatives to host that application and keep that data within the boundaries of India.
So those were the four main reasons we built our own private cloud: lower cost, better performance, security and compliance, and data sovereignty. Before starting our migration, we began by gaining knowledge about our existing environment. We found that we have more than 300 microservices running, and more than 200 data stores, which include MySQL, MongoDB, Aerospike, Redis, Elasticsearch, RabbitMQ, ActiveMQ, and many more. We figured out how these services were being deployed: were they deployed manually or through some automation? For our deployment purposes we use Chef, so first, for those services which were deployed manually, we onboarded all of them to Chef and made them deploy through automation, so that in the new infrastructure there would be no manual deployments or manual steps. So we first made all our applications deploy automatically. After that, these were the planning steps we took. Earlier, in our public cloud, there was no central place where an ops person or a dev person could find that, okay, on this particular server, this particular application is running. Like I said before, we have 300 microservices. So what did we do? We listed down all our services in a central place; in our case, that is a YAML file.
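A service inventory like this can be as simple as one YAML file listing each service's port and dependencies. Here's a hypothetical sketch (the file format and field names are my own invention, not Snapdeal's actual schema), plus a small shell helper that pulls the service names and ports back out of it:

```shell
#!/bin/sh
# Create a tiny example inventory (hypothetical format).
cat > services.yml <<'EOF'
services:
  cart:
    port: 8080
    depends_on: [checkout, mysql-orders]
  checkout:
    port: 8081
    depends_on: [mysql-orders]
EOF

# list_services FILE: print "name port" for every service entry.
# Relies on the two-space/four-space indentation used above.
list_services() {
  awk '
    /^  [a-zA-Z0-9_-]+:$/ { name = $1; sub(":", "", name) }
    /^    port:/          { print name, $2 }
  ' "$1"
}

list_services services.yml
```

A file like this doubles as a machine-readable dependency graph: the depends_on lists are exactly what you need to group tightly coupled services for migration.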
So what we did, we listed down all the services in a YAML file, with all the critical information we needed for a particular application: on which port that particular service runs, which applications it communicates with, and which databases it uses. We listed down all this information in our YAML file, and that YAML file also acts as infrastructure as code in our case, so we are actually using that YAML file to apply some infrastructure-as-code techniques in our infrastructure. After that, we divided all our services into small groups so that we could migrate together those services which were tightly coupled, which communicated frequently, so that we could reduce our latency issues. So we divided all these services into smaller groups. We also made a dependency graph to facilitate our migration, so that we could have a visual view of which services we had already migrated and which services we still had to migrate. We kept all our data stores in replication mode so that we could avoid data corruption or data loss issues; we kept all our data stores in sync mode, including MySQL, Aerospike, and MongoDB. After that, we migrated all the tightly coupled services together, according to our business flows. We have three main flows: the seller flow, the buyer flow, and the supply chain flow. So we migrated all those services according to these flows. After that: planning is the most important key for every project, but there are always some runtime gotchas that you only get at runtime, like the runtime exceptions we get every time. So we also made some mistakes.
There were some scenarios. One of them was when we were in the middle of our migration: we used to put some security checks, some security groups, in our old cloud on the ELB to serve the traffic to the actual servers. There were some services which communicated frequently, and one of them was running in the public cloud, and when we were migrating our first service, traffic was supposed to go to the public cloud according to the code we had written, because the old service was running in the old cloud. But we forgot to put the security groups in place, and there was no traffic coming to the public cloud service where the actual service was running. So we had to verify all these checks before migrating, before serving the actual traffic in the production environment. The second one: we used to keep our databases in sync so that no data corruption would happen, but there were some scenarios where our data had been corrupted and we were getting some serious exceptions in our application. In that case, we took the data and dumped it to the new servers, but that was a kind of pain to do. So we had to make sure that all our database servers were correctly in sync, so that no data loss or data corruption would happen. Another scenario was when we were migrating one of our services: the new machines were not able to handle the load,
so we had to extend our infrastructure at runtime. To avoid these types of risks, we had to verify the number of machines needed for that application to handle the load, and we had to verify it earlier, before starting the migration. To avoid all these issues, monitoring your infrastructure is one of the most important things you have to do. We were monitoring every individual system as well as our applications. In our case, the order count was the critical thing: we had to ensure that our order count did not drop during our migration activities. We used Icinga and the EFK stack for our monitoring purposes, and we used dashboards to monitor our infrastructure every time we did any migration activity. There are also some rollback strategies we figured out. First, traffic redirection: we figured out that in case of failure, we had to ensure that within milliseconds we could redirect the traffic back to the old servers, to the old cloud, so that we would not have exceptions in the new cloud. After that, database re-sync: this is also very important. When we migrated a service, what we would do, basically, before handing it to QA, was keep that database in read mode, so that we could avoid some of the exceptions we had been facing. We also set some cooling periods: we don't shut down the old service, for around a month, until we find that the service is running flawlessly in our new data center. These are the technical tools we were using for our data center migration: our YAML file, which also acts as infrastructure as code in our case; Dendrite for our service discovery, for which we wrote Nerve and Synapse code; SaltStack and Chef as orchestration tools; Git and Jenkins for our CI/CD pipelines; and we also wrote some automation scripts. [Host] Ruchi, sorry to be rude, but we have to stop here; we are over your time.
[Ruchi] Sure. [Host] Can you just recap quickly, and then you can talk to people at the booth? [Ruchi] Yeah, so actually I had only 15 minutes to talk about our migration, but there is a lot more to know, so those who want to know more, I'm around; you can talk to me anytime. I just want to share some key learnings; it takes just two minutes, okay? Plan your migration: understand your services and your infrastructure better before starting. We also created a live dependency graph. Don't migrate as-is: fix the problems that you have, and after that migrate the application to the new cloud. Ensure that you have set strict naming conventions, make sure that all launched services are registered with all your orchestration tools, and automate and monitor everything before starting such a big project on such a big scale. [Host] Thanks, Ruchi. We'd like to stop here. Ruchi and a colleague, Aditya, are also going to be around tomorrow at the Icinga camp, if you're planning to attend; they will be talking about how they've used Icinga for monitoring, and of course Ruchi is also going to be around at the conference, so we can take more questions. Okay, thank you. While the next speaker is setting up, can I ask the hall volunteers to distribute the feedback forms? Please do fill out the feedback forms. The forms being distributed are for the DevConf track; there are also separate forms for the Rootconf track, so please pick them up and give us your feedback. While Baiju is setting up, I just want to say a quick thank you to the DevConf editors. Can I have Nigel and Ramki up here very quickly? Run, run, run; Nigel is a long-distance runner.
This is too short a run for him. A quick note on how we curated talks for DevConf. As a policy, at Rootconf and at all HasGeek conferences, even talks that are in a sponsored track or in an open track are curated and rehearsed in the same manner. There is no distinction made, because the idea is to ensure that the audience gets maximum insight. I'd like to thank Ramki, Nigel, and our absentee editor Kushal Das, who's in the US somewhere right now, for giving up time from their regular day jobs and rehearsing with the speakers. We did the editorial process over a period of three weeks, where our goal was to ensure that we cover as much depth and variety across the platform technology spectrum as possible. So thank you very much to Nigel and Ramki; they were available all the time to do rehearsals. We rehearsed at least twice, and sometimes thrice, with every speaker, in addition to giving a lot more feedback. So we'd appreciate it if you fill in the feedback form; tell us honestly what you felt and what we could do better next time. A big round of applause for both of them. Okay, since they don't have anything to say, we'll let Baiju take the stage. Our next speaker, Baiju, has been a long-time open source contributor across the whole of India. He has been leading the Python community for a long time, he runs the Bangalore Golang community, he does a lot of workshops around it, and he has contributed to various things, including SMC, the Swathanthra Malayalam Computing project. In this talk he's going to show how fabric8 makes developers' lives easier with an end-to-end development platform. Go ahead, Baiju. [Baiju] Thank you. Good afternoon, all. So: fabric8, an end-to-end development platform. That's the topic for today.
My name is Baiju. Before starting, I just want to ask a question: how many of you have actually added a new issue, or commented on some issue, maybe in GitHub or Jira or Bugzilla, through your mobile or laptop while you were traveling, maybe on a bus or a train? Just raise your hand. Yeah. So you did that through your mobile. Now imagine something going beyond that: you are doing end-to-end development, maybe in an internet cafe or at a kiosk. That's just a clue to what I'm going to speak about: an end-to-end development experience. Before going into that, a few things about me. I'm a senior software engineer at Red Hat. As Pradipto mentioned, I have contributed to many free and open source software projects over the last almost 18 years. So let me not take much time there. You might have heard these kinds of phrases in recent articles, websites, or talks like this: now every company is a software company; software is eating the world; you don't have to be a software company to think like one; every business will be a software business. You are hearing this kind of thing a lot, and it all gives an indication that everyone now needs some kind of software development in their organization. Along with that, you also hear things like microservices, containers (Moby, rkt, Docker), container orchestration like Kubernetes, and PaaS like OpenShift. We know that software has become such an essential part of every company, and with all these innovations, the complexity of developing applications is also becoming very difficult.
This is where fabric8 comes into the picture. What's happening here is that fabric8 is solving the problem of developing an end-to-end application, with all these large, complex microservices and all these kinds of deployment strategies, and you set up and do everything from a web browser. So you can go from your ideation phase to beyond production: monitoring, getting feedback, all of those things you can do now. Basically, as I mentioned before, with containers, people are now talking about cloud-native applications, and you can create cloud-native applications and microservices through fabric8. In simple terms, you can do planning, building, testing, and deploying of your application, through pipelines, through fabric8, all in your browser. So you can run and manage your application with continuous improvement through this platform. This is a typical schematic of the platform. The underlying technology we are using here is OpenShift, which is the PaaS platform; we have built our own platform on top of that, and there are other components: an authentication layer, the planner (which is the planning part), a codebase where you can manage your code, a full-fledged IDE, a pipeline built on Jenkins, and a code generator. All of these are provided by fabric8. The first part is the platform, which is the foundation for the system. It provides all the RESTful APIs, built in the Go programming language, which I'm passionate about, and its front end is written in Angular 4 and PatternFly, which is in turn built on Bootstrap. This is how the main dashboard looks as soon as you log into the system.
It's going to change soon. The next component is SSO, which provides an identity and access management system. It is built on top of another open source project called Keycloak; you can check it out at the keycloak.org website. The planner component you can imagine like a project tracker or an issue tracker. Again, it provides a RESTful API; the back end is written in Go, and the front end uses the same technology I mentioned: Angular 4, PatternFly, Bootstrap, etc. This is how the typical planner looks, where you can capture your ideas, your discussions, your to-do list, your work items. The next thing is the code editor: you need an IDE, right, or a text editor. We are using Eclipse Che as the code editor, which runs through the browser. This project is part of the Eclipse Foundation; you can go to eclipse.org/che. That's what we are using here. This is how the typical editor looks. This is actually a browser; it may not look like one, but it is. It's a full-fledged IDE where you can do all your development, with a shell and everything, and you can check in your code through the system itself. The next part is the build pipeline. This morning you might have heard the talk from my colleague Washek, who mentioned the pipeline. This is also built on top of the OpenShift pipeline, which is in fabric8. The underlying technology is nothing but Jenkins; again, an open source project, which you can check out at jenkins.io. I hope everyone knows about that project. And this is how the UI looks on our fabric8 platform, where you can approve your build, promote your build, and so on.
Now I'll show you a small demo of how this end-to-end system looks. Right now I'm going to show you the demo on a hosted version of fabric8, which is openshift.io, a SaaS offering from Red Hat, but soon you'll be able to set it up locally as well. By the way, the site is called openshift.io; that is where it is hosted, and you'll be able to log in. Right now, if you try it out, you'll be on a waiting list, but once you log in you'll get everything I just mentioned. You create something called a space, where you do all your activities; every organization or individual needs to create a space. And this is the planner, where you can drag and order issues, add what we're calling work items, and prioritize things. It also has a kind of Kanban board, maybe like Trello, where you can move things around; all those things you can do through the fabric8 planner. It also provides a way to kickstart your project: you can start a new project from scratch, built on some templates. Right now openshift.io supports a few stacks based on Java, like Vert.x, Spring Boot, and WildFly, but soon we are going to support others, maybe Python, and not just those kinds of applications. Once you are okay with your stack selection, you can click on finish, or configure further, and then your project will be pushed to GitHub. Right now this supports GitHub.
So that means all your code is going to the GitHub repository displayed here. Afterwards you can edit: you click on "open the editor", which checks out the code from GitHub, and you get a workspace there. Click on that and it takes you to the editor, your IDE, which is nothing but Eclipse Che, and you can make all your code changes, modify the code, and finally commit your code. Yeah, it's doing that right now. One more point here: we also have a recommendation engine, which points out certain kinds of vulnerabilities or issues in your system and gives you hints so you can modify your code accordingly. In this case it shows there's a vulnerability in this particular version of this package, so you can fix that instantly, make your changes, and push the code to GitHub. Once you push your code, that triggers the build pipeline. Here you can see the pipeline starts immediately. Based on your strategy, the pipeline may push to your dev or staging environment, or directly to production, or you can choose to have a manual trigger, an approval process. So that's a brief demo of fabric8. What I tried to show in this demo is that fabric8 is streamlining software development, from ideation to production, everything through the browser. These are all open source software projects, and we welcome contributions; that was the intent of this talk. We are at a very early stage of development.
We are still actively developing this project. You can get all of this from fabric8.io, and the source code is all there in the GitHub repositories — the fabric8io organization and a few others. We welcome contributions. And if you want to see a real, live system — I think I have a few more minutes — I can show it really running in my browser. I hope my internet connection is working. So this is openshift.io. You can see this report; maybe I'll click on the stack report. Yeah, it looks like it's coming up. That's the stack report: it recommends some packages. Say the analytics, the recommendations, pointed out that there is something wrong with this package — maybe a vulnerability — and you need to fix it. I can create a work item directly from here by clicking this button. Okay, it has created a new work item, so I can click on it and work on that issue later. And this is the Eclipse Che editor, which I already have open (otherwise I would need to navigate and open it), where I can make modifications. So maybe I'll stop right here; if there is time, maybe I can take a question or two. [Host] Yeah, there's time for at least one question. [Audience] Che is a full-fledged IDE which supports almost every programming language you can imagine. The real question I have is this: typically, if I'm running a Django application and I use pdb, I can set breakpoints and do a lot of things with the runserver, right?
[Audience] So do I have those kinds of facilities as part of Che? I ask because I've not looked at it. [Speaker] Yes. [Audience] And is there a possibility of picking and plugging in another IDE, like Sublime or something, instead of Che? [Speaker] Yeah, definitely. By the way, Che has a feature to SSH into the system, so you don't necessarily need to use Che: you can do your code editing locally and still use the rest of the platform. [Audience] Hi, I have a question about the recommendations that were appearing in the code editor. They were only for Java — do they exist for Python and other languages? [Speaker] Yeah, we are working on Python also. By the way, if you want to check out the analytics, you can go to fabric8-analytics — that's the GitHub organization, and it's all written in Python. That's where you can get all the code. Right now it supports Java, but it will support Python and other languages as well. This is the organization where you can see all the code for the analytics platform. [Host] Thank you — you can talk to the speaker about fabric8.io and openshift.io at the Red Hat booth. Cool, thank you for your great presentation and good demos. Our next speaker is Suraj Deshmukh. His hat says that he's from Red Hat, it seems. Anyway, Suraj is an open source enthusiast. He is basically a hardened, very hardcore Kubernetes advocate — I think he speaks at Kubernetes meetups every day, because I never see him at work. Jokes apart, he really knows his Kubernetes, and he and his team are on a mission to make Kubernetes simple for developers, which is what this talk is going to be about. Go ahead, Suraj. [Suraj] This talk is about making Kubernetes simple for developers. And yeah, this is something about me.
I work at Red Hat — thanks for introducing me — and I also contribute to Kompose, OpenCompose, Kubernetes, and sometimes OpenShift. Cool. So I just want to set the stage with a show of hands: how many of you have used Docker? Okay. Docker Compose? And Kubernetes — at least the basics of Kubernetes? Yeah, then you'll see how all these things fit together. So this is the story of a developer, a normal developer who does application development for mobile back ends. My manager goes to a conference and finds out that microservices are a hot thing, and he comes back and says we need to do everything in containers, because microservices and containers — that's the thing we're going to do. So I start with this application. For starting out with containers, I do a Google search, and the first thing that comes up is Docker, because that's the most used container technology right now. So I figure out the Docker installation, do it locally, and it's working. What do I need for my application? I need a database, right? So I run a local command — docker run postgres — and since I can reach it locally with the postgres client, it's working fine. I start with local development: a local Python virtualenv is there, the code is working fine, I can talk to Postgres, everything is going fine. But going ahead, that's not how you do containers. Containers are a packaging mechanism — that's how you will distribute your application.
So I create a Dockerfile, so that all the code I write will be in a container itself. The code repo is now more or less a Dockerfile, the application, and whatever the application needs. So every time I write code I have to do docker build, then docker run, and then curl some URL so that I can know whether it's running fine or not. After that there is one more service — the application is growing, I'm writing more code. Now I have two Dockerfiles: the one for the application I had, and a new one for the API server. It's growing, right? Now I have to run six commands: one to start the API server, one to start my application, one for Postgres, and then I have to curl them to check everything is working fine. This is really complicated — writing six commands every time. That's where Docker Compose comes in. It helps you write everything into one file like this, and then all I need to do is run one command. Once I have the Compose file — where I've specified which ports there are, which volume mounts I'm going to do, and which Dockerfile each service consumes — I have two Dockerfiles, one per service, plus the Compose file, and all I need to do is docker-compose up and curl the thing. So everything is going fine; I'm happy, things are working. But for how long? This is good for local development; now I want to take it to a production-like system. What do I need for that? I need something that will take care of all of these containers, keeping them running all the time. So: more Google searching.
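A Compose file along the lines the talk describes — one database plus two services built from their own Dockerfiles — might be sketched like this. Service names, directory paths, and ports are illustrative, not taken from the talk:

```yaml
version: "2"
services:
  db:
    image: postgres:9.6
    environment:
      POSTGRES_PASSWORD: example      # illustrative credential
  api:
    build: ./api                      # directory containing the API server's Dockerfile
    ports:
      - "8080:8080"
    depends_on:
      - db
  app:
    build: ./app                      # directory containing the application's Dockerfile
    ports:
      - "8000:8000"
    depends_on:
      - api
```

With a file like this, a single `docker-compose up` replaces the six separate commands.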
It leads me to something called container orchestrators. But what is a container orchestrator? It says it will take care of your containers and keep them running all the time. But there is so much confusion: you have Kubernetes, then there is OpenShift from Red Hat, then there is Docker Swarm, there is Mesos with Marathon, and all these things. I mean, I am a developer — how many things should I learn? Since I'm already using Docker Compose, I see that Docker Swarm is the first thing I could try, because I could directly give the Compose file to Swarm and maybe it would just run. But reading more, I find that since I'm doing builds locally, the cluster might not know how to do those builds; I see that I have to change some things in the Compose file and cannot use it as it is. I'd have to remove the build sections and make sure I have a container registry I'm pushing images to, and things like that. Reading further, I find that Kubernetes is more robust: it has a bigger community, it comes with Google's experience of running containers for more than a decade, and a lot of other companies are putting their efforts into it. As you can see, it is heavily starred and has a huge community behind it. So why not go and explore this technology? It's exciting: a lot of contributors are there, and everyone is talking about it everywhere you go. So, what is this Kubernetes? This is what you get in a Kubernetes 101 talk: you are told that there are many components in it. There is a master.
There are nodes. The master runs a bunch of servers, and the nodes — the ones serving your containers — also run some kind of server, the kubelet. So it's a complex, big system. And then Kubernetes itself comes with a lot of new concepts: there are pods, there are services, then ingresses and replica sets — and there was a replication controller, and now you use deployments — and stuff like that. There's a lot going on at the same time. What should I look at? I just want to get my application running. And I already have this Compose file, so what do I do with it, and how do I get this thing onto Kubernetes, a production-grade system like that? So this is what I'm doing: banging my head, because Kubernetes is so mind-boggling and a lot is going on. And that's where Kompose comes in. A few Google searches, and I find out that Kompose is for you if you are a developer using Docker Compose: it can help you get to Kubernetes, because it does some magical conversion that I can feed to Kubernetes, and it will bring up my application somehow. Okay, so I find this interesting, because I'm not doing much and I don't need to learn much — no writing all the configurations; the existing file I had, I can feed toward Kubernetes. So I had a sample application. I'm running a local cluster here, just to show the demo, and here I create a WordPress project in the cluster. So this is the Docker Compose file.
It's just two services, a WordPress and a MariaDB, and it has all the environment variables, the ports, and the volumes. So I convert it directly — that's what I'm going to create the Kubernetes artifacts from — and it basically generates all the artifacts Kubernetes needs: there is a service, there is a deployment, and there is a persistent volume claim, so that the database can have persistent storage. Then I just feed that to the cluster. This is running in an OpenShift cluster, which is basically more or less a Kubernetes cluster, and I see the apps coming up; they start listening on their services. Opening these things... it's up. So that's the demo. Kompose is quite helpful because it generates the configurations — I don't need to do much, and I can still work with my Docker Compose file. The defaults it picks are only okay, though: it created a PVC which was 100 MB, and you might not want 100 MB as the storage for your database. Still, it works for me as long as I have generic use cases. But my application is not just two or three services.
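For reference, a conversion like the one in the demo is typically driven by `kompose convert`, and the generated artifacts look roughly like this. This is a hedged sketch — image tags and the Deployment API group of that era are chosen for illustration; the 100 MB PVC is the default size the talk calls out:

```yaml
# Sketch of what `kompose convert` emits for a two-service Compose file
apiVersion: v1
kind: Service
metadata:
  name: wordpress
spec:
  ports:
    - port: 80
  selector:
    io.kompose.service: wordpress   # label kompose attaches to generated objects
---
apiVersion: extensions/v1beta1      # Deployment API group in use at the time
kind: Deployment
metadata:
  name: wordpress
spec:
  replicas: 1
  template:
    metadata:
      labels:
        io.kompose.service: wordpress
    spec:
      containers:
        - name: wordpress
          image: wordpress:4.8
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Mi                # the default the speaker says is too small
```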
It has grown into many things now, and that's where Kompose starts falling short, because it assumes a lot. You cannot define service types; Kubernetes jobs, secrets, and config maps are hard to get out of a Docker Compose file; and the volume info, as I said, is just 100 MB, so you have to go and manually edit the volume size. Then there are the liveness and readiness probes, and the question of how to put multiple containers into a single pod — a big one, because the basic unit in the Kubernetes world is the pod, and it can run multiple containers. And generating templates directly — all these things. So what Kompose did, to retrofit the many things missing from Docker Compose that we wanted generated into Kubernetes, was add a few things: custom labels that extend the Docker Compose file itself. It's okay — it might not break your application — but it means I have to keep adding more labels to my application. As I said: jobs, secrets, volumes, and then ingresses and routes — how do you specify an ingress or a route? Docker Compose was never created for clustered applications. It was mainly for a developer doing local development on a single machine, where you can directly host-mount a local directory into the container and directly map the container's port onto the machine. With a cluster everything changes, right? It's not one machine where one container runs; the container could run anywhere — you never know, so where do you hit it? Things like that started opening up a gap. And liveness probes, as I said, and service types. So how do I get around it?
I just generate once and then start editing those configurations — create once, start maintaining those files, and dump the Docker Compose file. So it's not that easy to do. So, putting my Kompose-developer hat on: as Kompose developers, when we were adding these features, it was tempting to extend the Docker Compose spec itself, but as you see, that's not a good thing to do, because it might break applications locally, or might not add value to the existing tool. We needed something that is not Docker Compose and is native to the Kubernetes world — or something Kubernetes can understand directly. So what's next? That's where OpenCompose comes in. OpenCompose is a spec that is as easy as Docker Compose and native to Kubernetes. All those gaps we found in Docker Compose while mapping it to Kubernetes are what we're trying to fill: things like defining multiple containers in a pod, defining something that should be exposed outside in the form of an ingress, defining how large you want a volume to be, and so on. If you look at raw Kubernetes artifacts, they are more about how you want to deploy the application. We don't want developers to do that; we want developers to just define the application, and that's it — not how to deploy it. OpenCompose looks more or less like this: it has services, you can define multiple containers within a single service, and then env vars, and you can do volume mounts and so on. We have tried to keep the user workflow with OpenCompose similar to the one with Compose. So let's see a demo of it. I'm creating a new project again.
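An OpenCompose file of the shape described above might be sketched like this. The field names here are recalled from early OpenCompose drafts and should be treated as illustrative, not authoritative:

```yaml
version: "0.1-dev"
services:
  - name: wordpress
    containers:                     # multiple containers per service map to one pod
      - image: wordpress:4.8
        ports:
          - port: 80
            type: external          # expose outside the cluster (ingress/route)
        env:
          - name: WORDPRESS_DB_HOST
            value: database
        mounts:
          - volumeRef: db-data
            mountPath: /var/www/html
volumes:
  - name: db-data
    size: 500Mi                     # explicit size, which plain docker-compose could not express
    accessMode: ReadWriteOnce
```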
This is also running in a local Minishift cluster — a locally running OpenShift. That's the file: the same WordPress application, but down below you can see there is a volumes section where I've defined the size and so on, which I can modify — something that was not possible when using Docker Compose. Similarly, I generate the configurations, convert the file, and feed it to the Kubernetes cluster — the same workflow, the same application. It's coming up; you can see MySQL has come up. And here I'm exposing the web service outside the cluster: OpenShift has something called routes, similar to an ingress in the Kubernetes world, and that's how you expose it to the outside world. And WordPress has come up, and I can access it. This thing we are doing is out there in the community: we have proposed it to the Kubernetes community on the Kubernetes channels, they know this effort is going on, and they are giving their feedback. This is a general comparison chart between Docker Compose, how Kompose converts it, and OpenCompose — how the Kubernetes features map across. These are some references, and I can take questions. [Audience] Hi. I'm in a sort of unique situation: I have quite a few bare-metal servers with me, and I want to use Kubernetes. What would be your suggestion for going from those raw servers to a cluster? [Suraj] How are your applications defined right now? [Audience] They are on bare metal. These are extra servers for me; I want to add them to the whole cluster. The rest of the cluster is VMware, but I'm willing to move out of it.
[Suraj] For that, I would recommend you deploy Kubernetes and write the configurations. Kompose might help you generate those configurations as a first pass, because writing them by hand for the first time can be error-prone and you might miss a lot of things. But once you've generated the configurations from Kompose, you have to add things like resource quotas and secrets by hand. [Host] Just to clarify: is your problem running Kubernetes on the nodes you have, or running the application on the Kubernetes you have? Because his talk is about running the application on Kubernetes. [Audience] Both. [Host] Okay — those are two different problems, just to be clear. We have time for one more question, then some announcements, and then we can obviously run for lunch. One last question. [Audience] Nice talk, Suraj. Can we run a Kubernetes cluster on the PowerPC architecture? [Suraj] You'll find some IBM people around at the conference as well; they are doing a lot of work to make this happen. [Audience] Would it be easy to find the required binaries of Kubernetes for PowerPC? [Suraj] I think the respective distributions should create those RPMs or debs or whatever, and that should be possible. [Audience] Okay, thank you. [Host] One more announcement: we run a Kubernetes meetup, and on the 20th we are doing a Kubernetes 101 hands-on. So if you are new, or you know someone who wants to learn, RSVP at bit.ly/k8s101. Cool, thanks, Suraj. Just a reminder: please fill in the feedback forms and hand them over at the help desk, which is near the internet booth, and we'll meet back here — we have lots of storage talks and IoT talks right after lunch. See you soon. Thank you. The next speaker is Raghavendra.
He has been a Gluster developer for the last four years. He's passionate about C development, file systems, and storage, and he currently works on developing file systems and storage for the container ecosystem. Let's hear what he has to say. [Raghavendra] Hi, everyone. Yeah, I got the post-lunch slot, and challenge accepted: I have a Spider-Man slide coming up, so those who are sleeping will still wake up. And it's a 15-minute talk. Who am I? I'm Raghavendra Talur. I work for Red Hat. I have been working with Gluster — it's a software-defined storage system — for about four years now, and for the past few months I've been working on containerizing Gluster and making Gluster the de facto storage system for containers. The talk today is about this — basically, how not to do this. When I heard the term "persistent storage" for the first time, I was curious and a little confused: isn't storage always persistent? Why do we need to call it persistent storage? In the container world, in a container platform, the term is needed, and the reason is this: if your pods are your containers, then your nodes — in Kubernetes or any other container platform — are different ships. When you port a container from one ship to another, you have to ensure that the storage is ported along with it; otherwise you risk dropping it into the water. So I'll tell you how not to do this, and what our solution is. Why is this an interesting problem to solve? In a survey conducted recently, admins were asked what the most challenging aspect of setting up a container platform is, and 25% of them said it's persistent storage — we don't know how to get this right in container platforms. The same survey found that it's the complexity of setting up persistent storage that hurts most, more than even the scale or cost problems of persistent storage. In the same survey we also asked: okay, so it's a problem — what are you trying to do?
How are you trying to solve this problem of persistent storage? They said: we considered everything, but we'll go with the existing SAN or NAS appliances we have. At least 33% said they would try some software-defined storage, but a very low number — only 18% — replied that they would try something hyperconverged with the container platform, something embedded into it. And that's a very low number, in my opinion. A good system is a system that is complete by itself; it should be coherent. If you're trying to glue two different systems together to get a solution, that's probably not the right way to do it, and that's what this solution tries to address. Okay — persistent storage, as you saw in the slides: why is it different? Why is storage that much of a problem when it comes to container platforms? Because storage is not a commodity the way CPU cycles and memory are. In Kubernetes that part is already solved: you get the same number of CPU cycles and the same memory when pods move from one node to another. That's not the case with storage. And because storage is not yet an integral part of the container platform, you also have the problem that it requires extra management effort, sometimes even extra storage expertise. What happens then is that you have two admins: one maintaining the Kubernetes cluster, the container platform cluster, and another doing only the storage part. And as we saw in the previous slide, most of the time storage is kept external to the platform, which again is a problem. Before I go into the solution, I have to recap how you do persistent storage in Kubernetes. It's a very simple thing: when you want persistent storage for your application, you claim it.
It's called a persistent volume claim. You write a spec which says: I need this much space, with this access mode, and it should come from this storage class. That's a request you send to Kubernetes, and Kubernetes satisfies it based on the inputs you gave. Once the request is satisfied, the PV is said to be bound, and once the PV is bound, you use it in your application pod. Here is a sample application spec for nginx: all we need to do is mention the name of the PV claim we want bound to this pod, and where we want it mounted. That's it. So this is how persistent volumes work in Kubernetes, and it's similar for other container platforms too. Now that you know how to use it, let's go to the solution. Take the example of a typical Kubernetes cluster with three nodes — one master, three nodes. Depending on whether it's a private setup or a public cloud setup, you will have some local disks on the nodes, or some block storage instances attached to each node. These green circles are the local disks or the EBS instances. Once you have this, you run a Gluster pod on each node and create a Gluster cluster out of them; the Gluster cluster aggregates all the storage from all these nodes and makes it look like a single storage system. We also deploy a Heketi pod here, which is the intelligent Gluster volume manager. It can run on any of these nodes, and it is made aware of the Gluster cluster. Lastly, we have a GlusterFS provisioner. It comes with Kubernetes.
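The claim-then-mount flow recapped above, together with the storage class the admin defines later in the talk, can be sketched as three objects. This is illustrative, not the speaker's exact manifests: the names (my-sc, my-pvc) and the Heketi URL are stand-ins.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-sc
provisioner: kubernetes.io/glusterfs   # the in-tree GlusterFS dynamic provisioner
parameters:
  resturl: "http://10.0.0.5:8080"      # Heketi endpoint reported by the deploy tool (stand-in)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: my-sc              # which class should satisfy the claim
  accessModes: ["ReadWriteMany"]       # Gluster-backed volumes support RWX
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html
  volumes:
    - name: content
      persistentVolumeClaim:
        claimName: my-pvc              # reference the bound claim by name
```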
The provisioner runs on the master. Whenever a PV claim comes in, it is handed to this provisioner; the provisioner knows how to talk to the Heketi pod and satisfy the claim. That is all you need to get things working. To use it, say you deploy an nginx pod somewhere here. The nginx pod talks to the GlusterFS volume plug-in, which knows how to talk to the Gluster cluster, and the data is available everywhere: if your nginx pod moves from node 3 to node 1, it gets the same data there too. So this is the solution. Yeah, I know — it looks like a lot of parts with a lot of connections. And we have solved that too. Let's recap the components before I show you the setup. From the Kubernetes side, we have a GlusterFS provisioner, which knows how to satisfy a PV claim, and a mount plug-in, which knows how to provide that storage to an application pod. Gluster is the storage component: it is the manager for all your storage and can integrate all the disks. Heketi is the orange pod we saw; it is intelligent enough to translate a PV claim carrying a storage requirement into a Gluster volume. And lastly we have gk-deploy. This is the tool which simplifies setting all this up: you don't need to run all those pods separately — all you need to do is run this tool with the proper input, and it does the whole setup. For those of you who are not aware of Gluster: Gluster is a storage system.
It is scalable, but here is a better definition, in my opinion: we aggregate local file systems from multiple machines — we call that cluster of machines a Gluster storage pool — and we let you aggregate them in multiple configurations. If you have redundancy requirements, we satisfy that; if you have resource constraints and don't want to waste too much storage space, you can use erasure coding. The configurations are many, and you can use them as per your requirements. There are added features like snapshots — you can take a snapshot of a volume — encryption on the wire, and bitrot detection and related protection. And it's not only our Gluster native protocol you can use to access it: well-known protocols like NFS and SMB are supported, and we also support Swift. Okay, so now let's move to the setup. As I told you, it's a single command to run; you just give it an input describing how your container platform cluster is made up. The setup is similar to what I showed in the diagram: a single-master, three-node setup, with no pods running as of now. And this is the input file. The input file is very simple: you just tell it which machines are your Kubernetes nodes, what their IPs and hostnames are, and which devices on those nodes are free to be used by Gluster. In this sample file I have three nodes, each with three disk devices. We also have a concept of a zone, similar to Amazon's failure domains: if you have VMs running in particular zones, you can specify that here, and it is taken as an input when we create a volume — we try to make sure the replicas are spread across different zones so that you get better availability. You can reuse the same concept for private clouds by mapping your zones to the racks you have. Okay, so that's the input file. You give that input file to this tool, and there's a flag to specify whether you want it
to deploy the Gluster pods or not. If you have already deployed Gluster by some other means, you can skip that part. You will have to open the right ports — it varies per setup — and that you have to do manually before running the tool. Okay, so now it has started deploying the Gluster pods. We labeled the nodes that are meant to run Gluster. Gluster is deployed here as a DaemonSet, because it is using disks: it cannot migrate from node to node, so we have to make sure the pods always start on the same nodes. Then we told Heketi what the devices are; Heketi is deployed and now running. You can see three Gluster pods up and running and one Heketi pod up and running. The video here was played at 2x speed, but it's only 48 seconds — the whole deployment happened in less than two minutes. And this is all you need to do to have hyperconverged storage up and running in your Kubernetes cluster. That's it. There is one small task the admin has to do before asking the app developers to start using it, and it's common to any other storage system in Kubernetes: you have to define a storage class. The Gluster cluster is up and ready, Heketi is up and ready; you now have to tell Kubernetes, "this is my Kubernetes cluster, this is my Gluster cluster", and connect the two. As we saw, the connector is the provisioner, and you wire it up using a storage class. We create a new storage class here: we use the GlusterFS provisioner, we name the class my-sc, and we say the provisioner needs to talk to Heketi at this URL. That URL is the only important thing you need to note down when you run the previous tool — the tool prints a URL, and you use it to create the storage class. That's it. Here the admin's work ends completely. Now he or she is free to relax — no more work for the admin. You just have to let your app developers know the name of the storage class you created. From an app
developer's perspective: if he or she knows the concepts I explained — PV claim and PV in Kubernetes — they will be able to use it. They need to know nothing about Gluster here. So this is my PV claim: I name it my-pvc, and I refer back to the storage class the admin created by mentioning it here — I want it from my-sc, I want 2 GB of space, and the access mode will be ReadWriteMany. As I told you, when a PV claim is satisfied it is said to be bound, and we have it bound here. To show you that it really used Gluster on the back end, we can use the Heketi CLI to list the volumes — and the volume has come up. This is a typical application pod spec: all you need to do is mention the PV claim and say where it should be mounted. Let's create the application pod and see if it really uses Gluster on the back end. The application pod my-app is running now, and it has no data to serve yet, because we gave it an empty volume from Gluster. So we write "hello world" into it and see if it shows up when we do a curl. There it is. This still doesn't prove it is coming from Gluster, so let's do the investigation required. We look into the mounts of this application pod and see that a fuse.glusterfs mount is there — so the GlusterFS volume is mounted inside the application pod. We can also verify from the Gluster pod: we go into the Gluster pod and look into one of the brick directories (a brick is the unit that makes up a volume), and in index.html it has "hello world". So this is how it works. I haven't told you how we built all this, but it is sufficient for an admin or a DevOps person to set up Gluster hyperconverged with Kubernetes in less than a few hours, including all the verification you need to do, and it should be easy for any application developer to start using it. We have many more features
We have many more features coming up. We have file support already; that is what I showed you. We have block and object store coming in soon, in the next release. We have more volume management features up. We have also reduced the memory footprint, because obviously, if you are running storage pods on the same nodes where you are running application pods, you want as much memory free for the application pods. So we have done a lot of work here to reduce the memory footprint; depending on how many bricks you are going to run, or how many volumes you have created, you can get up to a 5x benefit. There were suggestions for auto-discovery of nodes, because many of you might be using the auto-scaling feature from AWS, so there are plans to use node labels to automatically take in the nodes for Gluster. And we are working with Kubernetes on the API for snapshots; once that is ready, we will have it in too. That's all I had for the talk. Any questions? When a new node is added to Kubernetes, if an application pod runs there, it can always use Gluster automatically; we don't need to do anything, the admin does not need to do anything. If you want storage from that new node to be used in the Gluster cluster, as of today it's one single command that you need to run, heketi-cli node add, and you give the IP of the node and it gets added. With auto-discovery it will be made automatic: you can pre-define what label we should be looking at, and based on that we will automatically take in the nodes whenever they join. On the DaemonSet: the Gluster pods were running on all the three nodes here, so the Gluster pod which was started on node 1, even when node 1 is rebooted, should not move over to node 2 or node 3. Yes, the DaemonSet needs to be modified then. The DaemonSet uses labels to identify which nodes it should be running on, so all we need to do when node 4 is added is label it appropriately, and Kubernetes does the rest of the work. If a node is automatically labeled, will it start the... It will.
We won't label it automatically; you have to tell us to label it, or you have to label it yourself. Once you do that, it will automatically start the pod there. Yeah, hi. So I had a question; I have two questions, right. One of them is regarding the performance impact of snapshots in relation to GlusterFS, because I am coming from a background of ZFS, in which snapshots are, like, instant. Now in this case, since it is spread across the network, what is the exact... Yeah, so Gluster uses LVM snapshots in the back end to take a snapshot. A barrier is created for a very small amount of time and we take a point-in-time snapshot. It has no performance impact after the snapshot is taken. When the snapshot is being taken there will be a... My point was, like, for example: if I am running it in a production environment and I have access to data, I do not want, at any moment in time, my data to not be accessible by my application. So in terms of snapshots, what happens at that moment? Will I still have access to my data all the time? Yes, you will have. The second question was regarding partition tolerance: as with ZFS, in GlusterFS, suppose one of my nodes goes out, crashed, EBS volume corrupted, whatever. What protects it? You can always configure it in various ways, but in the demo I did today with the default settings, what we do is create in replica 3 mode, and with a quorum of more than 50%. So what that means is, if I have a 3-node cluster and one of the nodes goes down, you can still do reads and writes using the other 2 nodes. And guys, just take this outside. Yeah. Our next speaker is Amy. When she isn't flying around the world, she is the Gluster community lead. She has worked as a Drupal community lead in the past, and today she is going to be talking about open infrastructure and the Church of the Shaven Yak. God, hello. Real hot on the mic there. Looks good. So I've got half an hour here to be able to talk about things that are kind of related to Gluster, but are not necessarily anything about how the code
actually works. So this talk is Making More Open: Creating Infrastructure for Your Open Source Project. As introduced, I am the Gluster community lead. This means that a great many things about the project in and of itself are my problem; the code, not so much. As for how things actually work, you'll see a lot of talks this afternoon around storage and containers; you just saw one, and we've got plenty of people up at the Red Hat booth to be able to talk about that. My focus, as far as the project goes, is to be able to feed and water Gluster. Everything that's not code becomes my problem. In the past I've given this talk with a continuous integration architect, so you'll see bits of that come in through here, and previously in this talk there's been an argument between me and the continuous integration architect, who may or may not chime in; we'll see what happens. And the focus around this is really: how can you get more people coming in and working on your project? So I'm assuming that most of the room is developers. Some developers, some ops. Okay, some back and forth. But most of you all work in open source, yeah? With Gluster as a project, I am delighted to be here in Bangalore to talk with basically a hometown crowd about what I think is probably one of the bigger success stories overall: a storage project that became part of the Red Hat family and has been continually fed and watered and changed and grown over the past ten years. A lot of people, I think, have worked on Gluster. Anyone that's currently working on Gluster.org, on the upstream stuff, in the audience? Come on, you're out there, I know you're out there. All right, fair enough. The work that we're doing around this is taking an open source project and being able to expand it. It means being able to bring in more people, maybe not through code as a developer; but if you're working on it now, and there's something that you want to be able to come back and contribute, well, you want a way to be able to come in and give that in a way that's going to be
safe and sane. In the work that we're doing, between the continuous integration team and what I'm working on, I want to be able to say that your patches can come in and they will be tested, and they will be sane, and everything makes sense. When we say that we have features working, we want to be able to make sure that we have backed them up with testing. We want to make sure that we have gone through all of the proper things, as part of a project, to be able to make sure that all of this is working. So that's the angle that I'm talking about this from: I want people to be able to come in and consume Gluster as a project. So the things that made me care about this are making sure that when we actually have real releases (right now we're in the 3.11 release cycle), everything that's coming in is going to be accepted, is going to be something we can actually work with. I'll move on about this, because realistically I think you can probably see why exactly you'd want to care about your infrastructure. Things that we've changed around this: when you start, all that you really care about is just shipping software. You want to make sure that it just comes out, everything just gets taken care of. You don't really care as much about the nuances, the idea of having ownership and access control over who can put things where and what makes sense. No, we don't do that when you start. You start from a place of: everyone has access to everything. This also means that you can blow away your entire infrastructure when someone pushes something wrong. How many of you have taken down the production environment due to a mistake that shouldn't have come through from staging? Okay, yeah. That's where you start. You start from that place of: I just want to get this out the door, and I want the 3 AM code to just be something that I can ship. That's actually a fine place to start; there's nothing wrong with that. But you have to be
very careful, because you'll end up in a lot of firefights, undoing those changes that you made with that 3 AM morning code. And this has become kind of a piece of structure that you want to put in. You want to put in locks, and you want to put in access control. Because what's the reason for a lock in the first place? It's something that tells you that only certain people should have access to it. So when you think about it like that, you know who the keys should be given to: who should be able to push to production, who should be able to come in and see those logs. There's a circle of trust that happens as your project gets bigger and older and more effective. This is, in some ways, a natural way of making this work. So one of the pieces that we've moved towards, and this may be shocking as far as a DevOps piece goes: we now look at our infrastructure as code. It should be as easy to contribute to the infrastructure as to code. It should also be just that hard as well. We now do things like code review for infrastructure changes, with downtime windows. We want to know who owns all of this as well, and we want to make sure this clear path to contribution makes sense. Again, this is an ideal world. And the obligatory kitten slide, because this is usually a DevOps talk, and it sounds a lot like DevOps because it is. You want to combine both the work that your developers are doing with the real-world knowledge that your operations team has. You want to make sure that both of those pieces are working together. And when you describe your infrastructure as code, it becomes a lot easier to contribute. As part of this, people may be angry. You may have to deal with a lot of people who are frustrated and want to have more say in how and why. This is actually a fine conversation to have. The way that we've been able to work with this, in terms of the project that we're
working on right now, is setting clear deadlines and timelines for what changes will occur and what the likely impact is going to be within the entire group; a period where people can come in and basically say, I hate this, here's why; I like this, here's why; what are the pieces that we can change; and then a timeline for revisiting that decision again. It becomes iterative. It becomes something that you have to work with. But you also have to understand that sometimes, in your project, there just may be a man with an axe, and the man with an axe may just want to keep things exactly the way they are. There is not a whole lot you can do about the man with the axe, other than the fact that you either have to show them this is not the project for you, or be able to channel that into a place where they actually might be right. The man with the axe might be telling you: don't change this, this is vitally important for reasons that you don't know yet. And I have more mistakes that I've actually made in here. Things that we did fairly directly: access control changes. We did a subset first; you start small, you make it iterative, you see what other things might be connected to that. And if you want everybody to set up two-factor authentication, which you definitely want to, start with the admins. One other piece that we did, as far as making our infrastructure more like code, is we converted just one set of jobs to Ansible. Puppet, Chef: choose, but start small. One little thing, and then see what changes from there, see who comes up and says something. Do this as part of it: communicate. Announce your work to people who might be affected. Announce it in as many time zones as you have people working on the project in. Typically you probably won't have everyone working all in the same room, so you'll have to announce this early. Three weeks out is way too long for people to remember what it was they were doing; three days might not
be enough time. You might know best what your project will need as far as timelines. Other pieces that we've done include implementing more of a culture around: I think this thing is broken, I think this infra change that happened over there might actually be the reason. Around this, and part of this, is also establishing a culture of in-the-moment documentation. If you don't have good documentation for where your system was previously, and where all your legacy builds are, this is a good way to kind of get there. On the other hand, it's going to be painful, because you might not actually find all of the things at the same time. Pieces that we've changed as well: making sure that everybody knows what your infrastructure team is actually up to. The change that we've made in the last three months is that we have also let people know what we are not doing. This particular piece of telling everyone, you can definitely come to us and we will help change things: this has become so popular in our team that people have been leaning over and asking for changes that we can't possibly do in a timely fashion. So what we've done instead is start publishing our roadmaps, and when people come and say, I want to change something: well, tell me where this can fit, tell me what this ties into and what the other pieces are. In large part we've started this by finding out the pain points and easing them first, the things that hurt the most as far as your project goes, because when you start with that, you need to get the team on your side. For us, when we started this particular project, we had Gerrit issues, so our review process was not smooth. We spent a lot of time fixing that first, and then using that momentum, using the goodwill, for the more difficult changes. We've now started talking to other people as well, and some of their requests are pretty easy to solve, but the big thing is that they would change a lot of people's work within their project. So this is the part
where I get to talk about my own mistakes that I've made as part of this. We did a thing: typically you then announce a change and you ask what you broke. I got annoyed one evening and decided that we needed to get two-factor authentication turned on for GitHub, and I didn't tell anybody about it; I just did it. Actually, no, it was Rackspace; it was Rackspace, for servers, which is almost as important as code. But I turned around and decided to say: nope, we are turning this on for everybody, and if you've not turned this on, you're getting thrown out. This was not the way that I should have done this. So treat this within your own space of what will work and what won't. I then spent the next couple of days documenting it better. Your mileage may vary. You are allowed to make these mistakes, but understand that doing something, announcing the change, and then asking what you broke is probably not the best way of going about this. I did promise that we were going to talk about the Church of the Shaven Yak as part of this. So a challenge around this particular focus of moving your infrastructure to code is that you will never be done, and everything that you touch will turn into a shaved yak: everything turns out to be a longer process than you might ever expect, something rewarding, but also leading in towards other pieces that you're going to do. The way that we've dealt with this internally is that we've started building in more task lists. We've built in more roadmaps; we've built in more conversation about where it is that we're going longer term. And more about this: you actually want to know where your infrastructure is broken, or at least acknowledge what it is that you're actually doing and what you're not doing. And everything that you touch will probably turn out to be this yak shave. Consider this more of a discovery piece as well. More or less in conclusion, as I'm wrapping up here: you'll never get everything right as far as this particular
approach is concerned, for project infrastructure. But your reaction, and how you actually work towards things not being right, things being broken, will be critical. If something fails, you want to be able to look at what went wrong where, even if it's your mistake, and you want to be able to own that. Every time that you have to look at something like, you know, you've pushed from dev to prod at 3 AM and everything is broken: try to write down what it was that actually went wrong, even if it was, that guy didn't get enough sleep and he did something stupid. We've now started thinking about how we can do this for everyone in the project as we look towards doing more releases, and I can certainly talk about that if people are interested. And the last piece around this is: what does done look like? And unfortunately there is no done. But what's handy about this, with those roadmaps, is that you can look back over the last three months at what it is that you've done, compared to the problems that you have now. It's very important to be able to recognize that you have success as part of this. But it won't feel like success; it'll feel like, I have slightly less to do on my to-do list today than I did yesterday. But they are new and interesting things, versus the piece that I've felt continually stuck by. So, pieces that we started working on since I first put this talk together: once we got to the point where our code and infrastructure were more tied in together, we started looking at testing, and focusing on what automated testing would look like, and I think that's probably something we're still working with. Release management became another thing that we were very focused on as part of using this structure. But other things also came back onto the list. Things like our website can always use a little bit more tweaking, so that went back on the list. Documentation became something vitally important that was not necessarily
initially part of infrastructure as code, but became part of it. So I tell you all of this to say that there are priorities that you can look at, and there are pieces that you can all put together, but thinking about it, in terms of the work that you're doing every day, becomes a lot easier when you're looking at it like infrastructure. On the community side, I also started doing more stakeholder reviews with the rest of the team, to figure out what was important and what the priorities were, and I'm happy to go into that. But I think at this point I'm happy to move over into questions, because we do have a few minutes in here, and I wanted to make sure that I had time. So thank you. This was fun. Good afternoon. My question is around DevOps and failures which have a point of no return. So basically you do something and you don't have a road back. So you don't have a roadmap in terms of... are you talking about doing an infrastructure migration, going one way? No return, a point of no return. So there are a couple of different pieces around that that we can probably look towards. One: this had better be something really important. This had better be something like, you were going to save so much money by making this work. But also consider what the other options are. If there's a point of no return, what other things can you fail over to, and when do you know that you've actually passed that point of no return? If it's a DNS flip, well, that's something you can work with. Keeping three or four different options around that point of no return might be more helpful in this instance. But if you've got something more specific, I'm perfectly happy to speak to that. No? Yes. Any more questions? Our next speaker is Aravinda. He is a C and Python programmer; he's also a Gluster dev. I don't know what we did when we did the scheduling, but we just managed to put all the Gluster talks together. Aravinda does eventing, and he maintains a couple of components for Gluster. In his free
time, this I learned today, he is a font developer; he maintains two Kannada fonts. I can't particularly remember the names, but they are names of birds; that's all I know. He's going to be talking about real-time monitoring, and the challenges of real-time monitoring when you have a distributed system. Over to you. Am I audible, right? I'll be talking about real-time monitoring of GlusterFS. Myself: I'm working in Red Hat Bangalore. I contribute to Gluster geo-replication, events, and other areas. Introduction to GlusterFS: Raghavendra Talur gave a beautiful introduction to GlusterFS, so I'll not go into detail. It's a scalable network file system. It has different configurations, like distribute and replicate, and different types of volumes. When it is a distributed file system and we try to monitor it, it is very difficult, because there are multiple nodes, and it's not real-time. So typically we end up polling the system: we query the status frequently and try to get it, wasting a lot of network bandwidth, and still it is not real-time, even though we poll every 10 seconds or 5 seconds. And some of the status is available through the CLI, but not all of it. Suppose an incident happened at a particular time: we don't know what time it happened. Or something goes faulty, or some node went down: we don't know what time it went down. So let us assume this is a typical web interface; you can consider any other project also, like oVirt Engine, where a new UI is coming. In this UI, the left side is showing the list of volumes, how many bricks are up, and all those things, in the top right corner. We don't know the current status unless you start polling: query the status, get the status every 10 seconds, and then you can refresh the same UI, more real-time. But if you do the polling at a 10-second interval, you will end up doing around 8,000 network calls per day, just to get the status.
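The polling cost the speaker quotes can be checked with a quick calculation; the 10-second interval is from the talk:

```python
# Seconds in a day divided by the polling interval gives the number
# of status calls a UI makes per day just to stay fresh.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
POLL_INTERVAL = 10               # seconds, as in the talk

calls_per_day = SECONDS_PER_DAY // POLL_INTERVAL
print(calls_per_day)             # → 8640, roughly the "8,000" quoted
```

Most of those calls return a status identical to the previous one, which is the waste the events API removes.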
Sometimes, even though nothing is changed and the status is the same as the previous status, you still have to make the network call and get the current status. So how is the events API trying to solve this problem? There will be one daemon running on all the Gluster nodes, called glustereventsd, which collects the local events and pushes them to the external monitoring application via webhook. In this diagram, it is showing: whenever an event happens, like say a volume is created (it can be user-driven events, or it can be asynchronous events, like a brick going down, or a split-brain happening, or geo-replication going faulty), all the events can be pushed to the webhook. So how to enable it: the new daemon is available, and it can be started using any of the system service managers; in this example it is showing systemctl, or you can use service glustereventsd start. To check the status: it shows the status from all the nodes of the cluster, like localhost, which is where we are running this command, and the other nodes of the cluster. In this example there are only two nodes in the cluster. No webhooks are registered; that is why it is showing none. To register a webhook: in this example, it is listening on port 9000. We can test it: when we run the test command, it shows whether it is reachable or not from the cluster nodes, and if it is reachable successfully, then we can add it using the webhook-add command. This is one example of the format webhooks receive: when you create a volume called gv1, it immediately sends the notification to the registered webhook, in JSON format, as shown below. It includes the node ID, which is where the event originated from; the timestamp, which is a Unix timestamp; the type of event, like volume create (we have many other events); and the message, which is specific to the particular event. In this case, a new volume was created with the name gv1.
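A minimal receiver for these webhook notifications might look like the sketch below. The JSON fields (nodeid, ts, event, message) follow the format described above; the port and handler structure are illustrative assumptions, not code from the talk:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # events collected by the handler, newest last

class EventHandler(BaseHTTPRequestHandler):
    """Accepts glustereventsd-style webhook POSTs and stores them."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        received.append(event)          # fields: nodeid, ts, event, message
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):  # silence default request logging
        pass

def start_receiver(port=0):
    """Start the webhook listener in a background thread; returns the server."""
    server = HTTPServer(("127.0.0.1", port), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Registering the receiver's URL with webhook-add (port 9000 in the demo) would then deliver events such as the gv1 volume-create as JSON POSTs to this handler.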
To show an example: this is a simple prototype which shows the list of volumes. Below, we are seeing the terminal; we are creating the volume, and it is immediately showing in the UI. Another example: the previous one, the volume-create example, is a user-driven event; the user is creating the volume, so it is somewhat easy to track, because it is a command, and at the end of the command we can add the event trigger. Some of the other events, like a brick going down: here in this example, the brick process is getting killed, and that is immediately reflected in the UI. We added the event push in glusterd, so whenever a brick goes down, glusterd knows it and can immediately send the notification to the webhook. One more similar example: stop the volume, and it immediately shows up in the UI. These are geo-replication sessions; there are multiple sessions, two sessions running here. When you stop a geo-replication session, the event is immediately caught by the webhook and shown in the UI. Geo-replication again: whenever a session goes faulty; in this example the worker PID is killed, which causes the worker to go faulty, and it is immediately shown in the UI. So, status: as of now we have more than 120 events already covered: split-brain, volume and geo-replication events, brick events, and many other events, like quota events. There are some requests about filtering the events: if you register a webhook now, you receive all the events, but if you are interested only in some kind, say only those which affect the geo-replication session, we could filter the events, like, I am interested only in the geo-replication events, and we can have an alerting mechanism there. So now we have the infrastructure ready, and we need to integrate with more external monitoring tools like Nagios, oVirt Engine and many other tools. Another requirement is a web UI to see the notifications; all the UI which I showed is just a prototype, so we need a standard UI from the Gluster project.
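The filtering idea could be sketched as a small helper on the receiving side, assuming event payloads shaped like the JSON shown earlier; the GEOREP prefix and event names here are illustrative assumptions, not a published Gluster API:

```python
def filter_events(events, interesting_prefixes=("GEOREP",)):
    """Keep only events whose type matches one of the given prefixes.

    Assumption for this sketch: geo-replication event types start with
    GEOREP; everything else is dropped before alerting.
    """
    return [
        e for e in events
        if any(e["event"].startswith(p) for p in interesting_prefixes)
    ]

sample = [
    {"event": "VOLUME_CREATE", "message": {"name": "gv1"}},
    {"event": "GEOREP_FAULTY", "message": {"primary": "gv1"}},
]
print(filter_events(sample))  # only the GEOREP_FAULTY event survives
```

A webhook consumer interested only in geo-replication would run its alerting on the filtered list instead of all 120-plus event types.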
And email notifications: an alerting mechanism, like, say, if a split-brain is happening, immediately notify the admin; those kinds of notifications. All those things are in the plan, so we have to get them done. And documentation is available on the Gluster Read the Docs website. Yeah, that's all. I think I finished early. Any questions? Just give us a minute as we set up for the next speaker. Hi. While we get started: my topic is about moving from how we were manually deploying and managing our systems. Hi, I am Mehul Ved. I work as a system admin for a company called NexSales, and I will be speaking today about how we went from manually managing our systems to bringing in some kind of automation and scripting, to be able to bring our systems up and running and working much better. So I will be speaking about that. While we set it up: how many people still do manual work for managing your systems? Anybody here? 1, 2, 3, 4. Okay. My talk will kind of cover why we were still doing manual work, what we learned getting into some automation, and what the tools are, though I would not really want to focus too much on the tools, because there are a lot of awesome tools out there, and we picked what was more relevant to us, something that may not necessarily be relevant to you guys. So I won't be focusing on the tools for sure, but I will focus on what the scenarios were, and why we arrived at that decision. Well, while we are still getting ready: one of the things that we worked upon was Ansible. How many people here use Ansible in their production? Chef? Puppet? There is a sizeable number who are not raising their hands. Anybody using SaltStack? No. CFEngine? Okay. Ansible was the first tool we started using, and it's one of the things that I really love, and I will be speaking quite a bit about it today. Nice, the laptop is ready; I will get going in a bit. Sorry about that. It's not a conference until you have
one person whose slides don't work. We should be on shortly. As Mehul said, he is a sysadmin. He is very well known in the Bombay tech ecosystem, and the running sort of world, and the swimming world, and the cycling world. He is going to talk about moving from a completely manual deployment to how his company does automated deployments. I will be speaking about how we went from doing all the things manually to bringing in the automation. Some of the tools that we did pick were Packer, Terraform, Ansible and Jenkins. These are not the tools that I am recommending; they are awesome tools, but I am not recommending them, because not everything fits every scenario. They are the tools we picked up, and why is something I will cover as part of the talk; you are free to pick up whatever you really love, or whatever you are used to, or are already using. So, who are we? I work for a company called NexSales. It is a B2B marketing and sales solution provider. We have been providing services for the last 7 years, and in 2013 we started working on our first product. I was actually the second employee to work on the product, and that was the first time I started working on a product. I used to work as a sysadmin before, but it was not really on any kind of products; I was not exposed to things like the DevOps environment, or the corporate environment where you have very complex requirements. I was used to places where there is a small setup and a lot of things are done manually. So after VoiceReach, we have just recently started work on a new product called RightLeads, and we are also working on an API redesign for VoiceReach, called the VoiceReach API platform. These are all B2B solutions that we designed for some of our partners. And that is the introduction of what we have; now let's actually get into the real part that I want to talk about today. So when we started off with the product, we were just working; we did
have a cloud infrastructure, and we did start on Rackspace Cloud, but we had two machines, and things were just done manually on these machines. We used to just go SSH into the machines and edit whatever we wanted; that's how we used to manage it. But at that point it didn't really matter. The only people who were using the system were all within our company; they were just testing out the systems, and they were using them for some kind of production use, but it was all within the company. So if something broke, they would just come running across: hey, our system is broken, please do something. And, well, we had to go in and fix things. The volume of usage was much lower. We were still transitioning from the old systems, and if something was critically broken, they went back to the old system. Thankfully we had a good way to migrate data between the systems at that point, so we could manage. But we realized very soon that this is not something we can stick with for too long, so we moved into the next phase. During this phase we had two people who were really into the development part of the application, so nobody was really focusing on the ops part; it was just development happening, and we were focusing on that; no one was bothering about the ops part. Some of the things that we were doing were pretty laughable and pretty basic. We were manually hand-editing the files, and whatever edits happened just used to go into a wiki: okay, this was the edit done, and this is when it was done. But as you realize, once you start doing things, some of the time it happens that you are doing this under a very short time frame: management is pushing you to do something, saying okay, release this today, and you don't get the time to document things, and you have a change which has gone undocumented. And that spells disaster going ahead. Once these things start piling up, it starts leading to disasters,
which we did end up in. We used to actually deploy applications by running a git pull on the server. And again, our deploys used to happen like a month or two months apart, sometimes even more, so it was just piling up; there was no way we could have done frequent enough deployments. So it used to be like: we put all the code together, and in a traditional way we used to just release it and fix everything that comes up, all at one time. Some of the problems: documentation is not updated all the time. If we had reached a scenario where our server crashed, or was compromised, and we needed to start over from scratch, we would probably have taken a good 10, 15, probably even 20 days to rebuild things, and this was a very scary situation once we went live into production. Code deploys: there was actually a release where we took three days to debug all the problems that came up because of all the configuration errors while deploying the code. Tracking of all the changes for the config: we had the code changes being pushed to git, but the configuration changes were not really tracked properly anywhere except for the wiki, which was not really reliable. Nobody was handling the ops tasks as a dedicated thing; we were doing the development work, but then just pushing things to the server and forgetting about it; nobody else was looking after things at all. Whenever a problem arose, somebody would go; usually that was me, because I was the one more associated with the ops tasks; I understood the servers better. So what happened was, at one point, just before the production release, we got to a place where we had a major, major failure. Our configurations were not right, our code was having git conflicts, and we probably ended up having, like, a week's worth of work where we were doing overtime, and we were running into major issues, and management was really pissed off at us: what is this, this should not
be happening what are You guys doing so That point we really realize that We need something We need to figure out what is there So after the release We set up together And said let's analyze Our failures and look into what is The next step we can do So Along with that we started Having the increase So we realize that one of the things Is that we have a single environment Everything is going there so We decided to have Two different environments one Was a development where everything Would be pushed more regularly And So then we made another environment Where it would be much more Stable and that is what users would Use so there wouldn't be a problem With doing a deploy And if the deploy Is broken then It wouldn't affect what is being used We started going from Monolithic application there was One PHP big PHP application At that point we started breaking It up into a few Microservices which were there And that gave us a lot more reliability Because some of the things had to be Constantly processed in the background And we realized it can be handled Much better with Node.js We had a JavaScript developer with us By that then Still We were Doing a lot of Configuration what I tried is To have a homegrown script Kind of a thing where it's Repo and I'm trying to organize Configuration files within that But With a little bit Work I realized it's just Still becoming pretty complex With various things trying To update together So what would happen is at times They were making changes At one place but The same change needs to go At another place like there is a service Which is a dependency of another service Both needs to have a same value And this was not being reflected correctly When I was making these changes So while doing that One of the first things That I came across Was here at rootconf 3 years ago Somebody introduced me to This tool called Ansible When I visited the website at first I was like I don't know it didn't really Make too much sense to me and I was A bit 
Skeptical as I was about the tool, I still started using it, and it's one of the things I love and still use to date.
With Ansible, the biggest change was that we now had very good tracking of the configuration changes happening on our machines. They were much better reflected: it's in code, it's in the Git repo, all the configuration changes are in one place, and we have the dependencies between them mapped out correctly. That was the biggest change we could have made, and the reliability problems we were facing were simplified enormously just by introducing Ansible at that point. We also added some deployment code: we were still doing a git pull, but the git pull would now happen via Ansible. Where earlier I was doing things manually, now anybody could just use the Ansible script and do it themselves. It no longer depended only on me — I could hand it out to others, and people could work with the same code. That allowed us to delegate tasks, and there was a lot more consistency in doing these things.
But of course this was only the first step, and it wasn't solving all our problems; we still had some really major ones as we moved along. We were doing deployments via just a git pull and git push, and we couldn't really track what was going into dev and what was going into the production machine, or when. A git push might be done today, but the deployment happened two days later, when we saw it as more appropriate. There was no real way to get that data; we didn't have anything back then.
About the deployments: we have a complete JS stack — Node.js in the back end, Angular in the front end. One of the things was that we used to run an npm install while doing the deploy, and at times we observed problems: maybe a network issue, maybe an issue with npm itself.
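Going back to the Ansible setup just described — configuration and the git-based deploy living together in tracked code — a minimal sketch of what such a playbook might look like. The hosts, paths, repo URL, and variable names here are all invented for illustration, not taken from the talk:

```yaml
# Hypothetical playbook: shared configuration plus a git-based deploy in one
# Git-tracked place. Hosts, paths, and the repo URL are placeholders.
- hosts: app_servers
  become: true
  vars:
    app_repo: "git@example.com:acme/app.git"   # placeholder repo
    app_dir: /srv/app
    app_version: master
  tasks:
    - name: Render the shared config (one source of truth, no drift)
      template:
        src: app.conf.j2
        dest: /etc/app/app.conf
      notify: restart app

    - name: Deploy by pulling the requested version from git
      git:
        repo: "{{ app_repo }}"
        dest: "{{ app_dir }}"
        version: "{{ app_version }}"

  handlers:
    - name: restart app
      service:
        name: app
        state: restarted
```

Run with something like `ansible-playbook -i inventory deploy.yml`. The point is that a value shared between dependent services lives in exactly one template, so changing it propagates everywhere it is needed.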
At times our builds were simply not reliable: our dependencies would not all be there correctly, so the same build that was working completely fine in dev would not work in production, just because the npm install had failed. We realized one of the things we needed was a reusable build. I'm sure pretty much everybody here is familiar with CI systems: once you build in your CI system, the whole set of artifacts is available to you, and if you can test them on one system, you are at least sure your build artifacts work correctly — if there's a failure, it's somewhere else. That was not the case for us at this point.
We had managed to sort out the whole configuration issue, and that alone meant that if we had to rebuild our systems, it could be done in a couple of days — a big, big difference, and a relief, because the business pressure would be high: taking some ten days without having something up and running would be a huge business loss. But besides that, since we had gone to a microservices architecture — we had started with just a monolithic application, one place where things happened — now, with the various microservices, the code integration wouldn't always happen correctly. This was a bigger issue we needed to sort out, and it took us to the next step.
One of the big changes in this phase was that my role shifted from developer to full-time ops guy. Now I could actually look at the systems and manage them properly, and we needed that, because we had a higher volume of usage, more users, and more systems. We needed somebody doing the ops work full time, and we also hired a couple of other developers to manage things. Besides that, we started experimenting with things like the AWS cloud, because even though Rackspace is pretty solid, the costs were escalating. We had about 8 to 10 machines, and the cost of those machines was really high compared to something like AWS. We were looking at infrastructure which could be built up whenever required instead of running all the time — a dev environment doesn't need to be always on; you only need it when you're working on certain parts.
During this phase our environments started getting a little more complex. The frequency of releases also increased: we had more developers, things were working and moving, and new requirements kept coming in, so we were actually doing a few releases every week, or maybe every two to three days. We were still nowhere close to where we wanted to be, but it was a much better place than phase one, where we just pushed everything together into one big release at the end of one or two months; now we were at maybe two or three days to a week. Then we started adopting project-management tools to tell us what is happening, what is released, what is to be released — tracking of features, tracking of when releases happen. We introduced Jira, Confluence, Bamboo, and Slack for chat.
We did try to introduce a CI/CD pipeline at this point, but failed miserably with Bamboo. We didn't have anybody with expertise in it, and we didn't really know where to go — we tried to just connect Ansible to Bamboo to do all the work, and it didn't come together at that point. So even though the builds were there, we were not getting consistent builds across the environments. By now we had about 10 to 12 machines — something in production, something in beta, something in dev — and the consistent tracking of the
deployments across those environments is what CI tools should have sorted out for us, by doing the CD part correctly — which we didn't get right at this point. That was the biggest gap we had. We were also still manually managing the infrastructure: the configuration was automated, but the base system was still manual. If you had to install or update a package, the update would probably happen with Ansible, but setting up the base system was still manual work.
With the next release, the focus was on improving code quality, and we hired a person to do QA. We also realized that the person handling QA was good with Jenkins, so we moved to Jenkins during this release — that was one of the big things that came into the picture. With Jenkins there and some QA in place, he wrote some integration tests at this point. But there was still a big issue: with frequent releases, a lot of work happening, and more developers, we were running into a lot of Git conflicts. I held a complete workshop with all my developers where we went through the whole understanding of Git and sorting out Git conflicts locally, but we were still facing certain issues when dealing with the deployed server environment.
So I started working on Ansible for creating and destroying infrastructure dynamically. We were looking at AWS at that point, and AWS has very good support within Ansible, where you can bring up the infrastructure, configure it, and have it up and running — so I started playing around with that. And since we had replaced Bamboo with Jenkins during that period — that was one major change — we focused on getting that part of the setup right: having all the builds done in Jenkins, building the deployment pipeline, and reusing the builds that had been done in the dev stage.
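The "build once, promote the same artifact" idea described above can be sketched in a few lines of shell. Every name and path here is invented for illustration — the point is only that each environment unpacks the exact artifact CI produced, rather than rebuilding:

```shell
#!/bin/sh
# Sketch of build-once/promote: CI produces ONE versioned artifact, and every
# environment deploys that exact artifact instead of rebuilding (no fresh
# `npm install` per environment, so no environment-specific build failures).
set -eu
mkdir -p app dist dev beta prod
echo "console.log('hello');" > app/server.js

# "CI" step: build a single versioned artifact (npm install etc. would go here).
tar -czf dist/app-1.0.0.tar.gz app

# "Deploy" steps: dev, beta, and prod all unpack the SAME artifact.
for env in dev beta prod; do
  tar -xzf dist/app-1.0.0.tar.gz -C "$env"
done

# If dev and prod differ now, the problem is not the build.
cmp -s dev/app/server.js prod/app/server.js && echo "artifacts identical"
```

If the artifact passes tests in dev, the bits promoted to beta and production are byte-for-byte the same, which is exactly the property the git-pull-plus-npm-install deploys lacked.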
Those builds could then be reused for our beta environment, and if that worked, pushed to production.
Around the same time, in mid-2016, we started working on something new. As I mentioned, we had three products by now, and we had started work on the second product, so we applied all of this from day one on the new project. That brought us a lot more consistency, and it showed us where we had gone wrong with the first product. On the first product we had started applying these things much later, building on top of what already existed — a pretty humongous infrastructure with a lot of things in it we didn't know about. Even pretty recently we didn't know some of what had gone in, so there were inconsistencies, and unless we were in a place where we could do everything from the start, those would remain. So on the new project we ensured we had full automation and full control — of the systems, of the builds, of the deployments, everything — right from day one.
With more projects, more developers, and more automation came an increase in cost, because we had more and more machines, and with those machines costs started escalating big time. Management pushed us: you guys really need to start working on reducing this cost. On the other hand, the build time for the infrastructure had come down from days to about a day — 8 to 10 hours — which was a major achievement: from somewhere around 2 to 3 weeks down to 8 to 10 hours.
But one of the problems was that I was trying to do too much with Ansible. I tried to even start new machines with it — bring up the load balancer, set everything up together — and started writing code to manage the infrastructure within Ansible. I realized that somewhere I was not doing this right: I was creating something far more complex than was
manageable, and I was taking too much time. Then — luckily or unluckily, I don't know — the biggest hurdle came when we decided we were going with Google Cloud and not AWS. Ansible uses something called libcloud to talk to Google Cloud. They were working on using the native API calls, but that wasn't there yet — I'm not sure if it's done even now, but when we were working on it, it was not there — and we couldn't have an L7 load balancer. We wanted an L7 load balancer in front of our machines to balance all the HTTP traffic coming in, and we couldn't do that with Ansible. So I started looking for solutions and reached out to some of my sysadmin friends, and that is where we moved to our next level of automation.
The scenario by now: we had three projects, and we were close to having 20 to 30 machines. We didn't need all of those machines running all the time, so we started working on how to cut the cost, and on how to manage a much bigger infrastructure than before. I was told by some friends to bring in Packer and Terraform. With Packer, I do my builds and have an image ready with all my base things built into it. Terraform then just takes the Packer image and starts a machine from it. With Terraform I could also define my whole infrastructure — the load balancer, my networking, my machines, my database — in a much, much simpler way. I didn't have to write logic about what to start first and what to start later; Terraform just understands the dependencies involved. It's also declarative code, which is much simpler, and I realized it's much easier to manage — I no longer had to write the logic for how to do things.
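For reference, a Packer template along those lines — baking a base image from a stock Debian image on Google Cloud and reusing the existing Ansible code as a provisioner. This is shown in today's HCL syntax (at the time Packer used JSON templates), and the project ID, zone, and playbook name are all invented:

```hcl
# Hypothetical Packer template: bake a reusable base image on Google Cloud.
# project_id, zone, and the playbook name are placeholders.
locals {
  ts = formatdate("YYYYMMDDhhmm", timestamp())
}

source "googlecompute" "base" {
  project_id          = "my-project"        # placeholder
  source_image_family = "debian-11"         # start from a stock Debian image
  zone                = "us-central1-a"
  image_name          = "app-base-${local.ts}"
  ssh_username        = "packer"
}

build {
  sources = ["source.googlecompute.base"]

  # Reuse the existing Ansible code to install the base packages,
  # so the "base system" setup is no longer manual.
  provisioner "ansible" {
    playbook_file = "base.yml"
  }
}
```

A new application build can then be baked into a fresh image, and every machine started from that image comes up identical.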
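And a Terraform sketch consuming such an image. Resource names and the project are illustrative; a real GCP L7 (HTTP) load balancer needs several more resources (backend service, URL map, target proxy, forwarding rule), which are omitted here for brevity:

```hcl
# Hypothetical Terraform sketch: declare what should exist, and let Terraform
# work out creation order from the dependencies. Names are placeholders.
provider "google" {
  project = "my-project"   # placeholder
  region  = "us-central1"
}

resource "google_compute_instance" "app" {
  name         = "app-1"
  machine_type = "e2-small"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "app-base"   # the image Packer baked
    }
  }

  network_interface {
    network = "default"
  }
}

# No "start this first" logic needed: the group below references the
# instance, so Terraform creates the instance before the group.
resource "google_compute_instance_group" "app" {
  name      = "app-group"
  zone      = "us-central1-a"
  instances = [google_compute_instance.app.self_link]
}
```

`terraform apply` builds all of it; `terraform destroy` tears it down, which is what makes a dev environment that only runs when needed cheap to have.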
I only had to define what I wanted, and Terraform would give it to me. We still keep Ansible — we just reduced its scope. We still do deploys using Ansible and configuration using Ansible, and it works brilliantly. What we have done is tie all three things together and create tasks within Jenkins. So now these things don't require my involvement: everything has been fed into Jenkins, and there are just certain parameters that my release managers understand. They define those parameters whenever they need to do a release — even our devs can do it. They just specify the release number and the environment, and the release can be done right from the Jenkins environment. For every release, I only need to look at it when things fail.
That has allowed us to reach a stage where we can actually do multiple code deploys a day. We are doing that in the dev environment — basically in the non-production environments. We are not yet at a phase where we are confident enough of the builds to do it in production; we are working on that, and hopefully in the next couple of releases we will be doing multiple pushes to production in a day.
With this whole setup, if I want to build a new infrastructure, I can actually do it in some 15 to 20 minutes. I have the code written for Packer; it creates a machine image and stores it in the Google Cloud images. Terraform can just take that image and start building the machines — probably done in another 5 to 10 minutes — and then Ansible kicks in and configures everything once the infrastructure is ready. This has given us the complete DevOps cycle you really need: the dev creates the code, it goes to our CI system, which does the build; the build is pushed into the system, which can then be brought up in minutes. And once the new infrastructure is brought
up and a deploy has been done there, the tests run on it and give us immediate feedback. We can have all of this tracked in a couple of central places: Jenkins holds a very good amount of information already, we are pushing things to Slack, and the next step is to push it into a monitoring system that gives us a complete dashboard of what happened at what point.
Of course, that is where we are today, but we still have a few more problems that we will be sorting out in the near future. One of them is that in certain places we have code and data on the same machine. That means we cannot just bring down the machines and bring up new ones unless we handle the data correctly. So we are working on bringing in that separation: the data machines can be separate, with backups and everything for the data handled on their own, and the code machines then become very trivial to scale up and down — just destroy one and bring in a new machine whenever you want. Whenever we do a new release, we can just bring up a completely new machine, and when the new machine is ready, bring down the old one. We want to get into that kind of phase, and this is one of the factors stopping us.
Another thing: I have written a lot of code, but it was written while we were still figuring things out. One thing I would really love to do is rewrite my code so that if I have to start a new project at any point, I can just reuse the existing code and do all of this with very little work. That still remains to be brought into our automation. The idea is to be at a stage where, if any eventuality occurs, we are prepared for it — like being in multiple regions. Today the cloud providers give you multiple regions; we are not using that. We are still in one region, and if that region has a failure,
my code is not ready to immediately switch to a new region. Our data is there; it needs to be backed up so that we can bring it up in a new region and use it there. So some of these things still remain, stopping us from having a very reliable infrastructure and a fully automated system which can really work without me looking after it all the time.
That pretty much concludes what I have in my presentation. Would anybody like to ask any questions?
[Question] My question is about the current scenario inside my company. We have migrated from a monolith to microservices, and we do deployments via Jenkins. But the problem is that currently, on a daily basis, we have 5 million ad requests, which double on weekends. So now we are at a stage where we are thinking of auto-scaling, but in Jenkins we need to configure all the servers manually. Is there a plugin or a way to get something like auto-discovery in Jenkins, so that when we scale up, we can deploy the code onto the scaled-up VMs?
[Answer] This is something I am planning for our next release: the deploys don't happen to the running systems; they happen to my Packer build. Packer builds and keeps the base ready, so when Jenkins has a new build, that build gets baked into my image, and using that image I can start new machines. I have not used auto-scaling yet, but with auto-scaling I am guessing you can specify which version of the image you want to use. Say you want to go from one machine to two: you start one machine with the new image, and once you are sure it is working, you take the old machine out and start another machine with the new code. The whole thing is baked into your image — you don't need Jenkins to know which machines exist; Jenkins deploys the code to your image instead of to the new machines. And after that there is of course still some
configuration required. Which machines get picked up for configuration, I handle with dynamic inventory in Ansible, so Jenkins doesn't need to know; it's Ansible that knows which machines are relevant. Basically you need a configuration-management tool — Ansible is one. If you are in AWS you can use their own tools, or, if you are outside, then depending on the size of your company and your requirements you can figure out which tools are relevant to you. Ansible is what we use; you can use any configuration-management tool that works for you. If you'd like to talk Ansible, I'm always happy to — there are probably some Red Hat folks around, though I'm not sure if anyone from the Ansible team is here.
[Moderator] Okay, the next question is from here.
[Question] Hi, I have a question on the tooling you talked about. Say you have a cluster already created using your system, and tomorrow I want to upgrade that same cluster — how is that going to work with zero downtime?
[Answer] Sorry — a cluster of what?
[Question] The application. You spin up instances — say we create a cluster of Node.js machines.
[Answer] We have not really experimented with this yet, but what I can think of is that instead of sending all the requests to the new version, we would create one machine with the new version of the code and have our load balancer push only a certain amount of traffic to it — it could be certain users that we specify, or certain rules. Once we are sure of it, we bring up a completely new cluster that is entirely on the new release. Say we have one machine on the v2 release; once we are sure it's working correctly, we build a whole cluster of the v2 release, send all the traffic there, and retire v1 — take down the whole old cluster. We are planning to basically get to what
is known as immutable infrastructure. There, once you have a machine up and running, you don't modify anything on it; you use the machine as it is, and whenever you want to do something new, you bring up a new machine and, once it's working, take down the old one.
[Question] Sorry, can you just wait for the mic — yeah. How do you handle capacity in this case? You have a constraint on capacity.
[Answer] Capacity would still not be affected, right? Because essentially you only send a little bit of traffic — let's say 2 to 5% — to the new version, so if there's a problem with it, everything else is still going through the old version. You do need to ensure there is some kind of compatibility — we are looking at things like feature toggles to build that kind of compatibility into the new release — so that both versions of the code can live side by side, on different machines of course, or different containers if you prefer. But basically you have two versions running side by side, and the majority of the traffic, say 95 to 98%, still goes through the old version until the new release is up and running completely. So if you had a cluster of five machines on the old release, you start one machine on the new release, and once you are sure everything works on it, you build a cluster of five machines on the new release, start directing the old traffic to the new cluster, and once you are sure of that, take down the old cluster. That's one of the reasons to be in the cloud and to build dynamic infrastructure. Yes, for a short period you do have much higher usage — ten machines instead of five — but that's a trade-off we are willing to make for better uptime and
better reliability.
[Question] Wouldn't it be better to do this with a strategy, instead of spinning up a complete replica of the existing cluster and wiring traffic off the old one and onto the new version? You said you test on one box on the new version, right? So, to continue the same logic, why don't you do it on a batch basis instead of all at once? If you have a thousand nodes, are you going to create another thousand on the new version?
[Answer] It depends on the size. I am probably thinking in terms of the much smaller scale we are at right now. If you are much bigger — like you said, maybe in the thousands, or in the hundreds — then say you have 100 machines on v1: you start a cluster of five on v2, then grow it into a cluster of 10 and release five of the old machines. You slowly go in batches: once you are sure it is working well with 10, you take down 10 of v1 and bring up 10 of v2, so you have 20 on v2 and 80 on v1. You still have 100 machines, but 20 percent are on v2 and 80 percent on v1. You can do that kind of strategy. Of course, your strategy will be determined by your use case and your constraints — I have not covered everything, there will be things more relevant to you, and you are in a much better place to determine what is correct for you.
[Moderator] Okay, everybody, we have plenty of time for questions — we are running a little early — but a few announcements first. If you want to ask questions, stay back. There is an OTR (off-the-record) session on disaster stories happening, hosted by Trouble and Todd — highly recommended. It started at 3:45 and goes on till 4:45. We will reconvene here at 4:40, so that lines up nicely, and those who want to ask questions can stay back.
[Question] I have a question here —
over here. Just wait for the mic — or else we can speak. Hello, okay, I am speaking. So you said that at some point in time you are running two versions of the application at the same time, and I assume the application is connecting to the same database, because you need to serve the same data. What happens when you change the database schema for the new version of the application?
[Answer] Yes, this is something we have factored in. So far we have not run into any issue, and we are not completely doing this yet, but here is what we are working towards. Where we would run into issues is, for example, when a column is deleted — we are using MySQL, so if the column is deleted, one version of the code is still trying to find that column. So instead of that, we are refactoring our schema in such a way that we don't delete columns anymore; we mark the column as unavailable in the new release. We have something called a data model, which is the code that interfaces with the database, and in the data model we mark that this thing is unavailable and not to be used anymore. So yes, in a way we are keeping a bigger database than we should; once we have migrated completely to the new version, we will take a snapshot and then phase out the column.
[Question] Can you explain the usage of Packer and Terraform? You are using Packer to —
[Answer] So Packer is a tool which defines the rules: it takes a base image — we use Debian, so it takes the top-level Debian image that has been released — and we specify what goes into that image to create our own custom image.
[Moderator] Do we have — the next speaker is Jim Perrin. He is the CentOS ARM lead, and he manages the CentOS CI team, I think, when KB isn't interfering. He is going to talk about creating IoT devices with CentOS — he's been at this since before Y2K, so if you are old enough to remember that. I'll let him start off.
[Jim] You are a terrible person. Thank you. So, most of the talk — that seems really loud to me; is that loud
to you guys? Yeah — don't do that.
So, most of the talks we have heard today have been about why containers are great, how we are using containers for new technology or improving things on the server side, and how OpenShift makes it all fantastic — and Vashik will absolutely stand up here and say OpenShift makes it all fantastic. But we can also abuse this sort of technology. We can take it and twist it in ways it wasn't quite intended to go. We are going to do it anyhow, and it actually works out pretty well.
How do I know this? Because I have been a sysadmin since around 1999 — and thank you, Nigel, for making me feel old on stage, fantastic — and I've been a member of the CentOS project since 2004. I decided I wanted to dabble in a little bit of evil in my career, so I moved from education consulting into oil consulting, with a little bit of other things in there as well. About two years ago I started getting involved in the 64-bit ARM effort on the server side — development for ARM workstations and ARM servers. Jon Masters was actually the person who got me involved in that, and I've been doing it for the last two years, give or take. There have been a lot of lessons in getting the ARM folks to stop thinking embedded and start thinking server — and the more we played around with that idea, the more we realized it goes the other way as well: you can take a lot of the server mentality and transfer it into the embedded space, and it makes a lot of sense.
So the first thing we have to look at here is the idealistic approach for IoT, and for how Linux distributions are actually done. One way to do that is to look at what drives IoT. Most of the folks here are either ops or engineers, and that's great — it means we want to play with this technology, tinker with it, build and develop it. But we have to come up with a convincing way to pitch this to
management and say: hey, here's why you should let me do this in my spare time, here's why it's a good thing. And really it comes down to money, because the amounts projected for IoT over the next 5 or 10 years get pretty ridiculous. The low estimate for 2020 was given at $470 billion — a pretty significant chunk of change. Now, that encompasses everything IoT-related: low-end RFID tags all the way through gateways, through Amazon with Alexa, through Google with whatever the hell it is Google's plugging in with Nest, and a lot of various things like that. In addition to all the individual devices, you've also got all the data that comes out of them. Somebody has to process that data; somebody has to collect it and figure out whether you're tracking the actual data in real time or looking at trends — whether the numbers themselves matter, or all you care about is: are we going up, are we going down, is it cyclical. These are basically the drivers for why people want to get into the IoT market.
The other side of this is that, beyond the data-analytics angle, you actually get a chance to see how quickly you can respond to something with your business. If you are using triggers based on the data, you don't have to make a lot of the low-end decisions anymore. You can say: when I hit this amount of widgets in my company, order more. You don't need a person sitting there tracking the inventory and saying, okay, we're getting close, I'll go ahead and put that order in — you can program it automatically, and essentially start doing CI on your business as well as on your code.
Everybody stands up and says IoT is horribly insecure, and for the most part they're right. This is one of the places where I think a Linux distribution — be it Fedora, be it RHEL, be it CentOS — actually makes a lot of sense in the IoT landscape.
Because right now you've got companies like Samsung — well, I won't single out Samsung — all of the IoT companies are doing their own stack: their own Linux deployment, their own software deployment. And these are primarily hardware companies. Lots of hardware vendors don't necessarily know the right way to deal with software. Who does? Ostensibly, software developers.
(Hey, it doesn't like that slide. Okay — yeah, all right, it won't do that. That's interesting. It's skipping a thing.)
So basically, what happens when IoT companies develop their software in a silo is that they'll oftentimes — especially the hardware companies — skip putting out updates, because once you've bought their product, you're a customer; after that, you become a cost center. This is one of the places where having a distribution model actually helps. Instead of worrying about each individual CVE or each individual security problem that comes with building a Linux distribution up from scratch, you can farm a lot of that work out onto a stable base platform like Fedora, like CentOS, like RHEL, and then just focus on what you want that piece of hardware to do. If you're making a camera, you can focus on the user interface for the camera, and let the CentOS team, or the RHEL team, or the Fedora team worry about that pesky OpenSSL update that just came out, or dealing with Bash, or any of the other things.
Everybody looks at IoT and says we're talking about small-scale devices — chips the size of your fingernail. Well, that's not quite it — we'll see if it comes up with the slide I want. We're talking about the servers, the ones on the right. The one on the bottom is a legitimate product you can actually go buy: they make Bluetooth heated insoles for your shoes. Why?
I don't know, but people buy it. The equipment on the right is an HP Enterprise IoT gateway. It is no different than your typical x86_64 server. The fundamental difference for this piece of hardware as an IoT platform, versus anything else, is the way in which the software is delivered. These gateways are done as a firmware-style operating system, so there isn't necessarily the login that you would expect. You're not going to sit there and run a yum update or a dnf update or anything like that. It is a one-shot software piece. It's taking a generic CPU-style system and making it task specific. And this is pretty much what I mean when I say I want to put CentOS on these systems. I'm talking about the industrial gateways. I'm talking about the cash registers that handle your point-of-sale transactions when you go into a store. These are all the IoT-style servers that I want to see CentOS running on — that I want to see, God help me, Fedora running on. Last week at Red Hat Summit, one of the use cases that was brought to our attention was an MRI machine, and that one caught me off guard; I wasn't expecting to see medical devices as something talked about for IoT. The reason that they were brought up as an IoT-style device is that these are one-shot installed systems that ostensibly move around throughout a lot of countries. They don't necessarily have the ability to stop in any one place and get updates. The people who brought this up to us were part of the Doctors Without Borders team, and they were moving an MRI machine around Afghanistan for some of the mobile hospital type work. They're not going to be able to plug in a system and do a yum update. These are high-end medical devices that have to be certified at each step of the way. So you're doing an IoT-style update, or using rpm-ostree — who's familiar with rpm-ostree? Let me ask that question before I start. About half of you. rpm-ostree is a very good way to let you generate the tree of updates that's going to be pushed out to every single system one time, so you don't have to log into each individual server; you do the batch transaction once and you push that out. Primarily we've been using it as a container platform, because it makes a really nice base to drop containers on top of, but it also works in the IoT space for exactly this reason: I can have a golden image that gets pushed out, and most people should be familiar with the concept of a golden image. You generate the image, and you get that image certified for a medical device, and when it's time to push an update, you go through that same process again. It's a read-only install that ostensibly doesn't let you drift from one state to another, and so for MRIs it actually made a lot of sense, and the guy who brought this to our attention really wanted to see it done. So this is something that I think we could actually push, along with automotive management systems. Most of you aren't going to connect your car to your home wireless network; tying it in through a Bluetooth cell phone connection, or a 3G or 4G cellular connection, is pretty much the way that's going to happen. And the last option is basically the home gateways, the Amazon Alexa-style devices that act as brokers between what you're doing and the cloud as a whole. If you have a bunch of light bulbs in a house that are IoT connected, you don't necessarily want to have to talk to a server in China to tell your lights to turn on in your house.
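The golden-image idea behind rpm-ostree — atomic, all-or-nothing switches between read-only deployments, with the previous one kept around for rollback — can be illustrated with a toy two-slot model. This is only a sketch of the concept; the real tool manages whole filesystem trees, not strings:

```python
# Toy model of ostree-style A/B deployments: the device always runs one
# read-only image, the other slot holds the previous image, and an update
# either fully lands or is rolled back. Purely illustrative.

class Device:
    def __init__(self, image):
        self.current = image      # image the device boots from
        self.previous = None      # kept around for rollback

    def upgrade(self, new_image, healthy):
        """Atomically switch to new_image; roll back if the health check fails."""
        self.previous, self.current = self.current, new_image
        if not healthy(new_image):
            # Health check failed: flip back to the old image in one step.
            self.current, self.previous = self.previous, self.current
            return False
        return True

fleet = [Device("golden-v1") for _ in range(3)]
# Push one bad update to the whole fleet in a batch; every device rolls back.
for d in fleet:
    d.upgrade("golden-v2-broken", healthy=lambda img: "broken" not in img)
print([d.current for d in fleet])   # all three still on golden-v1
```

The point of the model is the batch shape of the workflow: one tree is generated and certified once, then pushed everywhere, rather than each box running its own package transaction.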
If your network goes out and you can't turn on the lights, that's kind of a problem. So a home gateway acting as a broker for that makes a lot of sense, and a distribution that has a method for tracking and updating CVEs, or other security vulnerabilities, almost immediately helps a lot in that plan. And I answered my question before I got to that slide. So, the reason that we want to use the enterprise-style, long-lived operating systems is basically that nobody's going to replace the refrigerator every two years simply because somebody stopped producing a software update for the in-dash panel on the refrigerator. Or the dryer has an application so that you can be notified when your laundry is done; when they stop pushing updates to that, you lose some of the functionality that the hardware manufacturer is using as a selling point. Having a distribution that can continue that 10-year life cycle makes it easier for these appliances to continue to be useful and productive despite what the manufacturers think. At that point you know that any dropped support is a forcing function of the business, rather than "we don't have the time to put into the software on this." So it means that it's easier for the user to upgrade, to do things right. And a lot of it comes down to it being the exact same tool chain that people are already used to using. A lot of businesses are already certified on RHEL or on CentOS or on Fedora. They already have that workflow down, they already understand how it operates, and this allows us to continue that into the lower end, for the lower-end chips as well, and into the IoT space. It's the exact same process, things behave the same way, and it gives everybody that consistent platform that they already understand. Does anybody have any questions so far? Does anybody think these ideas are completely and totally crackpot yet?
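The gateway-as-broker idea — handle commands for local devices on the gateway itself, so a dead internet link doesn't take out your light switches — can be sketched as a tiny dispatcher. The device names and the cloud stub here are invented for illustration:

```python
# Toy local broker: commands for known local devices are handled on the
# gateway; only unknown traffic is forwarded to the cloud. Illustrative only.

class Gateway:
    def __init__(self, cloud_up=True):
        self.devices = {}         # name -> state ("on"/"off")
        self.cloud_up = cloud_up

    def register(self, name):
        self.devices[name] = "off"

    def command(self, name, action):
        if name in self.devices:
            # Local device: no round trip to anyone's cloud required.
            self.devices[name] = action
            return f"{name} -> {action}"
        if not self.cloud_up:
            return "error: cloud unreachable"
        return f"forwarded {name}/{action} to cloud"

gw = Gateway(cloud_up=False)      # the internet link is down
gw.register("kitchen-light")
print(gw.command("kitchen-light", "on"))   # still works: kitchen-light -> on
```

A real gateway would speak Zigbee, Bluetooth, or MQTT instead of dictionary lookups, but the design point is the same: the local path must not depend on the cloud path.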
I have no problem heckling the audience, so feel free to give it back. If you have a question, stop me and ask; I have no problem with that. Basically, this is what we get out of it: the vendors can focus on making the hardware do what they want the hardware to do. They don't have to care about the operating system. So we don't need to have 18 vendors coming up with 19 different ways to produce an operating system. We can give them the platform, and they can focus on making the device do what they want. Within the CentOS project, we've got a few vendors who are already trying to talk to us about doing things like this. I single out Samsung because they're fresh in my head for IoT. They actually have a few devices that are using, I want to say, something built on Fedora 21; that was their embedded model for this. And they came to talk to us at Red Hat Summit asking for ways in which they could get updates to the system, so they didn't have to constantly refresh all the time. rpm-ostree actually seemed to work for what they were doing, because this was basically a DIY TV chip, an embedded chip that they were using in a lot of their different technology, and they wanted the customer base to be able to play around with it. This gives everyone a way to do that: if it's running a standard distribution, you get to play around with it, you get to do what you want with it, you can try and break it if you want to, and contribute features back. And again, the atomic updates are basically similar to the typical firmware style, so it makes things very easy to just push the update and see if it breaks. If it does, it rolls back automatically; you don't get stuck. Who here is in ops, or has been in ops on an RPM distribution in the last 10 years? A few of you. Anybody ever had a system lock up midway through a yum update and have to roll back through all the transaction tables and everything? I know you've had to do that a few times. The rpm-ostree style prevents a lot of that headache, so you don't have to go through this. I keep harping on that point, but it's one of the biggest selling points, and that aspect seems to make the most sense to me.

So how does this actually work? One current IoT workflow example I'll pull from is the automotive industry. A large chunk of the automotive industry has standardized on either QNX or Yocto Linux. And I don't want to take away from the Yocto folks too much — it's a great project — but it's really cumbersome to work with, and we've had a few folks come up to us to complain about their workflow process with Yocto. What they describe looks something like this: they do a git clone of the Yocto code, they do a base build of Yocto, they start to add in the changes that they need, they put their application code inside that same directory, they cycle their builds and their configuration files, they copy it off, they test it, and they iterate through that. And every single time they do, they get a fresh build of Yocto, which means they can't go back, because they're already into the next tree by the time somebody files a bug. They don't necessarily have a sane ability to roll back to a previous git snapshot, see what was there, see what was built, test against an issue that somebody has filed, and then produce a patch. They have to send that person — or send that vehicle — an entirely brand new build of Yocto, and of the vehicle software, to see if that actually works. That workflow is problematic for a number of reasons. If you're already well through your software development cycle, you have to roll back a number of things based on one particular issue. If you're using Fedora or RHEL or CentOS, reproducing a build from a known working set of RPMs makes things very reliable. You can take a snapshot for somebody who has filed a bug or an issue, go back, duplicate that exact environment, see what it is, and then figure out what needs to happen from there. It's a lot easier to do that in Fedora or CentOS than it is within Yocto.

So on the CentOS side, it's basically this: package your application as an RPM or a docker container — do we have to call it a Moby container if we're talking about community work? Okay, so as long as I use the lowercase d, then we're good. Okay, I'll include the trademark later in the slide, or change it to a lowercase d. You expose the package through a repository, either a container registry or a yum repository, you install it, and you deploy. That's pretty much it. The only thing that you need to do in between is validate that it works the way you want it to. That's it. So you add a CI pipeline into this or, God help you, tie it into OpenShift, and everything is fantastic. The other thing that you can do is just publish it as an RPM, expose it through a yum repo, and build a layer on top of rpm-ostree. If you're not familiar with rpm-ostree, you can actually unlock the tree, install an additional application, close the tree up, and then push that layer as just a layer; you can add it in as a second tree and push that. That workflow is still a little problematic, but it does actually work, and we're talking with a developer to move the workflow along. The reason that we push that is that if you do it as an individual container, you have one container running on one device, one application, and that seems like overkill for involving a container. If you're running multiple applications — if it's an in-dash car player, you're probably going to want a thing to play music, a thing to answer phone calls, a thing to pull contacts in — containers in that space make sense. But if you're talking about a camera, your camera is going to take pictures or video, and that's pretty much all it's going to do; there isn't necessarily a need to have multiple things layered on top of it. The other thing this workflow gets you is the ability to add in anything else that's already packaged up. So if you want to start looking at sharing out storage on the home gateway, and you've got six different devices and you want to be able to share things between them, you can put something like Gluster in persistent container storage and share that out, so you can actually have seven or eight things all accessing the same point. And I'll single out some of the Gluster folks in the room, just because Nigel was nice enough to throw out that introduction. I'll see about making this part of his task later on, to do CI for this. I told you my revenge would be swift.

The big thing out of this, and the thing that I keep coming back to, is making sure that everything is consistent. The ability to have a workflow where you can reproduce the environment in which somebody filed a trouble ticket is kind of key. The ability to test against things, reliably roll back, and then create new tweaks is really important. But this still leaves out the smaller devices, the tiny ones we saw. We've talked about the big stuff; we've seen the rebranded servers that now count as IoT. But the tiny stuff we've still left on the side, so we need a way to deal with it, otherwise we're abandoning an entire chunk of IoT. So how do we deal with the small ones? The Zephyr operating system is actually really nice. Red Hat has joined and is part of one of the IoT working groups, LITE — trying to remember what LITE stands for; Linaro IoT and Embedded, or something like that, I can't remember exactly — but Red Hat is involved in this, and one of the demo platforms that the LITE working group is using is the Zephyr operating system. Zephyr is a real-time operating system, so it's much more tailored to the real-time collection of information that a lot of the low-end sensors need, where the smaller chips are sensor designed, so that you can pull in and collect information for temperature and things like that. The railroad industry is actually looking at a large portion of this, because real-time actually matters for trains — who would have thought? Finding out whether the signals are working right, whether the gate comes down on time as the train is passing through: real-time matters in that instance. So the Zephyr operating system is tiny enough that it can be built for a lot of these low-end devices that will never get Fedora on them, that will never get CentOS on them, that just have no hope of running them. Zephyr is kind of purpose built. You can work through the Zephyr builds and duplicate this in QEMU. The QEMU environment allows you to do the build natively on CentOS, and on Fedora as well, run it through the virtual environment to test and make sure that you've got all the functionality that you want, and then deploy the build. So you essentially add it into your current CI chain. You can do the build and test right in the same operating system, right in the same pipeline that you're already used to using, and then crank out the image for the small chip set. If you don't want to do the native compilation, Fedora actually provides all the cross-compile tools, so you can build on your x86_64 laptop for arm64, and for a lot of the other chips that they provide the tooling for. So essentially you never have to leave your current environment, even to deal with the stuff that you can't necessarily run CentOS or Fedora on.

The point of all this comes back to keeping it simple. It was said earlier today that developers are lazy, and I know this because I am one. I will say that again, and I will steal from that guy: because I'm a developer, I'm lazy — I'll use his words. And it's kind of the same concept: use the core that you already know. CentOS is already there, Fedora is already there, RHEL is already there; you just need to apply it slightly differently. It gives the developers a consistent base, so they don't have to learn about debugging Alpine and debugging CentOS and debugging Fedora and debugging Debian. It's all the same platform, it's all the same libraries, so everything works the same way that they're already used to. And that gets us almost to the end, a little bit ahead of time. I was told 15 minutes for Q&A; we're at 17. So, questions? Anybody, anything? Yes, one sec, he's going to bring you the mic. And if we can get somebody on the opposite side of the room to ask a question next, it'll be great to see him run back and forth.

Have you done Linux From Scratch? There's a project out there where you just take the source code and try to build it from scratch. What I had to do was try to do something like CentOS from scratch, because if you are building something like a cross-compile of the whole of CentOS for another target, it's kind of difficult, because there are no standard instructions for how to build a CentOS — like, build one, not just install CentOS and run it, but build a CentOS. So if you truly want to do this CentOS-on-IoT idea, some instructions along those lines would really help the adoption. And the second thing: is there a definition of what is minimally a CentOS? Like, should it just be the kernel, DNF, and SELinux, or should it be something else? Is there a definition for that? That definition would really help.

You bring up two very good things. The first is directions for building a version of CentOS, and that changes, so it's more difficult to provide than you would think. Initially, when you look at what it takes to build CentOS, you assume it's just a bunch of RPMs that you cycle through — and that's not necessarily the case. For each series of builds, things get kind of tricky. You have to walk the build platform back and forth, even sometimes for little things like libnss. The NSS library has several checksums in it that break; they're basically code time bombs. MariaDB and MySQL do the same thing, because they validate SSL certificates, and one of the NSS tests actually goes out to PayPal, grabs the PayPal certificate, pulls it in, and then tries to validate it. And if PayPal has updated their SSL certificate — like they do fairly frequently, because they do things right — that test fails. So it becomes problematic to say, okay, follow these directions to build a distribution, but know that you're going to have time bombs in the code when you try to build these 12 packages, so you have to do this step, this step, this step — because every time that package gets updated, something new is in there. Take Go: up to Go 1.4 you could build it against GCC, no problem there. Now Go requires Go to build. So if we push those instructions and say, okay, here's how to do it, the problem becomes maintaining those instructions every single time, and we'd spend more of our time focusing on building the instructions rather than building the distribution. It's not that the tooling isn't out there, or that it's not discussed; we're not the only ones out there. Scientific Linux is out there. There's another one, from Princeton — I want to say it was called Springdale; I think they've renamed it to something different. Oracle does it. There are a bunch of different ways. The other side of this: when you come down to what's a minimal distribution, or a minimal version of CentOS, that gets tricky as well. You're asking, do you need a kernel, for example. For a bare-metal install, absolutely, yes — and we won't support it if it's not our kernel. If you're talking about a container, you don't need a kernel in the container, because you're using the host kernel. So which one is CentOS and which one isn't? They both meet the definition; we ship them both. So that gets tricky as well. I would say — being self-serving and standing up on stage talking about this — I don't necessarily want you building your own version of it and putting your own tweaks into it. What I want is you contributing to the stuff that we're already doing. I would want to expand the community out and say, hey, we've already got CentOS here: come help us do what we're doing and tailor it to fit your needs, or add an additional package into an additional repository, instead of building it again from scratch. Building it again from scratch is pretty much what got us to the position where IoT is right now, because Samsung has done it one way, and the folks from Dell have done it a different way with the EdgeX application. There are a number of different examples of IoT products where they've done it their own individual way, a way that they felt met their needs at the time, and it's gone horribly wrong for them.

So generally, if you look at other operating systems — I mean, in one of your slides earlier you said developers are lazy, and the job of an operating system or a distribution is to give some sort of consistency to a developer. So when he comes and sees, hey, what are you running on this piece of hardware, and I say CentOS is running on this piece of hardware, I know exactly what to expect from there. So I agree with the points you have mentioned. But any definition which you can give, even minimally — for example, the first thing that comes to my mind is that CentOS has SELinux, while Debian-based or Ubuntu systems will not have that, so I can expect that. So if I am going to distribute some application which is going to run on that specific piece of firmware, and it's not necessary that you are always going to be supporting that piece of firmware — like there's some random chip out there which I have got for low cost and have decided to use — can I build some firmware which kind of ascribes to the definition of CentOS, even if it's not certified by CentOS or whatever the authority is? So how do we define what is "like CentOS," the most minimal basic requirement? Not a list of packages — these packages should be there in CentOS — but more like, hey, it has the Linux kernel, it has SELinux. Some definition would really go a long way; it could be a starting point.

The container argument is the reason that we haven't put out a firm definition of that yet, because if we require a kernel, then even the official CentOS docker container doesn't meet that definition.
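The "time bomb" failure mode described above — a test suite that pins a live third-party certificate at packaging time — can be shown with a toy check. The fingerprints here are made up; the real NSS test fetched PayPal's actual certificate over the network:

```python
# Toy illustration of why pinning a live external certificate in a build-time
# test is a time bomb: the test encodes a snapshot of someone else's
# infrastructure, and it breaks the moment they rotate the cert.
# Both fingerprints are invented.

PINNED_FINGERPRINT = "ab:12:34"   # cert fingerprint captured at packaging time

def cert_test(live_fingerprint):
    """The build-time test: passes only while the remote cert is unchanged."""
    return live_fingerprint == PINNED_FINGERPRINT

print(cert_test("ab:12:34"))   # True: passes the day the package was made
print(cert_test("cd:56:78"))   # False: the remote site rotated its cert,
                               # and now the package fails to build
```

Nothing in the package's own code changed between those two runs, which is exactly why "follow these build instructions" documents rot: the instructions have to be re-validated every time an upstream dependency's environment shifts.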
So if we're just calling it a kernel, we can't do it; it won't work. If we say it has to have SELinux — well, openSUSE has SELinux as well, so that doesn't work either. The way that the distribution as a whole has said we will deal with this is: if you are running code that we have signed with the distribution key, that is code that we will support. If you're running your own kernel, that's great, you're completely welcome to do that, but as soon as we hit something where the kernel is singled out as causing a problem, you're on your own at that point. As far as trademark goes, if you want to do something that ascribes to CentOS, the trademark guidelines for the distribution allow you to do that. You can sit there and say, hey, this is based on CentOS, or this is powered by CentOS, but I have added X on top of it. There are guidelines on the website for doing that; you absolutely can, as long as you're following the trademark guidelines and the tooling around them. For the rest of it, I want people contributing to the project rather than trying to copy the project and duplicate it. It doesn't seem like there's a lot of effort in building a Linux distribution, and it's not necessarily the compilation that requires the work, although there's a ton in it — it's the long-term maintenance of it, the ability to continue supporting it for that amount of time. A lot of the value is actually there, and that's one of the things that gets pushed aside in the IoT landscape, because the updates for IoT just aren't there in most cases. They're hardware vendors; they're selling the hardware. A lot of the developer boards, and even a lot of the Android cell phones that you look at today, are running a 3.10 or 3.18 kernel, even though there are known problems and Android itself, within Google, has moved on to a 4.x-based kernel. They're just not being updated. So that's the thing to understand: it's that long tail of maintenance that's actually the biggest part of the value. But, saying that — yes, it's maintenance that makes this thing function — maintenance isn't new. It's not hot, it's not the sexy thing that pulls you into a product. People get sold on the features, they get sold on the upfront UI, the interface, the flashy stuff; the support comes afterwards, about a year into it, when they start saying, hey, I need an update for this because there's a problem. Any other questions? Yes — hang on, the microphone.

So I'm quite interested in the GPIO stuff, which I didn't find with Fedora. What's the state of GPIO support?

On the 32-bit side we have some limited GPIO support. It's not fully baked in the way that you would likely want, and we don't necessarily have hot plug for certain devices, for adding and removing while the device is powered up; there are a couple of kernel tweaks for that which need to be done that aren't there yet. On the 64-bit side we don't have it at all currently, because most of the servers that we've been aiming for don't have GPIO support, so there hasn't been a need. In the future we probably will turn it on for some of the smaller-scale devices. I know that on the Fedora side they were working to enable some of it, but because Fedora builds specific functions into the kernel — a single kernel built for all the devices, just per architecture — getting some of those features turned on causes issues for other architectures, and that causes the whole build chain to fail. So there are some reasons why Fedora doesn't have it blanket enabled that are a little different. You're not making him walk that far. Sorry, I could just walk over there.

So, if you are finished with the answer — my question is: CentOS is building Atomic images based on OSTree, but it's container oriented; it has Docker inside by default. Do you build, or do you plan to build, other — not Atomic, but OSTree-based — images, or trees, that would be as minimal as possible, for devices, without Docker?
Because maybe people don't want Docker there; they want to just use an OSTree install or something like that.

Eventually. We have to have the workflow around that before we can start producing those images. We have to have the developer environment and the workflow for people to consume it, so we have to get that built first and then say, this is the tree. The easy way is to say: here's how you build your own tree from this list of signed CentOS packages, from this repo; here's how you can do it yourself. That seems to be the first step to me. Whether or not we say, here's a base tree, this is what you need to use as your initial platform to work up from — we'll see where it goes from there. I think we're good. Thank you, everybody.

You can pick up your feedback forms and drop them in the bin outside — not the dustbin, but the bin next to the IFF counter. I'd like to thank a lot of the people who have done the work to make today successful, especially HasGeek for hosting us along with Rootconf, and the people who are running the logistics here: the hall monitors, the people who have been passing around the mic, the people at the cameras who probably haven't been able to move around a lot, and the people who are handling the bikes. And Beavis has been stuck at the video station — that's going to end up on YouTube, and it's a very thankless job for now, but eventually we'll be grateful to him; thank you for doing it. And a big thank you to all the speakers today, and to all of you who showed up today and yesterday for the workshops. And I think this is the end of DevConf.