Okay, welcome. Thank you for joining. I'm Nico, Nicolás. I'm from Argentina and I moved to the Netherlands in 2019; I'm living in Amsterdam right now. I've been a Python organizer for a decent number of years. I'm an organizer of this conference; I've been an organizer of EuroPython for three years. Before that I was living in Argentina, where I'm one of the founders of the Python Argentina NGO. I was also named a fellow of the Python Software Foundation in 2021.

I identify myself as a software engineer, but I've been working really close to the metal and to infra operations. I've been a sysadmin, I've been all those kinds of things. But I still like to say software engineer, because I believe that the proper way to do infra is writing code.

When I moved to the Netherlands I joined Optiver, where I'm currently managing three teams: the Linux, Networks and Infrastructure Platform teams. Those teams work on the things I'm going to mention a bit today. If you want to talk to me, please say hello, ping me on Discord or ping me in person. I'm going to be here until the sprints on Sunday.

I work at Optiver. Optiver is a proprietary trading company. We are market makers. That means that we provide liquidity to the markets, and we try to do that with the best prices that we can. We have around 1600 employees around the globe, and we use Python a lot. Those 1600 are not all developers, for sure: we have traders, we have risk, we have other teams. We use Python a lot for infrastructure, of course, and for research and for trading. And as you can see, we have many offices around the world.
I work in the Amsterdam one.

So, what am I expecting you to get from this talk? First, there are a lot of applications of infrastructure as code that are common, where you can just go and read the documentation. But in our case we have a few limitations that mean we have to run our own on-premises infrastructure, and that has a bit of a different challenge. I also think it's interesting to see how others are solving problems; maybe you get one or two ideas that you can use, or maybe you get interested in the topic. And what is also important for me: when you think of infra, you have code next to it, right? Infrastructure is not only people doing Linux shells and SSH and patching cables, that kind of thing. So if you go home with that idea, I'm successful.

This is a screenshot of the Optiver website, and the reason I'm showing it is that in 1986 Optiver made its first trade ever, on the options exchange in Amsterdam. In this picture you see there is a trading floor with a lot of people. The way trading happened at that point was, you know, screaming: I want to buy, I want to sell, this is my price. Maybe you saw movies or real images of Wall Street, you know, big guys in suits pushing each other and screaming to see who will be first, right?

Well, that changed a bit. An exchange looks a bit more like this now. There are still some people on the trading floor, but the reality is that all the things are happening more in a data center. And Optiver, as a trading company that has been trading since 1986, had to adapt and move there.

How it works, a bit: imagine that this big square is the data center of one exchange. All the members are connected to the exchange system. Imagine each of these small boxes, member one, member three; Optiver is one participant, right?
And that's one small room where you have all your racks and all your things. You connect, and the exchange is going to send you data, market data, and they do that in a way that everyone receives the data at the same time. So it's equidistant. Member three here, which is physically closer to the exchange system, doesn't have an advantage, because the exchange is built in a way that everyone receives at the same time. They use multicast for that, which is a network protocol, and then you can send things back over TCP: you send your orders.

So if I want to be a member of an exchange, of course there is a lot of paperwork. Then they give me my square, my space, and I go and deploy my hardware and my things. Maybe a solution would be: yeah, I just connect a pipe to the cloud, write some Terraform or playbooks, and provision my systems. That would be amazing. That would be really easy.

But the reality is that you really don't want to do that. Why? Because you are doing trading and speed is important. Depending on the exchange, we can be talking about milliseconds. Meaning: I receive some updates from the market, some market data, and I want to update my prices or send orders to the market. That can be milliseconds, it can be microseconds, it can be nanoseconds. So any cable that takes you to the cloud, just one way, never mind the round trip, is already too slow. Unless someone can change the realities of physics and make a fiber that is much faster, you ideally have to have your own infra inside the exchange. Maybe in the future the exchanges will move to the cloud, and then we can be in the cloud.

Now imagine this many, many times around the globe, right?
So, I don't know, 20, 40, 60 different physical colocations that you have to build, manage and operate.

To set the scene a bit: is anyone here in the room working with physical infrastructure, like a data center? Nice, a few. Nice.

So what do you have to do? First you get the space. They give you space, it's empty; they give you power; they take care of your thermal needs. And this is already something important: there is no standard. Every single exchange will choose a different data center. They are in different countries, with different standards; the racks are 10 centimeters different. So it's already hard when you want to standardize how you do those things. In one place they give you this amount of power, and in another one they give you half. So if your standard is "I want to have 20 servers in each rack", you can only have 10, because there is no power.

Then you need to put in racks. You need your networking devices, you need servers, cables for power, for networking, for all kinds of things. You need your WAN connectivity, because you want to connect all these places and you want to be able to access the colocation to run your automation. You need to connect to the exchange, and you need other devices. I don't know, this can be terminal servers; a good example is an out-of-band device to operate your switches. And then of course you need firmware to be provisioned to your devices, you need an OS to be provisioned to the boxes, you need configuration.

And that's kind of the base. I'm omitting the applications here, and I'm also oversimplifying this a bit. I mentioned nanoseconds, so you can imagine that there is some custom hardware, there are some special devices, there are low-latency switches, etc.
But let's keep it a bit simple, just for the sake of discussion.

Before going into how to solve it: how not to solve it. I think the ones who raised their hands will relate. Maybe some of you have experience doing infrastructure in a bad way, right?

So if we have to build 40 different colocations and manage them, please don't do that in Excel. Don't make an Excel file where one tab is my IPs, another tab is my hostnames, and this one is how the cables are connected. And don't build manually maintained diagrams, like Visio diagrams that you have to keep updating by hand every time, when you already know they are going to be outdated.

The second thing is: be careful with tribal knowledge, or with conventions that are not easy to maintain but seem easy. For example, encoding metadata in your hostname. These are all red flags; if you're doing this, something is wrong. If you have pet names, well, maybe that's okay, I don't know, if they're random. But if you encode metadata, imagine: at the beginning it seems easy, because you don't have an organized system. You say "server number one, database zero, application X"; you put all the information in the hostname and you say, no, this is great, I just read the hostname and understand everything. But then one day you change your system, your database is now on the server next to it, and you have to rename all your servers.

The other one is: server number one uses IP number one, server number two uses IP number two, so if I ping number one, I know it's at the top of the rack. And you say, oh, this is great, this is really easy to memorize. But it's easy for you, and you are not the best source of truth for what your infrastructure is, right?
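As a rough illustration of the alternative to encoding metadata in hostnames, here is a minimal sketch in which hostnames stay opaque, and role, rack position and IP live in an inventory that tooling can query. The `Server` fields, the hostnames and the addresses are all made up for this example; they are not Optiver's data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Server:
    hostname: str   # opaque identifier, carries no meaning
    role: str       # what the machine currently does
    rack: str
    rack_unit: int  # position in the rack
    ip: str


# Hypothetical inventory; in practice this would live in a database.
inventory = {
    "srv-7f3a": Server("srv-7f3a", "database", "rack-a", 40, "10.0.0.17"),
    "srv-c21e": Server("srv-c21e", "application", "rack-a", 39, "10.0.0.42"),
}


def find_by_role(inv, role):
    """Return the hostnames of all servers with the given role, sorted."""
    return sorted(s.hostname for s in inv.values() if s.role == role)


def move_role(inv, hostname, new_role):
    """Change what a server does without renaming it."""
    inv[hostname] = replace(inv[hostname], role=new_role)


# The database moves to the server next to it: only the inventory changes,
# no hostname has to be renamed.
move_role(inventory, "srv-7f3a", "application")
move_role(inventory, "srv-c21e", "database")
print(find_by_role(inventory, "database"))
```

The point of the sketch is only that questions like "where is the database?" are answered by a lookup, not by parsing a name or remembering a convention.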
Ideally the tooling that you have should help you to understand all of these things.

The other one is a bit obvious, but I'm going to say it: don't do it manually. Don't go and build things one by one, unless it's small enough; then you can do that, that's great. If I have five servers, who cares? I will write it on paper and do it. Maybe that's okay; I only change the servers once a year.

But if you do it manually, that means that you don't have a standard. Because even if you have a standard and you wrote the standard down, how do you know the standard is really there? How do you know that I didn't type "server001" and then for the next one "server0002", that I didn't make a typo, that kind of thing? There is no way to have assurance. Or, if you want it, you need several iterations, or to go and check all the time. That's almost impossible.

There is no good inventory. If you want to plug in your monitoring system, or you want to use Prometheus, you will have to configure Prometheus manually. So there is not really a nice way to have that if you are doing it manually. And then you are spreading the problem to your application layer: someone writing applications on your infra will still have to adapt to your Excel files to run their application. That's of course a really bad idea, and you are propagating the problems to them.

Let's not even talk about updating firmware: one single server can have five different firmwares, and you will be clicking and clicking, one by one, to do updates if you do it manually. Basically, it's snowflakes everywhere, and it's a bit of a nightmare for someone doing infra.

Finally, there can be other requirements, and I have one example that we have at Optiver. We are a financial company.
There are financial regulators, and one thing that we are required to have, and that we also want to have, is a kill switch. That means: our system can go crazy, it can have a bug, it can start spending millions per second. So we want a way to say, "I want to disconnect you from the exchange now, I kill this thing." And if you don't do that in a good way, you are a Wikipedia page tomorrow; you don't exist anymore as a company. So how much are you going to trust your Excel files to get the list of things that you have to disconnect from the exchange?

Again, this is probably not really required, but I think it's good to mention what not to do.

And this question I really like: can you rebuild from scratch? If you're afraid when I ask this question and you are doing infra, then you have a problem, and infra as code is a bit of a way to solve this. Honestly, probably no one can do this one hundred percent, exactly from zero to a hundred. But if you're in the cloud you can probably do something like 99, ideally. And if you're not in the cloud, you should be able to do this as much as you can.

So what's infra as code? The talk is about infra as code, so let's explain what it is. It's a way to manage and do provisioning through code instead of manual processes. That's one point. It's a machine-readable definition instead of interactive configuration tools. You have a way to consume it with software; you don't need a human clicking things. So you should be able to write code that uses your infra-as-code data.

I think this, for me, is a bit of the key: you add reproducibility. You add speed, because you can do it faster. Fewer errors: we humans make errors by default. And that means risk reduction, and in total it's a bit of a cost reduction. And the cost is not only money. It's also time. It's also headspace, right?
I don't want to be thinking about IP addresses in an Excel file. I don't want to be thinking about IP addresses at all. I want IP addresses to be assigned by some code I wrote two years ago and not to be a problem, so I can focus on how to make my observability better, how to make my automation faster, how to do all the other things.

Infrastructure as code will enable orchestration. If you have a nice infra-as-code solution, that means you can orchestrate: I'm going to reinstall all my colocations every weekend, right? I can operate my systems at a higher level. You have a bit more power.

Of course it's standardization. You have assurance that your environment is going to be the way you want it to be. It's a bit of a machine to clone snowflakes: you craft one snowflake slowly, with your hands, and then you reach a point where you can just copy, copy, copy all the time.

And finally, one comment that for me is as important as the whole definition of infrastructure as code: declarative versus imperative. An imperative solution to automation just describes the steps to achieve a state. You write some bash script and you say command one, command two, command three. Or maybe something like Ansible, which will run some steps and render some Jinja templates. That has a problem: it's not good for humans who are trying to consume it, and it's also not good for machines, because to get an idea of what's going to happen you need to process that bash in your brain, or render those templates in your head. Declarative is more: what's my intent, what's my desired state? What was my intention?
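The difference between the two styles can be sketched in a few lines. In this minimal, purely illustrative example the declarative input is just a dictionary of desired state, and a generic `plan` function works out the steps by comparing it with the current state. The VLAN names and the `plan` / `apply_actions` helpers are assumptions made for the example, not part of any real tool.

```python
def plan(desired, current):
    """Compute the actions needed to move `current` to `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in current:
            actions.append(("create", name, spec))
        elif current[name] != spec:
            actions.append(("update", name, spec))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def apply_actions(actions, current):
    """Apply actions to an in-memory 'reality' and return the new state."""
    state = dict(current)
    for verb, name, spec in actions:
        if verb == "delete":
            state.pop(name)
        else:
            state[name] = spec
    return state


# Declarative input: only the intent, no steps.
desired = {
    "vlan-100": {"name": "market-data"},
    "vlan-200": {"name": "orders"},
}
# What is configured right now (hypothetical).
current = {
    "vlan-100": {"name": "marketdata"},
    "vlan-300": {"name": "legacy"},
}

actions = plan(desired, current)
for action in actions:
    print(action)

new_state = apply_actions(actions, current)
print(plan(desired, new_state))  # nothing left to do once reality matches intent
```

An imperative script would hard-code the three steps instead; here a human or a machine can read `desired` directly and never has to replay the steps in their head.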
You say what you want it to be, and then some code will make it reality. I think Kubernetes is a good example of that: you define your app, and then Kubernetes knows how to deploy the container. You don't spell out the steps.

This is a screenshot from the Terraform website. This is a really common case, and I'm showing it because I think it's a good way to explain infrastructure as code. You have Terraform, which gives you a way to have a high-level definition. Then Terraform has the code: there is code that implements the Terraform providers. And then you have this side, which is Amazon or Azure or any other cloud provider that works with Terraform, and they already have some kind of standard for you. They already have a solution: they expose endpoints, they give you SDKs, they give you things that you can consume. Sometimes we don't realize how much they abstract from us. If you're going to be doing infrastructure as code on-prem, you really, really need to know this, because you have to solve a lot of problems that they are solving for you.

Here you have: okay, this is the version of my code, I'm saying in which region, and, for example, t2.micro. So I'm saying I want a VM that is micro. But that means there is an existing menu. You cannot get any random number of cores or any random amount of memory that you came up with that morning. They are already enforcing a standard on you. They are abstracting a lot of problems for you, and what they take from you in exchange is: this is the standard, you can have this kind of VM, or this memory, or you can place servers in this way, and you cannot do these other things. And I think it's good for a lot of cases, right?
Because they set you some boundaries, and then you can move within those. But it's always good to remember that they are solving all the physical realities for you. It's abstracted for you; you don't care about a switch or that kind of thing.

So let's talk now about how we do that without them, without having the cloud. First I want to talk a bit about the open source software stack, because ideally, if I ask one of you, "let's solve this problem, we have 40 data centers", you would say: okay, let's see if someone is already solving this problem.

I think a de facto solution doesn't exist. There is no Kubernetes of on-prem infra. There are some solutions, like NetBox, and Nautobot, which is a fork of NetBox. Those solutions are interesting, but they solve only some parts of the problem, not all of them, and I think there are some challenges in how much you can adapt them. NAPALM is a great resource from the open source community. NAPALM is a library to talk to networking devices, and what's amazing is that it gives you a common interface. You can contribute to it, but you don't need to maintain your own interface to all the different vendors.

But for sure it would be great to have one, I don't know, that would be really, really nice. Because what you have to do, if you want to do infrastructure on-prem, is implement your own tiny cloud. And then you have OpenStack, Metal as a Service, RackN; take a look at all of those, they are interesting. At Optiver we decided to implement our own solution, based on the situation that we were in and the conclusions that we arrived at.

Before going into that: before writing a single line of code, you need to have a standard, right?
And in this case it's much more important, because you don't have one. If you're in the cloud, the cloud is forcing a standard on you, even if you don't like it. Here you don't have that. So: which color of cables are we going to use? How thick are the cables? How many servers do we put in a rack? What's our network architecture? How are we going to do routing? How are we going to do redundancy? Do we over-provision or under-provision?

Think about all those parts, and think of your infra-as-code implementation as the implementation of the standard. Ideally you do that in a way that is not one-to-one, so you don't have to rewrite the whole thing every time you change your standard. There is a decent amount of coupling, but try to build this with some flexibility, because you cannot have only one standard: the physical realities differ from place to place.

So now let's take a look at our solution; I will try to use the last five minutes to go through it. If you look at this big square here, think of everything from there to your right as kind of Amazon for now. Don't think about what's inside; it's invisible for you. On this side you have a standard, a standard that says how you do things, and then you have a high-level definition, or think of it as values: the input for my standard. What's the IP address space? Where are the uplinks? What are the serial numbers of the physical devices? And what's the standard version?

Then, where you have that, you have what Terraform is doing. What Terraform does is know how to read that high-level definition and make all the API calls to the providers, in the order that they have to be made, to create your infra, right?
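To make the idea of a high-level definition plus code that implements the standard a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: the field names, the two standard versions, the rack naming and the rule that the first usable address is the gateway are assumptions for illustration, not the real definition format.

```python
import ipaddress

# Hypothetical high-level definition of one colocation.
definition = {
    "location": "ams-1",
    "standard_version": "v2",
    "address_space": "10.20.0.0/28",
    "server_serials": ["SN-0003", "SN-0001", "SN-0002"],
}

# Hypothetical standards: how many servers fit in a rack in each version.
STANDARDS = {
    "v1": {"servers_per_rack": 4},
    "v2": {"servers_per_rack": 2},
}


def expand(defn):
    """Expand a high-level definition into concrete per-server records."""
    per_rack = STANDARDS[defn["standard_version"]]["servers_per_rack"]
    hosts = ipaddress.ip_network(defn["address_space"]).hosts()
    gateway = next(hosts)  # assumption in this sketch: first host is the gateway
    records = []
    for index, serial in enumerate(sorted(defn["server_serials"])):
        records.append({
            "serial": serial,
            "rack": f"{defn['location']}-rack{index // per_rack + 1}",
            "ip": str(next(hosts)),
            "gateway": str(gateway),
        })
    return records


records = expand(definition)
for record in records:
    print(record)
```

Changing `standard_version` to `"v1"` would put all three servers in one rack without touching the definition's other values, which is the kind of loose coupling between input and standard described above.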
And we have that: we have our SDK, we have our command line, and with that we can interact with our infra. But the hard part is the other part: you need to implement your own Amazon.

What we did here is build our intent system. The reason the name is "intent" is that it can be the source of truth for a lot of people, but we intentionally say intent, because this is our intention: this is how our infrastructure should look. It's a Django instance with a Postgres database and Celery; it's a pretty standard thing. But that is where we modeled our infrastructure. You can call an API and say: I want to create a location, I want to create a rack, I want to put in a server and a switch, I want to connect this server's network card to this port, and then you do the layer 2 networking and the layer 3 networking.

So when you run this part (oh, sorry, I have the mouse in the wrong place), when you run this code that implements the high-level definition of the standard, it will make all the API calls and define whatever your colocation needs to be in this database.

After that we have our provisioning pipelines. We use FastAPI and other Python things. Basically, this system knows how to read our intent and then push the configuration, the OS and the firmware changes to the devices.

There is also something I didn't put on the slide: you have the physical reality. That means you need to run this thing a first time, then export the data and give it to your data center engineering team and say, please go and build this: rack the servers, connect the cables, and these are the instructions to do it. When that's done, you can say, okay, now install it. And then you have the assurance part, right?
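The step of exporting the intent as instructions for the data center engineers can be pictured with a small sketch. The `Device` and `Cable` classes and the wording of the instructions are invented for this example; the real system is described in the talk as a Django application with its own models, which are not shown in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    kind: str       # "server" or "switch"
    rack: str
    rack_unit: int


@dataclass(frozen=True)
class Cable:
    a_device: str
    a_port: str
    b_device: str
    b_port: str


def build_sheet(devices, cables):
    """Render the intent as plain-text instructions for the people on site."""
    lines = []
    # Top of the rack first, so the sheet can be followed from top to bottom.
    for d in sorted(devices, key=lambda d: (d.rack, -d.rack_unit)):
        lines.append(f"Rack {d.kind} {d.name} in {d.rack} at unit {d.rack_unit}")
    for c in cables:
        lines.append(f"Connect {c.a_device}:{c.a_port} to {c.b_device}:{c.b_port}")
    return lines


devices = [
    Device("srv-01", "server", "rack-1", 10),
    Device("sw-01", "switch", "rack-1", 42),
]
cables = [Cable("srv-01", "eth0", "sw-01", "port-1")]

sheet = build_sheet(devices, cables)
for line in sheet:
    print(line)
```

Because the sheet is generated from the same data the provisioning code reads later, the physical build and the automated install start from one description instead of two.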
So you can take a picture: we have a system that does collection of truth, that takes a picture, and we have our auditing system that compares our intent with reality. If our intent says this server has IP number three but it has number one, it will create an event, because there is a bug somewhere or someone did something wrong.

As a pretty common example of something we do: we go to build a new data center colocation. We work on this file and run our script. This could be Terraform in the future. Why not today? We are still working a lot on this part, but probably we could just do it with Terraform. In the end, what's important is the concept: there is code that implements your standard and knows that it has to create 40 servers in a rack, connect them in some special way, do the routing in the switches, etc.

So we run that first thing and export the data to our data center team. They build, and when we have some basic connectivity to the devices, we just call our provisioning pipelines and push firmware updates and the OS. There are a lot of super interesting things we could talk about here; for example, we are flashing Docker images to bare metal servers. But in the end, what's really important for me is to get this whole idea across: you have an intent that you need to maintain, and you need to solve a lot of things that the cloud was solving for you before.

I'm hurrying up because I want to leave time for questions. So, a bit of a conclusion. As I said, there is no de facto solution. I don't know if that's possible, because each company is a bit different; I hope it is, it would be really nice. Maybe there are some cases that are common enough that there is going to be a de facto one.

The Python ecosystem is great for these kinds of problems. If you're solving these kinds of things, Python is the best, right?
It's super easy. It's fast. You have these nice frameworks to build on top of.

It's a bit like building your own micro cloud, and for me that's also really nice. Something that I really love about my job is that I have the luxury of a problem big enough that I can write my own micro Amazon, because usually you don't have a problem big enough for that. And that's a really interesting challenge. Someone did a lightning talk yesterday showing how to change the Python interpreter, and this person was telling me later, "yeah, you can write your own shell." That's something really nice to do, because you learn a lot. And I think here I learn a lot; for example, I now have a lot of insight into how a cloud provider is doing things.

That was my last comment, and I really wanted there to be time for questions, so I'm going to stop. Ideally, in the questions I can give more context. Thank you.

Thank you very much. Thank you very much, Nico. Do we have any questions? We have a microphone there in case someone wants to go and ask them. Okay, we have the first person. Thank you, go ahead.

Could you share with us any of the major outages that you have had? What's the worst that ever happened when managing the data centers, either in the files or in the physical infrastructure? Any horror story that you can share with us?

Well, if you have been in this industry long enough... I saw fire, water, smoke, all the things. And that's more because of the physical realities, right? Your data center is just going offline and you have to be ready to move to a new place. I can't think right now of one really scary thing that we did. But yeah, I can imagine our system having a bug and changing every single gateway for every single server, or something like that. It didn't happen so far.

Please.

Yeah, hi.
Thank you for the talk, it was really insightful. There is definitely interest in this space, in the idea of this kind of on-premises micro cloud. Recently DHH published that Basecamp is no longer on the cloud, and AWS published that the serverless architecture doesn't work even for them. So my question is whether there is any plan on your part, Optiver or whoever, to open source this stuff, or even kick off an open source movement towards a micro Amazon, as you called it. Thank you.

You're welcome. I don't think we are planning to do that right now. But for sure, we try to contribute to the projects that we use, like NAPALM and some other projects. I think releasing the whole thing as a package means it has to be a product, which we are not ready to do, because it's built for our internal usage. So it would be really hard; we would need a whole team maintaining it to make it generic enough for others. What I do expect we'll be able to do is release some components. Maybe there is a piece that installs Linux on a server, and it's a pipeline that is pretty easy to isolate. I think those are things that we can open source, and I hope we are ready to do that at some point.

Thanks for your talk. I have a question. I think that your system may be far from being transactional. My question is how you manage inconsistency.

Well, I showed this auditing part, right? If I understand what you mean by inconsistency, it would be that my intent is different from my reality, something like that?

Like, for example, if you order the creation of, I don't know, a cluster, and the first machine was created and the second one was not.

Yeah. So we run our pipelines in a way that they are idempotent, right?
You can run the pipeline many times and it will try to achieve the end state. So if there is a bug in the middle, you fix the bug and rerun it, or you find the problem and rerun it, until you achieve the expected end state. And we use this feedback loop to check whether that's the case.

Thank you very much. We don't have more time for questions, but there was one question where they already tagged you on Discord, so maybe you can follow up later on. So, cool. Let's thank Nicolás again. Thank you.
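The last answer, about idempotent pipelines and the audit feedback loop, can be illustrated with a small sketch. It is a toy model under stated assumptions: "reality" is a dictionary, `provision` is an invented stand-in for a real provisioning step, and the `fail_on` argument only exists to simulate the cluster that was half created.

```python
def audit(intent, reality):
    """Compare intent with collected reality; return one event per mismatch."""
    events = []
    for host, wanted in sorted(intent.items()):
        found = reality.get(host)
        if found != wanted:
            events.append({"host": host, "intent": wanted, "reality": found})
    return events


def provision(host, wanted, reality, fail_on=None):
    """Idempotent step: does nothing if the host already matches the intent."""
    if reality.get(host) == wanted:
        return
    if host == fail_on:
        raise RuntimeError(f"provisioning {host} failed")
    reality[host] = wanted


def run_pipeline(intent, reality, fail_on=None):
    """Try to bring every host to its intended state, in a fixed order."""
    for host, wanted in sorted(intent.items()):
        provision(host, wanted, reality, fail_on)


intent = {"srv-01": {"ip": "10.0.0.1"}, "srv-02": {"ip": "10.0.0.2"}}
reality = {}

# First run: the second machine fails halfway through the cluster build.
try:
    run_pipeline(intent, reality, fail_on="srv-02")
except RuntimeError as exc:
    print("first run:", exc)
print("events after first run:", audit(intent, reality))

# After fixing the problem, rerunning is safe: srv-01 is left untouched.
run_pipeline(intent, reality)
print("events after rerun:", audit(intent, reality))
```

There is no transaction or rollback in this sketch; consistency comes from rerunning until the audit reports no events, which matches the approach described in the answer.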