Okay, I think the timing is about right. The last group ran a little late, so it took us a few minutes to get settled, but it's 11:20 now, so I guess I'll get started.

Hi everyone. My name is Dan Paik. I work for Samsung SDS, where I run the development and software engineering teams working on our next-generation cloud. A lot of what I'm talking about today is focused on high availability: how do we get multiple availability zones working, and what can we do to build disaster recovery into an OpenStack platform? The reasoning behind that comes from the background of both the company, Samsung SDS, and the enterprise customers that we have, and why we have to build these things. So I wanted to discuss a little bit of background first, in addition to some of the struggles that we have. One of the major points I wanted to make off the bat is that we don't really have a lot of answers. One of our big reasons for coming here and talking to you today was to seek out assistance and help, to have these discussions, and to figure out some of these things together. Hopefully that makes a little bit of sense.

So, Samsung SDS: we're a fairly large company. We're a 10 billion dollar company. We've been around for about 40 years.
We have six and nine global offices, 23,000 employees, and a few other stats. Most of our employees are in Korea, and for those of you wondering, it's South Korea. We're in Seoul, and that actually becomes relevant a little bit later on. We're 10th in terms of brands and 25th in terms of services. These are the types of things that we do, and the map there has the different offices that we have. I actually work in Seoul, Korea, and it's 3:20 in the morning over there right now, so I'm a little bit tired, but I think I'll manage okay.

These are some of the awards that we've had more recently. We're listed in Gartner's Magic Quadrant for data center outsourcing and hybrid infrastructure, and as far as I know we're the first and still the only Korean company in that quadrant. So in a lot of the data center operations that we do, the data centers that we've built, and things like PUE, we're relatively strong pioneers, at least in the Korean market. These are the certifications that we have, primarily from a security perspective.

As for most of our business as Samsung SDS: a lot of you are probably aware of Samsung the company. A lot of you probably have Samsung phones, laptops, refrigerators, TVs, that kind of thing. The Samsung group itself is fairly large, and there are many affiliate companies. Electronics is the largest one, and Samsung SDS is the IT services and data center company. We run a lot of the infrastructure for the entire Samsung group. The majority of our revenue comes from internal Samsung group customers, and a lot of the cloud that we're trying to build is there to manage the different workloads that we have within the Samsung group. But we also have non-Samsung business, and we're trying to expand that.
So we also need to build a cloud that can support these different types of tenants, whether they're internal or external, publicly facing or internal and private. All these different use cases are things that we need to manage.

On the different types of industries that we serve: a lot of what I talked about just now, electronics and semiconductors, is largely because of the Samsung companies. Samsung Semiconductor is one of the world's leaders in that area. Batteries are with Samsung SDI. Biopharma and pharmaceuticals are with Samsung Biologics. Apparel: we even have clothing brands in Korea. Financial services: there are Samsung credit cards and Samsung Securities, so you can buy and sell stock in Korea through Samsung. So it hits every different area. Construction and engineering: there are theme parks in Korea run by Samsung, as well as apartment buildings, offices, that type of thing. So the industries that we serve are fairly broad.

The services at the bottom are largely where our external, non-Samsung business comes in. Public sector and government. Samsung actually has hospitals and colleges as well as hotels. The Department of Defense, the military, is also something we look at. So we do have a wide variety of industries, and our cloud needs to support these types of use cases as well as the different certifications that these industries require.

In terms of cloud itself at Samsung, we've been working in cloud for a little over 10 years now. We started in the 2010s. The company has been in business for 40 years, but we've been working on cloud for the last 10, primarily with the goal of more efficiently running the hardware and services that we have. I personally joined the company in 2019, and one of the things that we've done since 2019 is build out additional data centers.
We went from 6 to 17: 5 domestic and 12 overseas. We have specific cloud pools set up for finance, public sector, manufacturing, and R&D. We built a new HPC data center that just opened up earlier this year, with things like water cooling and larger racks that can handle more watts and more power. So our cloud itself has been moving a lot as well.

That's the global infrastructure. When we talk about global infrastructure in this sense, although we have the five in Korea, we are a global company, so we need to ensure that all of our workloads work globally and that we have disaster recovery across the world as well. We also need multiple availability zones for uptime. So all of the different areas and data centers that we serve need to be organized in a way that they can work off of each other, and we need OpenStack to be able to support that as well.

This is some of the expertise that we've had. Because PUE and green initiatives have become much more important in recent years, we've focused on that as well, and we've gotten much better at it. We're really pioneering a lot of this, especially in the Korean market: don't just build data centers, make sure that you're using solar and wind, and make sure that you're trying to be as carbon neutral as possible.

These are the different regions that we've set up across the world. We have the Americas, EU (Europe), and AP, and we've set up different data centers in these areas. In terms of disaster recovery, 2022 was about securing most of these regions. This year we're pairing these regions up in terms of disaster recovery zones.
So for example, Delhi is set up so that its disaster recovery is in HQ, in Seoul. Same with Singapore. London has Frankfurt as a pair. San Jose and Dallas have New Jersey as a pair, and Sao Paulo also goes all the way back to Korea. So you can see that it's not necessarily optimal, and we'd like to add additional data centers so that we can pair up disaster recovery better. Now, it doesn't have to be this way. In theory, if you wanted the Singapore data center to back up to Delhi, you could.

Samsung SDS is both a cloud provider and an MSP. We're a CSP and an MSP, and that makes us unique in a lot of ways. My team builds this cloud, builds the infrastructure, and builds the code for IaaS as well as managed services, but we also have a managed service business that will actually go out and run and configure this stuff for a lot of the customers that we have. So this is the way that we've recommended it to our customers and clients: these are the pairings that we want.

2024 is when we plan to set up multiple availability zones within these regions so that we can support zonal failures. And if a region were to fail, this is where some of what I said earlier about South Korea comes into play. We don't live on a day-to-day basis in Korea thinking about our unfriendly neighbors to the North. It's not really what we think about day to day. But from an IT perspective we do think about it, in the same way that in California you might think about earthquakes. In South Korea we always do have a threat. We don't know when something might happen where power gets shut down, or systems get shut down, or infrastructure is shut down. So we always have to be prepared, even though our headquarters is there.
We're backing things up into New Jersey. There's an arrow that goes from HQ all the way to New Jersey, so if anything were to happen in South Korea, all of our systems are replicated in New Jersey for disaster recovery. And New Jersey can also be set up for Dallas as well as other regions. It's costly to do this, but think about systems such as the accounting systems that run all of Samsung. Those systems are global. They need to be up to date, and if those were to come offline, a lot of business would essentially be halted across the globe. So it does end up being a pretty big deal to set this up.

These are some of the customers that we have. There's a good mix of a lot of Samsung companies as well as some of the non-Samsung business that we have.

The cloud platform that we built is largely for enterprise. We're not like a hyperscaler where anyone can just come in, punch in a credit card, get some VMs, and start uploading workloads. We're really about managed services for enterprise. So we have fewer customers, but the customers tend to be very large, and with that comes a different set of expectations around HA and disaster recovery. They tend to be less price sensitive, but they want to make sure that their workloads never die. Even in cloud and IT services, as much as we build in redundancy, there really isn't a 100% uptime. We talk about five nines or three nines or six nines, things like that. But a lot of our customers say, "we want a hundred percent." A lot of the time, part of my job is educating them about the complexity around that, as well as how we manage cost in those things.
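
To put numbers on the "nines" mentioned here, a small worked calculation converts an availability target into a yearly downtime budget. This is only an illustrative sketch: the function name is made up for this example, and it assumes a 365-day year and that "N nines" means an availability of 1 - 10^-N.

```python
# Yearly downtime budget implied by "N nines" of availability.
# Assumption: a 365-day year; "3 nines" means 99.9%, "5 nines" means 99.999%.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(nines: int) -> float:
    """Return the minutes of downtime per year permitted by N nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 5, 6):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Three nines leaves roughly 8.8 hours of downtime a year, five nines about 5 minutes, and six nines about half a minute, which is why a request for 100% turns into a conversation about cost and complexity.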
So we do focus on the enterprise market. Security is probably number one. The Samsung group is very sensitive when it comes to any type of security exposure, so we make sure that we are one of the most secure cloud-based systems out there. If we need to add additional firewalls, or if we need to be behind certain firewalls, we make sure to do that, even if it's additional cost, and even to the point where it affects things like performance and scalability. These things do come into play.

Simple: we do have a CMP, a cloud management platform, that we've built. It's largely used by our own internal staff, because we are a managed service provider, but one of my goals is to empower more of our enterprise customers to manage and run these things themselves. And smart as well: we have a variety of ways that we can run our cloud, which I'll talk about in a sec.

These are the different services that we have. In addition to standard IaaS, which is where we are leveraging the OpenStack platform, we have built a lot of managed services on top of that: managed databases, managed ML platforms, things like that.

Now, to be honest, the current cloud that we've built here is not based on OpenStack. We built this cloud in 2019 based on commercially licensed software. One of the goals that I've had since coming into the company is that we need to embrace a lot more open-source technology. We cannot continue to pay these licensing fees while being in closed-source systems where we really don't have control over the new features that we need. And there is no actual company out there that sells a commercial cloud in a box that you can just use. It's all different pieces. There's virtualization software.
There's hyperscaler software. Bringing these things together in a way where we don't control the source code, and where we have limitations on what we can configure and how we can build, is in my opinion not really the best way forward. So one of the initiatives I've been leading is that we need to rebuild a lot of this on an open-source-based platform using OpenStack. And that's really what brings me here today to talk about a lot of this.

Public sector: we're really trying to branch out and expand into government. This is local Korean government work, and those are the different certifications that we need. So we need to build an OpenStack platform that complies with the local Korean government as well as other governments around the world where our data centers are located.

Finance: this is somewhat similar. We actually separate our cloud pools for government and the public sector as well as the finance sector, because there are different types of requirements, I guess what we would call here PCI compliance and those types of things, and there are some regulations around ensuring that the hardware is physically kept separate. Personally, I don't like doing that, but it's something we have to do: have different pools of hardware and platform set up.

And then private and dedicated cloud. One question I get is, why would an enterprise use the Samsung cloud platform as opposed to one of the major hyperscalers out there? One of the reasons I typically give is that there are reasons why they cannot use a hyperscaler, whether it's security, or whether they want something on site. So here we have configurations where customers provide space and power, and we'll just put in all the hardware for them.
And we'll manage it remotely. There are also ways where we can disconnect it, if they don't want that network connection where we manage things remotely, and we'll go on site and put people there to manage it for them. As you can see, some of this doesn't necessarily scale down to a smaller enterprise. It only really works for large enterprises, and that's really what we try to do.

So that's what this slide is talking about. On the private cloud side, there's a dedicated line where we'll manage it remotely. With the dedicated cloud, we'll put all the operational hardware into your data center, your site, so that you have your own cloud running, and we'll even manage it for you on site. We'll have engineers there. We'll have support staff there, sitting in your offices all day if you want us to do that.

So what do our customers need? What do our customers look to us for when it comes to building high availability? For our customers, the most important thing is really the SLA. They want to make sure that the workloads are up and running. The Samsung Electronics teams, the semiconductor teams, all the different non-Samsung enterprises that we work with: they're really concerned about "is our application going to stay up and running?" How you do that is sort of the third point. They just trust us to do that. How you do that is really up to Samsung SDS. But what does that mean for us? It means we need to support zonal failures.
For example, a zone could go out because of a fire. We did have a fire roughly ten years ago in one of our data centers, and we physically had people manually moving servers and trying to salvage as many servers as they could when that data center was on fire. Korea also had a major outage last summer. I think it was for one day, because a data center, a non-Samsung data center, had a fire, and it took the entire country down, because it was a major application that runs everything from chat to booking a taxi to maps, both driving maps and metro maps. So essentially the whole country had to live without that for a day, and it really crippled the entire country.

Although that wasn't one of our data centers, we need to make sure that these systems, in a sense, never go down. That means asking how you set things up so that they can handle a zonal failure, so that we have high availability, as well as scaling. If there's a spike in usage, how do we expand into the zone next to us?

Our customers tend to be less price sensitive. I did say "within reason," because they still do try to push us on price and things like that. It's not like they give us a blank check and say "just do this." The world isn't that good. But they do care a lot more about uptime and making sure these things stay up than they do about saving a few dollars here and there. So our customers really want this type of high availability.

On the disaster recovery side, they really want no data loss.
And no data loss is a really tough thing to negotiate. Personally, I didn't negotiate some of this stuff, and I probably would have phrased it a little bit differently. But a lot of our customers say, "we don't want any data to ever be lost in any situation." Whether it's global threats, physical threats, or natural disasters, they just don't want any data loss.

And they want a recovery time within minutes. They're not content with just saying, "okay, we didn't lose any data, but it's going to take a day or two to get up and running on the other systems." Even in these types of disaster recovery scenarios, they still want to be up and running within minutes off of a data center elsewhere on the globe. So we always have to make sure that these backup systems are ready to go. We have to test these systems. We run a lot of tests to make sure that we are periodically running off of these other systems.

There are even things that we wouldn't always consider. If you have an in-memory database that takes a long time to load into memory, because the databases can be terabytes in size, then how do you make sure you have a backup of that in memory in case it goes down while you're trying to load it? So there are a lot of different things that we always have to consider, because they have to have that recovery time in minutes.

So this is from our customers' perspective. They're looking for high availability. They're looking for disaster recovery. They're looking to always be up, and hopefully I made that part pretty clear at least. But from our perspective, as the guys running the cloud, what do we need?
Well, we need multi-tenancy. We need to be efficient, because even though they're less price sensitive, they still are price sensitive, and some other customers may be more so. Profit is revenue minus cost, so we still have to drive down our cost as much as we can. So how do we get as much multi-tenancy as possible? How do we get Samsung's internal workloads to work with their external-facing workloads? Can we put them on the same set of hardware even though they're on different network topologies? Can we put non-Samsung business onto the same servers as our Samsung business? How do we ensure that these things are secure? How can we be more global? How do we minimize these distinct cloud pools? How do we make sure that our hardware can be standardized? How do we minimize any snowflake technology that we put in here? From the cloud provider side of the business, we're always looking for these types of things.

So we've taken all this into consideration to try to build an architecture that we think might work. For the architecture challenges that we had, I'm primarily talking about three major areas. One is around identity, Keystone: how do you manage logins and people's identity and access management across the type of global infrastructure that I talked about? That's always one area. Then, what do we do with the control plane, the OpenStack control plane? How can we run the control plane across these different global areas? And then things like Ceph OSDs: there are always performance constraints, not only in terms of management but also in terms of how you ensure that it's fast.
If you're running off of a data source that's far away, but your actual workloads are running here, it might be easier to manage in certain ways, but your performance will probably be pretty bad, and customers won't be happy about that either.

So those are the things that we've talked about, and we've explored a few options here. The first option that we looked at was: let's say all of our data centers are one huge cloud. This is like a West region and an East region, and these are different zones within the regions. But think about this globally. You can be in Korea, you can be in Europe, you can be in the Americas, and you think of everything as one big cloud. That's why this one is labeled stretched.

There are pros and cons to this. Some of the pros are that it's easier to manage. We don't really have to talk about syncing data and things like that. But at the same time, the performance will probably not be all that good. There's some overhead here. You also have some security concerns. Let's say, for whatever reason, there's a security breach into your identity, into your Keystone. Now all of a sudden this breach is a global breach, so your blast radius is huge. Those types of concerns are some of the downsides here. On the plus side, we can manage everything in one place, and it's relatively easy to do. So we think this stretched approach has some merits.

But looking at the other side of things, we also said: why don't we just set up one data center as one cloud, as one region?
So here we get rid of the concept of availability zones. Instead of zones and regions, everything's just a region. We just have a bunch of regions throughout the world.

If we do this, it's actually relatively easy to set up. The hard part is keeping things in sync. You'd have to build some type of layer on top of the identity API, so that you don't access the identity API directly across these different regions. You'd access it in one portal, and then that portal would manage it and keep these things in sync. From a security perspective, if your portal gets hacked you're kind of screwed, but at least if they hack one of these regions, you've limited the blast radius to that region.

Performance-wise it's actually pretty good, because you're within your own region. But scalability is a little bit suspect. Let's say the West A region is really popular and you need to scale up over there, and we're running out of compute or running out of block storage. It would be really nice to just add things into B and hook them together, but some of that might not be so easy to do in this kind of situation.
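
The portal layer described for the one-region-per-data-center option can be sketched roughly as follows. This is a toy model under stated assumptions, not Keystone's API: `RegionIdentity` and `IdentityPortal` are hypothetical names, and a plain dictionary stands in for each region's identity store. It shows the two properties mentioned in the talk: every change fans out through one entry point, and a region that drifts (or is tampered with) can be detected without affecting the others.

```python
from dataclasses import dataclass, field


@dataclass
class RegionIdentity:
    """Stand-in for one region's identity service (hypothetical, not Keystone)."""
    name: str
    users: dict = field(default_factory=dict)  # user name -> role


class IdentityPortal:
    """Single entry point that pushes every identity change to all regions."""

    def __init__(self, regions):
        self.regions = list(regions)

    def set_role(self, user, role):
        # Fan the same change out to every region's own identity store.
        for region in self.regions:
            region.users[user] = role

    def out_of_sync(self):
        """Names of regions whose user table differs from the first region's."""
        reference = self.regions[0].users
        return [r.name for r in self.regions[1:] if r.users != reference]


west = RegionIdentity("west")
east = RegionIdentity("east")
portal = IdentityPortal([west, east])

portal.set_role("alice", "admin")
east.users["alice"] = "member"  # simulate a change made directly in one region
print(portal.out_of_sync())     # ['east']
```

A real portal would also need retries, conflict resolution, and an audit trail. The sketch only illustrates why direct per-region access has to be closed off for the regions to stay in sync.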
So there are definitely some pros and cons with that. There's a third approach, which combines multiple concepts. It's a hybrid type of approach. Take the things that are hard to do, like identity and keeping these things in sync. If your identity API in the other picture got out of sync in some way, it would probably be pretty hard to troubleshoot that and get things back in sync again. So for things like that, let's stretch them out and manage them in one place. But for other things, like the compute API, let's have this concept where West A, B, and C are all zones, and they can stretch across each other, so that if zone A starts getting full, we can add things from zone B. Physically, this is the same campus. The region should be the same area, so it's like one building over, and in theory you wouldn't have a huge performance hit. So let's try to look at this type of concept.

What we're doing at Samsung SDS is pursuing that third option. How do we pick the best of both worlds here: ease of management, while still managing security as well as performance? But it's still in progress. We'll probably make some changes to the architecture diagram that I just showed you. I said "if" we encounter some unforeseen complexity, but I think we probably will come across some more. For example, in this picture, if we have multiple Ceph storage clusters across regions that need to be kept in sync, we need to look at how we do that. We have to build lifecycle management tools to say, well, how do we
continually upgrade and manage this platform and environment? We're looking at building our own set of lifecycle management tools to support this architecture, but I'm always open to finding other open-source tools if we can. And how do we set up a geo-distributed database? If it's a global Keystone, the way that we talked about in that other picture, how do we set up a geo-distributed database for that? So those are the different types of challenges that we have right now, that we're trying to figure out and answer.

Now, from a disaster recovery perspective, these were the different pairs that we had set up earlier. There's a set of pairs in terms of where the main data center is and where the disaster recovery center is. The format is kind of weird on this slide, but this is how we replicate storage and databases. If it's a VM or data snapshot, we just use auto image replication. For file storage, it's at the volume level. For objects, we use bucket-level replication. For block storage on the VM and bare metal side, we use volume-level replication. For the databases, it depends on what the database is. For Postgres, EPAS, and MySQL, we've used DB replicas. For some of the ones on the bottom, it's more of an object-storage-based approach.

So this is how we've set up disaster recovery to ensure that the data itself is replicated with as little downtime and data loss as possible. But there's work on top of this. This is the data side.
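
The pairings and per-resource replication methods mentioned in the talk can be written down as a small lookup table. This is only a sketch for illustration: the site names and methods are the ones spoken in the talk, while the dictionary keys, the `dr_plan` helper, and the choice of Python are assumptions made here and say nothing about how Samsung SDS actually encodes this.

```python
# Disaster recovery pairs as described in the talk (primary site -> DR site).
DR_PAIRS = {
    "Delhi": "Seoul (HQ)",
    "Singapore": "Seoul (HQ)",
    "London": "Frankfurt",
    "San Jose": "New Jersey",
    "Dallas": "New Jersey",
    "Sao Paulo": "Seoul (HQ)",
    "Seoul (HQ)": "New Jersey",
}

# Replication method per resource type, as listed in the talk.
# The key names are hypothetical labels chosen for this example.
REPLICATION = {
    "vm_snapshot": "image replication",
    "file_storage": "volume-level replication",
    "object_storage": "bucket-level replication",
    "block_storage": "volume-level replication",
    "database": "database replicas",
}


def dr_plan(site: str, resource: str) -> tuple:
    """Return (DR site, replication method); raises KeyError if unknown."""
    return DR_PAIRS[site], REPLICATION[resource]


if __name__ == "__main__":
    print(dr_plan("London", "object_storage"))
```

Keeping the pairing and the method in one table makes it easy to check that every primary site has a DR target and every resource type has a replication method before a failover test.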
The second side is making sure that things fail over, and that the applications themselves are also updated on a regular basis.

So those are the main areas that we've been struggling with. For next steps: what I was talking about today is an overview of some of the things that we're struggling with and trying to work through at Samsung SDS around multiple availability zones, disaster recovery, and how to architect these types of systems. It falls under some of the Large Scale SIG work. In addition to this, there's also discussion in the Large Scale SIG about how you scale up, what some of the bottlenecks are, and how to identify those bottlenecks. Things like RabbitMQ: how do you split that up? We have different RabbitMQs for different areas. Even within compute, we split that up into a separate RPC RabbitMQ. So we've done some work there to ensure that these things scale, as well as on the database side, like sharding databases. Those are the different types of topics, along with things like cells versus regions and what's going on in different areas. Those are the types of discussions that we also have in that special interest group, in addition to areas like this one.

So we're really talking about how you work with OpenStack to build not just a private cloud for one company or one vertical, but something that can be used across multiple verticals, the way that we're trying to do it, as well as across global data centers.
That's the type of work that we're largely involved in. Of course, in Korea we're on Korea time, but I'm always happy to discuss these kinds of topics with anyone who has that type of interest. One of the things that I've figured out on this path (I just don't like using the word journey) is that there really isn't a right answer out there, in terms of a standard, accepted way to do this. So we've been exploring a lot in our own way, and this is how far we've gotten so far.

So this is my contact info. You can always email me or chat, and I'll be here as well. With that, if there are any questions, I'm happy to answer them. I think we have a couple of minutes to do that. Hopefully you found this somewhat interesting. Thanks. And if there's nothing, you're all dismissed. Thank you very much.