Good afternoon and welcome to Turning Pets into Cattle: The Stickiness of Data. My name is Chris Cannon; I'm with Hewlett Packard Enterprise and I work on the cloud engineering team. I'm Leon; I work for Intel as a senior cloud architect. And my name is Gert Prussman; I'm with Mirantis on the cloud solutions team. The three of us are members of the Enterprise Working Group here in the OpenStack community, and we are collaborating on this presentation today. One of our goals here is to foster more participation from you, so when we get to the Q&A section we hope to be asking you questions, so that we can help move enterprise community solutions forward with future presentations and learnings.

With that, a couple of things about what this session is not about. First, it's not about CI/CD. Second, it's not about deployment techniques or automation tools. And third, it's not a discussion about agile development practices. All of those things are important, but we remain focused specifically on data and how it applies to application architectures today.

So, our agenda for this session. First, we wanted to start off with a recap of a session delivered in Tokyo where we teed up this conversation, to give you a little bit of history if you weren't there for that one. Then we'll jump into our discussion of data, and this is where we would really like your participation and assistance: talking about some of the challenges of data and moving it to a cloud world, and also some of the strategies, off the top of our minds, for possible ways of doing that; we hope that you will help us there as well. And then at the end we'll have the Q&A session, where we hope that we're asking you questions and you're providing answers. So with that, the recap: Dr. Sun. Thanks. Thank you, Chris.
So, I'm Leon, as I mentioned just now. I just want to give you a quick recap of what we did at the last summit, in Tokyo. I gave a co-presentation with Stephen Walli from HPE where we talked about turning pets into cattle, but in that presentation we focused more on the web tier and the app tier; in this talk we'll be focusing more on the data tier.

So, a quick recap of what we covered last time. In the last presentation we talked about the differences between pets and cattle. Pets basically refers to the traditional, conventional workload, and cattle refers more to cloud-native or cloud-oriented applications. If you're interested in that presentation, there's a reference at the link at the bottom; we published the slides online after the talk, so you can get them there.

We also talked about the differences between virtualization and cloud. In the virtualization world, most of the infrastructure we use follows a scale-up, vertical scaling model, and we rely on the infrastructure for resiliency. But in a cloud-based model, the application tends to have a very distributed architecture with a horizontal scaling model, and the application itself is designed to be responsible for its own resiliency, independent of the underlying infrastructure. That's the key difference between virtualization and cloud.

We also talked about the differences in design principles between conventional apps and cloud-aware apps. I provided a link there to a document published by the Open Data Center Alliance about how to architect cloud-aware applications.
These are some of the key design principles: conventional applications tend to be monolithic, with centralized state, but a cloud-aware app is more distributed and uses microservices, and one of the key concepts is eventual consistency. That's a very new concept compared to conventional applications.

In the previous talk we discussed multiple strategies for turning pet applications into a cattle, cloud-native model. Basically, what we suggest is: don't look at your application as a whole when you want to migrate it; treat your application as multiple components. One strategy we proposed is that you can move the conventional application as-is. In the conventional model you have your web tier and app tier in VMs, and you can migrate those VMs wholesale to the cloud. That's one strategy you can adopt, and its benefit is that it doesn't require much re-architecture, but it doesn't give you the best of the cloud's features.

Another strategy we proposed is more like a microservices model: you treat your application's web tier and app tier as separate sub-components, and you migrate certain components to the cloud slowly, one by one. One example we used in the demo: if you have a three-tier app, you can host the static content (images, HTML files, JavaScript files) in the cloud using object storage.
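As a toy illustration of that incremental strategy, here is a sketch of deciding which assets can move to object storage first while everything else stays on the app tier. The Swift endpoint and container name below are invented examples, not part of the original talk:

```python
import os

# File types that are safe to serve straight from object storage.
STATIC_EXTENSIONS = {".html", ".css", ".js", ".png", ".jpg", ".gif"}

# Hypothetical Swift container URL for the migrated static content.
OBJECT_STORE_BASE = "https://swift.example.com/v1/AUTH_demo/static-assets"

def object_storage_url(path):
    """Return the object-storage URL for a static asset, or None when the
    file should stay on the app tier (templates, server-side code, etc.)."""
    _, ext = os.path.splitext(path)
    if ext.lower() in STATIC_EXTENSIONS:
        return f"{OBJECT_STORE_BASE}/{path.lstrip('/')}"
    return None

# Static images move; application code does not.
assert object_storage_url("/img/logo.png") == (
    "https://swift.example.com/v1/AUTH_demo/static-assets/img/logo.png")
assert object_storage_url("/app/views.py") is None
```

The point is only that one component (static content) can be carved out and migrated on its own, without touching the rest of the application.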
So those are three different strategies you can consider when migrating an application from a pets model into cattle. With that, today we'll talk about the data tier: data stickiness. So let's jump into data stickiness.

First, let's level-set a little bit about some of the challenges that we face with data sets as they're utilized by applications. Whether they're legacy applications or cloud-native applications, they deal with the same set of data requirements in many cases. The first point is that every pet is different. As we all know, the data set they're using, the way they manipulate that data, how it's processed, whether it's ephemeral or static: the data sets are different for every sort of application. So every pet has its own unique dependency on data.

As such, there are some challenges in dealing with existing data sets. One of the largest problems from the technical side is: how do we move data into the cloud? You've got the technical aspect of it. Do we migrate it somehow through file copies or import/export? Do we set up a replication scheme somehow? Do we make
carbon copies of machines and set up a parallel environment to be able to move this data? There are as many different possible paths as there are technologies available today, so this is an interesting technical challenge in migrating data.

The second piece is that after you've done that, you need to ensure that your data is accurate. There's a dependency on making sure that things are consistent and the integrity of the database is still there. Depending on the migration path that you choose, you may encounter licensing problems; perhaps you can't stand up a parallel system because of the unique licensing of the database technology that you're using. There's also downtime: copying large data sets requires a lot of time, and perhaps your business, your organization, your customer doesn't have the luxury of being able to consume that much downtime.

The second challenge is also about how we virtualize data, as opposed to the typical legacy sort of database that we know and love today. Some of the things you need to think about are the SLAs related to accessing this data. Customers are concerned about data access policies that may be different in a cloud model. On performance, virtualized databases may not be as performant as your highly tuned, legacy, enterprise-class databases. The recovery time objective and recovery point objective for data are really important as well for most enterprise SLAs. Et cetera, et cetera.
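On the accuracy point: one lightweight way to gain confidence after a copy, assuming you can dump both sides to rows, is to compare row counts plus an order-independent fingerprint of each table. A minimal sketch, not a substitute for a real verification tool:

```python
import hashlib

def table_fingerprint(rows):
    """Order-independent fingerprint of a table: serialize each row,
    sort so physical row order does not matter, hash the whole thing.
    Returns (row_count, digest)."""
    serialized = sorted(repr(tuple(r)) for r in rows)
    digest = hashlib.sha256("\n".join(serialized).encode()).hexdigest()
    return len(rows), digest

# Same data in a different physical order still matches.
source_rows = [(1, "alice"), (2, "bob")]
target_rows = [(2, "bob"), (1, "alice")]
assert table_fingerprint(source_rows) == table_fingerprint(target_rows)

# A lost row is detected by count and digest.
assert table_fingerprint(source_rows) != table_fingerprint([(1, "alice")])
```

In practice you would run this per table (or per partition, for very large tables) against dumps of the source and target databases.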
There are many arguments in multiple directions for the complexities of getting your data migrated. So this will be the first question that I'd like all of you to think about for the Q&A at the end of the presentation. Consider it with the customers you're working with, the challenges you're facing today, and the experience you've had doing this to date. At the end of the presentation we would like to hear from you: whether we've missed something on this list, whether there are items on this list that are more important than others, and how we as an Enterprise Working Group can help create additional training collateral and reference documents, ways that we can help expose information so that we can all make this migration a little bit simpler.

We said we would present three different options for possibly moving your data today, and I'll take the first one. At a simple glance it looks like the easiest one; I would offer that it's actually not necessarily any easier than the other two you'll see, but there are pros and cons to each of the options we'll present this afternoon. The first is: don't do anything with your data; preserve your existing legacy database infrastructure. Many businesses have made a significant investment in enterprise-class databases, and part of this is not just a reluctance to get off the things they've been so comfortable with over the years, whether it's an administration, training, and usability focus, or a reliance on the types of SLAs they get around using enterprise-class database products. But I would offer that you can also realize a less risky migration path by focusing on moving your applications to a cloud-native platform first, and then pulling your data along in a phase-two approach, so that you're mitigating the migration risk to some extent.

To support this, we're finding that for most applications and customers that are using a PaaS layer, whether it's Cloud Foundry or most
of the others, those products have the capability to connect to legacy databases easily, so there's no reason for customers to expect to have to port their data at the same time that they transform their legacy applications. The third data point we've learned through our research here is that you can continue to leverage some of the capabilities in your existing enterprise databases that might not necessarily be available, fully fleshed out, or at parity today with some of the cloud-based database offerings. Gert will be covering some of those considerations in option two.

So, why should I move my database to the cloud? Probably because I want to benefit from the advantages the application folks profit from: I want horizontal scalability, for example on-demand database instances, scalability, and elasticity. For this I need a solution for the database on the cloud. Here is one example of how I could try to achieve that. We picked MySQL with the Galera library, which gives you the opportunity to get scalability with a Galera cluster. Galera offers active master-master replication. We have the applications on top (cloud-native applications, horizontally and elastically scalable), and you need a load balancer to balance the traffic from the application to the database; then you can scale out the database cluster.

The advantage for the project is that you know the technology, you know all the tools, and you know how to manage the database, because you might already use it for your application anyway, for example on-premise in the past. You know all the operational tasks necessary to maintain the database, and the gap between the old technology and the new one is very low. And it brings you the scalability that you want to have on the cloud. But there are some limitations on the other hand as well.
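Since any Galera node accepts writes under active master-master replication, the balancing in this picture can be as simple as round-robin. A client-side sketch with made-up node addresses; in a real deployment you would more likely put a load balancer such as HAProxy in front instead:

```python
import itertools

# Hypothetical Galera cluster members; with master-master replication,
# every node accepts both reads and writes.
GALERA_NODES = ["10.0.0.11:3306", "10.0.0.12:3306", "10.0.0.13:3306"]

_round_robin = itertools.cycle(GALERA_NODES)

def next_endpoint():
    """Pick the next node round-robin to spread connections evenly."""
    return next(_round_robin)

# Three consecutive picks cover every node exactly once.
assert sorted(next_endpoint() for _ in range(3)) == sorted(GALERA_NODES)
```

Scaling out then just means adding a node address to the pool once it has joined the cluster.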
For example, you explicitly need primary keys on all your tables, updating schemas is different, and there are some other limitations as well: it only works with the InnoDB storage engine, so not all engines are available, but at least with that subset you can work. The operational tasks remain, so there is no automation: you have the scalability, you can scale out the cluster, you can provision new instances, but you have to do it yourself. It's a manual task, so obviously it's not the level of automation that you are probably looking for on the cloud. All the automation efforts necessary have to be done by yourself.

Another option could be to use the OpenStack project Trove, which is in fact database as a service. It provides you with self-service provisioning of your own database instances, very easy on-demand database consumption, and it offers you a lot of databases as of today, for example Cassandra, MongoDB, MySQL, and a lot of others. The advantage of Trove for you would be that it automates a lot of the tasks necessary to deploy the database, maintain the database, make backups of the database, and various other tasks.

In fact, not all of these databases are currently considered stable in conjunction with Trove, because of the data stores used for the databases: it's only considered stable for MySQL currently, at a technical-preview level for Cassandra and MongoDB, and all the others are considered experimental. And the whole project is not very mature, not at a very mature level, from my point of view.
It's a very young project, but on the other hand it offers you a lot of functionality. There are currently some gaps and limitations with respect to the self-provisioning services and the functionality it delivers. We compiled a short list of functionalities that we consider major, and with Trove some of the operational tasks you are required to do are not very mature. For example, push-button compute scaling is not done automatically; it requires you to intervene manually, and the same goes for migration to a new database. That's normally something you would expect an automated cloud environment to do for you. It also requires an external backup solution: there is a backup capability based on Swift, but it does not deliver all the functionality you would expect from full-blown backup software.

In case you are required to change your database engine, it's a good idea to take a close look at your application. We did this, for example, for WordPress. WordPress is well-known software, and it uses a very MySQL-centric code base. So what would be necessary to use another database, for example a NoSQL database? Then, in fact, you have to fork the application and introduce all the functionality again, refactored for the new database, and that's a lot of effort and a lot of cost, because you have to maintain this fork in the future as well.
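To make the fork problem concrete, here is a toy contrast between a SQL-centric read (sqlite3 standing in for MySQL) and the same read against a document-store-style API (a dict standing in for MongoDB). Every call site like this has to change when the engine changes, which is exactly what forces a fork or an abstraction layer. The table, option name, and value here are invented for illustration:

```python
import sqlite3

# SQL-centric path: how a MySQL-centric app such as WordPress reads data.
conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
conn.execute("CREATE TABLE options (name TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO options VALUES ('siteurl', 'https://example.com')")

def get_option_sql(name):
    """Read one setting the relational way: a query against a table."""
    row = conn.execute(
        "SELECT value FROM options WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

# Document-store path: the same read against a NoSQL-style API.
options_collection = {"siteurl": "https://example.com"}  # stand-in store

def get_option_nosql(name):
    """Read the same setting as a key lookup in a document collection."""
    return options_collection.get(name)

# Same answer, completely different code at every access point.
assert get_option_sql("siteurl") == get_option_nosql("siteurl")
```

Multiply this one function by every query in the application and the cost of forking, or of building a full abstraction layer, becomes clear.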
So if there is new functionality in the application, you have to develop that functionality on your fork, for your database, as well. You could introduce an abstraction layer for all databases, but that needs a lot of work too; you would be required to invest a lot of work and money in this task. The other possibility would be a full database abstraction layer, and again, a lot of work is needed to achieve this. So it depends on the application and the database you currently use whether these options are viable for you.

Okay, so the third option that we would like to talk about here is migrating, converting, an RDBMS into NoSQL. Before we do that, we always have to ask ourselves one question: why do I need to migrate from RDBMS to NoSQL? Is there such a need for us to do so or not? There's nothing wrong with an RDBMS, but the thing is, in today's world most of the new use cases coming out (mobile applications, big data, IoT) have widely distributed, very horizontal-scaling architectures, and we have to deal with a lot of structured as well as unstructured data. Those unstructured-data requirements just can't be met if you continue using an RDBMS. If you come into a situation where you need to support those use cases, that will be the point where you need to consider whether to re-architect your application's data tier from RDBMS into NoSQL. Of course, if you're still focused on OLTP transactions and structured data, and very concerned about the ACID model of traditional databases, then NoSQL might not be a good choice for you. It all depends on your applications.
There are some other things you have to be aware of. In an RDBMS you basically have to design a schema, and you run a lot of different SQL queries and stored procedures; in NoSQL you generally don't do those things anymore. The way we model the data will be different as well. In an RDBMS everything is modeled in two dimensions, rows and columns of data, but in NoSQL you talk about key-value stores or document-based stores, so the data modeling will be totally different. If you're considering migrating to NoSQL, you have to look at how to change the way you model the data.

Other things include integration. In the RDBMS model your applications run a lot of different SQL queries (you do a select, you do an update, all those things), but when you come to NoSQL those are no longer valid. You basically have to build that into your application with calls like db.insert or db.update, so it's a totally different model. And of course, when you migrate to NoSQL the concept of a foreign key might be gone; when you do the data modeling, foreign keys might not exist anymore. And definitely the ACID model is gone: in NoSQL the model tends to move towards the idea of eventual consistency. I'm not sure if you've heard of the CAP theorem: in a distributed application, in distributed architectures, there are consistency, availability, and partition tolerance, the concepts behind CAP.
In a distributed application you can only choose two of them; you cannot achieve all three at the same time. You either have consistency and availability and sacrifice partition tolerance, and so on. In the NoSQL model, systems tend to move towards availability and partition tolerance, and that's why we have this concept of eventual consistency: you might not always get the same data at the same time, but eventually it will become consistent. That's a new concept when moving into NoSQL.

If you really want to migrate to NoSQL, there are a few migration tactics you can consider. One question is: how do we move the data from the RDBMS to NoSQL, for example to MongoDB? You can build your own script to retrieve all your data and then put it into the NoSQL database. There are ETL tools you can use, and some people actually consider using Hadoop: they run a Hadoop cluster, use Hadoop mechanisms to retrieve the data from the RDBMS, do the processing in Hadoop, and then put the results into the NoSQL database. Another strategy people consider is to do a snapshot and then incremental transfers.

Another thing I want to talk more about is the application-driven model. What I mean by the application-driven model is, first, keeping the data in both the RDBMS and NoSQL concurrently during the transition period. When you're migrating the application, during the transition period,
your application just has to write twice: once to the RDBMS and, at the same time, once to the NoSQL store, keeping the two copies of the data in sync. In fact, these tactics have been used by a lot of companies today.

The other tactic is what I call on-demand. What I mean by on-demand is basically this: in your application, for example, if you want to read some image data, you first read from the new database you're pointing at, the NoSQL store. If that data doesn't exist in NoSQL, your application logic has to be built to fall back to the old database; when you fall back to the old database, you copy that data to NoSQL on demand and then return the result to the user. That's one option. Of course, this option involves more transaction time, because you first try to read from the new database, don't find the data there, try to read from the old database, possibly copy the data over to the new database, and then return it to the user. So that's a second tactic people can consider using. Of course, it all depends on your use cases: what is your application?
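The two application-driven tactics just described, dual-write during the transition and on-demand copy-on-read with fallback, can be sketched like this. Both stores are modeled here as plain dicts; real adapters would wrap the RDBMS and NoSQL clients:

```python
class MigratingStore:
    """Sketch of the transition-period tactics: dual-write, plus
    on-demand copy-on-read fallback to the legacy store."""

    def __init__(self, legacy, nosql):
        self.legacy = legacy  # old RDBMS, modeled as a dict
        self.nosql = nosql    # new NoSQL store, modeled as a dict

    def write(self, key, value):
        # Dual-write: keep both stores current during the transition.
        self.legacy[key] = value
        self.nosql[key] = value

    def read(self, key):
        # Read from the new store first; on a miss, fall back to the
        # legacy store and copy the record over on demand.
        if key in self.nosql:
            return self.nosql[key]
        if key in self.legacy:
            self.nosql[key] = self.legacy[key]  # lazy migration
            return self.nosql[key]
        return None

legacy = {"img42": "old-image-bytes"}         # pre-existing legacy data
store = MigratingStore(legacy, {})
assert store.read("img42") == "old-image-bytes"  # served via fallback
assert "img42" in store.nosql                    # now copied on demand
store.write("img43", "new-image-bytes")
assert legacy["img43"] == "new-image-bytes"      # dual-write kept both
```

As the speakers note, the fallback read costs extra latency on a miss, so this trade-off has to fit the application's use case.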
These are a few tactics we put up here for your consideration. To summarize what we've presented until now: you need to understand your pet applications; have a look at the source code, the options with respect to the databases, and the different models. Some technologies that you already know from the old applications and old databases are available on the cloud; some are not, so they need replacement, and this has consequences for the management and operation of the database, the schema, and the technologies you use. There are different design principles with respect to SQL and NoSQL; you have to take this into consideration as well. And there is a transformation necessary from the traditional culture to the cloud culture, with respect to the technologies and processes that are required, and your dev teams and operations teams need the skill set to change the application, the schema, or the technologies, and to maintain and operate all of them.

With this, I would like to ask you: what are your experiences with data migration, or migration of data and databases to the cloud? We have a microphone up here. For those of you that have experience you'd like to share, please do so.

Hello, yeah, I've got an interesting use case. I work as a workflow analyst in research science computing. We've just adopted OpenStack, and the pets problem I have is that our researchers basically deal with really large collections of flat files, like 55,000 images, for example, of earth science models and things like that. And so they're very attached to them, just like your pets analogy here. So we're trying to figure out the best way, with a combination of Ceph and OpenStack, to manage that as we go forward.
I don't know; we don't have the answer yet, but the best I can do is come up with an idea where we have physical hardware in the middle to handle the computational side of the workflow, and then on the pre-processing and post-processing sides have persistent storage on either end that we bridge into the middle, which might be Ironic, or however we do that; we haven't really figured it out yet. But that's our use case, and I don't know if you have any observations on it. It's a tricky one, because it's really orthogonal to your RDBMS example, but it's another one out there. Thanks.

I work for Rackspace, and as part of one of our projects we had to move data from a Postgres database into a MySQL database, and we were also using Mongo. One of the requirements we had involved two different applications: at the end of the day, a legacy application was being replaced by a new application, and both needed access to the same data. One of the key requirements was that users were using both systems and needed to be able to see data in both. So one of the biggest challenges we had was a two-way sync that copies data from MongoDB into the Postgres database, and that was the biggest pain we had. So I think in this case it's the customer's business requirements that dictate what kind of strategies you need, and moving data from NoSQL back to a relational database (MySQL and Postgres, with a two-way sync) was the biggest pain. Some new applications were reading the new store, and the old ones also needed to read the data, so we had users on both applications, and given the criticality of the application,
we could not do a fast cutover or a hard cutover. So we had to do the transition over a period of time; the system was in beta for some time, but we made sure that we could support both systems.

Yes, as we mentioned, every application has different characteristics, right? So you really have to understand what your data is and what the best strategy is for you, considering the transactional model.

If I recall the session correctly: the legacy application was built a long time ago on a relational database using old technologies. The new system we were building had to be a cloud application, basically running everything in the cloud, and we didn't want to use an RDBMS; the new system needed flexibility for making changes in the future, so instead of an RDBMS relational model we went with NoSQL.

Hey, so I work for SolidFire, now part of NetApp, but in a previous lifetime I've done most of the things you discussed and encountered the challenges there. In moving from a traditional DBMS to NoSQL, what we found with our application architecture (we were having explosive scale growth at that time) was that during that transition we moved to a service-oriented architecture, and that was actually a data-driven decision. Because as we moved out of the DBMS, out of MySQL, we found that there was a subset of data that truly needed to exist there; that's where it made the most sense.
That was the user file information, so we made basically a user file service that remained responsible and was the web access point for that particular piece of data, whereas the event information and all the geographic locality information for tracking eventing moved into Mongo. So the bulk of the data moved into this kind of document store, this kind of cloud-enabled structure, because it was all JSON data and it made the most sense to put it there, rather than sticking it in a blob in a traditional data store. Whereas the user file stuff that we'd been doing for decades, user accounts and information like that, deserved to remain in the DBMS, because that's where it made the most sense. We simply architected the service architecture around those two different needs: use the right tool for the job. For the migration pattern, we used a live migration, a shunt method, and then a backfill and a verification backfill to make sure that all the IDs had been transferred over. That took a week or so to move many, many terabytes of data, but that's how we accomplished it. Thank you.

Great observations. Anyone else? We still have time. We thought we would put this information slide up in the background so you could take notes and write some things down. Our OpenStack Enterprise Working Group
We would love to hear your experience And thank you for for those of you that have shared your experience so far so Others you please use the microphone I'm just kind of curious about what type of support models that you have for using Databases inside of open-stack, but yet You know usually inside of an enterprise you have the open-stack team, and then you also have a Database team that handles pretty much everything else. I'm kind of curious What you're all's experience has been in order to actually provide You know a homegrown solution To transition to get additional teams to support the databases that the open-stack Is going to be using so Really good point. I think you'll find that In the network space You might see the same sort of challenge right where you where you're adjusting your your staffing for your cloud Administration to adopt the practices that oftentimes were broken out to a specific Team or tactical unit, but I think that's a fabulous topic for a future presentation. So Anyone have an answer for this guy your own experience I know there's a couple of sessions one that I heard earlier today that you might be interested in around supporting Open-stack cloud that they did talk a little bit about the challenge of moving from a Dedicated support team that was deeply focused on things like databases or networking, etc. 
to a more holistic, cloud-centric, breadth-oriented support team. If you have ideas for topics that you would like to hear about in Barcelona, please send them to this list as well, and we'll try to take your feedback and prepare for round three.

At the risk of being verbose and talking twice: to answer the previous question, having made the transition from a traditional operations environment, a database administration environment, to transforming my own skill set, as well as hiring a team of ops people and turning them into DevOps people, that was a similar experience along the same time frame as the transition I talked about earlier with the migration. We went from a traditional monolithic, vertical architecture to a horizontal architecture, and in doing so we had to stretch, and make uncomfortable, the operations folks who had a few gray hairs on their heads, like myself, as well as bring in the more operationally inclined development guys who really wanted to get into the guts of the system. So we created a DevOps team by bringing in folks from both the old school and the new school, to create and transform into a DevOps team that could manage a breadth of environment, and we did specific cross-pollination exercises to increase the breadth of people's skill sets. Because it becomes more a matter of knowing where to go to find the answer, rather than having everything in your head for one specific area of expertise; you have to be quick and swift and able to move back and forth and track things down quickly.

All right, I think we are at the end of our time allotment, so on behalf of my distinguished colleagues, I thank you so much for your time.
Thank you for remaining here, and thank you for your feedback. We are the Enterprise Working Group, and we've published two e-books: the first talks about what OpenStack is and why OpenStack, and the second talks about how to implement OpenStack in your enterprise company. They're freely available, and we have printed copies here; the Foundation printed a couple of copies, so you can come and get one if you want. There's also an e-book version; if you need it, just reach out to us. Thanks, everybody, and enjoy the rest of the conference.