I'm going to talk today about release two, which will be out, I don't know exactly when, but probably within a month, maybe a week. It would be a little sooner if I weren't here, but that's how that goes. We thought it would be out a little sooner than now, but that's not how it worked out. So how many people here have used it or have installed it in some way? So maybe a tenth of the people here have used it, and a third, or maybe a half, have not, so I'm going to give more background and less detail, because a lot of people here have not spent much time with it.
So I'm going to talk a little bit about what an HA cluster is. What can you do with an HA cluster? What can you do with the project? How do people use it? What kind of customers do we have? How many of them are there? And then talk about release one, which is what the current release can do, and release two, which is the one that will be out very soon, and what it will do. There's one more topic I put a slide in for, but I'll probably be out of time anyway.
So the idea of putting together a group of computers as an HA cluster basically means a group of networked machines acting as one, delivering a service continually, or more or less continually, even when one system component fails, when a whole machine fails, when this fails, whatever. When one machine goes down, the other takes over its work. This involves things like IP address takeover, things like that. It's not primarily designed for other kinds of clustering. It's not primarily designed for high performance. Although you can get performance improvements out of it, that's not its primary purpose.
One thing I want to say: there is no such thing as 100% availability. It does not exist. People who say, I want 100% availability, what they're really saying is, I want to spend an infinite amount of money and effort and time to get something less than 100% availability.
Because, you know, failures still occur. There's a law of diminishing returns here. The more you spend, the more you have to spend to get the next improvement. It's like calculus: you can cut the distance in half each time. Maybe you know the example with the mathematician and the engineer. They tell the mathematician, you can move toward this pretty girl at the other end of the couch, but you can only cut the distance in half each time. He says, huh, I'm not going to do that. They sit the engineer down on the same couch, and he immediately starts halving the distance. And the mathematician asks, what do you do that for? You'll never get there. He says, but sooner or later I'll be close enough for all practical purposes.
So, with HA you can get close enough for all practical purposes. You really can't get 100%, just like in the mathematical example here. But you can be close enough that, you know, you keep your job. That's the main thing. It can make your outages very short. It's designed to recover from single failures, not multiple failures, although it often does recover from multiple failures.
It's a lot like a magician's trick. With a lot of magician's tricks, sometimes you look at him and say, wow, how did he do that? And sometimes you say, that's a cheap trick, that's not so good. The same thing is true about HA. Sometimes you'll think it's wonderful magic, and other times you'll think that could be better. But HA is designed to improve availability. It's one of the few things you can do that cuts your downtime by about 90%. What that says is you're still left with a tenth of it.
So, of course, as you try to improve it more and more and more, it gets more and more and more complex. And eventually your complexity becomes your enemy, and it becomes so complex you can no longer maintain it. So one of the things we try to keep in mind as we design things is that complexity is the enemy of reliability.
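The arithmetic behind the diminishing-returns point can be sketched in a few lines of Python. This is only an illustration: the function names and the 99% starting figure are assumptions made for the example, not numbers from the talk, and the 90% reduction is the rough figure mentioned above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


def downtime_minutes_per_year(availability):
    """Expected downtime per year for an availability given as a fraction (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_after_ha(availability, downtime_reduction=0.90):
    """Availability after removing a fraction of the remaining downtime.

    downtime_reduction=0.90 is an assumed figure: it models cutting
    downtime by 90%, which leaves one tenth of the original outage time.
    """
    remaining = (1.0 - availability) * (1.0 - downtime_reduction)
    return 1.0 - remaining


if __name__ == "__main__":
    a = 0.99  # hypothetical starting point, not a figure from the talk
    for _ in range(4):
        print(f"{a:.6f} -> {downtime_minutes_per_year(a):8.1f} minutes/year")
        a = availability_after_ha(a)
```

Each pass removes nine tenths of whatever downtime is left, so the outage time shrinks quickly but never reaches zero, which is the mathematician-and-engineer point.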
We really work to engineer out single points of failure. A single point of failure is something whose failure will cause the whole system to stop delivering service. And good HA design will eliminate single points of failure. That's sort of obvious. But how does HA work? If you have a single point of failure, the obvious thing is to provide a redundant component of some kind that does the same thing, which you can then use to do what the failed component used to do when it worked. So we manage the redundancy to improve availability.
It's sort of like you have an init process that runs over the whole cluster. I say a cluster-wide init on steroids; that's kind of the way to think of it. If you've thought of init with respawn as, oh, that respawns my getty and that works really well, try respawning the network. You'll discover that respawning the network doesn't work too well. But this is analogous to saying you have a respawn on your services, so that if they die or they stop working, they just get restarted. That's really what HA does. You monitor for death of nodes, for loss of connectivity to the outside world, and for services that aren't working. Now, I know this has never happened to you guys, but, you know, sometimes databases are running but they aren't doing anything. Apache is running, but you can tell it's mainly consuming memory.
So you have a lot of redundant things you then have to manage. One of those things is redundant communications. And it's one of the most important, because if your nodes in the cluster can't communicate with each other, they cannot make any decision about what to do. If they can't talk to each other, all else is kind of hopeless. So the first thing you usually go after is redundant communication, because without redundant communication you can't manage the rest of your redundancies. External communication is also typically essential to the provision of services as well.
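The "respawn for the whole cluster" idea can be sketched roughly as follows. This is not the Linux-HA implementation; the `Node` type, the `plan_recovery` function, and the `deadtime` threshold are hypothetical names chosen for the example. A node counts as dead when no heartbeat has arrived within the threshold, a failed service on a live node is restarted in place, and services on a dead node are moved to a surviving node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float                         # time the last heartbeat was seen, in seconds
    services: dict = field(default_factory=dict)  # service name -> did its health check pass?


def plan_recovery(nodes, now, deadtime=10.0):
    """Return a list of (action, service, node_name) recovery steps.

    deadtime is an assumed threshold: a node with no heartbeat for longer
    than this many seconds is treated as dead.
    """
    alive = [n.name for n in nodes if now - n.last_heartbeat <= deadtime]
    actions = []
    for node in nodes:
        if node.name in alive:
            # Node is up: restart any service whose health check failed.
            for service, healthy in sorted(node.services.items()):
                if not healthy:
                    actions.append(("restart", service, node.name))
        elif alive:
            # Node is dead: move its services to the first surviving node.
            for service in sorted(node.services):
                actions.append(("move", service, alive[0]))
    return actions


if __name__ == "__main__":
    cluster = [
        Node("node-a", last_heartbeat=100.0, services={"apache": False, "db": True}),
        Node("node-b", last_heartbeat=80.0, services={"ip-address": True}),
    ]
    for step in plan_recovery(cluster, now=105.0):
        print(step)
```

Note that the whole plan depends on heartbeats arriving, which is why redundant communication comes first: if the nodes cannot hear each other, nothing in this loop can make a sound decision.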
And also, communication to the outside world is typically accomplished through routing tricks, BGP or OSPF routing, to get that done. And for that piece, it's helpful to have an expert in that around to manage those things. So I'm going to talk less about that and more about the other aspects.
You also need redundant access to your data. And you can do that either through replication, or through sharing, or by making it somebody else's problem. If you've read The Hitchhiker's Guide to the Galaxy, it's like covering it with a Somebody Else's Problem field. It's still there, but no one can see it. You can have replicated access, where you have the data over here and a copy over there. You can share the same piece of data so that either machine can access it. Or you can use back-end storage of some kind, where somebody else has to manage the availability of that service because you can't, or don't want to, whatever, and they manage the availability in their own way. That's making it not your problem.
Now, okay, thank you. Thank you, guys. Now, that's the next question. Only two of you raised your hands. Always the same guys. Thank you.
The interesting, useful question is: why aren't all systems highly available, when everybody wants it and hardly anybody has it? The first reason is cost. Commodity hardware takes care of the cost, which is very important. Whatever it is, you can buy HP if you really want. But you can build a cluster with no single point of failure from white-box parts. I built one for 800 euros. No single point of failure. So the cost can be taken care of here in a way that most people do not appreciate or understand.
However, the second item here: you cannot give away complexity. Complexity is like latency in networks. You can always solve a bandwidth problem by throwing more money at it. Solving latency, how many people here have actually solved latency problems in networks? It's much harder. Complexity is like latency.
It's very hard to get rid of. And complexity is the enemy of reliability. And so, once you solve the cost problem, you're still left with the complexity. Well, the real reason I care about it is because it makes my head hurt. But we try to manage that as well, and I think we do a pretty good job of it.
This is a general chart about open source in general, but about HA in particular. Initially, people charged for HA systems kind of in the $100,000 to a million dollar range. And I'm not going to sell you an HA system for $800,000, I guarantee you that. But if you take these costs and drive them down and down, you get more users, and as you get more users, you get more developers. And this is in fact what's been happening: we've decreased the cost, and the number of people using it has gone up significantly.
We have people using it for things like high-availability badge readers, man. It turns out there's a really good reason for that. But they wouldn't pay a quarter of a million dollars to make their badge readers highly available, although maybe in one particular application they should have. I got locked in at one of these storage unit places in the U.S., where you go and rent storage somewhere, and I tried to get out. I punched in my number to get out and it kept me locked in. And I finally found out why: the Windows machine had crashed, and it was taking the next 10 minutes to reboot, and I was stuck and could not get out. I'd like to sell an HA system cheap to these guys. You know, when something goes wrong there, you don't know what's wrong; it just won't let you out. So, a little about the Linux-HA
project. I guess one more thing here about this is that developers go up here too, and the whole thing about open source software is that it works once you reach critical mass and the project becomes self-sustaining. I hate the analogy between a nuclear explosion and open source software, but it's very similar. You have to reach critical mass so that the reaction becomes self-sustaining. So the reason why, in my project, I've gone after the low-end stuff is because the volume goes up very rapidly as the costs go down.
The project has a large associated community: we have thousands of people, about 1,000 to 1,500 people on the mailing list at the moment, and around 10,000 clusters in the world today. The core piece of Linux-HA is what we call heartbeat, though it does a lot more than that. It's one of those things; I think the name stuck. Linux-HA runs on Linux, which is why it's called Linux-HA, and it also runs on FreeBSD and Solaris, and I think it's run on OpenBSD in the past, but I'm not sure what the current status on that is. Linux-HA is shipped with every major Linux distribution, Debian, Gentoo, SUSE and so on, all except for one distribution, which has its own solution for that. We're hoping to change that in the future, and we think that might happen. But a lot of people use it, that's the point.
People use it for all kinds of things. These are the low-end uses: web servers, database servers, badge readers, custom applications like badge readers, firewalls, retail point-of-sale solutions. One of our big ones is a retail solution. How many people here are from Germany? You know Karstadt, right? Karstadt runs Linux-HA in the back office of all of their retail stores, for the store systems and authentication, file servers, everything; all kinds of things people use it for.
One of the high-end uses that's come up so far is SAP. SAP has an interesting architecture, so we've had to work with it. Some of the people that use this: Karstadt I mentioned. There's another one, and I like that story because the guy sent me an email and he said, we were there, and then my system administrator made a mistake and he crashed the machine. Now, I know you guys don't make mistakes like this guy did, but he made a mistake and he just crashed the machine. And he said, and it worked. It worked. Someone else said he was in New England, not New York, where they had the power outage, and he was still up, though I can't really claim credit for that.
How many people have seen the cartoon about architects designing a swing? You've seen that cartoon; it shows the designs getting more and more complex, and all the customer wanted was a tire swing. Well, let's say release one is like a tire swing. You can get on it and you can just get your HA
and swing to your heart's content, as long as what you wanted was a tire swing. And that's why SAP needed something a little more than a tire swing, one of these three-layer swings or something like that; you saw those pictures.
But it does lots of things. It knows when nodes fail and fails over services. It can use various communication methods, and it can also detect loss of connectivity to the outside world. It can be configured lots of ways: active/active, active/passive. Most deployments are two-node clusters, because people understand two-node clusters. There are only two possibilities: either they're both up or only one's up. Well, or neither of them. You also know what to do there too: either everything's running on one machine, or you have things running on both machines, potentially. Or if you have active/passive, you have everything running on one machine or everything running on the other machine, so the choices are even simpler. But people like those things because complexity is indeed the enemy of reliability. So even though release two does a lot more things, a lot of people are going to use it for two-node clusters, because they know how they work. And occasionally you find that the people who design these systems are not as smart as the people who manage them, and then you run into problems. But if you stick with the lowest common denominator, they'll work for you. That's not related to Linux-HA; that's just good advice.
So it has some sort of simple administration tools. And if you want to monitor resources, like whether your web server is working or not, in release one you have to use something like mon or monit, one of those tools, to monitor that, and it can trigger a failover in Linux-HA. You can also do SNMP, but it's kind of limited compared to what some people want. The most commonly requested thing is to eliminate the need for an external tool to monitor resources like your web server
to see if it's really working. So release two adds a lot more things. It does have built-in resource monitoring, which I put first because that is the most commonly requested feature, so that you don't have to add another tool and configure that tool.
The average person, by the way, gets release one up in about a half day, if they don't know anything about HA and don't have the software downloaded or installed on the machine, and if they read the manuals. If they fumble around, they can get it working without reading the documentation, which appears to be the paradigm that most people use. I'm not going to get people to read the manuals, so we've actually spent a lot of effort putting in messages that say, this isn't going to work, go read the manual, because Unix administrators read the system logs. Windows administrators assume there's nothing meaningful in the system logs, so they never read them.
We support much larger clusters. Well, we've been currently testing with about eight nodes. I don't know what the limit is; it's probably the limit of how complex you want to make your configuration and so on. We have a lot more sophisticated dependency model, where you can say this resource depends on that one, but they don't have to run on the same machine. That's the kind of thing you need to do for SAP. With SAP, for example, if a component loses its network connection to another component, or it goes down, you have to restart both components, even though they're running on different machines. That's the kind of thing you can't do with release one. It has a lot richer set of constraint support and so on. And the complexity: does that sound more complex than release one?
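The cross-machine dependency described here for SAP can be sketched as a small graph walk. This is an illustrative assumption about how one might compute it, not the release 2 code; `resources_to_restart` and the resource names are made up for the example. Given which resources depend on which, it returns everything that has to be restarted when one resource fails, regardless of which node each one runs on.

```python
from collections import deque


def resources_to_restart(failed, depends_on):
    """Return the failed resource plus everything that depends on it, directly or indirectly.

    depends_on maps a resource name to the set of resources it depends on.
    The result is a set: it says what must be restarted, not in which order.
    """
    # Invert the mapping: for each resource, who depends on it?
    dependents = {}
    for resource, requirements in depends_on.items():
        for required in requirements:
            dependents.setdefault(required, set()).add(resource)

    affected = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    # Hypothetical layout: these components may run on different machines.
    depends_on = {
        "app-server": {"message-server"},
        "web-frontend": {"app-server"},
        "backup-job": set(),
    }
    print(sorted(resources_to_restart("message-server", depends_on)))
```

Nothing in the walk refers to node placement, which is the point: the dependency holds between resources, so it still applies when they run on different machines.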
We are still just as concerned as we ever were about it, so we're adding as little complexity as we can, but you don't get something for nothing. The resource configuration stuff is all XML-based, which simplifies writing and processing it with other auxiliary tools, because it's a standard format, like it or not.
Some of these things here are in CVS and in the process of being worked on; that will occur probably by June. And things like configuration monitoring will happen soon too. We're going to work with GFS, which the next talk is about, so we can integrate and work together with GFS.
Multi-state master/slave resources: most resources are either stopped or started. There are some resources that are more complicated than that: they're stopped, or they're in slave mode, or they're in master mode. So we're going to be handling those. Most other HA systems handle that by a combination of weird kludges; you have to kind of know how to make it work. We're trying to handle it a little more directly here. To start with, it's not going to have everything. Unfortunately there are some things that sound really simple that we have to think about how to do right in release 2, and we haven't yet.
So, release 2 credits. I'm going to give special credit to Andrew Beekhof, who wrote by far the most complex pieces of release 2, the CRM and CIB, which we'll get to in a minute. Some other people who work for NCSA in the US and for IBM in China. Lars, who is the architect, and he's been a PHB, a pointy-haired boss, for Andrew Beekhof. And I've done various things, mostly as little as I can. That's the whole paradigm: the idea is you get a project together and you get other people to do it.
The release 1 architecture has various processes and so on. Basically, everything here is started by a master control process, and the processes that actually talk to the network are separate, and there are
security and simplicity reasons why we do that. You can have clients of our APIs and so on here. Release 2 has more processes and so on than that. In this picture, most of what was on the other chart is in this part: communication becomes a layer. But I want to talk a little more about the release 2 architecture, because it's, I think, far more interesting.
There's a cluster resource manager that is basically in charge of policy. It has two pieces, the PE, the policy engine, and the TE, the transition engine, which are part of it. It has a membership layer, which does consensus membership, which has a lot of nice guarantees: fully connected consensus membership. A local resource manager, which basically does the job of starting things; it's like a daemon that starts and stops things locally for the CRM. The cluster information base, which is a set of data kept consistent. I don't say database, because that would be too sophisticated. We're not that sophisticated; it's much simpler than that. STONITH stands for shoot the other node in the head. So it's basically an engine that's designed to terminate machines that have left the cluster without permission. It's like being in a high school classroom and going to the bathroom without having a note. You leave our room, we shoot you.
So these are the basic components that go into it. Heartbeat, the old heartbeat code, its primary purpose now is to be the communications layer. It does reliable multicast and things like that. And all the green arrows are clients of heartbeat. All the blue arrows are, well, we have a lot of client-server relationships here, and that's what most of the arrows are. The green ones go to heartbeat and the blue ones go elsewhere. See, I thought the green code was the most important; that's what I wrote.
Can you tell us something about this consensus cluster membership? That
sounds fascinating.
The consensus cluster membership guarantees who is in the membership. What happens is that communication breaks, and as a result one cluster becomes more than one cluster, because you get divisions in the communication. That can happen. I know you've never had it happen where you had a switch go out, but it does happen. And so you can imagine maybe you get divided in half, depending on how you have your switches and stuff set up, and the kind of bugs you have, and maybe you screwed up your firewall rules. You guys don't do that either; that's what my customers do. So if you become our customer, then you can screw it up too.
So the idea is maybe only four nodes here can communicate among themselves, but what it guarantees is that everyone in that four-node set has agreement that they're all in the membership, and that they can all communicate bidirectionally with each other. So it's a sort of transitive closure on the connectivity matrix, combined with selecting a clique, a collection of machines that can do that. And if you think about that, that's probably an NP-complete problem. Therefore we do that by heuristics, since we don't have a solution to P versus NP. For those of you who don't know what P versus NP is, it's a major, major computer science problem. It would make you famous, infinitely famous for generations, if you solved that problem.
So basically what we do is heuristically find a set of machines that can communicate with each other, and it guarantees that all of them in there agree that they're members. It gives guarantees about who can communicate with whom and who all is in the membership, and everyone that is in that membership agrees that they're in that membership. Have you ever tried to get a bunch of people to agree? This is like that, except that machines are stiffer than people, but not as often.
Actually, there are a lot of results. If you haven't looked at the theoretical results for clusters, there's hardly
anything you can prove you can do. In fact, you can prove you can't do most things in a perfect way, in an optimal way. However, it's like the other thing: you do a good enough job for all practical purposes, and that's what we do here. Does that help with that question? I'm doing pretty well.
So we have what we call resource objects instead of just resources. They're kind of a more abstract resource-object thing, but no, it's not C++; it's still in C. We wanted it to be reliable, and honestly, the C++ versus C decision was about memory allocation. One of the goals of an HA system, think of it this way: we want to run without stopping for a hundred years. How many bytes of memory do you get to leak? Zero. How many people have seen any C++ program run for a year with zero bytes of memory leaked? And it's not that you can't; it's just that you use all these other libraries, and it's easy to use them and hard not to use them, and they aren't always as good a quality as your code. So we made the choice to use C instead of C++, largely over memory leaks, honestly.
Ordinary resources, which are something like a web server: there are different flavors of resources. Basically one of the things you can do is give it an init script for your service. You can tell us, oh, that's a resource, and we just know how to deal with it by itself. We won't know how to monitor it, so we're not going to do that with it. You can do heartbeat-style ones, which is release one style, or these Open Cluster Framework style ones. They're all pretty similar to each other; they're slight variations on a theme. And basically that means things like starting and stopping a service.
Or you can have something which is called a resource incarnation: I want to have 10 copies of this running. For example, if you use GFS, a global file system, you want to have a copy of the mount on every machine.
Resource groups: I'll get to talk about co-location and linear ordering of starts, but basically they're
basically a clone of the kind of things you can do with release one. I'll say more about that in a minute.
Multi-state resources, which I talked a little bit about, the master/slave resources: they're really useful for replication. DRBD is a system that replicates data from one machine to another. It basically acts as a block device which replicates the data over the LAN at the same time. You can either be running DRBD or not, and if you're running it, it can be either in master mode or in slave mode. So that's three states, not just two, and most HA software deals with that badly. Since we think DRBD is really important, we try to do a little better job with that. It's not that the others can't do it or that it doesn't work; it's just that it isn't obvious.
And the most probable cause of failure for an HA system that is properly designed, administered, and set up is human error: not understanding what the software is going to do. That means the more the mental model that the software has matches the mental model of the human being, the more likely they are to do the right thing.
Now, when I did a system, I liked to get these calls at 2 in the morning. Nothing ever happened during the day. Actually, the first times, I came in in the morning and they said, this hasn't worked since 2 in the morning and we didn't call you. I could have gotten up at 2, fixed it, gone back to sleep, and then told my boss I'm going to come in late today. But instead I came in on time and everyone's breathing down my neck. I'd rather get up at 2 in the morning.
Basic dependencies are the kind of thing you can do in release 2. You can have ordering dependencies, that is to say, this service has to start before that service. You can also have co-location dependencies. By co-location, I mean it must run on the same machine as this other resource. For example, if you have an Apache web server, it's probably a good idea to co-locate it with the IP address that the web
server is trying to serve. If you don't, you'll be disappointed, and someone on the mailing list will be happy to straighten you out. So you've got start before or start after, and must be co-located with, and this one is actually here for SAP: cannot be co-located with. There's something in SAP that has to be run on a different machine from this other piece or it doesn't work. It's not only SAP, but SAP comes to mind.
So instead of having this linear list, where we have a dependency of A before B before C before D and they're all on the same machine, this is more of a directed acyclic graph: more flexibility in how you set it up, and more ways to screw it up. So that's the reason why a lot of people want to use resource groups, because they're simple and they're hard to screw up. I love things that I cannot screw up.
You can also have mandatory constraints and preferential constraints. With a mandatory constraint, you can tell a resource that it has to run on one of these machines that has a Fibre Channel attachment to the disk that you want to use, and maybe every machine doesn't have that. Or in the case of DRBD, it's two-way mirroring, so it has to run on this machine or on that machine; it had better run on one of those two. And the default, by the way, is that a given resource can run nowhere, so if you forget to tell it, it won't be able to run anywhere.
You can also have preferential constraints that say, well, I would really rather it didn't run on the same machine as this, but if you have no choice, for example when all the other machines are down, it will run it anyway. It's basically trying to satisfy the mandatory constraints first and the preferential constraints second. And you can provide weightings, the kind of thing that says, well, I prefer for this to be run on this machine, but I prefer even more for it not to be run on that machine. You can deal with those kinds of things in the appropriate fashion. And this is more
complex. This is the kind of thing you use when you need to use it. A lot of people are very, very happy with the tire swing; it satisfies their main needs. The point about this is that this is the kind of thing that makes the difference between our release one system, which works pretty well and which a lot of people are really happy with, and one that is really competing with any commercial HA system out there. That's where we're at: it is comparable to pretty much any commercial HA system out there. And that's important not because everybody cares about that, but because people like to run the same software everywhere, and if we can't solve all of a company's problems, they're going to want to get a solution that can and deploy that same solution everywhere. So it shuts us out of all the places where we could do the simple things, because they don't want to learn two.
Oh, this is useful for managing, what do you suppose I meant is useful? Oh, managing these things there, that's what I meant. Resource incarnations: that's the example of where you can have more than one of something running, and that's useful for managing load-balancing clusters, just like it says here. Load-balancing clusters, for example, where you want to have a load balancer in front, with a bunch of web servers behind it and behind that a database server. Well, all these different web servers, you want to run a hundred copies maybe, spread across the machines. So you tell it, for a load-balancing cluster, we want n of them. It's useful for dealing with cluster filesystems, where we can actually have the same filesystem mounted on more than one machine simultaneously and not trash the data. It's useful for certain kinds of IP alias techniques, called cluster IP aliases, where more than one machine responds to the same IP address. You have to set up your ARPs for that, but that's something you can deal with.
Resource groups, like I mentioned before, they're kind of
a shorthand for providing ordering and co-location dependencies all at once, as long as you don't mind starting them all one after the other in a linear sequence. As I mentioned before, each resource object in the group is basically declared to have a linear start-after ordering relationship, and each resource in the group is declared to have co-location dependencies, which basically means they all run on the same machine and they all start in the order you have in the group. That's much simpler to describe. This is the easy way of converting release one resource groups into release two, and honestly it's what a lot of people are going to need most of the time for most of the things they do. And it has the advantage that, as long as what you want is linear, it does what you want.
Master/slave resources: as I mentioned, I've talked about this a couple of times, but the main thing to note is that it's ideal for replication resources, where you're replicating data from this machine to that machine. Either it's not running the replicator, or it's running as master, or as slave. Some kinds of replication can actually do bidirectional replication; that's not an issue here, that's another kind of resource. But not everyone can; some want to know which is the master and which is the slave.
Now we're getting into a little more esoterica, honestly. Nodes can have arbitrary attributes associated with them. For example, you can say these nodes have a Fibre Channel connection: you can say fc equals yes, and then you can say, for a resource, this resource must run on a node that has fc equals yes. So that's a way of specifying it by giving an attribute or property to the node and then having the resources run on the ones that have those properties, as opposed to just giving an explicit list of them, and that's often handy. Or you can say, for example, release greater than 4.3, or whatever. As it says there, there are types of integer, string, and version, versions of course being
strings that compare like integers, but not quite: 1.2.3 is greater than 1.2.2 and less than 1.3, those kinds of things, the weird things that people use in versions. And you can use the usual kinds of comparison operators. You can also say I want to run on a node where this attribute is defined to be something, or not defined, or I want to run on a node that's co-located with another resource, or not co-located. And these are preferential constraints, so you can give a weighting to this, which then says I would really like it if you would run here, but if you can't, that's all right too.

If you get into larger clusters in a complex environment, you get into things where you have to want to spend the time to test what you're doing. You have to have a good reason to do these, in my view. Not that there aren't good reasons to do it, but the majority of cases don't need this; 90% of the HA cases in the world don't need this. Because the more complex you get in here, the less likely you are to get it right. Human beings are like that. I don't know about you, but I am, anyway.

Now on to other kinds of constraints. Each constraint is associated with a weighting, as I said before, and it is evaluated for each node the resource might run on. If the rule turns out to be true, then the weighting that's given there is applied. The weighting can basically be a positive or negative integer, and the weightings of all the rules that apply are added up, and then we try to figure out roughly where you want it to run. There's no attempt here to solve it optimally. That's really hard: it's an integer linear programming problem, and we would probably need most of the cluster's resources just to compute where we ought to be running the resources. That was something we decided probably wasn't good for release 2.0.0. Tongue in cheek, guys. It may not be good ever, because one of the things you want in a reliable system is to know what it's going to do, because you want to be pretty confident you've tested
that situation. Because an HA system which isn't tested isn't HA. It may work, but it probably won't all the time. So those are the kinds of things you do.

I want to talk a little bit about security here, which is one of my hobby horses regarding clustering. A cluster is a computer whose backplane is the Internet. How many people find that frightening? Good, this is good. You may think you have a secure cluster network. There are two possibilities: you're probably mistaken now, or you will be mistaken later. OK?

The reason why that's difficult is that security is often not well understood by admins. The person who set it up may have done just fine when they set it up, but they've turned it over to somebody else: they went on to bigger and better things, left the company, whatever. And the people left behind running it don't understand it as well. And then you have your hardware installers who go into the machine room and plug cables together, and maybe they cross-connected two networks which shouldn't have been. Maybe somebody set up a route on one of the Windows machines, I guess deliberately. Hardware installers don't fully understand it, and that's very, very normal. Staff turnover is a big issue, as I mentioned, with the person who installed it and understands it.

The combination of viruses and peer-to-peer technology is frightening. Instead of having one that sends out spam, how about one that comes in and makes all the Windows machines in your company send out spam? How about one, instead, that opens up an outgoing firewall hole? That's certainly going to happen; the question is when. So security is a very big concern. And remember, the reason why it's a big concern for clusters is that the cluster's backplane is really the Internet. Now, you can argue it's a secure network, but I've already given reasons why I consider that suspect. Now, honestly, I like crossover
cables. They're reasonably secure. I love this stuff. Good HA software should be designed to assume insecure networks. Good HA administrators will assume that the software doesn't work, and they'll put it on what they think are secure networks. So each of you needs to mutually distrust the other to have a good installation. But if you look at clustering software, the vast majority of it kind of assumes: of course it's really important, and you're going to have a secure network. I know you are, really, truly. So this is something to be cautious of. A lot of software does not do that in a way I would consider reasonable.

So actually I got through the whole talk. This is amazing. Maybe I shouldn't have had that extra little caffeine before I came in. So this is kind of what we're trying to do in release 2. People use this for all kinds of solutions. I really, truly do have an HA file server in my house, so that my MP3s play all the time. So I have my priorities, if you wonder. And that's actually how I tested it, when my wife was out of the house. I started playing music on this machine, and I went to this other machine and was playing music there, and this one was using this client and this one was using this other client, and I had four or five songs going. You cannot get these songs synchronized, so it kind of sounds a little odd. And, you know, I failed the machines back and forth, and I was just happy, and really glad my wife was out.

What I'm trying to say is that this technology can indeed be afforded. Don't assume you don't have an application for this; you may have one you didn't think you had. These things work, particularly with technologies like DRBD for filesystem replication. That's a key to what makes it inexpensive: I replicate from one IDE disk to another IDE disk. Now, on the other hand, if the IBM salesman
says buy Fibre Channel, you know, buy everything the IBM salesman tells you to buy, because that's what keeps me employed. They make money, I keep my job, so I'm happy.

On to questions. Yes? The question is about how that relates to single points of failure. Right, so he's asking here about something he set up. The point of the thing is you do want, in fact, to have no single point of failure. Actually, in the case of DRBD, if you had a second heartbeat link, it certainly would have known the two sides were both up. You can heartbeat over as many different interfaces as you want; it's only limited by your budget and complexity. But two is the minimum recommendation for heartbeat paths. I mentioned earlier on, even though I was talking too fast, that it's very, very important that intra-cluster communication be replicated, because if it's not, then you get the result you got: the two sides don't know what's going on, and each of them thinks it does, and they don't. So that's a bad thing. And that's also another reason why people use numbers as well in this environment.

Further questions? Oh, sorry, I shouldn't walk into that light; I can't see. Go ahead. He's asking what kinds of heartbeats are currently available in Linux-HA. You can heartbeat over UDP multicast, UDP broadcast, and serial. You can set up a funny kind of heartbeat called ping heartbeats, where it appears that your router is a cluster member. But the normal ones are UDP multicast, UDP broadcast, and serial. The heartbeat communication in release 2 is the same as the communication in release 1. We actually have a ping mechanism that we use in release 1; eventually it will get adapted to release 2, after we figure out how to do it right. It's not something that's going to happen right away, because honestly we have a lot of things to do and we need to think about it. We have a really nice one that works very well, and we need to
figure out how to adapt it to release 2, and that's much more complex.

Further questions, before the rest of you guys leave? Over here. You're right about testing. You can build one of these things and tell your manager you tested it. So what he said was: if you don't test it, it isn't HA. That's the short answer.

Right now that's not something we're prepared to deal with. Let's talk about it. It's really as much an LVD issue as an HA issue. He's asking about backup slave nodes for the LVD, and I don't really know the answer to that question. I haven't talked to the LVD folks for a while. We've talked about various things, but I don't know the current status.

There's another question up here, up the aisle. Yes, that's actually something that's supported in release 2. You have to monitor the application, but that's supported in release 2, yes.

You have a question here? We support every configuration they support, and we support configurations that they don't support. Ours is an open community project which a lot of people have contributed to over a number of years, and historically it has been available outside of the organization. We're working with them, and they're looking seriously at release 2. With regard to whether that will happen or not, you know, like they say about the opera, it's not over till the fat lady sings. But right now we're having some positive discussions with them, and it all looks pretty good. But, you know, it's not for me to say; it's for the LVD.

Yes? Where's the question? OK, all right, I need new glasses, I think. Go ahead, with your right hand up. The question: you take over the IP address on the new active node, but what about the MAC address? Is it supported to take that over? The truth is that IP address takeover is just a resource to us. We can take over any kind of resource for which someone has written a resource agent. If you write a MAC address resource agent, it'll work for you. We don't currently supply one. We know people who
have done that, but if you want to supply one, we're always looking for patches.

OK, you can't do that: if you raise your hand while putting your coat on, you're asking a question. What's the question? Over here, you asked a minute ago. Go ahead, do you have something?

Do we have special techniques for long-distance clusters? In release 1 we have absolutely nothing for that. In release 2 we're thinking about it. Again, we want to get 2.0.0 out and stable, and then we're going to start looking at that. We haven't tried to address it yet, but it is something we have had in the back of our minds while we were designing release 2. We have some good ideas that should work, but it's not ready.

Yes, up there at the top. Do you have integration with re-imaging systems? If you start with five machines handling one service and another five machines doing another service, and all five in this service break down, you need to rebuild two other machines to take over that service. If you can write a resource agent that does what you need to do... I'm sorry, I mean, it's a script: if you write an init script that does what you want it to do, then that init script will work just fine for us. Yeah, you can code a situation which says, oh dear, I have no nodes left in that service. That's a possibility. It's not something people have asked about, honestly, but the thing that happens is, as time goes on, people use it more and more, and they do more and more different things with it. I'd be interested to hear if you want to do that. There are probably some difficulties, like everything else, but I can't think of any theoretical reason why one wouldn't be able to do that.

Pretty soon: when?
Release 2.0.0 will be within the next month. I would imagine we'll have a fully stable... Remember, HA is about paranoia. If you're not paranoid, you should be, right? Not that we're trying to produce crap, that's not it. It's about paranoia, it's about making sure. We're going to try to have a 2.0 release that I imagine people would be happy to run on any system, no matter how paranoid you are. But you have to get the 2.0.0 release out in order to get to that point; there's no way to get there without going through a 2.0.0 release. OK, folks, you can go now.
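
As a rough illustration of the weighting scheme described in the talk (node attributes such as fc equals yes, version-typed comparisons such as release greater than 4.3, and positive or negative integer weightings that are added up per node), here is a minimal Python sketch. It is not Heartbeat's placement code or configuration syntax: the node names, attribute values, weights, the `maintenance` attribute, and the helper names `version_key`, `node_score`, and `pick_node` are all made up for the example, and it simply picks the highest total rather than attempting any optimal placement.

```python
def version_key(version):
    """Turn "1.2.3" into (1, 2, 3) so versions compare part by part as numbers."""
    return tuple(int(part) for part in version.split("."))


def node_score(attrs, rules):
    """Add up the weights of all rules whose condition is true for this node."""
    return sum(weight for condition, weight in rules if condition(attrs))


def pick_node(nodes, rules):
    """Score every candidate node and return the best one plus all the scores."""
    scores = {name: node_score(attrs, rules) for name, attrs in nodes.items()}
    return max(scores, key=scores.get), scores


# Hypothetical cluster: each node carries arbitrary attributes.
nodes = {
    "node1": {"fc": "yes", "release": "4.2"},
    "node2": {"fc": "yes", "release": "4.10"},
    "node3": {"fc": "no", "release": "4.5", "maintenance": "yes"},
}

# Hypothetical preferential constraints: (condition on node attributes, weight).
rules = [
    (lambda a: a.get("fc") == "yes", 100),                               # prefer Fibre Channel
    (lambda a: version_key(a["release"]) > version_key("4.3"), 10),      # prefer newer release
    (lambda a: a.get("maintenance") == "yes", -500),                     # avoid if possible
]

best, scores = pick_node(nodes, rules)
print(best, scores)
```

Note that comparing "4.10" and "4.3" as plain strings would put 4.10 first, which is why the version type in the talk is described as strings that compare like integers, part by part.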