Hello everyone, and welcome to the 1:30 pm session in the developer and open source track. As a reminder to our in-world and web audience, you can view the full conference schedule at conference.opensimulator.org and tweet your questions or comments to @OpenSimCC with the hashtag #OSCC14. This hour we are happy to introduce a terrific session called Optimizing and Improving OpenSimulator Performance. Our speaker today is Justin Clark-Casey. Justin is president of the Overte Foundation and one of the core developers of OpenSimulator, working on many different areas ranging from asset and inventory handling to performance and infrastructure issues. He has created some of the better known data persistence formats for OpenSimulator, such as OpenSimulator Archives and OpenSimulator Inventory Archives. Justin also provides OpenSimulator-related consultancy services. Welcome everyone, let's begin the session.

Okay, great. Well, thanks very much, Emil, and hello everybody to my session here, Optimizing and Improving OpenSimulator Performance, as very well introduced. So I'm going to get going. I unfortunately do have quite a few slides, but I'm going to try and properly pace them so that I don't just completely run through them. And they're going to be pretty plain slides as well, so I'm going to apologize for that in advance. So what am I going to talk about in this presentation? And it takes a while for the slides to come through, compared to the voice. In this presentation I'm going to talk about three things, pretty much. First of all, some background about the system architecture. Really, this is the context for some of the performance conversation, and some of the rules of thumb and potential bottlenecks that we already know about. So that's going to be the components, and the interactions between them. 
And then secondly, I'm going to talk about some of the measuring and testing mechanisms for actually finding out what the heck is going on in the simulator, because it can be very hard to tell. These things are really pretty complex; I think of them more as operating systems than, say, web apps or app servers. They have a whole bunch of things going on, and a complexity to match. And then finally, I'm going to talk a bit about the conference itself and the work done for it, particularly for performance, and how we looked to get 400 connections as a cap for the keynote. I don't think we actually had 400 at any point; I'm a bit thankful for that, quite frankly, because it's really a bit of a voyage into the unknown. But what we did have performed very well, it seemed. And I'm just going to talk about some of the ways in which we made code changes, or config changes, or adjustments, to get to that number, or to make sure we had the capacity to get to that number.

So, a brief slide, really; I probably don't need to show this, but it'll come up in a second for you. Really, it's: why listen to me? Well, I've been working on OpenSim for a frighteningly long time now, more than seven years, which is pretty crazy when I think about it. I've done a lot of work over many different bits. But some of the work I've done has been on performance analysis, and that has sometimes been for my commercial and education clients, when somebody has a grid that isn't performing well, or they want to know how to structure a grid. Sometimes they talk to me and I do consulting, so if we come to an arrangement, I go and look at how their grid is performing, how it's doing that, and what can be made better. So I've got a bit of experience in that area. I've also directly made some of the performance improvements, both last year, in cooperation with Crista, who did a lot of work there as well, 
and this year I've done quite a bit as well to get the numbers up. But even then, even with that experience, even with seven years of OpenSim, I don't think I'm an expert in everything, by any means. Performance in this architecture is a super complicated question, and there are many, many different network setups and arrangements out there. So I'm going to give things from my point of view, but some of you in the audience might have different experiences than I have, and I'd be very interested to hear about them. And I'm going to skip this next slide, because we don't really need to dwell on it.

So, talking about the system architecture. I'm going to talk about pretty much a hypergrid-enabled grid installation, which is what we have in the conference here. And there we go, the slide's gone there. Oh, I can skip the slide, I see. Yeah. So I'm going to talk about a hypergrid-enabled grid installation, which is what we have here in the conference, of course. But really, a lot of it applies to a standalone; you just have to ignore the grid service communication bits and the hypergrid bits. And of course, much of it applies to a normal non-hypergrid grid installation as well, just ignoring the hypergrid parts. I'm also going to assume you're pretty familiar with fundamental system concepts, both in terms of OpenSim (what I mean by standalone, grid and hypergrid, and assets and inventory, and I think pretty much all of you are) and some network stuff, like what HTTP, TCP and UDP are; I'm going to assume you know what those are. Right. So I'm going to show next, and it should pop up in a moment, the grid components here. I'm just going to give a small pause, because I think that will help. Okay. So this is my simplified view of the system, with lots of colored boxes. This is your grid installation as I think of it. So on the top left there, you have the viewer. 
And that communicates with the simulator on the right side, over two channels: both HTTP and UDP. Effectively the communication is bidirectional. But the viewer also communicates with the public grid services, and that's things like, classically, login; of course you have to communicate with the login service. And things like map. But also, in certain configurations, like the one we actually have here, it directly fetches, say, textures from those services. So there's communication there. And the simulator communicates with both the public grid services and the private ones in the back end. And I've just got a little box on the right to show that it also communicates with other simulators. You'll see that all of that is HTTP, apart from one channel, which is UDP, between the viewer and the simulator. And the next diagram basically shows a very simplified version of the hypergrid situation, and Crista will probably be cringing here, but really it doesn't show you a lot, quite frankly. It shows there's the viewer again; there's a home simulator and a foreign simulator; there are home grid services and foreign grid services. And really, all this diagram does is show you that pretty much everything talks to everything else. It's a complete spaghetti of communication; I don't think there's any relationship that isn't represented. Everything has some kind of communication with every other set of services. And maybe you could break that down a bit further, as you saw in Crista's talk earlier today; you can break down the different services, but really there's a hell of a lot going on. So then we can go on and talk about the bottlenecks. You can see from that diagram, and just see from the slide, that network communication, of course, is a massive point of concern. 
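As a rough summary, the channels just described can be sketched as a small data structure. This is illustrative only, not an exhaustive map of a real deployment, and the component names are my own shorthand:

```python
# Rough sketch of the communication channels described above.
# Illustrative only; real deployments vary by configuration.
CHANNELS = {
    ("viewer", "simulator"): {"HTTP", "UDP"},   # movement etc. over UDP, textures over HTTP
    ("viewer", "grid services"): {"HTTP"},      # login, map, optionally direct texture fetch
    ("simulator", "grid services"): {"HTTP"},   # public and private back-end services
    ("simulator", "simulator"): {"HTTP"},       # teleports and region crossings
}

def protocols(src: str, dst: str) -> set:
    """Return the protocols listed between two components (empty set if none)."""
    return CHANNELS.get((src, dst), set())

if __name__ == "__main__":
    print(protocols("viewer", "simulator"))
```

The one thing this makes obvious is the point from the slide: everything is HTTP except the single viewer-to-simulator UDP channel.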
There's so much of that stuff going on. And then, as a second aspect, we can also talk about processing, obviously CPU and, on the viewer side, GPU; then storage; and the software itself, the simulator, and how efficient it is as a service. So for those four aspects, we can break things down, I would say, into three components. Of course you have your classic simulator; you also have your back-end grid services; and then you have the viewer, or the client. You could probably decompose that further, but we'll leave it at that for this presentation. So I'm going to talk about those aspects in turn now, with a little bit of experience and some of the rules of thumb that I've discovered in the course of working with OpenSim, although it's an evolving situation, which is always interesting.

So let's talk about network bottlenecks first. Obviously, one of the major avenues of communication is between the viewer and the simulator and the back-end services. As we saw from the diagram earlier, there are two main channels there. Firstly, there's UDP, and that is all the packets that flow between the simulator and the viewer. For instance, I'm walking about the stage now, and I'm glad Dave is focused on the slides, because otherwise he'd have a fit trying to focus on me; every time I walk, I'm sending UDP packets up to the server, in this case I should say the simulator, and then the simulator redistributes those to all the other clients. So I move, and hopefully very shortly afterwards you get packets saying I've moved, and you can see me strutting around on the stage. So that's one avenue of communication, and the critical requirement for UDP is low latency: when I make a movement, or in fact, really, even more critically, when you make a movement. 
So when you make a movement, even with your own avatar, that's got to go up to the server, that is, up to the simulator, and then back down to your client, because your client doesn't assume the move succeeded. It does some interpolation while you're actually moving, but it doesn't assume that just because you pressed forward, your avatar is going to move forward; of course, you might hit a wall or some other barrier. That's why, often, when something goes wrong in OpenSim, you can rotate your avatar, because the client lets you do that locally, but you can't actually move, because the packets for some reason are no longer getting up to the simulator and back down again. So it is critical that that round trip happens as fast as possible. But there's also the HTTP channel, and nowadays the vast bulk of HTTP traffic is fetching textures, either from the simulator or, in more advanced configurations, from the asset service itself. As you can imagine, that's a lot of data, especially in the case where it's not cached; you've got to pull a lot of information down from the back-end service to the viewer, and so the critical thing for HTTP is high bandwidth. But because you've got both of those channels going between the viewer and the simulator and the services, you need both: you need low latency and you need high bandwidth. And that's a situation where problems arise very often. For instance, wireless connections: some wireless connections are very good, but some can be very bad. There can be a lot of latency even if you've got good bandwidth; it can take a lot of time for the packets even to get to your router, and then out to your ISP. So that's one reason why we often say, if you are on wireless and it's not working well, then you want to go wired. 
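The round trip described above is why latency, not bandwidth, bounds movement responsiveness: your own avatar cannot visibly move sooner than one full round trip after you press forward. A tiny worked example (the millisecond figures are illustrative assumptions, not measurements):

```python
# Minimal sketch: the viewer does not commit a movement until the simulator
# echoes it back, so the soonest your own avatar visibly moves is one full
# round trip later. Numbers are illustrative assumptions, not measurements.

def perceived_move_delay_ms(uplink_ms, sim_processing_ms, downlink_ms):
    """Time from pressing 'forward' to seeing your own avatar move."""
    return uplink_ms + sim_processing_ms + downlink_ms

# Same simulator load, same bandwidth; only the access-network latency differs.
wired = perceived_move_delay_ms(uplink_ms=20, sim_processing_ms=15, downlink_ms=20)
wifi = perceived_move_delay_ms(uplink_ms=80, sim_processing_ms=15, downlink_ms=80)
print(wired, wifi)  # prints: 55 175
```

At identical bandwidth, the high-latency connection feels several times laggier, which is exactly the wired-versus-wireless point above.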
But we have, and this is a super approximate rule of thumb (okay, I could stop walking around if that's actually affecting the audio; that might not be a good idea, sorry). So the rule of thumb that I've really been operating on, and it really is just a rule of thumb, is about 500 kilobits per second per connection, and that's really because of the need to pull down assets. If you're not pulling down assets, then you can actually get away with a very low amount of data; I mean, I'm only pulling down about two or three, well, maybe five kilobits a second, and that's sporadic. But there are so many variables; it depends on your viewers, and you can't assume, of course, that the viewer has your assets cached, so you need to leave a good amount of bandwidth headroom. So that's a very approximate rule of thumb for the actual number of connections you're trying to run over a server. And I've just mentioned there at the bottom that PingTest.net is one way of measuring latency; I'm not sure how good a job it does, and latency measurement is maybe not so general. Actually, there is another tool later on which you can run on servers to measure latency, but that's one web-based thing, and it's the equivalent of speedtest.net.

Right, so another potential network bottleneck is simulator to simulator. Excuse me a second. This is critical mainly in teleports and region crossings. When you teleport, there's an immensely complicated set of communication that happens, both between the viewer and each simulator (the source, say your local one, and the one you're going to) and between the simulators themselves. Simulator A, which is your source, has to tell simulator B that hey, there's a new avatar on the way, there's a new agent, you've got to do some stuff to prepare for it. Only then, later on, does it go back and tell the viewer hey, you're clear to actually communicate with simulator B now, and then the viewer communicates. So having good 
latency, and good communication from simulator to simulator, is very important as well. And again, with teleports, I mean, there are delays built into the system, but it is a somewhat time-critical thing; you really do want to be able to do that as quickly as possible. And finally, there's what I've called installation to installation. That's a situation where you need both low latency and good bandwidth, I would say, and good bandwidth mainly because of the need to transfer, in the first case, attachment assets. I know probably some of you have experience, probably ongoing experience, of difficulties with attachments; even in the conference today, I know there are people missing that kind of stuff. So you want good bandwidth there to actually be able to transfer those assets. There's actually been a bit of work on this conference branch, which will get forwarded back into master fairly shortly, to try and improve the reliability of that transfer. There's a lot of optimization one can probably do there, and it's a work in progress, but occasionally we do lose these things. So that's network bottlenecks. What I've not talked about there is stuff like simulator-to-services communication; I've missed a few things like that, and I do think it's important. As we saw with Tranquility's talk earlier today, when you reach a certain level, those things actually do become critical, and maybe there she was talking more about the processing on the service end. But at the scale that most of us are operating at, and I include the conference grid in this, it hasn't proved to be a critical issue. That might be another, advanced talk for another time, and maybe by somebody else, like Tranquility in fact, who did a very good job today. So now we've come 
to processing bottlenecks. This is, of course, CPU and GPU. On the viewer, as you can very well appreciate, the more prims and the more avatars you're trying to display, the higher the load on the viewer, and as you know, that's reflected in your basic frames-per-second measurement. There's not a lot I can say about that; I don't have a huge amount of experience with viewers. But the thing to bear in mind when you construct a scene, as I'm sure many of you are very well aware, is to try and keep down both the prims and, really, the textures as well; that's another thing to try and keep under control. Moving on to the simulator, which of course I know quite a bit more about: it's the same situation. The more prims you have, the more avatars, and the more scripts (not so much the textures, because you don't need to do a lot with those on the simulator side), the higher the load; things like scripts, of course, translate into higher load. And again, this is where a lot of variables come into play. You could have a heavily scripted simulator, and if the scripts are behaving well and don't, say, process a lot of events, then perhaps it won't be such an issue; but if you've got a lot of scripts doing a lot of heavy activity, then that is going to be a problem, and then you can combine that with more prims and more avatars. So that comes down to the perennial question: what kind of CPU do I want to run my simulator? It does become the limiting factor there. In my experience, and this is from running on, or attempting, or seeing people attempt to run on, things like single-core Amazon EC2 servers: OpenSim does not like running on a single core. I don't know if that's true right at the cutting edge, but it's a very heavily threaded system, and we'll talk a bit about that later on. I've always recommended really a minimum of two cores per simulator, and the rule of 
thumb of one core per active region. That certainly used to be the case; you would see a lot of performance degradation if you didn't have enough cores to cover your regions. And again, that's a bit flexible: if you've got regions on which nothing at all is happening, if they're just water, then, as you can appreciate, certainly nowadays (it used to be worse) there's very minimal CPU load from those regions, so you could stretch that and maybe have many regions per core. But if you've got fairly normal, active regions, then I would say, as a rule of thumb, it's one core per region. And then there's the question: do you run one region per simulator, or multiple regions? Theoretically, multiple regions per simulator should be more efficient. There's not been a lot of work done to make it really efficient, because a lot of people run one region per simulator, but at the very least you end up without the overhead of running another Mono or .NET virtual machine. As for 64-bit: there's always debate about 64-bit versus 32-bit. With 32-bit you do use less memory, because the pointers are smaller, as it were; but at the same time, so many machines are 64-bit nowadays that that's where everybody ends up running. The downside to running multiple regions per simulator, of course, is that if the simulator goes down, it's going to take all of those regions down, not just one, and to be honest, that's one of the overriding reasons so many people run just one region per simulator, I would say, as well as the fact that it might be slightly less buggy. But to be honest, I don't think multiple regions per simulator is significantly more buggy than a single one, because I do run multiple regions per simulator quite a lot myself, although not in continuous high usage, I will 
admit. But also, really, I would say a processing bottleneck is the software itself; the software does have certain inefficiencies, and we'll come more to that later. So, to go through storage bottlenecks, as you'll see in the next slide in a second: really, the storage bottleneck, I would say, is in the services, and within that it's asset storage, which dominates everything; that is by far the largest amount of storage being used. So what can you do about that? Well, one thing is to run a de-duplicating asset service, and that is one where, if you upload the same asset twice, then instead of having two copies of that asset in your asset service, you're only going to have one, with two pointers to it. In core development there actually is a de-duplicating asset service, and I do hope to bring that online soon; that will alleviate some of this, and hopefully there will be an automatic migration method where you'll just be able to reduce the storage load over time for existing duplicates of assets. But there's also, today, another project called SRAS, and there are actually a bunch of references in these slides, which I will make available; SRAS is a replacement asset service which also does de-duplication. Right. And there's also the question: is faster storage better? Well, I mean, it's always important; I wouldn't say it's critical, but that's in the usage scenarios I see, where we don't usually have highly concurrent grids. Maybe once you get to a high level of concurrency, that stuff becomes more important, but at the loads I think most of us see, it hasn't been such an issue. The simulator is also going to use a lot of storage, because you want to cache as many assets at the simulator as possible, especially in the hypergrid case. In the hypergrid case, for instance, when an avatar teleports into another, far region, if that region doesn't have the 
assets for the attachments already, it goes back to the originating grid and asks its exported asset service for those attachments, and that can take quite a long time, as many of you have probably experienced. So you want to try and keep that stuff cached as much as possible. To be honest, it also gets loaded into the local asset service, so it's there as well, but it's always nice to have it as close to the simulator as possible. Regarding memory versus file cache: to be honest, I think memory caching was experimented with, but most people found it not very useful, because for a memory cache to actually cache any textures, you already need a lot of memory, and the fact is that transporting those assets to the viewer is dominated by the network latency. Network latency is an order of magnitude higher than storage latency, and it was just found that it didn't make a sufficient difference; so a file cache is perfectly reasonable. So the final aspect was software bottlenecks. You'll see in a second that one of them, of course, is the viewer, and I know nothing about the viewer, so I will not say much; well, "nothing" is a bit strong, but I know very little about the viewer, relatively speaking, so I'm not going to say too much about that. But basically, services. It'll take a little bit; there we go. Okay. So, just to recap on caching: from my experience, and I think for most people, it's the file cache, which is perfectly sufficient; memory is just too difficult, you need too much of it. Now, services. I always have this feeling, because in OpenSimulator we use this embedded C# web server, that it's inefficient, but really I should just say that's a hunch of mine. There is a little bit of measurement stuff, but nothing really, and other people do say that it's perfectly fine. So it's debatable; part of me keeps wanting to say it's inefficient, but that's probably because I haven't really done the 
measurements. But really, the major software bottleneck is the simulator, and really, this is because it will hand out threads freely to almost anything. Anything that the simulator does, or at least a lot of it, ends up being done in another thread; we're very thread-happy about doing that. And that's great if you've got a huge number of cores, I expect, but the scale of jobs is so high that it's very easy to overwhelm the amount of processing you have available. I feel, and some of this is work in progress, that you end up doing a large amount of context switching, and it also holds up other jobs. So I've actually made various changes, I'm calling them improvements, for the conference, and I think they are, but they are changes in that area. There's also the question of Mono versus Windows. I think Windows .NET used to be a lot, lot better than Mono, but nowadays, and again I'm not backing this up with anything other than anecdote, I think there's not such a big difference. I'm perfectly happy running Mono for this stuff; there are differences, but I don't think the performance difference is huge. But again, that might be something that other people have other experiences with. Another thing is that you can offload some of the work, like GetTexture and GetMesh, to a service; in the references at the end there's actually the wiki page where I wrote up the configuration for doing that, which one can look at. That means the viewer gets the textures directly from the public service rather than from the simulator, which may help, but again we haven't done great measurement on that. And then there's handling bad foreign installations; maybe that's a bit of a phrase, but it means installations where, for instance, if you do a teleport and you ask the originating grid, hey, could you give me the assets for these attachments, then a badly behaving grid might open the connection 
but never actually respond to you, so you wait for the request to time out, which can take as long as a hundred seconds, and then you end up with things like people not seeing their attachments, or other kinds of issues. Of course, you can't avoid bad simulators or bad installations, but you want to try and handle them as well as possible. So, I should really remember to hit next on the slides before my time is up. Coming to the measuring and testing questions: again, I've run through these rules of thumb, but they really are rules of thumb. If you want a really accurate measurement of your simulator, you want to actually measure this stuff, because every install can be different; there's a very large number of moving parts, and you could set it up any number of ways. As we've seen in some of the talks earlier on, there's a huge number of ways you can configure this stuff. So the question becomes: how do you identify and reproduce an issue? If you're seeing a problem with your grid, how do you actually work out what it is, and either fix it yourself, because it's some kind of configuration issue or maybe some other component issue, or, of course, bring it to the developers' attention so that maybe we can do something about it? Often that's much more than 50% of the work; once you can identify an issue, then you can fix it, but identifying issues, particularly in performance situations, is really difficult. So that brings us on to the topic of measuring, and there have been a fair few generations of statistics systems now, which is kind of a good thing, but that's how it is. The latest, and I say dominant because it's the one I've been working on for a long time, is the one called show stats. You can see this on any simulator, and even on the services as well, by typing show stats all in the console, and 
that will show you, as you'll see in the slide in a second, a large number of raw numbers and moving averages for various things going on in the simulator. It also has a facility to periodically record that data to a separate file for later analysis. So I'm going to show you very quickly, and I really should have clicked next earlier; I should stop mentioning that, because it's not good, is it? So there's a large number of things here, and a lot of these are obscure, right? For a lot of them, I know for a fact you've got to know the internals, how the stuff is working, to make sense of them. But I am trying to document them, and there'll be another reference to where I'm documenting them on the wiki, and other people, of course, are free to document stuff as well. You'll see here, for instance, if I type show stats all at a simulator prompt, you'll see different categories. There's the client stack there, on a region called Keynote 1, and it has a stat to show you how many clients have been logged out because of packet timeouts: say 60 seconds go by and we don't receive a packet, we log that client out. So you can see how many times that has happened, and that might give you a clue as to whether something's going wrong. Some of the inbox and incoming packet stats are complicated things; they tell you a lot about how UDP data is building up in the system, whether we're processing it quickly enough or whether it's backing up. Stuff like teleport attempts: how many teleport attempts have succeeded, how many have failed. How many HTTP requests have we served, here on port 5000, on one of the HTTP servers. Some of the stuff around the scene, and I've missed out a lot; this is just a visible selection of the stats. And information about memory, and how 
many things like, for instance, at the bottom, what CPU percent we're using. So there's a lot of data there that one can look at, and the reference at the bottom (the references are at the end of this presentation, by the way) goes to information about what some of this stuff means. If anybody wants to know what an obscure stat means, please ask on the mailing list; I'll be very happy to write it up. I don't do that in advance, because it can sometimes take such a long time to document, so, as mentioned earlier, I do it a little bit on demand, which may be a bit naughty, but there you go. So what can we do with recorded data? You'll see there, and this is a bit of a graph; could you pan out a little bit, Dave? Okay. On this graph you're seeing CPU usage, that CPU percent stat we saw a little bit earlier, actually graphed here. We've got a bunch of samples along the bottom, one every five seconds: if you execute the command stats record start, then every five seconds it records a set of sample data, and then we can go and graph it. I've actually written a lot of code this year to do this graphing, which is one of the things that took some time, but now I can see, for instance (this is one of the conference test runs) that at a certain point we start using quite a lot of CPU, but we never hit the peak. So this was useful to know: the conference, for a 400-connection load, is operating within bounds; we're not running out of CPU, at least not in any obvious manner. This was really useful to know. And I've just lost my point; I should have gone through these things a little more quickly, but I don't want to screw Dave up. So the next graph you'll see in a moment is something called agent time, which is the kind 
of section of the scene loop which measures how long it takes to process some of the agent handling: basically, sending out a certain part of the data to agents; not all of it, but some of the processing required to send information to other observing viewers, to say that a user has moved about. You'll see from the graph there, once it appears, and this was actually for the four keynote regions, that the thing you immediately notice is these massive peaks on Keynote 1. This was one of the performance runs (I'd already changed things by then, which was a pity, but you can still see the peaks) where we actually met freezing in the scene in certain cases, and this showed that it was in the agent time part, and I was able to make some changes which try to avoid that kind of thing. And this next graph is thread pool work items waiting: items which are queued for processing but haven't got a thread to actually do them yet. You can see again there are peaks here, and this is not what you want in a simulator. A peak in work items waiting is a bunch of work that's not happening; maybe that's critical work, maybe it isn't, but it's the kind of thing one needs to look into if one is trying to get an absolutely smooth experience. So there's a lot of data one can get out of that, and all this graphing code is available in my public tools repository; you'll see a reference at the end called graphing. It uses Python, basically, to run through some of those stats files; you can choose which data to see, and it produces a set of graphs like the ones I just showed you. They're primitive at the moment; it doesn't do some things, like filtering out outliers, so if you've got some strange data points, the 
graphs kind of become the scaling becomes bad it doesn't do time series alignments you'll see in the graphs that I was just doing samples I wasn't doing time but all that stuff can be done it's a matter of basically coding and this can be of course very useful in diagnosing problems although you do need to know really what some of these numbers mean so that brings us briefly on to testing and I know I think yes okay that brings us on to testing and again it's tools one of the ones which is very useful is IPERF which can measure throughput on TCP and UDP and latency stuff basically completely outside OpenSim and that is kind of like if you're having network problems that's a very good tool just to kind of establish if your network itself is okay before actually looking at OpenSim issues and then service testing there's a very little bit of this I actually wrote a very basic texture load test thing which again is going to be at the reference on by GitHub repository but really there's not a lot of that stuff and I think actually might be some tools in OpenSim itself I've looked at them recently which actually can do some of the service cores but I'm not sure they kind of like work for stuff under load I think it might be more kind of actually checking if the stuff is working or not and then there's kind of general test tools for HTTP such as Siege again a reference at the end which might be helpful for trying to test services but I couldn't say completely for sure so really the next the key thing about testing is how do we test the simulator? 
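Before we get to that: the kind of stats-file graphing I just described can be sketched in a few lines of Python. This is a hypothetical minimal version, not the actual tool from my repository — in particular it assumes a made-up file layout of one comma-separated sample row per line with a header naming each stat, so check the real "stats record" output before relying on it — but it shows the parsing, plus the outlier filtering that I mentioned is still missing:

```python
# Hypothetical sketch of graphing recorded simulator stats.
# Assumption: the stats file is CSV with a header row naming each stat
# and one sample row every five seconds -- the real "stats record"
# output format may well differ, so treat this as illustration only.
import csv
import statistics

def load_stat(path, column):
    """Read one named stat column from a recorded stats file as floats."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)]

def filter_outliers(samples, k=3.0):
    """Drop samples more than k standard deviations from the mean, so a
    single bogus data point does not wreck the graph's vertical scale."""
    if len(samples) < 2:
        return list(samples)
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    if sd == 0:
        return list(samples)
    return [s for s in samples if abs(s - mean) <= k * sd]

# Synthetic example: a flat CPU trace with one bogus spike.
trace = [22.0] * 60 + [9999.0]
clean = filter_outliers(trace)  # the spike is dropped, the 22.0s remain
# To actually plot, hand `clean` to e.g. matplotlib:
#   plt.plot([i * 5 for i in range(len(clean))], clean)
```

Adding time-series alignment on top — tagging each sample with a wall-clock time instead of a sample index — would let the same data be compared across runs, which is the other gap I mentioned.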
My favorite tool, which I have worked on a lot, is pCampBot, which is a way of loading a lot of bots onto a simulator at once through libopenmetaverse and managing them fairly easily. It's practically impossible to get a hundred real people, so this is the next best thing. There are different behaviors in that tool — you can make the bots teleport and walk around and stress your simulator in different ways — and it was the major tool I used for performance testing for the conference. One caveat, of course, is that it's not a real viewer: the bots do things like walk about if you tell them to, but they're not simulating a viewer completely faithfully, and making them behave more like a real viewer is probably going to be another focus.

So let me very quickly talk about the conference itself. Getting 400 connections into the same space was a huge challenge for the keynotes, and I can't say for sure whether we met it, because a lot of that was synthetic bot load, which as I said is not the same thing — but we did get there at least with bots, and you've got to do that at the minimum. The problem with connections is that the load really grows quadratically, not linearly: every new connection to a simulator has to have all its movements seen by everybody else, and it has to see all the other people already there. So the more connections you have, the higher the load per connection, and getting from 300 to 400 is massively more challenging than getting from 200 to 300, I would say. The other issue is that I'm not sure getting 400 connections in the same space has been done much at all. I know it's been done in the Intel Distributed Scene Graph work, but I think a lot of that was synthetic load too, and it's very difficult to say you've done it until you've actually done it with real people. So there are big challenges there.

One consideration in trying to get to 400 is whether the other parts of the grid can take it — and they pretty much can; it was not a huge issue, and the services did not seem to buckle under the load. There are techniques for running multiple copies of services, and they're being detailed in the wiki, but I don't think we made enormous use of them, and I don't think that was a major point for us. The other issue is bandwidth, and we're very lucky here in having very, very good bandwidth, so it wasn't such an issue for us — but it can be; you've really got to take care of your connection. Going back to the earlier rule of thumb: if you allow 500 kilobits per avatar, then 400 connections is about 200 megabits, and that can be a lot of data. So it's a question of: do we have that? Getting people to come in ahead of time and cache assets in their viewer really helps. Do we have the CPU is another question — we're pushing it maybe a little bit, with 28 regions on 24 cores, and yes, this is all on a single machine. It appears to be running very well, actually, so it's not too bad, but of course we didn't actually see 400, so it's somewhat up in the air — it's the kind of thing where you really don't know how it's going to perform until you get real-world load.

The other question was: do we want to use a var region, or do we want to run four neighboring regions? As you've seen in the keynotes, we ran four neighbors. This was firstly because of fault tolerance: as I said earlier, there was still a fear that regions could go down, because one of the keynotes did go down last year, and if you're running everything on one simulator in a var region, then all four quarters are going to go down at once, whereas if you're running four separate regions, at least the other 300 people are potentially still going to be there. There's also the question of code maturity: var regions are very good, but they're also new, and my very strong feeling is that other parts of OpenSim have not been scaled up to cope with, say, 400 people in a var region. So that was another reason for not going down that route.

So what was the process behind this? I know I've really not a lot of time left, and I do want to leave some time for Q&A, but let me go through it — the next slide is the process. As you know, we were doing regular Tuesday load tests, which got real people onto the grid. That is of course enormously helpful, because as I keep saying, bot load is good but it simply doesn't simulate real people and all the crazy things real people can do — no offense intended. At first we ran the bots locally, but that was actually really difficult to do, because pCampBot itself places a big stress on the system — it really is trying to emulate real load, so you can't do certain kinds of optimization on it, because that's not the aim. So we started running the bots from EC2 instead, which was very successful: it's very easy to spin up EC2 machines, run pCampBot, and hammer a simulator. We also did stats recording and analysis in those sessions. Then, once we identified a problem, it was a matter of trying to reproduce it — it's very hard to fix a problem if you can't reproduce it, if you've got to wait for every Tuesday's test, and the next test might be slightly different. So it was a case of trying to get the same problem to show up under simulated load, and we did manage that in many cases. Then it was a case of either changing OpenSimulator code or making config changes, and then repeating until you're completely exhausted and can't do it any more. Okay — I'm being quick, so
what were the problems that emerged? One of the things — you'll see a slide on this in a second — was that there was a lot of need to improve pCampBot itself. The test tools were not good enough, and that actually took a lot of the time, because if you've got bugs in your test tools then you can't rely on their output and what they're telling you. So a lot of bugs got fixed in pCampBot. Then there was the need to write the graphing and analysis code, because none of it existed until I wrote it this year — another major source of work. And of course, to identify a problem you need data, so we had to add more and more stats to the simulator to tell what was going on inside the black box.

One of the things I did do consciously — and I don't know how necessary it was, in a sense — was reduce the UDP traffic: there are certain things you can do to reduce the amount of data you need to send out to every client, and that helps the processing and, of course, your network to other people. We also worked on OpenSimulator not doing so many things at once — it is very thread-happy — trying to control that a little, because it had the potential to disrupt things. Physics was another area we looked at, but to be honest physics was not a big issue for this kind of event, where people are mostly sitting down and not placing any stress on physics at all. Especially with BulletSim — BulletSim coped with the physics load we were asking of it very well, so that was not an issue.

So, to go through the things we actually did: the first thing was various configuration tweaks, again mostly reducing the UDP traffic and some of the other communication traffic.

The first is something called rotation update tolerance. When you rotate your avatar — as I won't demonstrate, because I'll probably screw up the voice — any rotation beyond a certain tolerance gets sent to the other people observing that avatar, so they can see you've rotated. It turns out you can get away with sending that much less often, because you're only ever seeing an avatar from so many angles anyway. There's a config setting for that — our value is not the default at the moment — and you can increase the tolerance to reduce the UDP traffic load.

Then there are things like child reprioritization distance, which I won't go into in detail because we're running out of time, but that was a tweak to reduce thread work. It concerns how often your avatar's position is communicated to neighboring (child) regions: if you've got a region next door, and you want to know whether somebody there should be able to hear chat, you need to keep communicating that data to the next-door server. But we weren't interested in that for the keynotes, because in the keynotes anybody can hear anybody anywhere, so there was no need to send that data, and I bunged the number up to stop the messages going through. That one is conference-specific.

Finally, there's a thing called child terse update period, which concerns observing avatars in a neighboring region, and it turns out you can skip a lot of those messages too: I ended up skipping one out of every four updates from neighboring regions — sorry, three out of every four, right, got it — with no actual discernible difference.

Another set of configuration tweaks was to tune the UDP throttles. There's a thing called adaptive throttles, which changes the throttle settings automatically depending on whether the client is losing packets, and to be honest, at very high load that did not work well at all. I don't know if it was something we were doing, but we saw very bad behavior from the throttles, so we ended up disabling them. That's probably something I need to go back and look at in the code itself, and it's one reason why there's a separate branch for this stuff rather than it all being mainline — we've got to fix that properly. We also ended up imposing limits on the scene and the client: a maximum of 400 megabits going out from the scene, and a maximum of one megabit per client — so even if you have the throttle set higher in your viewer, you're only going to get one megabit of data. That's another control point.

I'm going to leave five minutes for questions, so this will be quick. Some of these tweaks could become standard; a lot of them are keynote-specific, so if you're trying to run an event like the keynote then you'll want to look at them manually — but they're all configuration settings; there's no code you need to change.

So let me very quickly talk about software changes. Some of it was pure performance: I ended up changing some of the XML processing from document-based to stream-based. Processing XML as a document is very easy and very handy, but it loads everything into memory at once, and sometimes, when you've got huge attachment assets, you end up taking a lot of memory, the garbage collection load appears to be very high, and CPU usage is very high too. So I did some work to process the stream instead.

Then there's perceived performance. One thing you might notice is that movement for your own avatar is generally very good, and that's because we're
not queuing those movement packets — we avoid one of the queues for them — and I actually think that's a good thing: when you move your avatar, you want to see it move. When you move your avatar and it's laggy, or not moving at all, my perception is that that's a bad experience, and I think it's better to prioritize those packets — but maybe that's somewhat debatable.

Again, the major software change was to make OpenSim vastly less thread-happy, and that was done by queuing certain kinds of processing: instead of trying to do all the appearance and attachment changes at once, we queue them and do them sequentially, and the same for handling some incoming and outgoing UDP work. The problem with that approach is that it's complex, because when you start doing things sequentially you desperately need to avoid one task blocking all the others, so you've got to be very cautious about that stuff, and I think it's going to remain an active area.

There's a brief slide on future directions, but I'm not going to talk about it much — basically, I do think there's room for more efficiency, and there's always room for bug fixes. As you saw, we're not even at the top of the CPU usage for this region yet. You'll see that slide flash by — this was a very quick talk, and I did have two more slides — but thank you very much, and I'd just like to ask: are there any questions? I really gabbled through that, I feel.

Someone asks what the most important piece of performance tuning was. That's a very good question, and it's very hard to say, actually, because what happens, as you can probably appreciate, is that you fix one performance problem and almost immediately run into the next. From my perspective, the most important thing was actually making avatars move a lot more responsively — at least your own avatar, not other people's. That really makes me feel better about moving around the environment, and it's very much a perceptual thing, because it's not necessarily improving performance at all; you're just changing the priority of how the data goes out. So I would say that was the key thing for me, which is interesting, because it wasn't a pure performance thing.

Next, a question about slow requests: anybody who runs a simulator can expect to see a huge number of "slow request" messages, and the problem is that those can originate from so many sources. The place where you usually see them is on outgoing requests from the simulator, so I would say a lot of the time it's actually the services — services can be very slow to respond in certain situations, and I didn't really cover that. The asset service especially can be slow to respond, since you're putting a lot of data out there. There is also this debate about whether a file system or a database is better for storing assets, and from what I've read on the net there's no consensus: some people say files, some people say database, and there's no clear "one's better than the other". I know some people are much keener on files, but I'm not convinced, because I'm not seeing anybody give good numbers to show one is better than the other. On NoSQL — I think the stuff Tranquility talked about today was very interesting, because he's talking about being able to scale these things, especially in heavy inventory situations; certainly that could help for services, if you do have slow service response.

The thing with all this asynchronous processing is actually managing it. This is a complicated area — and really I'm just doing this myself, and I don't always know what other developers think of what's happening — but it's almost to the point where you want a scheduler in OpenSim itself to schedule all the tasks being attempted. I'm a bit uncomfortable with that, because it's almost something you should leave to the operating system, but we have so much asynchronous stuff going on, and some of it is higher priority than the rest, that it's almost to the point where you want to manage it much better than the fairly ad hoc ways we're doing it now. That's actually an interesting future direction: it would be nice to explore whether we can manage things much better and get to much higher CPU utilization on a simulator than we have now.

So — Serendipity — I would love to see people experiment, but what I always want to see is hard numbers. There's a lot of anecdotal talk of "hey, approach X is better than approach Y", but I'm a hard-data person, so I want to see those approaches attempted, with "here's my study, this approach is a lot better — look at these numbers". That's the kind of thing that would convince me; other than that, I'm always a bit skeptical — I'm the cynic. I do think there is a lot of room for improvement — I don't think we're anywhere near great software efficiency with this stuff — and some of that is because it's a complex architecture. Even a simple grid is like running a really complicated website, and as you know, websites approach these problems in very different ways. I'm not saying anybody has to be on the scale of Facebook, for instance, but there are many different ways of approaching these problems.

To be honest, the thing is, I want people to use this stuff. I want the out-of-the-box experience for OpenSim to be a good one; I want it
to perform well, and maybe we make some compromises for that. We do, for instance, have SQLite as the default database, and that's because it's very easy to get going with SQLite, whereas MySQL definitely performs better in pretty much any situation — but at the same time, you want to be accessible. In general I want a good out-of-the-box experience, and I'm always keen on improving the basic services, but at the same time I have to say that if you've got a lot of technical capacity to throw at something, there are inevitably improvements you can make. If you're running a big grid and you soup up your services — maybe you do something fancy with Apache Cassandra — then yes, that might be better than running out-of-the-box OpenSim, and I think that's just inevitable, unfortunately. It's like running a complicated website: you can't simply bung Apache on and have it serve every website ever; at a certain level of complexity you inevitably end up doing more. That's just my feeling on that kind of stuff.

On custom Mono settings — Clovis's question — there are a lot of threading settings in OpenSim, and at the moment we have a default thread pool setting which is massively overcommitted for the number of cores anybody ever has. I'm not sure that's the best thing, but it's a complicated question. The luxury you have when you send everything off to its own thread pool thread is that if one thing is held up for some reason — maybe it's making a request to a back-end asset service, maybe across the Hypergrid, and that asset service does not respond in time — then it's not holding up any other work; the other threads can still proceed. So it's nice: you get rid of the problem of trying to manage that stuff if you just throw it onto a thread. But of course, if you just throw everything onto another thread, you end up trying to process 500 things at once — as we have seen from my stats, you do end up in that situation — and the whole thing just seems to grind to a halt as you frantically try to progress 500 threads on a 24-core system. It just doesn't work very well.

On physical servers versus cloud servers: I've seen OpenSim run very well, or at least run well at lower concurrency, on EC2 servers, and of course I know Kitely are using EC2. I believe they do it in such a way that they use very large instances and manage individual regions on those, rather than spinning up lots more instances. Other than that, it seems to work very well — virtualization has come on hugely lately; it's how so many people do things anyway, with cloud systems and all the rest of it. And if you do virtualize, it can give you much better management options, being able to spin up simulators without having to tie them to particular physical servers. So I think it's perfectly fine, but of course you do want to make sure you've got sufficient CPU capacity and that kind of thing. My belief is that if you know you're going to be using a certain amount of physical processing, you are better off getting the physical machines. It's more that in some situations — education, for instance; some of the education people I've talked to actually do have large numbers of machines — a virtual system is very handy, because they can control the number of virtual machines on the physical servers they've already got. But if you're just running a grid and you know you're going to use a physical server, then maybe that's better. Of course, there's then the question of how you balance sims across servers: if people are holding a party on, say, two sims running on the same server, then in an ideal world you might separate them out onto different servers, but then you're getting into a complicated management situation. I don't know all the answers — these are all very good technical questions.

Okay, does anybody have a last question? Sorry — a last question; just type it, that might be better. Actually, one thing I did want to say quickly — yes, the question about whether these changes will land in master is exactly what I was about to address. As you know, I've been doing a lot of this work on a separate branch called "ghosts". That was originally because, as you may have noticed, at the conference nobody can bump into each other: there was a hack done fairly quickly to stop avatar-to-avatar collisions, back when I thought physics might be an issue — or appeared to be an issue, though that turned out to be more of a bot problem in the end. Then there were certain other changes I wanted to make very quickly, which I knew would work for the conference, but which I didn't know would necessarily work for every single use case out there. Many of them might get more debate from people — and of course we can always have that debate on master; these things are always open to debate and questioning, they're not fixed in stone. But I do want to go back through the changes I've made and think about them, especially fixing some of the hacks like the avatar-to-avatar collisions, and then, once that's fixed up, merge to master, so everybody can just use that and we can go forward again. I hope to do that fairly soon — I do need to take a holiday, just a few days off at some point — but hopefully that will happen fairly soon, and then everybody can see the improvements we've hopefully been able to make.

Okay, thanks a lot everyone.