Thanks. Hi, everybody. My name is Evan, and I work on a project called StatusNet. Today I'm going to tell you how StatusNet works: I'll go over the architecture, why we created it (or why I created it), what StatusNet does, and how it does it. Because we're in the scalability track, I'm also going to talk a little bit about how Identi.ca scales and how StatusNet scales. We're not Facebook. We don't do a lot of the things that Facebook does. But if you're just starting out, say, making the move from a medium-sized website to a large one, we may have some interesting things for you to think about or do. Other people might find some of our scaling tips less interesting, but you can just follow along or feel smug or whatever. I'm going to go through four parts in this talk. The first is the problem that StatusNet tries to address. Sorry about that. Is that a little whistly, huh? I should think. Anyway, the problem that StatusNet tries to address. The second is the requirements, boiling down what we really wanted to come out of the StatusNet project. The third, and by far the longest, is the architecture of StatusNet: the different parts and how they work together. That's specifically where I get into the scaling issues. Finally, I'm going to talk about the future and where StatusNet is going. Can I ask a real quick question? How many people in here are on Identi.ca? Awesome. That's great. I like seeing that. How many people have taken a look at the StatusNet software at all? Good. How many have installed StatusNet and set it up? Okay. Okay. I want to see more of those last hands. That's really where I want to be going. Also, I know this is kind of a bold statement, but I think it's really important for us to say it: in 2010, the most important software that we as free software people can be working on is web software. Web software is where it's at. 
We are splitting the desktop paradigm up into small clients and large clouds, and if we want to be players in this world, we have to be part of the web. The web has become social. Sociality is an important part of what people use the web for, and finally, social has become status. What do I mean by that? Sites keep asking people questions: What are you doing? What's on your mind? What are you working on? These little boxes pop up everywhere, and this is what people interact with on the web: a little box asking you a question. Status matters. Status is an important part of the social web, and a very popular part. Everyone who knows about the popularity of Twitter knows that this is an important part of the current landscape of the web. Everyone who knows Facebook knows that status updates are a central part of that system. But these status updates are, for the most part, completely disconnected. Status updates from different parts of the web don't come together, so this mesh of sociality is not fully connected. And these systems are, by and large, closed. I think it's fascinating to see how many large websites use open source software to build very large proprietary systems whose code we never get to see. It's cool that we get to see the infrastructure pieces, but for the most part, very big websites keep their code to themselves. You can play with the client side. Sometimes you get an API or some kind of data feed. But the server side belongs to someone else, and you can't control the rules of how that server works. And for the most part, that server side is what represents you to the rest of the world. If you have an account on one of these systems, that server side is you; the client side is just a little listening device. We need social software that we can own, that we can install and manage ourselves. We need social software that meshes into the existing social web. This social web right now is huge. Facebook has something like 400 million users. 
Twitter's got another 70 million users. MySpace has God knows how many users. It's like billions of people out there using the social web. If we think we're going to move everyone to a different social web, that's just not happening. We have to mesh into what exists right now. But, as usual, free and open source software can lead the way to a more open web. Communication revolutions on the internet have always depended on free and open source implementations of open protocols to move the standards forward. That's because people experimenting with those standards need a no-cost, hackable way to get connected to a new network. Email had Sendmail, the web had Apache, blogging had WordPress. We need free and open source software for making status networks. Now, some of you are probably saying, wait, this is kind of trivial stuff. We're hackers; we don't really mess around with that kind of thing. It's kind of silly. We don't do social, we're not social people. We do programming languages and text editors. That's our hacker stuff, right? Social software is for those shallow jerks who really care about SEO and personal marketing and stuff, right? That's not us. Well, that's wrong. That's just not the case. Social software is how people connect in our time, in 2010. It is the most important software that people are using right now. It connects people with their friends and their families. We all have them, and for many people, their most important relationships are made and maintained online. If free and open source software means anything, if it's going to be meaningful and relevant in the world of the 2010s, then we have to be participating in the social web, and we have to have a presence there. If we can't take on that challenge, then what we do is pretty much pointless. We should just give it up, because the world is moving to social software. 
And if we are not moving with it, we're blowing it. So I'm going to say it one more time: we need free and open source web software for making status networks. All right, so you can tell that I care about this stuff. Now I'm going to lay out some of the requirements I see for that kind of software. First and foremost, I think any kind of social web or status software has to work with the internet as it exists right now. I know this is going to sound really basic to most people, but I get these questions all the time. I don't think we can require changes in the web architecture just to make our software work. This is about playing catch-up, and we need to work with the internet as it is now rather than try to change it. And the internet as it is now has many small, unaddressable clients. We're all using one right now, right? Ten minutes ago you didn't know the IP address you're using right now, and no one can reach you on it yet. Then we have a few large, addressable servers: our web servers, our email servers. It's a hub-and-spoke architecture. It's the web; it's the way things work right now. P2P mesh networks are awesome; they're great. I'm really looking forward to when we have a P2P mesh network going, and it's going to be awesome. But if we wait to participate in innovation on the social web until we've got the P2P mesh, blah, blah, blah, utopia, we're doomed. That's a long way down the line, and I'm not going to wait for it. However, I do think the way we can participate in this social web is by providing the piece of it that is distributed and decentralized. SMTP is a good model of how a social web can work in a distributed, decentralized way. We form natural groups around workplaces, areas of interest, geography, or just where you get your network connectivity, your ISP or whatever. 
And then these networks are federated through simple, agreed-upon standards. Third, I think the social web software we develop has to be able to be private or public; we have to allow one or both. Different kinds of groups are going to approach social networks in different ways. Some people are going to be really ready for this meshed, distributed, decentralized system. Other people are going to want to set up their own private social networks and see how it works first, and we have to make that possible. Fourth, and very importantly, is scalability. Now, "scalable" is the way web people say "not broken," right? They ask, is it scalable? And that just means, is it going to work? So when I say scalable, I mean: can it work at different points on a size scale? Picture a logarithmic scale of the size of a community or group. It goes all the way from one person on the far left up to hundreds of millions of people on the right, and we need to be able to reach across that whole scale. At the small end are groups as small as an individual or a family. Next come larger groups like a school, a church, or a small company. Then there are much bigger groups: a global web community (something like Slashdot), a large town, a university, or a large company. And at the top are very large groups and very large installations, like an ISP or a mobile phone network providing an installation, or a huge consumer website. I think the software we develop has to work across this entire scale. It has to be easy to install at the small end; an individual should be able to set up and use our software. But there should be a clear upgrade path as groups grow or shrink, or as the software is applied to different purposes. Finally, I think any kind of important software has to follow the developing trend in social software of being accessible anywhere. 
By that I mean you should be able to reach the software from a mobile device or a desktop device, and it should be accessible from other websites. People have gotten really used to moving their data in and out of different devices easily, so we have to build secure APIs into the software that provide remote access. So now that I've laid out my requirements, I'm going to go into the architecture of StatusNet. And I'm doing pretty well on time. This is where I do the deep dive, right? So if anything in here sounds really crazy or weird, ask me at the end, because there's probably a good answer. This is also where I go from big generalities to my actual personal ideas and beliefs about developing StatusNet, where the rubber hits the road. If you're developing an open source web application today, that means the LAMP stack, right? You can take all your different frameworks and blah, blah, blah. But if you want wide distribution of an open source platform on the web today, it's got to be LAMP. LAMP is Linux, Apache, MySQL, and PHP, right? You can swap things around: put FreeBSD in where Linux used to be, or put lighttpd or PostgreSQL in there, or whatever. But the basic architecture is going to be a web front end, PHP doing the execution, and an RDBMS providing the back end. This is what's available on commodity web hosting right now. It's also what's available on mid-sized web hosting; you have to move pretty far up the line before you can use any other programming languages or runtimes. So anything that's supposed to have wide distribution, to be used by a lot of people, is going to have to use this kind of stack. If it's going to be easy and cheap to install, it's got to be LAMP. The other great thing is that there is a huge community of FLOSS LAMP developers out there. 
There are great packages already in use: MediaWiki, Drupal, WordPress, the list goes on and on. That's one of the main reasons we went with LAMP, or I went with LAMP, in developing this software. If you want to reach that great group of developers, you have to meet them with the languages and systems they're already working with. For the StatusNet software, I chose not to use a web framework like CakePHP or Symfony or whatever. There are a lot of really nice web frameworks for PHP, but I decided not to use them, for a couple of reasons. One is that these frameworks tend to be aimed at people developing software for their own internal use. That can make it a little difficult to build software you're going to redistribute and share. The other is that when I use frameworks, I find myself writing to the framework instead of writing to the requirements, and I really wanted to write software that would work for the problem at hand. So StatusNet, as it is, does not use a web framework. Instead, it uses a very simple model for its web processing that practically anyone in this room who has done even remotely any software development will recognize. We have a front controller that instantiates a particular action based on the parameters that come in through GET or the URL. That action uses one of a number of data classes to access the database, performs business operations on that data, and generates output for the user. Our controller uses a PEAR library called Net_URL_Mapper, which is kind of nifty. I think it's pretty cool because it maps URLs backwards and forwards. If you provide it with a path, it will generate the parameters, and if you provide it with parameters, it'll generate the path. I think that's pretty neat, and it works pretty well. DB_DataObject is the framework we use for database access. This is another PEAR package. It's okay. 
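To make the front-controller model and the two-way URL mapping concrete, here is a minimal Python sketch. It is not Net_URL_Mapper's API, which is a PHP/PEAR library. The `UrlMapper` class, the `:name` placeholder syntax, the route patterns, and the `ACTIONS` handlers are all hypothetical, made up for illustration.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, str]
_PLACEHOLDER = re.compile(r":(\w+)")  # ":id" style placeholders (assumed syntax)


class UrlMapper:
    """Hypothetical two-way router: path -> parameters, parameters -> path."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, Params, "re.Pattern[str]"]] = []

    def connect(self, pattern: str, defaults: Params) -> None:
        # "/notice/:id" becomes the regex ^/notice/(?P<id>[^/]+)$
        regex = _PLACEHOLDER.sub(r"(?P<\1>[^/]+)", pattern)
        self._routes.append((pattern, defaults, re.compile("^" + regex + "$")))

    def match(self, path: str) -> Optional[Params]:
        # Path -> parameters; the first route that matches wins
        for _, defaults, regex in self._routes:
            m = regex.match(path)
            if m:
                return {**defaults, **m.groupdict()}
        return None

    def generate(self, params: Params) -> Optional[str]:
        # Parameters -> path; the route must use exactly the given keys
        for pattern, defaults, regex in self._routes:
            if any(params.get(k) != v for k, v in defaults.items()):
                continue
            if set(params) != set(defaults) | set(regex.groupindex):
                continue
            return _PLACEHOLDER.sub(lambda m: params[m.group(1)], pattern)
        return None


# Front controller: the "action" parameter selects which handler runs.
ACTIONS: Dict[str, Callable[[Params], str]] = {
    "shownotice": lambda p: f"<p>Notice #{p['id']}</p>",
    "all": lambda p: f"<p>Timeline for {p['nickname']}</p>",
}

mapper = UrlMapper()
mapper.connect("/notice/:id", {"action": "shownotice"})
mapper.connect("/:nickname/all", {"action": "all"})


def dispatch(path: str) -> str:
    params = mapper.match(path)
    if params is None:
        return "<p>404 Not Found</p>"
    return ACTIONS[params["action"]](params)
```

For example, `dispatch("/evan/all")` runs the `all` action. Calling `mapper.generate({"action": "all", "nickname": "evan"})` builds `/evan/all` back from the parameters. Keeping both directions in one route table means links are never hand-assembled, so they stay consistent with the URLs the controller accepts.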
Underneath, it can use either PEAR DB or PEAR MDB2. It does not yet support PDO, which is kind of depressing, but it'll be nice when it eventually does. I think PEAR libraries are great, and I totally recommend people use them. They're usually the best thing you can get, and hey, someone's going to work on them eventually, right? Or they're going to fork them and make a new one. So I like using PEAR libraries. Now for some of the things we model in our system, and this is where I start getting a little bit floaty. Our system has users, obviously, and users have a presence. We also model remote users: if you subscribe to someone on a remote system, we keep a kind of mirror record of that person's profile information. A profile is information about you. What's your name? What's your biography? What's your location? What's your homepage? Things like that. We also keep avatars in various sizes, stored with the profiles. What other social networks call following or friending, we call subscriptions, just because that's not very loaded. It works okay for business use as well as personal use. Saying that your boss is your friend doesn't always make a lot of sense, and telling your friends to follow you also seems kind of asinine and rude. Simply subscribing to someone's information is neutral; it doesn't carry a lot of loaded sense. We store subscriptions as a separate entity within the system. Subscriptions have some relational attributes that deal with how information in the subscription gets delivered. Finally, and most importantly, in this core triumvirate of users, subscriptions, and notices, there is the notice class. Notices would be called status updates or tweets in other systems. 
We call them notices, again, because "tweet" is a stupid name. Notices have a considerable amount of data attached. Here are some of the interesting things about the way we store them. Notices are plain-text objects of variable size. They can be limited to 140 characters to be Twitter-compatible, or they can be of unlimited length if that's what you want. They're rendered at save time. We take all those funny little codes, like the at sign and the hash sign, and turn them into HTML, and we derive all the extra metadata about the notice at that point. They're stored individually; we don't store them in any kind of bunch or bundle, which some messaging systems try to do. Each notice has a unique, auto-incrementing integer ID. Notices also refer to various other records in the database: their author, the notice they may be in reply to, and the notice they may be a repeat (a retweet) of. We also keep a conversation ID, which is a unique thread ID. If I reply to you and you reply to someone else, that thread ID stays with all of those notices. We retain information about tags in our system. You can include hashtags in a notice, and those tags are kept as separate records for easy indexing. We also allow tagging of profiles, so you can tag people you know, group them together, and use those tags to organize them. Or, in a community, people with common interests can tag themselves with terms like Ubuntu or Fedora or skiing, and you can find other people through those tags. We also store information about favoriting, which is saying that I've favorited a notice because I think it's good. I've been talking about trying to use professional-sounding names, but we do use "fave" for this one. One of the things that's different about StatusNet compared to some other status-sharing systems is that we have a concept of groups. 
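Save-time rendering means the cost of parsing a notice is paid once, when it is written, not on every page view. Here is a minimal Python sketch of that idea. It handles only `@nickname` and `#tag`. The link formats (`/nickname`, `/tag/...`) and the function name are hypothetical, and StatusNet's real PHP renderer handles more cases, such as URLs.

```python
import html
import re
from typing import List, Tuple

# An @ or # not glued to a preceding word character, followed by a word.
_CODES = re.compile(r"(?<!\w)([@#])(\w+)")


def render_notice(text: str) -> Tuple[str, List[str], List[str]]:
    """Render a plain-text notice to HTML once, when it is saved.

    Returns (rendered_html, mentioned_nicknames, hashtags) so the derived
    metadata can be stored next to the notice instead of recomputed on read.
    """
    mentions: List[str] = []
    tags: List[str] = []

    def link(m: "re.Match[str]") -> str:
        sigil, word = m.group(1), m.group(2)
        key = word.lower()
        if sigil == "@":
            mentions.append(key)
            return f'@<a href="/{key}">{word}</a>'  # hypothetical profile URL
        tags.append(key)
        return f'#<a href="/tag/{key}">{word}</a>'  # hypothetical tag URL

    # Escape user text first. quote=False avoids "&#x27;" entities, whose "#"
    # the pattern above could otherwise mistake for a hashtag.
    escaped = html.escape(text, quote=False)
    return _CODES.sub(link, escaped), mentions, tags
```

The extracted `mentions` and `tags` lists correspond to the separate tag records described above, ready to be inserted alongside the notice.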
Groups are like an email mailing list. You can address a notice to a group, and it will be distributed to everyone in that group, whether or not they're individually subscribed to you. So you might have a group organized around your particular work group, or around people who share a common interest. One of the largest groups on Identi.ca is around Ubuntu; I think it has more than 4,000 people on that distribution list. Groups have a whole bunch of data associated with them: members, logos, descriptions, admin lists, things like that. Direct messages are a different kind of structure in our system. Direct messages are very similar to notices, but they're only for distribution between two people, so we store them in a different place. The sender and the receiver are the only people who can read them. Like some other systems, we store attachments, so you can attach a file or a link to a notice, and that attachment stays with the notice. You can use this to, say, upload pictures or add sound. StatusNet actually makes a pretty good low-cost podcasting system: just attach an MP3 file to a notice and there you go, you've got simple podcasting. Attachments have their own little subsystem in our architecture. We keep a record within the database, and we keep the files out on the file system. Then there are various bits of plumbing related to serving web pages, things like Remember Me tokens and web sessions, plus the OAuth data we use. Now I'm going to start talking about some design issues. For those of you who've been wondering when I'd get to that, here's where I get to it. We have inboxes in our system. An inbox is the data structure that lets us show you what's been happening with you and your friends, and it is by far the most important part of our system. 
It's where people go first, because when you're reading the web page, you really want to find out what's going on with you and your friends. In our API, which I'll get to in a second, the equivalent call is the biggest hit-getter for us on Identi.ca. I think something like 40% of the hits we get are for the inbox, the "you and your friends" or friends timeline. So what is this inbox data structure? As I've laid out, you have a user who might be subscribed to another user who posts a notice. One way we could build this inbox, and in fact the way I originally spiked it out, is with a join. We asked, hey, what are the most recent notices posted by anyone you're subscribed to? Incredibly inefficient; a really, really bad idea. It turned out to be viciously underperforming for us. It's a three-way join across three tables: nicely structured, great that way, but very bad performance. So somewhere along the line in the development of StatusNet, we moved to write-time delivery. This means that when someone posts a notice, the notice gets delivered at write time, instead of you lazily asking later, "what has he said lately?" Every time someone posts a notice, we put a reference to it, in the correct order, into the inbox of each person who should see it. Formerly, we had a notice inbox table whose rows said that this user has this notice in their inbox. That worked out OK for a while. It was a great improvement over the join, but you can imagine that this gets really, really big, really fast. People have hundreds of subscriptions, each of those subscriptions generates five or 10 notices a day, and you have hundreds of thousands of people on your website. Pretty soon this table becomes completely unusable. 
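The difference between read-time joining and write-time delivery can be sketched with in-memory Python structures. This is a minimal illustration, assuming plain dicts stand in for the notice, subscription, and inbox tables; the class and method names are hypothetical. `timeline_by_join` scans every notice on each request. `post` instead does the fan-out once, so `timeline` is just a slice of a precomputed list.

```python
from collections import defaultdict
from itertools import count
from typing import DefaultDict, Dict, List, Set, Tuple


class MicroblogStore:
    """Hypothetical in-memory stand-in for the notice/subscription/inbox tables."""

    def __init__(self) -> None:
        self._ids = count(1)                              # auto-incrementing IDs
        self.notices: Dict[int, Tuple[str, str]] = {}     # id -> (author, text)
        self.subscribers: DefaultDict[str, Set[str]] = defaultdict(set)
        self.inbox: DefaultDict[str, List[int]] = defaultdict(list)  # newest first

    def subscribe(self, reader: str, author: str) -> None:
        self.subscribers[author].add(reader)

    def post(self, author: str, text: str) -> int:
        notice_id = next(self._ids)
        self.notices[notice_id] = (author, text)
        # Write-time delivery: put a reference into every recipient's inbox
        # now (the author sees their own notices too).
        for reader in self.subscribers[author] | {author}:
            self.inbox[reader].insert(0, notice_id)
        return notice_id

    def timeline(self, reader: str, limit: int = 20) -> List[int]:
        # Reading is a cheap slice of one precomputed list.
        return self.inbox[reader][:limit]

    def timeline_by_join(self, reader: str, limit: int = 20) -> List[int]:
        # Read-time "join": scan every notice on every request.
        followed = {a for a, subs in self.subscribers.items() if reader in subs}
        followed.add(reader)
        ids = [i for i, (author, _) in self.notices.items() if author in followed]
        return sorted(ids, reverse=True)[:limit]
```

Both methods return the same timeline; the trade is one extra write per subscriber at post time in exchange for cheap reads. That is also why a one-row-per-delivery table grows so quickly.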
I think that when we finally moved away from that system, we were at something like 200 million records in the table. So instead we moved to a denormalized system in which each user has one great big blob of packed notice IDs. This is the system we currently use, and it is very fast. We can update it atomically using some tricky syntax that works pretty well on both MySQL and PostgreSQL. I'm a little scared to see what's going to happen if we ever try to port it to other RDBMSs, but for right now it works, and it's really, really fast. So that's the inbox. Now for some other parts of the architecture I want to go over. Our plugin architecture uses an event hooks model. Different points in the code are marked as events; plugins can register an interest in those events, and they are called when that event happens. This works a lot like MediaWiki's hooks, and the reason is that I wrote MediaWiki's system and I wrote this system too, so they're about the same. We have all kinds of events in the system that plugins can hook. Some plugins are very interested in modifying data; others are more interested in modifying the UI. Plugins can hook events like when the site logo is shown, when a new notice is saved, when a new user registers, or when a user has entered their password and we're authenticating it. Every important event in the lifetime of the application and its dataset is hookable. At least, we hope so; that's what we'd like to get to, and it's not quite there yet. The neat thing is that plugins have a lot of options with these events. They can just examine the event, watching it go by and maybe logging it or having some side effect. They can alter the event, changing the data as it goes by. They can also reject the event; we typically use this for spam filters. 
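A packed blob of IDs turns "fetch this user's inbox" into reading one column and slicing bytes. Here is a minimal Python sketch of the packing, assuming 4-byte unsigned big-endian IDs and a cap on how many IDs are kept. Both the width and the cap are assumptions; the talk does not specify either. In the database, the same prepend-and-truncate would be written as a single UPDATE expression so that it is atomic, but the talk does not show StatusNet's actual SQL.

```python
import struct
from typing import List

ID_FORMAT = ">I"                      # assumption: 4-byte unsigned, big-endian
ID_SIZE = struct.calcsize(ID_FORMAT)  # 4 bytes per notice ID
MAX_IDS = 1000                        # assumption: cap on IDs kept per inbox


def pack_ids(ids: List[int]) -> bytes:
    return struct.pack(f">{len(ids)}I", *ids)


def unpack_ids(blob: bytes) -> List[int]:
    return list(struct.unpack(f">{len(blob) // ID_SIZE}I", blob))


def prepend(blob: bytes, notice_id: int, max_ids: int = MAX_IDS) -> bytes:
    # The newest ID goes in front; truncating keeps the blob a bounded size.
    return (struct.pack(ID_FORMAT, notice_id) + blob)[: max_ids * ID_SIZE]


def page(blob: bytes, offset: int, limit: int) -> List[int]:
    # A timeline page is a byte slice: no per-row lookups are needed.
    return unpack_ids(blob[offset * ID_SIZE:(offset + limit) * ID_SIZE])
```

With a fixed width per ID, paging is plain arithmetic on byte offsets, and one user's whole inbox is a single row instead of thousands.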
So if a user tries to register from a known bad IP address, or with a bad nickname, we can reject it. Or if they post a notice that has a bad URL in it, we can reject it. Finally, plugins can completely replace the default code, and we use this for things like our LDAP authentication plugin. Plugins can also add their own actions; like I was saying before, actions are what show a page, produce output, or do something in the web world. They can add their own data types. The data types I just listed are only the core ones, and we have all kinds of data types within our plugin environment. They can add widgets that show different kinds of things on the page, and they can also add handlers. I don't know why I wrote that. Speaking of LDAP and authentication, our security architecture is relatively simple, but I think it's pretty robust. We have a pluggable authentication system. The core has just a username-and-password system that gets you up and running. But because the system is very pluggable, we have a number of plugins for different kinds of authentication. We do OpenID authentication in a plugin. We also have Facebook Connect authentication and Sign in with Twitter, and we have an LDAP authentication plugin in our core system. So we think our authentication is flexible enough that we could probably support pretty much any kind of authentication you want, barring the really weird stuff. We also have a role-based authorization system: we assign roles to users, and those roles get particular rights. We have a bunch of default roles, but, as I've been pointing out, roles can be added or overridden by plugins, so a plugin might override the default behavior of a role. Another part of the system is a Twitter-like API. 
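The four things a plugin can do with an event (observe, alter, reject, replace) can be shown with a small hook dispatcher. This is a minimal Python sketch. The Start/End pairing, and returning `False` to skip the default code, are modelled loosely on the MediaWiki-style hooks mentioned above. The `EventBus` class, the event names, the handler signature, and all four plugins are hypothetical. `DIRECTORY` is a stand-in for an LDAP directory, not an LDAP client.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], bool]


class EventBus:
    """Named hook points that plugins can register handlers for."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def add_handler(self, name: str, handler: Handler) -> None:
        self._handlers.setdefault(name, []).append(handler)

    def handle(self, name: str, data: dict) -> bool:
        # Run handlers in registration order. Returning False stops the
        # chain and tells the caller to skip its default code.
        for handler in self._handlers.get(name, []):
            if handler(data) is False:
                return False
        return True


class Rejected(Exception):
    """Raised by a plugin to veto an event outright."""


def check_password(events: EventBus, nickname: str, password: str,
                   core_users: Dict[str, str]) -> bool:
    attempt = {"nickname": nickname, "password": password, "ok": False}
    if events.handle("StartCheckPassword", attempt):
        # Default core behaviour, skipped if a plugin replaced it
        attempt["ok"] = core_users.get(attempt["nickname"]) == attempt["password"]
    events.handle("EndCheckPassword", attempt)
    return attempt["ok"]


# --- Hypothetical plugins, one per kind of hook use ---
audit_log: List[str] = []
BLOCKED = {"spammer"}              # stand-in for a spam blocklist
DIRECTORY = {"carol": "s3cret"}    # stand-in for an LDAP directory


def normalise(a: dict) -> bool:            # alter the event data
    a["nickname"] = a["nickname"].strip().lower()
    return True


def block_spammers(a: dict) -> bool:       # reject the event
    if a["nickname"] in BLOCKED:
        raise Rejected(a["nickname"])
    return True


def directory_auth(a: dict) -> bool:       # replace the default code
    if a["nickname"] in DIRECTORY:
        a["ok"] = DIRECTORY[a["nickname"]] == a["password"]
        return False
    return True


def audit(a: dict) -> bool:                # just observe
    audit_log.append(f"{a['nickname']}:{a['ok']}")
    return True


events = EventBus()
for h in (normalise, block_spammers, directory_auth):
    events.add_handler("StartCheckPassword", h)
events.add_handler("EndCheckPassword", audit)
```

Registration order matters in this design. Normalising the nickname first means the blocklist and directory plugins see the cleaned-up value.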
I wish our API were a little weirder or more different, but it's very similar to our web architecture. It works through the same front controller and actions I was talking about before, and if you think about it, that's really the right way to do it. It's a web-based API, so it uses the same kind of system; it just puts out JSON or XML instead of HTML. We support SMS, kind of. We currently have a bastardized SMS-to-email gateway system, which is pretty clunky, and we're going to replace it in an upcoming version with more robust SMS gateways. There's not a huge demand for SMS. For us, SMS gets expensive at any kind of volume, and people don't seem to want to use it on the public web. The model we use just kind of abuses existing SMS email gateways. If you give us your phone number and your carrier, and we know the carrier, we work out the well-known email address for that number. Then we use the carrier's email gateway to send you SMS messages and to receive SMS from you. It's not great, but it works. One of the most interesting things about StatusNet, and one of the things I was most excited about early on, is the OpenMicroBlogging standard. OpenMicroBlogging is our standard for remote subscriptions and remote delivery of messages. It's a home-grown protocol; I just kind of made it up, because there wasn't anything that did this at the time, 18 months ago. It's a person-to-person subscription, so you can only subscribe to an individual person; it doesn't support groups or subscribing to a list of people. It also only pushes plain text. It uses OAuth (hey, I got one in there at least). It uses OAuth for subscription, and the semantics are that you authorize a server to push someone's notices into your inbox. We also have an XMPP interface. 
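The SMS-over-email trick amounts to building an email address from a phone number plus a per-carrier domain, then sending an ordinary short email there. Here is a minimal Python sketch of the outbound half. It uses the standard library's `email.message.EmailMessage`. The carrier names, gateway domains, and sender address are hypothetical placeholders, not real carriers. Receiving replies through the same gateway is not shown.

```python
from email.message import EmailMessage

# Hypothetical carrier table: carrier name -> that carrier's SMS email domain.
# A real table would list carriers' published gateway domains.
CARRIER_GATEWAYS = {
    "examplecell": "sms.examplecell.example",
    "demomobile": "txt.demomobile.example",
}


def sms_address(phone: str, carrier: str) -> str:
    """Build the email address that the carrier relays to the phone as SMS."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    domain = CARRIER_GATEWAYS.get(carrier.lower())
    if domain is None:
        raise ValueError(f"unknown carrier: {carrier}")
    return f"{digits}@{domain}"


def build_sms_email(phone: str, carrier: str, text: str,
                    sender: str = "update@statusnet.example",
                    limit: int = 140) -> EmailMessage:
    """Wrap a notice in an email addressed to the carrier's SMS gateway."""
    msg = EmailMessage()
    msg["From"] = sender             # hypothetical address that also takes replies
    msg["To"] = sms_address(phone, carrier)
    msg.set_content(text[:limit])    # gateways expect short bodies
    return msg
```

The resulting message can be handed to any SMTP sender. This is also why the approach is clunky: delivery depends entirely on each carrier's gateway behaving as expected.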
This is implemented with the XMPPHP library, which is a great library for doing XMPP in PHP. We have an offline daemon that receives messages from people using XMPP, or Jabber, and gateways them into our system. We have a Twitter interface and a Facebook interface; like I was saying, it's very important to be plugged into the existing social networks. We push notices out into those networks, so if you post on your own StatusNet system, it pushes out to Facebook and Twitter if you've set that up. You can also pull the notices you're subscribed to back into your system and read them there. It's a user-to-user tunnel. For the Twitter tunnel to work, you have to have a Twitter account: you push out through your Twitter account and pull back through it, using your account on that remote system. A lot of the work I've mentioned is time-intensive, uses external systems, or fans out pretty widely, so we use offline queue servers to do it. That includes outgoing XMPP messages, outgoing SMS messages, and the OMB messages that go out to other systems. All of that happens on back-end systems. This is optional. By default, the work happens at web-request time, and we expect that a system that can't handle offline processing would have very little of it configured. The queue handlers are PHP processes, and that was a deliberate choice. PHP isn't really the best system for creating long-running daemons. It doesn't have a threading model, creating child processes is a little crazy and complicated, and a lot of libraries aren't built to run for more than two or five seconds, so they tend to leak a lot of memory. But we've offset that downside with an upside. If we use PHP for our offline queue handlers, we can use the same libraries, and mostly the same stack, that we use for our web system. 
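An offline queue decouples "a notice was posted" from "send the XMPP, SMS, and OMB messages". Here is a minimal Python sketch of a queue backend plus a worker that dispatches items to per-transport handlers. The backend is a simple database-polling queue, with in-memory `sqlite3` standing in for the real database. The `Queue` interface, the `queue_item` table, and the transport names are hypothetical. The sketch assumes a single worker; concurrent workers would need a proper way to claim rows.

```python
import sqlite3
from typing import Callable, Dict, Optional, Protocol, Tuple


class Queue(Protocol):
    """What any queue backend must provide (hypothetical interface)."""
    def enqueue(self, transport: str, payload: str) -> None: ...
    def claim(self) -> Optional[Tuple[str, str]]: ...


class DbPollingQueue:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS queue_item ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, transport TEXT, payload TEXT)"
        )

    def enqueue(self, transport: str, payload: str) -> None:
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT INTO queue_item (transport, payload) VALUES (?, ?)",
                (transport, payload),
            )

    def claim(self) -> Optional[Tuple[str, str]]:
        # Take the oldest item and delete it. Single-worker sketch only:
        # concurrent workers could otherwise take the same row.
        with self.conn:
            row = self.conn.execute(
                "SELECT id, transport, payload FROM queue_item ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self.conn.execute("DELETE FROM queue_item WHERE id = ?", (row[0],))
        return row[1], row[2]


def run_worker(queue: Queue, handlers: Dict[str, Callable[[str], None]],
               max_items: int = 100) -> int:
    """Drain up to max_items; a long-running daemon would sleep and poll again."""
    done = 0
    while done < max_items:
        item = queue.claim()
        if item is None:
            break
        transport, payload = item
        handlers[transport](payload)
        done += 1
    return done
```

Because `run_worker` only relies on `enqueue` and `claim`, the polling table could be swapped for a message broker backend without touching the handlers.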
Consequently, our queue handlers are very small and compact, because they use the same libraries the web system does. The downside is that a lot of those libraries are leaky. We try to patch them a lot, but we do see some memory growth in all these child processes. Fortunately, we've got some pretty sophisticated memory management. Well, it's not really memory management; it's more like bad-child management. Each child only takes on small tasks, so if one grows too big, we kill it while it's idle. Our queuing is, once again, pluggable, and we can use a number of different queuing systems. One is a pretty clunky, DB-based polling system: we just dump items into a big table and check whether anything new is in it. If you can run something a little better, we support the STOMP protocol through a STOMP plugin, so servers like ActiveMQ or RabbitMQ will work too. We're looking at some of the other offline queuing servers; there are a lot of great ones out there. I've personally done plugins for Gearman and for AMQP, which is RabbitMQ's native protocol. There's a very interesting pair of servers called Kestrel and Starling, which are what Twitter uses. Another interesting thing about our architecture is our location support. We use structured locations that are identified by integers, and each integer is namespaced by a second integer. For example, this location, Brussels, Belgium, is stored as two integers: a long ID number, and a second number saying it's a GeoNames ID. We also support other location databases, such as Yahoo's WOEIDs (Where On Earth IDs) and OpenStreetMap's name finder, since OSM has its own system. Because it's pluggable, once again, we can have other vocabularies too, even private ones. We also store latitude and longitude when we need to. Brussels may not be the right granularity, and you might need a more precise location. 
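Storing a location as a (namespace, ID) pair keeps the notice table independent of any one place database. Here is a minimal Python sketch of that structure with pluggable per-namespace resolvers. The numeric namespace values, the extra namespaces, and the ID `1001` are hypothetical. The lookup table is a local stand-in, not real GeoNames data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Namespace integers. GeoNames is named in the talk; these numeric values
# and the other namespaces listed here are illustrative assumptions.
NS_GEONAMES = 1
NS_YAHOO_WOEID = 2
NS_OSM = 3


@dataclass(frozen=True)
class Location:
    location_id: Optional[int] = None   # ID within one vocabulary
    location_ns: Optional[int] = None   # which vocabulary the ID comes from
    lat: Optional[float] = None         # optional finer-grained point
    lon: Optional[float] = None


# Pluggable resolvers: one lookup function per namespace.
# This table is a hypothetical local stand-in, not real GeoNames data.
_LOCAL_GEONAMES = {1001: "Brussels, Belgium"}
RESOLVERS: Dict[int, Callable[[int], Optional[str]]] = {
    NS_GEONAMES: _LOCAL_GEONAMES.get,
}


def describe(loc: Location) -> str:
    # Prefer a named place; fall back to raw coordinates.
    if loc.location_id is not None and loc.location_ns in RESOLVERS:
        name = RESOLVERS[loc.location_ns](loc.location_id)
        if name:
            return name
    if loc.lat is not None and loc.lon is not None:
        return f"{loc.lat:.2f}, {loc.lon:.2f}"
    return "unknown location"
```

Adding a new vocabulary, including a private one, means registering one more resolver; stored locations don't change shape.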
We support pluggable systems for real-time updates in the browser, including a lot of the Comet-based systems. I'm really interested in using more XMPP for this, but we haven't had time to make that happen yet. So, down to the scaling part; I know everybody's been waiting for it. What have we done for scaling? We support a few things that get us pretty far. First, we move static files off our Apache servers, onto a CDN or just a separate server. We let lighttpd or Nginx serve our static files and keep Apache just for the PHP work. This really lets our Apache systems go crazy; we move practically everything that doesn't have a heartbeat onto external servers. Second, we use master-slave replication for our database server. Most of our hits are read-only. Not all of them: you can imagine we get a lot of posting, and as I said, a lot happens at write time, so posting has a kind of cascading effect. But most of our hits are read-only, so we mark each action as read-only or not, at the granularity of the action or page, and direct the read-only queries to a slave server. We also use memcached almost to a fault. We've added a wrapper around DB_DataObject that catches pretty much anything that moves. We cache on read, we cache on write, we cache single objects, and we cache query results, which are compound objects. We even keep old results that have been invalidated by new data. If we have a cached list of notices, we can just prepend the new results to it, and that really speeds things up too. Our caching is pluggable, so there are different backends. We use memcached by default, but you can also use APC or XCache variable caching. You can cache to disk, or you can just cache in memory. 
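Here is a minimal Python sketch of the caching pattern described here: cache on read, cache on write, and cache a compound list result that gets a new ID prepended instead of being thrown away. A plain dict stands in for memcached. The class names, the key formats (`notice:<id>`, `recent:<author>`), and the `db_queries` counter are hypothetical. This is not StatusNet's DB_DataObject wrapper.

```python
from typing import Dict, List, Optional


class DictCache:
    """Stand-in for a memcached client: plain get/set."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self._data.get(key)

    def set(self, key: str, value: object) -> None:
        self._data[key] = value


class NoticeRepository:
    """Cache-on-read and cache-on-write wrapper around a pretend DB table."""

    def __init__(self, cache: DictCache, max_cached: int = 20) -> None:
        self.cache = cache
        self.max_cached = max_cached
        self.db: Dict[int, dict] = {}   # pretend "notice" table
        self.db_queries = 0             # counts trips to the "database"

    def get_notice(self, notice_id: int) -> Optional[dict]:
        key = f"notice:{notice_id}"
        notice = self.cache.get(key)
        if notice is None:              # miss: read through, then cache
            self.db_queries += 1
            notice = self.db.get(notice_id)
            if notice is not None:
                self.cache.set(key, notice)
        return notice

    def recent_ids(self, author: str) -> List[int]:
        key = f"recent:{author}"
        ids = self.cache.get(key)
        if ids is None:                 # compound result: cache the whole list
            self.db_queries += 1
            ids = sorted((i for i, n in self.db.items() if n["author"] == author),
                         reverse=True)[: self.max_cached]
            self.cache.set(key, ids)
        return ids

    def save_notice(self, notice_id: int, author: str, text: str) -> None:
        notice = {"id": notice_id, "author": author, "text": text}
        self.db[notice_id] = notice
        self.cache.set(f"notice:{notice_id}", notice)   # cache on write
        key = f"recent:{author}"
        ids = self.cache.get(key)
        if ids is not None:
            # Rather than discarding the stale list, prepend the new ID.
            self.cache.set(key, ([notice_id] + ids)[: self.max_cached])
```

After the first list query, new posts keep the cached list current without any further database round trips. The `db_queries` counter stays at one even as notices are added and read.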
And then finally, one of the main things that we do for scaling is that we try to push as much as we can to offline processing. So this is these queue servers I was talking about. We use queues for most of the less visible activities that happen at notice write time, things like distributing notices to other people's inboxes. Usually if you post a notice, you don't notice that it didn't get to other people's inboxes immediately, and they don't notice that you just posted it. So we have a little bit of spare time in there, a few seconds, like 30 seconds, maybe a minute, before anyone actually notices that it's going out. And so we can actually do that work offline. Good, I'm glad I'm done with that one. So I wanna talk a little bit about the future of StatusNet. I think my time's up, so I'm gonna go fast. I'm almost done. We've got a new release coming out in two weeks. It will coincide with the launch of our new StatusNet cloud service. We have a private beta right now; if you're interested in using the private beta, that is the coupon code to use, fostmx. And we will get you your very own StatusNet site if you wanna play with it. Probably the neatest thing is that we have a new remote subscription system called OStatus. It's based on these cool things, like PubSubHubbub, Salmon, WebFinger, and Activity Streams, instead of my hacked-up system. We're also gonna be supporting privacy, not in this version but in a later version. We're gonna be supporting more data sharding by user and by time. And I hope to see better integration with existing software, especially existing free web software. I'd like to see us have a JavaScript-widget-only interface, more federation protocols, and more web cache friendliness. And most of all, I wanna have more people involved. So if this thing is gonna go anywhere, I really hope that you can be part of it. You can find us on status.net or on Gitorious, under statusnet. We need help with plugins, themes, core development, integration, testing, promotion, and use. Thanks a lot.
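A minimal Python sketch of moving inbox delivery out of the request path. It assumes an in-memory deque stands in for the real queue server and a plain dict of follower sets stands in for the subscription table; the class and method names are hypothetical. Posting only stores the notice and enqueues a job; a queue daemon later copies the notice ID into each follower's inbox (and, as an assumption of this sketch, into the author's own).

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


class InboxDistributor:
    """Hypothetical sketch: post() is the fast web path, run_queue() is the offline worker."""

    def __init__(self, followers: Dict[str, Set[str]]):
        self.followers = followers                              # author -> subscriber names
        self.notices: Dict[int, str] = {}                       # notice ID -> author
        self.inboxes: Dict[str, List[int]] = defaultdict(list)  # user -> notice IDs, newest first
        self.jobs: deque = deque()                              # stand-in for the queue server

    def post(self, notice_id: int, author: str) -> None:
        # Web request: save the notice and enqueue delivery; no inbox writes happen here.
        self.notices[notice_id] = author
        self.jobs.append(notice_id)

    def run_queue(self) -> int:
        # Queue daemon: drain pending jobs, copying each notice ID into every
        # recipient's inbox. Returns the number of inbox writes performed.
        writes = 0
        while self.jobs:
            notice_id = self.jobs.popleft()
            author = self.notices[notice_id]
            recipients = {author} | self.followers.get(author, set())
            for user in recipients:
                self.inboxes[user].insert(0, notice_id)
                writes += 1
        return writes
```

Until run_queue() runs, followers simply don't see the new notice yet, which is exactly the few seconds of slack the talk describes exploiting.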
I am over my time. So thank you very much. Thank you very much, Evan. Well, now we have a little less than 10 minutes for questions and answers. During this question and answer section, I would ask you to stay seated, or if you want to leave, to be very quiet out of respect for the speaker and for the ones who have questions. So, are there any questions? Just raise your hand and we'll come to you. Where is it going? Way up there. Well, I wanted to ask if your OpenMicroBlogging protocol supports search, not only notification but search between the different sites. Search like certificates, like keys. Searching from one... No, it doesn't support search. Nope. No, and it probably won't. Distributed search is a really hard problem to make work in a decentralized system. I think that for the foreseeable future we need to be able to share this data out to centralized search systems, and that's really the only realistic way to make this kind of thing searchable. We do support search in each individual instance, so you can search your own inbox, you can search stuff that's happened to you, but a global search really depends on having a global search system. Thanks. What advantage does Apache have for you over lighttpd or nginx for PHP? For PHP? Yeah. It's like, I don't know. You can use our software with either one, right? You can use it with Lighty or with Nginx, I think. But we just happen to use Apache more often. I think we're just more comfortable with it, and there are some tricky parts with using FastCGI to make things work. So it's not grievous, it works, but there are some tricky points, and it didn't work very well at scale for us. Yeah. Why did you choose not to use XMPP as the core platform? What's that? Why did you choose not to use XMPP as the core platform? Yeah, that's a really good question. I think XMPP is an awesome platform. It has a lot of the things that we need for distributed sociality. It's got domains involved.
It's distributed. It's got a great authentication system. It has this whole federated model. It's got client-to-client and server-to-server protocols. All of that is great. The one thing that XMPP does not have is the install base on the kind of commodity web servers that I want our software to install on, right? I really want our software to work with the same kind of installation profile as WordPress, so that you could set up your own social networking hub in the same place, or maybe on the same server, where you would run WordPress. And right now, those kinds of hosting systems don't provide you with an XMPP server. I think that there's a great opportunity, and I mentioned it very briefly in my future section, in supporting more XMPP for microblogging and for federation. There's some really interesting work that's gone on in just the last couple of months on federation for microblogging using XMPP, and I'd like to see us support it, but I don't think it can be the only answer if we're going for that low end. Yep. Can you share any numbers on the status of federation using OpenMicroBlogging, in terms of number of users, and what you plan to do in the future to increase the interconnections between Identica and other domains? Sorry, say that one more time, I didn't quite understand. So do you have any numbers to share on federation? How many people are remotely interconnected? Yeah, yeah. So one of the kind of cool things about StatusNet is that we get a little ping back from installations that are out in the wild, and we have somewhere between 1,000 and 2,000 public sites that are using StatusNet right now, right? The number of users that they report is somewhere between 1.1 and 1.2 million, so it's a fairly big chunk.
Of that, 120,000 are on Identica itself, so Identica is the single biggest site on that network, but it's not the majority at all. So it's a big chunk, but it's not all of it. So there are a lot of people on this network already who are using StatusNet, and I hope that with the launch of our cloud service there's gonna be much, much more. We have something like 15,000 pre-signups for our private beta of the cloud service. We've been putting up about 1,000 new sites every couple of days over the last few weeks, so my expectation is that it will be somewhere around 20,000 when we actually launch the cloud service. Sorry if you already answered that: what is the relation between status.net and Laconica? Yeah, it's a change in name. Laconica was the name that I launched with; people didn't get it, and the StatusNet name is a little clearer for people. It really just says that you're gonna send out your status to your social network. I was very fond of the original name, Laconica, which means short or brief, but it just didn't quite resonate with people, and StatusNet has picked up a lot better. That's all. Hi, here. If I understood correctly, when you talked about memcached, you said you cache both your objects and your query results. Doesn't that mean you cache a lot of things twice? Yeah, no. When we cache query results, we don't cache the entire object, we just cache the primary key of the object. Say we cache your inbox: we don't cache all the notices that are in that inbox, we actually just cache the notice ID of each item. That means that we're not caching twice, and we don't have to invalidate in lots of different places when something's deleted or overwritten. We're kinda lucky in that our notices are immutable, so you can't actually overwrite them, but you can delete them. So the answer is no, but it is something that we worry about quite a bit.
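The "cache IDs, not objects" answer can be sketched in Python as follows. A plain dict stands in for memcached, and the key format (notice:<id>, inbox:<user>) and the load_notice database callback are hypothetical. Each notice is cached once under its own key; inbox entries are just ID lists, so a notice shared by many inboxes is fetched once, new notices are prepended to the cached list, and deleting a notice only invalidates one key.

```python
from typing import Callable, Dict, List, Optional


class NoticeCache:
    """Hypothetical sketch of caching query results as ID lists (a dict stands in for memcached)."""

    def __init__(self, load_notice: Callable[[int], Optional[dict]]):
        self.store: Dict[str, object] = {}
        self.load_notice = load_notice  # database lookup; returns None for deleted notices

    def get_notice(self, notice_id: int) -> Optional[dict]:
        # Each notice object is cached exactly once, under its own key.
        key = f"notice:{notice_id}"
        if key not in self.store:
            notice = self.load_notice(notice_id)
            if notice is None:
                return None
            self.store[key] = notice
        return self.store[key]

    def set_inbox_ids(self, user: str, ids: List[int]) -> None:
        # Query results are cached as primary keys only, newest first.
        self.store[f"inbox:{user}"] = list(ids)

    def add_to_inbox(self, user: str, notice_id: int) -> None:
        # Prepend to the cached list rather than discarding it. If nothing is
        # cached for this user, do nothing; the next read would rebuild it.
        key = f"inbox:{user}"
        if key in self.store:
            self.store[key] = [notice_id] + self.store[key]

    def delete_notice(self, notice_id: int) -> None:
        # Only one key to invalidate; ID lists that still mention the notice
        # are filtered on read because the loader returns None.
        self.store.pop(f"notice:{notice_id}", None)

    def inbox(self, user: str) -> List[dict]:
        # Rebuilding a missing inbox from the database is omitted in this sketch.
        ids = self.store.get(f"inbox:{user}", [])
        notices = (self.get_notice(i) for i in ids)
        return [n for n in notices if n is not None]
```

Because notices are immutable, the per-notice entries never go stale through edits; deletion is the only case the sketch has to handle.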
The nice thing is that because we just cache the ID in those query results, we get a lot better cache hits, because the same notice will appear in lots of people's inboxes or in lots of places. Yeah, the more current notices almost always show up in the cache more often because more people are looking at them. Other questions? Yeah. So first of all, let me thank you for StatusNet, because it's one of the success stories that highlights the fact that free software is about freedom in all the software we use, and social software should be no exception. So thanks for this. Thank you. And then a kind of political question. It's really unfortunate that Twitter does not support OpenMicroBlogging. I wonder whether you have ever asked yourself if it will ever happen. If that'll ever happen? Yeah, if this kind of support will ever happen in proprietary networks. So if you had asked me two years ago or a year and a half ago, I would have said no. I would have said that's not on the agenda. I think that large networks would not be interested in seeing an expansion to a federated system, because it's not in their interest. What they sell to their advertisers or to their investors is a captive network, and if they don't have a captive network and they're participating in an open federated system, it's less of a good sale for them. That would have been my point of view, say, 18 months ago, but a lot has changed since then. There's been a lot of movement along the lines of federation. Things are really happening very quickly right now. A lot of the things that I showed on the OStatus slide, like PubSubHubbub, the Salmon protocol, WebFinger, and Activity Streams, are really picking up a lot of usage. MySpace, for example, is the big push behind Activity Streams, and they're sharing a ton of data out into the world using this Activity Streams model.
So I think that with a free and open source implementation of an open social web standard, we might see more movement by the big networks to do it. I don't think we can wait for that, though. So I think that we have to use the APIs that these networks are providing in order to mesh into them right now. But my ideal would be eventually to see more of those networks providing federation between each other and with an open source implementation like StatusNet. Okay, we are running short of time. There's one last question over here and then we're heading for the closing topic. How do OpenMicroBlogging and identity place themselves relative to Google's OpenSocial? What's the interface with OpenSocial? How do they compare? The comparison. Yeah, so OpenSocial is an API that was developed originally at Google, I think, and it's got a lot of support. Different organizations are using it. LinkedIn uses it, MySpace uses it. I think hi5 and Bebo have announced support for it, but they may not have it actually running. Ning uses it, Orkut has it in a sandbox. So there are a lot of systems that use this. OpenSocial is a really cool system for creating widgets that run inside a social network and get access to that social network. However, it is not a federation system. So you can't use OpenSocial to subscribe to your friend on Bebo from your account on hi5, right? And for some of the services, if you tried to make a system to do that and plugged it into OpenSocial, your OpenSocial widget would get yanked, right? That's not what they think OpenSocial is supposed to do. That said, OpenSocial supports a lot of the same development model that StatusNet supports.
Activity streams, or status updates, are an important part of the OpenSocial model, and we hope to have the same kind of integration with OpenSocial sites like LinkedIn that we have now with Facebook and Twitter. So it's not an open federation system in the same way that OpenMicroBlogging or the new OStatus is; however, we think that we can use it to give users of StatusNet presence in these networks. So, good news for the audience: I've just been told that we have 10 more minutes. So if you have any more questions, take your chance; everyone is still here. So it's open again. There's more, wow. Hi. Facebook keeps adding more stuff to your status now. In the beginning, you only had your status, and now you can like or dislike other people's statuses, or you can comment on them. Is this supported in the platform, or will it ever be? Yeah, so, let me see how to say this. Facebook has a really interesting model of how they do things, and I love how they do statuses. I think that it's a lot more sophisticated than some other microblogging platforms. So I think it's pretty cool how Facebook works. We support faving, but we can't support remote faving; it's just not part of the way the API works. We support attachments, and we can actually support remote attachments that go into Facebook. However, our big problem with Facebook is that we can't pull data back out. So we can't take your posts and repost them onto an open network; that's just part of the Facebook terms of service. And we try not to get kicked off Facebook, because then we couldn't use it at all. I mean, I think someone could do it with StatusNet, but I don't think it would make a lot of sense. It would just get you kicked off. So I think that Facebook has a really interesting model, and Facebook is part of the Activity Streams process.
They're very active in it, and my hope is that as they start publishing their feeds more often, we can actually participate in that network a little bit more. Are there any more questions? Hello, so a question about enterprise use of StatusNet. How many customers, just rough numbers, are using StatusNet for enterprise behind-the-firewall kinds of applications? So the question is how many people are using it for a private network right now? Yeah, but more enterprise customers. Enterprise users, right? So I've got kind of two answers to that. One part is that enterprise users love StatusNet, right? They love it because they can take our software, install it inside their firewall, and have a social network that works inside their university or inside their company or inside their government organization. Totally awesome. People are really using our software like crazy for this. Motorola is probably the one that people know the most. Very big company. They've been publishing a lot about it because it's the main part of their internal social media strategy. But there are other companies too, like Intuit, let's see, SAP; Sun has a system. There are quite a few. Some of them have asked us not to use their names, so I don't use their names for promotion. Other ones are customers, and there are big European banks, large American retailers, companies with hundreds of thousands of users who use StatusNet internally. And we provide commercial support for those companies if they want it. We also provide the cloud service for that kind of company. However, our cloud service tends to be more for small and medium-sized companies. So the kinds of people who want to use a cloud service tend to be the 100- to 1,000-employee companies rather than the 100,000- to 300,000-employee companies. But yeah, there are a lot, and they're very interested in our software. Thanks. You didn't say why you chose PHP. Is it for the install base? What's that, the PHP? Yeah.
Yeah, that's exactly for the install base. It's for that low-end hosting system: they're gonna have PHP, they're gonna have MySQL. You have to look long and hard to find a hosting system that will not let you run PHP and MySQL. So the idea is that anyone should be able to take the software and install it somewhere, and PHP and MySQL are that lowest common denominator. That said, there's also a huge number of LAMP developers in the open source community, and they're the kind of people who are gonna be taking our software and using it. So that's the other reason why we went for LAMP. Hi, Evan, another one. Are there any lessons that you could pick up from Jaiku, or is Jaiku kind of dead for you? Is there anything that we can learn from Jaiku? Or that Jaiku can learn from you? Yeah, I think that Jaiku is a really cool system. Let's see, how do I say this diplomatically? I think one of the things that we've learned about microblogging and status sharing in the last couple of years is that it's really useful away from the SMS world. It's not as SMS-dependent as people thought when Jaiku and Twitter were started. And I think that one of the things that we're kind of tracking now is how that microblogging world is more web-oriented than SMS- and mobile-texting-oriented. I think Jaiku is kind of keeping up there. I love their software. They have a great piece of software. There are some things that I'm not crazy about. Like, I don't like the fact that I make a post and then you comment on that post. I think that a post is a post. But besides that, I think it's a really great system. The big difference between Jaiku and StatusNet is that Jaiku is written for Google App Engine. And so there's kind of a narrow installation point there. If you want to install it somewhere else, you have to use the Django emulators and everything to make the GAE system work.
And I think that's a little bit high-end for a lot of folks. But besides that, I'm really excited to hopefully see Jaiku support OStatus. I know that James Walker, who recently joined StatusNet, is eager to make some submissions to some of the Django-based systems, TypePad Motion being another one that's based on Django, and see if we can get them all using the OStatus network.