Thank you all for showing up. So, if we were doing things properly, Tim would be running this, but I'm pushy and I'm willing to tell people, hey, we're about to start, and so I'm kind of the guy. The other thing is I want to make sure that we get the list of people here, which we didn't actually fully assign. We need to assign four roles here: the facilitator, gatekeeper, scribe, and timekeeper. Who is going to be the facilitator for this? Anyone, anyone? Where did the cards go? They're there. OK, I'll be the facilitator for this one, but that's going to leave the other three. All right, so there we go.

What we're hoping to get out of this is a charter for the working area, for an ongoing activity well past the Developer Summit. The area is the content format working area, which we'll go over here. This is basically about the data — the format that we store our data in. So this can be not just about wikitext versus some other, more structured format; it can also be about some of the implications of how we structure the data that we store: the fact that we put it into a database rather than into flat text files, that sort of thing. That's really what the area is about. But then there's a whole bunch of questions around that that we could potentially be talking about. So the goal is that, when we're done, we'll have more of a charter for this area. Before we get into that, though, we want to give people an opportunity to highlight the RFCs and the sessions in these areas. So, one of the very... let me see, I need to bump up the font on this, don't I? Whoops, that was not smart.
Okay. By the way, the link to the Etherpad that we're on right now should be available; if you want to go in and cause all sorts of trouble, you can edit what's up on the screen right now. These are the things that are already scheduled, so we don't need to belabor these points. Today, right after this meeting, we have "Separating infoboxes and navboxes from article content" — that's a conversation that Scott and Hu are leading. Gosh, I got these a little bit out of order: tomorrow at 11:30 a.m. is the language converter, and then at 2 p.m. is semantic image styles. So we've got a whole bunch of stuff to talk about there. Additionally, there's some follow-up we can do from the meeting earlier today about next-generation content loading and routing in practice: the section tags for MediaWiki sections, and the balanced templates stuff. Does anybody want to say anything more about these areas? Scott, for example, do you want to briefly touch on these?

Sure. I'll give a brief summary, starting, I guess, with the section tag. The simple description is that we'd like better semantic markup for sections. Right now there's sort of implicit stuff that goes on with heading tags. There are some problems with this, because headings in articles are kind of misused — no, alternatively used — in lots of interesting ways, so just blindly slapping sections on there tends to break pages. So there are some technical details there. I got the impression that most people just want this to happen and want us to figure out the technical details. If that's the case, you don't need to sign up; but if you have urgent discussion points you want to raise, sign up at the bottom of the Etherpad.

Balanced templates: right now, templates don't need to close their HTML tags. In some templates that's desired; in some templates it's accidental. We'd like to add markup where authors could proactively indicate that their templates should be balanced. If you do that, we will render them lots faster, because when you make an edit to that template, we can just cut out that section of HTML in our cache and replace it with a new version, and we don't have to completely re-render the page. So again, there are lots of interesting technical details there; if you're interested in discussing that, sign up.

I'll toss it over to Daniel — I was going to toss it over for multi-content revisions, but why don't you do that at the end.

Visual templates: visual editor doesn't let you actually edit template markup — the code that you use to implement a template. That's a proposal for doing that.

Heredoc arguments for templates: some of the things we'd like to do with templates are hard because you have to take a big chunk of text and stick it inside an argument to a template. This is one of the reasons the cite template is a pain point: you can't easily mark the section of the article that corresponds to a citation, you just have to stick it at the end. If you try to stick in all the content that the citation actually refers to, you get these weird wikitext problems with escaping and whatnot. So heredoc arguments to templates are a way to try to work around that.

I think the next one is Gabriel's. Do you want to talk about Parsoid HTML for page views, too?

I think it's a bit premature to talk about that, actually. But I wanted to add one thing on section tags for MediaWiki sections: there's actually an implementation on the task. Please have a look. The question is basically how this should behave in edge cases, and there are a couple of options described on the task; please weigh in right there if you have time.

I'll say something about Parsoid HTML for page views.
Part of that is talking about adding more semantic markup to our articles, because the Parsoid output in some cases differs from the PHP parser's markup — mostly in actually adding proper figure tags around things. So that's a long-term plan there.

Just in case anyone's not aware: "section tags for MediaWiki sections" is about wrapping a section in a section element. The whole section that you would edit on the page would have a section element around it in the output, which means that sections have to be balanced in the HTML tree.

OK. So I've been trying to get a conversation going on multi-content revisions. The idea behind this is that a page doesn't necessarily have to have one content stream; you can have multiple content streams, so you have multiple content objects for each revision. That basically means you can have wikitext side by side with structured information — media metadata, or just a list of categories, or maybe a quality assessment of the page as structured data — versioned alongside the wikitext. You don't have to embed anything in the wikitext to have it versioned alongside, rendered in the same place, and addressable via the same URL. That's the basic idea behind multi-content revisions. I think it would be very useful for graphs and maps. Well, image metadata is one of the primary use cases, actually; I think quality assessments are another good use case.

Maybe we can have a session about that on the third day. If any one of you is interested, just add your name. — Just in this Etherpad, or should we have a separate section or Etherpad for that? — Just put a section at the bottom. — OK, we can do that. So there's the "add your name" right there, where somebody is typing right now.

OK. So, yeah, that is not really about the format of the content in terms of syntax; it's more about how revisions are stored and managed.
And also what our conceptual model of revisions is.

By the way, it looks like there's wonderfully healthy participation here, but there are also a number of anonymous editors. So please click on the little people icon, and when you see your name at the top, you can put your name in.

OK. So we can talk about any of these RFCs now. If somebody wants to step up to the microphone and speak about any one of these RFCs, we can speak about it; or we can jump right in to the charter for the working area. I don't see anybody lined up at the microphone, so I'm going to assume that if you haven't stepped up, you're OK with moving past the RFCs and talking about the working area.

One of the RFCs, the second to last, is actually kind of a high-level overview, and I think quite relevant to this discussion, so maybe it's worth looking at. — 99088, is that the one you're talking about? — Yes. It kind of motivates the need for more structured data and some of the use cases that I think should drive the development of our content structure. So, Tim, maybe you can speak a little bit about it.

Yeah, so what this area is about is partly wikitext — the future of wikitext — and partly semantic data storage: what things should be semantic, how we should store that, and how we can support editing in a wider variety of ways. We've got a whole lot of new applications for wikitext, and we're going to be talking about some of those. Some of the questions are whether we should put all of the content into JSON blobs inside tags inside wikitext — it seems inelegant to do that, and yet that seems to be standard practice at the moment. If you split it out, where should it be split out to? Should we have a single timeline for the history of an article, or should it be branched? How is a revision represented in storage?
Daniel's idea of having multiple content blobs associated with each revision aims to abstract that a bit. So that's kind of where we're heading with this working area.

Gabriel, maybe you can say a few more words: do you see 99088 as kind of a charter for this area, or do you see it as more specific? — I think it's a part of the things that I hope to get out of the content format working group. It might not cover everything, but it could cover a part of it. A lot of the things that Tim mentioned — there's a lot of overlap.

And what are the big contentious questions here? There are obviously some aspects of this that everybody agrees on; what are the things that are a matter of debate? — Well, I think there are a lot of details to be solved. There are concrete use cases that people have right now, identified earlier: infoboxes, navboxes, sections. There are use cases that don't have a solution yet, like unbalanced templates. Performance: Parsoid, for example — the Parsoid team would like to have incremental parsing, where they don't reparse the entire page every time something minor is touched. So there's a whole gamut of things, but most of them are mostly technical problems, I guess. The contentious part is more in the longer-term trajectory: if you follow this step by step, solving concrete problems, where do we eventually end up, and what are the implications of that?

Yeah, I think in the longer term it starts to get more contentious. There's not really a consensus on a lot of big topics. For example: what's the future of wikitext? Is wikitext a legacy format, or is wikitext going to be a thing going forward? Which is pretty fundamental. I kind of think that we need an "edit source" button going forward.
You don't see many really serious... I guess there are pure WYSIWYG editors around that are used, but on the web, something like WordPress or Drupal has always had an edit-source mode. And the question is, what do you want to see when you click edit source? Well, probably not HTML — probably not raw HTML as presented to the user, anyway. So, yeah, I'm of the opinion that wikitext should stay around in some form, I guess.

One thing I would like to say in this context: like Gabriel said, in some areas there's no real agreement on where we are headed. So maybe it's best if we discuss our conceptual ideas — the conceptual model of what we actually mean by content. What is content? What is a revision? What is transclusion? It seems obvious at first look, but the edge cases get increasingly interesting. One very interesting thing, of course, is how transclusion mixes with revisions. Wouldn't it be cool if we could just go back to some revision and see it as it was, kind of like a Wayback Machine? But that would imply that if someone edits a template that is visible in the current revision, the current revision wouldn't change, right? We freeze it. So how do these two competing wishes actually combine? What's our idea for that? How should this work? That, of course, plays into caching, into change propagation, into how APIs are structured. So this is a really interesting topic with lots of implications.

Yeah, and the user probably expects to see the article as it was when they go back in time in the history, right? That's probably the user's expectation. — Yeah, if you go to the history of the main page, that's not a very satisfying experience right now, is it?
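The multi-content-revision model being discussed — several named content objects versioned together under one revision — could be sketched very roughly as a data structure like the following. The class names, slot names, and content models here are illustrative, not the actual MediaWiki schema:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Content:
    model: str   # e.g. "wikitext", "json" — illustrative model names
    data: str

@dataclass
class Revision:
    rev_id: int
    # Several named content "slots", all versioned together as one revision.
    slots: Dict[str, Content] = field(default_factory=dict)

# One revision carrying wikitext side by side with structured metadata,
# without embedding the metadata in the wikitext itself:
rev = Revision(
    rev_id=1001,
    slots={
        "main": Content("wikitext", "'''Example''' article text."),
        "categories": Content("json", '["Cities", "Spain"]'),
        "quality": Content("json", '{"assessment": "B"}'),
    },
)

assert rev.slots["main"].model == "wikitext"
assert "categories" in rev.slots
```

Each slot can evolve independently in meaning, but a revision snapshots all of them at once — which is what makes the "view the page as it was" semantics discussed above expressible.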
When everybody comes to speak, can you introduce yourself at the microphone? — Oh, yes, please. Rob Lanphier, by the way.

Yes, I'm Joseph Allemandou, working on the analytics team. One thought I have around the content per se: there are very different areas of usage of content — performance, as was said, semantics, the representation, et cetera — and I would really separate the concerns, and maybe the projects, around the various aspects of each of them, because some of them might be overlapping, some of them might not, and the goals for each of them might actually be fairly different. If you want to increase performance, the changes you might want to make might not completely overlap with what you'd do to get better semantic interpretation over some bits of the content.

I'm sorry, what was the second thing you said? — The goals might not completely overlap if you want to increase performance versus if you want better semantics over the content.

Semantic storage and performance are not necessarily mutually exclusive. Currently we have everything in templates which are interpreted by Lua modules, and the performance of that is really challenging, actually. On Wikipedia it's not so bad, really, but look at Wiktionary: every page on Wiktionary is just a big mass of templates, and it's trying to express semantic data. A dictionary is sort of the most obvious use case for semantic data — SGML sort of came out of the Oxford dictionary project. We should really be doing something better there.

Yeah, to the performance point: we already separate semantic data from the main content. For example, Parsoid has very rich metadata that it uses to make the conversion back to wikitext as lossless — well, as accurate — as possible, and that is already stored separately. In the next step we will store other semantic information, about template parameters and so on, in a separate format as well, with IDs to reference back and forth. So it is possible to satisfy both, but it is definitely something we have to think about and keep in mind while we design the semantic stuff, so we don't overload and bloat the main content.

OK, it's a little quiet. So I thought I'd read a little from a comment where I threw some bombs, and see if that makes people discuss this with more passion. Someone edited a version of this. So: it's been 20 or so years since wikitext was invented, and it's off the modern mainstream. If you look at new programming language projects, they tend to use Markdown or something like that. Even Lua, which we use for our template engine, is not in the modern mainstream. So we force all our users to climb these barriers to entry before they can contribute — to learn all these new things which they don't use in any other projects. So here are two ways to re-sync with modern practice.

First, HTML-only wikis. Some of the motivation in restructuring core and rethinking what an article means is to decouple wikitext from MediaWiki, so that we have content in MediaWiki which could be in a variety of different formats. Infoboxes are not best expressed in wikitext; infoboxes are best expressed in some other form which is more natural to them. And maybe even articles are not necessarily best expressed in wikitext. The long-term idea is that a new wiki starting up could have a visual-editor-only experience and be an HTML-only wiki, where all the content is natively stored in HTML.
So not everyone will want this, but the idea is that if we start untangling the many ways that wikitext is in core, we can start thinking about new ways. Maybe someone's going to come up with a Markdown wiki; maybe that's the way people want to do it. Right now we don't have that ability — you have to use wikitext. Similarly, we're tied to Lua in Scribunto. Some of the mechanism for allowing multiple programming languages is there, but it's not really there. When we chose Lua over JavaScript, it was based on a snapshot of JavaScript implementations at that point, which isn't relevant anymore. We could add JavaScript support to Scribunto. Again, not everyone will use that — maybe it won't ever get turned on on English Wikipedia — but we could start broadening our minds about what an article or a template is. I've got a couple more suggestions which aren't relevant to this particular session.

The other one is polyglot Wikimedia. One of our big selling points, and the thing that I fell in love with in the project, was that we supported all the world's languages. Unfortunately, that's not really the case anymore. Development on language-specific features has pretty much stalled. There are limitations to language converter which haven't been fixed; the visual editor doesn't support it at all. We need to take a step back and think about what our language support really means. I'd like to challenge us, if we're going to make big steps, to reimagine what really embracing multilingual content in our wikis means. One example: fine-grained content tagging, which wikitext doesn't let you do, but in HTML I can associate an ID with effectively every paragraph. If someone translates that paragraph into something else, I can track that over time and remember that this is the translation of that paragraph; and if someone makes changes elsewhere, I'm not losing the information about what that translation was.
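The fine-grained tagging idea could work roughly like this: give each paragraph a stable ID, record a hash of the source text each translation was based on, and flag translations whose source paragraph has since changed. Everything below — the ID scheme, the hashing — is a hypothetical sketch, not how Content Translation actually works:

```python
import hashlib

def paragraph_ids(paragraphs):
    """Assign an ID to each paragraph. Here the IDs are simply positional;
    a real system would keep them stable across edits, e.g. by storing
    them as attributes in the markup."""
    return {f"p{i}": text for i, text in enumerate(paragraphs)}

def stale_translations(source, translation_basis):
    """Report which source paragraphs changed since they were translated.
    translation_basis maps paragraph ID -> hash of the source text the
    translation was originally made from."""
    stale = []
    for pid, text in source.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if translation_basis.get(pid) != digest:
            stale.append(pid)
    return stale

# Translate two paragraphs, then edit one of the sources afterwards:
src = paragraph_ids(["Madrid is the capital of Spain.", "It has 3.2M people."])
basis = {pid: hashlib.sha256(t.encode()).hexdigest() for pid, t in src.items()}
src["p1"] = "It has 3.4M people."   # source edited after translation
assert stale_translations(src, basis) == ["p1"]
```

With per-paragraph IDs in the stored content, the "notify me when the paragraph I translated changes" workflow described above becomes a simple comparison rather than a whole-article diff.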
I'd love to see an easily accessible split-screen view — like mainstreaming content translation. Just because I'm editing English Wikipedia doesn't mean I'm not also fluent in Spanish. I'd like to be able to look at the Spanish version of an article side by side with the English version and see if there's stuff I can contribute both ways. Right now we've got a silo for each individual wiki, and the content is not shared. The same goes for keeping translated sections up to date — this goes with that fine-grained thing. If I grabbed a paragraph from the Spanish version of an article on a city in Spain, and that was later edited, I'd like, as an editor, to be notified of that, update my translation, and so on. And finally, a sort of process to migrate: there's a lot of stuff language converter does which content translation doesn't do right now. There's a whole separate session for this, so I'm not going to really get into it, but we need to start thinking about whether we can embrace this better and integrate it into our model of what our content is. So we'd think of our content not just as a fixed set of wikitext for an article, but perhaps as a collection of resources: an infobox, some mapping between translated versions of the article, layout information — the layout of figures, again, is not best expressed in wikitext — and really take some steps toward reimagining what our content could look like. Hopefully that gives people things to yell at me about.

Hi. I'm Moritz Schubotz, and I have some comments on the development of the format.
I think we should ask ourselves who will understand the new format that we are going to develop. For example, we could come up with some semantic versioning for templates and so on, but would a normal user understand it? Would the standard Wikipedia editor be able to interact with this format — especially with regard to the visual editor, where we want to make it easier for users to use this format? For example, when MathML was invented in 1998, there was Content MathML, where they built up a 200-page manual on how to describe the content of mathematics. The result was that they had a really beautiful format, but it was somehow too hard for mathematicians to understand. So we need to make sure that the new content format we develop is understandable for the final users.

I think it's important to separate the user interface — how users interact with this content — from the format we store things in, or the format we use to process things, because there are very different requirements for those two parts. Right now we use wikitext for both: it's our primary storage format, but it's also a user interface, and it's okay, but not great, at both of those tasks. Most of the discussion we've had recently about more semantic content and interaction has been based on HTML. There is actually a DOM spec, for the first time, that spells out that a link is going to be marked up like this, using RDFa; that an image exposes all its parameters like this. Basically, all the things you need to edit and extract semantic information are already specified in there. It's by no means perfect, but it is used by visual editor and, increasingly, by other post-processing tools that leverage all the semantic information that is already in there.
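As a sketch of what consuming that kind of annotated HTML looks like: a small scanner that picks out elements carrying RDFa `typeof` annotations together with their machine-readable payload. The sample markup and the shape of the `data-mw` payload below are simplified stand-ins, not the exact Parsoid output format:

```python
from html.parser import HTMLParser
import json

class SemanticScanner(HTMLParser):
    """Collect elements that carry RDFa 'typeof' annotations, in the
    style of Parsoid's spec'd HTML (attribute contents simplified)."""
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            item = {"tag": tag, "typeof": attrs["typeof"]}
            if "data-mw" in attrs:
                # Machine-readable details ride along as JSON in an attribute.
                item["data"] = json.loads(attrs["data-mw"])
            self.found.append(item)

html = ('<p><span typeof="mw:Transclusion" '
        'data-mw=\'{"target": "Infobox city"}\'>...</span></p>')
scanner = SemanticScanner()
scanner.feed(html)
assert scanner.found[0]["typeof"] == "mw:Transclusion"
assert scanner.found[0]["data"]["target"] == "Infobox city"
```

The point is that a consumer never has to re-parse wikitext: the semantics travel with the HTML itself.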
So, I think it's important — I'm not sure if there's actually a proposal to have a completely separate format from that, or to basically keep evolving it to better address our use cases. I think most of the discussion is actually about this evolving part, rather than finding something completely new.

My name is Jiang, and I'm from Google. Another kind of complaint regarding current wikitext as a way to represent the infobox: currently I feel, or we feel, that most of the time Wikipedia focuses on the interaction between human beings and the Wikipedia software, where all the content is almost free text, although with some key-value-pair structure most of the time. If we try to write a program to parse the key-value pairs, we are actually facing free text. To give you an example, take a look at two Wikipedia pages, one for Google and one for Microsoft. In the infobox there is a key called "products", but the values are represented totally differently from each other, and if we really want to write a program — ask a machine — to parse the values, it's really very, very painful. So from the perspective that we want to make machines really understand and read the content, we do need a uniform format to organize the content. I think Wikidata is one of the ways, but it's still far away from organizing all the different formats from Wikipedia. So if we are really looking for the long-term solution, we do need to think about making the underlying format friendly to machines as well.

Can I just ask what your interest in semantic markup is? Are you looking at it from the point of view of semantic search, or improving rankings, or is it for richer search result pages with semantic data right on the page? — Before we talk about ranking and the fancy stuff, the first, very basic step is to do the semantic extraction.
Another example we keep using: in the wikitext, birth place and birth date are separate key-value pairs, but when we look at the HTML, these two key-value pairs are rendered into one label and value, so it's really hard for anyone to write a program to parse out birth date and birth place. So we do need to consider how to make the content fine-grained, and also easy for a program to read and understand. Once we have all this basic semantic parsing worked out, we can talk about other stuff, like knowledge inference.

In this vein, I think the main thing is making things easier to edit for people and easier to read for machines. Those are kind of the two problems we have: Wikipedia is so hard to edit for humans, and it's so hard to read for machines; the other way around, it kind of works. If we have new formats that have more semantics, we should definitely avoid inventing new complex syntax for human interaction. I think we have done this once, and once is probably enough. We might come up with very simple things — like how to define gadgets, or lists of categories, or whatever, in a structured way — but I don't think we want to invent yet another markup language.

I'm Jack Taylor. I wanted to talk a bit about this idea of putting JavaScript in Scribunto. So we're talking about making things standardized, and JavaScript is the standard language, and I agree with that, but I think we might be targeting the wrong people. A lot of the time we see people editing templates.
The reason they're editing them is that they're editing articles and they find that something's not rendered the way they want, and so they go, okay, I've got to edit this template. What I saw a lot when we had the change to Lua is that people were seeing these templates in Lua and they no longer knew what they were doing. There's one user in particular, Redrose64 — whenever he (or she, I'm not sure) sees Lua, the reaction is: well, this is no longer in the realm of human comprehension, I give up. So if we wanted to add JavaScript to that, it would add even more complexity for people who are editors and just want to write or edit a template. So we've got to think about how all these languages play together.

I think if you can read JavaScript fairly easily, then you can probably read Lua without a huge amount of effort. — I learned JavaScript after I learned Lua; JavaScript was harder than Lua, I think, because there's a lot of asynchronous stuff. — In Scribunto there isn't: there is asynchronicity in Lua itself, but in Scribunto there's not — everything is done one step at a time. They're pretty similar languages. — So perhaps you should take a look at that RFC, or sign up, because it actually presents a strawman implementation, which maybe will make you hate it more, but at least you can discuss a concrete thing.

On the point of semantics that Jiang brought up: I think we can see it as an evolution. We started with templates that are very universal and that freely mix presentational things with semantic information — you have an age, you have a color, you have whatever — and they don't necessarily agree, so every template has its own parameter structure, and they are not very uniform across domains. That creates the trouble for you; it also creates trouble for us when we want to format things differently, or select certain information — a subset for mobile, for example. If you only want to show a couple of the most important things on mobile, but the full infobox with everything on desktop, then we have to know what the most important things are, and have to distinguish them from the green and the red and whatever. But the problems are very similar, and what we've been talking about, or considering, is identifying some of these really common use cases, like infoboxes, and providing basically a canned solution for those — something that is more like a component and has a defined interface. There's always a tension between the flexibility that we give to users, which has enabled all this growth, and on the other hand the control that we have in software to reformat things, to extract the meaning, and so on. I think that tension is not going to go away, but I think it's very likely that we're going to move towards more optimized solutions for specific use cases like infoboxes. A good example is actually Wikia, who have already made the leap in that realm: they have complete control over the formatting of infoboxes and only ask the users to supply the actual data. Their motivation is primarily control over mobile versus desktop, but they have worked with the community a lot and gotten a lot of support for this approach, because it removes work for editors.

Is there somebody I can hand off scribe duties to?
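A canned infobox component of the kind just described might look something like this: editors supply only structured data, a schema ranks fields by importance, and the software decides per platform what to show. The schema, field names, and importance cutoff are all invented for illustration:

```python
# Hypothetical schema for a "city" infobox component. Editors never touch
# presentation; they only fill in values for these named fields.
INFOBOX_CITY_SCHEMA = {
    "name":       {"importance": 1},
    "country":    {"importance": 1},
    "population": {"importance": 2},
    "mayor":      {"importance": 3},
    "area_km2":   {"importance": 3},
}

def render_infobox(data, platform):
    """Mobile shows only the most important fields; desktop shows all.
    Returns (field, value) rows in schema order."""
    cutoff = 2 if platform == "mobile" else 99
    return [(k, data[k]) for k, spec in INFOBOX_CITY_SCHEMA.items()
            if k in data and spec["importance"] <= cutoff]

city = {"name": "Madrid", "country": "Spain",
        "population": 3223000, "mayor": "Example Mayor", "area_km2": 604.3}

assert len(render_infobox(city, "mobile")) == 3
assert len(render_infobox(city, "desktop")) == 5
```

Because the data is uniform across articles, the same structure also answers the machine-readability complaint raised earlier: a program parsing "products" or "population" sees one format, not per-article free text.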
Great, thanks. I think I totally understand what you said — that the infobox is kind of a headache for almost everyone. I'm mostly talking about wikitext, especially wikitext versus a long-term solution; the infobox is just one example of something wikitext is too complex to handle. We also meet other difficulties when handling wikitext. To give you another example, there are lots of special templates used to manipulate the content in special ways — for example, around the title: for "iPhone", the "i" becomes lowercase via a lowercase-title template; for "C#", the language uses a correct-title template with several special parameters. All these things are, from my point of view, abuses of templates to achieve a special purpose. If I designed things from scratch, I would try to separate these functionalities from the markup language itself. Other examples are redirect labels and category labels. From a software engineering point of view, we should try to separate these functionalities into different modules; different people or different use cases may want different functionalities, and hooking all of them into wikitext is not a good thing for anyone.

I get the sense that you're looking for more domain semantics. There are two kinds of semantics: one is the semantics of wikitext itself — something is a template, something is a link or an image. But from what you're saying, I'm getting the sense that you're looking for semantics about what a particular template means, as opposed to merely that it is a template. And that is not information we have programmatically just by looking at a template name; it's not available in the wikitext representation. Am I interpreting you right? — I'm trying to say that if we reconsider all these things, maybe we should not use one technology called wikitext to implement all this stuff; maybe we have different functionalities. For example, if we want to allow a user to specify the canonical title for "iPhone", we should give them an input box explicitly and ask them to input it, instead of using a lowercase-title template. — That is a very good example of a use case for multi-content revisions, because we need to store that somewhere, and it should not be in wikitext, but it should show up in the history.

I just wanted to circle back real quick and ask a question. At the beginning of the session there was this low-level and high-level approach, where we were going to think about both the concrete use cases we're trying to solve for now, and then maybe try to project where those might end us up and figure out which tradeoffs are best. I was wondering if we could get a quick recap — I'm losing track of what the use cases are that we're trying to solve, and what the tradeoffs are that we might need to compromise on at this point.

Well, I can say what I'm doing now, and then leave it to Tim or whoever else wants to step in to help guide us to what we should be doing. What I'm doing now is trying to capture many of the questions that we can eventually turn into more of a charter for this working area. One thing we could do with a large list of questions is prioritize it, and that could be an exercise after this session, once we have all of the questions down. But I'm pretty flexible in terms of how we use this time. Tim, do you have any further thoughts?

Yeah, well, about the use cases we're talking about: part of what's driving this is things that are around already but are implemented in a kind of hackish, non-machine-readable way — infoboxes being the number one example. Now we can possibly integrate infoboxes with Wikidata;
That might be a nice thing to do. In Wikipedia there are also things like climate data tables, and tables of population over time. We have maps that are colored by some property, like maps of GDP per capita around the world; Yuri has been doing a whole lot of work on that — on generic semantic information being fed into an image, a graph image, which is just displayed for the user. Maybe you could then animate that, which would be really nice. But then, how do you edit that? Well, okay, that's not an easy thing to solve, and that's one example of a use case.

Also, we have the language converter, which involves its own special variant of wikitext to annotate differences between close language pairs — like simplified Chinese and traditional Chinese — where they use this special markup to annotate the differences that the machine doesn't correctly translate.

There's also mobile friendliness; that's probably a big part of where we're going with this. For example, images at the moment are specified in wikitext by width in pixels and by whether they float to the left or right. Pretty much most images on Wikipedia float to the right with a width of 300 pixels or so, which is not really appropriate for mobile, where anything over 200 pixels wide may as well fill the whole screen and not float at all. I guess that comes under the heading of responsive layout. Currently we don't allow users to specify style sheets except at the whole-site level — the whole of the English Wikipedia has one style sheet, which administrators can edit — and in articles themselves you can only edit style attributes, which is a really unpleasant way to do it: it's hard to edit and makes the HTML very large. So those are some of the use cases that we're solving.

Thanks. As timekeeper, I should note there
are about 30 minutes left. My question has to do with the tension, I suppose, between the granularity of wikitext — and its usefulness for translation and machine reading — and the editability of Wikipedia, MediaWiki, and Wikidata. This may be a kind of out-of-the-box question, but would it be possible to decompose templates, or code templates, such that every part of them was a Q-item in Wikidata? So that the granularity, for some MediaWiki interfaces or use cases, became actual Q-item references, such that the interlinguality of Wikidata's 300 languages would potentially be there as well, and the granularity would be really fine. I mean, a Q-item could refer to even a letter, or a pronunciation of a letter. Is that patently absurd, or is there something workable in that kernel, for granularity, for both machine reading and the potential for editability?

Can you explain again what exactly should map to Q-items — template titles, or parameters, or what exactly?

Many aspects of templates. Could you make the Q-item a basic unit for some aspects of MediaWiki development, to get the granularity for human editors but also for machine learning?

Just as a straw man for that: that T11445 "visual templates" task — maybe I mis-titled it — has a straw-man implementation where one of the parameters to a template is just a Wikidata query. So most of the parameters for that template come from Wikidata, and the template can be shared among all sorts of wikis. Maybe one of the parameters is a query in Wikidata which gives you the proper name for that item. Anyway, I am interested in your question; that's my best approach at it. I don't think it's a complete solution, but I'd love to talk to people more about it.

A follow-up, just on content translation, for the accuracy of translation beyond the current machine-learning level of 50% or so:
Translated sentences aren't complete in MediaWiki Content Translation, and they're not complete in Google Translate. Granularizing to the Q-item level, as a way of conceiving this problem — I don't think I'm expressing this in coding language — would be something I'd be interested in.

Yeah — I'm probably going to talk past you, but another benefit of fine-grained tagging of translations in articles is that we can then use them to train the translation engine. Modern translation engines work best when they've got lots of data, so if we can give them all revisions of all articles, and everything that's ever been translated, as input, we'll have a translation engine that gets better by itself. We can't really do that right now because we don't have that fine-grained correspondence — that this was a translation of this. There are some issues with that, but yes.

By the way, I want to reserve the last five minutes to talk about next steps. But I believe you're next — or you. Okay, either way, whichever of you wants to go first.

Hi, I'm Yanan from Google. I want to discuss with everybody the gap between human-friendly and machine-friendly. As we know, when Wikipedia appeared it was designed for people to contribute their knowledge. And speaking of humans: different people have different understandings, even if we give them a standard for how wikitext should be written. As time has gone by, there is already a great deal of useful content on Wikipedia, and now we want to extract some semantics, some machine-friendly knowledge, from that content. So for us developers there is a pain point, which is that when people express their knowledge in different formats, in different ways, we have to translate it all into a uniform, unified form. This is really difficult, because there is no unified way for many cases; we have to handle them one by one, case by case. So I'm wondering whether there is a way we can do this together. I mean, if we do this separately, we each have to translate
human knowledge to machine knowledge one by one. But since machines need precise, uniform data, maybe we can share our results — the unified content. That's my point; I want to hear about everybody's ongoing efforts in this area.

Sorry for cutting in front again. There was an idea that came up during a conversation a couple of days ago: basically, a natural-language editor that acts a bit like an IDE, in that it makes auto-suggestions as you work — like auto-completion — and lets you pick a meaning for the word or phrase you just entered. For nouns we could already do that with Wikidata; for other parts of a sentence's structure, maybe function words, well, there are only limited numbers of those, so if they're ambiguous you can also let people select. Once we have a machine-readable dictionary we can do it for more things. People would, while typing a sentence, actually annotate that sentence with its meanings. That would be an extremely useful resource, I think, both for direct translation, interpretation, and machine readability, and for training machine translation. Just an idea that came up — I think it was Jan who mentioned it, but I'm not sure.

I think Tim's use case of data tables is also very relevant, because there we want to make it easier to edit these crazy transformations — like a date converted to an age, and so on. We would actually need the original semantics to make editing easier, because ideally we want to let people click in the table and change the birthday rather than the age.

I guess we will still need the editor for a long time. The translation from human knowledge to machine knowledge may take a few years, I guess. What's your opinion?

Yeah — for existing content, definitely. What we would first need is to provide some structure to address these use cases in a more convenient way: in a way that helps editors, so they're incentivized to actually use it and are happy to
use it — so we don't set it up as something only the semantics people want — and that, on the other hand, also captures the semantics, so we can make editing easy and extraction easy. Okay, thank you.

We just mentioned mobile friendliness; I wanted to bring up a couple of use cases for that explicitly, and also another use case that I don't think has been mentioned yet but has come up in a couple of different product discussions in the reading team. For mobile, I think whatever content format we choose just needs to be amenable to page composition in general. A lot of the workarounds we've implemented to produce the mobile presentation — at least in the iOS app, and for the most part the Android app as well — are about rearranging stuff on the page, getting the specific image resolutions we need for the specific device that's requesting the page, and of course supporting cacheability. But the concern brought up at the very beginning of the talk was being able to go back and see a specific revision with all of its various parts at their snapshot revisions, and also being able to get the most recent version of the article, re-transcluded, after a specific component has changed. That seems like it needs to be solved in order for us to have these composable components.

The other use case mentioned from a product standpoint is being able to highlight and comment on specific ranges of text within the article. The complication that arises there is mapping those ranges across revisions; if we had more granular section components, we would only need to revalidate the selection ranges for a specific revision if that section had changed.

So, do we have straw men that we're looking to propose? What are the next steps here?
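The range-mapping problem described above — keeping a highlight-and-comment selection valid across revisions — can be sketched minimally. This is a toy model under simplifying assumptions (a single insertion or deletion at one position), not any product implementation; real revisions would be diffs with many hunks:

```python
def remap_range(start, end, edit_pos, delta):
    """Shift an annotated [start, end) character range after an edit that
    inserted (delta > 0) or removed (delta < 0) characters at edit_pos.
    Returns None when the edit lands inside the span, meaning the
    annotation needs human or heuristic revalidation."""
    if edit_pos <= start:
        return start + delta, end + delta   # edit before the span: shift it
    if edit_pos >= end:
        return start, end                   # edit after the span: unchanged
    return None                             # edit overlaps the span

text = "The moon orbits the earth."
comment_range = (4, 8)  # highlights "moon"
# A later revision prepends "Fact: " (6 characters) at position 0:
print(remap_range(4, 8, 0, 6))  # prints (10, 14)
```

The granular-sections point in the discussion maps onto this directly: if ranges are anchored per section, an edit to a different section never touches `edit_pos` within this span, so most annotations survive without revalidation.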
I think we have a lot of interesting ideas. We're coming to the end of the session, so maybe I can talk about what the next steps are. There's a whole bunch of ideas here that I've tried my best to capture in question form; for the details, you can go and read what people said in the minutes. I think the next thing is to take this boiled-down list of questions and decide which are the important ones and which aren't, but that's something we can do after this session, because I suspect we're not going to get it done in the next five minutes.

One thing I'd like to add, though, is to think about how Wikipedia content might be viewed and edited from human interface devices that aren't text-based — that are voice-based, or where text is otherwise non-existent. How do we decouple the presentation layer from, essentially, the storage layer? To some extent, the bulk of these issues arise because you're storing a document that is coupled with presentation-specific and logic-specific things, when in reality it's more structured than that. Decoupling some of that might be the way forward to future-proofing for different form factors, and for languages and other characteristics as well.

I'll go even further than that. Currently our encyclopedia is pretty much text-only: we have some pictures, and very little other media. What if the article on the Moon included a video walkthrough of the Moon, or a VR environment of the Moon? How can we incorporate the different types of content we might want in the future into an article? Is an article just remixing? In my video walkthrough, there's a section of the video that corresponds to the first paragraph of the article — can I express that correspondence? Can I watch halfway through the video and then — I've got to get on the bus or something — come back to it? We don't really have any way to express a true multimedia encyclopedia; we just have ways of annotating text. Which, by the way, Yuri has a proposed vision for:
Wikipedia — he did a lightning talk on it a little while ago — and I think that gets to the point Scott is making. And it goes both ways, too: it's not just embedding VR content in an article, but embedding the knowledge within an article into a VR scene.

Think of a document — an article — as essentially a giant data structure, with components that are metadata, there for some reason; components that are data; and components that are presentation-specific. If you think of it as something that could be transformed and decoupled from the presentation layer entirely, that could be the most beneficial way to look forward.

I think a lot of this is already underway, and there are concrete use cases that actually inform the designs, and I think that's a good way to develop this, rather than trying to come up with a brand-new grand design that covers it all. I think it's really healthy to iteratively improve the spec and the markup that is needed, and to extract the semantic information the use cases require. So I don't really think that this committee, or working group, or whatever, should come up with one grand plan. I think it's good to focus on the concrete use cases we have right now, with an eye towards the longer-term direction; there are very concrete things to be solved right now.

I wanted to elaborate on one of the ideas that has been presented, which revolves around annotating sentences and paragraphs. This covers finding the revision that introduced a sentence, finding the revision history of a particular paragraph, and associating multimedia with particular sentences and sections. All of that requires a structure richer than the plain text we have right now in wikitext, and I think a lot of the pushback against introducing such a structure is about how it would affect editors of wikitext. So one idea that might be interesting to throw around here is: what if our wikitext editing were abstracted, similar to how
visual editors abstract it, where you don't directly see everything you're editing? Basically, what you want is a visual editor that shows you wikitext, and — just as in the visual editor — there are attributes and wrapper elements you don't see, which keep track of all this for you. It would basically mean there are wrappers around your paragraphs that you don't see when you're editing the wikitext.

Before we go on — we're just about to run out of time here, so I think we should wrap up. One of the outcomes of this is that we've got a good list of questions; we've had a great conversation today. I think the next steps are to take the list of questions, flesh it out, see if there are proposals, and decide which ones are the most important and most urgent for us to solve now, versus which are a little more abstract and further down the road. There might also be some we didn't even touch on in this list — I'm not sure. So: brainstorming more on whether this list is complete, and coming up with a prioritized version — these are the questions we should focus on now, and these are the ones we should focus on three or six months from now. Does that make sense as the next step? Okay, great.

Well, I think this was a wonderful conversation, and obviously the follow-up is really going to matter here, but thank you, everyone, for participating; this has been great. And by all means — as we've been discussing, some of these things already have sessions and some don't yet, so check the top of the etherpad. The question is whether we should move the sign-up stuff to the top; I think we can move it wherever — let's just make sure somebody grabs a copy of this before it gets completely mangled. Alright, thank you very much. I think this is the end of the time we're allotted — right? Oh, it does
say ten more minutes — I didn't mean to rush everybody out. So you get ten free minutes back, if you want them. What do you all think? Does anybody have anything else to say?

Yeah, I want to ask a question. Is it fairly clear that what needs to happen is a move to a richer storage format — I don't know, RDFa HTML or something like that — or is that not clear? Because it's not clear in my mind. Besides having wikitext as the visual representation of the source for editing: is it clear that the storage needs to go in that direction?

I'd say it's not entirely clear. You could store all these annotations as character ranges on the wikitext, out of band. It's not clear that you'd want to do that.

So differentiating the storage from the source content — is that something clear?

We have some experiments: Flow stores HTML natively, and when you edit wikitext in Flow it gets translated behind your back. So I think we're certainly in favor of more experiments like that. I think the really interesting thing about Parsoid is not necessarily the HTML representation — which is all great and all — it's the round-trip translation between formats. So I think we could do more experiments with that: store annotations as character ranges on wikitext and preserve them behind the scenes, so you're not aware they're there. I think we could do more stuff like that. I don't think it's clear; I don't know a better representation yet to express this naturally. You can imagine one, but I'm open to suggestions.

By the way, one thing I do want to weigh in with at some point — I don't want to cut you off; if you have a comment to make here, you can — is to talk a little about the distinction between the content format area that we're discussing now and the content access and APIs area, which is the session that Gabriel is leading tomorrow. Is that right — the 11:30? Okay, yeah.
So those are two different areas, both of which I hope persist beyond this developer summit. The idea with the format area is to talk about what the thing is that you're editing and how to address it, while content access and APIs is about what we transform that data into — what layers we put around the data to achieve a lot of our other goals. As a concrete example, take section tags: there's currently a separate API for loading and storing individual sections of a page, but sections are not represented in the back end; we store the entire article together. So there can be some mismatch. Maybe we should change the back end to store sections separately in the future, but we don't have to decide both of those at the same time. And by the way, there's a ton of overlap between these two areas — creating them as separate areas was definitely a little controversial — but what we're trying to do now is treat them as overlapping, acknowledging that no division is going to be perfect. Sometimes we'll talk about both of them together; with section tags, for example, the two are just going to get intermixed. There will be other cases where we don't have to talk about the format at all and can just talk about the infrastructure.

Thanks for this fascinating session on content format. Is there a history of, or a space for, MediaWiki test projects — a kind of Google X laboratory? And in that regard, I'm curious whether there might be a way to explore some of these interlingual questions — various ways of making language data more granular, or a moonwalk scenario, even in a kind of online 3D-virtual-world MediaWiki format, as just one example. I'm just throwing that out there; I'm not sure whether this is the space to explore it. But both Wikimedia or MediaWiki X projects, and content format construed as something other than wikitext — perhaps a 3D
virtual world — are my two questions.

On the existing mechanisms: we've got Labs, which is a general-purpose playground for playing with our data, and we have separate wiki projects. You can think of Wikisource, for example, as a particular exploration of what markup is needed for a particular kind of content, and I think there's a fairly new video project, too, for exploring video content. So that's traditionally how it's been done. In the earlier client-driven API session I mentioned that one of the reasons I was excited about it was the idea that you could do new UI experiments as well. I think the question is a good one — I'd be interested in finding better ways to allow experiments.

Labs is our virtual machine provider: we provide free virtual machines for Wikimedia developers, and under that we've got a thing called Tool Labs, where we provide a replicated MySQL database with all of the articles from Wikipedia. You can access that, so you can easily build overlays on top of Wikipedia in Tool Labs.

As far as the areas go, this is probably the best one, but once again this is a great example of where it's not necessarily clear where one area begins and another ends. One other possible answer to that question is the software engineering area, where we're just talking about how we put it all together — looking at it from more of a computer-science lens than the more common one. That might be appropriate for the sort of Google A/B-testing stuff, where for some of their experiments they transparently redirect half their users to one version of a site and half to another. So if you're doing really fine-grained experiments like that, where you need to redirect users when they hit the URL — that's that working area, right?
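For the A/B-testing style of experiment just mentioned, a common approach — sketched here as a general technique, not a description of Google's or Wikimedia's actual infrastructure — is deterministic hash-based bucketing, so the same user consistently lands on the same variant across requests without any server-side state:

```python
import hashlib

def ab_bucket(user_id, experiment, variants=("A", "B")):
    """Deterministically assign a user to a variant by hashing the user
    and experiment name together; assignment is stable across requests
    and independent between experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same user always gets the same bucket for a given experiment:
assert ab_bucket("user42", "new-parser") == ab_bucket("user42", "new-parser")
```

A front-end router could call something like this to decide which version of the site to serve when the URL is hit, which is the "transparent redirect" behavior described above.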
Yeah. So, yeah, I mean, it's an interesting concept. I wanted to ask whether there has been any thought here about native apps and this kind of semantic content, because it's all pretty focused on the web. Those platforms have a lot of users, and there are use cases that don't map well onto the back end, which leaves those platforms unable to represent the kind of richer media content that we would like to show.

Yeah, I think the long-term solution is to come up with better ways to define the core MediaWiki experience, so it can contain a lot of these other pieces. I don't think we have good answers for that yet, but that's what the software engineering area is about: what our core software is, what you need to do stuff. I think the component conversation we had earlier is also very relevant, because it's partly about formatting media differently on different devices, and having all the information in there to support that. To me, the software engineering discussion is going to be mainly about how to implement these things; if you want a conversation about what the concepts are — what a wiki is, what revisions are, what content is — this session, and this working area, is probably the best venue.

Hi, sorry — it's Jack Taylor again. I really like the idea of making wikitext more semantic and separating the content from the presentation. I think this may be very hard to do: a lot of the time it's not just templates putting this content into articles — the actual tables and things are right there in the wikitext itself. So would we be able to make this transparent to editors, or would we try to get people to change the actual wikitext to something else that makes it easier to make the content semantic? And how would we approach that — would we ask them to do it gradually, or all at once?

Really quickly — Andrew Green from Fundraising Tech. I don't know if this was mentioned before — I wasn't able to
concentrate fully — but just to put this on the table, if it hasn't been mentioned yet, as something for a long-term research area: if we're talking about semantically structured data as one of the formats in which article content is stored, I think we should think about natural language generation. Not just creating tables and very schematic or formal structures for people to read on the basis of structured data, but actual prose.

Sorry, what? Yeah — for example, for wikis which don't have a large article on a topic but do have a large amount of structured data, maybe some of that could be expressed as prose. I heard questions about this topic, maybe from colleagues here from Google, I think. So maybe afterwards we could talk, to hear what kinds of projects, concretely, they're interested in. And if other entities — other corporations, other people — are looking at how to structure data, or at getting structured data from wiki projects, how could they also contribute back? Google might be working on natural language generation based on structured data, which is a huge project — is that an open project, something we could also integrate with, in an open, free-content and free-software kind of way?

Thank you very much. Sorry, I just wanted to quickly mention: we've only got 20 minutes for our coffee break. Alright — 20 minutes, and then the next one here is about Wikimedia... Yes, we designated the parallel session as a content format breakout session. Alright, thank you.