Hi, my name is Chris Hiller. I work for Oracle, on the Zorba XQuery processor, which I'll show off a little bit later. So, let's go ahead and get started.

What am I going to talk about today? I'm going to tell you the who, what, where, when, how, and why of JSONiq. What I'm not going to be showing you, even though the talk is titled "Implementing JSONiq", is code. That's not the level of detail we're going to be going into.

But before we get started, I'm going to have a quick survey. First of all, who was at Jonathan's talk, the previous one in this room? Most of you. Okay, so some of this is definitely going to be a little bit of a review. I'm going to have a slightly different focus, and I'll go into a little more detail about some of the things that he talked about. Who uses XML? Most of you. Awesome. Who uses JSON? Also most of you. Awesome. Who uses XQuery? A lot of you. That's good. All right, who works on the implementation of an XQuery engine? Yeah, now I know who you are. Anybody else? Have we got anybody from MarkLogic here? Who works on the implementation of some other kind of query engine, a NoSQL database or something like that? A couple. So, you are definitely my main audience today: the implementers. What we'd really love is to see JSONiq added to other XQuery engines, and especially to NoSQL databases. But I'll give enough of a taste of JSONiq that I hope to encourage all of you who use JSON to download it and give it a try.

So, a real quick overview. And again, this is going to be a refresher, so I'll go as fast as I can. JSONiq is a query language for JSON data. Not too surprising there. It's everything that a fully featured query language should be, and then some. It's got a lot of features, and we're going to demonstrate a few of those as we go. And it was defined as an extension to XQuery 3.0.
It was also defined by several of the same people who wrote 3.0. So, why would you do that? Why would you start from XQuery? Well, XQuery has a long history. Over a decade of design and development has been put into it. Some of the brightest minds in the data and query worlds have contributed to it to make it what it is today. It's robust. It's flexible. It has power for queries, joins, and transformations. It's got extensions for updates, scripting, and full-text queries. There are also a lot of implementations out there already: over 60 commercial and open source implementations are known to exist, in many languages and many different environments. And it turns out that designing a good query language is actually pretty hard to do. So, why reinvent the wheel?

So, I'll flip over to my first little demo to show you a little bit of what we can do. This is actually going to be kind of hard to read. I've got a mess here, and this is live data. What I've just done here is hit Twitter and ask for the last 100 tweets that contain the words "NoSQL Now". This may look familiar if any of you happened to see the 28msec talk earlier this morning. And I've done a lot of filtering here. I get back the results of that search, and then I get the text from each one of them. I go through tokenizing it into strings using full-text querying. Then I eliminate all the words that are stop words and that kind of thing, which we're not as interested in. I put them in lower case. I strip the diacritics, so that if somebody happened to have tweeted in Spanish or used a different accent, we don't care so much about that. I group by those words, get a count, and order them by that count, and the result here is a list of all the words that people who have tweeted about NoSQL Now have also tweeted about. This gives us some interesting information. Apparently William and Knight is fairly popular among these tweeters... MongoDB, keynote, JSONiq. Hey, look at that.
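The talk shows no code for this demo, so here is a rough Python sketch of the pipeline just described (tokenize, drop stop words, lower-case, strip diacritics, group, count, order). It is not the JSONiq query from the demo. The stop-word list, the sample tweets, the `min_count` parameter, and the descending sort order are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical stop-word list; the talk does not say which list the demo used.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "rt"}


def strip_diacritics(text):
    """Remove accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def word_counts(tweets, min_count=1):
    """Return (word, count) pairs, most frequent first (sort order is assumed)."""
    counts = Counter()
    for tweet in tweets:
        for token in re.findall(r"\w+", tweet["text"]):
            word = strip_diacritics(token.lower())
            if word not in STOP_WORDS:
                counts[word] += 1
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [(word, count) for word, count in ordered if count >= min_count]


# Made-up sample data standing in for the live Twitter search results.
SAMPLE_TWEETS = [
    {"text": "Keynote on MongoDB at NoSQL Now"},
    {"text": "JSONiq demo: querying MongoDB"},
    {"text": "El café está en la sesión de MongoDB"},
]

if __name__ == "__main__":
    for word, count in word_counts(SAMPLE_TWEETS):
        print(word, count)
```

Passing `min_count=10` would correspond to keeping only words seen more than nine times, and wrapping the result in `dict(...)` gives the unordered word-to-count object form.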
There's a lot of good information in that result. In fact, kind of too much information. What happens if I do this? Now there are only the words that were counted more than nine times, so I've limited it right down. You might also see here that I have taken JSON input and produced XML as output, but I don't have to do that. I could just as easily have a slight variation on this same query that instead produces a single output, which is all of that same information as a JSON object, where the key is the word in question and the value is how many times it was tweeted. Now you can see I've taken out the order by clause, because JSON objects aren't ordered, so there's no need for that.

So this demo was run on Zorba, which is an open source XQuery engine. So let's talk a little bit about what Zorba is. It's an open source, cross-platform XQuery engine. Here's the URL. Please feel free to download it and give it a try. It's developed by the FLWOR Foundation with support from Oracle, in the person of myself and some others, as well as 28msec, and it implements the full catastrophe of the XML and XQuery specifications: query and update, full text, scripting. There's also a ton of additional modules for all sorts of different things. You saw we already used the HTTP module to download things from the web there. And this is the validation of JSONiq: we added JSON support to XQuery in about a month, with a team of three, maybe four people working on it. So the takeaway, if there is one, is that given XQuery, JSONiq is not that hard.

So why should you care about JSONiq? Well, if you use JSON, not too surprisingly, it's ready to use today. You can do some good querying, and it's not bound to any particular vendor. If you use XQuery, then you now have access to a whole new world of JSON data. And again, the people I really want to address are those who are implementing a JSON-based NoSQL engine: it gives your users more control. So let's begin.
I'm going to go into some of the detail about how it is implemented. I don't expect people to read this slide as is, unless you really love language grammars. But the takeaway here is the bit in bright green over there, which is that this is really not that hard. We've got four productions here that add direct constructors.

What are direct constructors? Well, here's an example. So this is a query, and that's the result. The query looks almost exactly like JSON, and it produces something that is in fact JSON. If you're familiar with XQuery, you know that you can also throw in XML element constructors, which look a lot like XML. But the more interesting stuff comes when it's not literal, because every value in there can in effect be an expression. So there's a very simple expression, 1 to 10, which says give me the values from 1 to 10 and throw them into an array. So this is the composability of JSONiq. It says that anywhere you have a value you can have an expression, and you build up your query from that. And just to complete the list, that's how you construct an object. So this is actually just a refresher of the stuff I just showed: if you don't have literals, you have composability.

All right, so we've seen the syntax for constructing an item. But of course the data that XQuery was designed to query is XML, not JSON, so XQuery has a data model which is defined for XML. In JSONiq we extended the data model with two new JSON types, which are called structured items. Now, in XQuery you define non-atomic types in terms of properties and accessors. The property of an object is just a simple set of key-value pairs, and the accessors are all you might expect: give me the list of keys, and give me the value for a key. An array is an ordered list of members, which are also all items. The accessors are a size and a value for a particular position. So the takeaway here, again in bright green, is that this is easy. These are very simple concepts.
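As a rough illustration of how small that data model is, here is a minimal Python sketch of the two structured item types with just the accessors named in the talk. The class and method names are hypothetical, and the 1-based array positions and the `None` result for a missing key are assumptions, not something stated in the talk.

```python
class JObject:
    """A set of key/value pairs with two accessors: keys() and value(key)."""

    def __init__(self, pairs):
        self._pairs = dict(pairs)

    def keys(self):
        return list(self._pairs)

    def value(self, key):
        # Assumption: a missing key yields nothing (modelled here as None).
        return self._pairs.get(key)


class JArray:
    """An ordered list of members with two accessors: size() and member(pos)."""

    def __init__(self, members):
        self._members = list(members)

    def size(self):
        return len(self._members)

    def member(self, position):
        # Assumption: positions count from 1, as XQuery sequences do.
        if 1 <= position <= len(self._members):
            return self._members[position - 1]
        return None


# Composability: any member or value can come from an expression.
# This mirrors the "1 to 10 thrown into an array" example from the slide.
one_to_ten = JArray(range(1, 11))
squares = JObject({"squares": JArray(n * n for n in range(1, 4))})

if __name__ == "__main__":
    print(one_to_ten.size(), one_to_ten.member(1), one_to_ten.member(10))
    print(squares.keys(), squares.value("squares").member(3))
```

Nothing else is needed to represent an object or an array; everything beyond this (parsing, serialization, updates) is built on these four accessors.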
These concepts are much simpler than many of the concepts that are already there in XQuery. An XML element is probably the closest thing to a JSON object, and it's got a huge amount more baggage associated with it. You don't need to worry about any of that. An array is something like what is called a sequence in XQuery, but an array is much more straightforward. This is straightforward and easy to implement.

This slide is a little off the beaten path, so feel free to fade out for a second, but I thought I'd mention it. One of the cool things about JSONiq is that now you have JSON objects and arrays in your query language, and having access to those structured items is actually very valuable. XML doesn't have any counterpart to a map or an array, and these are obviously things that are commonly used in almost any programming context. So JSON arrays and objects fill that void. They're more programmer friendly, and if you fall back to using XML to do that, you're actually making the query processor work a lot harder. One of the reasons it is much easier to work with JSON items is that they don't expose their identity. What does that mean? It means the compiler has the right to reuse those objects. It doesn't have to worry about the ordering of them, and it's just easier to implement. This is actually telling you that it's valuable to have these things even if your use case doesn't necessarily focus on JSON. This opens up a lot of interesting programming options even for straight XML XQuery. As a side note, JSON items do have an identity. It's just that it's only used in the context of an update, which we'll talk about in a little bit.

The JSON data model is a very simple one, which most of you I'm sure are very familiar with. You've got objects and arrays, and then you've got a few simple atomic types. You've got numerics, you've got true and false, you've got null. Now, XQuery already supports most of these types out of the box.
So JSONiq simply reuses them, which means there's no implementation work at all. For strings, we use xs:string. For true and false, we use xs:boolean. For numerics, we use either integer, double, or decimal, depending on the exact form of the number involved. The only thing we need to add is that one over there, jn:null, which is one more thing you've got to add to your type system, but it's the simplest possible type you can imagine, because it has exactly one value, null. So it's a singleton. You don't really have to do much for that.

A brief aside: all of these other types are now available to you if you're working in JSONiq, with the full catastrophe of XML. JSONiq doesn't limit the things that can be stored in an object or an array to the things that are in the JSON data model. Feel free to throw in date-times. Do things like compute dates, compute date deltas, and all that. There are functions already built into XQuery to do that for you. Full disclosure: the serialization of those things is not entirely determined yet, which is why I don't have a demo showing exactly how that works, but internally it works just great. And the other thing I want you to take away from this is that I've just covered the entirety of the JSONiq data model in two slides.

So we've seen how we declare JSON data using direct constructors. We've seen how the JSONiq engine stores that data in the data model. Now how do we get the values back out in our query? This here is the only semantic extension that JSONiq really applies to the language. In XQuery 3.0 we have the concept of function items, where an item may be a function. An item represented by a variable, for example $L, may be a function, and you can call it by putting parentheses after it, basically like a function pointer in C. That syntax isn't defined for any other kind of item, so we reused it for objects and arrays. So let me show you a kind of wacky, but kind of entertaining, example of that.
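Before the demo, the idea of reusing call syntax for lookups can be sketched in Python, where any value that defines `__call__` can be "called" like a function. This is only an analogy for dynamic invocation. The class names, the field names in the sample tweet, and the 1-based array positions are assumptions for illustration, not taken from the talk.

```python
class CallableObject:
    """An object item: calling it with a key returns that key's value."""

    def __init__(self, pairs):
        self._pairs = dict(pairs)

    def __call__(self, key):
        return self._pairs.get(key)


class CallableArray:
    """An array item: calling it with a position returns that member."""

    def __init__(self, members):
        self._members = list(members)

    def __call__(self, position):
        # Assumption: positions count from 1.
        if 1 <= position <= len(self._members):
            return self._members[position - 1]
        return None


def invoke(item, argument):
    """Dynamic invocation: what the call means is decided at run time."""
    if callable(item):
        return item(argument)
    raise TypeError("item cannot be invoked: %r" % (item,))


# Hypothetical tweet shape; the real Twitter field names may differ.
tweet = CallableObject({
    "user": "example_user",
    "coordinates": CallableArray([37.33, -121.89]),
})

if __name__ == "__main__":
    latitude = invoke(invoke(tweet, "coordinates"), 1)
    longitude = tweet("coordinates")(2)
    print(latitude, longitude)
```

The same `invoke` works for ordinary functions, objects, and arrays, which is the point of reusing one syntax: the caller does not need to know which kind of item it holds until run time.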
Just to keep things interesting. Where's my cursor, he asks. So it's another variation of the Twitter query, but I've done something a little bit different with it. Here I've filtered down to the tweets that have geotagging. I've gotten the coordinates from those and thrown those coordinates off to Google Maps, and I've asked it where the nearest Chinese restaurant is to the person who sent that tweet. And then I constructed a new item. Oh, I'm sorry. In doing that I composed those bits and pieces. This is using the accessor syntax that we just talked about: I got the first and then the second coordinate and sent them off to Google Maps. And then I create a new object down here, where I'm using one result, via the accessor, from the original Twitter result, and then two bits of the result I got from Google Maps. And here it is. So it looks like Mr. T. Howard 37 here has been tweeting a little bit with his GPS on. And it looks like he was actually moving at the time, because the closest Chinese restaurant changed during that time.

So the JSONiq specification defines the semantics of this strictly in terms of dynamic invocation, which is to say the behavior is determined at runtime. However, in Zorba we actually have static type checking. This is an optimization we can provide, which means that we can determine statically, or at least we can often determine statically, that a particular variable is an object or an array, and we can create a query execution plan that says go ahead and get those values out right away.

All right. I'm actually moving faster than I expected, so I'll slow down a little bit. So briefly here: XQuery has the ability to update, and so does JSONiq. This means that you can not only query data, do things with it, and create new data, but you can also manipulate and modify that data in place.
And this is particularly good when you're interacting with a REST API, because you can get an object, manipulate it, and put it right back to the same API. I have a small demo of this as well, and it's on an issue that is near and dear to my heart, which is our bug database on Launchpad. Launchpad has a bit of an annoyance where we can target bugs to a particular milestone and mark them Fix Committed, but then when we actually mark that milestone as shipped, it doesn't change all those bugs to say that they're released. So I've written a very small query here which does that for me. I'm hitting the REST API, I believe it was the REST API, which is a JSON-based API from Launchpad, to get all the bugs for our most recent milestone, 2.6, which was a week or two ago, that were marked Fix Committed. I iterate through all of them, and I say replace the date fix released with the current date-time. I can run this query again when we do 2.7, and again when we do 3.0, because I don't have to modify the date; it's using the functions right out of XQuery. And I replace the status with Fix Released. Here are the results, so you can see the date fix released is now there, in UTC, and the status is now Fix Released.

Now, I didn't actually finish this demo to the point where it then took all those bugs and posted them back to Launchpad, because that would involve some authorization issues, but we can do that. Launchpad uses OAuth, and Zorba has an OAuth module. This would totally work.

So there's my green friend down here again. The implementation here is pretty straightforward, because again, the data structures we're talking about are pretty simple. The kinds of things you have to do are insert into an array, delete from an array, or rename a value in an object; not particularly complicated stuff. So, wrapping up, here are the things that we've actually talked about: the yellow stuff.
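To give a feel for how small those update operations are, here is a hedged Python sketch of the kinds of primitives mentioned (replace a value, rename a key, insert into and delete from an array), applied to the bug-status scenario from the demo. It is not Zorba's implementation and not JSONiq syntax. The function names, the field names `status` and `date_fix_released`, and the 1-based positions are assumptions for illustration.

```python
from datetime import datetime, timezone


def replace_value(obj, key, new_value):
    """Replace the value of an existing key in an object."""
    if key not in obj:
        raise KeyError(key)
    obj[key] = new_value


def rename_key(obj, old_key, new_key):
    """Rename a key, refusing to overwrite an existing one."""
    if new_key in obj:
        raise KeyError("key already exists: %s" % new_key)
    obj[new_key] = obj.pop(old_key)


def insert_member(arr, position, value):
    """Insert a member so that it ends up at the given 1-based position."""
    arr.insert(position - 1, value)


def delete_member(arr, position):
    """Delete the member at the given 1-based position."""
    del arr[position - 1]


def mark_released(bugs, now=None):
    """Flip committed bugs to released and stamp them with the current time."""
    stamp = now if now is not None else datetime.now(timezone.utc)
    for bug in bugs:
        if bug["status"] == "Fix Committed":
            replace_value(bug, "status", "Fix Released")
            replace_value(bug, "date_fix_released", stamp.isoformat())
    return bugs


if __name__ == "__main__":
    # Made-up bug records standing in for the Launchpad API response.
    sample = [
        {"id": 1, "status": "Fix Committed", "date_fix_released": None},
        {"id": 2, "status": "In Progress", "date_fix_released": None},
    ]
    print(mark_released(sample))
```

Because the timestamp is computed when the function runs, the same code can be reused unchanged for each later milestone, which is the property the demo relies on.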
On this summary slide, all the other things, in white, are mostly corollaries to the kind of stuff we already talked about. The update primitives are the actual implementation of the update statements. The item type grammar productions are ways to identify and use the data model types. Several built-in functions are for calling the accessors on the data model. A lot of the other ones are for handling input and output: we have the ability to parse JSON, which we saw in several of the demo queries, and also serialization to then return the results. Error codes, one new option, a few little things like that. But the takeaway here is that that's it. If you're starting from XQuery 3.0, at the very least, here's one slide that tells you everything you need to do to add fully functioning JSONiq to your processor.

So something that's definitely going to be of interest to most everybody here, I guess, is: okay, that's great, but I'm using Mongo. Or I'm using Couch. That's where my data is. Yes, it's in JSON, and I'd like to do things with it, but they're not running Zorba yet. So what can I do? Well, today you can download and use Zorba as it is, as an external tool, as an external programming language that is part of your toolkit for solving your application problems. It's pretty much the same as how you would use Python or Ruby or even JavaScript today to connect to your database, get information, process it, and return it to your application. The difference is that you're now doing it in a language which is designed from the ground up for JSON manipulation, for manipulating structured data and working with it. In particular, if you're using CouchDB, you can get data using the REST interface, manipulate it, and put it right back pretty trivially. Soon, though I'm not promising how soon, there will be a BSON module as part of the open source Zorba download, which means you'll be able to actually get data directly from Mongo in binary form and interact with it in that form.
And you'll be able to put it back to Mongo as well. Also, from a third party, there is an extended version of Zorba with Mongo integration, which includes query pushdown. That means you can have your complicated XQuery, and any bit of it that the processor knows it can actually ask Mongo about, it will. It'll pass that part to Mongo, take the result back, and then complete the rest of the query locally. But again, what we'd ideally like to see is integration from the database vendors themselves. This is something we really want to work toward, and we really encourage you to download and play with Zorba, decide that you like it, and help us encourage them to do that kind of thing. And Zorba is an open source project. We'd love for them to start with it. So that's actually all I have, so please, questions.

What kind of integration are you hoping for from the vendors?

Well, most of them have some form of what you might call a query language, or at least some way to get the data out of their system. The problem is that every single one of them has their own version of that, be it a functional language or a template-based thing from Mongo or something like that. What we'd really love is for each of them to have a JSONiq processor, at least as part of their client side, that is capable of the most efficient interaction with their back end. Something like what, I should say, 28msec has done with MongoDB. It would be very nice if that kind of thing could be handled, and it would be very nice if it could be handled consistently across vendors as well, because it would make it much easier for people to try out different NoSQL vendors and see which one works best for their problem. Does that answer your question?

How is the Zorba mechanism integrated into a database, and how well does it work with multi-threaded applications?

Zorba and multi-threading unfortunately don't get along terribly well right now.
Zorba itself is not integrated with a database at all. It's a standalone query processor. I actually can't speak too much to the low-level Mongo integration. I know it exists, but it was done by a company other than mine. The guy who did it is right there, so feel free to talk to him.

So databases, the Oracle database in particular, have the capacity to extract data and convert it into XML. So they have the engine. I think they could incorporate this kind of algorithm or engine into the database and produce JSON output.

I would certainly hope that they could. I suspect there may be some legal issues about them integrating an open source project directly, but the specification is free for them to use. I think probably a better fit would be Oracle's XML DB product, or the NoSQL database that they have a booth about out there and that they're starting to discuss. We would love to work with them.

How about datetime? JSON doesn't have a native datetime, so how do you translate between the two? Because the language would assume that you want an XML Schema style datetime type. Do you have any plans?

This is not only a plan, but half of an implementation. Serialization in JSONiq right now basically says that when you're serializing a JSON object or array, if you come across anything that isn't JSON, it creates an object on the fly which says this is not JSON, but it has this type and this value, serialized. So the obvious corollary to that would be that we need to have a parser that can recognize that, parse it back in, and recreate the data model as it was. That's actually something we're hoping to do pretty soon. The serialization side is already there.

If I can speak to the datetime thing: databases go ahead and add that sort of thing. And the thing is that they all do it a little bit differently.
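The wrapper-object idea from that answer can be sketched in Python with the standard `json` module: values outside the JSON data model are written as a small object carrying a type name and a lexical value, and a matching parser hook turns them back. The key names `type` and `value`, and the choice to handle only date-times, are assumptions made for illustration. The talk states that only the serializing half existed at the time, so the decoding half here is purely a sketch.

```python
import json
from datetime import datetime, timezone

# Hypothetical key names for the wrapper object.
TYPE_KEY = "type"
VALUE_KEY = "value"
DATETIME_TYPE = "xs:dateTime"


def encode_non_json(value):
    """Called by json.dumps for values that JSON cannot represent natively."""
    if isinstance(value, datetime):
        return {TYPE_KEY: DATETIME_TYPE, VALUE_KEY: value.isoformat()}
    raise TypeError("cannot serialize %r" % (value,))


def decode_wrapper(obj):
    """Called by json.loads for every parsed object; unwraps typed values."""
    if set(obj) == {TYPE_KEY, VALUE_KEY} and obj[TYPE_KEY] == DATETIME_TYPE:
        return datetime.fromisoformat(obj[VALUE_KEY])
    return obj


def serialize(item):
    return json.dumps(item, default=encode_non_json)


def parse(text):
    return json.loads(text, object_hook=decode_wrapper)


if __name__ == "__main__":
    original = [datetime(2000, 1, 1, 12, 30, tzinfo=timezone.utc)]
    text = serialize(original)
    print(text)
    print(parse(text) == original)
```

One caveat of this design is that an ordinary JSON object that happens to have exactly these two keys and this type name would be misread as a wrapped value, which is part of why an agreed convention across vendors matters.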
And this is just another symptom of the NoSQL vendors not acting like a community and not offering a platform to the users. This is exactly the sort of thing that users should be asking them to do the same way. If they could produce that output in a fashion that looks like the way XML Schema represents datetimes, then we'd be able to read it, use it, and produce it the way we want.

It looks like there's an interesting area of crossover between what you're doing here with JSONiq and JSON-LD. It allows you to define a context where you can specify the data types associated with properties in JSON. So for instance datetime, date, duration, and all that stuff is defined in another document, which can then presumably help your query optimizer to have advance knowledge and expect that these particular fields are going to be datetimes.

Right. It's certainly the case that when you're dealing with XML documents defined by a schema, Zorba is capable of using the schema information for optimization purposes. So yeah, if there is anything like a schema language for JSON, be it JSON-LD or something else, then I would imagine that would be a fairly easy optimization to have. There's also a JSON Schema effort out there. I'd be very curious to see how those progress and whether they get any uptake, because I've seen at least some hypothesizing that the day somebody comes up with a JSON schema will be the day that JSON dies. I feel confident that it will happen. At the very least, databases can probably do a lot of good things if they can define at least a minimum of what needs to be there. And you can already do that with Mongo and a few others today by specifying indexes, and they can reject things that don't have the correct index fields and so forth. That's the first step in that direction. Yes?

One of the things that we specifically didn't want to have in JSON anymore...
...was there for a reason? I mean, it did solve a problem, so perhaps someday we'll arrive there again. The bad word: namespaces.

As to whether somebody would want to add namespaces, the answer is that somebody will probably have a problem eventually that would be solved by that, but whether that will be enough to actually make it happen, I kind of think not.

Chris, you must have thought about this. If you have JSON data in your database, you talk about XQuery and JSONiq for the presentation of queries. So I'm just thinking about how you could have a layer, so that you can put a SQL query layer on top, and it would translate into XQuery and do the search and all that. So how do you envision this kind of layer, which can take a SQL query?

I don't know if you were here for Jonathan's talk, where he went into more detail about this, but the problem is that the SQL data model doesn't map very well to unstructured or semi-structured data. As much as SQL has a tremendous amount of history and a tremendous amount of tooling available for it, it can't really do everything you need to do if your data is in JSON, so any layer like that would, at the very least, by necessity be somewhat crippled, or at least limited in what it could do. That's not to say it's not useful. And I could envision a trivial implementation pretty easily. I mean, a SQL parser is not particularly complicated to write, and all you have to do is say, okay, I'm asking for these particular things whose names I know, and as long as they map to the object keys in your Mongo database, for example, pull those objects and then put them into your query. I don't think it would be hard to do. I would be a little bit surprised if it really was a good solution for very many problems.
So, does that mean that you envision some kind of XQuery coming into SQL, or whatever you use to query your data, with some of the functions being translated into XQuery? Is that how you see people interacting with data in the cloud?

I mean, in my sort of ideal world, I think SQL doesn't exist. There are a lot of problems that a big SQL database is absolutely the right solution for, and it will continue to be the right solution for them. They sign my paychecks, so I'm not going to say anything different, but absolutely, I'm validating their business model. There's a significant set of problems that they are the right solution for. However, there's a significant set of problems that they're not the right solution for. And I think the fact that they're already working on a NoSQL project of their own makes that case best. And once you've stepped away from that model, I think trying to shoehorn SQL back into it is just going backwards.

So actually, my question was not just about SQL; it was about how users are going to interact with the data, right? The reason I brought up SQL is that people understand SQL.

Fair.

You understand the point, right? So I was just thinking about what would be the way for users to interact in an easily understandable manner. Do we have to teach them XQuery?

Yes. In a word, yes. But A, XQuery, especially JSONiq without the XML stuff on it, is not that complicated. It's actually not that much harder than SQL, at the base level anyway. It's just that you've got a lot more levels you can grow into. And B, my boss would like to say something about it.

Can I answer your question? There are a lot of reasons why you would still like SQL, the major one being that there are certainly reporting tools out there right now which interact with data in SQL.
And until you rewrite all those reporting tools in XQuery, which is going to take a long time, the only way to make those reporting tools work is to interact with SQL. We have a PhD student in Zurich who works with us and is building a translator from SQL to XQuery. So you write your SQL, and the XQuery is generated automatically. So you're not writing XQuery.

I actually want to talk about that. Okay. Just an FYI. No problem. So let's get that out there. Anything to ease the introduction. I think hopefully, ideally, that's a short-term solution, where short-term in Oracle time is fairly long. But still, I think anything that eases adoption is good for us.

I'd like to add one thing. I think the one thing that we're missing is a mapping mechanism, whether it's a schema-based thing or an annotation, that says how we get data types from XML serialized into JSON the way we want them to be processed, because the consumer may not want the standard XQuery style of data format. And conversely, they may be producing data that XQuery doesn't like. So we need to map these data types over, because it's not really a realistic thing that different database vendors are going to embrace the XQuery data types or other data type formats. We need some way for them to get to their native format, and some standard way to do it, whether it's standard functions that we call, or annotations, or an external schema, or external integration that maps those. I think that's a critical part of this language.

William, does this version of try-Zorba have the JSONiq extensions enabled on it, do you know? I'm remembering there's a way to handle a non-JSON type, datetime and stuff like that. I just want to see if this works. So it does work. This is the way we do it right now. I've created an array that has a single item, which is the current date. It's an xs:dateTime, an XML Schema dateTime.
And what happened at serialization time, and not before, is that we created a JSON object that says, hi, I've got a value which is outside the bounds of JSON, its type is xs:dateTime, and its value is this. So that's all falling back to the XML Schema way of handling things.

And so that is round-trippable, right? I mean, that's enough information that I could parse that back in and have the same thing.

Theoretically, you can put in any type you want there. So you could say the type is a MongoDB type, if we needed to, that kind of thing. Or, I guess my suggestion would be that everything gets filtered into the XQuery types by looking at things on the way in, and then as and when it needs to go back to something different, it gets filtered back.

That's the FLWOR Foundation, which is, as I said, heavily sponsored by Oracle. All right. Oh, do you have one more back there? No. Okay, and I'm actually over time, so thank you very much.