And welcome, my name is Shannon Kemp, the executive editor of DATAVERSITY. We'd like to thank you for joining our second installment of the new monthly DATAVERSITY webinar series, NoSQL Now with Dan McCreary. Today, Dan will be discussing innovations in NoSQL query languages with guest speaker Matthias Brantner. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you like to tweet, we encourage you to share highlights or questions via Twitter using hashtag NoSQLNow. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. And with that, I will give the floor to Dan to introduce himself, Matthias, and today's presentation. Hello and welcome. Hey there, Shannon, how are you doing today? Excellent, excellent. It's great to have Matthias here. He's one of the great gurus in query languages, and I'm really excited to have him not only here, but also giving a demonstration of some of the stuff he's working on. Hi, Dan. Hi, Shannon. All right, well, let's get started. First, just to review the format: we have about 40 minutes of material prepared, and then we hope to have time for questions towards the end of the webinar. Let me start by giving everybody some context. First of all, just to make sure everybody knows, we have a lot of material that we're going to be covering at the upcoming conference, and this is kind of a preview. Matthias will also be giving a presentation there, and we have links for the presentation he's going to be giving at the conference.
But what we also really want to do is give you an overview of some of the really exciting things that are happening. The reason we call this the NoSQL Now conference is because we're talking about what people are, in fact, doing in production systems today. And I think the guys from 28msec are a good example of people that are building production systems using a common language, although a lot of their stuff is focused on MongoDB right now. We're starting to see a lot of other very interesting things happening in this space. To give you a quick background on the two of us: I'm one of the authors of the Making Sense of NoSQL book with Ann Kelly, and we hope to see that in print next month. We're really excited. I also wanted to make sure everybody had a good background on Matthias. He has a doctor of science from the University of Mannheim. He has a very strong background and has published a lot of research papers, but he's very focused on practical, real-world applications, and we'll be talking about that in a few minutes. So let me give everybody some context. Last month, we talked about the high-level NoSQL patterns that we're seeing these days. This is kind of our taxonomy of the main types of NoSQL databases. The two in the upper left, relational and analytical, most people are very familiar with and have a good background in. And the four new ones being added by the NoSQL world, the key-value stores, column family stores, graph stores, and document stores, are really the new patterns. One of the key questions is: do you have to have a different query language for each of these systems? Today, I'd say that's kind of true. Relational systems, of course, all have SQL, and analytical systems frequently use MDX, although a lot of them also use SQL. They also share a lot of common concepts, like cubes and categories and measures. But the four NoSQL patterns each take a different approach to these things.
Key-value stores are very simple, so they don't really have a query language, although most key-value systems do have REST interfaces, and those allow you to put almost any language on top of them. The other three systems, column family stores, graph stores, and document stores, each seem to have their own query languages, and people are starting to ask: do I really need to learn all of those different languages, or can I use one language to query a lot of them? Especially if you're starting to think about agility, where you want a team of people trained in one language, you don't want to have to learn different ones. The other thing to realize is that we're in a distributed computing environment. This is very different from traditional SQL, where if you have two different tables on different servers, you're actually doing a join between those servers and sending the data back and forth. With NoSQL systems, we tend to have a lot more distributed clusters of shared-nothing computers, where we're sending queries around the network to query nodes. Those queries are distributed around the cluster, and each of the data nodes then responds. So we think that the role of query languages, moving queries around, is more important for interoperability than moving the data back and forth. And we really have to remember that we don't yet have these standards, and that's preventing a lot of third-party applications from coming on board the NoSQL movement. One of the quotes I have is from Michael Stonebraker, who was our keynote speaker two years ago at the NoSQL Now conference. He very clearly said: if you have 75 NoSQL databases with 75 different APIs, the NoSQL movement is never going to be mainstream. It's just going to serve niche markets, much like the object-oriented databases did in the 70s, 80s, and 90s. We have to remember that SQL is a platform today, and by platform we mean that you can write applications and make them portable.
You write to one common SQL language interface, and as long as those databases have drivers, ODBC or JDBC drivers, your applications are relatively portable. The problem with the NoSQL movement is we don't yet have that common standard. So today we're going to talk about that, and we're going to talk about a potential candidate and what people are doing. That would allow great new applications to be developed that could use a common query language and maybe even hit different types of NoSQL databases. I also want to emphasize very strongly that we don't have just one type of data that we're using, not just tabular data in traditional databases. We also have a lot of read-only data. We have binary data and graph data. We have data from web crawlers. We have full text. We have a huge diversity of data types that we're dealing with in the NoSQL movement. But we're also dealing with people who are using this data in different ways. They need security. They need role-based access control. They need transactions. They need to do analysis on large sets of data, so they want pre-computed sums and totals. They care very much about search and standards like XQuery Full Text and Lucene. They care about things like spatial queries, and they want control over their clusters and remote data centers. They want both fast and reliable reads and writes, and they want to be able to fine-tune that consistency and availability. I also just wanted to mention this whole thing about standards and what's happening now. We're very aware that certain standards, like SVG, kind of sat on the back burner for a while, and then certain things happen; when Steve Jobs said, well, we're not going to have Flash on our mobile phones, SVG became a dominant standard, and we're starting to see SVG in a lot of portable devices. It's very fast. It's very efficient.
And with other standards like that, you can't really predict when they're going to become mainstream or what key events are going to trigger it. As a developer, I constantly struggle, if I'm going to develop a new app, with which database I should target. What's best suited to it? I would much rather write to one standard and talk to many databases than have to write one app with many different queries and maintain lots and lots of different code bases for the same application. So I think we all agree that standards are very much involved in lowering cost. The web today wouldn't be possible if it hadn't been for a standard like HTML. And we didn't get HTML from companies like Oracle or IBM or Microsoft. It came from small, innovative companies and individuals that made these standards available. So with these standards comes a lot of ability to save money. I'm involved in the U.S. federal space, and I see standards saving money all the time. Big companies are also being forced to build services. Amazon Web Services is a really good example, where Jeff Bezos said no group within Amazon will create their own interfaces; everybody will create standard APIs and we'll expose those, whereas other companies like Google have stayed pretty much ad hoc. So we want to see: can we develop corporate data services that are standards for these apps on different heterogeneous NoSQL databases? I think if we look at the lessons of jQuery, we have to realize that adapters for different types of databases are inevitable. Remember when Microsoft created extensions for their browser, and everybody hated having to put in if-else code for IE. Now we have one standard adapter, jQuery, where we can write our web pages against the standard library and it does different things based on the browser capabilities. So I think this is inevitable, and it doesn't come from the browser vendors, right?
It came from a third party, and we're starting to see that in the NoSQL space. There are also other de facto standards, and I just wanted to mention that even key-value stores have them; Amazon S3 is an example. We can now take a standard key-value store app that runs on S3, and if you don't like the cost structure, NoSQL databases like Riak support the same security models that we have on S3. So S3 is almost the de facto standard for those security models. But there are a lot of challenges with adapters. We want to be realistic, and we want to see that each of these adapter systems has limitations. Over time these limitations get ironed out, but they are something we have to be aware of when we're using a general language to access very specialized functions in these NoSQL systems. And I think before we get big corporations joining the NoSQL movement, these are the types of questions that we have to be able to answer. We've also talked about: can we have one NoSQL database that does all these things? I'm not going to answer that question, except to mention that at our conference we talk about these issues every year. Last year, the guys from 28msec had some very good presentations. There are other graph companies that are doing very innovative things with their graph databases, and we'd like to be able to see whether or not we can have one language actually query all these different types of data. The other thing is the interfaces. How are we going to build standard interfaces so that we can get many different types of data running in one large data center with common interfaces? So that's kind of the landscape. In the ideal world, what we'd like to get to is a place where we can use standards and still deal with diversity and have it all run in a central environment. So that's really the backdrop to what we're trying to do here.
So with that stage being set, the world needs standards to lower cost, and the NoSQL industry, although it is providing innovation, does need those standards. So with that, I think we're going to turn it over to Matthias here. Matthias, I'm going to make you the presenter now, and we'll have you maybe talk a little bit about 28msec, what you guys are doing, and your perspective on those challenges. Thank you very much, Dan, for the introduction and also the invitation to this webinar. I'm really excited about this. So before I get started with, as Dan said, a demo, I'd like to briefly set the context and explain very briefly what we do at 28msec. After that, I will show you a demo and walk you through a few JSONiq queries to give you a feeling for, and show you, how powerful a NoSQL language can be. What I would like to start with is showing that today's world, and we all know this, is a pool of data which exists in a huge variety of data sources and also a huge variety of data formats. Dan already mentioned it: it ranges from completely structured data in relational databases, to flexible and nested data formats such as JSON and XML, down to completely unstructured text. For this data to be valuable, a lot of processing needs to happen to turn it into actionable information. For example, the data needs to be extracted from the sources. It needs to be integrated, filtered, transformed, cleaned, correlated, or aggregated. People are doing this today, but mostly by stitching together solutions around a technology that was developed in the 1970s, which is called SQL. And as you can probably imagine, SQL was just not designed for today's NoSQL complex data challenges, because it requires, first, an upfront schema, and it works with tables instead of nested data structures such as JSON, XML, and log records.
At 28msec, we have developed a solution that enables people to get from data to actionable information far faster than ever before. A central part of our technology is a language called JSONiq; Dan already mentioned it. Like SQL, JSONiq is declarative, which means it's very productive: it's very easy to understand and use for the SQL users in today's marketplace. And because it's declarative, it's also highly optimizable. In comparison to SQL, it delivers what we believe is a set of capabilities that enables people to do far more, with far more different data formats, and with much less code than ever before. JSONiq is an open specification. It has been co-developed by 28msec and people from Oracle and EMC. Recently, IBM announced that they implemented JSONiq in their WebSphere line of products. So that should give you a sense of where this is going. Now, I don't want to show you any more slides, but just get into a demo and show you what JSONiq actually looks like. Let me switch to my browser; hopefully you're now seeing the browser. What you see here is our product, called 28.io. I have a list of projects that I've created up front, and I'm going to select this project called MongoSF. What you see here is the data browser. It allows you to browse your data and write queries. In this case, this project is connected to a MongoDB database, and the MongoDB database contains three collections. You can see them on the upper left: the answers collection, the FAQ collection, and the SIDS collection. You can click on one and see the content of the collection. So for example, here I can navigate through the answers collection and see the documents, and what you see is that in MongoDB there are JSON documents stored that describe the answers of a subset of the Stack Overflow data set.
So the answer has a question ID, it has an answer ID, it has a creation date, and it also refers to the owner, the person who wrote that answer, with a user ID. Also important here is the FAQ collection, which actually contains the questions. Again, the question has an ID, it has a last edit date, for example, it has a title, it has some tags, and it also contains information about the owner. With this data set shown, I'd like to show you the first actual JSONiq query on it. With this query, what we try to do is get a list of all the answered questions, order them by the creation date, and get the title and the creation date out of each. So what I have here is a JSONiq query, seven lines of code. It goes through the FAQ collection, it selects all of the questions that are answered, it orders them by the creation date descending, and then it constructs a new JSON object, here in what we call the return clause, that projects on the title and the date. It's already been run, so here you can see the results. What's important about this? There are two things. The first is that JSONiq allows you to navigate in deeply nested hierarchical data sets; in this case, we navigate inside the JSON document and extract the field that marks a question as answered. On the other hand, JSONiq allows you to construct new JSON objects; in this case, we construct for each of the results a new object that contains the title and the date. So, seven lines of code; that gives you a feeling for how JSONiq looks and what you can do with it. Now, let's take a little bit more complicated query. In this query, we want to count the number of answers per user. What we do is go through the answers collection, group the answers by their owner's display name, and within each group count the number of answers and define a variable that contains this count.
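To make the first query concrete, here is a rough sketch of what such a seven-line JSONiq query could look like. This is not the exact query from the demo; the collection name and the field names (is_answered, creation_date, title) are assumptions based on the Stack Overflow schema described above.

```jsoniq
(: Sketch: answered questions, newest first, projected to title and date.
   Collection and field names are assumed, not taken from the demo. :)
for $question in collection("faq")
where $question.is_answered
order by $question.creation_date descending
return {
  "title" : $question.title,
  "date"  : $question.creation_date
}
```

Note how the FLWOR clauses (for, where, order by, return) play the roles that SELECT, WHERE, and ORDER BY play in SQL, while the return clause constructs a brand-new JSON object rather than a row.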
Then we order by the count descending and again construct, for each group, a JSON object that contains the name of the owner and the number of answers. So in this case, what you can see is that the top user has posted 16 answers. What's important about this? It's a few lines of code, a very compact JSONiq query. It does something relatively complex, if you're used to doing a grouping in a NoSQL store, very briefly, and it gives you the impression that this is pretty similar to what you would do in SQL as well. So we group by name, we do a count on the group, and then return new objects. The next example that I would like to show you is, again, a little bit more complicated. It has 18 lines of code, and I don't want to go into all the details, but just give you a sense of the more complex things you can do in those 18 lines. What we do is go through the answers collection again; similar to the previous query, we group by the owner's display name. Within each group, we compute the average reputation over all of the answers of that one owner, select only the owners with the highest reputation, and order them by their average reputation descending. Then we also select the set of answers that are the top scored answers for each of the users. So we have a nested query, shown here, that goes through the answers, orders them by the score descending, and then just returns the question ID. Being only interested in the top three, we use the subsequence function to get the top three scored answers. At the end, we return a new JSON object for each group that contains the username, the total number of answers this user posted, his average reputation over all of the answers, and the list of top scored answers. But instead of just listing the question IDs that were returned by the nested query here, we want to do a join between the answers collection and the FAQ collection.
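The answers-per-user grouping just described might be written along these lines; again, this is a sketch, and the collection and field names are assumptions rather than the demo's exact code.

```jsoniq
(: Sketch: count answers per user with a JSONiq group-by.
   After "group by", $answer is the sequence of answers in each group. :)
for $answer in collection("answers")
group by $name := $answer.owner.display_name
let $count := count($answer)
order by $count descending
return { "name" : $name, "answers" : $count }
```

The key design point is that after the group by clause, the non-grouping variable $answer denotes the whole sequence of answers in the group, so aggregates like count or avg apply directly to it.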
So what we do is go through the FAQ collection and return the titles for each of the questions that matches one of the top scored answers. Here's what a result could look like: this user posted four answers, he has a very high average reputation, in this case he has only two top scored answers, and these are the titles of the questions that he answered. So this should have given you a feeling for some queries: a simple one that just did a filter and sorting, a little bit more complex one that also introduced grouping and aggregation, and an even more complex one, the one I just showed. It shows you functions that you can use, for example, to compute the average. It shows you that there are nested queries. There are even more functions, like subsequence or distinct-values. And it also shows you that the language allows you to do joins between collections, which in MongoDB, for example, is not easily possible. To mention another feature that Dan already said would be desirable for a NoSQL query language: full text. JSONiq also has some full-text support. Let me bring this up. Here I have a small query that imports a module, a full-text module. So you can structure your JSONiq programs in modules, in this case the full-text module. What the query does is go through all the questions in the FAQ collection and get the title out of each, using the dot navigation here. Then, for each of the titles, it uses the tokenize function from the full-text module to tokenize the title. In the next clause, we eliminate all of the stop words by using the is-stop-word function from the full-text module. We put each of the tokens in lowercase and group by the lowercase token. After that, we count the number of tokens in each group, order by the count descending, and return the token and the count in a new JSON object. So what we can see is that, in all of the titles in the FAQ collection, the term nosql appears 60 times.
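A hedged sketch of that token-counting query follows. The full-text module's namespace URI and the exact function names (tokenize, is-stop-word) are assumptions here; the demo shows such functions exist in 28msec's library, but not their precise signatures.

```jsoniq
(: Sketch: count non-stop-word tokens across all question titles.
   The module URI below is made up for illustration. :)
import module namespace ft = "http://example.org/full-text";

for $question in collection("faq")
for $token in ft:tokenize($question.title)
where not(ft:is-stop-word($token))
group by $term := lower-case($token)
let $count := count($token)
order by $count descending
return { "token" : $term, "count" : $count }
```

This is essentially a word-count over the titles, expressed declaratively: two nested for clauses flatten titles into tokens, and the group by aggregates them.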
The next term in this case appears 20 times. So these are 13 lines of code that are actually really, really powerful. They navigate in JSON objects, and they use a rich full-text function library. That shows you that you can do full-text tokenization and stop-word elimination; the module also has other functions that allow you to do stemming, or look up words in a thesaurus and compare them using a thesaurus. So that gives you an idea of what modules can do and what other features are available in JSONiq through modules. This is a very, very good demonstration of a lot of the things you're doing. When you read the code, it looks very much like XQuery. Can you tell me how similar it is to XQuery? Sure. Yes, that's a very good observation. JSONiq is a superset of XQuery, which is the W3C standard for querying XML data. What we did is we extended the language and made it a superset: we introduced support for JSON into XQuery's data model, along with primitives that allow you to work with JSON. In the NoSQL space, there are usually two types of people: the ones that have JSON and the ones that have XML. So far, there was no real query language that allowed them to work with JSON. So what we did is take the 15 years of experience in developing a query language for semi-structured data, and all of its concepts, and extend it with JSON and the JSON data model, and this is the result. In a later example, I will show you how you can actually combine the processing of XML and JSON in a single query. Did that answer the question? Yeah, absolutely. All right, so before I go into that, I would like to show you another example. We mentioned that data is stored in several sources and several formats, and what we want to achieve with a NoSQL language is to process this data across sources and formats. So what we did at 28msec is we extended JSONiq with a module that allows you to connect to JDBC data sources.
So what we do is import the JDBC module, and we have a connect function that allows you to connect to a JDBC data source, in this case an RDS MySQL database hosted on Amazon. What I'm going to do is a join here between our MongoDB database, specifically the answers collection, and a table from the JDBC database. In line number eight, what we do is again read the answers collection. Similarly to previous queries, we do a grouping, and here we group by the answer's question ID. Next, we get within each group the maximum score over all of the answers in that group, and we order by the maximum score descending. Then we select only those answers with a maximum score below 150. After that, we use the JDBC module to execute a query that allows us to get the FAQ data from the table that is stored in this MySQL database. The data set is similar, or in fact exactly the same, as the one I've shown you in the MongoDB database; this time the data comes from the JDBC data source, from the SQL database. So we do a select star from FAQ to get all of the data, and this function returns the relational data transformed into flat JSON documents. After that, we do a join between the question ID coming from the JDBC data set and the question ID of the answers that we extracted using the grouping here. So this is our join condition that allows us to join the answers on the question ID with the actual question ID from the FAQ table. Then we return a new JSON object containing the title and the max score. When you execute this, it gives you a title; in this case, 'good reasons not to use a relational database' is the highest scored answer, with a score of 117. So that's 19 lines of code, and a lot of that code actually was spent on connecting to the JDBC data source. But what it shows you is that you can very easily join data between two different sources, present in two different data formats.
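One way the MongoDB-to-JDBC join just described might look is sketched below. The jdbc:connect and jdbc:execute calls stand in for 28msec's proprietary JDBC module, whose exact namespace, signatures, and connection parameters are not shown in the webinar, and the score threshold is likewise an assumption.

```jsoniq
(: Sketch: join MongoDB answers with an FAQ table fetched over JDBC.
   Module URI, connection details, and threshold are illustrative only. :)
import module namespace jdbc = "http://example.org/jdbc";

let $conn := jdbc:connect({
  "url" : "jdbc:mysql://example-host/stackoverflow",
  "user" : "demo", "password" : "demo"
})
let $faq := jdbc:execute($conn, "SELECT * FROM faq")
for $answer in collection("answers")
group by $qid := $answer.question_id
let $max-score := max($answer.score)
where $max-score lt 150
order by $max-score descending
for $question in $faq
where $question.id eq $qid
return { "title" : $question.title, "max_score" : $max-score }
```

Because jdbc:execute returns the relational rows as flat JSON objects, the join condition on the last where clause can treat both sides uniformly, regardless of where the data came from.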
In this case, that's a collection in MongoDB, the answers collection, and the FAQ table stored in an RDS MySQL database on Amazon. Okay. So now I would like to switch briefly to a different project that I created here. In this project, we implement a use case of one of our customers. Today, the Trademark Office publishes all of its data as XML archives. In this case, the data is on Google's storage APIs, and what we have here is a file called apc130601, which is the trademark data from, I guess, the 1st of June, 2013. So what we do in JSONiq is make an HTTP call to retrieve this zip file. Then we use another module that allows you to extract files, to get the text out of the zip file. So we extract the text out of the file that we retrieved from the web, and since we know it's XML, we use the parse-xml function from XQuery to parse the XML. After that, we apply an XPath expression from XQuery to get all the case files out of this file. Then we navigate in the file; in this case, we get the mark identification of each of the trademarks. We convert the mark identification to uppercase using the upper-case function and bind it to a variable called name. Next, we create a new variable for the metadata that contains, for each trademark, a new JSON object. This JSON object contains the serial number of the trademark: so it contains a field called SN, and the value is constructed by evaluating an XPath expression on the archive we retrieved from the web. It also contains the name, which is the mark identification of the trademark, and it contains several other things that we use a little bit later to query this data set. In particular, it contains the full-text tokens of the mark identification: the tokenization function returns a set of tokens, which is then embedded in an array in the token field.
It contains a field called MP, which stands for metaphone key: for each token, a special key that later on allows us to do a phonetic search on the trademarks. And then it contains three more fields: the owner, the status of the trademark, whether it's live or dead, and the class. Each of these JSON objects that are constructed is then inserted into a MongoDB collection called trademarks. So: we retrieved a file from the web, we unzipped it and parsed it, we got the data out of it using XPath, and we constructed for each of the trademarks a JSON object, which is then inserted into the trademarks collection. Let's actually take a look. Could I just review the significance of this? What you're really saying is that even if we have very complex XML data sources, just by using very simple XPath expressions we can parse that complex XML data without actually having to write XML parsers. We just give it short XPath expressions. Is that a good summary? Yes, a very good summary. And all of those features in this example come from the XQuery language. Let me navigate to the trademarks collection to show you one of the entries that was created. So in this case, the trademark with this serial number was inserted into the database. This is the name of the trademark. There are three full-text tokens, there are three metaphone keys here that are later used for phonetic search, and then the owner, the status, and the class. This trademark database contains around 8 million trademarks. Then, in a second program, we actually query the collection when a user enters a search on the website of the trademark lawyers that we built this application for. So what happens on the search side is the user puts in a search; in this case, we use the example 'Saloon Africa'. We do the exact same thing to this search as we did to the mark identification, the trademark name: we tokenize it and put the terms in lowercase.
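The ingestion pipeline described above, fetch, unzip, parse, transform, insert, could be sketched roughly as follows. The http, zip, and dml modules stand in for 28msec's proprietary extensions, and the XML element names are guesses at the USPTO schema; only parse-xml, upper-case, and tokenize are standard XQuery functions. The elided URL path is left as a placeholder.

```jsoniq
(: Sketch only: module URIs, function names after the colon prefixes,
   and XML element names are illustrative assumptions. :)
import module namespace http = "http://example.org/http";
import module namespace zip  = "http://example.org/zip";
import module namespace dml  = "http://example.org/collections";

let $archive := http:get("https://storage.googleapis.com/.../apc130601.zip")
let $xml     := parse-xml(zip:extract-text($archive))
for $case in $xml//case-file
let $name   := upper-case($case/mark-identification/string())
let $tokens := tokenize($name, "\s+")
return dml:insert("trademarks",
  { "SN"     : $case/serial-number/string(),
    "name"   : $name,
    "token"  : [ $tokens ],
    "owner"  : $case/owner/string(),
    "status" : $case/status/string() })
```

The point of the sketch is the shape of the pipeline: XPath navigation pulls values out of complex XML, and a JSON object constructor reshapes them for the document store, all within one query.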
We also compute the metaphone keys for each of the tokens. And then we construct a query; I don't want to go into all the details of the ranking here. We go through the trademark collection and get all of the values that match, and we apply a ranking to make sure that the search results are ranked. We use only the ones that have a score greater than zero, order them by the score descending, and then return a new JSON object that contains the serial number, the name of the trademark that was found, and the score. So in this case, 'Saloon Africa' is a trademark that was found that matches the exact search, hence it has a score of two. But the result also contains trademarks that only match 'Africa', as well as ones that are only phonetically similar to 'Saloon'. And that's what trademark lawyers actually need to search for. So it shows you another very interesting case that we implemented for one of our customers: we got XML data from the web, we have a MongoDB database, we transformed the XML data, or some of it, into JSON stored in MongoDB, and later on we ran queries on this MongoDB database using JSONiq that allow us to search specifically for the trademarks we're interested in. And yeah, that's about all that I wanted to show you, and I'm going to give the presentation back to Dan now. I'm very impressed. What I'm impressed by is that the number of lines of code that you actually have to write seems to be very small. There are not a lot of these 10-page SQL queries like I'm used to from the past. And just to verify: right now you're pulling data from other sources. Can you also write updates? So, for example, could I take data from my MongoDB and insert it into the relational database? Absolutely, absolutely.
So I've only shown you examples that query data sources, but, for example, using the JDBC interface that we have, you can obviously also issue updates on the JDBC data source, and JSONiq also has an extension that allows you to do updates on JSON documents or XML data. So you can either manipulate collections and the entries in them, or modify single documents in a collection: for example, rename a field, delete a field from a JSON document, or replace the value of a JSON field. And then one other question: several of the native XML databases that I'm familiar with also have an XQuery Update function. Is there some way to get hooks into those update functions as well? So we don't have a function that calls out to those XQuery Update implementations, but we have the XQuery Update specification implemented, which gives you a syntax to do so. Oh, okay. So you do implement that standard? We do implement that standard. We only add functions that allow you to modify actual collections, because XQuery Update doesn't specify that. Okay, that sounds great. So it sounds like we answered the question one of our audience members asked: is JSONiq only for reading data, or can it also write data? I think you've answered that: yes, you can do inserts, deletes, and updates on other data sources as long as the API permits those things. Yes, and the trademark example actually contained, at the very end, a function call to insert a JSON object into a MongoDB collection. That sounds good. So let me just try to understand the status of where you guys are at as a company. It sounds like for each database that supports JSON, you do need some kind of adapter written, so that your language compiles down to the database-specific API. Is that a good summary? That is correct. So what we did is begin with MongoDB; to us, it seemed the fastest-growing NoSQL database, and we have implemented very deep support for MongoDB.
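The document-level update operations Matthias lists, renaming a field, deleting a field, replacing a value, correspond to the JSONiq update syntax. A minimal sketch, assuming $doc is bound to a JSON object in some collection (the field names here are invented for illustration):

```jsoniq
(: Sketch of JSONiq update expressions on a single JSON object $doc.
   Field names are hypothetical; the comma sequences the three updates. :)
rename json $doc.owner as "author",
delete json $doc.temporary_flag,
replace value of json $doc.status with "dead"
```

Like XQuery Update, these are declarative pending updates: the expressions describe the changes, and the engine applies them as a batch at the end of the query.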
We have a lot of optimizations that make sure JSONiq queries are evaluated very efficiently on top of MongoDB. For example, we leverage the indexes the MongoDB database has on its collections, and we push down projections and things like that. But we also have a number of other, more lightweight connectors, similar to the JDBC module I showed you, that allow you to talk to Elasticsearch, for example, or to Couchbase or Cloudant. Are you doing any work with Solr yet? Solr, the Lucene-based search server? No, we started with Elasticsearch because that's what a customer asked us for, but it would be very easy to do Solr as well. Great. So it sounds like the core language is done, along with the compilers and the things that interpret the language, and now you're really being driven by customers to build adapters to different data sources. Is that a good summary? That's a good summary, yes. The language is done; you can use it on MongoDB and all the connectors I just mentioned. And if people need to connect to other sources, then we might look into those as well, but I believe that right now we cover a very good set of connectors that people will find useful. Okay. And they could just go to the jsoniq.org website to read about those? There are two things. There is jsoniq.org, which is the open specification, and which is not proprietary to 28msec. It defines the core query language and the core update language, and it also defines a set of functions for common tasks: working with strings, working with dates and times, that kind of thing. The basics that you need. And then 28msec has extended JSONiq with function libraries that are, at the moment, proprietary to 28msec, and that allow you to connect to other sources and do more, like the archive module I showed you or the HTTP module. So those are two different things. One is in our product,
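The projection pushdown Matthias mentions can be illustrated with a toy in-memory store. Instead of fetching whole documents and trimming them in the query engine, the engine hands the needed fields to the store so only those fields are returned; the `find(filter, projection)` shape below loosely mimics MongoDB's API, but this is a sketch, not a real driver call.

```python
# Sketch of projection pushdown: the store itself strips documents down
# to the projected fields, so unneeded fields never cross the wire.

def find(collection, filter_fields, projection):
    """Toy find(): match documents on field equality, return only the
    projected fields of each match."""
    for doc in collection:
        if all(doc.get(k) == v for k, v in filter_fields.items()):
            yield {k: doc[k] for k in projection if k in doc}

collection = [
    {"serial": 1, "name": "SALOON AFRICA", "owner": "Acme", "status": "live"},
    {"serial": 2, "name": "WIDGET", "owner": "Beta", "status": "dead"},
]

# Only serial and name are returned; owner and status stay in the store.
rows = list(find(collection, {"status": "live"}, ["serial", "name"]))
```

A query processor sitting on top of MongoDB would perform this rewrite automatically: it inspects which fields the query actually uses and passes exactly that projection down to the database.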
and one is in the specification. And we're happy, obviously, to contribute those extensions back to JSONiq if people feel the need. But we wanted to keep the JSONiq core small, in order to allow others to implement it and not overwhelm them with features that customers in their scenarios might not need. Okay, great. A good question just came over the wire about whether a JSONiq query could be spread out over a cluster of servers, so that many nodes each run in parallel and then potentially send the results back to the original query node. That's a very good question. As I mentioned at the beginning, JSONiq is a declarative language, and being declarative allows you to do a lot of optimizations. One of the optimizations you can do is parallelization, automatic parallelization, and in our product we actually implemented that. So what we can do is take any of the queries you've seen, look at it and decide if it's parallelizable, and split the execution over several processes, or potentially machines, each operating on a subset of the data, doing filters or aggregation on that subset, and then later combining the results. So it's kind of a MapReduce runtime model, and it allows you to use JSONiq in big-data scenarios. Okay. So it sounds like you can certainly use JSONiq to do that, but each NoSQL database would have to implement an adapter that knows how to send the queries out to the individual nodes in the cluster and get the results back? So in our product, in our connectors, it really doesn't depend much on the data source. You could get data from a JDBC data source and parallelize on the result of that, or you could get the data from a web service and then parallelize the execution on the result of that web service call.
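The MapReduce-style plan Matthias describes (partition the data, run the same filter-and-aggregate step on each partition concurrently, then combine the partial results) can be sketched with Python's standard thread pool. A real engine would distribute partitions across processes or machines; this is a minimal single-machine illustration.

```python
# Sketch of parallel query evaluation: each worker filters and
# aggregates one subset of the data (map), then the partial results
# are combined (reduce).

from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    """Map step: filter and aggregate one partition."""
    return sum(row["amount"] for row in partition if row["amount"] > 0)

def parallel_total(data, workers=4):
    # split the input into roughly equal partitions
    partitions = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, partitions)
    # reduce step: combine the per-partition results
    return sum(partials)
```

Because the language is declarative, the optimizer can decide per query whether this split is safe; nothing in the query text changes when it runs in parallel.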
Now, it depends a little bit on the query, but with any of those sources, you can parallelize your queries in the 28msec runtime. Okay, that makes sense. We just got a question about whether JSONiq is being considered for adoption by a standards body. I know that many of the people who designed JSONiq came from the W3C XQuery working group, so you have a lot of very senior people in your organization with standards experience. Do you know whether any organization, beyond the loose affiliation of people involved in the language, is actually considering a standardization process for the language? So, yeah, as I mentioned at the beginning, at the moment JSONiq is co-developed by people from 28msec, Oracle, and EMC. And we have a bunch of independent implementations: one in Pascal from somebody who just contacted us and let us know it exists, and also one from IBM, who hasn't been contributing to the specification but implemented it in WebSphere. So we see a lot of traction happening there, and we're very excited that other vendors are implementing JSONiq as well. But in terms of standardization, it's just JSONiq at the moment, and we're still thinking, and also waiting to see where this is going. It might be the W3C. Yep, that's it. By chance, are you coming to our NoSQL Now conference in San Jose? Yes, I'm going to be there, and I'm going to be speaking; the talk is called Do More with JSONiq and MongoDB. And as a company, we're going to have a booth, to talk to people and understand their use cases. Fantastic. Okay, so I think we've nailed that question. So another question that came up: is there an equivalent way of writing stored procedures in the database via JSONiq?
I think a lot of people out there know about stored procedures, but I don't think people really know that JSONiq is a functional programming language, so functions are essentially built in. Do you want to talk about that? So, as you said, JSONiq is a fully functional language, and it allows you to define new functions in the language itself. You can just declare a function and implement its body using JSONiq. And you can group those functions into modules, similar to the modules I've shown you: the full-text module, the HTTP module, or the archive module. All of those are groups of functions that are available in the query processor, and obviously the user can do the same. At the moment, in 28msec, the user can only write such functions using JSONiq, and not using a host language like C++, but we might be looking into that as well. But yes, in 28msec you can define something comparable to stored procedures. I'm very familiar with Priscilla Walmsley's FunctX; she has all these very, very useful functions. Is there a way I could potentially call those? In 28msec? Yes, we actually ship those as well, and if you use the JSONiq extension to XQuery, then you can call all of those functions directly. Oh, fantastic. All right, here's a really wild question for you. You're familiar with EXPath, the set of standards where people are putting their extensions for doing things like FTP, authentication, and encryption? Yes, yes. So, what is the status of EXPath functions in JSONiq? Exactly the same: EXPath is a set of modules that are specified in a W3C community group, I think, and in the demo you've actually seen one of the modules that is specified by EXPath, which is the archive module. 28msec developed it together with BaseX, we implemented it, and you can use it from XQuery or JSONiq. Very good.
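The "stored procedure" pattern Matthias describes (user-defined functions grouped into named modules that the query processor can call) can be sketched with a small function registry. The registry mechanics and names here are illustrative assumptions, not 28msec's actual design; in JSONiq itself this is done with `declare function` inside a library module.

```python
# Sketch of user-defined functions grouped into modules, callable by
# name from a query processor, stored-procedure style.

MODULES = {}

def register(module, name):
    """Decorator that files a function under module:name."""
    def wrap(fn):
        MODULES.setdefault(module, {})[name] = fn
        return fn
    return wrap

@register("trademarks", "live-count")
def live_count(collection):
    # a reusable aggregate any query can invoke
    return sum(1 for doc in collection if doc.get("status") == "live")

def call(module, name, *args):
    """The query processor's entry point for invoking a module function."""
    return MODULES[module][name](*args)
```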
All right, I think that's really nice, because there are just so many functions that people need. For example, I was working on a project a while ago where I needed OAuth authentication to pull things from Twitter feeds, and being able to use those libraries means I don't have to rewrite OAuth modules over and over again. What's really interesting about this is that, as a SQL developer, I never expected to be able to go from SQL to Twitter, run a query, and get the data back. But XQuery is approaching a full programming language, so you kind of expect all those functions to be there. What's your take on the difference, when people come to a language like JSONiq, between the framework you have in SQL and the framework you have in JSONiq as far as using external data sources? How we see JSONiq moving forward is as more of an orchestration language between data sources and formats. So you have sources like the web, like a relational database, or like any NoSQL data store such as MongoDB, with JSONiq orchestrating the processing between all of them. And since it's declarative, the query processor can make sure it chooses an execution plan that is efficient for the data sources being used. So we see JSONiq not really as a query language for one particular database, but more as an orchestration language between sources and formats, together with other function libraries, like the full-text module or an integration with a statistics tool. Okay, so my take on that is: today, a team of people in a big company has to write ETL and SQL and use all these data languages for doing data quality and data cleanup. With JSONiq, we could have one team that knew one language, one tool, and one framework, in a reusable, functional, declarative programming language.
And all of the capability for agile data transformation would go through that one group. Does that make sense? I mean, it seems like NoSQL has kind of taken the persistence problem off the table, right? It's no longer a problem to take these JSON documents, store them, and retrieve them; that's solved now. And you don't have to use as many joins, and you don't have to worry about all the details of generating an object-relational layer. Now the focus is on who's the most agile, and that comes down to who can transform data the fastest. Is that a good summary? That is a very good summary, yes. So we believe that, as Michael Stonebraker said, the NoSQL databases that have appeared are all very, very good, and they solve very important problems: a lot of the persistence problems, and scalability, together with the cloud. But we believe that the query side still needs some work. Compared to SQL, we've actually taken a step back, because we have many different languages, and people have started to reimplement functionality that should actually be in the databases themselves. For example, take MongoDB, which on purpose cannot do joins between collections. But some of its users still have to do joins, so what they do is take their host language, for example Python, and write a 300-line Python program to join two MongoDB collections. We believe that developer time is more valuable than that, and that not every developer should have to invent a new join algorithm. That's where we believe JSONiq will be very, very helpful. Okay, that makes sense. I have a couple of other questions that came in; I'm going to try to make sure everybody gets at least one of their questions answered. One that came across just now was about data analytics. I know a lot of people are starting to use R as their language, and R also has a functional programming bent to it.
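The hand-rolled join Matthias describes is what a declarative engine replaces with a generated plan. Here is a compact sketch of the hash join an optimizer would produce for two MongoDB-style collections; the collection contents and key names are made up for illustration.

```python
# Sketch of a hash join between two document collections: build a hash
# index over one side, then probe it in a single pass over the other,
# instead of a hand-written nested-loop join in the host language.

def hash_join(left, right, left_key, right_key):
    # build phase: index one side by its join key
    index = {}
    for doc in right:
        index.setdefault(doc[right_key], []).append(doc)
    # probe phase: one pass over the other side
    for doc in left:
        for match in index.get(doc[left_key], []):
            yield {**doc, **match}

orders = [{"order_id": 1, "cust": "a"}, {"order_id": 2, "cust": "b"}]
customers = [{"cust": "a", "name": "Alice"}, {"cust": "b", "name": "Bob"}]
joined = list(hash_join(orders, customers, "cust", "cust"))
```

The point is not that this code is hard to write once, but that an optimizer writes it for every query, picks which side to index (ideally reusing a database index), and keeps it correct as the data model evolves.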
Do you foresee, in the future, having libraries that could call an R statistics package? We're actually already thinking about what such a module could look like. There's already a prototype that allows you to call R functions: put data into R, and get it back. So yes, that's definitely something we're looking at, because, as I mentioned, we see JSONiq as an orchestration language, so we want to build on the best-of-breed solutions out there, and R is definitely one of them. Right. My experience is that using R is fantastic once you have the data all cleansed and in exactly the right format; you effectively just send it to a stats engine and get the results back. So that could all be done through a JSONiq module, then, is what you're saying? Exactly. Okay, makes sense. Let's see, there are a couple of other questions about performance. If you wanted to benchmark a query that used the native Mongo... is it a templating language? Is that a good description? Yes. Most NoSQL systems start out, instead of with a fully functional programming language, with a very simple templating language, which may not support recursion, functions, external libraries, and such. The question seems to be what the additional overhead is from adding JSONiq on top of a Mongo system, versus using the templating language. That's a very good question. MongoDB does an excellent job of very efficiently executing all of the queries that can be expressed in its query language. And if you write a query in JSONiq with the exact same semantics, it's our job to make sure you get the exact same performance, and we can only do that by delegating the work to MongoDB. So we act as a translation layer between our runtime and the MongoDB runtime. In this case, you have an additional network hop for the query, so it will not be the exact same performance, but you will get similar performance.
Now, obviously, if you need a 5-millisecond response time, then this additional network hop will kill you, and you should continue using the MongoDB API directly. What we're really aiming at are the more complex queries that people are struggling with and implementing in their host language. I mentioned joins already, and we see this in all the use cases where people use MongoDB: if you look at the MongoDB users mailing list, there are a lot of questions about joins and other more complex queries. In this case, what you need to do is compare a JSONiq program against the handwritten solution in Python, Ruby, or Java against MongoDB. And there are two things to consider: the productivity of writing the query, and the performance of evaluating it. If you leave the performance work to a query optimizer, as is done in relational databases, in the long run you will achieve much better results, because it solves those problems once for everybody, rather than each person solving their own particular problem. To give you an example: if there's an index declared on one of the collections in MongoDB and you want to join it with another collection, we would actually leverage that index and turn the join into an efficient hash-based index join in our runtime between those two collections. A NoSQL developer would need a lot of time to reach the same performance we get. That makes a lot of sense. We're coming up to the top of the hour now, so I think what I'd like to do is wrap up and thank everybody for joining us. I know we have a lot of attendees who have stuck with us for the whole session, and obviously it was very interesting. We have a few references, and we'll make sure to get answers to the questions people asked out in a follow-up. This was a great session. Thank you very much for joining us.
And we want to make sure everybody knows that NoSQL Now conference registration is open. You can see not just Matthias, but the other speakers as well. We have the link here, and you can register now. And with that, I think we'll wrap it up. Shannon, do you have any last words for us? Thank you. I think you pretty much covered it all, Dan and Matthias. This was a fantastic presentation. And of course, in addition to meeting Matthias, you can meet Dan at the conference; Dan is part of the conference itself. Again, everyone, thank you for the great interaction and the great questions. We'll get the links to the slides and the recording out to you within two business days, so by the end of the day Thursday, along with the additional information Dan just mentioned. So everyone, have a great day. Thank you again so much for the presentation. Thanks, guys. Thanks, everybody. Thanks, Dan. Bye. Thanks. Bye.