I think we'll get going, just because I know I'm the last person between this audience and drinks, so far be it from me to get in the way of that. I'm Stuart Farr from EPAM. This session was supposed to be given by my colleague Ilya Gorelik, but Ilya had to get pulled away at the last minute. Ilya is Russian; the only difference between us is that he can speak Russian and I can't, at least not before six o'clock. Other than that we're more or less interchangeable. One thing I will say at the beginning: this is hopefully informal, and it's the last session, so if anybody has any questions or comments, just put your hand up and butt in, frankly. I don't want this to be one-way.

I'm not going to read the title out because it's very long, but essentially we're looking at a time series database and how we might use it for analytical applications, and I'll show you some of those rather than just talk about them.

First of all, TimeBase is, most simply, a time series database, but it's more than that, to the extent that it's also a streaming system, a messaging middleware, call it what you will. So it's traditional time series, dare I say, on one hand, and a streaming capability on the other. Those two often don't go hand in hand, and hopefully I'll show you why it very much does do both, and why that matters.

It's been out there for 15 years. When I say "we": I was with a company called Deltix, and we were acquired by EPAM two years ago. Under the Deltix hat we worked, and still do, with systematic and quantitative trading firms, and the time series capability was part and parcel of that; we built our own, really, to have full control over the stack. I'll talk about the use cases more specifically later, but it's designed to be very fast, which befits an algorithmic solution, and to handle very large data sets, with sub-microsecond latencies as you might expect in that environment. It supports rich data schemas and polymorphism; I'll show you that later, so just hold it as a placeholder for now. Very importantly, the APIs are Python, Java, C++ and .NET, and the applications I'll show you shortly use, I think, all of those, certainly Java and Python.

The reason we're here is that we contributed TimeBase to the open source community via FINOS earlier this year as a Community Edition. The first question will be: what's the difference from the commercial edition? I'll show you, and it's actually very little. This is not a teaser product to get people to upgrade to Enterprise, it really isn't. We actually publish the differences online, but essentially they have to do with connectivity to exchanges; the kernel and the APIs are exactly the same. We can drill into that if anybody's either concerned or interested about it being a teaser edition, which it is not.

So, some of the use cases. As I mentioned, the background is essentially algorithmic trading: signal generation, systematic trading. Model generation and the backtesting of models, or algorithms, against historical data is use case number one in that world, and going alongside it are all the other things on the slide, which I'm not going to read through.
Essentially there is this idea of using historical data to backtest, and then, when you're ready (a decision point that has a lot of things going into it), deploying the model for live trading. That switch from history to real time is one of the reasons we built TimeBase the way we did: because it's both a time series database and messaging middleware, the transition from historical to real-time, which is not necessarily a natural transition for most systems, is a natural one for us. It had to be, for the business domain we were operating in. I'm not going to go through the rest of that. Any questions so far?

OK, so there's the history. The two most important items on there are cloud support and FINOS. Not surprisingly there's been a lot of demand for cloud deployments over the last few years, so two years ago we made that an emphasis, and a lot of our new client deployments now are on AWS, using Kubernetes to deploy and scale. And then this year, as I mentioned earlier, it moved into FINOS; I'll show you where the repo is very shortly.

I just want to touch on this, because the FINOS people asked me to talk about why we decided to open source it, given that it was, and still is, a commercially licensed product. The reason is fairly simple. When we were Deltix, pre-EPAM, we were a 90-person engineering firm, a traditional software house: building software, taking it to market, supporting it, and selling commercial licenses, usually annual subscriptions. Obviously people do make money with open source businesses, and there are lots of very good examples, but we weren't smart enough to figure out how to do that, even though lots of our engineers were, and are, very supportive of the idea of open source. We just couldn't figure out how to make it work commercially. Then we were acquired by EPAM two years ago. EPAM, for those of you who don't know, is a very large engineering firm doing digital transformation, engineering services and consultancy, with 40,000 engineers. It's New York Stock Exchange listed; many people haven't heard of it, but it's been going for 20 years, it's large, and it's very successful; you can look it up. And it's very supportive of open source, which fits its business model. So when we became part of the EPAM family, it was a natural release, if you will, for that philosophy to get out into the open source community, and I didn't have to worry about paying for it, because somebody else did. That's how we got there, and the start of that journey, as I mentioned, was really this year.

As for resources (I'll show you these when we get to the live material): timebase.info is the website, and there's a lot of information there, probably too much, people tell me. The architecture documentation, the APIs, the differences between Enterprise and this Community Edition (which, as I said, are not very much), and how it stacks up against the other time series databases most people are familiar with: kdb+, a great example of a very, very good solution, OneTick, and some of the other open source ones are compared on timebase.info. And there's the repo; you can download it, obviously, and have a good time.
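To make that history-to-real-time switch concrete before we go further, here's a minimal sketch of what it looks like from application code. Everything here is a hypothetical placeholder rather than TimeBase's actual API (the `timebase_client` module, `connect`, `open_cursor` and its parameters are all invented for illustration); the point is the shape of it: one read loop, where only the starting timestamp decides whether you're replaying history or following live ticks.

```python
from datetime import datetime, timedelta, timezone

import timebase_client  # hypothetical module name, not the real TimeBase API

db = timebase_client.connect("dxtick://localhost:8011")  # placeholder URL

def consume(start_time):
    """Read BTCUSD ticks from start_time onward. The same loop works whether
    start_time is last week (historical replay) or right now (live)."""
    cursor = db.open_cursor(stream="gemini", symbols=["BTCUSD"],
                            start=start_time, live=True)
    for msg in cursor:  # keeps blocking on live ticks once history runs out
        handle(msg)

def handle(msg):
    print(msg.timestamp, msg.price)

# Backtest-style start: replay the last 7 days, then follow live ticks.
consume(datetime.now(timezone.utc) - timedelta(days=7))
```

In a backtest you pass a timestamp in the past; in production you pass "now"; nothing else about the consuming code changes.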
So let me talk a little about time series data. In financial services we tend to think of market data first and foremost, whether that's level one, two or three. Market data tends to get the lion's share of attention in respect of time series data because it's very pervasive, it's very important, and it drives a lot of things. Orders and executions of trades are clearly another very common use case. Then there are the others, probably increasingly less common, but there they are: satellite data, consumption data, news and social media sentiment. Probably five years ago there was a lot of research done on sentiment as a signal-generation source; there was a big push on it, we got involved in it quite a bit, and we did a fair amount of research using our tools, TimeBase in particular. So the point here is that time series is not just market data. Market data is foundational and very important, but we shouldn't limit ourselves to it, and one of the things I'll show you with TimeBase is that you define the data structures. We don't just say "here's a set of data structures, go work with them"; you define your own. Some are the regular ones you would expect, and others may be less obvious: shipping data, for example, for which there is no standard data structure.

As for where we use them in financial services, you can keep going down this list, but the point of the list is really to highlight that we use time series, historical and real-time, whether we know it or not, and there are examples there of different uses of each. There are a lot of common words on the slide: risk management, whether it's historical-simulation value at risk or real-time trading and position risk, is still risk. Trade surveillance, whether it's done after the bad guys did some bad things or whether we try to catch them in real time, is still surveillance. Algos, whether we're backtesting them or deploying them live, need historical data and real-time data.

The point is that there is no hard distinction between historical and real-time. Real time is now, and "now" became historical a few microseconds after I said it. So we don't see a hard interface between the two: essentially, the difference between real-time and historical data, in our world, is the timestamp. If it's now, it's real-time; if it's not, it's historical. And so we have this concept of a moving window, which we maintain whether it's measured in microseconds, milliseconds, days, months or years. It literally doesn't matter: as new data comes in, the window moves along, and you define its width. That's very important when we get to the analytics, because a lot of the common analytics we use, whether simple moving averages, correlations, cointegrations, or even volume-weighted average price, are by definition computed over a set of data bounded by time. So for the moving window, real-time versus historical, we don't care, and we shouldn't care. When I show some things on screen, and if you dig into the documentation, the critical point is that the APIs you use to build applications are the same whether you're listening to real-time streaming data or to streaming historical data; you don't have to worry about where "now" is versus where the history is.
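To make the moving-window idea concrete, here's a small, self-contained Python sketch (my own illustration, not TimeBase code) of a time-bounded window that recalculates a simple moving average on every incoming tick. The window width is fixed up front, old ticks fall out as new ones arrive, and because eviction is driven by the data's own timestamps, the same object works on a historical replay and on live ticks.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

class TimeWindowSMA:
    """Simple moving average over a fixed-width time window.
    Only the timestamps carried by the data matter, not wall-clock 'now',
    so historical replay and live streaming behave identically."""

    def __init__(self, width: timedelta):
        self.width = width
        self.ticks = deque()   # (timestamp, price) pairs inside the window
        self.total = 0.0

    def on_tick(self, ts: datetime, price: float) -> float:
        self.ticks.append((ts, price))
        self.total += price
        # Evict everything older than the window width, measured from the
        # newest tick's timestamp rather than from wall-clock time.
        cutoff = ts - self.width
        while self.ticks and self.ticks[0][0] < cutoff:
            _, old_price = self.ticks.popleft()
            self.total -= old_price
        return self.total / len(self.ticks)

# One-hour window, like the one in the demo later on.
sma = TimeWindowSMA(timedelta(hours=1))
t0 = datetime(2021, 11, 1, tzinfo=timezone.utc)
for i, px in enumerate([61000.0, 61250.5, 61100.0, 61400.2]):
    print(sma.on_tick(t0 + timedelta(minutes=20 * i), px))
```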
That sameness is really important. Again, because of our background: when we're backtesting models and then need to deploy them live, we don't want to have to rewrite them. We've spent all this wonderful time and energy building a model, we've tested it in MATLAB or against some historical data set, and then we hit the real world, have to run it in real time, and go, gee, we've got to rewrite it all. So it's very important that there is effectively no change in the APIs between historical and real-time. That's how we describe it.

The other thing about analytics nowadays is what people, our users included, expect. Python is very commonly asked for; it's an expectation now, along with tools like Jupyter notebooks and Kafka. Open source in particular is expected, as opposed to back in the day, when you had fixed forms where you filled in fixed parameters and then ran your analytic. Nowadays people expect to do it themselves, essentially, with tools that are very accessible, and of course everything is expected to be deployable on the cloud. The point of mentioning all this is that we can't predict, and we shouldn't be able to (if we do, we're restricting ourselves), whether our users are going to use only historical data or only real-time data. We don't know that, and we don't know whether they're going to need tick data measured in milliseconds or microseconds, or data measured in days, weeks or months. We can't design for any one of those things; we have to design for all of them, which again makes for an engineering challenge.

This is, I promise you, the last slide before we actually play with some stuff. When we build analytical applications, the considerations I mentioned earlier boil down to these. The list isn't exhaustive, but these are some of the things we have to consider. The first one: in the days of the traditional (I think we can use that word) relational database, Oracle or Sybase, you had these wonderful stored procedures working away on the server. For whatever timescale you were looking at, they were typically very fast. You did your processing on the server, you had the power of the server to do the crunching, and that was very good from a performance perspective. But it's clearly restrictive, because somebody else is doing the calculations and then handing the results to your consumer, who has to use them, like it or not. That doesn't make it right or wrong; it just means there is a restriction: you trade flexibility for speed. The other extreme is the opposite: I'm not doing anything to the data, I'm just going to give it all to you, and you're going to have to deal with it. You delegate everything to the consumer; that's more the streaming paradigm. And then there's always the compromise, the hybrid between the two, where we do some processing on the server and then deliver a filtered data set to the consumer, basically restricting the amount of work they have to do by filtering out what they don't need. There are no rights and wrongs among those in general (there are definitely rights and wrongs for given applications), but they're considerations nonetheless; see the sketch below for the hybrid case.
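Here's a rough Python sketch of that hybrid pattern, again my own illustration: the `timebase_client` connection, the `query` call and the SQL-like query string are hypothetical stand-ins for whatever server-side filtering your store provides. The server does the cheap, high-leverage narrowing (symbol and time range); the consumer does the application-specific calculation, a VWAP here, on the reduced stream.

```python
# Hybrid processing sketch: the server filters, the client computes.
# `timebase_client`, `connect` and `query` are hypothetical placeholders.
import timebase_client  # hypothetical module, as in the earlier sketch

db = timebase_client.connect("dxtick://localhost:8011")  # placeholder URL

def vwap(ticks):
    """Volume-weighted average price over (price, size) pairs."""
    notional = 0.0
    volume = 0.0
    for price, size in ticks:
        notional += price * size
        volume += size
    return notional / volume if volume else float("nan")

# Server side: cheap, high-leverage filtering by symbol and time range.
rows = db.query(
    "SELECT price, size FROM gemini "
    "WHERE symbol = 'BTCUSD' AND timestamp > '2021-11-01T00:00:00Z'"
)

# Client side: application-specific work on the already-reduced data set.
print("VWAP:", vwap((r.price, r.size) for r in rows))
```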
And then there's data compression, which sometimes gets forgotten. If you're streaming over the internet then hopefully you have some sort of compression algorithm applied before you deliver the data, and then there's the question of whether you compress it on disk as well, which comes back to how quickly you need it. Those are important considerations, because there is obviously an overhead to compression, whether you apply it on disk or on the stream. And then CDNs, content delivery networks, are a sort of question mark, because that's maybe where the world is going; certainly in market data there are now CDNs purporting to be the way to receive market data going forward. So, just some considerations.

What I'll do now is switch over. I've got some demos, as you can see, so let's see if I can get them running; this is always the risky part. OK, I've got a few here. This is TimeBase on the left: a GUI front end, and what it's showing in this case is streaming, real-time, right now, market data from Gemini. On the left-hand side these are called streams, which I guess is a fairly accepted word in time series land; "tables" might be another, somewhat crude, way to put it. The one I've highlighted is a data structure called Gemini. Gemini, as you may or may not know, is one of the newer crypto exchanges, and we're subscribing to crypto prices, BTC, ETH and Litecoin, so what we're seeing here are real-time updates as these coins tick in. Nothing new in that, but I want to scroll along and give you some idea of the data structure I mentioned earlier. There are lots of columns on it, and the meat is at the end here. As a note, what we're looking at is an incremental update: in this particular data model we support both receiving a snapshot (most exchanges will send snapshots periodically, or only send snapshots) and incremental updates to the order book between snapshots. The rows on the screen are incremental updates, and if we double-click we can see the data structure in a bit more detail; in the JSON view we have it as JSON, fairly detailed, with lots of timestamps. So here, by the look of it, we're deleting the previous price at level 19, 65,000 (the price has gone down today), and then inserting a new level, level six, at the new price.

So that's streaming data, but as I've been saying, that's great, what about history? I'm changing now to look at just BTC/USD, and there it is, ticking away; if I click View, there's my history. The API that pulls this data out (this is just a very simple screen, obviously) doesn't care whether it's real-time streaming or history; it's exactly the same data structure. This one is static because it's historical; it could be from a few microseconds or milliseconds ago, in this case it's actually a few days ago, but it's the same view, the same query, everything is the same except the timestamp. The other thing I can do here is graph it. I'm going to go through this pretty quickly, because there's nothing particularly new here; I'm just trying to show you the sorts of things you can do with the APIs.
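Since that snapshot-plus-incremental model is doing the real work in the demo, here's a compact Python sketch of the general pattern. It's my own illustration, not TimeBase's actual message classes (the field and action names are invented): a book is seeded from a snapshot, then mutated by insert, update and delete entries like the level-19 delete and level-six insert above.

```python
# Sketch of the snapshot + incremental order-book pattern (illustrative only;
# the field and action names are hypothetical, not TimeBase's actual schema).
from dataclasses import dataclass

@dataclass
class BookEntry:
    action: str    # "INSERT" | "UPDATE" | "DELETE"
    side: str      # "BID" | "ASK"
    level: int     # depth level the action applies to
    price: float
    size: float

class OrderBook:
    def __init__(self):
        self.sides = {"BID": [], "ASK": []}   # (price, size) levels per side

    def apply_snapshot(self, bids, asks):
        """Replace the whole book; exchanges send these periodically."""
        self.sides["BID"] = list(bids)
        self.sides["ASK"] = list(asks)

    def apply_increment(self, e: BookEntry):
        """Mutate one level; these arrive between snapshots."""
        levels = self.sides[e.side]
        if e.action == "INSERT":
            levels.insert(e.level, (e.price, e.size))
        elif e.action == "UPDATE":
            levels[e.level] = (e.price, e.size)
        elif e.action == "DELETE":
            del levels[e.level]

book = OrderBook()
book.apply_snapshot(bids=[(64990.0, 1.2), (64985.5, 0.4)],
                    asks=[(65000.0, 0.7), (65010.0, 2.0)])
# e.g. a new ask inserted at level 1:
book.apply_increment(BookEntry("INSERT", "ASK", 1, 65005.0, 0.3))
print(book.sides["ASK"])
```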
So let's make this a little bit bigger and look at the last 15 minutes of BTC/USD from Gemini. There's a reason I want to show you this. There we are: a little graph we can drill into, where each of these lines is a level in the order book, bids and offers, and the crosses are the trades. The reason I wanted to show it is the point from earlier about whether you do the processing on the consumer, or do it on the server and send the results. Here we're actually doing a little bit of both: there's a very simple query, very SQL-like as you can see, running against the server, as well as some local manipulation. And one thing I want to highlight, whether you can see it or not, is SMA, CMA and EMA: simple, cumulative and exponential moving averages. Mathematically very simple, but predicated on having this moving window of time, because as soon as a new tick comes in you've got to recalculate everything. Those functions are recalculating on every tick, maintaining a moving window whose size has been predetermined; one hour, by the look of this one.

Again on data structures, I mentioned earlier that you can create your own. The schema here looks like this; I mentioned polymorphism earlier, and you can create some pretty rich and complex data structures. As I say, you, the user of TimeBase CE, can download it and create your own data structures; these are ones we made earlier, as they say.

So that's our front end, the one we built. More in the spirit of open source: if anybody's familiar with Grafana, we've literally just hooked it up to basically the same data. This is crypto again, BTC/USD coming out of the same venue, but using Grafana to visualize it, and again there's this idea of the query; you can define your own query down here. Hopefully very straightforward. I'll keep taking any questions. Yes?

[Audience question about slower-moving data.] Yes. We always show things that move quickly because it looks better than monthly data, right? But this can be used for slow data. A lot of the people we work with rebalance monthly, for example, so the time series is month by month and by definition doesn't move very much. For reference data, or static data that changes less frequently than this fast-moving stuff, it will work, but it's probably not the best use case; you're probably better off with something more SQL-like. The downside of time series systems in general, and this one is included in that, is the concept of joining: it is not well suited to complex logical joins. You can do it, you've got to write code for it, but it's not optimized for that. So it depends on the use case, but I wouldn't say it's a natural fit. And yes, massive repetition: the traditional data model just hates this stuff, because there is so much repetition.

[Audience question: can you combine it with an existing reference data store?] Yes, of course you can, and that's often done; I see it with reference data, and it ties in quite well. Most of the people we work with have an existing reference data store which is managed, and they don't want to copy it over, but they do want to access it. So we'll actually combine the time series with the underlying reference data.
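Here's a small sketch of what that combination can look like in application code; it's plain Python and nothing TimeBase-specific, with `load_reference_data` standing in for whatever externally managed store the client already has. The join happens client-side, per tick, so the reference store is never copied into the time series database.

```python
# Enriching a tick stream with externally managed reference data (sketch).
# `load_reference_data` stands in for the client's existing reference store.

def load_reference_data():
    # In practice this would query the client's managed reference store;
    # a dict keyed by symbol is enough to show the join.
    return {
        "BTCUSD": {"asset_class": "crypto", "venue": "GEMINI"},
        "ETHUSD": {"asset_class": "crypto", "venue": "GEMINI"},
    }

def enrich(ticks, reference):
    """Left-join each tick against reference data by symbol, client-side."""
    for tick in ticks:
        ref = reference.get(tick["symbol"], {})
        yield {**tick, **ref}

reference = load_reference_data()
ticks = [{"symbol": "BTCUSD", "price": 61250.5, "size": 0.02}]
for row in enrich(ticks, reference):
    print(row)
```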
That reference data stays managed and looked after where it already is, by the client; so yes, very much so.

I wanted to show this one as well, because, guess what, it's BTC/USD again; well, ETH/USD in this case. More importantly, this is Perspective. For those of you who are familiar with it, and if you're not, hopefully you will be now: Perspective is the open source contribution from J.P. Morgan, and this is it. It's the same data source as before, other than the symbol being different, and we put this together pretty quickly for this demonstration (I'm not going to say "knocked it up", because that would imply sloppiness), using the Perspective tool that's available in the FINOS repo to visualize essentially the same data set. It's actually pretty cool; we quite like it. There's a traditional order book for you, bids and asks at different levels ticking away, but they've also got these really neat historical representations, which I've not seen very often, and a vertical chart as well. We like this a lot, because it brings the power of somebody else's expertise in this particular visualization tool and integrates it with our APIs on top of TimeBase. So this is a very real and hopefully useful example of building an analytical application by putting, in this case, two open source tools together.

Now, conscious of time: I mentioned timebase.info earlier, and this is it. A few things I wanted to point out. There's the architecture; you can go through it, and there's a lot of detail in there about the design and the architecture of TimeBase. People tell me we put too much up here given that it's still a commercially licensed product, but it's up there and we think it's pretty good. The differences between the commercial and open source editions are there; the differences from other message brokers, because there's this messaging idea as well, are up there too; and there's a lot of API documentation, since we were a development organization.
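Going back to the Perspective demo for a moment, here's a hedged sketch of the sort of glue it needs. It assumes the perspective-python package's `Table` and `update` API, and `stream_ticks` is a hypothetical stand-in for a TimeBase subscription like the cursor sketched earlier; it's an illustration of the pattern, not our actual demo code.

```python
# Feeding a tick stream into a Perspective table (sketch).
# Assumes perspective-python's Table API; `stream_ticks` is a hypothetical
# stand-in for a TimeBase subscription like the cursor shown earlier.
from perspective import Table

schema = {"timestamp": str, "symbol": str, "price": float, "size": float}
table = Table(schema, limit=10_000)   # keep the most recent rows for the UI

def stream_ticks():
    # Placeholder: in the demo this would be the live TimeBase cursor.
    yield {"timestamp": "2021-11-01T12:00:00Z", "symbol": "ETHUSD",
           "price": 4301.25, "size": 0.5}

for tick in stream_ticks():
    table.update([tick])   # Perspective pushes updates to any attached views
```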
Yes, please? [Audience question about a specific capability.] Sorry, yes. We can, and we have done, I should say, but no, it's not part of the TimeBase product itself.

[Audience question about scale and speed.] Yes, it scales, and yes, it's really fast, and it does terabytes. We've got metrics on the site, but most of the clients we work with are at terabytes, small numbers of terabytes rather than hundreds, so terabytes is a normal size, if that means anything. From a speed perspective, we used to have this measure, which is still true (it's probably faster now): from a streaming perspective, a million messages per second per core is our basic benchmark. By "message", going back to granularity: for level two data it's every time the order book moves; if you're just doing top of book, it's that; if it's daily data, then it's that. Each of those is a message, and a million per second per core is essentially the benchmark. It's not all about speed, but we got up to a billion messages per second when we spread out across a good number of cores on AWS, which at a million per core implies something on the order of a thousand cores. So throughput is measured in millions per second, latency is measured in microseconds, and size is typically terabytes. I'm not going to say it's the fastest (there's always a speed race), but it's very fast. Does that give you some sense? I'm trying not to wave my hands too much.

Yes, please? That one's a beer conversation. No, not offline; take that one to the bar.

Oh, sorry, the other thing I should have finished with on that earlier question: it scales down as well as up. This demo isn't running on one, but we used to do all our demos on two-core laptops, typically running fast queries. For something like US equities level two, that's maybe a 16-core machine, a single m5-class instance on AWS or something like that; we normally just use one compute node on AWS for most use cases. And yes, absolutely: if 16 cores isn't enough and it happens to be more, that's not a big deal. It is designed that way, and the reason (this is a little bit of history) is that a lot of our engineers are in Minsk, and back in the day, 15 years ago, they didn't have all these great machines. They had very old machines, and they had to design very good software to perform on the machines they had. So that's what they did.

Well, I think, in fact I know, I'm getting waved at, so: any more questions before we wrap up? I was pretty much done; I was just showing you some examples. Otherwise, very good, thank you very much.