Thank you everyone, we're going to get going. My name's Stuart Farr, from EPAM Systems. I was going to be here with my colleague Ilya, who couldn't make it, and we're going to talk about the subject on the screen there. There will be a small prize for anybody who can sensibly reduce the number of words in that title to fit on one line, so I'm not going to read it out, because it's too long, but that's what this is going to be about. I've got a few slides which I'll go through now, and then, time permitting, we'll get on to actually look at some of these applications in real time; some real stuff, in other words, and maybe even some code.

So, just moving on: TimeBase CE, the Community Edition. TimeBase is a product that has been out there for some time under a commercial license, so it's not a new product; it's new to the open source community. Essentially, it combines a time-series database and a real-time messaging system in a single product. It has been in development for over 15 years, built originally for the world of systematic and algorithmic trading, so it is engineered for very high message rates and very low latency. It is also a polymorphic database: you can store records of different types in the same stream, and you work with stored historical data and live streaming data through the same interfaces. We think that combination is quite distinctive.
The important point, and the real differentiator that users tell us they can see, is how the product handles real-time and historical data together: the two work side by side in the same system. That simultaneous use of real-time and historical data is, I'm not going to say unique to the world of systematic or algorithmic trading, but it's certainly a very, very important feature, and I mentioned earlier that TimeBase is both a historical time-series database and a real-time streaming system.

You can see there we have some use cases where it is used for what used to be called complex event processing, which was its own domain a few years ago. It's still done now, just not used so much as a name, but the whole idea of real-time event processing, where you define the events or the events are defined by somebody else, is a very real use case that we use TimeBase for.

In terms of the history, I'm not going to go through all of that, but the most notable thing I do want to point out is where we started, about two years ago, deploying on Kubernetes for the cloud. That was obviously a big undertaking for us: essentially taking a solution that was built in the world of private data centers and on-premise installations very much into the cloud world, where most of our deployments now are on one of the various clouds, and Kubernetes was a big part of that. We still have a large number of installations in traditional data centers on bare metal, but the cloud is very much where most of the action is now.

So, again, the final piece on there was the release to the open source community as TimeBase CE, and I thought I'd just go through the reasons for this decision. When I say "we", I mean this business unit
within EPAM. We were acquired two years ago, so we are a unit very much and fully part of EPAM, but TimeBase was developed pre-EPAM, in a traditional commercial software license company of 90 people, plus or minus. We had, and we still have, a number of engineers who were philosophically very much aligned with open source, but as a small company of 90 people we couldn't figure out how to do that sensibly at a business level, how to make money from open source. Obviously people do it, and we knew people who do it, but it wasn't our MO, so we didn't do it. Once we became part of EPAM, that changed: EPAM is very supportive of the open source community, and from a business model perspective it goes along with that very naturally. You heard Chris earlier today, hopefully, talk about how we view open source in EPAM. So with strong support and a commercial business model behind us, it was a natural step for us to release to the open source community.

In terms of resources, we'll actually dig into these later, time permitting, but timebase.info is publicly accessible and there's a lot of information on there; some people say too much. There's literally the architecture, the file system, how it's formed, how it's configured; the APIs are on there. One of the earlier slides mentioned the Java, C# (sorry, .NET), C++ and Python APIs, which are all up there, along with a stack of other documentation. The repo is there as well for actually accessing the code, and the other documentation is on the FINOS site.

One of the things the title is about is using time-series data, and TimeBase in particular, for analytic applications. So, talking a little bit about the types of time-series data that exist in financial services: we tend to always think about market data, because it's fundamental, and indeed it is the bedrock of most financial applications. Orders and executions, of course, are up there; earnings data and other sorts of fundamental data are up there
as well. But then there are less obvious types. Analytics on the IT stack is particularly important in the high-frequency trading world, where you need to know where your latencies are coming from; that sort of metadata, if you will, is classically analyzed as time-series data to understand where your latencies are and how to improve them. Less obvious, but important from another dimension if you will, are other types of data; some people call it alternative data, of course. Weather: stats on weather patterns, both forecasts and historical. Consumption data from stores, whether aggregated or from individual stores, or companies I should say. Satellite data was very prevalent a few years ago: basically, firms were taking pictures of the parking lots of big, well-known retail stores and then selling that data, and people were making predictions on earnings based on how full the parking lots were, which sort of worked for a little while and then didn't. And news: news data and social sentiment from news and social media are used for generating trading signals and for putting context around other events that are going on.

So all of those are time-series data: less obvious, maybe, but very important, and people are trading and generating trading signals based on those other types of data in addition to market data, not instead of it, because it's all predicated on making some prediction about, or correlation with, market data. So market data is still very important and fundamental, but we shouldn't forget these other types. In the context of TimeBase that's very important, because one of the things that is fundamental to TimeBase, and the Community Edition in particular, is defining your own data structures. It doesn't just assume that time series in financial services means market data: you can define your own data structures and schemas for these other types of data, and for others that are not up there. And in terms of what operations this data is
used for, we tend to think of, or I tend to think of, operations using historical data versus operations using real-time data, and again that's by no means exhaustive. TCA would be a classic example of using historical data, analysing the performance of order execution; it's been around a long time, very simple to do, but it requires historical market data. Backtesting is, by definition, on historical market data. Value at risk via historical simulation, the bedrock of banks' risk management for many years, is by definition on historical data. Trade surveillance is as well. Then, if we look at what uses real-time data, we actually see a lot of the same words, and that's not surprising. Front-office risk, real-time risk, by definition requires real-time data. Running algos is, by definition, operating on real-time data. Trade surveillance is there too, trying to capture bad behaviour as it happens in real time. And of course, complex event processing.

The reason for putting those up there is essentially to highlight the fact that we have a simultaneous use of both historical and real-time data. Going back to the point I made at the beginning: time-series data is sometimes thought of as something you dump into a time-series data warehouse, run some historical analysis on, and that's it. And that's not it. It's very much this idea of a simultaneous use of historical and real-time data, and at an engineering level that's not trivial. If you've got a lot of data, measured in terabytes typically, and often millions of messages per second in terms of the frequency of new data points, and you're piecing that together with real time, in other words it's coming in very quickly and building up very quickly, but you also need the historical context, then managing those two concepts is non-trivial. It's doable; there are several people who do it very well. TimeBase is one of those, and it's now open source.
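That stitching of history and real time can be sketched in a few lines of plain Python (this is an illustration with made-up numbers, not TimeBase code): the consumer iterates over historical records first, then carries straight on with live updates, and a rolling window keeps the analytic up to date across the seam.

```python
from collections import deque
from itertools import chain

def sma_over(prices, window=3):
    """Rolling simple moving average over one unified iterator of prices.

    The caller neither knows nor cares where history ends and real time
    begins; the moving window just keeps sliding.
    """
    buf = deque(maxlen=window)
    for p in prices:
        buf.append(p)
        yield sum(buf) / len(buf)

historical = [100.0, 101.0, 102.0]   # stand-in for stored data
live = iter([103.0, 104.0])          # stand-in for a real-time feed

# One stream: replay history, then continue with live updates.
unified = chain(historical, live)
print([round(v, 2) for v in sma_over(unified)])
# [100.0, 100.5, 101.0, 102.0, 103.0]
```

The point of the sketch is the `chain`: one iterator, one API, regardless of whether a given value came from storage or from the wire.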
So this is just a schematic way of looking at that idea of historical data and real-time data, and real-time data, as soon as it's come in, is of course now historical. So we have this concept of a moving window, a moving window of time, and in that moving window at any one point in time there is a set of data. Keeping track of that moving window of data, whether it's measured in milliseconds or seconds or days or months or years, doesn't matter; the concept is that there is this moving window. So when we compute very simple math functions like moving averages, of the various flavors, and various technical indicators, they implicitly, or explicitly, need both this concept of real-time data and the historical data just to keep up to date, and we'll look at some of those in a minute actually being used. So this concept of the moving window, and stitching together real time and historical, is pretty fundamental to applications in financial services.

Looking at what users expect nowadays, again in an analytical context: the idea of giving somebody a screen to put in some numbers, press a button and get an answer would be nice if it still existed, but it tends not to. People now are particularly using Python to do their own analytics, and that is very free-form: you can't predict what they're going to do, you can't predict what data they're going to need, or how often, or how much. So Python, Jupyter notebooks, Kafka, and tools at the data-structure level should be, and are, supported by us, along with this idea of not restricting what an analytical user can do. The expectation now is also, of course, that all of that is available anywhere, so web-deliverable functionality, I should say; obviously the cloud is very much part and parcel of that. But the whole
point here is that we have to support various different types of data, different frequencies, different contents, different amounts, and different analytics, to give power analytical users what they need. That's a challenge, but that's what I mentioned near the beginning about the last two years: we've made TimeBase very much a cloud-deployed platform, in part to support some of these expectations.

That brings us to building, or designing, applications for analytics in this context. One approach, which used to occur and still does, is that the processing is done on the server hosting the data. The classic Oracle or Sybase RDBMS is very data-centric and very processing-centric on the server: you did the math, you did the calculations on the server, and you delivered the results. Very efficient from a speed-of-processing perspective, not very good from a flexibility perspective, but still the bedrock of many bank systems today. Or you do the opposite: you essentially say, I'm not going to do anything to the data, I'm just going to send it out and you deal with it as you wish. You're delegating all of the analytics and the processing to the consumer, and that's very much the Python world, where they can do very powerful analytics, but you're essentially asking the users to do a lot and stripping the data server down to something that just streams data. And then, obviously, there's a hybrid between the two, where you may be doing things like filtering a data set. An example we often talk about here: if you subscribe to the market data from the CME, that's a lot of information coming in very quickly across various markets, energy and, classically, commodities, and you may just be dealing in energy, or some subset of energy, or only one contract pertaining to energy or commodities.
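That CME scenario is the classic hybrid-filtering case. As a sketch in plain Python (the messages and symbols are invented; this is not the TimeBase API), the server-side step is just a predicate that passes on the subset the consumer actually trades, leaving all further analytics to them:

```python
# Invented sample of a multi-product feed: CL (crude oil), NG (natural
# gas), GC (gold). Only CL is relevant to this hypothetical consumer.
feed = [
    {"symbol": "CL", "price": 79.10},
    {"symbol": "NG", "price": 5.62},
    {"symbol": "GC", "price": 1760.0},
    {"symbol": "CL", "price": 79.15},
]

def filtered(stream, symbols):
    """Pass through only the messages for the requested symbols."""
    for msg in stream:
        if msg["symbol"] in symbols:
            yield msg

subset = list(filtered(feed, {"CL"}))
print([m["price"] for m in subset])  # [79.1, 79.15]
```

The consumer's bandwidth and processing cost now scale with the subset, not with the full feed, while the analytics themselves stay on the consumer's side.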
So you may filter the data set and then just deal with the amount relevant to you, but still do your analytics at your end. That would be a fairly simple, but quite well-used, example of hybrid processing between the server and the consumer. And then data compression comes very much into it: where are you doing that compression? You're presumably doing it on the disk, crunching your raw data down into something that stores smaller than its natural format, which is text. But also, if you're delivering that over the internet, you may want to compress the data before it goes out of the data center. And then what does that add in terms of latency? Compression obviously has a cost. So there's obviously no one answer here, but these are things that are fundamental to designing applications for analytics that are cloud-delivered or running on the cloud.

With that, we're going to run some examples, hopefully in real time; we'll see what works and what doesn't. While I change over to the browser, if anybody has questions, please just fire away. Just before I do some examples in analytics: this is timebase.info, which I put up earlier. This is where, as I was mentioning, there's a lot of information, particularly under the documentation; we've got the APIs over there under developer documentation. While we're here, I will just very quickly show this architecture of what we have. One of the things I mentioned is this idea of having a messaging system alongside, as part and parcel of, a time-series database, and this concept of having persistence or not: what we call durable streams, which are what they say, your data is stored and available now and in the future, or transient streams, where you're just essentially using it as a memory broker; that's part of the configuration.
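A toy model of that durable-versus-transient distinction (my own sketch, not TimeBase internals): both kinds of stream fan messages out to live subscribers, but only the durable one also appends to storage, so later readers can replay history.

```python
class Stream:
    """Toy stream: fan-out to live subscribers; optionally persist."""

    def __init__(self, durable):
        self.durable = durable
        self.storage = []        # stand-in for on-disk persistence
        self.subscribers = []    # live consumers

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, msg):
        if self.durable:
            self.storage.append(msg)   # durable: stored, replayable later
        for cb in self.subscribers:    # both kinds: delivered in real time
            cb(msg)

    def replay(self):
        return list(self.storage)      # transient streams have no history

durable, transient = Stream(True), Stream(False)
seen = []
for s in (durable, transient):
    s.subscribe(seen.append)
    s.publish({"px": 101.5})

print(len(seen), len(durable.replay()), len(transient.replay()))  # 2 1 0
```

Live delivery behaves identically in both cases; the configuration flag only decides whether the data also survives for historical queries.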
So you can see there that it's a very stripped-down system: it does what it does very well, but it doesn't do a lot of things that other systems, RDBMSs particularly, would do. The filtering I touched on earlier, that's what we call QQL; I'll maybe show some examples of that. You can guess what the QL stands for; the Q is quant, so quant query language. It's a simple, SQL-like syntax to do, classically, filtering, but also normal grouping of data, some preprocessing if you will, before you stream the data out to a consumer. Anyway, that's timebase.info; a lot of information there, some people say too much, but it is what it is.

[Audience question, inaudible] Yes, absolutely, it's been out under a commercial license the whole time, absolutely, yes. That's a good question, thank you. Yes, that commercial past; it's a different time, as they say.

So this is a very simple front-end application that is, as you can probably guess, showing market data; in this particular case it's NYMEX. We've got some contracts there; CL is a crude oil contract, I'm not sure which month. And here, I'm just going to go back a step: we've got these, as many other time-series systems do, called streams. This particular stream, called nymex, is, not surprisingly, holding data from the NYMEX market, but we can use it to look at the schema. I mentioned earlier that you can create your own data structures and schemas to reflect data that's not just market data but other data that may not be that common; you can define schemas, and what we're looking at here is a graphical, visual representation of the definition of this particular schema for supporting NYMEX market data. But again, this is all user-driven. If we look at the data itself, I'm going to go back to this, if I can get there.
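What "define your own schema" amounts to can be sketched in plain Python (this is an illustration, not the TimeBase schema API; the record and field names are my own): a record type with a timestamp plus whatever fields your data needs, market data or otherwise.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative user-defined record type for a non-market time series.
# In TimeBase you would express this as a stream schema; here it is a
# plain dataclass to show the shape of the idea.
@dataclass
class WeatherObservation:
    timestamp: datetime      # every time-series record carries a timestamp
    station_id: str          # which weather station reported
    temperature_c: float     # observed temperature
    is_forecast: bool        # forecasts and actuals can share one schema

obs = WeatherObservation(
    timestamp=datetime(2021, 10, 5, 7, 40, tzinfo=timezone.utc),
    station_id="KNYC",
    temperature_c=14.2,
    is_forecast=False,
)
print(obs.station_id, obs.temperature_c)
```

The same pattern covers orders, IT-stack latency metrics, or sentiment scores: the database stores typed, timestamped records, and the user decides what the fields are.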
This contract, yes, of course. Well, if it doesn't move, I'll go to the next one and do the same thing there. So this is a different data stream, with a similar UI, as you can see. Now we're looking at crypto data from the exchange, or trading venue, Kraken, one of the many crypto venues, and here we're looking at historical data. We're looking at today, actually, starting from 7:40 this morning; this is just charting data historically. But the whole point of this talk is analytics, real-time and historical. So this is real time: this is the Bitcoin price from Kraken right now, updating in real time; you see this thing flashing every time it has an update. The actual data itself is in this fairly complex structure. I briefly touched on polymorphism earlier, and here's the fairly complex data structure storing an actual incremental update of the order book; buried in here it says "level two update". So basically what's going on is that we're getting incremental updates of the order book from Kraken for BTC/USD in real time.

So that moving window: this is it, real time, and then literally as soon as it's updated we get the historical view. This is the history going back to last week, or the week before, so that history is, by definition, building up in real time. And the same APIs: this is obviously an API populating a screen, and it's the same API whether you're looking at data from a second ago, a month ago, a year ago, or right now; you're subscribing to the same data. From an analytical perspective, given that, as I mentioned earlier, many math functions are implicitly using both real-time and historical data simultaneously, it follows that you need one API, so you don't have to worry about whether the data is history or now. This is showing you that in a simple use case. I'm going to move on now.
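As a sketch of what consuming those incremental level-two updates involves (plain Python over invented messages; this is not Kraken's or TimeBase's actual message format), each update replaces the quantity at one price level, and, by a common convention assumed here, a zero quantity deletes the level:

```python
def apply_l2_update(book, side, price, size):
    """Apply one incremental level-two update to an order book.

    Assumed convention: `size` replaces the resting quantity at that
    price level, and size 0 removes the level entirely.
    """
    levels = book[side]
    if size == 0:
        levels.pop(price, None)
    else:
        levels[price] = size
    return book

book = {"bids": {}, "asks": {}}
apply_l2_update(book, "bids", 43000.0, 1.5)   # new bid level
apply_l2_update(book, "asks", 43010.0, 2.0)   # new ask level
apply_l2_update(book, "bids", 43000.0, 0.7)   # quantity revised down
apply_l2_update(book, "asks", 43010.0, 0)     # ask level cleared

print(book)  # {'bids': {43000.0: 0.7}, 'asks': {}}
```

Because the updates are incremental, the current book state depends on the whole history of updates, which is exactly why the real-time and historical views have to be stitched together.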
Some of you may have been to the Grafana folks earlier, talking about their tools. This is Grafana's, I'm not going to say flagship, it's not for me to say that, but this is a very popular, and for good reason, graphical user interface for showing both historical and real-time time series. So again, this is real time; this is BTC/USD again. On the top left here we have simple bars, I think five-minute bars; we can change the granularity, and it moves every now and again as a bar rolls up and time moves on. Below here we have volume, again fairly usual. The right-hand side here is a little bit more interesting, because we have some simple analytics. I don't know whether you can see those, but we've got SMA, CMA and EMA, so simple, cumulative and exponential moving averages being calculated, which again by definition have a moving window of time, both real time and historic, and we can see that moving as time moves on. And if we look here, we can see some of the simple scripting that allows us to do that; you can see, again, it's a very simple, SQL-like language defining these moving averages. And it's very simple for a very simple reason: our source of data is being updated in real time and has that history, so the person doing the analytics, the consumer, doesn't have to worry about the timestamp on the data, because the same API is giving you real time and historic.

And with that, that's about our time slot. That was all I was going to say formally, but any questions? Yes, please. Absolutely, yeah, we call that warm-up. So am I going back 50 microseconds? Yes, it's a parameter, absolutely. Yep. And that's a good question, because one of the things we did, again moving more to the analytics, away from the TimeBase side of things: the analytics themselves can use a lot of data. If you've got market data
coming in at millions of messages per second, which is not unusual, I should say, then your analytic itself has to be pretty powerful, otherwise it's going to choke. You can solve that in probably more than two ways, but at least two. One: you can say, I can only cope with 100,000 a second, so I'm only going to take every tenth message in that stream; that's a fairly crude way, but it works. Or: you can have a very fast analytic that can keep up. And our engineers, of course, did both, so we have very fast analytics that don't choke on the data, but we can also throttle it back for analytics that can't take that sort of volume.

Anyone else? All right, well, thank you very much. I'll just leave it on timebase.info for more information, but any questions, please ask afterwards. Thank you.