So I'm also @realChrisDev on Twitter, and I work for a small development company based in Trinidad called ChrisDev. My talk today describes the challenges I faced in building a stock market surveillance and monitoring application, and how we overcame those challenges using a combination of Django and pandas. I don't work for the Securities and Exchange Commission of Trinidad and Tobago, so my views are my own.

The talk is divided into four sections. First we'll go through some background to the project. Then we'll introduce our solution architecture. Then we'll look at our reusable Django app called django-pandas. And then we'll take your questions, of course.

Okay, so in 2011 the Trinidad and Tobago stock market was forced to migrate to a new trading platform. This was because the company that made the old trading platform had gone bankrupt some years before and could no longer maintain or support it. The new platform was great for traders, but it didn't provide a lot of the regulatory hooks that regulators needed for monitoring and surveilling the stock market. I had done some previous work with the Securities and Exchange Commission, so I was asked to present a proposal for a system that could bridge the gap.

The requirements weren't really well structured, but there were two things they wanted. First, the system should be web-based, because they wanted to share whatever was developed with other regulators such as the central bank, the ministry of finance and so on. Second, the system should use complete and finalized data. I never actually found out what that meant. But hey. And they said that if our proposal was accepted, we would then work with key users to develop a set of more detailed requirements for the project. Well, of course, that could only lead to one thing.

The first thing we encountered was a very slow procurement process. We expected that from a government-style bureaucratic organization, but six months between tendering the proposal and discussion, acceptance and sign-off is a little long. Actually more surprising was the amount of mobility. One might think that people working in bureaucracies are there for life, but that's not really true: there's a lot of inter-departmental mobility, people leave, and so on. Of the so-called key users who framed the requirements in the first place, to tell the truth, only one was left by the time we finished the project and delivered the system; the rest had been promoted, left the organization and so forth. And what we found is that a lot of the people replacing them didn't have the statistical training or technical background for this kind of system. In any case, you have to explain, promote and sell the system to a new set of people every three or four months, which leads to a number of problems. You have long lead times on decision making, and the requirements we had were not well understood by the final users, because the people who specified them were no longer there. So users would ask: why do we have feature X? Because that's a requirement. Why is that a requirement? And you have to go and explain the rationale and sell the project all over again.

Of course, people would say you should use agile methods to deal with such problems.
But in government contracting you find there's an implicit waterfall model: the requirements are a deliverable, and once the requirements are put in place they stay there until the end of the project, even when they're no longer relevant or nobody understands why they're there.

So despite these challenges, we were able to deliver a system to the end users. We started rolling it out in the third quarter of 2013. It's called MAS, the Market Analytics and Surveillance System, and it's a pretty standard Django application, with a couple of exceptions. These are the basic architecture components. The obvious question up front is: why are we using pandas? And then you see Red Hat Enterprise Linux 5.2 and Postgres 8.2; the subsequent slides will explain why. Here's the system diagrammatically.

Maybe I went through the slides too quickly, but we have some really ancient Dell machines as servers. That's a legacy of a server consolidation exercise that was done in 2010, and again, in a government-style organization there's a five-year refresh cycle, so whatever was done then has to stay, and we can't upgrade until the next cycle. To be able to deliver a modern Django application, because Red Hat Enterprise Linux comes with Python 2.5, we had to upgrade Python, and we are in the process of migrating the system to Postgres 9.3.

If you look at the diagram, we have two servers, where we originally had three, but one was taken away from us because an Exchange server failed, so they took our server away and we had to make do with these two. We have two Gunicorn instances, and we use nginx round-robin load balancing. The pgpool is there because we were using Postgres 8.2 (or 8.3, or 8.1 when we started), which required it for replication. So we replicate the database between these two servers and get some kind of redundancy. When we finally migrate to Postgres 9.3 we might get rid of pgpool and replace it with something like PgBouncer.

The system itself is very read-centric, because we only have one place where we really upload data. It's not a real-time system, because again, the client only wants to look at final data. In terms of data, we retrieve data from the trading system's server log: we have a scheduled FTP/SFTP job that goes out and retrieves this log file, which is an XML monstrosity, and we parse it and load the data, thanks to lxml, because it's a really, really horribly formatted Windows file. Aside from the trading data, we also have a bunch of Scrapy spiders that we use to get data from various stock markets in the region, because we have to use it for comparisons and that sort of thing.

Okay, so what is pandas used for in the system? Obviously we do a lot of statistical and time series calculations, so there's a lot of pandas. One of the things we found we needed to do, because it wasn't explicit in the requirements, is link trades and orders and be able to rebuild the history of a trade from when the order was originally placed to when it was changed and so forth. So we use the pandas approach of split, apply, combine to reconstruct trade order books and calculate duration statistics; there's a sketch of the idea below. Another interesting use of pandas in the system is the ability to do pivot table analysis on the fly. So we actually generate pivot tables on the fly, roughly as shown in the second sketch below.
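To make the split-apply-combine idea concrete, here is a minimal sketch of reconstructing order histories and computing durations. The column names (order_ref, event_time, status) and the data are hypothetical, not the actual MAS schema:

    import pandas as pd

    # Hypothetical order events: one row per change to an order over its life
    orders = pd.DataFrame({
        'order_ref':  ['A1', 'A1', 'A1', 'B7', 'B7'],
        'event_time': pd.to_datetime(['2013-09-02 09:30', '2013-09-02 10:15',
                                      '2013-09-02 11:02', '2013-09-02 09:45',
                                      '2013-09-02 09:50']),
        'status':     ['placed', 'amended', 'filled', 'placed', 'filled'],
    })

    def order_history(group):
        # Rebuild the life of one order and compute how long it was open
        group = group.sort_values('event_time')
        return pd.Series({
            'events':   len(group),
            'duration': group['event_time'].iloc[-1] - group['event_time'].iloc[0],
        })

    # split (groupby) -> apply (per-order history) -> combine (summary frame)
    summary = orders.groupby('order_ref').apply(order_history)
    print(summary)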
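And generating a pivot table on the fly looks roughly like this in pandas; again, the trades frame and its columns are made up for illustration:

    import pandas as pd

    # Hypothetical trades for one trader over a period
    trades = pd.DataFrame({
        'security': ['WCO', 'WCO', 'NGL', 'NGL', 'NGL'],
        'client':   ['C001', 'C002', 'C001', 'C001', 'C002'],
        'volume':   [500, 1200, 300, 700, 250],
    })

    # Break the trades down by security and client, counting trades per cell
    pivot = pd.pivot_table(trades, index='security', columns='client',
                           values='volume', aggfunc='count', fill_value=0)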
Pandas has a lot of facility for data cleaning, and we actually use pandas in our Django views, because a pandas DataFrame can easily be rendered as JSON, CSV, HTML or Excel; there's a sketch of this further down. And we have a project called django-pandas which provides a manager and some other functionality so that we can easily build pandas DataFrames from querysets.

So let's talk a little bit about denormalization and caching. With a system this heavily statistical, obviously we don't want to do all the calculations on the fly. So we have tables, or models, for storing daily summary data, and these are populated when we load the raw data. We're also going to extend that strategy to cache the more expensive statistical calculations at the database level. Strangely enough, we're not really making extensive use of caching except at the level of the dashboard; however, we intend to use django-cache-machine to do some queryset caching. The thing that is holding back both denormalization and caching is the user requirement for total flexibility over date ranges and securities, so there are no standard queries. You're limited in what you can cache, because they want to put in any date range and have the statistics calculated. We're working with them to change their minds about that requirement.

These are some screenshots of the MAS web app. This is the dashboard, showing some of the calculated statistics. This is our activity filter, which allows the users to drill down from a trade: they can click on any trade and see all the order history behind it. Now, you'd expect this to be easy, because clearly in any real system, if you have a trade, there would be a key linking trades to orders. But for some strange reason, in this trading system there was no link between trades and orders. So you actually have to use pandas to get a bunch of candidate order groups that the trade could be linked to, and then find the actual trade. I don't know if you can see it, but there's a green row there; that's the one matching the trade up above. I don't think that could have been done easily without pandas, and it's only about four or five lines using pandas; there's a sketch of the idea below.

This is a pivot table. In this case we've taken a trader, and we're looking at his trades over a particular period, broken down by security and client, counting the number of trades. You can drill down from any cell in the table to find out more about the trades and orders that made it up.

On the front end we use Foundation 5, and it works best in Chrome; the users love that, because they hate IE for some reason. Not only developers hate IE; users do too, even in corporate environments. Highcharts is used for charting, Highcharts and Highstock; those are not open source. We use jQuery DataTables for the interactive grids. The only criticism of jQuery DataTables is that it's not responsive, but it's improved a lot.

Okay, so that's the traditional client, if you want to call it that. We also use the IPython Notebook with this project. The idea came from one of our directors, who wanted to give the analysts the ability to perform ad hoc statistical calculations using the data we had in the system.
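As an illustration of rendering a DataFrame straight out of a Django view, here is a minimal sketch; the DailySummary model and its fields are hypothetical, while read_frame is django-pandas's IO function:

    from django.http import HttpResponse
    from django_pandas.io import read_frame

    from .models import DailySummary  # hypothetical model

    def summary_csv(request):
        # Build a DataFrame from a queryset and stream it back as CSV;
        # to_json() or to_html() would work the same way for other formats
        qs = DailySummary.objects.filter(trade_date__year=2013)
        df = read_frame(qs)
        response = HttpResponse(content_type='text/csv')
        response['Content-Disposition'] = 'attachment; filename="summary.csv"'
        df.to_csv(response, index=False)
        return response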
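The trade-to-order matching is along these lines. This is only a sketch of the general idea, with made-up column names and matching rules, not the actual MAS logic:

    import pandas as pd

    def candidate_orders(trade, orders):
        # Narrow the order events down to groups that could sit behind the trade
        candidates = orders[(orders['security'] == trade['security']) &
                            (orders['price'] == trade['price']) &
                            (orders['event_time'] <= trade['trade_time'])]
        # Keep only the orders whose cumulative volume could cover the trade
        totals = candidates.groupby('order_ref')['volume'].sum()
        refs = totals[totals >= trade['volume']].index
        return candidates[candidates['order_ref'].isin(refs)]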
So we tried out a number of experiments, and the best way we came up with was basically to install Anaconda Python on the users' Windows workstations, put the actual project source code there, and use a minimal settings module where INSTALLED_APPS only points to the models we want to expose at the IPython Notebook level. We create an IPython profile for each application we want to use, and IPython has this startup directory, so you put a script there that sets the path to include the application and sets the Django settings module; a sketch of such a script is shown below. Then you can import the Django objects in the context of an IPython notebook. So then we can just start up a notebook from the command line, and to make it even easier, we create a batch file and stick it on the desktop, so they can start it with a click and it feels easy.

So what was the users' initial reaction? "Ooh, you want to make me into a programmer? No way," right? Okay, so, yes. But a few weeks later the regulator, the central bank, wanted historical data for mutual funds, and they wanted it in a certain way: for each issuer, they wanted all the funds for that issuer in one spreadsheet, one workbook, with a sheet per issuer. That's about 40-something issuers and hundreds of different funds, and they were going to do it manually. And they wanted it by tomorrow. So one of the analysts said: "I heard you blabbing on about how this thing could produce Excel spreadsheets; can you do something about that?" So I sat down with the analysts and we came up with this, which basically did what they wanted; the sketch below shows the idea. When they saw that, they became enthusiastic. I put together some wrapper functions and I added some documentation to the notebook. This is a section of that user's IPython notebook; they use it all the time and they've been able to modify it. I just added the wrapper functions, took some of the more complex stuff, and put it in the startup file.

Now I'll talk a little bit about our django-pandas module. It's been useful in this project because we don't only use Django in the traditional way; we also bring Django into the IPython notebook. It has two basic modules: an IO module and a DataFrame module. This is a typical Django model, and this is how you use the read_frame function, which is currently the only function in the IO module. You can create a DataFrame with all fields or specific fields, you can set an index, and you can use filters and excludes; there's an example further down.

The DataFrameManager is based on the PassThroughManager by Paul McLanahan, which is in django-model-utils, a great package by Carl Meyer. What you do is override the default manager with the DataFrameManager, and this gives you access to three methods: to_dataframe, to_timeseries and to_pivot_table. The to_dataframe method does exactly what read_frame in the IO module does. The to_timeseries method supports two kinds of storage formats for time series: long and wide. A time series here is a DataFrame indexed by datetime objects. In the wide format, each column of the DataFrame represents a separate series. In the long format, the series are stacked: in this structure on the slide, there are three different series stored long.
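The startup script is only a few lines. Something like the following, where the paths, profile name and settings module name are all hypothetical:

    # ~/.ipython/profile_mas/startup/00-django.py  (hypothetical profile)
    import os
    import sys

    # Put the project source on the path and point at the minimal settings
    sys.path.insert(0, r'C:\mas\src')  # assumed install location
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mas.notebook_settings')

    # On Django 1.7 and later you would also call django.setup() here;
    # on earlier versions, importing the models after this is enough.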
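The issuer spreadsheet job essentially reduces to a groupby over an Excel writer. A minimal sketch, assuming a funds DataFrame with hypothetical issuer/fund/nav columns:

    import pandas as pd

    # Hypothetical historical fund data
    funds = pd.DataFrame({
        'issuer': ['RBC', 'RBC', 'UTC'],
        'fund':   ['Growth', 'Income', 'Universal'],
        'nav':    [10.25, 22.10, 31.47],
    })

    # One workbook, one sheet per issuer (sheet names are capped at 31 chars)
    with pd.ExcelWriter('mutual_funds.xlsx') as writer:
        for issuer, group in funds.groupby('issuer'):
            group.to_excel(writer, sheet_name=str(issuer)[:31], index=False)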
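Putting the two modules together, a typical model and the calls look roughly like this. The SecurityPrice model and its fields are made up, while read_frame, DataFrameManager and the method names come from django-pandas:

    from django.db import models
    from django_pandas.io import read_frame
    from django_pandas.managers import DataFrameManager

    class SecurityPrice(models.Model):      # hypothetical model
        security = models.CharField(max_length=10)
        trade_date = models.DateField()
        close = models.DecimalField(max_digits=12, decimal_places=2)

        objects = DataFrameManager()        # replaces the default manager

    # IO module: build a frame straight from a queryset, with filters,
    # specific fields and an index
    qs = SecurityPrice.objects.filter(security='WCO')
    df = read_frame(qs, fieldnames=['trade_date', 'close'],
                    index_col='trade_date')

    # Manager: to_timeseries unstacks 'long' storage (one series per
    # security) into a single datetime-indexed DataFrame
    ts = SecurityPrice.objects.to_timeseries(
        ['trade_date', 'security', 'close'],
        index='trade_date', storage='long',
        pivot_columns='security', values='close')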
And when you use the to_timeseries method, it pulls the series out and structures them as a DataFrame with a time series index. But the real power, as I said before, is to subclass the DataFrameManager and build your own custom manager with the methods and business logic that you need. When I started, my views used to look like this: I had a mixin with all the business logic to create a trade DataFrame, so my view classes were huge. Then I realized I could just subclass the DataFrameManager and put all that logic there. So now I just have a single line of glue that does what I had before; a sketch of that pattern follows.
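A sketch of that pattern, with a hypothetical model, fields and business logic:

    from django.db import models
    from django_pandas.managers import DataFrameManager

    class TradeFrameManager(DataFrameManager):
        # All the business logic that used to live in the view mixin
        def trade_frame(self, security, start, end):
            qs = self.filter(security=security,
                             trade_date__range=(start, end))
            df = qs.to_dataframe(['trade_date', 'price', 'volume'],
                                 index='trade_date')
            df['value'] = df['price'] * df['volume']   # hypothetical calc
            return df

    class Trade(models.Model):              # hypothetical model
        security = models.CharField(max_length=10)
        trade_date = models.DateField()
        price = models.FloatField()
        volume = models.IntegerField()

        objects = TradeFrameManager()

    # In the view, the single line of glue:
    # df = Trade.objects.trade_frame('WCO', start, end)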