Hello, everyone, and welcome. My name is Eric Fransen. I'm with DataVersity, and we'd like to thank you very much for joining us today for this webinar, a production of DataVersity in the Smart Data webinar series. Our speaker today is Subutai Ahmad of Numenta. Today, Subutai will be discussing applying neocortical research to streaming analytics. Just a few quick points to get us started. Due to the large number of people we expect during these sessions, attendees are muted during our webinars. We will, however, be collecting questions in the Q&A box in the bottom right-hand corner of your screen. At some points during today's presentation, the layout of your screen may change, due to the type of media the presenter needs to show. This usually doesn't happen, but if it does, please be aware that a drop-down navigation panel will appear at the top center of your screen, and you will still be able to access the Q&A and other modules using that panel. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information that may come up during the webinar. Today's webinar is part of a series held on the second Thursday of each month. We're glad you've joined us today, and we look forward to seeing you in subsequent months. Now a few words about our speaker. Subutai Ahmad is the VP of Research at Numenta, a company focused on machine intelligence. Numenta's technology, hierarchical temporal memory, or HTM, is a detailed computational framework based on principles of the brain. Subutai's experience includes computational neuroscience, machine learning, computer vision, and building real-time commercial systems. He previously served as VP of Engineering at YesVideo, where he helped grow the company from a three-person startup to a leader in automated digital media authoring. In 1997, Subutai co-founded ePlanet Interactive, a spinoff from Interval Research. ePlanet developed the Intel Play Me2Cam, the first computer vision product developed for consumers. Subutai holds a BS in computer science from Cornell and a PhD in computer science from the University of Illinois at Urbana-Champaign. Please welcome Subutai Ahmad. Thank you, Eric. I'm really happy to be here and talk to you about some of the latest work we're doing. Let me switch over to my slides. Okay, so hopefully you can see my title slide there. Great. As Eric mentioned, at Numenta we've been developing machine intelligence algorithms that are inspired by neuroscience and a pretty deep understanding of how the brain works. We've been doing this for about 10 years, and we've worked with dozens of customers over the years, primarily in the area of streaming analytics and the Internet of Things. In the process, we've learned quite a bit about the industry in general and about streaming analytics and streaming applications in particular. It's become clear to us that these two worlds, the world of neuroscience and the world of analytics, are about to converge in a particular way. So today what I want to do is share with you what I mean by that, since it may not be immediately obvious where we are in that process and where we're going. I'd like to start with a customer story. Hopefully you can see this next slide. This is a case study from one of the customers I mentioned.
This particular customer is technically quite sophisticated, a major online retailer, and they do something pretty neat: every day they produce a forecast of their entire company's revenue. The way they do that is as follows. Every night at 10 o'clock, all the different departments ship their data to a team of 10 analysts working on the other side of the world. That team works overnight, our time. They look at all the data that's been sent to them, but they also pull in data from the financial markets, weather reports, maybe major sports events that might be going on, and so on. They put all of this together into a model, and then at 5 a.m. our time they send an email to the CEO and all the C-level execs that says, okay, today's forecast is, let's say, 63 million. Then at the end of the day they repeat the process. This is pretty sophisticated; most companies don't even do this today. The forecast they produce is very accurate. They put a lot of time and effort into producing this number, and it's really helpful to them in running their business. But they wanted to take this one step further. With the next generation, instead of generating one number per day, they wanted to generate predictions every 15 minutes. And instead of doing a single global revenue prediction, they wanted to track all of their product categories, and track them in all of their important geographies, because there are important local effects they wanted to capitalize on. If they could do that, it would allow them to react rapidly to changes in their business, regardless of which department it's in or where in the world it is. But this required them to go from one prediction every day to hundreds of thousands of predictions a day, and they were completely at a loss as to how to go about doing this. There are basically two different problems here. One is that their data infrastructure was really cumbersome. The way they gathered their data and pulled in external data sources was a fairly manual process, and that is not going to scale. And then what they really wanted to talk to us about is that the algorithm approach was completely unclear. Right now they have 10 analysts working all day long to generate a single number, and that just is not going to scale to the kind of predictions they wanted to do. It's pretty amazing how many companies I've spoken with that have the same basic story. This is not just in finance. This kind of shift is happening in advertising, where you might want to analyze click-through rates and the weightings of different advertising categories; in predictive maintenance, where you want to monitor and maintain large, expensive machines; in environmental monitoring; in energy prediction; and so on. It's become clear to us that there's a pretty large shift happening in the industry, from one slow prediction to lots and lots of very fast predictions in a streaming manner.
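To put "hundreds of thousands of predictions a day" in perspective, here is a quick back-of-envelope calculation. The category and geography counts below are purely hypothetical stand-ins, since the retailer's actual figures weren't disclosed:

```python
# Hypothetical counts -- the retailer's actual figures were not disclosed.
product_categories = 200
geographies = 20
intervals_per_day = 24 * 60 // 15    # one prediction every 15 minutes = 96 per day

models_needed = product_categories * geographies
predictions_per_day = models_needed * intervals_per_day
print(models_needed, predictions_per_day)   # 4000 models, 384000 predictions/day
```

Ten analysts producing one number per day clearly cannot be scaled to that workload by hiring alone.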
Now you could ask, well, why can't you just take the existing way of doing it and throw more hardware at it? Let's say you figure out the data infrastructure and you throw a lot of hardware at it to generate more predictions faster. Well, it turns out it's not as simple as that, because the basic process machine learning departments use today just will not scale. This next slide is a typical slide you might see in a workshop or a class on how machine learning is done, and it's very typical of what happens today. There's a step where you gather the data and prepare it. You have to choose the algorithm properly, depending on the type of forecast you want to do and the type of data. You often have to be very careful about how you create your inputs to the system, and careful about the exact training methodology used; depending on the algorithm and the data, you may want different training methodologies. Then there's a fairly complicated, almost black-art process of testing and validating your model, making sure it's really working well on your data. And then, if everything goes well, you'll deploy it, and hopefully by then the model will still be applicable. But more likely than not, you have to repeat the process and go through the whole thing again, because the statistics have been changing. This cycle can take weeks to months. The company in my earlier example had optimized it to the point of doing it once a day, but they could only generate one number. What's really going on is that the future of data is moving towards streaming data. Instead of a small number of slow predictions, the amount of data we're collecting is exploding, the number of data sources is expanding, and the frequency at which the data arrives is increasing. What's needed is, instead of manually creating models, to automatically create models. Instead of manually tweaking and readjusting parameters, you want systems that continuously learn and automatically adjust to changes in the statistics. All of these data streams are inherently temporal in nature, so you need techniques and algorithms that can deal with temporal data streams and sequences. At the end of the day, you want to get back predictions, anomalies if something unusual is going on, or recommendations for different actions. As I mentioned before, there are two different components to this. One is that the data infrastructure available in most companies today is extremely batch focused, and you need a streaming data infrastructure. Equally important, you need a very different algorithmic approach. The previous, very batch way of creating algorithms just will not scale to this new world where streaming data is the norm, not the exception. I'm not going to talk about streaming data infrastructure as such; there's a lot of good work going on there in terms of NoSQL databases, Storm and Spark and so on. I'm going to focus on the algorithm side, because less attention has been paid there, and describe our neuroscience-based approach and how we think it will solve these problems. Let me first provide a little bit of background about Numenta, since we are a little bit of an unusual company and I don't know how many of you have heard of us. We were founded by Jeff Hawkins and Donna Dubinsky back in 2005.
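To illustrate the contrast in shape between the two approaches, here is a minimal sketch of a continuously learning streaming loop. The detector is a deliberately simple rolling z-score stand-in, not an HTM; the point is only that the model updates on every record, so there is no separate train/validate/deploy/retrain cycle:

```python
from collections import deque
import math

class RollingZScoreDetector:
    """Toy stand-in for a continuously learning model (not an HTM).
    It updates its statistics on every record, so it adapts to changing
    data automatically instead of requiring periodic retraining."""

    def __init__(self, window=288):              # e.g. one day of 5-minute readings
        self.values = deque(maxlen=window)

    def run(self, value):
        if len(self.values) >= 2:
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            score = abs(value - mean) / math.sqrt(var + 1e-9)   # crude "anomaly score"
        else:
            score = 0.0
        self.values.append(value)                # learn from the record immediately
        return score

detector = RollingZScoreDetector()
for v in [10, 11, 10, 12, 11, 50, 50, 51]:       # a sudden jump scores high at first...
    print(round(detector.run(v), 2))             # ...then the window absorbs the new normal
```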
Jeff and Donna are well known in the industry for having founded Palm Computing and Handspring. Jeff's real interest all along has been in neuroscience; he tried to go to graduate school in neuroscience, and while running Palm, he also ran the nonprofit Redwood Neuroscience Institute. During that time he published the book On Intelligence, in 2004, which has had a pretty major impact on neuroscience and machine learning researchers. In 2005, he decided to form Numenta as a for-profit company, and the Redwood institute moved to UC Berkeley, where it continues today as the Redwood Center for Theoretical Neuroscience. Within Numenta, we've continued to work on the algorithms and our neuroscience theory, and there have been basically three generations of our work. Between 2005 and 2009, we worked on the first generation of our algorithms, which we call hierarchical temporal memory. We primarily worked on computer vision systems back then and released some computer vision applications. Between 2009 and 2014, we worked on a very different, second generation of our algorithms, which I'll go into in more detail later. These algorithms are really focused on sequences and continuous learning. We looked at a lot of different streaming data applications, released our code as an open-source project on GitHub, and focused initially on anomaly detection. More recently, since about last year, we've continued looking at streaming applications and we've started a brand-new research direction on a third generation of algorithms, looking at sensorimotor inference and the role of feedback. I'm not going to talk about that today; I'm going to focus on our streaming applications. There's actually a pretty rich history of incorporating big ideas from neuroscience, and it has led to many of the advances in the field since. Over the last 10 years, there's been an explosion in neuroscience in terms of the data that's available and the sophistication of the experiments that scientists are running. There's a tremendous amount we can learn from neuroscience about the specific nature of learning and about intelligence in general. I'm going to discuss a couple of the properties that will help us address the challenges in streaming data that I mentioned earlier. So here's my one brain slide that I'm going to walk through. Within neuroscience, we primarily focus on the neocortex. The neocortex is the majority of your brain, about 75% by volume, and it's the center of most of your high-level thought and cognition. And what does it do? Well, at a very simplistic level, it's the best streaming analytics system out there. It receives a continuous stream of sensory data from your eyes, your ears, your skin, et cetera. It's continuously building models of that data, and it outputs a stream of actions in the form of sequences of muscle commands. We use the term hierarchical temporal memory to describe the class of algorithms that model properties of the neocortex. There are four specific properties I want to talk about on this slide that are relevant to streaming data. One is that if you look inside the brain, it's actually organized as a hierarchy of nearly identical regions. By hierarchy, we mean there are regions of the brain that accept direct input from sensory organs.
Those regions then feed data into higher-level regions, which feed into still higher-level regions, and so on. What's really interesting is that these regions, by and large, are computing almost exactly the same learning algorithm. You can actually take visual data and feed it to the auditory cortex, and it will learn visual features just fine. So there's a common learning algorithm that's appropriate to all the different sensory modalities. The second important thing that neuroscientists have learned is that, just as there's a common algorithm, there's also a common data structure for the representations that are used. We call these sparse distributed representations. Most of the neurons are silent; only a few are on at any point in time, hence the term sparse. And the representations are distributed in the sense that no one neuron is critical to anything. You can have a number of neurons fail and everything will still be fine, because the information is distributed across the collection of neurons. The third important property is that all of these regions are mostly comprised of sequence memory. This makes sense if you think about it: most of our sensory data is a sequence of information, and most of the inferences we're making depend on sequential information. Similarly, most of our output is in the form of sequences of muscle commands. So most of the synapses, the data that's stored in neurons, actually have to do with sequential information, not static batch information. The fourth important property is that every region is constantly learning. There's never a point where the brain just stops learning and freezes for days or years. We are constantly learning; we can constantly learn new things, and as things change, we adapt. Continuous learning is fully automatic. If you step back a little bit, these are exactly the properties that we need in the new world of streaming analytics. The implementation that we have currently is a very high-capacity, memory-based system. It's extremely good at modeling high-order temporal sequences. It makes predictions continuously and can detect anomalies. It is a continuously learning system. The algorithm has very few sensitive parameters, which means you don't really have to tweak or tune the system as you change domains; most of the parameters apply across all of the domains. And the current implementation, which models a tiny part of the cortex, runs in real time on a laptop. This system models about 65,000 cells and hundreds of thousands of connections between cells. I'm not going to go into detail on the algorithm itself; it's all described in a white paper, and the full source code is available on GitHub at the URL there. Instead, I'm going to describe how we apply the algorithm to various streaming analytics applications. We've released something called the HTM engine, which makes it extremely easy to instantiate and run a large number of HTM models. Each model is attached to a stream, such as a metric value that's changing over time. The data is encoded into a sparse distributed representation, an SDR, which is the common data structure used in cortex. The SDR is fed to our algorithms, and a stream of predictions, anomaly scores, and so on is published to a database. With the HTM engine, you can instantiate thousands of models on a single server. It's highly optimized, and it runs all these models in parallel.
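The exact engine interfaces are in the open-source repository; the sketch below just illustrates the pattern described above, with one model per metric stream, created on demand, all sharing the same code path. The model class is a trivial placeholder, not the real HTM algorithm:

```python
class PlaceholderModel:
    """Stands in for an HTM model. The real engine would encode each value
    into an SDR, run it through sequence memory, and emit a prediction plus
    an anomaly score; here we just emit a dummy score."""
    def __init__(self):
        self.last = None

    def run(self, value):
        score = 0.0 if self.last is None else abs(value - self.last) / (abs(self.last) + 1e-9)
        self.last = value
        return score

models = {}   # one model per metric stream, keyed by metric name

def handle_record(metric, value):
    if metric not in models:               # models are instantiated automatically...
        models[metric] = PlaceholderModel()
    return models[metric].run(value)       # ...and every stream uses the same code path

print(handle_record("server42.cpu_percent", 20.0))
print(handle_record("server42.cpu_percent", 80.0))   # big jump -> high score
print(handle_record("AMZN.trade_volume", 1.2e6))     # a separate model, same machinery
```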
We've applied this architecture to several different applications. On this slide, I'm showing some of the applications we've worked on. We've applied the algorithms to detecting anomalies in servers and data centers, to detecting anomalies in human behavior, to financial markets, to social media, detecting anomalies in Twitter streams, and we've even applied it to GPS and geospatial tracking. I'm going to show some examples from a couple of these. Here I'm showing some screenshots of our Grok application, which does automatic anomaly detection for those who have their servers on AWS, Amazon Web Services. In each of these charts, the blue graph shows the actual metric value, and the colored bars show the result of our anomaly detection: red means an anomaly was detected, and green means everything is normal. The chart on the left shows CPU utilization on a particular database server, and you can see there's a point in time where the CPU suddenly jumps up and then stays up. The system detects that as an anomaly. But because it's a continuously learning system, and the CPU just stays up, it automatically adjusts to that new value, and after a while it's not an anomaly anymore. There's no manual tweaking of thresholds or anything: that particular jump is an anomaly, and then shortly after, it becomes the new normal. The middle screen shows data from a load balancer; the blue line shows the latency of a load balancer on a particular website. This data is extremely unpredictable. Most of the time the latencies are small, but every once in a while the latency jumps up to two or three seconds, and that's just normal. But in the middle you'll see there was a period of time where the latencies were jumping up to that higher level much more frequently than normal, and the system was automatically able to detect that as an anomaly. So here's an extremely unpredictable data stream, where you can never predict the next latency you're going to get, but it automatically detected an anomaly in the frequency at which the latencies become slow. And note that no threshold would have caught this. The actual instantaneous value of the load balancer latency was not abnormal; it's the statistics of it that changed over time. The third screen on the right shows one of my favorite examples. This actually happened on our systems in our data center. It shows the CPU utilization of one of our servers. There was a point in time where the Amazon API server on the East Coast broke down and stopped responding, and Grok detected that anomaly an hour before the failure actually happened. What happened is that before the API server actually failed, it started to slow down. It started to take longer servicing requests, and the CPU utilization on our server started bouncing up and down, because it would be waiting for API requests to come in, and then a whole bunch would come in at once and it would have to service them. So it would bounce up and down. Now, the actual instantaneous value of the CPU, say 55% or 66% or whatever, was not unusually high. There were definitely points in time where the CPU usage would jump up even to 100%.
What was really unusual, though, was the way it was fluctuating. This is an example of how our algorithms can detect temporal anomalies: the instantaneous value is not abnormal, but the actual behavior over time is extremely abnormal. And because the system was very sensitive to that, it was able to detect an upcoming breakdown in the API server an hour before anything showed up on the Amazon status boards. These three examples show the unique value of HTM algorithms, particularly for streaming analytics. All of these models are created automatically; a user needs to know nothing about machine learning or HTM algorithms. You can configure hundreds of models in minutes. The systems are learning continuously from the moment they're brought up, and they automatically adapt to changes, as you saw on the left screen there. And they can detect very sophisticated temporal anomalies. The next application is the same idea, except we've changed the data streams. Instead of feeding it IT data from data centers, we're feeding it financial data. What you see on the left is a screenshot from our mobile application, HTM for Stocks. It's available on the Google Play store right now; if you have an Android phone, you can download it and use it. What this application does is continuously monitor the top 200 stocks, and the most anomalous stocks are shown at the top there. The way we determine that is by monitoring three different metrics per stock. We're looking at the stock price and its fluctuations over time, the trading volume, and the volume of Twitter chatter about that stock. If more than one of these metrics is anomalous at any point in time, the stock shows up right at the top as being anomalous in both the stock data and the Twitter feed. If just the stock activity is anomalous, it shows up under a lower category there. So this system will automatically detect anomalies in stock behavior as well as Twitter behavior. And as shown in the middle screenshot, you can often detect something unusual from the Twitter chatter a little before it actually shows up in the stock market. If you click on that, you can look at the actual tweets that are causing those anomalies. So this is a very interesting application. Again, it's the exact same idea, the same engine underlying it; just the metrics we're feeding it are totally different. We've applied it to GPS data as well: we can automatically detect anomalies in geospatial tracking data. We can feed the HTM engine a stream of GPS coordinates, along with the velocity of any object. You can imagine using this to track a fleet of trucks, ships, or airplanes. You can even imagine tracking people that way. I actually want this for my kid so I can know if something unusual is happening; if he goes off somewhere he shouldn't be going, I want to be notified of that. Now, the cortex doesn't receive anything like GPS data, so how can we take GPS data and feed it to a cortical algorithm? Well, the basic trick is that we figured out how to convert GPS coordinates into a sparse distributed representation, a sparse high-dimensional representation that has all of the properties we expect from SDRs. Once the input is encoded as an SDR, the learning algorithm is completely agnostic to it.
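As a rough illustration of that trick, here is a simplified coordinate encoder in the spirit of the one in our open-source codebase (the real encoder also scales the neighborhood by speed and differs in other details): snap the GPS fix to a grid, deterministically hash the surrounding cells, and keep the w "highest-ordered" cells as the active bits of an n-bit SDR. Nearby locations share grid cells, so they produce overlapping SDRs:

```python
import hashlib

def gps_to_sdr(lat, lon, n=1024, w=21, radius=3, cell_deg=0.0005):
    """Simplified sketch of a coordinate encoder, not the production version."""
    gx, gy = int(lat / cell_deg), int(lon / cell_deg)   # snap the fix to a grid cell
    candidates = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            h = hashlib.sha1(("%d,%d" % (gx + dx, gy + dy)).encode()).hexdigest()
            order = int(h[:8], 16)        # deterministic "importance" of this cell
            bit = int(h[8:16], 16) % n    # deterministic bit position for this cell
            candidates.append((order, bit))
    top = sorted(candidates, reverse=True)[:w]   # keep the w highest-ordered cells
    return set(bit for _, bit in top)

a = gps_to_sdr(37.3318, -122.0312)
b = gps_to_sdr(37.3321, -122.0312)    # roughly 30 meters away
print(len(a), len(a & b))             # the two SDRs typically share most active bits
```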
It's now in this common data format, and the algorithm can just operate on it; it knows nothing about whether it's GPS data or stock data or anything else. We actually built a prototype system of this. We gave it to one of our employees, and the system continuously monitored his GPS coordinates over the course of several weeks. What this screen shows is his trace as he commuted daily. Now, this is a continuously learning system. On the very first day, the system was learning his traces, and in the beginning everything is unusual because it has no prior behavior, so you can see that everything is red in this trace. After that, there are still points at which things are unusual: there's some red there, and some yellow as well, where things were just somewhat anomalous, and this had to do with variations in traffic and so on. But before long it had pretty much learned his normal behavior and everything is green. If there's an unusual traffic slowdown, it could still become red in the middle there, but by and large it learned his normal route and his normal behavior. Now we can look over time at what sorts of anomalies it detects. This slide shows two different anomalies. On the left, you can see that he deviated from his normal commute. He went down a side street that he normally doesn't take, and the system instantly detected that as an anomaly. We call that a spatial anomaly, because his actual spatial locations were different from normal. The right-hand screen shows another type of anomaly, a temporal anomaly. There he took a U-turn and went back along the same road he normally takes; it's just that that particular sequence was extremely unusual. He never normally takes a U-turn there, so we were able to instantly detect that as an anomaly. So even though the GPS coordinates were identical, the temporal behavior of those coordinates was unusual, and we were able to detect that as an anomaly. This screen shows a couple of other examples. Sometimes he took multiple paths, and over time the system learned both of those as normal. So on the left screen, whether he takes the top path or the lower path, that's totally fine. However, one day he went unusually fast on one of those paths. That was an unusual change in speed, and it was automatically detected as an anomaly as well. So this is another kind of temporal anomaly that can be detected. All in all, we're able to detect really interesting geospatial anomalies without making any changes to the algorithm. Just by changing the way we encode our data into an SDR, we're able to handle a different sensory modality. The interesting thing here is that all of these applications use the exact same code base. They all use the exact same learning algorithm, and they actually use the same learning parameters. None of the learning parameters have to be tuned to apply the system to one application versus another, and that is key to automation in these scenarios. And there is very wide applicability across many different types of sensors, as you saw, whether it's monitoring a data center, a social media feed, or geospatial coordinates. As long as we can convert the data into an SDR, the common data structure, we can apply the algorithms to that modality.
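To give a feel for why this common data structure works, here is a small numeric illustration of two SDR properties mentioned earlier, using illustrative sizes (2,048 bits with 40 active, about 2% sparsity): unrelated SDRs barely overlap, and matching survives a substantial amount of noise or cell failure:

```python
import random

random.seed(42)
N, W = 2048, 40                          # 2048 bits, 40 active: ~2% sparsity

def random_sdr():
    return set(random.sample(range(N), W))

a, b = random_sdr(), random_sdr()
print(len(a & b))                        # two unrelated SDRs overlap in ~0-2 bits

# Simulate 10 failed cells plus 10 spurious ones: the corrupted SDR still
# shares ~30 of 40 bits with the original, so a match is easily recognized.
kept = set(random.sample(sorted(a), W - 10))
noisy = kept | set(random.sample(range(N), 10))
print(len(a & noisy))
```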
It has very wide applicability. Now, is this stuff any good? Does it actually work well? Can we quantify exactly how accurate it is on different data? This turns out to be a fairly difficult task. When you look at benchmarking streaming anomaly detection, there aren't any existing benchmarks that contain real data and have the characteristics we think are important for streaming analytics. Most of the traditional benchmarks in anomaly detection don't incorporate time. In a streaming analytics application, the earlier you detect the anomaly, the better. As you saw in the case where the API server went down, if you can detect it an hour before the failure happens, that's pretty valuable, rather than just detecting, after the fact, that the API server failed. So the earlier you detect an anomaly, the more valuable the detection is. Most of the benchmarks are also pretty batch focused. They allow you to iterate over the data multiple times, whereas in a real-time scenario you can't look ahead. You just have the data you're getting right now, plus the past data, and you have to make your predictions or detect your anomalies right then and there. And very few anomaly detection benchmarks contain real-world labeled anomalies, particularly in time series data. We looked pretty hard for that; it's hard to find. So we went ahead and created our own benchmark. Because we had worked with a number of customers, we had a bunch of data, and some of that data we're allowed to share. We created our own benchmark, which we call the Numenta Anomaly Benchmark, or NAB for short. We implemented a particular scoring methodology that favors early detection: if two algorithms detect the same anomaly, the one that detected it earlier gets a higher score than the one that detected it later. It incorporates continuous learning, so the notion of changing statistics and adapting to a new normal baseline. We have a number of real-world data streams that are labeled, some of them by our customers and some by ourselves. You can see on the right an example of one of the data streams. This is data from a large industrial machine, and it shows the temperature of the machine over time. You can see there are three labeled anomalies there. The one on the left is where the machine was brought down for maintenance. The red dot on the extreme right is an actual failure of the machine. And the red dot in the middle is where a human, after doing analysis, saw the first sign of the unusual behavior that then led to the failure later on. So there are two obvious anomalies on the left and right, and one not-so-obvious anomaly in the middle. In this benchmark, we also have the notion of different application profiles. What that means is that the relative importance of a false positive versus a false negative changes depending on the application. You can imagine that in, let's say, a medical scenario, it may be okay to have a few false positives, but you definitely don't want to miss a really bad event. But there are other cases, and IT is actually one of them, where you don't want too many false positives. It's okay to occasionally miss a server that crashes, because you have lots of servers and the systems are generally robust; what you really want to avoid is having too many false positives.
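Here is a simplified sketch of those two scoring ideas, early-detection credit and application profiles. The sigmoid and the profile weights below are hypothetical; the actual NAB scoring functions and constants are defined in the benchmark's repository:

```python
import math

def window_score(t_detect, win_start, win_end):
    """Credit for a detection inside a labeled anomaly window; a sigmoid
    gives earlier detections more credit than later ones."""
    rel = (t_detect - win_start) / float(win_end - win_start)   # 0 = earliest
    return 1.0 / (1.0 + math.exp(10.0 * (rel - 0.5)))           # ~1.0 early, ~0.0 late

# Application profiles re-weight error types (these weights are made up).
PROFILES = {
    "standard":      {"fp": 1.0, "fn": 1.0},
    "reward_low_fp": {"fp": 2.0, "fn": 0.5},   # IT ops: false alarms are costly
    "reward_low_fn": {"fp": 0.5, "fn": 2.0},   # medical: missed events are costly
}

def benchmark_score(detections, n_false_positives, n_missed_windows, profile):
    """detections: list of (t_detect, win_start, win_end) for true detections."""
    w = PROFILES[profile]
    credit = sum(window_score(*d) for d in detections)
    return credit - w["fp"] * n_false_positives - w["fn"] * n_missed_windows

# Two detectors catch the same anomaly (window 100-200); the earlier scores higher.
print(benchmark_score([(110, 100, 200)], 0, 0, "standard"))   # early detection: ~0.98
print(benchmark_score([(190, 100, 200)], 0, 0, "standard"))   # late detection:  ~0.02
```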
So we have different application profiles that correspond to that. We've tested the HTM against several different open source algorithms, and we were pretty happy with the results, actually; we were not expecting this. This chart shows the scores of the HTM against several other algorithms. If you just look at the standard column on the left there, you can see that the HTM performed quite a bit better than some of the other open source algorithms out there. And why is this? We looked into it a little more, and this chart shows an example of why the HTM algorithms scored so much better. Overall, the HTM tends to detect more anomalies and has fewer false positives. But more interestingly, it detects anomalies a lot earlier than the other algorithms. The reason for this is that the HTM is looking at temporal sequences, at anomalies in temporal behavior. So even though the actual value itself may not be out of range, the particular sequence of values may be very unusual. The chart here shows one example of that: a failure in the machine temperature sensor data I was showing. Some of the algorithms detected the anomaly, but the HTM detected it much earlier than the others. Okay, just one last slide before I wrap up. Our basic business model is a licensing model; we license our technology in various different ways. First of all, we have an open source version of our code. All of our code, including our learning algorithms, our learning parameters, and the HTM engine, is available in open source under an AGPL license. This is available on GitHub, and it's a pretty active GitHub project. We have over 3,000 GitHub followers, and over 160 people have contributed code back to it, so it's one of the more popular machine learning projects on GitHub. We have a very active mailing list, and I encourage you to get involved if you're a developer and want to learn more about this. Everything we do is in the open source. We also have a number of corporate partners; I've listed some of the interesting ones here. We have a partnership with a startup called Cortical.io, which is applying HTM algorithms to natural language processing. You can think of words as streaming data as well: you get a sequence of words and you want to be able to do various types of analytics on that. We have a very rich partnership with IBM. IBM is working very closely with us on core HTM research, and they're also working with us to create novel hardware architectures for HTMs. We know that what we're modeling today is a very tiny slice of the cortex, and in order to really scale up and handle much more complex problems, at some point we are going to need new hardware architectures. IBM is working closely with us on that. We also have a partnership with Avik Partners, a brand-new startup that started just a few weeks ago. They have licensed the Grok application that I showed earlier, and they are going to productize it and put a lot of effort behind automatic analytics and anomaly detection for the IT scenario. You can look at grokstream.com for more information on that. To summarize, I hope I've convinced you that the future of data is streaming data.
The velocity of sensor data is increasing rapidly, as is the number of sensors. We're going to have to handle a world where we have to create a massive number of models, and where the statistics of the data can change at any point in time. The problem with today's methodology is that the existing batch algorithms just cannot scale; there are fundamental limitations in the methodology and process by which these algorithms are created today. We think that the brain does this already, automatically, and that understanding the brain, and the cortex in particular, can show the way. The brain is an existence proof that you can have systems that automatically create models, continuously learn new things, and model very sophisticated temporal streams. We've created an initial implementation of this in the form of HTM learning algorithms that implement some of these principles, and we can demonstrate working applications today. Thank you very much. I'm happy to take questions through the Q&A forum here, and feel free to email me at that email address as well. Subutai, thank you so much. Really fascinating work that you're doing. Before we dive into the questions, I'm going to give people a couple of minutes to type those in. I just want to let people know, if you would like to mark your calendars, the next Smart Data webinar will be on October 8th. Our topic will be machine learning techniques for analyzing unstructured business data, with Nick Pandar of SkyTree. And we do have a couple of questions coming in already. So, Subutai, in the Twitter analysis for stocks, the example that you showed, is natural language processing used at all? And if so, how is the text converted into an SDR? Okay, that's a great question. In the application I showed there, which is available today on Google Play, natural language processing is not used. What we are doing is monitoring the frequency with which a particular stock is mentioned every five minutes. So if you get 100 mentions of IBM in five minutes, the value of that metric will be 100. We're taking that numerical stream, feeding it to the HTM, and doing anomaly detection. Cortical.io, though, is looking at applying natural language processing and HTMs to Twitter streams, and what they're doing is really interesting. They're looking not at the number of mentions, but at whether the underlying semantics of what's being discussed has changed significantly. If you look at news media, for example, and the meaning of what's being tweeted about at a particular point in time changes significantly, then maybe there's a shift in the type of news that's happening right now, and that could be treated as an anomaly. So they're looking at applying natural language processing with HTMs to social media, but that's not released as a product yet. Okay. Can this technology be applied to data quality? I'm assuming the questioner means monitoring for and finding the anomalies that you might see in data quality. Yeah. I didn't talk about that here, but some of our customers in the past have used it for that. One example is energy and the proliferation of smart meters. More and more buildings are being instrumented with smart meters, where your energy use is monitored and, every 5 or 15 minutes, your average energy use is sent back to some central location.
One of the problems there is that these meters are inherently unreliable, and there are various data quality issues as a result. You can apply the anomaly detection techniques to that, because some of the characteristics of the data quality problems are actually not easy to detect. Sometimes the meter will just go offline, and that's easy to detect. But sometimes the values fluctuate really fast in a completely unnatural way, and that kind of anomaly is a temporal anomaly that you'd like to be able to detect automatically. When you have hundreds of thousands or millions of smart meters out there, it's pretty impractical to manually oversee all that, so having an automated anomaly detection technique is pretty useful. Picking up on that, the next question addresses anomaly detection. Most of what you discussed was anomaly detection, which, of course, is a binary result. But your initial example was a model that yielded values. How does HTM tie into predictive analytics yielding values? Yeah, that's a great question. Prediction is inherent to what's going on in the cortex all the time. We're constantly making predictions and seeing whether our predictions have been met or not. And the HTM algorithms inherently do prediction. For us, anomaly detection is just the flip side of prediction: if you predicted something, and what actually happens instead is very different, then it's an anomaly. We have applied the HTM to lots of different prediction problems. We've looked at monitoring and predicting energy usage. We've looked at the revenue forecasting example that I showed earlier. We've looked at prediction of advertiser click-through rates, and so on. The reason I focus mostly on anomaly detection is that the industry itself does not seem to be ready yet to incorporate prediction into its day-to-day business processes. This was a very interesting lesson that we learned. If you have real-time predictions, that's great, but companies don't know how to react to them. If you suddenly predict that the revenue forecast for a particular department is going to go through the roof for the next three hours, how do you react to that? What processes do you need to actually deal with that? The other part of it is that a lot of the streaming data infrastructure for the interesting prediction problems wasn't ready yet. So we haven't had much success in deploying streaming prediction applications so far. We believe the industry will be more ready for that in a few years, but today the real uptake has been in anomaly detection. Places like IT and finance want to do anomaly detection, and their infrastructure is in much better shape right now. The next question is, how many layers of neocortical processing does HTM simulate? Okay. In the cortex, there are levels and there are layers, and they're actually two separate things. There are the hierarchical levels I mentioned earlier, where one region of the brain sends information to another region of the brain. In terms of hierarchical levels of processing, our computer vision system actually had three levels of hierarchy in it.
Most recently, the analytics work I showed has only one level of hierarchy, and that's been sufficient for the applications I showed you. Now, in the cortex there's another structure, which I didn't really get into, which is the concept of layers. Within a level, the cortex has five or six different layers of cells, and it looks like the layers are hooked up in a little microcircuit, with each layer responsible for different functions that are repeated throughout all the levels. So things like sensorimotor inference, generating behavior, attention, and so on are all implemented in different layers, and that's an active area of research for us. We've got models of two or three of those layers and we're working on more of them, but those are not deployed in the HTM engine application that I showed. So to summarize, we've done up to three hierarchical levels in HTMs before, and we're working on the layers, the laminar structure of the cortex, right now as a research effort. Have you tried applying the solution to problems where events may occur only very rarely, say once every couple of years? And following on to that, more generally, are there limits to the data frequency? Yeah. With the machine temperature data I showed you, I didn't really point this out, but the time scale was, I think, half a year or a year, and there were three anomalies in that time frame. That's an example of the rarest anomalies we've worked on. By definition, anomalies are going to be rare events, and our system essentially guarantees that the number of false positives will be very limited, so you're not going to get too many anomalies. In most of the real-time scenarios we deal with, the frequency of the data is around every five minutes to every half an hour, and that's a sweet spot for the algorithms. We've gone as fast as multiple times a second and as slow as once a day, but slower than that is hard; even at once a day it's hard to pick up on the repetitive patterns, because you'd need several years of data to really see a lot of the patterns. Most of our stuff works really well from about one data point a second or one data point a minute up to about one data point an hour. That's the sweet spot for the cortical algorithms. Can it be applied to real-time recommendation? Yes. We've looked at applying it to websites. If you go to a news website, for example, you'll see a bunch of links that correspond to news articles you may be interested in. Typically, those are computed based on the overall popularity of articles. What you could do with an HTM is fine-tune those recommendations based on your particular browsing behavior. If you happen to go to Forbes.com, say, and you're reading in the technology section, then based on your particular sequence of behavior, it might automatically figure out which other articles you might also be interested in. So you can do recommendations that are very tuned to your particular sequence of behavior. We actually did a proof-of-concept project with Forbes.com that showed that if you apply HTMs, you can significantly improve the prediction of which articles a user is going to click on.
Going from about a 20% success rate, you can get to about a 50% or 60% success rate at predicting which articles a person might be interested in. Again, though, the issue is data infrastructure and incorporating these things in real time; that's where a lot of the challenges are in real-time recommendation. Sure. We have a couple of questions here about SDRs specifically. First, how many variables can fit in an SDR? Yeah. In the brain, SDRs can handle a very large number of independent inputs. If you think about vision, for example, each pixel is its own independent sensor reading of a grayscale or color value, and you can have an SDR that represents an entire image, or your processing of an image. What happens in the brain when you have a very large number of variables is that there is a topology. Neurons that are looking at a particular pixel tend to also look at neighboring pixels and incorporate that information, so that part of the SDR is localized to that particular part of the image. That's a technique the brain uses to handle a very large number of variables in a single SDR representation. We've used that same technique in dealing with vision, and Cortical.io is using it to deal with language, where each SDR represents the meanings of words. You have thousands and thousands of different meanings that words can have, and by incorporating a topology, you can efficiently represent all of those variables. Can any time-stamped data be converted into an SDR? That's an interesting question. The general answer is yes, any time-stamped data can be converted into an SDR. But to really exploit the cortical algorithms, you need a time series of data that is logically related. If your data is time-stamped, but one data point has no relationship to the next, then even though you can convert it to an SDR, the cortical algorithms won't really know what to do with it. As an analogy, imagine you're watching a movie. Every frame of the movie can be converted into an SDR, but if you randomly shuffle all the frames, it's going to be garbage to you. You really need to see things in a logical flow. So having a logical time series data stream is pretty important for the cortical algorithms. This is actually quite the opposite of traditional machine learning algorithms, which don't care about time order. In fact, those techniques assume there is no time order and that the data points are independently and identically distributed. So those techniques are really good in situations where there is no inherent time order, and the cortical algorithms are really good for data with inherent sequences and inherent flow. Subutai Ahmad from Numenta, thank you so much for this great presentation and your thoughtful answers to our questions. I'm afraid that is all the time we have for today. Just to remind our listeners, we will be posting the recorded webinar and the slides to dataversity.net within two business days, and I will send out a follow-up email to all of those joining us today to let you know how to access that material. The next Smart Data webinar, again, will be on Thursday, October 8th. We look forward to seeing you then. Thanks again for attending today's webinar, and thanks again to our speaker, Subutai. I hope everyone has a great day.
Thank you, Eric. Thank you, everyone, for the great questions.