All right, good morning. We're on to our second talk, and our next speaker has been at the Packet Hacking Village talks a few times: Gita Ziabari.

Hi everyone, thanks for coming to my presentation. My name is Gita Ziabari. I'm coming from Verizon, where I work as a senior consulting engineer and software developer, and I'm going to talk about how to tune automation to avoid false positives. I'll cover techniques for designing a reliable automated tool, introduce threat intelligence feeds, and discuss how to avoid false positives when we generate automated tools and feeds.

First, techniques for designing a reliable automated tool. There are many different reasons automation is needed in cybersecurity: there is a massive amount of data that needs to be analyzed on an hourly basis, and action is needed. Accelerated response time, consistency, scalability, efficiency, risk reduction, a simplified IR process, empowering users: these are all good reasons why we need automation in cybersecurity. However, since we have large amounts of data and all of us are very busy, bad design can lead to false positives, and false positives are always bad, especially in cybersecurity. So automation needs to be done with intelligence.

By intelligence I mean, first, consider creating an automated tool that is simple to use, because no one is going to be interested in learning a new tool that is very complicated, designed with a lot of GUI pages and a lot of maintenance, so that everything takes a long time. And think five years ahead: you create an automated tool and you move on. If it has to be maintained all the time and something breaks all the time, then it's not going to be usable. Make it user friendly, so that the threat researchers who are the target users of the tool buy into it and start using it right away.
Second, have a documented design framework, and think about whoever is going to inherit that framework. It might become open source; you may decide to publish it. So think about those who are going to use and inherit the code. Most researchers and analysts in cybersecurity already have a framework that they are based on and actually using. So if you come to them and say, "I created this automated tool, it's pretty cool, it adds all these features to your analysis and it's so fast," they are going to hesitate to use it, because they already have something they are using, and switching from a source that actually works to a new one is going to be a bit hard.

Also, make it as free of dependencies as possible, first of all dependencies on other servers. Again, if you publish it on GitHub, the people who download your tool are not going to have the same servers you have. So make it easily integrated into any platform being used by different analysts, different groups, different companies; it just doesn't matter. Make it as independent as possible, with the possibility of integration.

When it comes to planning the automation, again, simplicity is the most important thing. Imagine you could create the automated tool as a script that runs through a cron job in the background, through the command line, or you could build a pretty nice UI. Believe me, the UI requires a lot of maintenance; there are so many patches you have to apply, so many dependencies. And analysts and threat researchers are not going to like it: they have to go to a website.
They have to access it, and then they have to log into it; there are so many layers involved. So try to make it simple. If you can do it through a cron job, then do it through a cron job.

Some of us like Python, some of us like Perl, some of us like PHP. If you have a framework that is already designed in, say, Python, then stick to Python. Do not switch just because you got bored with one language, or because you feel the other language has more functionality for the tool you're making.

Keep it simple and avoid dependent processes, because we all know that with dependent processes, if just one of them breaks, then everything is broken. If you're busy working on other things, people will ask you to fix it and you won't know where to start or what is actually broken. And if you really have to have a chain of processes, then use a monitoring system like Nagios, so that it starts reporting to you if something goes down. And if it goes down, have a really simple script, even a single command: if your MongoDB is down, send me the logs and restart it, so that your automated tool is not broken forever until you or someone else finds out what's going on.

In cybersecurity we are actually lucky developers, because in most cases we do not have QA to QA our code. That's a good part and a bad part, but we have to be our own QA, especially because we have a massive amount of data that needs to be analyzed, and performance testing is very important. So make sure you have a platform for yourself to test your framework, to be able to test your own code; that is very important. I know we all do unit testing and feature testing; performance testing is needed too, and we do need to consider it.
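As a sketch of that kind of watchdog, a small Python script run from cron could check the dependent service, mail or print the recent log lines, and restart it. The service name, log path, and `systemctl` usage here are assumptions for illustration; adapt them to your environment.

```python
# Hypothetical cron-run watchdog: if a dependent service (MongoDB in
# this sketch) is down, report the last log lines and restart it.
import subprocess

SERVICE = "mongod"                         # assumed service name
LOG_PATH = "/var/log/mongodb/mongod.log"   # assumed log location

def is_up(service: str) -> bool:
    """True when systemd reports the service as active (exit code 0)."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0

def tail(path: str, n: int = 50) -> str:
    """Last n lines of the service log, to attach to the alert."""
    try:
        with open(path) as fh:
            return "".join(fh.readlines()[-n:])
    except OSError:
        return "(log unavailable)"

def watchdog(service: str = SERVICE) -> None:
    """Single check-and-restart pass; schedule this from cron."""
    if not is_up(service):
        print(f"ALERT: {service} is down\n{tail(LOG_PATH)}")
        subprocess.run(["systemctl", "restart", service])

# crontab entry (illustrative): */5 * * * * /usr/bin/python3 watchdog.py
```

The point is that the recovery path is one short, dependency-free script, so the automated pipeline heals itself instead of staying broken until someone notices.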
Now let me quickly jump to threat intelligence feeds; that's the area I've been working in for the past few years. By threat intelligence I mean indicator-based threat intelligence feeds: domains, URLs, IP addresses, hashes, email addresses, and whatever else you want to consider an indicator. You can get threat intelligence from third-party feeds: there are open sources, community-based ones built on a circle of trust, commercial ones you have to pay good money for, and government-based ones. Internal feeds are also valuable: you get feedback from analysts, who pass you the feeds and indicators they can share, or you create automated data-mining tools, or use the ones that are available, and start mining indicators from the data and logs that you have.

How many of you are using threat intelligence feeds? How many of you are happy with the results? None. I was expecting that, actually. The reason is poor quality control, overlapping indicators, false positives, noise; the feeds massively generate noise, and most companies would rather not use them at all, because they generate noise and you don't know where an indicator is coming from or whether you can trust it.

Now, generating high-quality feeds is possible, believe me. The first step is to have a database. Don't take everything on the fly from different sources, turn it into a JSON or CSV format, and export it to whoever wants it; have everything in your database. That way you can apply quality control and you have control over the indicators you're ingesting into the database. What needs to be done is deduplicating, whitelisting, filtering, scoring, and aging.
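The database-first step, with deduplication applied on insert, could be sketched like this. The field names, the plain dict standing in for the real database, and the feed formats are illustrative assumptions, not a fixed schema.

```python
# Sketch: normalize raw feed entries (CSV or JSON, from any source)
# into one record shape, then insert-or-update so the database never
# holds duplicate indicators.
import csv
import io
import json

def normalize(indicator: str, ioc_type: str, source: str) -> dict:
    """One record shape for every feed, applied before insertion."""
    return {"indicator": indicator.strip().lower(),
            "type": ioc_type,
            "source": source}

def parse_feed(text: str, fmt: str, source: str) -> list:
    """Accept CSV or JSON feed text and emit normalized records."""
    if fmt == "csv":
        rows = csv.DictReader(io.StringIO(text))
    else:
        rows = json.loads(text)
    return [normalize(r["indicator"], r["type"], source) for r in rows]

def upsert(db: dict, record: dict) -> None:
    """Deduplicate on insert: merge sources instead of re-inserting."""
    existing = db.get(record["indicator"])
    if existing is None:
        db[record["indicator"]] = {"type": record["type"],
                                   "sources": {record["source"]}}
    else:
        existing["sources"].add(record["source"])
```

Because every source funnels through `normalize` and `upsert`, the later quality steps (whitelisting, scoring, aging) only ever have to run in one place.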
If you are inserting an indicator, check your database and see if it already exists. If it does, update it based on the source, with aging, scoring, and so on; make those changes, but do not re-insert it. When you export the feed, it's also very important to deduplicate, so that you don't export duplicate indicators.

Whitelisting can be done through third-party sources or internally. There are external sources available; some are free, some you have to pay for. Most of them are based on the popularity of indicators: for example, the websites most visited in different countries are published as whitelists. Try to scoop up, say, the top one thousand. But even if you take the top one thousand domains, still apply some filters. Filtering is very important in feeds, because adult-content domains, for example, are also pretty popular, and some of them are even in the top 50; they are not necessarily good or trustable, they're just popular. And as quantity increases, consider that the possibility of false positives also increases.

You will also get feedback from your peers, your analysts, and your customers about indicators that are actually false positives. Immediately quarantine them, have another automated tool check the results through different sources, and then whitelist them. All of this has to be done automatically, because you cannot go through every single indicator one by one, and believe me, one bad indicator results in many hits, 100k-plus. So, as I said, we don't have QA and we have to be our own QA: have a test framework and start testing the feeds you're generating. Write a script, look at the number of matches, and keep statistics about the indicators that are
matching a lot of hits. Quarantine them, analyze them through another source, and see whether they are actually false positives or true positives; if they are false positives, whitelist them immediately.

So: you get a big amount of indicator data, you apply whitelisting from any source, internal or external, and you have better data. It's not enough, though; you also need to do some scoring. Imagine you're using many sources: open sources, different sources, internal and external. Indicators that are reported repeatedly by different sources are actually happening and have a better chance of being malicious, so their score should be higher than the ones coming from a single source.

The score of the sources themselves is also important. Check the results you're getting and the alerts being generated, check the number of false positives you're getting from each source, and keep statistics. Again, a test framework is needed here to analyze the sources and their scores: the ones that are noisier and generate more false positives need to be scored lower, and the ones that are more reliable need to be scored higher. And the results you get today may be different a week from now, so again you need an automated way of maintaining these scores, especially the source scores and the number of common sources reporting an indicator, and of tuning the algorithm so that it gives you the right results.

The algorithm is roughly this: you get an indicator and see if it exists in the database. If it doesn't exist, the score is based on the source you obtained it from. If it does exist, check the common sources: the quantity of the common sources and also the quality of those sources. In this phase you can apply many different scoring methods.
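One hedged sketch of such a scoring method combines the number of reporting sources with a per-source reliability weight. The weight values and the 50/50 blend below are illustrative assumptions you would re-tune from your own false-positive statistics, not a prescribed formula.

```python
# Per-source reliability, re-tuned periodically from observed
# false-positive rates; these values are illustrative assumptions.
SOURCE_WEIGHT = {"commercial": 0.9, "internal": 0.8, "community": 0.6}

def score(sources: set, weights: dict = SOURCE_WEIGHT) -> float:
    """Blend source quality with source quantity.

    Indicators reported by several independent sources score higher;
    noisy sources drag the quality term down. Unknown sources get a
    low default weight.
    """
    if not sources:
        return 0.0
    quality = sum(weights.get(s, 0.3) for s in sources) / len(sources)
    quantity = min(len(sources) / 3, 1.0)   # saturates at 3 sources
    return round(0.5 * quality + 0.5 * quantity, 2)
```

Re-running this over the whole database whenever the source weights change is what keeps this week's scores from silently drifting away from this week's reality.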
For example, if you have access to the malware type, you can score on the malware type as well; you can apply intelligence based on your needs, and if you have the malware type, it could be added at this step. As for filtering: we just talked about whitelisting and scoring, and there are many other methods you could apply in this phase to get better-filtered data. There are tools available that let you check an indicator through trusted parties that have access to many AV engines; you can get a report from them and see the number of positives for that particular indicator.

Then you want to build up the queries. Be as specific as possible: more attributes means you have a better database of indicators and you can easily search on whatever you're interested in. So make sure the critical attributes you keep are: the indicator itself; its type, whether it is a hash, URL, domain, IP address, email, anything; a unique index based on the sources, so that if an alert gets generated you know which source or sources it came from, and you have a handle for tuning the algorithm I just mentioned to avoid false positives; the list of sources; the score currently assigned; the date of insertion; the malware type; and the age.

When it comes to feeds, you cannot say, "I have this feed that I built a week ago, this is what I have, and I'm going to submit it." It needs to be updated. Ingestion should happen in real time, and you need to apply the intelligence, the algorithms we just covered, and export the data at least on an hourly basis to have real-time, or I would say almost real-time, feeds.

You can also be selective. Now you have a pool of indicators with the criteria and attributes you defined, and you can select what actually interests you. If you want to
select only indicators whose type is hash, you can apply that. If you want to select based on the malware type, you can extract that. If you want a combination, say the most recent indicators that have been seen across different networks, you can extract that information from the database.

Aging is another thing that is very important and needs to be considered. You don't want to save lots of indicators in your database and then keep exporting an indicator that isn't appearing anywhere anymore. So as you ingest indicators into your database, age them properly: set the window based on the malware type and the scoring, somewhere around 60 days or a bit more, and age indicators out if they are not being seen, so your feeds don't contain things that are no longer happening in the network.

So, as I mentioned: you get your feeds and indicators from different sources, third-party and internal; you apply some intelligence, whitelisting, scoring, filtering, and aging; and then you can select your feeds based on the criteria you're interested in. Imagine you want high-priority feeds, the ones that are very critical and need to be taken care of immediately: you can drive your alerts from those feeds as very high priority. You can define medium-priority feeds, or low-confidence ones. With all these criteria I just described, you are going to have a better result: fewer false positives, true positives selected, and, with aging, current indicators and more or less real-time results.

Do you have any questions?
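Pulling the critical attributes, the aging window, and the selective export together, a minimal sketch could look like the following. The field names, the flat 60-day window (the talk suggests varying it by malware type and score), and the thresholds are all illustrative assumptions.

```python
# Sketch: an indicator record carrying the attributes discussed,
# plus aging and selective export over a pool of such records.
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Indicator:
    value: str                      # the indicator itself
    ioc_type: str                   # hash / url / domain / ip / email
    sources: list = field(default_factory=list)
    score: float = 0.0
    inserted: date = field(default_factory=date.today)
    last_seen: date = field(default_factory=date.today)
    malware_type: str = ""

MAX_AGE = timedelta(days=60)        # tune per malware type / score

def age_out(records: list, today: date) -> list:
    """Drop indicators not seen within the aging window."""
    return [r for r in records if today - r.last_seen <= MAX_AGE]

def select(records: list, ioc_type=None, min_score: float = 0.0) -> list:
    """Selective export: filter the current pool by type and score."""
    return [r for r in records
            if (ioc_type is None or r.ioc_type == ioc_type)
            and r.score >= min_score]
```

Running `age_out` and then `select` on each (roughly hourly) export is what keeps the published feed current and focused, rather than a week-old dump of everything ever ingested.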