Hi, I'm Seth Grover. I'm the maintainer of Malcolm, the network traffic analysis tool suite.

Network traffic analysis is all about getting to the important stuff as quickly as possible. There are a lot of open source and proprietary tools out there for analyzing raw packet capture, or PCAP, files: Wireshark, NetworkMiner, GRASSMARLIN. But analyzing PCAP sets that are large, or that come from complex networks, with many of these tools is difficult, because often tools that handle raw PCAP data struggle with packet capture files that are larger than maybe a few hundred megabytes, or a gigabyte or two, depending on your system resources.

So today we're going to talk about Malcolm, a tool that was developed, and is still developed, at the Idaho National Laboratory with the support of the United States Department of Homeland Security's CISA. You may be familiar with some or all of the tools that make up Malcolm, because they're open source and they're all generally in use in the security and network traffic analysis community. What Malcolm provides is a framework of interconnectivity for these tools, which streamlines network traffic analysis and helps you bring all that important stuff to the foreground as painlessly as possible. That's what we're going to talk about in this video: using Malcolm to gain insight into both link layer and application layer network traffic.

Before we jump into our discussion of Malcolm and some of its primary components, like Zeek, the Elastic Stack, and Arkime, let's take a minute and talk about intrusion detection systems, so we can get an understanding of what these tools do and how they fit into the threat detection landscape.

When talking about intrusion detection systems, usually you're going to be talking about tools in one of two categories. Host intrusion detection systems, or HIDS, utilize a native agent that runs locally on individual hosts and endpoints on the network. These agents monitor not only network traffic at the device's NIC level, but also track modifications to system files, monitor user authentication events and configuration changes, and report these events to a central manager for alerting and reporting. Host intrusion detection is not what we're going to be talking about today; it's not really the main focus of Malcolm. There are plugins you can use to get host data into Malcolm, and maybe at some future point we'll put together a video that instructs you how to do that. For now, we're going to be talking about the other category of IDS, which is network intrusion detection systems, or NIDS.

Network intrusion detection systems are generally passive, out-of-band programs or devices that capture and analyze network traffic at strategic points in your network, in order to monitor traffic among devices in the network, or between devices on the network and the outside world. There are a couple of different ways we can do this: the monitoring and analysis can be done concurrently (in other words, capturing the traffic and forwarding it along for analysis as it's captured), or the network traffic can be analyzed after it's been captured, in the past.
We can capture it first with whatever set of tools we want, but we're going to talk mostly about the latter today; for our examples I'll be importing PCAP data and then analyzing that PCAP data, rather than using a live network sensor to capture and forward that data. The Malcolm project does include a Linux distribution called Hedgehog Linux, a stripped-down, Debian-based Linux that has these capture tools and an easy configuration to set them up to capture and forward to Malcolm. There's another video on the Malcolm network traffic analysis YouTube channel that talks about how to set up Hedgehog and Malcolm and configure that forwarding together. But today we're going to talk about the "I've already captured some PCAP and I want to upload and analyze it with Malcolm" case.

One other important point about IDS is that it's generally passive, meaning it shouldn't alter the network traffic itself as a side effect of its analysis. There are systems out there that actively drop suspicious network traffic, and those are called intrusion prevention systems, or IPS, instead of intrusion detection systems, or IDS. This distinction matters, particularly in networks with critical infrastructure. One of the focuses of Malcolm is industrial control systems, critical infrastructure networks with OT protocols, though it's perfectly suitable for use in a purely IT network as well. But particularly on OT networks, it's important that we don't go knocking things over as far as network services go, because that may be critical infrastructure, right? So it's important that IDS is done out of band, in a way that won't influence the network traffic itself.

Let's talk about some of the different approaches to IDS. Each has its strengths and weaknesses, I guess you could say.

First, there's signature-based detection. This is what you're familiar with in the context of an antivirus or anti-malware program that recognizes malware based on some pattern or signature in the malware itself: it has a list of predefined patterns it's looking for, and when it finds a file or traffic that matches one of those patterns, it flags an alert as malicious behavior. While signature-based IDS is great for known attacks, and it's usually efficient as far as resource utilization goes, it's not generally effective at detecting novel attacks, because the new attacks, the zero-days or whatever you want to call them, are the unknown unknowns, and you generally can't write a signature for something you don't know about.

The second camp for IDS is statistical anomaly-based detection. This is the machine learning one, right? Machine learning is either math cleverly disguised as magic or magic cleverly disguised as math.
I'm not exactly sure which. It's cool, but the basic idea is that it creates a baseline for trusted network behavior and then compares new behavior against that baseline using this magic or math. This technique can be effective in detecting anomalies or novel attacks, things you're not necessarily looking for, but it can suffer from high false positive rates, especially if you don't have the baseline there to really define what's normal, which can be difficult in a network assessment. That's particularly true if you're coming in from the point of view of "I don't necessarily have this long-term network traffic I want to analyze; I just have some PCAPs," right? It can also be computationally expensive as far as resources go. The machine learning, anomaly detection kind of stuff certainly has its place in the threat detection landscape, and there are tools built into Malcolm that we won't really be talking about today; maybe in a future video we'll cover setting up anomaly detectors and starting to get that baseline established.

Today we'll be talking more about the third camp, which is where I feel there's a nice blend of knowing what I'm looking for versus trying to flag interesting things that I might not be looking for, and that is stateful protocol analysis. Stateful protocol analysis based detection uses knowledge of network protocols to look for deviations from profiles of generally accepted definitions of normal activity. In other words: I know what normal HTTP traffic usually looks like (usually I see these error codes and these kinds of requests), or I know what SMB file shares do most of the time. If we can categorize traffic based on that knowledge of the network protocol, programs can then say, "Hey, here's a summary of what's happening in this particular protocol," and that helps us recognize patterns as we look at the data.

So what type of intrusions or attacks might we hope to uncover using an intrusion detection system?
Let's talk about that. First, a scanning attack. A scanning attack is used to map network topology and to gather information about a system or network being attacked. By attempting connections to a range of IP addresses within a network, or scanning for open ports corresponding to listening services on those hosts, an attacker puts together a map of the topology of your network: the types of network traffic that are allowed through a firewall, which active hosts are on the network, and the operating system, kernel, and software versions running on the exposed services. This information can then be used to launch attacks aimed at specific vulnerabilities with specific exploits. A good IDS should be able to notice these kinds of accesses, because they might be seen as a series of sequential connections from one host to a range of IP addresses or ports, or a brute-force attempt to log on to some exposed service, and it can then alert on that host scan or port scan that took place.

A denial of service attack works by flooding a network or host with an overwhelming number of connections or requests. This could be something as simple as sending a large number of ping packets (a ping flood), or forging the initiation of TCP connections (a SYN flood), causing the host to be unable to respond to legitimate connections. Intrusion detection systems are usually good at categorizing traffic from or to a particular host or service, and so can often track things like connection state for various network protocols, or at least the number of connection attempts, which makes identifying a denial of service attack pretty easy. It's pretty obvious, I guess, when it's happening that your network or host is overwhelmed, just based on the sheer volume.

And then finally, a penetration attack. This is any type of attack which gives an unauthorized attacker the ability to actually access system resources, privileges, or data by exploiting a misconfigured system or a software flaw. These types of attacks are more difficult to identify, because often they look like legitimate traffic, right? Or they may be exploiting, like I said, a misconfiguration or some unknown loophole to get a foothold in the system. Once they get that foothold and can maybe pivot to another area in the system, it's easier for them to cover their tracks for future communications and mask their commands as normal network traffic. This is particularly true with custom, targeted, zero-day exploits, exploits for which the attack vector was not previously known, and hence there are no signatures to detect them. However, IDSes can still be valuable in identifying penetration attacks when they are protocol-aware, allowing analysts to recognize changes in patterns of behavior, or unusual operations in the context of those protocols, within normal network traffic.
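Just to make that scanning heuristic concrete, here's a toy sketch of my own (not Malcolm's or Zeek's actual detection logic): flag any source that touches an unusually large number of distinct ports on a single host.

```python
# Toy illustration of a port scan heuristic, not Malcolm's real logic:
# flag a source that touches many distinct destination ports on one host.
from collections import defaultdict

PORT_SCAN_THRESHOLD = 25  # arbitrary illustrative cutoff

def find_port_scanners(connections):
    """connections: iterable of (src_ip, dst_ip, dst_port) tuples."""
    ports_touched = defaultdict(set)
    for src, dst, dport in connections:
        ports_touched[(src, dst)].add(dport)
    return [(src, dst, len(ports))
            for (src, dst), ports in ports_touched.items()
            if len(ports) >= PORT_SCAN_THRESHOLD]

# One host sweeping sequential ports on another:
conns = [("10.0.0.9", "10.0.0.5", p) for p in range(1, 101)]
print(find_port_scanners(conns))  # [('10.0.0.9', '10.0.0.5', 100)]
```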
So, Zeek. We're going to talk about Zeek for a little bit. Zeek, which was formerly known as Bro, is one of the two PCAP-analyzing engines used by Malcolm to generate metadata about network traffic, metadata which is indexed and made searchable through Malcolm's visualization tools. Before we get into what those tools are, let's lay a little more groundwork on what Zeek is and what its capabilities are, to better understand what it offers analysts as a Malcolm data source. So where does Zeek come into the picture? What is Zeek?

Sometimes it's referred to as "Zeek IDS," and it does incorporate some techniques from the previous slide, but Zeek is more than just an intrusion detection system. Really, it's an extensible, open source, passive network analysis framework. It does packet capture, traffic inspection, and intrusion detection; it records flow logs; and it has a robust scripting and data structure framework for log enrichment, for creating your own logs, or even for writing your own network analyzers. If I had to categorize Zeek into one of the three detection method categories from our previous slide, I'd put it in the stateful protocol analysis camp. Zeek's network traffic parsers examine network traffic at the application layer and then report on the behaviors of the hosts that are communicating over those protocols. These logs can then be used to do more in-depth manual or automated analysis, as we'll see throughout our discussion of Malcolm today.

I would say that Zeek is fundamentally different from other IDSes in that it goes beyond pure signature matching in favor of analyzing the application layer behavior of the hosts themselves. Although it does have signature matching capabilities similar to YARA or Snort, generally it's focused more on parsing network traffic at the application layer. Zeek's features can be combined in powerful ways to provide insight into network traffic. With Zeek logs, network analysis can include content extraction, for example extracting exfiltrated files from PCAPs for further examination, as well as behavior analysis and session correlation: since Zeek is highly stateful, extensively tracking application layer network state, it can be used to determine what else took place during a session or during the communication between two hosts, or what preceded or followed a suspicious event.

And then, like I said, Zeek is extensible. Support for uncommon protocols, for example OT protocols, can be added via its script and plugin architecture. I work a lot with analysts that deal, as I said earlier, with industrial control systems and critical infrastructure kinds of networks. A lot of these protocols are not commonly seen on the internet as a whole, so a lot of the time the off-the-shelf tools you'll find don't deal with this kind of traffic. One of the things Malcolm has done with its use of Zeek is add a bunch more support for OT and ICS protocols, allowing us to bring the metadata associated with those communications to the forefront alongside the more common IT protocols.

So Zeek is a really powerful tool.
It's commonly used in network traffic analysis, but it does have its own set of hurdles, and in a minute you'll hopefully see how Malcolm helps overcome those hurdles. For example, Zeek is going to give us that metadata, but it can be difficult for someone who isn't an expert network traffic analyst to get back from the Zeek data to the original packet payload. If you do need to get into the actual low-level payload, it can be difficult to go from Zeek to, say, Wireshark and open that up with just the tools you might be used to using. The other thing is that Zeek generates a bunch of flat text log files, a bunch of delimited or JSON log files, and if you've got gigabytes of text files, well, there are some tools to manipulate those, but if you're not really well versed on the command line, or you don't already have a toolset in place to process those text files, it can be difficult to run Zeek and then know what you're supposed to do with all of these logs. So today we're going to talk about Zeek mostly in its context as a component of Malcolm, performing post-capture network analysis against PCAP files that we've gathered previously.

Let's talk about the kinds of logs Zeek generates, so that when we get to looking at that data, we can recognize what we're seeing. First, the backbone of Zeek traffic analysis is conn.log. "Conn" stands for connection, and that's what it is: network session tracking. conn.log is the backbone of a Zeek analysis because each line of conn.log, each record, represents a unique network session, which is identified by a four-tuple consisting of the originating (you can think of it as "source") IP and port, and the responding (or "destination") IP and port. So originating IP and port, responding IP and port: that four-tuple is a unique identifier for that session.

Each connection in conn.log, each session, is assigned a random-ish 18-character unique identifier, or UID, and a particular session's UID from conn.log will be referenced in any other Zeek log files generated from that same network traffic. For example, in the case of an HTTP session between a web browser and a website, there may be one line representing the entire session in conn.log, because that HTTP session is a TCP connection from an IP and port to an IP and port. During the course of that connection there could be many HTTP operations, GETs and POSTs, requests and responses, and each of those operations in the HTTP session will be represented in http.log. So you may have one line in conn.log with this UID, and then that UID will tie to many lines in http.log, because those requests belong to the same session. You can use that conn.log UID, that Zeek UID, to find out what happened across all of your network traffic in the context of this session.
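If you wanted to do that UID correlation by hand outside of Malcolm, here's a minimal sketch assuming Zeek was run with JSON logging enabled, so conn.log and http.log are newline-delimited JSON. Malcolm does this correlation for you; this is just to illustrate the relationship.

```python
# Group http.log records under their parent conn.log session via the
# shared "uid" field (assumes Zeek's JSON log format).
import json
from collections import defaultdict

def load(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

sessions = {rec["uid"]: rec for rec in load("conn.log")}

http_by_uid = defaultdict(list)
for rec in load("http.log"):
    http_by_uid[rec["uid"]].append(rec)

for uid, requests in http_by_uid.items():
    conn = sessions.get(uid, {})
    print(f"{uid}: {conn.get('id.orig_h')} -> {conn.get('id.resp_h')}, "
          f"{len(requests)} HTTP operation(s)")
```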
Besides conn.log, there are also a lot of protocol-specific log files that Zeek will generate. Taking note of which log files are generated from a network trace can give you insight into what's in your network even before you begin analyzing the files' contents. For example, if I see an SSH log from a network where I didn't know I had SSH going on, that can be a flag: hey, I'd better go look at that, because I don't know what that is. Even just starting from "what log files do I have? what did Zeek generate from this PCAP?" can give you a good idea of what traffic is in your network.

Zeek has a really powerful file analysis engine that attempts to detect and identify when file transfers occur. In other words, any time a file is transferred across one of the protocols that Zeek can understand, Zeek can recognize: this is a file being uploaded or downloaded across your network using HTTP or FTP or SMB or IRC or SMTP email; there are various protocols supported by Zeek's file analysis engine. Similar to connections, each file in files.log is assigned a random file unique identifier, or FUID, that can be referenced in other log files. Very similar to our previous example: a file is transferred by an HTTP request and response, and that file will generate a line in files.log with a unique FUID, which can then be referenced in http.log to see the details of that connection. So just as you use the UID from conn.log to reference the session across the entire analysis data set, you can use that files.log FUID to reference that file transfer across the various protocols it may have been involved in. Entries in files.log can also be linked, by the connection UID, to the sessions in conn.log during which they were transferred. These two fields, the conn.log UID and the files.log FUID, can be really important for getting context about what's going on in a network session.

Two specific types of files get broken out by Zeek into their own log files. There's pe.log (PE stands for portable executable), which contains entries about transfers of portable executable files, in other words a Windows .exe or a Linux ELF file. That might be of specific interest to a network security analyst, because oftentimes the policy is that we're not supposed to be downloading executable files and running them on these endpoints. There's also x509.log, which contains information about X.509-formatted public key certificates, like you would see in an SSL or TLS session.

One really important log file for Zeek, and almost where I would start if I was going to look at a collection of Zeek logs, is notice.log. notice.log is Zeek's concept of an alarm: a big red flag, a way to draw extra attention to an event. Notices can be generated from any other Zeek script or protocol analyzer as it's processing traffic. Zeek currently implements, I think, maybe 50 or so notices in its default configuration, and then Malcolm adds several more, ranging from brute-force SSH login attempts to SQL injection attacks to expired SSL certificates, plus some that Malcolm adds for recent CVEs you may have heard of in the news in the last couple of years, and events categorized according to the MITRE ATT&CK framework. So when you see the presence of a notice.log, go check it out and ask: what's going on here that Zeek thinks needs this red flag raised? We'll give some examples of those as we get into the demo portions, which I plan to sprinkle throughout this video, so as you're watching this on YouTube, or wherever you got it, there'll be some demo stuff in here as well.
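As one more quick illustration of why notice.log is a good starting point, a few lines of Python (again assuming Zeek's JSON log format) will summarize what's being flagged by tallying the "note" field:

```python
# Tally notice.log entries by their "note" type for an at-a-glance summary.
import json
from collections import Counter

with open("notice.log") as f:
    notes = Counter(json.loads(line).get("note", "?") for line in f)

for note, count in notes.most_common():
    print(f"{count:6d}  {note}")
```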
weird.log. Weird is a good place to begin when looking for anomalous network traffic, anomalous from the point of view that the protocol itself is not behaving the way you expect it to. Some people look at weird.log as a sort of notice.log lite: stuff that's kind of strange, a little bit out of the ordinary, but that might be nothing. It's something to look into, but you also need to understand your network in order to weed out the false positives, the things you actually do expect to see in your network, because what's weird in one network might be perfectly normal in another. For example, in some networks I've been in and seen, old serial protocols are being encapsulated over Ethernet, which may be very common in an aging industrial control systems network that's using serial Modbus or something like that. You'd see a weird.log entry from Zeek saying, hey, I've got non-IP traffic over Ethernet, and I look at that in an OT network and think: yeah, this is pretty much what I expect; that's not really that weird. But in your IT network, on the corporate side, that would be unusual. One man's trash is another man's treasure, or whatever; you just need to get familiar with your network enough to know what's really weird and what's not.

signatures.log is used to flag hits from Zeek's signature-based engine, and it's also used by Malcolm to log hits from file scanning engines run on transferred files extracted by Zeek. We'll talk more about that later.

And then this one we won't spend a ton of time on, but at a configurable interval, defaulting to one day, Zeek will dump summary lists of various entities (endpoints, services, whatever) that it has seen over the course of that period. That might include SSL certificates, MAC addresses, hosts that have performed TCP handshakes, Modbus servers and clients, and TCP services; in other words, what servers are on your network. known_hosts.log, along with conn.log, can be an essential part of using Zeek for building a network diagram or for validating an asset inventory list. Zeek may also generate software.log, where it identifies software communicating across the network, client and server, and, if possible, includes the actual version of that software.
So examples might include Identifying Windows operating system versions and clients and servers communicating over HTTP FTP SSH S&TP my SQL And this can be useful during an assessment to identify Network hosts or devices that are running software or firmware with known vulnerabilities and Or software that's out of date, right that hasn't been patched And when identifying servers by operating system type of application running and and you know, like I said the version of that software So we've talked a little bit about Zeke now, let's talk about Archemy Archemy is the other pcap analyzer used to populate Malcolm's network session metadata database So Archemy has a lot of similarities to Zeke in In that it parses network traffic data and and generates these these sessions these these logs That represent network connections These Archemy session logs are written into an elastic search database where they are indexed and they become searchable What's unique and powerful about Archemy is that these network sessions then can be tied back into That original payload that exists in that pcap file So that allows for deeper packet inspection and searching that's not just limited to packet headers So that's kind of the the list of components that put together that are put together to make Malcolm at least the main ones the components comprising Malcolm are our industry standard open source tools and That makes it easy to integrate Malcolm with other solutions in those tools respective ecosystems whether that be importing more Zeke plugins or or dashboards into Kibana or whatever it happens to be I've got a list here of the network traffic network application protocols that Malcolm can can parse interpret And and there's dozens of them there as you can see including several protocols commonly seen in OT networks Much of Malcolm's development right now is dedicated towards improving Malcolm's coverage of protocols used by ICS devices So let's kind of talk about the the journey that your pcap file will take as it is on its way to being enriched and indexed and and user searchable upon upload Malcolm generates the meta data for the network Traffic that was represented by that pcap file using both Zeke and Archimedes Mullick capture And so that pcap gets sent to directions Mullick capture aggregates its metadata for particular network connections and it Aggregates that into a session record, which is then written into elastic search for us to index Zeke as well as we've talked about generates these log files that are broken out primarily by application protocol And they similarly contain Metadata that that is you know quite that is not unlike that done by Mullick capture Malcolm also uses Zeke's ability to carve out files transferred over these protocols and these files can be scanned For example by an antivirus tool or or preserved for analysis With you know with other tools with whatever your your tool of choice is So those Zeke logs are forwarded by file beat to logs dash for further enrichment And it's normalized to the same underlying database schema the underlying the same underlying field schema that Archemy uses So that those Zeke as much as possible those Zeke logs and those Archemy sessions can be viewed side-by-side as Apples and apples and then that's indexed into elastic search as well And then once ingested by elastic search Malcolm provides two interfaces for visualizing that network traffic Kibana Primarily for the Zeke logs and then Archemy viewer which can be used to visualize the Zeke 
So now that we've had an overview of the main components of Malcolm, how they fit together, and the theory behind what they do, let's get into the process of actually doing network traffic analysis. Surprisingly, that doesn't start with just uploading your PCAP file. You would think that would be the first step, but there's actually one very useful step we want to do first, as much as possible, and that is to identify network hosts and subnets.

For that, Malcolm provides an interface called the host and network segment name mapping interface, which allows you to assign names to network segments, or subnets, and to hosts. A host might be any kind of endpoint, whether that's a server, a desktop, a laptop, a PLC, an HMI, or whatever else has IP addresses or MAC addresses on your network.

As Zeek logs are processed into Malcolm's Elasticsearch instance, the logs' source and destination IP and MAC address fields (that's zeek.orig_h, following the naming convention we hinted at earlier for the originating host; zeek.resp_h for the responding host; and the MAC address fields orig_l2_addr and resp_l2_addr; basically source IP, destination IP, source MAC, and destination MAC) are compared against the lists of host addresses provided in this interface. When a match is found, a new field is added to the log: zeek.orig_hostname or zeek.resp_hostname. Those fields allow the custom-defined host names you've mapped to those IP or MAC addresses to actually be written along with the logs. For traffic matching the list of segment addresses provided, zeek.orig_segment and zeek.resp_segment fields are added, and if both zeek.orig_segment and zeek.resp_segment are added to a log and they contain different values (different subnets), then a value of "cross-segment" is added to the log's tags field, which allows you to conveniently identify cross-segment traffic.

Maybe right now your eyes are glazing over: what is this resp_, orig_, l2 stuff? It's really not that complicated. It's just a way for you to say: here are some IP addresses, and I want to give them names that I recognize; here are some network subnets, and I want to give them names that I recognize. This is my corporate zone, this is my OT zone, this is my DMZ, whatever. If you can identify what those are ahead of time, then upon ingestion Malcolm will tag that traffic as such, and note when things like cross-segment traffic are happening. If a device in your control systems network is reaching out to the internet, or talking to a device in your corporate network, it would ideally be flagged as cross-segment traffic, and you'd be able to recognize that without having to go hunt it down yourself.
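To demystify what that enrichment is doing, here's a rough Python sketch of the logic. The mappings and the record are made up for illustration; the real lookups happen inside Malcolm's Logstash pipeline, not in code you'd write yourself.

```python
# Rough sketch of Malcolm's name-mapping enrichment: add hostname and
# segment fields to a log record, and tag cross-segment traffic.
import ipaddress

HOST_NAMES = {"10.0.0.5": "historian1", "192.168.1.20": "corp-laptop-3"}
SEGMENTS = {"10.0.0.0/24": "OT zone", "192.168.1.0/24": "corporate zone"}

def segment_for(ip):
    for cidr, name in SEGMENTS.items():
        if ipaddress.ip_address(ip) in ipaddress.ip_network(cidr):
            return name
    return None

record = {"zeek": {"orig_h": "192.168.1.20", "resp_h": "10.0.0.5"}, "tags": []}
z = record["zeek"]
z["orig_hostname"] = HOST_NAMES.get(z["orig_h"])
z["resp_hostname"] = HOST_NAMES.get(z["resp_h"])
z["orig_segment"] = segment_for(z["orig_h"])
z["resp_segment"] = segment_for(z["resp_h"])

# If both segments are known and differ, tag the log as cross-segment.
if z["orig_segment"] and z["resp_segment"] and z["orig_segment"] != z["resp_segment"]:
    record["tags"].append("cross-segment")

print(record["tags"])  # ['cross-segment']
```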
There's also a field in this interface called the "required tag" field, and basically what that is, is a way for you to say: only apply this segment or host name if this tag exists. As we get into uploading PCAP files in a minute, we'll talk about how you can apply tags that go along with your uploaded PCAP data. In other words, if I'm uploading a PCAP that's representative of a particular facility, and that facility's name is Facility ABC, I could tag that upload with "facility ABC" and then, in this network segment name mapping interface, basically say: only apply this name to this IP address if the tag "facility ABC" is present.

These mappings can also be defined in a delimited format in cidr-map.txt and host-map.txt in the Malcolm installation directory; I'd refer you to the Malcolm documentation on GitHub for the format. I hope to add a feature in the future where you could take an Excel spreadsheet or something similar, upload it straight into this interface, and have it magically figure stuff out for you. It's not quite there at this point, but that's something we can maybe look at doing in the future.

So we're going to go ahead and do our host and network segment mapping now, and I'm going to do that by navigating in the Malcolm web interface (my Malcolm instance is running on localhost) to the name map UI, which is the interface I'm interested in. Since I authenticated before I started recording, I didn't have to put in my username and password. Now, if I were starting from scratch, depending on how big my network is, I'd just start typing IP addresses in, one at a time, something like this, and click save. Or maybe another segment, "home network," which we're going to call 172.16.0.0/12 or something like that, and this is "home network," and so on.

In this case, for our example, I've already created a network mapping for the PCAP I'll be using for this demonstration. If you've done that before, you can save it, back it up, and restore it later, which is what I'll be doing now. I'm going to click import, and I'm going to replace the name mappings with this netmap.json file that I previously created, which already corresponds to this PCAP. When I do that, you'll see that it populates my list here with the segments and hosts that make up my network map. As you're doing these, you can also search the list. For example, if you want to come back later and look up, say, your list of historians or something like that, you can start typing "historian" here and it will filter the list; the same goes for any text that's available here.

Once we've defined our host and network name mappings, we're going to go down to the bottom of this list, click save, and click yes to save our name mappings. Then, before that will take effect in our PCAP ingestion, we need to click "restart Logstash."
It will ask us to confirm that we actually want to apply the saved name mappings and restart Logstash. If we do, it says Logstash ingestion is restarting in the background and log ingestion will resume in a few minutes, and I can click OK. At this point, after I wait a couple of minutes for Logstash to come back up, we'll be able to continue with our upload, and those IP addresses will be mapped to the names we specified here.

Once we've defined our network subnets and host names, we're ready to upload our PCAP data. Elasticsearch has a write-once, read-many kind of mentality as a storage platform, so we can't upload PCAP data and then go back after the fact and apply those network and host names; we need to have them ahead of time so we can enrich the data as it's ingested. If you come back later and realize you didn't do your network host mapping, it's not a huge deal. Malcolm is generally pretty quick to analyze network traffic, so what I would do is wipe that data out, clear the database for Malcolm, apply my network subnets and host names, and just re-ingest the data. That's the workflow we generally see.

So, once we've done that, Malcolm must be provided with network traffic to analyze, in other words a PCAP file. We've talked a little bit about how that can be done with dedicated network sensors like Hedgehog Linux. Oftentimes, though, in an assessment you'll be given PCAP files that were previously captured and provided to you as the network security analyst, or that you captured at some other point in your network, or at some other point in the past, and you're bringing them in to analyze now. PCAP files can be uploaded to Malcolm for processing via the upload interface: whatever your Malcolm IP address is, /upload, on the host where Malcolm is running. Prior to starting the upload, as I mentioned, you can add tags, which allow the data from the PCAP files being uploaded to be searchable by those tags later on. There are also some other controls here for how the PCAP file is parsed, with regard to whether you're analyzing it with Zeek and whether files are being extracted or not. Generally I like to set these in the configuration options so they're applied globally and I don't have to mess with them here, but if you want to override the global behavior you set during configuration, as far as Zeek and Zeek file extraction go, you can do that here in the upload interface.

We're going to upload our PCAP file now by navigating to localhost/upload, which presents us with the Malcolm capture file and log archive upload interface we just discussed. I've got my PCAP file here. As it uploads, remember the tags I talked about: if I wanted to tag this, for example, I'm doing this for a training video, so I could create a tag called "training" and it would add that. Or if this was a particular site: "site ABC," "customer 123," "incident Omega," whatever. Oops; you get the idea.
Anyway, I could apply whatever tags I want here, and those tags would be searchable after I do the upload. Additionally, Malcolm will create tags from the name of the PCAP file itself; in this case, where the PCAP's file name is Cyberville.pcap, the tag "cyberville" will automatically be applied. So I'm not going to create any additional tags here, since I don't really need any besides the default one I've got from the file name, but I could add as many other PCAP files as I wanted to here, if I had multiple, and once I'm done I can either start them individually or just click "start upload."

Before we take a harder look at the Kibana and Arkime user interfaces, let's talk for a moment about the fields Logstash can use to enrich log data before it's written into the database. In other words, there's a lot of stuff we can infer from our network session metadata that might not initially be in that data. MAC addresses, for example, can generally be mapped to a hardware vendor, as the first three octets of a MAC address, called an OUI or organizationally unique identifier, can be used to distinguish a network card manufactured by Intel from one manufactured by Dell, or a VMware network interface, or an interface on a Schneider PLC, or whatever. The MAC address is used to look up that vendor, and that's added to the log whenever possible.

Malcolm can also be configured to do GeoIP and ASN lookups for IP addresses. We can identify internal and external traffic based on IP ranges, in other words private IP ranges versus publicly routable, globally routable IP addresses. Malcolm can do reverse DNS lookups, and it can do DNS query and hostname entropy analysis in order to detect DGAs, the domain generation algorithm hostnames that are often used by malware. You can go read about domain generation algorithms on Wikipedia, but basically, the malware generates this long, giant hostname or URL that's a whole bunch of characters, letters and numbers, dot, a whole bunch of ugly stuff, and that high entropy, that randomness, in the URL, hostname, or DNS lookup can be bubbled to the top for your analysis.

The other kind of enrichment Malcolm does is applying community-standard fingerprinting algorithms whenever applicable, which can make Malcolm's data cross-referenceable with other tools. One really good example of that is a flow hashing technique called community ID, which basically takes the relevant parts of a network connection, primarily IP addresses and ports, and generates a unique hash of that flow. That hash can then be used to cross-reference logs not only inside of Malcolm, between Zeek and Arkime, but out to Suricata or whatever other network tool supports community ID, and it's quite a long list.
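If you want to play with community ID yourself, Corelight publishes a reference Python implementation as the communityid package (pip install communityid); here's a minimal sketch:

```python
# Compute the community ID flow hash for a TCP five-tuple using
# Corelight's reference "communityid" package.
import communityid

cid = communityid.CommunityID()
tpl = communityid.FlowTuple.make_tcp("192.168.1.20", "10.0.0.5", 49152, 80)

# The same flow tuple yields the same ID in Zeek, Arkime, Suricata, and
# other tools that support community ID, which is what makes their logs
# cross-referenceable.
print(cid.calc(tpl))  # prints something like "1:<base64-encoded digest>"
```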
The other enrichment I'd mention, which we've already kind of talked about, is populating that tags field on each log. Some of the tags I'd make particular note of are the ones listed here: internal source, internal destination, external source, external destination. Those are hopefully self-describing, but they identify traffic that is on a private subnet (your 192.168.x, your 172.16.x, and your 10.x IP address ranges) versus globally routable IP addresses. And then there's that cross-segment one: if you've defined your network segments in the interface we talked about previously, the cross-segment tag will be applied whenever Malcolm sees traffic in the Zeek logs that crosses one of those network segment boundaries.

Once you've got data into Malcolm and it's been processed, after a few minutes you'll start to see logs trickle in, and you can visualize that log data. Kibana is one of Malcolm's two user interfaces for visualizing log data. Where Kibana really shines, for me, is in providing an intuitive, interactive representation of log data that simplifies the process of starting at a high level and then drilling down quickly to the stuff that's interesting to you: filtering from this big pile of hay, when you're looking for the needle in the haystack, down to a much smaller pile of hay that makes it easier to find the needle, and being able to quickly drill down on an individual host or connection of interest. Malcolm comes with dozens of pre-built visualizations specifically for data ingested from its Zeek logs. Its dashboards fall into two categories, overview dashboards and protocol-specific dashboards, and we'll review some of those here in a minute. Aside from the pre-built dashboards, Kibana provides an easy drag-and-drop WYSIWYG editor for creating new visualizations on the fly.

Now, the key to effectively using Kibana is learning how to apply filters and how to search the data. You'll use these patterns throughout all network traffic analysis in Kibana to do what we just talked about: start from the big picture and drill down to zoom in on something of interest. The first step in applying filters is identifying the time range of interest. This can be done using the time filter controls in the upper right-hand corner of the interface. And, this is me as much as anyone else, oftentimes if I'm not seeing the traffic I want to see and I'm asking "where's my data?", it's because I'm searching on the wrong time frame, right?
It's because I captured my PCAP two weeks ago for analysis and I'm finally getting around to it, and the default time controls for Kibana are set to something like the last 15 minutes. So always check your time frame in those time controls in the upper right-hand corner. You can then use the time histogram, that logs-over-time graph you'll see on most of the Kibana dashboards, to zoom in or out based on the time frame you're interested in.

Additionally, the query bar allows you to specify search constraints, and you can do that using a couple of different syntaxes: the traditional Lucene query syntax, or the newer KQL (Kibana Query Language) syntax. In the documentation, and in some tables I'll probably include here in this video, you'll see some of the differences between those syntaxes, and there's really no magic bullet other than having the reference bookmarked and getting used to the syntax for creating your queries. Modifying the contents of this search bar and then hitting enter, or clicking the search icon to the right of it, will run the search and update the results that are displayed.

So: the time filter, the search query bar, and then finally the filter bar, a more GUI-based filter creator underneath the search bar, which is the third way of specifying search constraints. Although it provides more of a GUI kind of interface for doing so, it's generally not quite as freeform or flexible as writing textual queries. In most cases there's not really a meaningful distinction between putting query terms in the query bar versus the filter bar. What the filter bar does allow you to do, which is nice, is pin those filters, so that when you navigate from one dashboard to another, those filters stick around; the ones in the query bar generally don't. The filter bar is also what's populated when you click on values in the charts or tables. Whenever you mouse over a value in a chart or table in a Kibana visualization, there will be a little magnifying glass with a plus or minus icon in it, and you can use that to create filters for, or filters to exclude, those values, based on what you want or don't want to see in the result set. As you click those magnifying glasses, you'll see the filter bar update to reflect that filter.
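Before we move on, here's a tiny taste of the two query syntaxes side by side. Treat the exact field names as assumptions about your Malcolm version; the point is just the flavor of each.

```
Lucene:  zeek.orig_h:192.168.1.* AND NOT zeek.resp_p:443
KQL:     zeek.orig_h:192.168.1.* and not zeek.resp_p:443
```

For simple cases like this they're nearly identical; the differences show up in things like regular expressions and fuzzy matching (Lucene) versus nested field handling (KQL), which is why it's worth keeping the reference handy.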
I've read that a future release of Kibana will merge the query bar and the filter bar into one component. I'm not exactly sure how they'll do that, but I guess that's something we have to look forward to.

Let's talk about the overview dashboards. The dashboards under the "general" section of Malcolm's Kibana interface provide a high-level overview of network traffic from across all of the logs generated by Zeek; in other words, they're not restricted to a particular application protocol. These dashboards are a good jumping-off point for investigation when you're trying to get a feel for the network and application protocols, and for the hosts that are on the network and using those protocols.

One of note is the notices dashboard. As discussed earlier, Zeek notices are the tool's way of raising some event to the forefront of an analyst's attention, and those notices are summarized here in the Zeek notices dashboard. The third-party Zeek plugins Malcolm uses to generate additional notices, or to analyze traffic in different ways, can be found in the Malcolm README on Malcolm's GitHub page. Some of the interesting ones include, but aren't limited to: notices generated for cleartext passwords detected in HTTP POST requests; non-compliant HTTP requests, like those that might be used for HTTP request smuggling; XOR-obfuscated file transfers; behaviors or techniques categorized according to the MITRE ATT&CK framework, which I mentioned earlier; and then lots of various CVEs and vulnerabilities: Bad Neighbor, CallStranger, SIGRed, Zerologon, the ECC certificate validation vulnerability, unencrypted TLS session ticket detection, the "Eternal" family of SMBv1 Windows exploits (which includes EternalBlue, EternalSynergy, EternalRomance, and DoublePulsar), other various SMB exploits, and Ripple20.
The developers of Malcolm endeavor to stay abreast of developments in the threat landscape, and whenever possible we release updates that include new Zeek plugins or scripts to detect these vulnerabilities and exploits as they're discovered.

Another interesting dashboard from a security standpoint, a pair of dashboards actually, is the security overview dashboard and the ICS/IoT security overview dashboard. These highlight events that may be of particular interest from a security standpoint, including Zeek notices, signatures triggered from file scans, cleartext transmission of passwords, outdated or insecure versions of application protocols, traffic originating from or directed to public IP addresses, types of files transferred, and more. These dashboards can be a good jumping-off place when looking for indicators of compromise or vulnerabilities in your network traffic.

Where possible, Malcolm correlates common fields from across different protocols, to allow you to view one device's or application's network traffic in the context of the other traffic occurring around it. For example, multiple failed HTTP authentication attempts, followed by a successful authenticated HTTP POST operation, followed by, I don't know, sequential reads and writes to a file server, could indicate that a foothold was obtained in an HTTP server that allowed the adversary to pivot to another service on the network. A good example of this is the actions and results dashboard. A lot of network protocols have the concept of an action and a result: cause and effect, request and response, whatever you want to call it. Actions are things such as "a file was written," "a logon was attempted," "a web page was requested," and the results would be "success," "access denied," "page not found." Across all the protocols where I can figure out what those actions and results are, I normalize them to the same fields, so that in the actions and results dashboard, for example, you can see actions and results across all these different protocols together. In addition to the overview dashboards, Malcolm provides dozens of dashboards tailored specifically to application protocols, including protocols commonly found in industrial control system networks as well as those found in more traditional IT networks.

The Discover view. Kibana's Discover view enables you to view events on a record-by-record basis, similar to a session record in Arkime, which we'll discuss in a moment. In other words, the Discover view allows you to look at an individual line, an individual record, from a Zeek log. The data table in the Discover view can be customized to display only the fields relevant to the traffic you're interested in. For example, if I wanted to put together a play-by-play of an HTTP session, rather than looking at this big JSON log that contains all the fields, some of which may not be relevant to my traffic right now, I could
I could Go to the discover page filter on seek.log type for HTTP Sort by time and then include source IP user agent referer desk IP And then the HTTP host URI and status message right and and by doing that I've got a more Focused view of HTTP traffic that I can then go through and see this happen and then that and then this And then once I've got a configuration that I like for a particular kind of traffic I can store I can save that search as you know HTTP traffic analysis or whatever I want to call it and then return to it for further investigation in the future The visualizations page allows you to view and manage visualization components Which are are like graphical building blocks to be used in dashboards Cabana includes lots of different kinds of charts tables maps for displaying your data Well, Cabana is great for at a glance views and for creating custom visualizations Archemy which until recently was known as Molek provides another interface for examining Network traffic that may be better suited to in-depth analysis and network forensics Earlier when we talked about the Malcolm PCAP processing pipeline. We mentioned that that PCAP data got sent to directions right it got sent to Zeke and And then it also got set down to Molek capture, which is Archemy's program for ingesting that data while Malcolm's Cabana dashboards are focused on the Zeke logs and The Archemy sessions won't necessarily be reflected there Malcolm's instance of Archemy can be used to view both the Zeke logs and Archemy sessions together in the same interface another really great strength of Archemy and and I mean arguably it's Kind of initial reason for existing is as a full PCAP a Full PCAP solution right its ability to tie the session metadata back to the original packets bytes The original packet payload which allows you to view and search and export the data Deeper in the PCAP that may not be referenced in the metadata. So Archemy What it really allows you to do which I think is So so incredibly powerful is efficiently deal with very large PCAP file sets And still have access to the underlying payload data Something that wire shark struggles to do so similar to Cabana we want to learn how to Effectively build filters in Archemy To narrow in on the data that's of interest to us So we've got at the top of the Archemy interface. You've got controls for Specifying your time filter to define your time search frame your search time frame and Just like in cabana. That's very important. Make sure you know what the time frame of the data that you're looking at is There's a little globe icon that can be clicked to expand a map filter that allows you to Restrict results to a geo location, which may be of interest if you're looking at data that that is going out to the internet There's a query bar where you can specify queries in Archemy syntax and then To the right of the search button. 
Then, to the right of the search button, there's this eyeball, the views button, and that allows you to overlay previously specified filters onto the current session filters. For convenience, Malcolm provides several preconfigured Arkime views, including some that involve the Zeek log type field, so that in Arkime, as we're viewing both Arkime sessions and Zeek logs, you've got a quick way to say: right now I only want to look at Arkime sessions that are tied to PCAP files, or I only want to look at Zeek logs, which were generated from PCAP files but don't link directly back to the payload the way the Arkime sessions do. You can see here some of the views that Malcolm has preconfigured for us.

Arkime's sessions tab provides low-level details of the sessions being investigated, in a way similar to how Kibana's Discover interface does. In the sessions view, you'll see Arkime sessions that were created from PCAP files and written to the Elasticsearch database, and you'll see Zeek logs mapped to that same Arkime session database schema, together in the same pane of glass. It should be noted that you can distinguish between the two by the value, or the absence of a value, in the Zeek log type column of the sessions table. Similar to the Discover table earlier in our discussion, you can also customize the set of fields present in the sessions table, and then save, and later recall, the configurations of fields you're interested in having in that table.

As mentioned, Arkime's ability to tie a session record back to its original packets is one of its greatest strengths. Details for an individual session or log can be expanded by clicking the plus icon on the left-hand side of each row in the sessions table. For Arkime session records, an additional packets section will be visible underneath the metadata section. When the details of a session of this type, in other words an Arkime session, are expanded, Arkime will reach out to where the PCAP is stored and extract the payload for that session for display here. Various controls can be used to adjust how the packets are displayed. Personally, I like to enable natural decoding and click "show images and files," which produces visually appealing results, to me, when I'm looking at payload data, but there are lots of other options there. Other options also become visible when you have that PCAP session data available for payload extraction, and that includes downloading the PCAP itself, or generating (carving out, if you will) and downloading a PCAP for that particular session; carving out and downloading or viewing images and files; applying decoding filters; and examining payloads in CyberChef. All of that can be done from this packets section.

Back up at the top of the interface, if you click the down arrow to the far right of the search bar, you'll see some new actions presented there, including PCAP export. When full-PCAP sessions are displayed, the PCAP export feature allows you to generate a new PCAP file from the matching Arkime sessions, including controls for which sessions are included: open items only (the ones I've actually got expanded right now), visible items (everything I'm seeing on this page right now), or all matching items (everything matching my current search filters), and whether or not to include linked segments.
As mentioned, Arkime's ability to tie a session record back to its original packets is one of its greatest strengths. Details for an individual session or log can be expanded by clicking the plus icon on the left-hand side of each row in the sessions table. For Arkime session records, an additional packet section will be visible underneath the metadata section: when the details of a session of this type, in other words an Arkime session, are expanded, Arkime will reach out to where the pcap is stored and extract the payload for that session for display. Various controls can be used to adjust how the packet is displayed. Personally, I like to enable natural decoding and click "show images and files," which produces visually appealing results for me when I'm looking at payload data, but there are lots of options there. Other options also become available when you have pcap session data to be extracted for the payload, including downloading the pcap itself, carving out (and downloading or viewing) a pcap for that particular session, viewing carved images and files, applying decoding filters, and examining payloads in CyberChef. All of that can be done from this packet payload section.

Back up at the top of the interface, if you click the down arrow to the far right of the search bar, you'll see some additional actions presented there, including pcap export. When full-pcap sessions are displayed, the pcap export feature allows you to generate a new pcap file from the matching Arkime sessions, including controls for which sessions are included: open items only (the ones I've actually got expanded right now), visible items (everything I'm seeing on this page right now), or all matching items (everything matching my current search filters), and then whether or not to include linked segments. Once you've defined your filters and which sessions you want to include, click the export pcap button to generate the pcap, after which you'll be presented with a browser download dialog to save the file or open it in Wireshark or whatever it happens to be.

Note that, depending on the scope of your filters, generating that pcap file could take a long time, or it might even time out. So it's good practice to look at the number of matching sessions in the Arkime sessions interface before you go exporting a pcap; if it's a billion sessions, I may want to apply further filters to narrow my search before generating that pcap file. Note here as well, and this is a known issue that I hope to figure out a way to resolve, that you will probably get an error if you try to export a pcap without first applying the pcap files view with that little eyeball icon. When Arkime tries to export a pcap from logs that don't have pcap associated with them, you might get an error, so if you apply that pcap files view first, you'll make sure that error is avoided.
Moving on from the sessions interface, let's go to Arkime's SPI view. SPI stands for "session profile information," and the SPI view provides a quick, easy-to-use interface for exploring session and log metrics. Basically, the SPI view page lists categories for general session metrics, things like protocol, source and destination IP addresses, and source and destination ports, as well as all of the various network protocols understood by Arkime and Zeek, whether that's HTTP or SNMP or BACnet or whatever. These categories can be expanded and the top-N values displayed for whatever field is of interest, including that field's cardinality. In other words, it's a good top-talkers, top-N display of any field of interest in Malcolm's logs, and between the Arkime and Zeek data sources, Malcolm's list of fields available for you to check out here in the SPI view runs to over 1,300 different fields across all these different kinds of network traffic.

Click the plus icon to the right of a category to expand it, and the values for specific fields are displayed by clicking the field's description in the field list underneath the category name. That list of fields can be filtered by typing part of a field name in the "search for fields to display in this category" text input. The "load all" and "unload all" buttons can be used to bring forward everything Arkime knows about that category at once, but you may want to be careful with this, as it's going to run a lot of queries and might take a while. So if you know you actually want to see everything, go ahead and smash that load-all button, but it might end up giving you more data than you really want, depending on what your interests are.

Once displayed, a field's name or one of its values can be clicked to provide further actions for filtering or displaying that field or its values. Of particular interest might be the "open SPI graph" option when clicking a field's name, or pivoting to another sessions tab with that field filtered; those will open a new tab with the SPI graph, or the sessions view, already populated with that filter applied. Note that because the SPI view page can potentially run many queries, it limits the search domain to seven days, or in other words seven indices, as each index represents one day's worth of data in Malcolm. So when using the SPI view, that basically means you need to limit your search time frame to seven days or less before you flip over to this tab, or it'll complain at you.

SPI graph, the session profile information graph, is another really cool way to visualize the top-N values of a particular field, both chronologically and geographically. It visualizes the occurrence of a field's values over time, and for me that's particularly useful because it helps me identify trends in a particular type of communication. When I'm looking at just the conglomeration of all the traffic in the date histogram up at the very top, it's hard to pick out patterns for a particular protocol or a particular IP address or user agent. Using protocol as an example: traffic using a particular protocol, when seen sparsely at regular intervals on that protocol's date histogram in the SPI graph, could indicate a connection check, or polling, or beaconing, and having it split out by value like that is very useful. But it doesn't have to be protocol; any of those 1,300 fields Malcolm knows about can be set as the pivot value for the SPI graph, splitting out the top-N values and showing you chronologically and geographically where they occurred. Controls can be found underneath the time bounding controls for selecting that field of interest, the number of elements you want displayed, the sort order, and whether or not to periodically refresh the data view.
One of my favorite views to play around with is the connections view. Aside from just being cool, it's very useful: the connections view presents network communications via a force-directed graph, which makes it easier to visualize logical relationships between network hosts, or between subnets, or, again, between any of those 1,300 different data points we have; you can visualize how the traffic flows between source and destination based on those field values. Controls are available for specifying the query size. By default the query size is set to small, which I think is a hundred, so it'll run faster, but you may not actually be getting all of your data. What I like to do is set up my other filters and get everything the way I want it with the query size set to small, with my filters and my views and the source and destination nodes set to the fields I'm interested in, and then increase the query size to the maximum value so I can see everything I'm interested in, accepting that it will take longer to execute.

So you can select the query size, and you can select which fields to use as the source and destination node values. You can set a minimum connection threshold, as well as the method for determining the weight, the thickness, of the line between the nodes and the size of the nodes themselves. As is the case with pretty much every other visualization in Arkime, the graph is interactive: by clicking on a node, or on a link between two nodes, you can apply or modify filters, and you can reposition the nodes themselves by dragging and dropping them. A node's color indicates whether it has communicated as a source or a destination, an originator or a responder, or both.

While the default fields are source and destination IP, the connections view is able to use any combination of the fields Arkime knows about. Some combinations I have found interesting or useful before: selecting source OUI and destination OUI, to view which devices from which hardware manufacturers are speaking to each other; source IP to protocol, which can be a good way of visualizing which hosts are communicating with which servers based on the services those servers provide; maybe originating network segment to responding network segment, if we populated our network segments at the very beginning, before we even uploaded our pcap file, in that "define network hosts and subnet names" interface; or maybe originating GeoIP city to responding GeoIP city, which could be interesting for seeing where my source and destination traffic is going out on the internet.
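As a quick reference, here are a couple of those pairings expressed with Malcolm's normalized field names; srcOui, dstOui, srcIp, and protocols are my best guess at the common field names here, so verify them against the field pickers in your instance:

    source node: srcOui    destination node: dstOui        which manufacturers' devices talk to each other
    source node: srcIp     destination node: protocols     which hosts are using which services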
So any combination of these or other fields can be used as the source node and destination node in the connections view. Another cool recent addition to this feature, and one that we actually developed here at the INL and then contributed back upstream to Arkime, is the ability to specify a baseline time frame in the connections view, and then use that baseline to visualize changes to a network over time: in other words, new hosts or protocols that appeared on my network this week that didn't exist last week. This feature is mainly useful if you have prior long-term packet captures available in order to establish that baseline, to define the previous time frame versus the current time frame, so that you can make that comparison.

Another really cool feature of Arkime, one that gives you access to the payload data and actually searches the packets themselves rather than just the session metadata, is the hunt feature. You can think of this as something like a pcap grep. The search string you specify for the hunt can be given as ASCII (with or without case sensitivity), as hex codes, or as a regular expression. Basically, you create a packet search job on the hunt page, with the filters and other parameters you're interested in to limit the search scope and make the packet search go a little faster. The hunt job runs in the background, and once it's complete, it tags matching sessions with that hunt's ID, and you can go view the matching sessions and payloads in the sessions view.

Note that whatever filters you have specified in the search bar when the hunt job is created will apply to the hunt job as well. So if I'm over on the sessions view with a filter in the query bar and I switch over to the hunt view, those filters limit the scope of the packets searched in that hunt job, so pay close attention to that. There's a little information icon that says something like "creating a new packet search job will search the packets of one million sessions," or whatever that number is; be aware of what that number is, because your hunt's execution time will be directly related to how many packets it needs to go out and look at. Note also that the hunt view is only available for sessions created from full packet capture data, not Zeek logs; in other words, Arkime sessions only. So it's a good idea, again, to click the view eyeball icon and select the pcap files view to exclude Zeek logs from your candidate sessions before using the hunt feature.
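For instance, here are a few illustrative hunt search strings; the values are made up, but they show the three flavors of search the doc describes:

    password         ASCII search, optionally case-insensitive
    4d5a             hex search, the "MZ" magic bytes at the start of a Windows executable
    user=[a-z]+      regular expression search

Each of these is matched against the raw packet payloads of the candidate sessions, not against the parsed session metadata.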
Now, a couple of notes on correlating data between the Arkime and Zeek data sources. Because these are different tools, developed by different organizations for different purposes, the search syntax between Arkime and Kibana is different. Malcolm utilizes both of these open source tools together, but in all cases the search syntax will differ a little between Arkime and Kibana, and in some cases the field names themselves will differ too. So refer to the documentation, and to that table we showed you earlier, to compare things like: how do I write a query that searches for an IP address based on its presence in a subnet, or for sessions that include that subnet, or how do I write a regular expression? It won't necessarily be the same in Arkime as in Kibana.

As I mentioned, Arkime uses its own field names in its user interface. For example, in Arkime you would search protocols == http, but in Kibana the equivalent search would be protocol:http. Going to Arkime's help, by clicking the owl icon in the upper left-hand corner, scrolling down to the fields section at the bottom of that help page, and clicking "display database fields," can help you map Arkime's field names to the underlying database field names that the Zeek logs use. As much as possible during ingestion, we do try to map the Zeek logs to their corresponding Arkime fields, but it might help to know that if I'm jumping from Arkime over to Kibana and I want to know which fields Arkime was actually searching, I may need to do some mental mapping of the Arkime field names to the underlying database field names. Also, despite considerable overlap, especially for common protocols, there are differences in protocol parser support between Zeek and Arkime; notably, Malcolm's configuration of Zeek parses a lot more ICS protocols than Arkime does, because that's one of the main focuses of this project.

So we've looked at these two different interfaces for analyzing the same underlying session metadata from our network traffic: we've taken this one pcap, and Arkime has generated its session records and Zeek has generated its logs. How can we correlate the two, bring them together, and get the strengths of both tools? We want the information about the protocols Arkime maybe doesn't have support for from our Zeek logs, but at the same time we want to see the Arkime sessions that correspond to those events of interest from the Zeek logs, so that we can drill down, if we want to, and see the payload and do analysis at that level. As I mentioned in the previous topic, one of the things Malcolm does to facilitate this is to map, wherever possible, Zeek fields to corresponding Arkime fields in the database schema.
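That mapping is also why the same search looks different in each interface. As a quick side-by-side sketch, with a made-up address, where srcIp is my best guess at Malcolm's normalized database field name; confirm the exact names against Arkime's help and your Kibana index pattern:

    Arkime:  protocols == http && ip.src == 192.168.1.10
    Kibana:  protocol:http AND srcIp:192.168.1.10

Same question, two different grammars: Arkime uses == and &&, while Kibana's query bar uses field:value pairs joined with AND/OR.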
Then, for any protocols or fields where Arkime doesn't already have native support, Malcolm creates a native data source type for that kind of Zeek log, covering all the Zeek log values that don't currently have an equivalent in Arkime. The fields section of Arkime's help, which I referenced a few minutes ago, can provide a list of all of the known fields across both the Arkime and Zeek data sources. In this way, when full packet capture is an option, analysis of pcap files can be enhanced by the additional information Zeek provides. And I say "when full packet capture is an option" because there may be cases, whether it's size constraints or the sensitivity of the information, where you capture just the Zeek logs and don't store the full pcap at all. In that case Malcolm can still be used, in both the Arkime and Kibana interfaces; but when you do have the full packet capture available, you're able to enhance it beyond just what the Arkime session gives you, with the Zeek logs as well, rather than having the Zeek logs alone.

The values of the records created from Zeek logs can be expanded and viewed like any native Arkime session, by clicking the plus icon to the left of the record in the sessions view, just as we showed when we were talking about the sessions view. Note, however, that for those Zeek records the packet payload section doesn't exist, because the packet contents aren't available, so the buttons that deal with viewing and exporting pcap information don't behave the same; they basically don't do anything, as opposed to how they work for records with pcap behind them. Other than that, those Zeek records and their values are usable in Malcolm just like their native Arkime session counterparts.

A few fields deserve particular mention for helping limit results to the Zeek logs and Arkime session records generated from the same network connection: the community ID, and Zeek's connection UID (zeek.uid). We've mentioned both of those earlier, so let's talk about how we can use them to get the full picture. The example we're talking about here is: something interesting happened on my network, and I want to see, in one list, everything I know about it.
I want to see all the Zeek logs, and I want to see the Arkime session or sessions that correspond to them. There's a hard way and an easy way to do that. The hard way would be to find the logs I'm interested in, in Kibana or however I want to do it, and then say: OK, I've got this source IP and that destination IP and this port, and it happened at this time, and try to handcraft filters that include everything I want but nothing I don't. That's difficult to do. What we can do instead is use the Zeek connection UID (zeek.uid) and the community ID to build a query filter that includes everything that has to do with that session, across both the Zeek logs and the Arkime sessions.

Community ID is a specification for standard flow hashing published by Corelight, and the intent is to make it easier to pivot from one data set, like Arkime sessions, to another data set, like Zeek conn.log entries. In Malcolm, both Arkime and Zeek populate that community ID value, which makes it possible to filter for a specific network connection and see both data sources' results for that connection. The zeek.uid value is also mapped to another Arkime database field called rootId, so you can use zeek.uid and rootId interchangeably. The rootId field is normally used by Arkime to link session records together when a particular session has too many packets to be represented by a single session record; when normalizing Zeek logs to the Arkime schema, Malcolm piggybacks on rootId and stores the Zeek connection UID there to cross-reference entries across Zeek log types. So, again, zeek.uid and rootId are interchangeable.

The cool pattern I want to get across in this example is that by filtering on the community ID, or on the Zeek UID, you cast a tent that includes both the Arkime sessions and the Zeek logs generated by that particular network connection, and you can see them together in one place.
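For example, in Arkime's query bar that filter would look something like the following; the values here are made up, but in practice they're the long hash and UID strings copied from a record of interest:

    communityId == "1:d7rl4AvpScpmgDWtYHpVDQMsXSc=" || rootId == "CPQ1Nw31ePGDcrT2k"

The equivalent idea in Kibana, using the Zeek UID field directly, would be something like communityId:"1:d7rl4AvpScpmgDWtYHpVDQMsXSc=" OR zeek.uid:CPQ1Nw31ePGDcrT2k.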
One item that was mentioned earlier, without the details of how it works, is Malcolm's ability to analyze files that are carved from network traffic. As I referenced, Zeek can carve files from a variety of protocols in observed network traffic, and those files can be extracted and stored temporarily and locally by Malcolm for analysis. Malcolm leverages that feature to submit the carved files to a number of file scanning tools: ClamAV, an open source antivirus engine, can be used to scan for known malware signatures; YARA, the pattern-matching Swiss army knife, scans the files using a curated list of security-related signatures, or your own custom YARA rules that you can write; Capa is a portable executable capabilities analyzer; and VirusTotal is an online database of file hashes. To use VirusTotal you do have to specify your API token, and you need an internet connection for those hashes to be submitted and looked up.

There are a couple of other configuration choices for the file scanning behavior. First, you can set which kinds of files you extract and scan to begin with: do I want to scan all files, or just files of MIME types that may be of particular interest from a security standpoint, things like ZIP files or executables or PDF files that might be vectors for common, known attacks? Then, in addition to which files we scan, we can specify which files we want to preserve, if any. In other words: do I want to preserve all files, which then get stored in a directory locally on the Malcolm instance? Do I want to preserve only files that get hits from the scanning engines, the ones flagged as suspicious? Or do I not want to preserve files at all? Files that are preserved can be downloaded from that directory, and then you can do whatever further examination you want on them, whether that's submitting them to other file scanners or reverse engineering them with Ghidra or IDA Pro or whatever your tool of choice is.

As the files are scanned, if Zeek file carving is enabled and the scanners are turned on, questionable files will be written to the signatures log and reported on the signatures dashboard in Kibana. The signatures dashboard breaks things down by scanning engine and by the name of the signature or rule that triggered on the file. The Zeek connection UID (zeek.uid) and file UID (zeek.fuid) fields in the signatures log can be used to cross-reference to other visualizations, to provide the context for how that file was transferred. In other words: find a reference to a zeek.fuid for a triggered signature on the signatures page, filter on that zeek.fuid, then jump over to the files log to see what I know about this file: how was it transferred, how big was it, and what other dashboards might reference it?
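As a concrete sketch of that pivot (the UID value here is made up), a Kibana query bar filter like

    zeek.fuid:F2mJrc3vmzQgBQ9dV5

applied across the signatures, files, and protocol dashboards will pull up every log entry that references that one carved file.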
Just a few search tips before we close out, for effectively searching Kibana and Arkime. Number one: always check your search time frame. If you're not seeing the data you're expecting to see, it's often because the data lies outside the window of time you're searching. Number two: an effective technique for investigating is to zoom in, for example by narrowing in on a particular type of file transferred or a particular host. Once you find something of interest, pivot to another field, say by selecting the source IP address that was the source of that file transfer, and then zoom back out by removing the file type filter to see what other activity that source IP was involved in. That's a very effective way of dealing with a large data set: find some value of interest, zoom in on that value, see the other values in context with it, and then zoom back out by filtering on those new values and removing the original filters.

Remember that most elements in Kibana and Arkime are interactive and can be configured to work with any of the more than 1,300 data fields that Malcolm knows about. Generally, in Arkime and Kibana, if you can click on it, you can create a filter from it or pivot to some other view, so you don't have to go find these things manually. Save yourself some work by learning to create filters from the graphical interface rather than having to remember what your query string was before and typing it back in. Learn to filter on the common fields, like zeek.logType, the source and destination IPs and ports, protocol, the action and result fields, and the source and destination OUIs, the fields that get normalized across all of the different log types we get from Zeek. You're not going to memorize all 1,300 of the fields we can populate in Malcolm, but as you become familiar with the most common ones and learn to build filters around them, you'll become more effective in your searching.

Finally, don't forget about the tags field. Use the pre-populated tags, like the ones we create during enrichment for private and public IP space, the tags that got generated when you uploaded your data, the tags created during the segment-mapping stage of enrichment, and the tags populated automatically based on the pcap file name. Learn to search on those tags and apply filters on the tags field: you can easily narrow to cross-segment traffic with a filter like tags == cross-segment, and it's much harder to do the same thing with "source IP in 192.168.0.0/whatever and destination IP in 10.x.x.x." You could do it that way, but it's going to be a lot easier to just use the tag that's already populated for you.

I hope this was a good kind of Malcolm 101 course for really getting our hands dirty with pcap analysis. I enjoy maintaining this project and showing people how to use the tool. If you've got feedback for me, reach out on GitHub, reach out in the YouTube comments if you want, or get in touch with me any other way you can figure out, and let me know what you think. We're interested to know how the community is using the tool and ways we could improve it. Thank you for your time, and happy hunting.