The imphash was first introduced by Mandiant (FireEye) in 2014, in an article named "Tracking Malware with Import Hashing". Mandiant revealed that they used the imphash specifically to track certain groups of threat actors, because they discovered that their samples often share the same imports.

The article explains the basic idea. The imports of a file are representative of the file's behavioral capabilities. If the order of functions is changed within the code, the order in the import table is different. If the order of source code files is changed, the import table is also different. The imphash takes advantage of that and calculates a hash value based on the order in which the imports appear in a file. Mandiant states that the same imphash in different files indicates the same origin of code. Oftentimes, these are the same threat groups, but malware builders might also be shared by several groups.

The imphash has since been adopted by many security vendors and tools. For instance, VirusTotal shows the imphash, and so do PEStudio and PortexAnalyzer. YARA also has an imphash plug-in, so you can hunt for malware using the imphash. AV vendors made the imphash part of some of their signatures and use it as a feature for AI-based detection, meaning the imphash is one characteristic used to teach an AI to recognize malicious PE files.

Mandiant published their imphash reference implementation in the Python library pefile, which is open source on GitHub, so we can see exactly how it is calculated. The algorithm works as follows.

1. Collect all PE imports.
2. Remove the extensions ocx, sys, and dll from module names.
3. Resolve the names of ordinal entries for a fixed set of modules. Non-resolved ordinal entries are written as ord<ordinal number>, for instance, ord3.
4. Build the string from the import table entries by concatenating the imports in order of appearance. Module name and function name are separated by a dot. Imports are separated by a comma.
5. Convert the whole string to lowercase.
6. Calculate the MD5 hash of this string. The result is the imphash.

Knowing the algorithm, we can conclude the following characteristics. As we already said, a different order of function use in the code results in different hashes. The casing of function and module names does not affect the imphash. Some imports by ordinal are resolved to a name. For these, it doesn't matter whether the imports are by name or by ordinal, they will result in the same imphash. For ordinals that cannot be resolved to a name, the imphash changes if an import by name is replaced by an ordinal. If a function is actually named like ord<number>, for instance ord3, it produces the same hash as an unresolved import by ordinal 3, so different imports lead to the same hash.

The hash algorithm used is MD5, which was designed to be a cryptographic hash function. That means even one small change in the input changes the whole output. For instance, adding one function will completely change the imphash.

In my opinion, some parts of the imphash could have been designed better. Instead of using MD5, a hash that preserves similarity might have been more suitable to group similar samples. Furthermore, many parts of the implementation lead to inconsistencies. For instance, ordinal imports are only resolved for a fixed set of functions, so whether you use ordinals instead of names may or may not affect the imphash. Adding more ordinal resolvers to the imphash is not practical, because different implementations will lead to different hashes. The imphash on VirusTotal, for instance, would no longer be comparable to the one from my enhanced implementation. That means the current set of resolved ordinal imports is effectively fixed, as changing it would have impractical consequences. In my opinion, the decision to resolve only selected imports by ordinal is not a good choice and just leads to confusion and bloating of the implementation.

The imphash is mostly useful on non-packed files, because packer stubs resolve imports at runtime and their import footprint is usually small.
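To make the six steps of the algorithm and the characteristics that follow from them concrete, here is a minimal Python sketch. It is not the pefile reference implementation. It assumes the imports have already been parsed into a list of (module, function) pairs in import-table order, where the function is either a name or an ordinal number. The names imphash_string, imphash_sketch, and ORDINAL_NAMES are made up for this illustration, and the ordinal table is only a tiny sample of well-known exports, whereas the real implementation ships much larger tables.

```python
import hashlib

# Illustrative subset only (an assumption of this sketch): a few well-known
# ordinal exports. The reference implementation uses far larger tables.
ORDINAL_NAMES = {
    "ws2_32.dll": {3: "closesocket", 115: "WSAStartup", 116: "WSACleanup"},
    "oleaut32.dll": {2: "SysAllocString", 6: "SysFreeString"},
}
STRIPPED_EXTENSIONS = ("ocx", "sys", "dll")


def imphash_string(imports):
    """Build the string that gets hashed.

    imports: list of (module_name, function) pairs in import-table order,
    where function is a name (str) or an ordinal (int).
    """
    entries = []
    for module, function in imports:
        module = module.lower()
        # Step 2: strip a trailing .ocx, .sys or .dll from the module name.
        parts = module.rsplit(".", 1)
        if len(parts) > 1 and parts[1] in STRIPPED_EXTENSIONS:
            libname = parts[0]
        else:
            libname = module
        # Step 3: resolve known ordinals, otherwise fall back to ord<number>.
        if isinstance(function, int):
            known = ORDINAL_NAMES.get(module, {})
            function = known.get(function, "ord%d" % function)
        # Steps 4 and 5: module.function, all lowercase.
        entries.append("%s.%s" % (libname, function.lower()))
    # Step 4: imports are joined by commas in order of appearance.
    return ",".join(entries)


def imphash_sketch(imports):
    # Step 6: the MD5 of the string is the hash value.
    return hashlib.md5(imphash_string(imports).encode()).hexdigest()


sample = [("KERNEL32.dll", "CreateFileA"), ("WS2_32.dll", 115), ("custom.dll", 3)]
print(imphash_string(sample))  # kernel32.createfilea,ws2_32.wsastartup,custom.ord3
print(imphash_sketch(sample))
```

With this sketch, swapping two entries of sample gives a different hash, changing the casing gives the same hash, and replacing ("WS2_32.dll", 115) with ("WS2_32.dll", "WSAStartup") gives the same hash because that ordinal is in the lookup table. An import literally named ord3 from custom.dll collides with the unresolved ordinal 3.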
The imphash is still a good tool for hunting and clustering malware samples. If you look for similar samples, searching by imphash might be worth a try. The imphash does not suffice on its own for malware detection, but it can be one characteristic of a detection signature. Many malware authors have reacted to the wide adoption of the imphash in AV signatures by tailoring their malware to work against it. For instance, some use fake imports to imitate clean files and resolve the actual imports dynamically.

The imphash does not work for .NET malware, because .NET assemblies usually only import the CLR runtime. Most .NET files have the very same import table consisting of one import. .NET has its own import structures in its metadata tables, so one solution is to use the TypeRefHash instead of the imphash. This .NET import hash was developed by one of my colleagues, Stefan Hausotte. The TypeRefHash algorithm will be part of another video though.

As usual, I will put the links to all referenced articles and hash implementations in the description below, so check it out. Thank you for watching, hedgehogs are awesome, and if you have any questions or video suggestions, please write a comment.