In this article I will show you how to extract malicious files from network captures, in other words how to build a small antivirus-style tool of your own.
Network captures are a common artifact of security incidents. Malware that crossed the network can be extracted from a single capture with tools like Wireshark, but if you have a lot of PCAP files, how would you extract them all? Let’s look at a script that extracts PE files from PCAPs using Python and Scapy.
Scapy is a fairly powerful Python library. In less than 200 lines, we can write a simple parser that extracts PE files from HTTP traffic. It’s convenient to be able to create your own tools, or to have one that is versatile and can be customized to suit your needs.
How does it work?
By passing a single file or directory with network captures, the script will read and parse them to extract the PE files it finds.
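A minimal sketch of that input handling, assuming captures are recognised by a .pcap, .pcapng or .cap extension. The helper name collect_captures and the extension list are hypothetical and not taken from the original script.

```python
from pathlib import Path

# Extensions treated as network captures (an assumption for this sketch).
CAPTURE_SUFFIXES = {".pcap", ".pcapng", ".cap"}


def collect_captures(target):
    """Return a list of capture files for a single file or a directory."""
    path = Path(target)
    if path.is_file():
        # A file passed explicitly is accepted whatever its extension.
        return [path]
    if path.is_dir():
        # Walk the directory recursively and keep only capture files.
        return sorted(
            p for p in path.rglob("*")
            if p.is_file() and p.suffix.lower() in CAPTURE_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or directory: {target}")
```

Each returned path can then be handed to the parsing step one at a time.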
The output will vary depending on the file. I took a few sample captures from Malware Traffic Analysis and started going through them.
The code is not final, but it works. I hope that others will find it useful to expand, adapt, and use for teaching.
The program code is here (Zen does not have normal code formatting)
I placed all the code in main.py. The required libraries are listed in requirements.txt. This will be more than enough to get acquainted.
The most important function in this code is extract_http_objects(). It does the following:
- Open PCAP file
- Load sessions and iterate through them, filtering HTTP to retrieve files.
- Write all files to the output directory
Opening and loading a PCAP file using Scapy
This part is pretty simple. After loading the Scapy module in Python, we just need to open the capture and read its contents.
I have explored several ways of doing this before; here I rely on rdpcap().
HTTP traffic filtering
The code does nothing complicated to filter the traffic: it simply keeps any segment whose source or destination port is 80. This is similar to filtering in Wireshark with tcp.srcport == 80 || tcp.dstport == 80, or simply tcp.port == 80.
Filtering in Wireshark
The code iterates over all sessions. For each packet, it checks whether the packet has a TCP layer and whether the source or destination port is 80, and if so stores the payload.
Once all the information has been collected, we can analyze the payload and try to determine if there is any indication that it contains a file.
HTTP header analysis
The next step is to analyze the HTTP headers. The goal is modest: build a dictionary so the headers can be accessed as key/value pairs, then check whether the specific values we are looking for are present.
HTTP headers in Wireshark
There are a lot of things that can be filtered by looking at HTTP headers, so it’s impossible to catch every combination. The good thing is that once the headers have been parsed, you can filter on any of them.
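As an illustration of turning headers into key/value pairs, here is a minimal sketch that works on the raw bytes of one HTTP message. The function name parse_http_headers and the choice to lowercase header names are assumptions of this sketch, not the original code.

```python
def parse_http_headers(payload):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # Headers end at the first blank line; everything after it is the body.
    head, sep, body = payload.partition(b"\r\n\r\n")
    if not sep:
        # No blank line found, so the header block is incomplete.
        return None
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if colon:
            # Header names are case-insensitive, so store them lowercased.
            # A repeated header overwrites the earlier value in this sketch.
            headers[name.strip().lower()] = value.strip()
    return start_line, headers, body
```

With the headers in a dictionary, a check such as headers.get("content-type") becomes a one-liner.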
The code handles only a few cases, for the sake of a practical implementation and to test whether the approach can be extended into a more general one.
The goal was not to find them all, but to cover the common cases among the many network captures where malware is downloaded from a remote server via HTTP.
So far it’s working quite well.
Extracting objects from HTTP payload
The script is designed to extract objects from HTTP payloads. In this case, filtering begins by confirming the presence of a Content-Type header.
Based on an initial sample of about 40 captures containing malware (not a lot, but a start), I decided to use 2 filters:
- application/x-msdownload
- application/octet-stream
There are many known variations of content types and this page is useful for learning/reading about them. One idea is to perform various actions based on the Content-Type field if present since it can be a good indicator of other file or content types.
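A small sketch of such a Content-Type check, assuming the headers have already been parsed into a dictionary with lowercased names. The names PE_CONTENT_TYPES and is_candidate_content_type are hypothetical.

```python
# Content types treated as likely executables (the two filters listed above).
PE_CONTENT_TYPES = {"application/x-msdownload", "application/octet-stream"}


def is_candidate_content_type(headers):
    """Return True when the Content-Type header matches one of the filters."""
    value = headers.get("content-type", "")
    # Drop parameters such as "; charset=..." and normalise the case.
    media_type = value.split(";", 1)[0].strip().lower()
    return media_type in PE_CONTENT_TYPES
```

Keeping the accepted types in one set makes it easy to add more later, or to map each type to a different handler.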
The extract_object() function does the job.
We can record some more information here so that after a failed extraction, we can look at the logs and see what we missed.
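The original extract_object() is not reproduced here. As a rough sketch of what such a step could look like, the hypothetical save_pe_object() below checks for the "MZ" magic bytes that every PE file starts with, writes the body to the output directory, and logs both hits and misses. The SHA-256 file naming, the .bin suffix and the logger name are assumptions of this sketch.

```python
import hashlib
import logging
from pathlib import Path

log = logging.getLogger("pe_extractor")


def save_pe_object(body, output_dir, source="unknown"):
    """Write body to output_dir if it looks like a PE file; return the path."""
    # Every PE file starts with the DOS header magic bytes "MZ".
    if not body.startswith(b"MZ"):
        log.info("skipped %s: body does not start with MZ (%d bytes)",
                 source, len(body))
        return None
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after its SHA-256 so identical samples are stored once.
    digest = hashlib.sha256(body).hexdigest()
    out_path = out_dir / f"{digest}.bin"
    out_path.write_bytes(body)
    log.info("extracted %s from %s (%d bytes)", out_path.name, source, len(body))
    return out_path
```

Logging the skipped bodies as well as the saved ones is what makes it possible to review a failed extraction afterwards.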
Wireshark HTTP Conversation
Running the script
How can this script be expanded?
- Extract new file types
- Parse other protocols
- Code refactoring
I have a few more ideas for this code, as I’m currently using it to extract files that can later be fed into a malware analysis pipeline to gather information and to find good samples worth digging deeper into.
You can’t do malware analysis without having access to malware samples, so creating a good repository for them (besides the known public ones) can be helpful.
Performance improvements?
Running this script is fairly resource-intensive. It’s worth trying the PcapReader() option, which consumes less memory because it reads packets one at a time instead of loading the whole file. The goal would be to reduce the resource consumption of parsing multiple PCAP files at once.
Automation is great.
Extracting information from network captures can be useful, and Scapy is a powerful Python library for this purpose. It can even be used to sniff the network. I will talk about this in the following articles.