Encryption has become widespread, and it’s common to encounter at least a few encrypted files during an investigation. Brute-forcing a password is always an option. However, depending on the type of encryption used, this can take anywhere from a few minutes to centuries on commonly available computer hardware.
Your best bet when trying to gain access to an encrypted file, document or even an entire encrypted volume is a personalized wordlist. In this post, I am going to explain how to generate such a wordlist using the free utility bulk_extractor.
A wordlist is, as its name suggests, a list of words. Most decryption tools support wordlists or even include some standard ones. There are a lot of wordlists freely available on the internet. These lists vary from common dictionary words in a certain language to specialized lists containing, for example, video-game characters.
When attacking an encrypted file, your best bet will be a personalized wordlist. One way to get such a list is by extracting all words from a drive image. It’s common for people to store their passwords in some way on their drives, or to base their passwords on something familiar (family members, pet names, owned cars). By generating a list of all words stored on a computer system, there is a good chance your list will contain (parts of) the password. The wordlist doesn’t have to contain the exact password used by the person. Most decryption tools are able to perform some basic manipulation on the words: for example, if the tool finds the word tiger in the wordlist, it will automatically try Tiger, TIGER, T1ger, T1G3R, Tiger! etc.
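To make the idea of word manipulation concrete, here is a minimal Python sketch of the kind of variants a tool might derive from one wordlist entry. The `mangle` function and the small `LEET` substitution table are my own illustration, not the rule set of any particular decryption tool; real tools ship far larger and configurable rule sets.

```python
# Hypothetical, deliberately tiny substitution table (assumption, not a real tool's rules).
LEET = str.maketrans({"i": "1", "e": "3", "a": "4", "o": "0"})


def mangle(word):
    """Return simple variants of `word`, in order, without duplicates."""
    base = word.lower()
    variants = [
        base,                          # tiger
        base.capitalize(),             # Tiger
        base.upper(),                  # TIGER
        base.translate(LEET),          # t1g3r
        base.translate(LEET).upper(),  # T1G3R
        base.capitalize() + "!",       # Tiger!
    ]
    # dict.fromkeys keeps the first occurrence of each variant and preserves order
    return list(dict.fromkeys(variants))


print(mangle("tiger"))
# ['tiger', 'Tiger', 'TIGER', 't1g3r', 'T1G3R', 'Tiger!']
```

Because the variants are derived at attack time, the wordlist itself only needs to contain the base word.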
Bulk_extractor is a tool that can scan a forensic disk image, directory or file and extract useful information. This information is stored in text files that can be analyzed further. In this post, I will only use its wordlist-generating capabilities.
You can get bulk_extractor for free from its website or GitHub. I will be using “bulk_extractor64.exe”, the 64-bit Windows build of bulk_extractor version 1.5.2. The main advantage of this version is that it requires no installation. You should be able to follow the guide using any version of bulk_extractor.
Please note: I have renamed bulk_extractor64.exe to bulk_extractor.exe.
Once you have acquired bulk_extractor you can use the following command to generate a wordlist:
|bulk_extractor -E wordlist -o E:\Results\ E:\Images\Evidence.001
The command broken down:
|bulk_extractor | Running the tool
|-E wordlist | Only run the wordlist module. (The wordlist module is disabled by default)
|-o E:\Results\ | Save the results in this directory
|E:\Images\Evidence.001 | The image file we are going to scan.
By default, bulk_extractor only exports words between 6 and 14 characters long. This is a good setting to start with, since bulk_extractor will also generate a lot of “noise”. If you want to change the length of the words it exports, you can use the following -S switches:
|-S word_min=N | (default: 6) the minimum word length
|-S word_max=N | (default: 14) the maximum word length
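If you run scans regularly, it can help to build the command line in a small script. The sketch below only assembles the argument list; `build_command` is a hypothetical helper of mine, and it assumes the two length settings are named `word_min` and `word_max` (check the output of `bulk_extractor -h` for your version before relying on these names).

```python
import subprocess


def build_command(image, out_dir, word_min=None, word_max=None,
                  exe="bulk_extractor"):
    """Assemble a bulk_extractor wordlist command as an argument list."""
    cmd = [exe, "-E", "wordlist"]            # only run the wordlist module
    if word_min is not None:
        # Assumed setting name; verify with `bulk_extractor -h`
        cmd += ["-S", f"word_min={word_min}"]
    if word_max is not None:
        # Assumed setting name; verify with `bulk_extractor -h`
        cmd += ["-S", f"word_max={word_max}"]
    cmd += ["-o", out_dir, image]            # results directory, then the image
    return cmd


cmd = build_command(r"E:\Images\Evidence.001", r"E:\Results",
                    word_min=4, word_max=20)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually start the scan
```

Leaving `word_min` and `word_max` unset keeps the defaults of 6 and 14 characters.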
The time it takes to generate the list varies depending on the size of the image, the contents of the image and the system you are using. Bulk_extractor is I/O and CPU intensive, so using a fast multi-core CPU and an SSD will speed things up noticeably.
With an i7-6700K and an SSD, the overall throughput will be around 100 MB/s on average. In my case, it took 41 seconds to process a 30 GB Windows XP image (containing 4 GB of data).
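As a quick sanity check on those numbers, and to get a rough idea of how long a larger image might take, the arithmetic can be sketched in Python. This assumes the scan time is dominated by the amount of actual data rather than the nominal image size, and that throughput stays roughly constant; both are simplifications, and the helper names are my own.

```python
def throughput_mb_per_sec(data_gb, seconds):
    """Average throughput in MB/s for `data_gb` gigabytes scanned in `seconds`."""
    return data_gb * 1024 / seconds


def estimated_seconds(data_gb, mb_per_sec=100.0):
    """Rough scan time, assuming a constant throughput (a simplification)."""
    return data_gb * 1024 / mb_per_sec


# 4 GB of data in 41 seconds, as measured above
print(round(throughput_mb_per_sec(4, 41)))   # 100

# Hypothetical 500 GB of data at the same rate, in minutes
print(round(estimated_seconds(500) / 60))    # 85
```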
|E:\Tools\bulk_extractor -E wordlist -o E:\Results/ E:\Images\Evidence.001
bulk_extractor version: 1.5.2
When bulk_extractor finishes, it will have generated 4 files in the results directory:
|alerts.txt | Any errors/alerts generated during the scan will be stored here.
|report.xml | A report stored in XML format containing scan details.
|wordlist.txt | A list of all words extracted during the scan.
|wordlist_split_xxx.txt | The wordlist without duplicates.
The wordlist_split_xxx.txt file is the list you will want to use when running an attack on an encrypted file/container. It will contain all the “words” that were detected during the scan, but without any additional data or duplicates.
When you open the list, you will notice that it contains a lot of “noise”: many of the “words” don’t make any sense or aren’t even words to begin with. This is one of the limitations of these tools. There have been attempts to clean these lists up by comparing them to a dictionary and filtering out all the gibberish, but doing this also removes any passwords that the user might have made up. While generated wordlists might be long and contain a lot of noise, they remain a great way to break encryption.
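If you still want to trim the list a little, a more cautious option than dictionary filtering is to drop only strings that are almost certainly not typed passwords. The Python sketch below is one possible heuristic of my own, not a feature of bulk_extractor: it removes entries with non-printable characters or with two or fewer distinct characters, and keeps everything else, including made-up words. Note that it would also discard a very weak password such as "aaaaaa".

```python
def looks_like_noise(word):
    """Heuristic only: flag strings very unlikely to be typed passwords."""
    if not word.isprintable():
        return True               # contains control characters
    if len(set(word)) <= 2:
        return True               # e.g. "AAAAAAAA" or "=-=-=-=-"
    return False


def clean(words):
    """Yield unique, stripped words that do not look like obvious noise."""
    seen = set()
    for w in words:
        w = w.strip()
        if w and w not in seen and not looks_like_noise(w):
            seen.add(w)
            yield w


sample = ["AAAAAAAA", "tiger2016", "=-=-=-=-", "Xq9vTr!zz", "tiger2016"]
print(list(clean(sample)))
# ['tiger2016', 'Xq9vTr!zz']
```

To apply it to a generated list, pass an open file object (one word per line) to `clean` and write the results to a new file.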
|“Password Strength” by Randall Munroe (XKCD) is licensed under CC BY-NC 2.5