bulk_extractor is a computer forensics tool that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results can be easily inspected, parsed, or processed with automated tools. bulk_extractor also creates histograms of the features that it finds, because features that are more common tend to be more important. The program can be used for law enforcement, defense, intelligence, and cyber-investigation applications.
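The histogram idea can be illustrated with a minimal Python sketch. It is not bulk_extractor's own code: the function name `feature_histogram` and the sample email addresses are hypothetical, and the sketch only shows the counting step that ranks frequent features first.

```python
from collections import Counter

def feature_histogram(features):
    """Return (feature, count) pairs, most frequent first.

    Hypothetical helper: ties are broken alphabetically so the
    ordering is deterministic.
    """
    counts = Counter(features)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical features, as if extracted from an image.
sample = ["alice@example.org", "bob@example.org", "alice@example.org"]
for feature, count in feature_histogram(sample):
    print(f"n={count}\t{feature}")
```

A feature that appears many times across the media floats to the top of the list, which is the property the description above relies on.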
bulk_extractor is distinguished from other forensic tools by its speed and thoroughness. Because it ignores file system structure, bulk_extractor can process different parts of the disk in parallel. In practice, the program splits the disk into 16 MiB pages and processes one page on each available core. This means that a 24-core machine processes a disk roughly 24 times faster than a 1-core machine. bulk_extractor is also thorough: it automatically detects, decompresses, and recursively re-processes data compressed with a variety of algorithms. Our testing has shown that the unallocated regions of file systems contain a significant amount of compressed data that is missed by most forensic tools in common use today.
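The paging and recursive re-processing described above can be sketched in Python. This is an illustration under stated assumptions, not the tool's implementation: the page size, margin, and recursion depth are taken from the defaults in the help text (-G, -g, -M), while the email regular expression, the gzip-only detection, the way the margin is used, and all function names are assumptions made for the example. CPython threads also do not give the CPU parallelism of the real program; the thread pool only shows how pages can be handed out independently.

```python
import gzip
import re
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 16 * 1024 * 1024   # 16 MiB, the -G default in the help text
MARGIN = 4 * 1024 * 1024       # 4 MiB, the -g default in the help text
MAX_DEPTH = 5                  # the -M default in the help text

# Assumed, simplified patterns for the sketch.
EMAIL = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
GZIP_MAGIC = b"\x1f\x8b\x08"

def page_ranges(total_size, page_size=PAGE_SIZE, margin=MARGIN):
    """Yield (start, page_end, read_end) for each page of the input."""
    for start in range(0, total_size, page_size):
        page_end = min(start + page_size, total_size)
        read_end = min(page_end + margin, total_size)
        yield start, page_end, read_end

def scan(buf, limit, depth=0):
    """Return email-like features that start before `limit`,
    recursing into any gzip stream that starts before `limit`."""
    found = [m.group() for m in EMAIL.finditer(buf) if m.start() < limit]
    if depth >= MAX_DEPTH:
        return found
    pos = buf.find(GZIP_MAGIC)
    while pos != -1 and pos < limit:
        try:
            decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # gzip framing
            inner = decomp.decompress(buf[pos:], PAGE_SIZE)   # bounded output
            found += scan(inner, len(inner), depth + 1)
        except zlib.error:
            pass  # magic bytes that were not a real gzip stream
        pos = buf.find(GZIP_MAGIC, pos + 1)
    return found

def scan_image(data, workers=2, page_size=PAGE_SIZE, margin=MARGIN):
    """Scan each page (plus its margin) independently and merge results."""
    def scan_page(r):
        start, page_end, read_end = r
        return scan(data[start:read_end], page_end - start)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(scan_page, page_ranges(len(data), page_size, margin))
        return [feature for page in pages for feature in page]

if __name__ == "__main__":
    blob = b" " * 10 + b"a@b.com" + gzip.compress(b"hidden c@d.org here")
    print(scan_image(blob, page_size=8, margin=64))
```

In this sketch the margin lets a feature or compressed stream that begins near the end of one page run into the next, while the `limit` check keeps each feature attributed to exactly one page.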
Another advantage of ignoring file systems is that bulk_extractor can be used to process any digital media. We have used the program to process hard drives, SSDs, optical media, camera cards, cell phones, network packet dumps, and other kinds of digital information.
bulk_extractor [options] imagefile
  runs bulk extractor and outputs to stdout a summary of what was found where
Required parameters:
  imagefile     - the file to extract
  or -R filedir - recurse through a directory of files
                  SUPPORT FOR E01 FILES COMPILED IN
                  SUPPORT FOR AFF FILES COMPILED IN
  -o outdir     - specifies output directory. Must not exist.
                  bulk_extractor creates this directory.
Options:
  -b banner.txt     - Add banner.txt contents to the top of every output file.
  -r alert_list.txt - a file containing the alert list of features to alert
                      (can be a feature file or a list of globs)
                      (can be repeated.)
  -w stop_list.txt  - a file containing the stop list of features (white list
                      (can be a feature file or a list of globs)s
                      (can be repeated.)
  -F <rfile>        - Read a list of regular expressions from <rfile> to find
  -f <regex>        - find occurrences of <regex>; may be repeated.
                      results go into find.txt
  -q nn             - Quiet Rate; only print every nn status reports. Default 0; -1 for no status at all
Tuning parameters:
  -C NN    - specifies the size of the context window (default 16)
  -G NN    - specify the page size (default 16777216)
  -g NN    - specify margin (default 4194304)
  -W n1:n2 - Specifies minimum and maximum word size (default is -w6:14)
  -B NN    - Specify the blocksize for bulk data analysis (default 512)
  -j NN    - Number of analysis threads to run (default 2)
  -M nn    - sets max recursion depth (default 5)
Path Processing Mode:
  -p <path>/f - print the value of <path> with a given format.
                formats: r = raw; h = hex.
                Specify -p - for interactive mode.
                Specify -p -http for HTTP mode.
Parallelizing:
  -Y <o1>      - Start processing at o1 (o1 may be 1, 1K, 1M or 1G)
  -Y <o1>-<o2> - Process o1-o2
  -A <off>     - Add <off> to all reported feature offsets
Debugging:
  -h    - print this message
  -H    - print detailed info on the scanners
  -V    - print version number
  -z nn - start on page nn
  -dN   - debug mode (see source code
  -Z    - zap (erase) output directory
Control of Scanners:
  -P <dir>       - Specifies a plugin directory
  -E scanner     - turn off all scanners except scanner
  -m <max>       - maximum number of minutes to wait for memory starvation
                   default is 60
  -s name=value  - sets a bulk extractor option name to be value
  -e bulk        - enable scanner bulk
  -e wordlist    - enable scanner wordlist
  -x accts       - disable scanner accts
  -x aes         - disable scanner aes
  -x base16      - disable scanner base16
  -x base64      - disable scanner base64
  -x elf         - disable scanner elf
  -x email       - disable scanner email
  -x exif        - disable scanner exif
  -x gps         - disable scanner gps
  -x gzip        - disable scanner gzip
  -x hiber       - disable scanner hiber
  -x json        - disable scanner json
  -x kml         - disable scanner kml
  -x net         - disable scanner net
  -x pdf         - disable scanner pdf
  -x vcard       - disable scanner vcard
  -x windirs     - disable scanner windirs
  -x winpe       - disable scanner winpe
  -x winprefetch - disable scanner winprefetch
  -x zip         - disable scanner zip
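To show how the options above combine, here is a minimal Python sketch that assembles a command line using only flags listed in the help text (-o, -j, -f, -e, -x). The helper name `build_command` and the example file names are hypothetical, and the function only builds the argument list; it does not run the program.

```python
import os

def build_command(image, outdir, threads=2, find=(), enable=(), disable=()):
    """Return an argv list for bulk_extractor (hypothetical helper).

    The help text says the -o directory must not exist, so that is
    checked before anything is assembled.
    """
    if os.path.exists(outdir):
        raise FileExistsError(f"{outdir} already exists; -o requires a new directory")
    argv = ["bulk_extractor", "-o", outdir, "-j", str(threads)]
    for regex in find:        # -f may be repeated
        argv += ["-f", regex]
    for name in enable:       # e.g. "wordlist"
        argv += ["-e", name]
    for name in disable:      # e.g. "gps"
        argv += ["-x", name]
    argv.append(image)        # usage line: [options] imagefile
    return argv

if __name__ == "__main__":
    print(build_command("fileewf.E01", "be_output", threads=4,
                        enable=["wordlist"], disable=["gps"]))
```

If bulk_extractor is installed, the resulting list could be passed to `subprocess.run(argv, check=True)`; that step is left out here so the sketch runs anywhere.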
cyborg@cyborg:~$ bulk_extractor fileewf.E01 -o log.txt
bulk_extractor version: 1.3
Hostname: cyborg
Input file: fileewf.E01
Output directory: log.txt
Disk Size: 2141356032
Threads: 2
Phase 1.
14:14:55 Offset 33MB (1.57%) Done in 0:00:09 at 14:15:04
14:16:13 Offset 100MB (4.70%) Done in 0:26:24 at 14:42:37
14:16:58 Offset 167MB (7.83%) Done in 0:24:07 at 14:41:05
Thread 0: Processing 167772160
Thread 1: Processing 201326592
Time elapsed waiting for 2 threads to finish:
    12 sec (please wait for another 59 min 48 sec.)
Thread 0: Processing 167772160
Thread 1: Processing 201326592
Scanner net Exception: Error: Attempt to read past end of sbuf processing (179634301-ZIP|0)
Time elapsed waiting for 1 thread to finish:
    19 sec (please wait for another 59 min 41 sec.)
Thread 1: Processing 201326592
All Threads Finished!
Producer time spent waiting: 159.868 sec.
Average consumer time spent waiting: 0.239627 sec.
*******************************************
** bulk_extractor is probably CPU bound. **
**   Run on a computer with more cores   **
**      to get better performance.       **
*******************************************
Phase 2. Shutting down scanners
Phase 3. Creating Histograms
   ccn histogram...
   ccn_track2 histogram...
   domain histogram...
   email histogram...
   ether histogram...
   find histogram...
   ip histogram...
   tcp histogram...
   telephone histogram...
   url histogram...
   url microsoft-live...
   url services...
   url facebook-address...
   url facebook-id...
   url searches...
Elapsed time: 184.8 sec.
Overall performance: 0.9984 MBytes/sec.
Total email features found: 1376
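The progress lines in this run are consistent with offsets that are whole multiples of the 16777216-byte default page size. The short Python sketch below reproduces the three reported figures from the disk size in the log. The formula is inferred from the numbers shown here, not taken from the program's source, and the function name `progress` is hypothetical.

```python
PAGE_SIZE = 16777216      # default page size from the help text (-G)
DISK_SIZE = 2141356032    # "Disk Size" reported in the run above

def progress(pages_done, disk_size=DISK_SIZE, page_size=PAGE_SIZE):
    """Return (offset in whole decimal MB, percent of disk) after
    `pages_done` pages. Inferred from the log, not from the source."""
    offset = min(pages_done * page_size, disk_size)
    return offset // 1_000_000, round(100 * offset / disk_size, 2)

for pages in (2, 6, 10):
    mb, pct = progress(pages)
    print(f"{pages:2d} pages -> Offset {mb}MB ({pct:.2f}%)")
# Matches the log: 33MB (1.57%), 100MB (4.70%), 167MB (7.83%)
```

The value 167772160 in the "Thread 0: Processing" line is likewise exactly 10 pages, and 201326592 is exactly 12 pages.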