As you might know, there is a difference between deleting a file and wiping a file. To the user they seem to have the same outcome: the requested file is gone. However, when you delete a file it is still possible to recover it, while wiping a file makes recovery impossible. In this article, I would like to explain the differences between deleting a file and wiping a file, and also explain how different drive types (HDD vs SSD) affect the outcome.
To explain the differences between deleting a file and wiping a file, and how the drive type affects the outcome of these actions, we first need to understand how both drive types actually work.
The Hard disk drive
The hard disk drive (HDD) is an extremely efficient storage device that uses magnetism to store vast amounts of data. The HDD consists of a circular magnetic “disk” called a platter. The platter is divided into concentric, circular paths called tracks. Each track is broken up into areas called sectors. The file system is responsible for keeping track of the data that is stored in these sectors. When you save a file, the file system knows which sectors are unallocated and orders the drive to store the data in those sectors.
The drive itself has a controller; this controller simply takes the orders from the computer and performs the required actions. Most drives have an onboard cache where data is temporarily stored while the drive is performing these actions. Another type of drive, the “hybrid” or SSHD (solid state hybrid drive), has onboard flash memory where the most commonly accessed data is stored. These hybrid drives, however, still store the data on the platter, so they do not affect the outcome of file deletion or wiping.
The Solid state drive
We have been using hard disk drives for years; however, hard disks have one major drawback: they are mechanical. Because of this, the drive needs some time to access the data. The data is stored all over the platter, which means that when you want to access a certain file, the drive needs to align the drive heads over the area containing the data and read the magnetically stored information. This can take some time if you want to access a lot of data.
Solid state drives are, as the name implies, not mechanical. An SSD uses several flash chips to store its data. Flash chips have been used for years as embedded storage in several devices, including USB drives, mobile phones, and embedded systems. Internally, flash memory is not divided into 512-byte sectors; instead, it is organized in a grid. One entire grid is called a block, and the individual rows that make up the grid are called pages. Page sizes are 2K, 4K, 8K, 16K, or larger. Every block has 128 to 256 pages. The block size therefore typically varies between 256KB (128 pages * 2K per page) and 4MB (256 pages * 16K per page).
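To make that arithmetic concrete, here is a minimal Python sketch; the page sizes and page counts are just the example values from the paragraph above, not values read from real hardware.

```python
# Toy calculation of SSD block sizes from page size and pages per block.
# Illustrative values from the text, not queried from an actual drive.
def block_size_bytes(page_size_kib: int, pages_per_block: int) -> int:
    return page_size_kib * 1024 * pages_per_block

print(block_size_bytes(2, 128))    # 262144 bytes = 256 KB
print(block_size_bytes(16, 256))   # 4194304 bytes = 4 MB
```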
SSDs do, however, have one very interesting limitation. While they can read and write data at the page level, they can only write to pages that are empty (erased), and they can only erase a complete block at once. This is the main reason why SSDs are slow when writing data to blocks that already contain data: the drive has to copy the valid data in the block to its cache, erase the block, and then write the old and new data together back to the block.
Garbage collection takes care of this issue. As explained above, data is written to the flash memory in pages. If the data in some of the pages of a block is no longer needed, only the pages with good data in that block are read and rewritten into another, previously erased, empty block. The free pages left behind by not moving the stale data are then available for new data. If the user or operating system erases a file, the file will typically only be marked for deletion; the actual contents on the disk are never erased. Because of this, the SSD does not know that it can erase the blocks previously occupied by the file, so the SSD will keep including such blocks in the garbage collection.
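The copy-then-erase behaviour described above can be illustrated with a small Python sketch. This is a deliberately simplified model (four pages per block, no mapping tables), not how any specific controller actually works.

```python
# Minimal model of SSD garbage collection: copy the still-valid pages of a
# block into a previously erased block, then erase the old block as a whole.
PAGES_PER_BLOCK = 4

def garbage_collect(block, valid_flags, erased_block):
    """block: list of page contents; valid_flags: which pages still hold live data."""
    write_index = 0
    for page, valid in zip(block, valid_flags):
        if valid:                          # copy only the valid pages
            erased_block[write_index] = page
            write_index += 1
    for i in range(PAGES_PER_BLOCK):       # an SSD cannot erase single pages,
        block[i] = None                    # only the complete block
    return erased_block

old_block = ["A", "B", "stale", "C"]
print(garbage_collect(old_block, [True, True, False, True], [None] * PAGES_PER_BLOCK))
# ['A', 'B', 'C', None] -- the stale page was dropped, freeing room for new data
```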
How does this all relate to file deletion and wiping, you might ask? Since the introduction of Windows 7, Mac OS X Snow Leopard, FreeBSD 8.1, and Linux 2.6.33, these operating systems include support for the TRIM command. TRIM is a SATA command issued by the operating system to the SSD to tell the controller which blocks of data are no longer needed as a result of file deletions. When a block is replaced by the OS, as with an overwrite of a file, the SSD knows that the original block can be marked as stale or invalid, and it will not preserve those blocks during garbage collection. When a file is permanently deleted or the drive is formatted, the OS sends the TRIM command along with the blocks that no longer contain valid data. This informs the SSD that those blocks can be erased and reused, which reduces the number of blocks that need to be moved during garbage collection. The result is that the SSD has more free space, enabling better performance. When garbage collection is run differs per manufacturer, but a general rule of thumb is that after 30 minutes files are unrecoverable.
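A rough way to picture what TRIM changes is the bookkeeping below: without TRIM the drive must assume every written page still holds valid data, while a TRIM tells it which pages are stale. This is only a conceptual Python model; real TRIM is a SATA command handled by the drive's controller.

```python
# Conceptual model of TRIM: the OS tells the drive which pages no longer hold
# valid data, so garbage collection can skip them instead of preserving them.
class ToySSD:
    def __init__(self):
        self.valid_pages = set()          # logical pages the SSD believes are live

    def write(self, page_no):
        self.valid_pages.add(page_no)

    def trim(self, page_numbers):
        # The OS reports these pages as no longer containing valid data.
        self.valid_pages -= set(page_numbers)

ssd = ToySSD()
for page in (10, 11, 12):
    ssd.write(page)                       # a file occupies pages 10-12
ssd.trim([10, 11, 12])                    # file deleted, OS issues TRIM
print(ssd.valid_pages)                    # set() -- GC can now erase these pages
```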
As you might imagine, neither garbage collection nor TRIM does the recovery of deleted data a whole lot of good.
Filesystems
Now that we have a general understanding of how HDDs and SSDs store data, we need to know what a file system is and what it does. The operating system doesn't store files on a drive directly; it needs a system for that.
What is a file system?
When storing information on a drive, the operating system needs to keep track of what is stored where. The storage device itself has no idea what is stored and will just do as it's told.
The file system is used to keep track of how and where the data is stored and retrieved. Without a file system, the data stored on a storage device would be a seemingly random collection of data without any way to tell what is what. By separating data into blocks and registering each block in an index, the file system is able to identify and retrieve data on demand.
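As a toy illustration of such an index, the sketch below maps file names to block numbers and reassembles a file from its blocks. The file names, block numbers, and contents are made up for the example.

```python
# Toy file system index: a table mapping file names to the blocks holding data.
index = {
    "report.txt": [4, 5, 9],     # file data lives in blocks 4, 5 and 9
    "photo.jpg":  [12, 13],
}
blocks = {4: b"rep", 5: b"ort", 9: b"...", 12: b"\xff\xd8", 13: b"..."}

def read_file(name):
    # Look the file up in the index, then fetch its blocks in order.
    return b"".join(blocks[block_no] for block_no in index[name])

print(read_file("report.txt"))   # b'report...'
```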
A file system consists of two or three layers.
Logical
The logical layer handles the interaction with software. It provides applications a way to interact with the file system. In this layer, an application can simply ask for the contents of a file by filename, e.g. “Read the contents of c:\test.txt”. The logical layer then passes the request on to the layer below it for processing. This is also the layer responsible for file security (e.g. NTFS permissions).
Virtual
This optional layer is an abstraction layer that provides an interface between the kernel and the file system. Through a virtual file system (VFS), client applications can access different file systems in a uniform way.
Physical
The physical layer handles the physical operation of the storage device. It processes physical blocks being read or written. It also handles buffering and memory management and is responsible for the physical placement of blocks on the storage device.
Different file systems
Throughout the years, several file systems have been developed. I will touch on the most commonly used ones.
Windows
Microsoft Windows uses three major file systems: FAT, NTFS, and ReFS.
FAT
File Allocation Table, the simplest type of file system, inherited from the old DOS days. The most commonly used version is FAT32. It consists of a file system description sector, a file system block allocation table, and a plain storage space to store the files and folders.
This is the most commonly used file system on removable storage.
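To give an idea of how an allocation table chains the pieces of a file together, here is a simplified FAT-style sketch; the cluster numbers and end-of-chain marker are invented for illustration and do not match the real on-disk FAT32 format.

```python
# Simplified FAT-style allocation: a directory entry stores the first cluster
# of a file, and the allocation table chains each cluster to the next one.
END_OF_CHAIN = -1
fat = {2: 3, 3: 7, 7: END_OF_CHAIN}      # cluster 2 -> 3 -> 7 -> end
directory = {"notes.txt": 2}             # file starts at cluster 2

def clusters_of(name):
    cluster = directory[name]
    while cluster != END_OF_CHAIN:
        yield cluster
        cluster = fat[cluster]

print(list(clusters_of("notes.txt")))    # [2, 3, 7]
```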
NTFS
New Technology File System, developed for Windows NT and the default file system for all NT-based operating systems since. It found its way into home computers with the release of Windows XP. Each file on NTFS is stored as a file descriptor in the Master File Table (MFT). The MFT contains all information about the file, including size, allocation, and name. NTFS was never publicly documented, which resulted in poor third-party support.
This is the default file system in Microsoft Windows.
ReFS
Resilient File System, introduced with Windows Server 2012. This is a completely newly developed file system for Microsoft operating systems. It uses B+ trees for all on-disk structures, including all metadata and file data. Metadata and file data are organized into tables, similar to a relational database.
macOS
Apple has two major file systems: HFS+ and APFS.
HFS+
Hierarchical File System Plus (HFS+) is a file system developed by Apple Inc. It served as the primary file system of macOS and was developed to replace HFS as the primary file system used in Macintosh computers. It is also one of the formats used by the iPod digital music player. HFS+ is also referred to as Mac OS Extended or HFS Extended, where its predecessor, HFS, is also referred to as Mac OS Standard or HFS Standard. Like HFS, HFS+ uses B-trees to store most volume metadata, but unlike most other file systems, HFS+ supports hard links to directories. Since macOS High Sierra the default file system has been APFS.
This is the default file system on macOS until High Sierra.
APFS
Apple File System is a proprietary file system for macOS, iOS, tvOS, and watchOS, developed and deployed by Apple Inc. APFS is optimized for flash and solid state drive storage, with a strong focus on encryption. It uses 64-bit inode numbers and allows for more secure storage. The APFS code, like the HFS+ code, uses the TRIM command for better space management and performance.
This is the default file system on macOS since High Sierra.
Linux
Linux is a bit more complex. Since there are so many different use cases for Linux (main desktop OS, NAS, phones, and embedded systems), several file systems have been developed throughout the years.
EXT (2,3,4)
The extended file system family (ext2, ext3, ext4) provides the native Linux file systems; ext3 and ext4 are journaling file systems. A journaling file system keeps track of changes not yet committed to the file system's main part by recording the intentions of such changes in a data structure known as a “journal”, which is usually a circular log. In the event of a system crash or power failure, such file systems can be brought back online more quickly with a lower likelihood of becoming corrupted.
This is the most commonly used file system on Linux systems.
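The journaling idea can be sketched in a few lines of Python: record the intended change first, apply it, then mark it committed, so that an interrupted update can be replayed. This illustrates the concept only and has nothing to do with the actual ext3/ext4 on-disk format.

```python
# Conceptual sketch of journaling: the intended change is recorded in a journal
# before it is applied, so an interrupted update can be replayed after a crash.
journal = []          # the circular log described above, simplified to a list
data = {}             # the "main part" of the file system

def journaled_write(key, value):
    journal.append(("intent", key, value))   # 1. record the intention
    data[key] = value                        # 2. apply the change
    journal.append(("commit", key))          # 3. mark it committed

def replay_after_crash():
    # Re-apply any intent that never reached its commit record.
    committed = {entry[1] for entry in journal if entry[0] == "commit"}
    for entry in journal:
        if entry[0] == "intent" and entry[1] not in committed:
            data[entry[1]] = entry[2]

journaled_write("file_a", "hello")
journal.append(("intent", "file_b", "draft"))   # simulate a crash before commit
replay_after_crash()
print(data)           # {'file_a': 'hello', 'file_b': 'draft'}
```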
ReiserFS
ReiserFS is a general-purpose, journaled computer file system. It is currently supported for GNU/Linux and may be included in other operating systems in the future. Introduced with version 2.4.1 of the Linux kernel, it was the first journaling file system to be included in the standard kernel. ReiserFS is the default file system on the Elive, Xandros, Linspire, and YOPER Linux distributions.
XFS
XFS is a high-performance 64-bit journaling file system. XFS is supported by most Linux distributions, some of which use it as the default file system. XFS excels in the execution of parallel input/output operations due to its design, which is based on allocation groups. Space allocation is performed via extents with data structures stored in B+ trees, improving the overall performance of the file system, especially when handling large files.
How does a file system work
As you can see, there are different file systems for different operating systems and different tasks. Each file system has its own way of organizing and storing files on a storage medium. I won't go into detail on how each file system works, but they all have some things in common: the storage device will have one or more partitions, and each partition is formatted with a file system.
A file system provides a way of separating the drive into smaller pieces (sectors) in which data can be stored. It also provides a way to store information about the data, for example filenames, permissions, and other metadata. The file system contains an index: a list of all the data that is stored on the drive and where it is located. The logical layer reads this information and knows where to find the requested file by looking it up in the index; it then passes the found sectors to the next layer for retrieval. Applications don't communicate with a drive directly; for them, access to the files is transparent. Applications don't need to know anything about sectors, they simply request to open a file and the file system does the rest.
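From an application's point of view, this transparency looks as simple as the snippet below: the program only deals with a file name and the data, never with sectors. The file name is of course just an example.

```python
# An application only names the file and reads or writes data; the file system
# layers translate that name into sectors behind the scenes.
with open("example.txt", "w") as f:   # example file name
    f.write("hello")

with open("example.txt") as f:        # no sector numbers involved anywhere
    print(f.read())                   # hello
```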
File deletion vs file wiping
Now that we have a basic understanding of how a drive works, and what a file system is and how it works, we can talk about the difference between deleting a file and wiping a file.
A commonly used metaphor is a book. Files are stored on pages of the book, and the index keeps track of where the files are stored.
Using this metaphor, deleting a file removes the file from the index but leaves the file written on the page, while wiping a file erases the contents of the page by overwriting them. The reason the contents of the page are not erased when you delete a file is speed. Erasing the page takes time; in the case of a hard drive, the head needs to move to the related sectors and erase their contents. There is no reason to do so, since hard drives have no problem overwriting old data. In fact, a drive that has been in use for some time is continuously overwriting old data as part of normal operation.
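The book metaphor can be put into a few lines of Python: deleting only forgets where the file was, wiping overwrites the page itself. The file names, page numbers, and contents are invented for the example.

```python
# Book metaphor: the index says where a file lives, the pages hold the data.
pages = {7: "secret contents", 8: "other secret"}
index = {"diary.txt": 7, "passwords.txt": 8}

def delete(name):
    index.pop(name)                  # forget where the file is; the page is untouched

def wipe(name):
    page_no = index.pop(name)
    pages[page_no] = "\x00" * len(pages[page_no])   # overwrite the page, then forget it

delete("diary.txt")
wipe("passwords.txt")
print(pages)   # page 7 still reads "secret contents" (recoverable), page 8 is zeroed out
```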
Because deleted data is only destroyed once the drive reuses those sectors, it gets harder to recover deleted data as time passes: the longer you wait, the larger the chance your files have been overwritten with new data. This is also the reason why it's important to carefully make decisions when encountering a live system (also read: Should you pull the plug?).
File wiping, however, erases the contents of a file before deleting it. It overwrites the sectors containing the data associated with the file you want to delete. Recovery tools will no longer be able to recover the file.
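As a sketch of what a wiping tool does in the simplest case, the function below overwrites a file's bytes with zeroes and then deletes it. This is a naive single-pass example for illustration only; on SSDs and on journaling or copy-on-write file systems there is no guarantee the old data is physically gone, as discussed later in this article.

```python
# Naive single-pass file wipe: overwrite the file's bytes in place, flush to
# disk, then delete it. Illustrative sketch; not a guarantee of physical erasure.
import os

def naive_wipe(path, chunk=1024 * 1024):
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(b"\x00" * n)          # zero-fill pass
            remaining -= n
        f.flush()
        os.fsync(f.fileno())              # ask the OS to push the writes to disk
    os.remove(path)
```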
Wiping methods
There are several standards for file wiping. The most common ones are:
- DoD 5220.22-M
- Gutmann
- Schneier
- Random Data
- Zero-fill
DoD 5220.22-M
The DoD 5220.22-M method is a three-pass overwriting method (a sketch of these passes in code follows the list):
- First pass with zeroes.
- Second pass with ones.
- Third pass with random data.
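A minimal sketch of those three passes applied to a file might look like this; the same caveats as the single-pass sketch above apply (no guarantees on SSDs or copy-on-write file systems).

```python
# Sketch of a DoD 5220.22-M style three-pass overwrite: zeroes, ones, random.
import os

def three_pass_overwrite(path):
    size = os.path.getsize(path)
    patterns = [b"\x00", b"\xff", None]           # None means "use random data"
    with open(path, "r+b") as f:
        for pattern in patterns:
            f.seek(0)
            data = os.urandom(size) if pattern is None else pattern * size
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                  # force each pass onto the disk
    os.remove(path)
```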
Gutmann
The Gutmann method consists of 35 passes. An overwrite session consists of a lead-in of four random write patterns, followed by the fixed patterns of passes 5 to 31, executed in a random order, and a lead-out of four more random patterns.
Schneier
The Bruce Schneier method consists of seven passes:
- First pass with ones.
- Second pass with zeroes.
- Third, fourth, fifth, sixth, and seventh pass with random data.
Random Data
With this method, your data is overwritten with random data. There is no set number of passes.
Zero-fill
The drive is filled with zeroes.
File recovery after wiping
One common question is whether it's possible to recover files after they have been wiped. Theoretically, yes, it's possible to recover files from wiped drives. This is also the reason there are several different wiping standards/methods.
A commonly discussed method is the use of an electron microscope to determine the magnetic charge of the platter. A hard drive uses magnetism to read and write data. The drive head produces an analog signal that’s converted to a digital 1 or 0 when reading the disk. This analog signal will never be a perfect signal but close enough to know the difference between a 1 and a 0. By analysing the magnetic charge of the platter it might be possible to determine the previous charge and thus the previous value.
Research done by Craig Wright, Dave Kleiman, and Shyaam in 2008 demonstrated that correctly wiped data cannot reasonably be retrieved even if it is of a small size or found only over small parts of the hard drive, not even with the use of a magnetic force microscope (MFM) or other known methods. The belief that a tool can be developed to retrieve gigabytes or terabytes of information from a wiped drive is in error.
Overwriting Hard Drive Data: The Great Wiping Controversy. Craig Wright, Dave Kleiman, and Shyaam, 2008. https://www.vidarholen.net/~vidar/overwriting_hard_drive_data.pdf
HDD vs SSD
In this article, I have mainly discussed the hard drive, and there is a reason for that. As I explained, SSDs can't write over old data like HDDs can. This means an SSD has to perform garbage collection and is, in effect, wiping old data itself; an SSD is a self-wiping drive. When a user deletes a file, the operating system notifies the SSD with the TRIM command that it can erase this file the next time it runs its garbage collection routine. Once issued, there is no way to cancel the TRIM command, and there is no way to really stop the garbage collection routine from running. This means that while you are imaging the drive using a write blocker, it's quite possible for the drive to be running its garbage collection routine and erasing evidence. I have seen images made from the same SSD right after each other with a forensic duplicator, each having a different hash value and slightly different contents (the second image having fewer files in unallocated space).
File wiping also doesn't work as expected on SSDs, because SSDs have a feature called wear leveling. As explained, flash memory has individually erasable segments, each of which can be put through a limited number of erase cycles before becoming unreliable. Wear leveling arranges the data so that erasures and re-writes are distributed evenly across the medium; this way, no single erase block prematurely fails due to a high concentration of write cycles. So when you are trying to wipe a file and ask the SSD to “overwrite” an existing sector, it doesn't actually overwrite or delete the existing data immediately. Instead, it writes the new data somewhere else and just changes a pointer to point to the new version (leaving the old data stored on the drive). The old version may eventually get erased, or it may not. As a result, even data you think you have erased may still be present on the SSD.
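The remapping described above can be modeled in a few lines: a logical “overwrite” lands in a fresh physical page and only the mapping changes, so the old data stays behind until garbage collection erases it. This is a toy model, not a real flash translation layer.

```python
# Toy model of SSD remapping: an overwrite from the OS's point of view writes
# to a new physical page; the old page keeps the old data until it is erased.
physical_pages = {}            # physical page number -> contents
logical_to_physical = {}       # the mapping the drive exposes to the OS
next_free_page = 0

def ssd_write(logical_page, data):
    global next_free_page
    physical_pages[next_free_page] = data
    logical_to_physical[logical_page] = next_free_page
    next_free_page += 1

ssd_write(0, "original secret")
ssd_write(0, "overwritten!")           # the "wipe" as seen by the OS
print(physical_pages)
# {0: 'original secret', 1: 'overwritten!'} -- the old data is still physically there
```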
The question that remains is how to erase an SSD if wiping isn't a reliable solution. A group of engineers published a study on this subject in 2011. According to their study, none of the available software techniques for sanitizing individual files were effective. To remedy this problem, they described and evaluated three simple extensions to an existing flash translation layer (FTL) that make file sanitization fast and effective.
Overall, they concluded that the increased complexity of SSDs relative to hard drives requires that SSDs provide verifiable sanitization operations.
Reliably Erasing Data From Flash-Based Solid State Drives. Michael Wei, Laura M. Grupp, Frederick E. Spada, and Steven Swanson, 2011. https://www.usenix.org/legacy/events/fast11/tech/full_papers/Wei.pdf
It used to be possible to do a chip-off on an SSD in order to access these fragments of data, but since most modern SSDs store their data encrypted, it's nearly impossible to retrieve data from the flash chips with a chip-off. The fact is that SSDs are quite complex and each manufacturer differs from the other, meaning there is no single answer to the question of whether recovery is possible if you wipe a file on an SSD.
The bottom line is that recovery on an HDD is quite possible if the file has only been deleted, but near impossible if the file has been wiped. On an SSD it's hard to tell because of its complex inner workings, but if everything works correctly it won't be possible to recover a file once it's deleted.
Summary
The difference between deleting a file and wiping a file on a hard drive is simple: when deleting a file the data remains on the drive, while wiping a file overwrites the data with (random) other data, destroying the original. When looking at the differences between traditional hard disk drives (HDD) and solid state drives (SSD), it becomes clear that the same does not apply to SSDs. Because of the characteristics of an SSD, deleting a file will already destroy it, while wiping a file does not have the same effect as it has on an HDD.