Introduction
Recovery records are a well-known benefit of the RAR5 format, yet little research on their effectiveness appears to be available. As a result, users cannot easily tell whether a recovery record will be effective for their purposes. In this study, both sequential and sparse corruptions of an archive are tested against varying recovery record sizes.
Background
First of all, why use such software?
Archival software is used to bundle files into a single archive. Basic archival functions are built into common operating systems through ZIP support, but the options are limited. For more features, programs such as WinRAR are required. Whilst WinRAR does support ZIP, its main attraction is the RAR5 format, which has a much greater feature set.
Once an archive has been created, it can be sent to someone as a single file, encrypted for added security, and backed up to the cloud. In the case of RAR5, a recovery record can also be added to protect the archive from damage.
When a folder containing many files is archived, the result is a single file containing all of them. This single file is typically faster to transfer between devices, sometimes substantially so. When a large file is to be uploaded to a storage service, a split archive can be created, turning one 40GB file into, say, twenty 2GB files, so that network instability only requires one 2GB chunk to be re-uploaded rather than the entire 40GB file. Splitting also helps during restoration: if the provider times out whilst a single 40GB file is being downloaded, it could be difficult to retrieve the data at all.
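The trade-off can be put in numbers. The Python sketch below is purely illustrative (the `split_plan` helper is hypothetical and not part of WinRAR): it computes how many volumes a split produces and how much data must be re-sent in the worst case after one failed transfer, assuming a failure invalidates only the volume in flight.

```python
import math


def split_plan(total_gb: float, volume_gb: float) -> tuple[int, float]:
    """Return (number of volumes, worst-case GB to re-send after one failure).

    Assumes a failed transfer only invalidates the volume being sent at the
    time. With a single unsplit file, the whole file must be sent again.
    """
    if volume_gb <= 0:
        raise ValueError("volume size must be positive")
    volumes = math.ceil(total_gb / volume_gb)
    # No volume is larger than volume_gb (the last one may be smaller),
    # and none can be larger than the whole file.
    worst_case_resend = min(volume_gb, total_gb)
    return volumes, worst_case_resend


print(split_plan(40, 2))   # (20, 2): twenty 2GB volumes, at most 2GB re-sent
print(split_plan(40, 40))  # (1, 40): unsplit, the full 40GB is re-sent
```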
More advanced formats also support superior compression algorithms, often producing much smaller archives than ZIP. The ZIP support built into operating systems is quite basic and typically lacks features such as split archives and encryption, so anyone working seriously with archives will require something more substantial.
The main options are RAR5 through WinRAR, as tested in this study, and 7z through 7-Zip. RAR5 and 7z support many comparable features. In the author’s experience, 7z often achieves better compression, but it does not support recovery data, so any damage to a 7z archive can potentially result in a complete loss of data. When these archives are uploaded for off-site backup, encryption is essential; after security, the primary concern is data integrity.
The Recovery Record
What could cause a data integrity issue? When files are stored, the main culprits are bit rot, general disk errors, and disk failure.
Bit rot is the gradual, natural degradation of stored data, caused for example by the decay of charge within a storage device, whereby there is a chance that a stored bit flips between one and zero.
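To make the idea concrete, the Python sketch below simulates bit rot by inverting one bit in a buffer. The `flip_bit` helper is hypothetical and purely illustrative: real bit rot happens on the storage medium, not in application code.

```python
def flip_bit(data: bytes, byte_index: int, bit_index: int) -> bytes:
    """Return a copy of `data` with a single bit inverted, simulating bit rot."""
    buf = bytearray(data)
    buf[byte_index] ^= 1 << bit_index  # XOR with a one-bit mask toggles exactly that bit
    return bytes(buf)


original = bytes([0b10110110])
decayed = flip_bit(original, byte_index=0, bit_index=2)  # bit 2 is 1, so it becomes 0
print(f"{original[0]:08b} -> {decayed[0]:08b}")  # 10110110 -> 10110010
```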
Given enough time, a storage device will encounter an uncorrectable error of some kind when trying to read or write data, unless it fails completely first. When an uncorrectable error is encountered, data may have been lost, depending on the type of error. If the error identifies a failed sector, and that sector is now unreadable, then any file using that sector is now corrupt. A sector is a chunk of data, typically either 512 bytes or 4096 bytes in size. These errors can be monitored through the drive’s S.M.A.R.T. data using a tool such as CrystalDiskInfo.
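The relationship between a damaged region of a file and the sectors beneath it can be sketched in Python. The `sectors_touched` helper is hypothetical and assumes, purely for simplicity, that the file is stored contiguously from the start of sector 0; it is not how a drive or file system actually locates data. It does show the converse point: a single failed sector takes out a whole 512-byte or 4096-byte run of the file.

```python
def sectors_touched(offset: int, length: int, sector_size: int) -> range:
    """Indices of the sectors spanned by `length` bytes starting at `offset`.

    Simplifying assumption: the file is stored contiguously from the start of
    sector 0. Real file systems may place or fragment files differently.
    """
    if length <= 0:
        return range(0)
    first = offset // sector_size
    last = (offset + length - 1) // sector_size  # sector holding the final byte
    return range(first, last + 1)


for size in (512, 4096):
    r = sectors_touched(3000, 10_000, size)
    print(size, r.start, r.stop - 1, len(r))
# 512 5 25 21
# 4096 0 3 4
```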
These phenomena are somewhat rare, but if the data is valuable, action must be taken to detect and recover from them. This article is not intended as a deep dive into backup strategies, but the first line of defence is to store the data across multiple devices: any storage device can fail at any time, so data stored on only one disk is unsafe. If using only two storage devices, ensure they are isolated from each other to mitigate the risks of accidental deletion and malware.
A backup strategy can include RAR5 archives with recovery records, such that any missing data can be recovered – but just how useful is the recovery record?
A recovery record must be added to an archive whilst it is intact. The recovery record is then used to resolve any corruptions encountered later.
How do we know if an archive has been corrupted?
Knowing Whether An Archive Is Corrupt
Good archive formats such as RAR5 contain checksums, which are used to verify data integrity. A manual check can be performed using WinRAR’s Test function. Extraction can also be attempted, and WinRAR will report any faults it encounters, as in the message below.
WinRAR, during an extraction, reporting archive corruption.
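The principle behind such a test can be sketched in Python. This is a conceptual model, not WinRAR’s implementation: it assumes a CRC32 checksum (which RAR5 stores for each file by default, with a BLAKE2 hash as an option) is recorded when the archive is created and recomputed from the data when it is tested. Any mismatch means the data has changed. The function names are hypothetical.

```python
import zlib


def record_checksum(data: bytes) -> int:
    """Checksum stored alongside the data when the archive is created."""
    return zlib.crc32(data)


def verify_integrity(data: bytes, stored_crc: int) -> bool:
    """Recompute the checksum at test time and compare it with the stored one."""
    return zlib.crc32(data) == stored_crc


payload = b"example file contents"
crc = record_checksum(payload)
print(verify_integrity(payload, crc))  # True

# Flip the lowest bit of byte 5 to simulate a single corrupted bit.
damaged = payload[:5] + bytes([payload[5] ^ 0x01]) + payload[6:]
print(verify_integrity(damaged, crc))  # False
```

A CRC32 reliably catches accidental changes such as a single flipped bit, but it is not a cryptographic hash, so it offers no protection against deliberate tampering.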
Even after an archive has been repaired, WinRAR may report “Unexpected end of archive”. This does not necessarily indicate a problem: in this testing, the extracted files still verified successfully after this message appeared, as confirmed against independent checksums created with LiamFootHash.
WinRAR reporting unexpected end of archive, after repairing.
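An independent check of this kind can be sketched in Python with `hashlib`. The sketch below is not LiamFootHash and makes no claim about how that tool works; it assumes SHA-256 as the hash, records a digest for every file under a folder before archiving, and compares the digests of the extracted copy afterwards. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 of a file, read in 1MB (binary) chunks so large files are
    never held in memory all at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path, relative to `root`, to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Relative paths that are missing on either side or whose contents differ."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

For example, `compare_manifests(build_manifest(Path("before")), build_manifest(Path("after")))` returns an empty list when every file matches; the folder names here are placeholders. Because paths are stored relative to each root, nested folders such as the two around the test file are compared correctly.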
Repairing an Archive
Once an archive is found to be corrupt, repairs can be attempted. Open the archive, go to Tools > Repair Archive, and click OK. A window similar to the following then appears:
WinRAR repairing 1MB of damage
Note that the original archive was named “Firmware.rar”, and the repaired archive is named “fixed.Firmware.rar”; WinRAR applies this naming convention automatically. Once the repair completes, files must be extracted from the fixed archive, not from the original, damaged one.
If the archive has damaged headers, it may not open at all from File Explorer; in this case, the repair must be started from the WinRAR graphical user interface or the command line.
Note that the presence of a fixed archive does not mean it was actually fixed; the “fixed” archive must itself be tested. The repair succeeded only if testing reports no checksum errors. If it did succeed, the fixed archive should be extracted and its contents re-archived.
The example below shows a failed repair: when the fixed archive is tested, a file reports as corrupt because of a checksum error. The “unexpected end of archive” and corrupt recovery record messages are not important here.
WinRAR fixed archive reports as corrupt when tested.
Recovery Testing Introduction
In the next section, an archive is created containing a single file nested inside two folders. This file, a PlayStation 3 firmware, was randomly selected for testing, and the folder structure is simply how it was found. All tests are therefore performed on an archive containing only one file.
Two archives are tested, identical except for their recovery records: Firmware.rar, with a 1% recovery record, and Firmware5.rar, with a 5% recovery record. Sequential and sparse corruptions, explained in further detail in the discussion, are applied to each. Although the user specifies a percentage for the recovery record, WinRAR decides its exact size.
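For a sense of scale, the Python sketch below converts a requested percentage into a nominal size, assuming the percentage is taken relative to the archive size. Because WinRAR decides the exact size itself, these figures are approximations only. The helper is hypothetical, the 1GB archive is an illustrative example rather than one of the test archives, and sizes use binary prefixes (1MB = 1,048,576 bytes).

```python
MB = 1024 ** 2  # binary prefix: 1MB = 1,048,576 bytes


def nominal_recovery_bytes(archive_bytes: int, percent: float) -> int:
    """Nominal recovery record size for a requested percentage of the archive.

    WinRAR chooses the actual size, so treat this only as a rough estimate.
    """
    return round(archive_bytes * percent / 100)


archive_bytes = 1024 * MB  # hypothetical 1GB archive
for percent in (1, 5):
    size = nominal_recovery_bytes(archive_bytes, percent)
    print(f"{percent}% -> {size / MB:.2f}MB")
# 1% -> 10.24MB
# 5% -> 51.20MB
```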
The corruptions are performed by overwriting bytes within the archive file. To do this, a brief .NET 6 console application was created. This program opens the file stream and writes random noise to the appropriate positions.
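The author’s tool was a .NET 6 program whose source is not reproduced here. As a rough Python equivalent, the sketch below shows the two kinds of damage tested: a sequential corruption overwrites one contiguous run of bytes, while a sparse corruption spreads a similar amount of damage over many small, separate runs. The function names, the seeded random generator, and the evenly spaced sparse layout are assumptions for illustration only, and it should be run on a copy of an archive, never the original.

```python
import os
import random


def write_noise(path: str, offset: int, length: int, rng: random.Random) -> None:
    """Overwrite `length` bytes at `offset` with random noise, in place."""
    if offset < 0 or offset + length > os.path.getsize(path):
        raise ValueError("damage must lie within the existing file")
    with open(path, "r+b") as f:  # "r+b": read/write without truncating the file
        f.seek(offset)
        f.write(rng.randbytes(length))  # random.Random.randbytes needs Python 3.9+


def corrupt_sequential(path: str, offset: int, length: int, seed: int = 0) -> None:
    """One contiguous block of damage, such as a run of failed sectors."""
    write_noise(path, offset, length, random.Random(seed))


def corrupt_sparse(path: str, total: int, pieces: int, seed: int = 0) -> None:
    """Spread roughly `total` bytes of damage over `pieces` evenly spaced runs.

    Each run is total // pieces bytes long, so any remainder is dropped.
    """
    if pieces <= 0:
        raise ValueError("pieces must be positive")
    size = os.path.getsize(path)
    run = total // pieces
    stride = size // pieces
    if run == 0 or run >= stride:
        raise ValueError("runs would be empty, touching, or overlapping")
    rng = random.Random(seed)
    for i in range(pieces):
        write_noise(path, i * stride, run, rng)
```

For example, `corrupt_sequential("copy.rar", offset=4096, length=1024 * 1024)` would overwrite 1MB of a hypothetical copy of the archive, starting 4kB into the file. Seeding the generator makes each corruption reproducible between the 1% and 5% archives.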
All prefixes such as k (kilo) and M (mega) are used as binary prefixes, meaning they