Snap! Websites
An Open Source CMS System in C++
In the last two days, I worked on a Jira task assigned to me related to shredding files. Whenever you purge our project, it deletes a lot of files, including all the logs, configuration files, etc. By default, though, the purge command will just do:
rm -rf <path> ...
This generally works, only it is much better if you can also make sure that the content of the file is not recoverable. Under Unix systems, it has often been close to impossible to recover files because the OS was very quick at reusing just-released blocks for new data. Not only that, the references to the data blocks would be deleted for real (unlike under FAT, say, where a block of block references sticks around for a while).
Because of that, most people have always considered that a deleted file was just gone.
Now we've seen a large number of tools appear and be used to search a disk for data. Such tools ignore the existing file system and read all sectors that do not contain just zeroes or some fill pattern (e.g. 0x55AA). And although such tools first became prominent under Microsoft operating systems, many have existed for pretty much every OS ever created.
So... whenever a regulation or recommendation asks you to delete all confidential data, especially data that could identify your website or app users, you are expected to do something to really get rid of that data.
The new tool presented below, shredlog, is part of the snaplogger project.
For that reason, the shred tool was created. It's a simple executable which opens a file, overwrites the data with random data (or patterns, if you supply your own input device), and optionally deletes the file after first renaming it. The renaming works just like the overwriting of the data in the file: by renaming, you overwrite the name, which means that the old name is lost.
Further, the file is truncated before getting deleted. Why? Because that way the original size cannot be recovered either.
All of that work is done with the sync feature turned on. In other words, it requires the kernel to write everything directly to disk, immediately. Without that synchronization, the kernel would likely cache everything and only save the last state. If you then delete the file, the overwrites may never reach the disk at all (in other words, the sectors holding the sensitive data would still be holding said sensitive data).
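To illustrate, here is a minimal sketch of that sequence using standard shell commands (the file name is hypothetical, and the whole file is rewritten in a single pass, which assumes it fits in memory):

SIZE=$(stat --format=%s secure.log)
# overwrite the existing content with random data, synchronously (oflag=sync)
# and without truncating the file first (conv=notrunc)
dd if=/dev/urandom of=secure.log bs="$SIZE" count=1 conv=notrunc oflag=sync
# truncate so the original size is lost
truncate --size=0 secure.log
# rename so the original name gets overwritten, then delete
mv secure.log x && sync && rm x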
I guess the main reason shredding is not the default is that, in most cases, files being deleted do not require shredding in the first place. However, when they do, it most often doesn't come to mind.
In our case, we wanted to be able to shred certain files before they get deleted on a purge of a package. In other words, if you uninstall Snap! from your computers, some files, specifically the secure logs and Cassandra database files, get shredded. This is completely automatic, so you do not have to worry about it.
Further, the log rotation needs to shred files before deleting them. That too we do: we already have the shred option in the logrotate configuration file.
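For reference, logrotate supports this directly with its shred directive; a minimal sketch of such an entry (the path and counts are illustrative):

/var/log/snapwebsites/secure.log {
    weekly
    rotate 4
    # overwrite rotated logs before removing them instead of simply unlinking them
    shred
    # number of overwrite passes
    shredcycles 3
}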
Is shred still useful, then? Less and less. There are several reasons why the shred utility does not work so well anymore.
First, and probably the most important reason: newer file systems do not overwrite data in place. Instead, they write new data in new sectors first; this is how journaling (and copy-on-write) file systems work. That way, if something goes wrong, you do not lose all your data, only the new data that could not be saved.
The other reason for the shred feature to fail is the disk drive controller, which may decide to write your data in a new block region. This happens a lot on SSD devices. For increased speed and better use of your drive in general, an SSD writes data to a new block nearly every single time. This avoids a whole erase + rewrite cycle, which saves a lot of time. Not only that, it means all cells on the SSD are likely to age at a similar speed, a technique known as wear leveling (otherwise some cells would age very quickly and fail much sooner).
Because of these two features, the likelihood that the shred utility is still useful in your situation is really low. This is why we created shredlog instead: a form of indirection which you can control through options in a configuration file.
Note that writes to SSD drives are very fast nowadays, so speed is not a huge worry. However, more writes mean that the drive ages faster.
The number of writes to a cell in your SSD drive is limited. Some drives may have limits as low as 1,000 write cycles, while others support a much higher number, such as 100,000 writes. Either way, each time you send a write command, you reduce your drive's lifetime (at least in terms of writes).
So not only will you not be able to shred anything on an SSD, but you would also waste many write cycles if you were to use that feature.
So what does shredlog offer? First, it is perfectly capable of deleting an entire tree of directories and files.
But the most important feature is that it automatically checks whether a drive is rotational or not. However, this is close to useless in practice because virtual servers (what you get 99% of the time now, through Amazon, DigitalOcean, Google Cloud, etc.) all show their hard drive as an HDD (rotational) drive even if the physical drive is an SSD.
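Under Linux, that flag can be checked through sysfs; for example (sda being whichever device you want to test):

cat /sys/block/sda/queue/rotational
# prints 1 for a rotational (HDD) drive, 0 for a non-rotational (SSD) drive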
So for now the default is for the shredlog tool to just do a regular rm command.
shredlog still adds recursivity, and it can easily be switched to use the shred command, so we still think it is quite useful.
The command is documented in our references (near the start of the source file, search for "/** \file"). Most of the command line options come from the shred tool directly, such as the --zero | -z option, which adds a last pass that writes all zeroes over your files.
If you are already using shred, then just add "log" at the end and it will work the same magic.
You can then tweak options in the configuration file (/etc/snaplog/shredlog.conf) or in the SHREDLOG environment variable (don't forget you probably want to export that variable).
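For example, a typical invocation could look like this (the path is hypothetical):

# recursively shred the files of a directory tree, with a final pass of zeroes
shredlog --recursive --zero /var/log/snapwebsites/secure
# options can also come from the environment
export SHREDLOG="--force"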
Right now the configuration file sets "delete" as the default (instead of "auto").
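As a sketch, the setting could look like the following (the exact variable name may differ; check /etc/snaplog/shredlog.conf itself):

# "delete" always does a plain rm; "auto" lets shredlog choose between rm and shred
# (the variable name shown here is illustrative)
mode=delete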
That way, over time, if it becomes possible to use shred again, you will be able to update the configuration settings in one place and get all your shredding done automatically.
If you have rm commands you would like to convert, it is very simple too:
The -r in rm -r ... means recursive. The shredlog command accepts the same option as --recursive | -r.
The rm command is also often used with --force | -f. shredlog has that same command line option too.
If you used the rmdir executable instead, then you must avoid the --force | -f option. That way shredlog will also use rmdir to delete directories and will keep non-empty directories intact.
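As an illustration, the conversions look something like this (paths are hypothetical):

rm -rf /var/cache/example     becomes     shredlog -rf /var/cache/example
rm -r /var/cache/example      becomes     shredlog -r /var/cache/example
rmdir /var/cache/example      becomes     shredlog /var/cache/example

(The last one, without -f, keeps a non-empty directory just like rmdir would.)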
Since shredding is not really possible on most servers, we needed a workaround to still make our data secure, especially against others inheriting our hard drive(s).
The current method is to switch to encryption. This can be done at the folder, partition, or whole drive level. Encryption means that the data on disk is unreadable if (1) you do not have an entire block of it or (2) you do not have the secret key.
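For instance, at the partition level under Linux, this can be done with LUKS; a minimal sketch (sdb1 is an illustrative device, and luksFormat destroys its current content):

# set up the encrypted container (asks for a passphrase)
cryptsetup luksFormat /dev/sdb1
# open it and create a file system on the decrypted mapping
cryptsetup open /dev/sdb1 secure_data
mkfs.ext4 /dev/mapper/secure_data
mount /dev/mapper/secure_data /mnt/secure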
There is one huge drawback in regard to encryption: for the OS to get started, it needs to have the encryption/decryption key. In other words, you somehow have to enter a password each time you reboot your system. In itself, this is not that bad, but if you manage many computers, trust me, that's work and probably the boring kind. Please do not encrypt using an empty password or offer the computer a way to download the password from somewhere; such solutions mean that any hacker has access to your password and can use it to decrypt all your encrypted data.
With encryption, you can simply delete a file. Without the key, the file is already unreadable. Shredding it would not help.