file deduplication

File system-based deduplication

File system-based deduplication is a simple way to reduce duplicate data at the file level. It is usually just a compare operation within the file system, or a file system-based algorithm that eliminates duplicates.

For example, when two files with the same name are stored in a system, you can compare their size, type and date-modified information.

If these parameters match, you can be pretty sure that the files are copies of each other and that you can delete one of them with no problems. This isn’t a foolproof method of data deduplication, because matching metadata does not prove that the contents are identical. Still, it can be done with any operating system and scripted to automate the process, and best of all, it’s free.
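As a rough sketch of how such a script might look, the Python below groups files under a directory by name, size, type (taken here to be the file extension) and modification time. The function names `metadata_key` and `find_metadata_duplicates` are made up for this illustration, and truncating the modification time to whole seconds is an assumption, not part of the method described above.

```python
from collections import defaultdict
from pathlib import Path


def metadata_key(path):
    """Build a comparison key from file metadata only (contents are never read)."""
    st = path.stat()
    # Assumption: "type" means the file extension, and timestamps are
    # compared at whole-second resolution.
    return (path.name, st.st_size, path.suffix.lower(), int(st.st_mtime))


def find_metadata_duplicates(root):
    """Return groups of files under root whose metadata keys are identical."""
    groups = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            groups[metadata_key(p)].append(p)
    # Only groups with more than one member are duplicate candidates.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

The sketch only reports candidate groups and deletes nothing, which leaves room to review the matches before removing a file.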

File-based delta versioning and hashing

More intelligent file-level deduplication methods look inside individual files and compare differences within the files themselves, or compare updates to a file and store only the differences as a “delta” to the original file.
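To make the “delta” idea concrete, here is a minimal Python sketch that works on lists of text lines. It uses the standard library's `difflib.SequenceMatcher` to record which line ranges of the original can be reused and which lines are new. The helper names `make_delta` and `apply_delta` and the tuple-based delta format are invented for this example; real versioning systems use their own formats.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Describe new_lines as copy ranges from old_lines plus newly added lines."""
    delta = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: store only the index range, not the text.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Changed or new run: store the new lines themselves.
            delta.append(("add", new_lines[j1:j2]))
        # "delete" emits nothing, so the removed lines are simply skipped.
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the new version from the original lines and a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out
```

Only the original file and the delta need to be kept. Calling `apply_delta(old_lines, make_delta(old_lines, new_lines))` reproduces `new_lines`.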

  • File versioning associates updates with a file and stores only the deltas as other versions.
  • File-based hashing creates a mathematical “hash” representation of each file, and then compares the hashes of new files to those of files already stored. If the hashes match, the files are almost certainly the same (with a strong hash function, two different files are extremely unlikely to share a hash), and one can be removed.
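The hashing approach can be sketched in a few lines of Python. This example assumes SHA-256 from the standard `hashlib` module as the hash function, which is one common choice and not one named above. The names `file_digest` and `find_hash_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_hash_duplicates(root):
    """Return groups of files under root whose contents hash to the same digest."""
    by_digest = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_digest[file_digest(p)].append(p)
    # Only digests shared by more than one file indicate duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Unlike a metadata comparison, this finds copies even when their names or timestamps differ, at the cost of reading every file in full.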


