
PeaZip is a free, cross-platform file archiver and hashing utility that provides a unified, portable GUI for many Open Source technologies such as 7-Zip, FreeArc, PAQ, UPX...
Create 7Z, ARC, BZ2, GZ, *PAQ, PEA, QUAD/BALZ, TAR, UPX, WIM, XZ, ZIP files
Open and extract over 180 archive types: ACE, ARJ, CAB, DMG, ISO, LHA, RAR, UDF, ZIPX files and more...
PeaZip features include: extract, create and convert multiple archives at once, create self-extracting archives, split/join files, strong encryption with two-factor authentication, encrypted password manager, secure deletion, find duplicate files, calculate hashes, and export job definitions as scripts.


How to find duplicate files



Data deduplication, meaning identifying and (possibly) removing duplicate content, is important to reduce disk usage without loss of information, since the data being removed still exists in other copies. It keeps the size of backups under control, possibly speeding up the backup process and saving space on backup media, and reduces the final size of archives. Some compressors push the principle further and integrate mechanisms to identify and remove duplicate data blocks in order to improve the compression ratio.

Search for duplicate files

When browsing a filesystem, the file browser can show each file's checksum / hash value on demand in the last column, making it possible to identify binary identical files, which have the same checksum / hash value.
Clicking the name of the function (in the context menu, "File tools" group) displays the hash or checksum value for all (or selected) files.
Clicking "Find duplicates" displays size and hash or checksum value only for duplicate files, that is, the same binary identical content found in two or more distinct files, and reports the number of non-unique files identified.

In both cases, sorting by the CRC column groups all files (in the same folder, or matching the same search filter) with identical hash or checksum.
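
The grouping described above can be sketched outside the application as follows. This is a minimal illustration, not PeaZip's implementation: the function names `file_digest` and `find_duplicates` are hypothetical, SHA-256 is an assumed default, and grouping by file size before hashing is an added shortcut so that files with a unique size are never read.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, algorithm="sha256"):
    """Group regular files under root by (size, digest); keep groups of 2+ files."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            groups[(size, file_digest(path, algorithm))].append(path)

    # Only non-unique content is reported, as in a "find duplicates" listing.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates("some/folder")` returns a dictionary keyed by `(size, digest)`, where each value lists the paths sharing that content; the folder name here is only a placeholder.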

The verification function used to deduplicate files can be set in the main application's menu (Organize, Browser, Checksum/hash). A wide selection of algorithms is available, ranging from simple checksum functions such as Adler32 and the CRC family (CRC16/24/32 and CRC64), to hash functions like eDonkey/eMule, MD4, MD5 and SHA-1, and cryptographically strong hashes such as Ripemd160, SHA-2 (SHA224/256/384 and SHA512) and Whirlpool512.
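
To make the difference between checksum and hash functions concrete, here is a small sketch using only Python's standard library. The helper name `checksum_or_hash` is hypothetical, and it covers only a subset of the algorithms listed above (those that `zlib` and `hashlib` are guaranteed to provide); it is not how PeaZip selects its verification function.

```python
import hashlib
import zlib


def checksum_or_hash(data, algorithm):
    """Return a lowercase hex string for data using the named algorithm."""
    name = algorithm.lower()
    if name == "adler32":
        # 32-bit checksum: fast, but unrelated inputs can easily share a value.
        return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")
    if name == "crc32":
        # 32-bit cyclic redundancy check, the value stored in ZIP archives.
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    if name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512"):
        # Hash functions produce longer digests (128 to 512 bits here).
        return hashlib.new(name, data).hexdigest()
    raise ValueError(f"unsupported algorithm: {algorithm}")
```

A 32-bit checksum has only about 4.3 billion possible values, so accidental matches become plausible on large file sets, whereas a 256-bit or 512-bit digest makes them negligible in practice.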

When browsing an archive, this on-demand verification is not available, but some archive types provide the same integrity-checking information by saving a pre-computed checksum or hash value for each archived object, depending on the archive format and on the archival settings employed (e.g. CRC32 in ZIP archives). This makes it possible to sort the archive content by the CRC column to group identical files and find duplicates.
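
As an illustration of using the checksum already stored in an archive, the sketch below reads the CRC32 and uncompressed size that the ZIP format records for each entry, via Python's standard `zipfile` module, without extracting anything. The function name `zip_duplicate_candidates` is hypothetical. Since CRC32 is only 32 bits wide, matching entries are best treated as candidates to confirm rather than proven duplicates.

```python
import zipfile
from collections import defaultdict


def zip_duplicate_candidates(zip_path):
    """Group ZIP entries by (uncompressed size, stored CRC32); keep groups of 2+."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            groups[(info.file_size, info.CRC)].append(info.filename)
    return {key: names for key, names in groups.items() if len(names) > 1}
```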

Find similar images

When browsing a filesystem, PeaZip can display image thumbnails to help deduplication: in the context menu, under Organize, check the show picture thumbnails option, or select a file browser preset style that shows thumbnails.
While checksum/hash based inspection finds exactly identical files (and images), thumbnails let the user visually detect similar images (e.g. the same picture or graphic saved in different formats, with different color depth or compression settings, or scaled to different sizes), to help decide whether the (pseudo) duplication is acceptable, and which copy (or version) to keep or delete.
As a rule of thumb for deleting extra versions, the best quality image (higher resolution, lower compression, or possibly a lossless format such as RAW, BMP, TIFF, PNG) should be kept, discarding lower quality copies: once lost, information/quality cannot be recreated.

Verify multiple checksum and hash values at once

The Check files utility in the "File tools" submenu (context menu) computes multiple hash and checksum algorithms for multiple files at once, so the values can be compared.
Employing multiple functions, and relying on cryptographically strong hash algorithms such as Ripemd, SHA-2 and Whirlpool, can identify even malicious attempts to forge identical-looking files, detecting differences that would go undetected by weaker algorithms, for which collisions are easier to find.
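
The idea of checking several functions at once can be sketched as below: every chunk read from disk is fed to all selected hash objects, so the file is read only once regardless of how many algorithms are used. The names `multi_digest` and `same_content_by_all` are hypothetical, and the default algorithm lists are assumptions chosen from what Python's `hashlib` always provides.

```python
import hashlib


def multi_digest(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Compute several digests of one file in a single read pass."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def same_content_by_all(path_a, path_b, algorithms=("sha256", "sha512")):
    """True only if every selected digest matches for both files."""
    return multi_digest(path_a, algorithms) == multi_digest(path_b, algorithms)
```

Requiring all digests to match means a forged file would need to collide under every selected function simultaneously, which is far harder than defeating a single weak one.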

Byte-to-byte comparison (alternative deduplication method)

The Compare files utility in the "File tools" submenu performs a byte-to-byte comparison between two files; unlike the checksum/hash method it is not subject to collisions under any circumstances, and it can find and report exactly which bytes differ, so it tells not only whether two files are identical, but also what changes were made to the content between the two versions.
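
A byte-to-byte comparison of this kind can be sketched as follows. The function name `compare_files`, its return shape, and the `max_diffs` cap on reported differences are illustrative assumptions, not a description of PeaZip's Compare files utility. The sketch assumes regular files, where each read returns a full chunk until the end of the file.

```python
def compare_files(path_a, path_b, max_diffs=10, chunk_size=1 << 16):
    """Compare two files byte by byte.

    Returns (identical, diffs), where diffs holds up to max_diffs tuples of
    (offset, byte_in_a, byte_in_b); a byte is None past the end of a file.
    """
    diffs = []
    identical = True
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files are exhausted
            if a != b:
                identical = False
                for i in range(max(len(a), len(b))):
                    byte_a = a[i] if i < len(a) else None
                    byte_b = b[i] if i < len(b) else None
                    if byte_a != byte_b and len(diffs) < max_diffs:
                        diffs.append((offset + i, byte_a, byte_b))
                if len(diffs) >= max_diffs:
                    break  # enough differences collected, stop reading
            offset += max(len(a), len(b))
    return identical, diffs
```

Unlike a digest comparison, this reads both files in full when they are identical, so it is typically reserved for confirming a pair that a hash has already flagged.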

Read more: checksum and hash functions on Wikipedia.


© PeaZip srl: TOS, Privacy