Categorized | cli, System

Find duplicate files in Linux

Let’s say you have a folder with 5000 MP3 files you want to check for duplicates. Or a directory containing thousands of EPUB files, all with different names, but you have a hunch some of them might be duplicates. You can cd your way to that particular folder in the console and then run:

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

This will output a list of files that are duplicates, according to their MD5 hash.
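Broken out pipe by pipe, the one-liner works in two passes: it groups files by size first, then hashes only the files whose sizes collide. A commented sketch of the same pipeline (GNU find and coreutils assumed; searches the current directory):

```shell
#!/bin/sh
# 1. Print the byte size of every non-empty regular file.
# 2. Sort the sizes and keep only those that occur more than once (uniq -d).
# 3. For each duplicated size, re-find the files of exactly that size (-size {}c).
# 4. Hash the candidates and sort so identical hashes end up adjacent.
# 5. uniq -w32 compares only the first 32 characters of each line (the MD5),
#    and --all-repeated=separate prints each group of duplicates as a
#    blank-line-separated block.
find . -not -empty -type f -printf "%s\n" |
  sort -rn |
  uniq -d |
  xargs -I{} -n1 find . -type f -size {}c -print0 |
  xargs -0 md5sum |
  sort |
  uniq -w32 --all-repeated=separate
```

The size pre-filter is what keeps this fast: files with a unique size can never have a duplicate, so they are never hashed at all.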
Another way is to install fdupes and do a

fdupes -r ./folder > duplicates_list.txt

The -r flag makes the search recursive. Afterwards, check duplicates_list.txt in a text editor for the list of duplicate files.
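Beyond listing, fdupes can also act on what it finds. A few illustrative invocations, assuming fdupes is installed (flags taken from its man page; the deletion one is destructive, so test on a copy first):

```shell
fdupes -r ./folder      # list duplicate groups on stdout
fdupes -rS ./folder     # also show the size of the files in each duplicate set
fdupes -rm ./folder     # print only a summary of duplicates and wasted space
fdupes -rdN ./folder    # DANGEROUS: keep the first file in each set, delete the rest, no prompt
```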

Author Profile

T4L



  1. Thanks so much for posting this! This is exactly what I was looking for. Very useful.

  2. Nate Chapman

    Wow this is awesome, thanks! Is there any way you could send me an email that explains this code? That would be a huge help.

  3. Ross

    FYI, on cygwin, I had to modify the command to be:

    find . -type f -printf '%s\n' | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

  4. Ross

    The above one-liner is O(n**2) in the number of nodes of the filesystem. In addition to the original `find` command, a separate `find` command must be run on each found duplicate. This is problematic if you’re running on a slow filesystem and/or through cygwin.

    Here is a modified command line that only invokes one `find` command total:

    find . -type f -exec stat --printf='%32s ' {} \; -exec md5sum {} \; | sort -rn | uniq -d -w65 --all-repeated=separate


  5. I’ve further optimized this to eliminate md5sums for all but the files that match other files in this post.


  6. +1 for fdupes! I was not aware of this tool, and it is VERY handy! It also allows you to delete the duplicates on the fly :)

  7. pbl

    Thanks. fdupes is just what I needed to seek out clutter. Didn’t know it, but I knew there ought to be a nice way to do that recursively.

