
Find duplicate files in Linux

Let’s say you have a folder with 5000 MP3 files you want to check for duplicates. Or a directory containing thousands of EPUB files, all with different names, but you have a hunch some of them might be duplicates. You can cd into that particular folder in a terminal and run:

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

This will output a list of files that are duplicates, grouped by their MD5 hash.
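The one-liner is easier to follow when split across lines. Here is a commented sketch of the same size-then-hash approach, assuming GNU find, md5sum and uniq; the three demo files are made up for illustration:

```shell
#!/bin/sh
# Demo of the size-then-hash duplicate finder (GNU tools assumed).
# Build a scratch directory with two identical files and one unique file.
dir=$(mktemp -d)
printf 'hello world\n' > "$dir/a.txt"
printf 'hello world\n' > "$dir/b.txt"
printf 'different\n'   > "$dir/c.txt"
cd "$dir"

# 1) print the size of every non-empty file
# 2) keep only sizes that occur more than once
# 3) re-find the files of those sizes
# 4) hash them
# 5) group lines whose first 32 chars (the MD5) repeat
dups=$(find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d \
  | xargs -I{} -n1 find . -type f -size {}c -print0 \
  | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate)
echo "$dups"
```

Only a.txt and b.txt appear in the output, since c.txt has a unique size and is never even hashed.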
Another way is to install fdupes and run:

fdupes -r ./folder > duplicates_list.txt

The -r flag makes the search recursive. Afterwards, open duplicates_list.txt in a text editor to see the list of duplicate files.
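fdupes prints duplicates in groups separated by blank lines, so the saved list is easy to post-process. A sketch, assuming that output format, which lists every file except the first in each group (i.e. the candidates for deletion); the sample file names and the awk logic are illustrative assumptions, not part of fdupes itself:

```shell
#!/bin/sh
# Work in a scratch directory and fake a small fdupes listing:
# groups of duplicate paths, separated by blank lines.
dir=$(mktemp -d)
cd "$dir"
cat > duplicates_list.txt <<'EOF'
./folder/song1.mp3
./folder/copy_of_song1.mp3

./folder/book.epub
./folder/book (1).epub
EOF

# Skip the first path of each group, print the rest.
to_delete=$(awk 'BEGIN { first = 1 }
  NF == 0 { first = 1; next }   # blank line: a new group starts
  first   { first = 0; next }   # keep the first file of the group
            { print }           # everything else is a removable copy
' duplicates_list.txt)
echo "$to_delete"
```

You could pipe that output to rm once you have reviewed it, though fdupes itself can also delete interactively.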

7 Comments


  1. Thanks so much for posting this! This is exactly what I was looking for. Very useful.

  2. Nate Chapman

    Wow this is awesome, thanks! Is there any way you could send me an email that explains this code? That would be a huge help.

  3. Ross

    FYI, on cygwin, I had to modify the command to be:

    find . -type f -printf '%s\n' | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

  4. Ross

    The above one-liner is O(n**2) in the number of nodes of the filesystem. In addition to the original `find` command, a separate `find` command must be run on each found duplicate. This is problematic if you’re running on a slow filesystem and/or through cygwin.

    Here is a modified command line that only invokes one `find` command total:

    find . -type f -exec stat --printf='%32s ' {} \; -exec md5sum {} \; | sort -rn | uniq -d -w65 --all-repeated=separate


  5. I’ve further optimized this to eliminate md5sums for all but the files that match other files in this post.


  6. +1 for fdupes! I was not aware of this tool, and it is VERY handy! It also allows you to delete the duplicates on the fly :)

  7. pbl

    Thanks. fdupes is just what I needed to seek out clutter. Didn’t know it, but I knew there ought to be a nice way to do that recursively.

