
Author Topic: Cleaning up a nightmare!! Best tools to sort and de-dupe?  (Read 1480 times)

mschubb

  • Newbie
  • *
  • Offline Offline
  • Posts: 26
Cleaning up a nightmare!! Best tools to sort and de-dupe?
« on: February 07, 2020, 12:17:30 pm »

Need to clean up a family member's archive of pictures, which is a complete mess. Multiple dupes in multiple folders, random back-ups of back-ups, some maybe not backed up... tens of thousands of files all jumbled together across multiple drives with no consistent labeling.

Would really appreciate any advice or recommendations for software tools and workflow.  A simple duplicate file finder isn't sophisticated enough for this job, and I am hoping someone who has tackled a mess like this can share some wisdom.  How can I efficiently select and delete many thousands of dupes in order to create a single archive that can then be backed up as a whole?

Thank you in advance for any help!

Mark
Logged

kers

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 2925
    • Pieter Kers
Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
« Reply #1 on: February 07, 2020, 01:52:18 pm »

I have a friend with the same problem... Is it a Mac or Windows?
Anyway, it will cost you a LOT of work to organize it, or... you can just put everything on a hard disk and try NeoFinder.
It makes a catalogue of the hard disk, images and other things, and you can search for similar photos etc.
You could also choose Media Pro to do that, but it is 32-bit and stops working at macOS 10.15.
Logged

Bart_van_der_Wolf

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 8573
Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
« Reply #2 on: February 07, 2020, 02:13:41 pm »

Hi Mark,

I don't know if there is an application that can do all the work, but there may be an approach to create some order in the chaos.

One step could be to copy all files into a single directory/folder. You could use an application that detects exact duplicates based on a hash value, so differently named files with the same content will be skipped.

That will create a de-duplicated list of files. Maybe that's all you need, if so then you're done.
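The hash-based step can be sketched in a few lines of Python. This is only a minimal sketch of the idea, not how any particular application does it: the function names are made up for illustration, SHA-256 is just one reasonable choice of hash, and nothing is deleted or copied. It only reports which files have identical content, whatever they are called.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_content(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to every file under root with that content."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return dict(groups)


def duplicates(root: Path) -> list[list[Path]]:
    """Return only the groups that contain more than one file."""
    return [paths for paths in group_by_content(root).values() if len(paths) > 1]
```

Calling `duplicates(Path("D:/all_photos"))` (a hypothetical path) would return one list per set of identical files. Hashing tens of thousands of images takes a while, so trying it on a small folder first seems sensible.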

If you want to preserve some of the folder structure you can delete all files which have a duplicate in the above created single de-duplicated folder, except for one folder you want to keep intact.

That will allow you, step by step, to keep, empty, or eliminate folders until only the one with the correct structure is left. You then copy the files that have no duplicate there from the de-duplicated folder to their new destination.

A free application that can assist with managing this process, in Backup mode, is FreeFileSync:
https://freefilesync.org/

Cheers,
Bart

Logged
== If you do what you did, you'll get what you got. ==

mschubb

  • Newbie
  • *
  • Offline Offline
  • Posts: 26
Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
« Reply #3 on: February 07, 2020, 03:21:47 pm »

Thanks, Kers and Bart.  Very helpful advice.  Should have included that I'll be working in Windows, so can't use Neofinder.  Will definitely check out FreeFileSync.

I can drag all the various folders onto a single drive.  So maybe the workflow after that looks like this: 
  • Start by designating the most complete/clean folder as the keeper folder
  • Use an app to find and delete dupes within that main folder and its internal subfolders
  • Then hopefully I will be using an app that can identify all dupes NOT in the keeper folder and then A) batch delete only the external dupes, and B) batch delete any remaining external folders that are empty
  • Which leaves the final step -- manually going through all the remaining external folders and sorting that stuff into the keeper folder 
Shouldn't take more than 2 - 3 years.    ;D
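The "external dupes" and "empty folders" steps above could look roughly like this in Python. It is a sketch under assumptions, not a finished tool: the function names are invented for illustration, "dupe" means byte-for-byte identical content (compared by SHA-256), and the keeper and external folders are assumed to be separate trees. It only lists candidates, so nothing is deleted until the lists have been reviewed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def external_dupes(keeper: Path, external: Path) -> list[Path]:
    """List files under `external` whose content already exists under `keeper`."""
    kept = {sha256_of(p) for p in keeper.rglob("*") if p.is_file()}
    return [p for p in sorted(external.rglob("*"))
            if p.is_file() and sha256_of(p) in kept]


def empty_dirs(root: Path) -> list[Path]:
    """List directories under `root` that contain no files at any depth."""
    return [d for d in sorted(root.rglob("*"))
            if d.is_dir() and not any(c.is_file() for c in d.rglob("*"))]
```

After checking the output, each path from `external_dupes` could be removed with `Path.unlink()`. Whatever remains in the external folders is the material that still needs manual sorting.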
Logged

mcbroomf

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 978
    • Mike Broomfield
Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
« Reply #4 on: February 07, 2020, 04:32:35 pm »

I don't know if it would work, but if you create a new catalog in Lightroom (if you have it) and import all files to a single new directory (i.e. use Copy) with Don't Import Duplicate Files turned on, then you may just end up with a single set of files.  Then sorting starts...
Try it on a small subset with known dupes first...
Logged

mschubb

  • Newbie
  • *
  • Offline Offline
  • Posts: 26
Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
« Reply #5 on: February 07, 2020, 07:02:36 pm »

Thanks, Mcbroomf.  I do use Lightroom.   Never imagined it handling such a large import while crosschecking for dupes... but will check it out.   And using a new LR catalog may be a great idea for doing the final round of de-dupes, sorting and reorganizing.   
Logged

BobShaw

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 1671
    • Aspiration Images
Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
« Reply #6 on: February 07, 2020, 07:18:16 pm »

First I suggest that you get a unique name for each different file. Otherwise there will probably be a mass of images with the same file name and you will delete the wrong stuff.

To do that, select a bunch of files and rename them using a suitable tool to put the SHOOTING DATE at the front in year-month-day order. I use the Canon Digital Photo Professional Rename Tool.
So instead of IMG_0001.cr2 or whatever you have 2007-10-03_IMG_0001.cr2

You can then sort them in order. Drag all of the 2007 files into a folder called 2007 etc. Repeat.
You can then create further subfolders. I generally use YYYY-MM-DD_FirstImageName

If the images are scanned then the date will be the scanned date, but they should be easy to identify.
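For anyone without a rename tool, the naming scheme above can be sketched in Python. This is an illustration only, and it makes one loud assumption: the file's modification time stands in for the shooting date, because reading the real EXIF date needs a third-party library. The function names are made up. The renames are planned first so they can be reviewed before anything moves.

```python
from datetime import date, datetime
from pathlib import Path


def dated_name(original_name: str, shot_on: date) -> str:
    """Prefix a file name with the date as YYYY-MM-DD."""
    return f"{shot_on.isoformat()}_{original_name}"


def plan_renames(folder: Path) -> list[tuple[Path, Path]]:
    """Return (old, new) pairs that move each file into a per-year subfolder.

    Assumption: the modification time stands in for the shooting date.
    """
    plan = []
    for path in sorted(folder.iterdir()):
        if path.is_file():
            shot_on = datetime.fromtimestamp(path.stat().st_mtime).date()
            target = folder / str(shot_on.year) / dated_name(path.name, shot_on)
            plan.append((path, target))
    return plan


def apply_plan(plan: list[tuple[Path, Path]]) -> None:
    """Carry out the renames, refusing to overwrite an existing file."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(new)
        new.parent.mkdir(parents=True, exist_ok=True)
        old.rename(new)
```

With this, `dated_name("IMG_0001.cr2", date(2007, 10, 3))` gives `2007-10-03_IMG_0001.cr2`. Files that have been copied around may carry a modification time that is not the shooting date, so the plan is worth checking by eye before calling `apply_plan`.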
Logged
Website - http://AspirationImages.com
Fine Art Photography

Joe Towner

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 1176
Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
« Reply #7 on: February 09, 2020, 04:23:07 pm »

The Lightroom duplicate check may not work if the dupes are in the same import batch.

Might also look at Mylio as the central system for this.  For one, it keeps your stuff out of it, and you can share the sorting work with others.  Figure out how much storage you'd need, consolidate all the files to a single new drive, point Mylio at it and see how it goes.
Logged
t: @PNWMF

mschubb

  • Newbie
  • *
  • Offline Offline
  • Posts: 26
Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
« Reply #8 on: February 10, 2020, 12:02:25 pm »

Thanks Bob and Joe.  Good heads up about duplicate names... this mess also includes phone photos from multiple family members, so duplicate names/numbers are likely.  And Mylio looks interesting... will check it out.   Reminds me of the now-defunct Picasa, which was a friendly and very usable photo organizer back in its day.
Logged

kers

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 2925
    • Pieter Kers
Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
« Reply #9 on: February 10, 2020, 03:54:34 pm »

Ah, NeoFinder is not an option on Windows, alas. It is a very nicely made program.

But I think there are some suggestions in this thread that might work...
What I would do:

The easiest way:
1 Put the old material that you do not need every day in one folder (with a sensible name) on a hard disk (duplicated, so 2 hard disks as an archive: Aorg and Acopy).
2 Let some kind of catalogue program do its job on it, so you can find duplicates and see thumbnails.
3 Make sure that from now on things are organized: every photo gets a unique name/number.

The idea is that you only search when you need something from the archive, instead of finding and sorting it all, which can be a tremendous amount of work.
Your new work is easily found from now on.

PS: how I work:

After making photographs I give them all a unique name, with some references in the name and in the folder they are in.
If the photo is a PSD or PSB file, it is a master file. Other files are made from it.
So I keep at least the RAW and the PSD/PSB.
But there are different working methods, and any of them is fine as long as you apply it steadily and systematically.


« Last Edit: February 10, 2020, 04:02:55 pm by kers »
Logged