Luminous Landscape Forum

Equipment & Techniques => Digital Asset Management => Topic started by: mschubb on February 07, 2020, 12:17:30 pm

Title: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: mschubb on February 07, 2020, 12:17:30 pm
Need to clean up a family member's archive of pictures, which is a complete mess. Multiple dupes in multiple folders, random back-ups of back-ups, some maybe not backed up... tens of thousands of files all jumbled together across multiple drives with no consistent labeling.

Would really appreciate any advice or recommendations for software tools and workflow. A simple file duplicate finder isn't sophisticated enough for this job and I am hoping someone who has tackled a mess like this can share some wisdom. How can I efficiently select and delete many thousands of dupes in order to create a single archive that can then be backed up as a whole?

Thank you in advance for any help!

Mark
Title: Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: kers on February 07, 2020, 01:52:18 pm
I have a friend with the same problem... Is it a Mac or Windows?
Anyway, it will cost you a LOT of work to organize it, or... you can just put everything on a hard disk and try Neofinder.
It makes a catalogue of the hard disk, images and other things, and you can search for similar photos etc.
You could also choose Media Pro to do that, but it is 32-bit and stops working at macOS 10.15.
Title: Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: Bart_van_der_Wolf on February 07, 2020, 02:13:41 pm
Hi Mark,

I don't know if there is an application that can do all the work, but there may be an approach to create some order in the chaos.

One step could be to copy all files into a single directory/folder. You could use an application that detects exact duplicates based on a hash value, so differently named files with the same content will be skipped.

That will create a de-duplicated list of files. Maybe that's all you need, if so then you're done.
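A minimal Python sketch of the hash-based step described above, for anyone who prefers a script to an application. It assumes SHA-256 as the hash; the function names and the digest prefix added to each copied file name are illustrative choices, not something from a particular tool. The destination folder should sit outside the source folders, otherwise the walk may pick up its own copies.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_unique(sources: list[Path], dest: Path) -> dict[str, Path]:
    """Copy one file per distinct content hash from the sources into dest.

    Returns a mapping of digest -> path of the copy in dest.
    """
    dest.mkdir(parents=True, exist_ok=True)
    seen: dict[str, Path] = {}
    for root in sources:
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            digest = file_digest(path)
            if digest in seen:
                continue  # same content already copied, possibly under another name
            # Assumption: prefix the name with part of the digest so that
            # equally named files with different content cannot overwrite
            # each other in the single destination folder.
            target = dest / f"{digest[:12]}_{path.name}"
            shutil.copy2(path, target)  # copy2 also keeps timestamps
            seen[digest] = target
    return seen
```

Only exact byte-for-byte duplicates are skipped this way; a resized or re-saved version of the same picture has a different hash and is kept.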

If you want to preserve some of the folder structure you can delete all files which have a duplicate in the above created single de-duplicated folder, except for one folder you want to keep intact.

That will allow you, step by step, to either keep, empty, or eliminate folders, except for the one that has the correct structure. You then copy files without a duplicate from the de-duplicated folder to their new destination.
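The pruning step can be sketched in Python as well. This is only an illustration under assumptions: `reference` stands for the de-duplicated folder, `folder` for one of the messy folders, both names are hypothetical, and nothing is deleted unless `dry_run` is set to False.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def prune_duplicates(reference: Path, folder: Path, dry_run: bool = True) -> list[Path]:
    """List (and optionally delete) files in folder whose content is already in reference."""
    ref, target = reference.resolve(), folder.resolve()
    # Guard: if one folder contains the other, every file would match itself.
    if ref == target or ref in target.parents or target in ref.parents:
        raise ValueError("reference and folder must be separate trees")
    known = {sha256_of(p) for p in reference.rglob("*") if p.is_file()}
    redundant: list[Path] = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and sha256_of(path) in known:
            redundant.append(path)
            if not dry_run:
                path.unlink()  # permanent delete, does not go to the recycle bin
    return redundant
```

Running it once with the default `dry_run=True` and reading the returned list before deleting anything is the safer order.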

A free application that can assist with managing this process, in Backup mode, is FreeFileSync:
https://freefilesync.org/

Cheers,
Bart

Title: Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: mschubb on February 07, 2020, 03:21:47 pm
Thanks, Kers and Bart.  Very helpful advice.  Should have included that I'll be working in Windows, so can't use Neofinder.  Will definitely check out FreeFileSync.

I can drag all the various folders onto a single drive.  So maybe the workflow after that looks like this: 
Shouldn't take more than 2 - 3 years.    ;D
Title: Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: mcbroomf on February 07, 2020, 04:32:35 pm
I don't know if it would work, but if you create a new catalog in Lightroom (if you have it) and import all files to a single new directory (i.e. use Copy) with Don't Import Duplicate Files turned on, then you may just end up with a single set of files. Then sorting starts...
Try it on a small subset with known dupes first...
Title: Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: mschubb on February 07, 2020, 07:02:36 pm
Thanks, Mcbroomf. I do use Lightroom. Never imagined it handling such a large import while crosschecking for dupes... but will check it out. And using a new LR catalog may be a great idea for doing the final round of de-dupes, sorting and reorganizing.
Title: Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: BobShaw on February 07, 2020, 07:18:16 pm
First I suggest that you get a unique name for each different file. Otherwise there will probably be a mass of images with the same file name and you may delete the wrong stuff.

To do that, select a bunch of files and rename them using a suitable tool to put the SHOOTING DATE at the front in reverse (year-month-day) order. I use the Canon Digital Photo Professional Rename Tool.
So instead of IMG_0001.cr2 or whatever you have 2007-10-03_IMG_0001.cr2
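For those without a rename tool, a rough Python sketch of the same naming scheme follows. Big assumption: the file's modification time stands in for the shooting date here, because reading the real date from EXIF needs a third-party library; the function names are made up for the example.

```python
from datetime import datetime
from pathlib import Path


def dated_name(path: Path, shot: datetime) -> str:
    """Return the file name with a YYYY-MM-DD_ prefix (unchanged if already present)."""
    prefix = shot.strftime("%Y-%m-%d")
    if path.name.startswith(prefix + "_"):
        return path.name
    return f"{prefix}_{path.name}"


def rename_with_date(path: Path) -> Path:
    """Rename one file in place and return its new path."""
    # ASSUMPTION: modification time is used as a stand-in for the shooting
    # date. It is wrong for files that were edited or re-saved later.
    shot = datetime.fromtimestamp(path.stat().st_mtime)
    target = path.with_name(dated_name(path, shot))
    if target != path:
        if target.exists():
            raise FileExistsError(target)  # never overwrite another file
        path.rename(target)
    return target


print(dated_name(Path("IMG_0001.cr2"), datetime(2007, 10, 3)))
# prints 2007-10-03_IMG_0001.cr2
```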

You can then sort them in order. Drag all of the 2007 files into a folder called 2007, and so on for each year.
You can then create further subfolders. I generally use YYYY-MM-DD_FirstImageName
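The drag-into-year-folders step could also be scripted. A small sketch, assuming the files already carry the YYYY-MM-DD_ prefix described above and all sit directly in one folder; `sort_into_year_folders` is just an illustrative name.

```python
import re
import shutil
from pathlib import Path

# Matches names such as 2007-10-03_IMG_0001.cr2 and captures the year.
DATE_PREFIX = re.compile(r"^(\d{4})-\d{2}-\d{2}_")


def sort_into_year_folders(folder: Path) -> dict[str, int]:
    """Move date-prefixed files into per-year subfolders; return a count per year."""
    moved: dict[str, int] = {}
    # sorted() takes a snapshot first, so new year folders do not disturb the loop.
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        match = DATE_PREFIX.match(path.name)
        if match is None:
            continue  # leave files without a date prefix where they are
        year = match.group(1)
        year_dir = folder / year
        year_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(year_dir / path.name))
        moved[year] = moved.get(year, 0) + 1
    return moved
```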

If the images are scanned then the date will be the scan date, but they should be easy to identify.
Title: Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: Joe Towner on February 09, 2020, 04:23:07 pm
The Lightroom duplicate check may not work if the dupes are in the same import batch.

Might also look at Mylio as the central system for this.  For one it keeps your stuff out of it, and you can share the sorting work with others.  Figure out how much storage you'd need - consolidate all the files to a single new drive, point Mylio at it and see how it goes.
Title: Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: mschubb on February 10, 2020, 12:02:25 pm
Thanks Bob and Joe.  Good heads up about duplicate names... this mess also includes phone photos from multiple family members, so duplicate names/numbers are likely.  And Mylio looks interesting... will check it out.   Reminds me of the now-defunct Picasa, which was a friendly and very usable photo organizer back in its day.
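To get a feel for how many same-name-different-photo cases there are before renaming or deleting anything, a quick check along these lines might help. It is a sketch only: it compares names case-insensitively (as Windows does), reads each file fully into memory to hash it, and the function name is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def name_collisions(root: Path) -> dict[str, list[Path]]:
    """Map each file name used for more than one distinct content to example paths."""
    # name (lower-cased) -> {content digest: first path seen with that content}
    by_name: dict[str, dict[str, Path]] = defaultdict(dict)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_name[path.name.lower()].setdefault(digest, path)
    return {
        name: list(versions.values())
        for name, versions in by_name.items()
        if len(versions) > 1
    }
```

A name that appears in this result is shared by at least two different pictures, so deleting "the duplicate" by name alone would lose one of them.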
Title: Re: Cleaning up a nightmare!! Best tools to sort and de-dupe?
Post by: kers on February 10, 2020, 03:54:34 pm
Ah, Neofinder is not an option on Windows, alas. It is a very nicely made program.

But I think there are some suggestions in this thread that might work...
What I would do:

The easiest way:
1 Put the old material that you do not need every day in one folder (with a sensible name) on a hard disk (duplicated, so 2 hard disks as an archive: Aorg and Acopy).
2 Let some kind of catalogue program do its job on it, so you can find duplicates and see thumbnails.
3 Make sure that from now on things are organized: every photo gets a unique name/number.

The idea is that you only search when you need something from the archive, instead of finding and sorting it all, which can be a tremendous amount of work.
Your new work is easily found from now on.

PS: how I work:

After making photographs I give them all a unique name, with some references in the name and in the folder they are in.
If the photo is a PSD or PSB file, it is a master file. Other files are made from it.
So I at least keep the RAW and the PSD/PSB.
But there are different working methods that work, as long as you do it in a steady, systematic fashion.