Ignoring names in Compare Directories

We welcome any suggestions for new features or improvements in Altap Salamander. Please post one suggestion per report.
SvA
Posts: 484
Joined: 29 Mar 2006, 02:41
Location: DE

Ignoring names in Compare Directories

Post by SvA »

The Find Duplicate Files function lets you ignore the name of the file and consider only (size +) content.
Compare Directories could work quite similarly (selecting the complement, though). However, it is not possible to ignore the file name in a directory compare. Would it be possible to add that option (probably enabled only if Content is ticked)? Or maybe an additional command to do a Find Duplicates in the panels?
The result should be a means of investigating unique and different files in the two directories, ignoring name changes.
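
To make the intended result concrete, here is a minimal Python sketch of a name-blind directory compare. It is not how Salamander works; the function names `file_digest` and `unmatched_by_content` are made up for illustration, and it assumes flat directories (no subdirectories) and treats equal SHA-256 digests as equal content.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def unmatched_by_content(left: Path, right: Path) -> tuple[list[str], list[str]]:
    """Return the names of files whose content has no twin in the other directory.

    File names are ignored for matching; they are only used to report results.
    """
    left_files = {p.name: file_digest(p) for p in left.iterdir() if p.is_file()}
    right_files = {p.name: file_digest(p) for p in right.iterdir() if p.is_file()}
    left_digests = set(left_files.values())
    right_digests = set(right_files.values())
    only_left = sorted(n for n, d in left_files.items() if d not in right_digests)
    only_right = sorted(n for n, d in right_files.items() if d not in left_digests)
    return only_left, only_right
```

The two returned lists correspond to the files that would end up selected in the left and right panel.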

Is there maybe an alternative way of doing this?
Ether
Posts: 1471
Joined: 10 May 2007, 16:08
Location: Czech Republic

Re: Ignoring names in Compare Directories

Post by Ether »

Could you elaborate further on the usage of your idea, possibly including an example? I'm afraid I either don't understand what you're suggesting, or simply don't see any real use for it.
Ελληνικά rulez.
SvA
Posts: 484
Joined: 29 Mar 2006, 02:41
Location: DE

Re: Ignoring names in Compare Directories

Post by SvA »

Well, when I store several files in a folder, there is usually a reason they are stored in the same folder. So, when some files in two folders are mostly identical, but some are different or are stored in one of the two folders only, I often need to find out why there are those differences and maybe need to reconcile/merge the two folders. For this reason, the files that are not identical are the most interesting, so I have to identify those. The file names may not match for a whole lot of reasons: they might be downloaded from different sources, autogenerated with some dynamic name, versioned while processed, renamed to fit in the collection ...

Examples of such files are picture collections, modified or not; data files of some sort; processed sound files; parts of a larger text, be it a book or documentation or some web project; or the source code of customised software ...
Jan Patera
Plugin Developer
Posts: 707
Joined: 08 Dec 2005, 14:33
Location: Prague, Czech Republic

Re: Ignoring names in Compare Directories

Post by Jan Patera »

So, you actually want to compare every file in source folder with every file in the target folder? What results are you expecting to get? If src contains files A, B and C, and the target contains just D, but A and D actually have the same content (and timestamps etc.), should then D be marked as different, because it differs from B and C or the same (with A)?
Comparing content of every file with every file might not be what someone would like to do.
IMHO this would not work in an intuitive way.
Not to mention its real usefulness for more than a minimal number of users.
therube
Posts: 675
Joined: 14 Dec 2006, 06:22

Re: Ignoring names in Compare Directories

Post by therube »

If I'm understanding correctly, you're looking for a duplicate file finder that finds "non-duplicate" files.
(I could actually use something like that from time to time too.)

If you were to uncheck all the Compare Directories options (Size, Date, Attr, Content), that would at least point out files whose names do not match. Might help some.

The only thing I can think of is to duplicate your file sets (so as not to delete real data), run a duplicate file finder (CRC-based, not name-based) on them, and delete all duplicates. That would leave you with only unique files and/or files with differing content (but the same name) in each set. Could be a lot of work, but might do what you need?


A program like DigitalVolcano Duplicate Cleaner records both duplicates & all files scanned.

If you were to export both lists (as .csv), slightly modify, sort, then you could compare the two lists (with Salamander's Compare Files).

Export duplicates as dup.csv.
Export all as all.csv.
Edit dup.csv (as it adds additional fields to the csv file) so that it can be compared to all.csv.


%s/ AM".*/ AM"/
%s/ PM".*/ PM"/
Sort both lists.
Use Salamander, Compare Files.
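
For anyone who would rather script these steps than do them in an editor, here is a rough Python sketch. The two substitutions mirror the editor commands above. The sample rows in the comments and the helper names `trim_extra_fields` and `non_duplicates` are invented, since I have not checked the exact column layout Duplicate Cleaner exports, so treat the layout as an assumption.

```python
import re


def trim_extra_fields(line: str) -> str:
    """Drop everything after the timestamp's closing quote, like the two :s commands."""
    line = re.sub(r' AM".*', ' AM"', line)
    return re.sub(r' PM".*', ' PM"', line)


def non_duplicates(all_lines, dup_lines):
    """Return the sorted lines of the full list that do not appear in the duplicate list.

    Assumes (hypothetically) that a trimmed dup.csv row is textually identical
    to the matching all.csv row, e.g. '"a.jpg","1/2/2010 3:04:05 PM"'.
    """
    dups = {trim_extra_fields(line.rstrip("\n")) for line in dup_lines}
    return sorted(
        line.rstrip("\n") for line in all_lines if line.rstrip("\n") not in dups
    )
```

Feeding it the lines of all.csv and dup.csv would take the place of the sort and Compare Files steps.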

Again could be a bit of work ...
WinXP Pro SP3 or Win7 x86 | SS 2.54
SvA
Posts: 484
Joined: 29 Mar 2006, 02:41
Location: DE

Re: Ignoring names in Compare Directories

Post by SvA »

Jan Patera wrote:So, you actually want to compare every file in source folder with every file in the target folder? What results are you expecting to get? If src contains files A, B and C, and the target contains just D, but A and D actually have the same content (and timestamps etc.), should then D be marked as different, because it differs from B and C or the same (with A)?
If I compare size and content, A and D should not be marked, because all compared properties are identical, B and C should be marked, because there is no identical match (same behaviour as now)
Jan Patera wrote:Comparing content of every file with every file might not be what someone would like to do.
Well, then he either needs to tick Name and Content, to compare only files with the same name, or not tick Content at all. And furthermore, content cannot be identical unless size is identical too, so even if the user did not tick size, but did tick content, it is a good thing to check size internally, so, only files with identical size need to be compared at all.
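
A small Python sketch of that size-first check, using the A/B/C/D example from the question. The files are modelled as in-memory byte strings and `mark_unmatched` is a made-up name; it only illustrates the idea that content needs comparing only among files of equal size.

```python
from collections import defaultdict


def mark_unmatched(src, dst):
    """Return (marked_src, marked_dst): names with no same-size, same-content file opposite."""

    def unmatched(own, other):
        # Bucket the other side by size so content is compared only within a bucket.
        by_size = defaultdict(list)
        for data in other.values():
            by_size[len(data)].append(data)
        return sorted(
            name
            for name, data in own.items()
            if not any(data == cand for cand in by_size.get(len(data), []))
        )

    return unmatched(src, dst), unmatched(dst, src)


src = {"A": b"hello", "B": b"world!", "C": b"xyz"}
dst = {"D": b"hello"}
print(mark_unmatched(src, dst))  # (['B', 'C'], [])
```

A and D stay unmarked, B and C get marked, and B and C never have their content read for comparison because nothing on the other side has their size.
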
Jan Patera wrote:IMHO this would not work in an intuitive way.
What do you consider unintuitive when you tell the system to compare size and content of files in two directories and mark those that do not match? For me it is unintuitive if there is an implicit additional condition "Name" which is only mentioned in prose.
Jan Patera wrote:Not to mention its real usefulness for more than a minimal number of users.
People might just do things as cumbersome as what therube proposed, or even do it completely manually.
Obviously, people do have identical files with different names, or else Find Duplicate Files would likely not offer that option. But how does it help you to find those, unless the file is isolated (i.e. has no relationship to other files in its environment), in which case you can safely delete it?

So, I am asking people:
  • How do you avoid the situation I need the feature for?
  • If you happen to need to investigate differing files, how do you do it (i.e. how do you identify them)?
  • Would the feature I proposed help you with the task?
Please also note: I am not asking for a massive change. I am just asking for an implicit criterion to be made explicit and changeable.
Ether
Posts: 1471
Joined: 10 May 2007, 16:08
Location: Czech Republic

Re: Ignoring names in Compare Directories

Post by Ether »

SvA wrote:So, I am asking people:
  • How do you avoid the situation I need the feature for?
  • If you happen to need to investigate differing files, how do you do it (i.e. how do you identify them)?
  • Would the feature I proposed help you with the task?
I'm sorry, but I can't remember any situation where I needed such a feature.
SvA wrote:Please also note: I am not asking for a massive change. I am just asking for an implicit criterion to be made explicit and changeable.
You're not asking directly, but if you think a bit about the way Compare Directories works (compare by name, 1 to 1, at most min(m, n) comparisons (*)) and about the thing you propose (compare by whatever, 1 to each other, at most m * n comparisons), you have to admit it's not a minor change.

The thing here is that comparison by name defines a useful constraint on the relations between the files in the two panels. Anyway, how would you solve the case where I choose to compare only by size and there are exactly two files of 123 B in the left panel and exactly one file of 123 B in the right one? Are the directories identical? If not, and I choose to compare by content, which pair of the files would be compared? If both of the pairs would be compared, and all three files would be identical, what would be the final result?

(*) m is the number of files in the left panel, n is the number of files in the right panel
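
To illustrate the two bounds, here is a minimal Python sketch that just counts content comparisons for each strategy. Files are modelled as in-memory byte strings and both function names are invented for this example; the pairwise count covers only the left-to-right pass with no shortcuts such as a size check or hashing.

```python
def count_name_based(left, right):
    """Comparisons when files are paired 1:1 by name: at most min(m, n)."""
    return len(left.keys() & right.keys())


def count_pairwise(left, right):
    """Comparisons when each left file is tried against right files until one matches."""
    comparisons = 0
    for data in left.values():
        for other in right.values():
            comparisons += 1
            if data == other:
                break  # a match was found, so this left file is settled
    return comparisons


left = {"a": b"1", "b": b"2", "c": b"3"}
right = {"a": b"x", "b": b"y"}
print(count_name_based(left, right))  # 2
print(count_pairwise(left, right))    # 6
```

With m = 3 and n = 2 and no matching content, the name-based compare needs min(m, n) = 2 comparisons while the naive name-blind one needs m * n = 6. Indexing each file once by a content digest would bring the latter down to m + n digest computations, but that is a different algorithm from the current one.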
Ελληνικά rulez.
Jan Patera
Plugin Developer
Posts: 707
Joined: 08 Dec 2005, 14:33
Location: Prague, Czech Republic

Re: Ignoring names in Compare Directories

Post by Jan Patera »

SvA wrote:
Jan Patera wrote:So, you actually want to compare every file in source folder with every file in the target folder? What results are you expecting to get? If src contains files A, B and C, and the target contains just D, but A and D actually have the same content (and timestamps etc.), should then D be marked as different, because it differs from B and C or the same (with A)?
If I compare size and content, A and D should not be marked, because all compared properties are identical, B and C should be marked, because there is no identical match (same behaviour as now)
Aha. What if there is also E in the target, equal to B in the source? Then A, B, D, E are not marked after the compare. How would you distinguish whether the matching pairs are A-D and B-E, or A-E and B-D, or even whether all 4 files are the same?
SvA
Posts: 484
Joined: 29 Mar 2006, 02:41
Location: DE

Re: Ignoring names in Compare Directories

Post by SvA »

Ether wrote:You're not asking directly, but if you think a bit about the way Compare Directories works (compare by name, 1 to 1, at most min(m, n) comparisons (*)) and about the thing you propose (compare by whatever, 1 to each other, at most m * n comparisons), you have to admit it's not a minor change.
Your numbers of comparisons are not correct. As it is now, the lower bound is min(m, n), reached if all the files you compare match on the first try and you compare only by file name. The upper bound is max(m, n) + min(m, n) * x, where x is the number of additional criteria. You reach this if all file names match but you need to test some files more than once, eventually using all file names in the folder with more files; then all further criteria match except possibly the last one tested on each file.
With my proposal, the cost is exactly the same, as long as you include name in your set of criteria. I admit that without the name as a criterion, the number of comparisons will probably increase, in some rather rare cases even skyrocket, but so what? You sacrifice nothing of the status quo, but you gain additional possibilities.
Ether wrote:Anyway, how would you solve the case where I choose to compare only by size and there are exactly two files of 123 B in the left panel and exactly one file of 123 B in the right one? Are the directories identical?
Asking whether they are identical is the wrong question here. No files are marked after the compare, since each one matches, according to the given set of criteria, a file in the other panel.
When directories are included in the compare, what people would expect should be discussed. I'd say yes, they do match, i.e. they do not get marked, since none of the files in them would get marked (that's also the way the subdirectory compare works now).
Ether wrote:If not, and I choose to compare by content, which pair of the files would be compared? If both of the pairs would be compared, and all three files would be identical, what would be the final result?
(Ignoring "If not") Still the same as above, with still the same justification. And yes, both pairs will be compared, since the size matches.
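
The rule I have in mind can be written down in a few lines of Python. This is only a sketch of the proposed marking rule, with the panels modelled as dictionaries of made-up file names and the criteria passed in as a `key` function; none of these names come from Salamander.

```python
def marked(left, right, key):
    """Return (marked_left, marked_right) for a given criteria key.

    A file is marked only if no file in the other panel has the same key,
    so the number of matching files on each side does not matter.
    """
    left_keys = {key(info) for info in left.values()}
    right_keys = {key(info) for info in right.values()}
    return (
        sorted(n for n, info in left.items() if key(info) not in right_keys),
        sorted(n for n, info in right.items() if key(info) not in left_keys),
    )


# Two files of 123 B on the left, exactly one file of 123 B on the right.
left = {"one.bin": {"size": 123}, "two.bin": {"size": 123}}
right = {"three.bin": {"size": 123}}
print(marked(left, right, key=lambda info: info["size"]))  # ([], [])
```

Nothing gets marked in the 2-versus-1 case. Adding content to the key changes only which files count as having a counterpart, not the rule itself.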
SvA
Posts: 484
Joined: 29 Mar 2006, 02:41
Location: DE

Re: Ignoring names in Compare Directories

Post by SvA »

Jan Patera wrote:Aha. What if there is also E in the target, equal to B in the source? Then A, B, D, E are not marked after the compare. How would you distinguish whether the matching pairs are A-D and B-E, or A-E and B-D, or even whether all 4 files are the same?
I don't. I am interested in the non-matching files. Those are the ones that get marked even now, because they are the ones you will want to take action upon (usually inspect and then copy or delete).
Even now you need to do the matching manually (i.e. in your mind) again, the only difference being that you have a 1:1 relationship and the criterion is displayed on screen. But what do you want to do with those matching pairs? Except maybe compare their content!?
Ether
Posts: 1471
Joined: 10 May 2007, 16:08
Location: Czech Republic

Re: Ignoring names in Compare Directories

Post by Ether »

SvA wrote:Your numbers of comparisons are not correct. As it is now, the lower bound is min(m, n), reached if all the files you compare match on the first try and you compare only by file name. The upper bound is max(m, n) + min(m, n) * x, where x is the number of additional criteria. You reach this if all file names match but you need to test some files more than once, eventually using all file names in the folder with more files; then all further criteria match except possibly the last one tested on each file.
SvA wrote:With my proposal, the cost is exactly the same, as long as you include name in your set of criteria. I admit that without the name as a criterion, the number of comparisons will probably increase, in some rather rare cases even skyrocket, but so what? You sacrifice nothing of the status quo, but you gain additional possibilities.
(I thought we were discussing only the case when you don't compare by name.) I was trying to prove that it's not a minor change, I wasn't objecting to the number of comparisons.
SvA wrote:Asking whether they are identical is the wrong question here. No files are marked after the compare, since each one matches, according to the given set of criteria, a file in the other panel.
Just to be sure here - when two directories are to be compared with no criteria set, the result is always "identical"? IMO that implies that the directories at least contain the same number of files.
Ελληνικά rulez.
therube
Posts: 675
Joined: 14 Dec 2006, 06:22

Re: Ignoring names in Compare Directories

Post by therube »

Not exactly what you want, but might help?


SearchMyFiles - Alternative to 'Search For Files And Folders' module of Windows + Duplicates Search
Added new search mode - 'Non-Duplicates Search' which allows you to find all files in the specified folders that are not duplicated.
(Though it does find same named, different content files.)
WinXP Pro SP3 or Win7 x86 | SS 2.54
SvA
Posts: 484
Joined: 29 Mar 2006, 02:41
Location: DE

Re: Ignoring names in Compare Directories

Post by SvA »

Thanks, therube, but how does this help me with comparing two directories?
Furthermore, I did not find any options that let me specify the criteria by which a file is to be compared, so how should I proceed?