Search parameters when searching file content

Jonathan

Search parameters when searching file content

Post by Jonathan »

What about the possibility of searching for a parameter, without the
specific characters within that parameter?

What I am trying to do is:

Search a global network drive which is a shared drive for patient
names in this format (<first name>,<last name>) or social security
numbers in this format (000.00.000). We need to be able to find
instances of these two entries and delete them if found...but we don't
know the specific names or the specific social security numbers. Is
there anything your program could do for that?
cincura.net

Re: Search parameters when searching file content

Post by cincura.net »

You can use regular expressions to search file content. With regular expressions this is easy to do.
Jiri {x2} Cincura
Jan Rysavy
ALTAP Staff

Post by Jan Rysavy »

(We would probably need a sample of your file.)

Example: suppose we have a file new.txt with the following content:

xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxx(Jan,Rysavy)xxxxxxxxx
xxxxxxxxx(123.45.678)xxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx

Regular expression searching for names in the format (string,string):

\(([a-zA-Z]+),([a-zA-Z]+)\)

Regular expression searching for social security numbers in the format (nnn.nn.nnn):

\([0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]\)
Note: this expression will not match (12.45.678), because it starts with two digits ("12") instead of three ("123"). Is this what you are looking for?
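
For anyone who wants to try the two expressions outside Salamander, here is a minimal sketch in Python. It assumes Python's `re` module accepts these particular expressions unchanged (they only use character classes, `+`, groups and escaped punctuation); the variable names are made up for illustration.

```python
import re

# Sample content of new.txt, as given above.
sample = """\
xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxx(Jan,Rysavy)xxxxxxxxx
xxxxxxxxx(123.45.678)xxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx
"""

# The two expressions from this post, unchanged.
NAME_RE = re.compile(r"\(([a-zA-Z]+),([a-zA-Z]+)\)")
SSN_RE = re.compile(r"\([0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]\)")

names = NAME_RE.findall(sample)        # list of (first, last) tuples
numbers = SSN_RE.findall(sample)       # list of whole matches
short = SSN_RE.search("(12.45.678)")   # only two leading digits

print(names)
print(numbers)
print(short)
```

The last search returns no match, which is the behaviour described in the note above.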

To find files, use the Commands > Find Files and Directories command; see http://www.altap.cz/salam_en/help/salam ... k_find.htm
Enable the Regular expression option.
For the syntax of regular expressions, see http://www.altap.cz/salam_en/help/salam ... regexp.htm
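
If a scripted equivalent is useful, the same idea (walk a directory tree and report files whose content matches an expression) can be sketched in Python. This is not how Salamander implements its search; `find_files` is a hypothetical helper, and the sketch reads each file fully into memory, so it is only suitable for modest file sizes.

```python
import os
import re

# Bytes pattern, because files are read as raw bytes.
SSN_RE = re.compile(rb"\([0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9]\)")

def find_files(root, pattern):
    """Return paths under root whose raw bytes match pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            if pattern.search(data):
                hits.append(path)
    return hits

# Example call (the path is an assumption):
# print(find_files(r"X:\shared", SSN_RE))
```

Like the search in Salamander, this only looks at the bytes as they are stored on disk.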

Let us know if you have any questions...
Attachment: findregexp.png
Jonatan

Parameters

Post by Jonatan »

Actually, the file would be more like

xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxJan,Rysavyxxxxxxxxx
xxxxxxxxx123.45.678xxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx

or it could look like this, as fields in a form:

xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxName: Jan,Rysavyxxxxxxxxx
xxxxxxxxxSS#: 123.45.678xxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx

Thanks so much for your reply. This may be exactly what we need!!
Jan Rysavy
ALTAP Staff

Post by Jan Rysavy »

Jonathan, there should be some separators. How do you distinguish the name or numbers from the surrounding characters?
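
To illustrate why separators matter, here is a small sketch in Python. It takes the name expression from earlier in the thread with the parentheses removed and runs it on the sample lines without parentheses; the variable names and the `Name: ` variant are assumptions made for the demonstration.

```python
import re

# Name expression with the surrounding parentheses removed.
NAME_NO_PARENS = re.compile(r"([a-zA-Z]+),([a-zA-Z]+)")

plain = "xxxxxxJan,Rysavyxxxxxxxxx"
m = NAME_NO_PARENS.search(plain)
first, last = m.group(1), m.group(2)
print(first, last)

# A field label gives a left boundary, but nothing ends the last name.
NAME_LABELLED = re.compile(r"Name: ([a-zA-Z]+),([a-zA-Z]+)")

form = "xxxxxxName: Jan,Rysavyxxxxxxxxx"
m2 = NAME_LABELLED.search(form)
first2, last2 = m2.group(1), m2.group(2)
print(first2, last2)
```

In the first case both groups swallow the surrounding letters; in the second only the first name comes out clean. Without some delimiter the expression cannot tell where a name starts or stops.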
JOnathan

Post by JOnathan »

They may be in a form or just in the body of a document. Some could be listed like this.

Name: Joe Smith or Resident: Joe Smith .

The problem is that we are searching a large global network drive with random file types and content on it. We need to be able to delete anything with personal information in it. There is no way to be sure that the content would have any kind of separators.
Jan Rysavy
ALTAP Staff

Post by Jan Rysavy »

Do you know all the possible names and numbers before the search?

For example, you might know that there are only 3 possible names:
<First1, Last1>
<First2, Last2>
<First3, Last3>
(could be presented in different forms such as: "First Last" or "Last, First")
and only 3 numbers:
<number1>
<number2>
<number3>

Do you know exactly these names and numbers before you start the search?
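
If the names are known in advance, one possible approach is to build a single expression from the list, covering the forms mentioned above. The sketch below is in Python; the names in `known` and the helper `build_pattern` are hypothetical, and the set of forms it covers is an assumption rather than a complete list.

```python
import re

# Hypothetical list of known (first, last) names.
known = [("Jan", "Rysavy"), ("Joe", "Smith")]

def build_pattern(people):
    """Build one case-insensitive expression matching each name in three forms."""
    alts = []
    for first, last in people:
        f, l = re.escape(first), re.escape(last)
        alts.append(rf"{f}\s*,\s*{l}")   # "First,Last"
        alts.append(rf"{f}\s+{l}")       # "First Last"
        alts.append(rf"{l}\s*,\s*{f}")   # "Last, First"
    return re.compile("|".join(alts), re.IGNORECASE)

pattern = build_pattern(known)

for line in ["Resident: Joe Smith .",
             "xxxxxxJan,Rysavyxxxxxxxxx",
             "Rysavy, Jan",
             "Jane Doe"]:
    m = pattern.search(line)
    print(repr(line), "->", m.group(0) if m else None)
```

Because the names are literal text, no separators are needed, but a short name could also match inside a longer word.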
SvA

Post by SvA »

JOnathan wrote: They may be in a form or just in the body of a document.

Since you don't know where you need to search, you need to search everything.

Unless you can invent an algorithm for what you want to do, you cannot get a computer to do it for you.

JOnathan wrote: The problem is that we are searching a large global network drive with random file types

In this case you cannot use a simple file search like the one Altap Salamander offers. The name can be encoded in any way, and the program the document was made for will transform it into something readable. Imagine, for example, a docx file (created by a recent version of Microsoft Word). Any text that document contains is stored within a compressed zip archive, and unless your search tool knows how to make sense of the bits it finds on the disk, it will not find anything sensible. Furthermore, if the names and numbers need to be removed, the tool needs to know how to edit that file; otherwise you'd be better off zapping your disk right away (writing zeros or random data all over it).
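
The docx point can be shown with a small sketch in Python. It builds a stand-in for a .docx in memory (a zip archive holding `word/document.xml`), then searches the raw archive bytes and the unpacked XML for the same text. The XML here is heavily simplified and is an assumption; real Word files contain more parts, and the text of one name may be split across several XML elements.

```python
import io
import re
import zipfile

NAME_RE = re.compile(rb"Jan,Rysavy")

# Simplified stand-in for the main part of a .docx file.
xml = ("<w:document><w:body>"
       "<w:p><w:r><w:t>Name: Jan,Rysavy</w:t></w:r></w:p>"
       "<w:p><w:r><w:t>SS#: 123.45.678</w:t></w:r></w:p>"
       "</w:body></w:document>").encode("utf-8")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", xml)
raw = buf.getvalue()  # the bytes as they would sit on disk

# Searching the stored bytes: the text is compressed.
found_in_raw = NAME_RE.search(raw) is not None

# Searching after unpacking the archive member.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    unpacked = zf.read("word/document.xml")
found_unpacked = NAME_RE.search(unpacked) is not None

print(found_in_raw, found_unpacked)
```

A plain byte search only succeeds on the unpacked content, so a tool has to understand each file format before it can find, let alone edit, the text inside.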

JOnathan wrote: We need to be able to delete anything with personal information in it. There is no way to be sure that the content would have any kind of separators.

My advice: zap the drive, or get some people to do it with intelligence, using the applications made for editing the file types whose content you need to make unrecognisable (or else get yourself a _large_ IT budget and a lot of time).