Text Workbench Online Help

Options Dialog - Processing

The Processing option page contains settings that define how HandyFile Find And Replace treats files and text.

Dialog Fields

The tables below describe the dialog fields.

Processing steps

Field Description
Options in this group are listed in the order in which they are applied to input files.
Process .doc/.dot/.rtf files in Microsoft Word (requires MS Word) If this option is checked, all .doc, .dot and .rtf files are processed using the installed Microsoft Word engine. This allows these files to be modified correctly and preserves their integrity.

The following restrictions apply when processing .doc, .dot and .rtf files in Microsoft Word.
  • Regular Expressions cannot be used.
  • Microsoft Word does not report the true match count; the only information that can be obtained is that the sought text is found (or not found) in a file.
  • Collector does not apply to Word documents and does not collect any text from such files.
  • As processing a Word file requires running a Microsoft Word instance, the processing speed is rather slow.
If this option is unchecked, any .doc or .dot file will be reported as a binary file if the binary filter is switched on. For RTF files, the search may report that the text is found, but we do not recommend using a simple text replace on them, because it can corrupt the data.
Search headers If checked, headers of a Word document are also processed, not only the main content.
Search footers If checked, footers of a Word document are processed as well as the main content.
Search properties (Word) If checked, TW searches (and replaces if needed) common document properties of a Word document.
Search hyperlink text If checked, the text of hyperlinks in a Word document is processed as well.
Process .xls files in Microsoft Excel (requires MS Excel) If this option is checked, all .xls files are processed using the installed Microsoft Excel engine. Checking this option is the only way to find and replace text in your XLS files. This allows these files to be modified correctly and preserves their integrity.

The following restrictions apply when processing .xls files in Microsoft Excel.
  • Regular Expressions cannot be used.
  • Microsoft Excel does not report the true match count; the only information that can be obtained is that the sought text is found (or not found) in a file.
  • Collector does not apply to Excel documents and does not collect any text from such files.
  • As processing an Excel file requires running a Microsoft Excel instance, the processing speed is rather slow.
If this option is unchecked, any .xls file will be reported as a binary file if the binary filter is switched on.
Search properties (Excel) If checked, TW searches (and replaces if needed) common document properties of an Excel document.
Analyse input and exclude binary files If this option is checked, each file that is about to be processed is checked for the presence of non-printable bytes. The main purpose of this option is to protect binary files when the file filter mask is set to * (match all file names and extensions).

Normal text files do not contain non-printable characters other than tabs (\x09), carriage returns (\x0D) and line feeds (\x0A); blanks (\x20) are allowed as well. If a file contains any other character with a code below \x20, it is considered suspect.

You can adjust how many such characters are tolerated by using the Binary threshold parameter.

Additionally, this option tells HFFR to analyse input when replacing and to recognize UTF-8 files that have no BOM (byte order mark). If you turn this option off, non-standard UTF-8 files (those without a BOM) will be processed as ANSI files without loss of information.

Enabling this option slows down the processing speed.
Binary threshold This parameter defines the maximum percentage of non-printable characters allowed in a suspect file. If the file exceeds this value, it is rejected and is not processed.

For example, you can set the value to 1% to allow incorrectly formatted text files to be processed. Our research shows that this value works best.

A value of 0% rejects all suspicious files.

A value of 100% is similar to unchecking the option Analyse input and exclude binary files.
Apply filter when searching If checked, files are checked for binary content both when searching and when replacing. This makes it possible to detect binary or invalid text files during a search.

If unchecked, files are checked for binary or UTF-8 content only when replacing.

Uncheck this option if you want image files (pictures) to be included in the search results so you can view them.

Ask for replacement confirmation (in text files) If this option is checked, the Replacement Confirmation Dialog will appear asking you to accept or decline the replacement of every occurrence of the sought text in each processed file.

This option is only effective for text files. This does not apply to Microsoft Word or Excel documents.
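
The binary check described for Analyse input and exclude binary files and for Binary threshold can be pictured with a short Python sketch. It is an illustration only, not HFFR's actual implementation: the function names, the strict greater-than comparison against the threshold and the way BOM-less UTF-8 is recognized are all assumptions.

```python
import codecs

# Control bytes that ordinary text files may contain: tab, LF, CR.
ALLOWED_CONTROL_BYTES = {0x09, 0x0A, 0x0D}


def count_non_printable(data: bytes) -> int:
    """Count bytes below 0x20 that are not tab, LF or CR."""
    return sum(1 for b in data if b < 0x20 and b not in ALLOWED_CONTROL_BYTES)


def is_rejected(data: bytes, threshold_percent: float) -> bool:
    """Assumed rule: reject when the share of non-printable bytes exceeds the threshold."""
    # Comparing cross-multiplied values avoids rounding the percentage.
    return count_non_printable(data) * 100 > threshold_percent * len(data)


def looks_like_utf8_without_bom(data: bytes) -> bool:
    """Hypothetical check for UTF-8 text that lacks a byte order mark."""
    if data.startswith(codecs.BOM_UTF8):
        return False  # the file has a BOM, so it is not a BOM-less file
    if data.isascii():
        return False  # pure ASCII reads the same as ANSI or UTF-8
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Under these assumptions, a threshold of 0 rejects a file as soon as it contains a single stray control byte, while a threshold of 100 never rejects anything, which mirrors switching the analysis off.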

Search using Regular Expressions

Field Description
Whitespace operator \s matches line-breaks (CR and LF) Normally, the regular expression operator \s matches line-breaks in addition to blanks and tabs. This makes it easy to find irregular text blocks (blocks that differ only in formatting).
Turn this option off if \s should not match line-breaks.
Current file extension operator \X includes a dot The operator \X, which is used in Replace expressions, inserts the extension of the file being processed, for example .html. Uncheck this option to omit the leading dot from the extension: html
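
The effect of both options can be illustrated in Python. This is only a sketch: Python's re module is used as a stand-in for the program's own regular expression engine, the character class [ \t] approximates \s with the line-break option switched off, and current_extension is a hypothetical helper that imitates what \X inserts.

```python
import os
import re

text = "first line\r\nsecond line"

# In Python's re module, \s matches CR and LF as well as blanks and tabs.
with_line_breaks = re.search(r"line\s+second", text)

# A blank-and-tab class approximates \s with the option unchecked.
without_line_breaks = re.search(r"line[ \t]+second", text)


def current_extension(path: str, include_dot: bool = True) -> str:
    """Return the file extension, with or without the leading dot."""
    ext = os.path.splitext(path)[1]
    return ext if include_dot else ext.lstrip(".")
```

Here with_line_breaks is a match object, because \s+ spans the CR LF pair, and without_line_breaks is None. For a file named index.html, the helper yields .html by default and html when include_dot is False.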

Search without Regular Expressions

Field Description
These options affect how exact the search is. Spanning some or all of the formatting characters makes it possible to find blocks of text that match the target string but differ in formatting. See the Remarks below.
Ignore blanks (\x20) Skip all blanks (character code \x20) in both the processed text and the search string when searching. That is, the number of blanks and their position in the text do not affect the match/no match result.
Ignore tabs (\x09) Skip all tabs (character code \x09) in both the processed text and the search string when searching.
Ignore line-breaks (CR and LF) Skip all carriage returns (\x0D) and line feeds (\x0A) in both the processed text and the search string when searching.
Ignore other symbols Lets you specify user-defined symbols that are skipped in both the processed text and the search string when searching.
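
One way to picture the spanning options is to drop the ignored characters from both the text and the search string before comparing them. The Python sketch below follows that idea; it is an assumption about the behaviour described in the table, not the program's actual algorithm, and the names strip_ignored and spanning_find are hypothetical. It only reports whether a match exists and does not locate it in the original text.

```python
def strip_ignored(s: str, ignored: str) -> str:
    """Remove every character listed in `ignored` from `s`."""
    return s.translate({ord(c): None for c in ignored})


def spanning_find(text: str, target: str, ignore_blanks: bool = False,
                  ignore_tabs: bool = False, ignore_line_breaks: bool = False,
                  other: str = "") -> bool:
    """Report whether `target` occurs in `text` once ignored characters are skipped."""
    ignored = other
    if ignore_blanks:
        ignored += " "       # \x20
    if ignore_tabs:
        ignored += "\t"      # \x09
    if ignore_line_breaks:
        ignored += "\r\n"    # \x0D and \x0A
    needle = strip_ignored(target, ignored)
    if not needle:
        return False  # nothing left to search for
    return needle in strip_ignored(text, ignored)
```

For example, spanning_find("a  b\tc", "a b c", ignore_blanks=True) is False because the tab still separates b and c, while adding ignore_tabs=True makes it True.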

Remarks

Spanning options

These options are extremely useful if you need to find blocks of text that vary in formatting and do not want to use Regular Expressions. For example, if you use a WYSIWYG HTML editor, you may notice that it formats the code in a higgledy-piggledy fashion. The formatting is generally performed using blanks, tabs and line-breaks. Suppose you need to find the following code:

<a href="http://www.mysite.com">
<img src="images/button.gif" alt="image" border="0"
 width="128" height="82">
</a>

The WYSIWYG formatter might write it like this:

<a href="http://www.mysite.com"><img src="images/button.gif" 
alt="image" border="0" width="128" height="82"></a>

or like this:

<a 
 href="http://www.mysite.com"><img 
 src="images/button.gif" alt="image" border="0" width="128"
 height="82"></a>

or even like this:

<a
 href="http://www.mysite.com"><img 
 src="images/button.gif"
 alt="image"
 border="0"
 width="128"
 height="82"></a>

You can easily handle all of these cases using Regular Expressions, but if you do not want to bother with them or are not familiar with them, you can use the spanning options. Simply check all three boxes (to match the text in this example) and TW will find the string.
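
For comparison, the Regular Expression route might look like the Python sketch below. It assumes that \s matches blanks, tabs and line-breaks, as it does in Python's re module; the program's own expression syntax may differ, so treat the pattern as an illustration rather than something to paste into the Find field.

```python
import re

# Whitespace between tokens may be blanks, tabs or line-breaks in any mix.
pattern = re.compile(
    r'<a\s+href="http://www\.mysite\.com">\s*'
    r'<img\s+src="images/button\.gif"\s+alt="image"\s+border="0"\s+'
    r'width="128"\s+height="82">\s*</a>'
)

# The four layouts shown above, written out as strings.
variants = [
    '<a href="http://www.mysite.com">\n'
    '<img src="images/button.gif" alt="image" border="0"\n'
    ' width="128" height="82">\n'
    '</a>',
    '<a href="http://www.mysite.com"><img src="images/button.gif" \n'
    'alt="image" border="0" width="128" height="82"></a>',
    '<a \n href="http://www.mysite.com"><img \n'
    ' src="images/button.gif" alt="image" border="0" width="128"\n'
    ' height="82"></a>',
    '<a\n href="http://www.mysite.com"><img \n'
    ' src="images/button.gif"\n alt="image"\n border="0"\n'
    ' width="128"\n height="82"></a>',
]

all_match = all(pattern.search(v) is not None for v in variants)
```

The pattern uses \s+ where at least one separator is required (between attributes) and \s* where the separator is optional (between tags), which is the kind of detail the spanning options let you avoid.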

Note
Using the spanning options slows down the search.