Text Workbench Online Help

HOWTO: Create a CSV File With E-mails Collected From Multiple Files

This topic walks through two examples of creating a CSV file: a simple one that collects only e-mail addresses, and a slightly more complex one that collects e-mail addresses together with names.

Collecting E-mails Only

  1. Specify the folder in which you want to find files. For example: C:\MyWebFiles
  2. Set mask(s) of the files that you want to find. For example: *.htm*
  3. Switch on the regular expressions.
  4. As we need e-mails only, there is no need to match the A tags. The expression looks as follows:

    [\w\.\_\d]+\@[\w\.\_\d]+\.\w+
    
    where
        [\w\.\_\d]+ matches one or more letters, dots, underscores, and digits.
  5. Go to the Collect tab and check the Collect... option. Set the path to the collector file, for example: C:\MyWebFiles\emails.csv. Set the Collected text to Found text, and Text entry separator to New line.
  6. Click the Search button.
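
For readers who want to reproduce this outside Text Workbench, the steps above can be sketched in Python. This is a minimal sketch, not the tool's own implementation. The function name collect_emails and its parameters are hypothetical, the pattern is the one from step 4 with the escapes Python does not need removed, and the search is assumed to cover only the top level of the folder (whether Text Workbench descends into subfolders depends on its own settings).

```python
import re
from pathlib import Path

# Same idea as the pattern in step 4. In Python, \w already covers
# letters, digits, and underscores, so the class shrinks to [\w.].
EMAIL_RE = re.compile(r"[\w.]+@[\w.]+\.\w+")


def collect_emails(folder, mask="*.htm*", out_name="emails.csv"):
    """Collect every e-mail found in files matching mask, one per line.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    folder = Path(folder)
    found = []
    # Assumption: only the top level of the folder is searched.
    for path in sorted(folder.glob(mask)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # "Collected text = Found text": keep each whole match.
        found.extend(EMAIL_RE.findall(text))
    # "Text entry separator = New line": one entry per line.
    out_path = folder / out_name
    out_path.write_text("".join(e + "\n" for e in found), encoding="utf-8")
    return found
```

Calling collect_emails(r"C:\MyWebFiles") would write emails.csv into that folder, mirroring the collector file path used in the steps.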

Collecting E-mails and the Anchor Text (usually the contact person's name)

  1. Specify the folder in which you want to find files. For example: C:\MyWebFiles
  2. Set mask(s) of the files that you want to find. For example: *.htm*
  3. Switch on the regular expressions.
  4. In this case, we have to match the A tags. The expression looks as follows:

    \<a[^\>\<]#href\=\"mailto\:([\w\.\_\d\@]+)\"[^\>\<]@\>([^\<]#)\<\/a\>
    
    where
        [^\>\<]#         matches all extra information 
                         between the tag name and HREF attribute;
    
        ([\w\.\_\d\@]+)  matches and stores the e-mail address 
                         (first stored expression);
    
        ([^\<]#)         matches and stores the tag inner text 
                         (second stored expression).
  5. Reformat the stored expressions to match the CSV format. Note that replacement strings can still be used for collecting even when you are only searching. In the Replace with field, type:

    \2,\1
    
    where
        \2 is the stored inner text of the tag (the contact person name);
        \1 is the stored e-mail address.
  6. Go to the Collect tab and check the Collect... option.
  7. Set the path to the collector file, for example: C:\MyWebFiles\emails.csv
  8. Set the Collected text to Replacement text, and Text entry separator to New line.
  9. Click the Search button.
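
The same procedure can be approximated in Python for comparison. This is a sketch under stated assumptions, not a description of how Text Workbench works internally. The # and @ quantifiers in the expression above are read here as lazy "one or more" and "zero or more" (+? and *? in Python), the match is made case-insensitive, and the name collect_contacts and its parameters are hypothetical. The rows are written with Python's csv module, which quotes a name that itself contains a comma; the plain \2,\1 replacement does not do that.

```python
import csv
import re
from pathlib import Path

# Python rendering of the A-tag expression. Assumption: '#' and '@'
# in the original are treated as the lazy quantifiers '+?' and '*?'.
ANCHOR_RE = re.compile(
    r'<a[^<>]+?href="mailto:([\w.@]+)"[^<>]*?>([^<]+?)</a>',
    re.IGNORECASE,  # assumption: tag and attribute case should not matter
)


def collect_contacts(folder, mask="*.htm*", out_name="emails.csv"):
    """Collect (name, e-mail) pairs from mailto links into a CSV file.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    folder = Path(folder)
    rows = []
    # Assumption: only the top level of the folder is searched.
    for path in sorted(folder.glob(mask)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for match in ANCHOR_RE.finditer(text):
            email, name = match.group(1), match.group(2)
            # Same order as the \2,\1 replacement: name first, then e-mail.
            rows.append((name, email))
    with open(folder / out_name, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
    return rows
```

Using csv.writer instead of joining the two fields with a comma is a deliberate choice: a link text such as "Smith, Mary" would otherwise split into two columns when the file is opened in a spreadsheet.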