Art.Net Help Pages: Doing Searches on Art.Net

Our server supports the htgrep program for searching plain text or HTML files. Htgrep lets a user enter a keyword into a form and then displays every match for that keyword in a particular search file. It searches any HTML or plain text document on a paragraph-by-paragraph basis.

This page provides a summary of how to use htgrep on art.net. You can find more information on htgrep by looking at the htgrep FAQ.

[Searching Documents | Formulating Queries | Search Options]


Searching Documents

The file to search can be plain text or HTML. If the keyword is found, the entire paragraph containing it is displayed. A paragraph is defined as lines of text (or HTML code) separated by blank lines. This means that htgrep will not interpret the following HTML as separate paragraphs:

<P>Here is HTML paragraph one.
<P>Here is HTML paragraph two.
<P>Since there are no blank lines, htgrep looks at this as one paragraph.
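
To make the paragraph rule concrete, here is a minimal Python sketch of blank-line paragraph splitting. It is not htgrep's own code; the function names, the sample text, and the case-insensitive substring match are assumptions made for illustration.

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines."""
    # A line holding only whitespace is treated as blank (an assumption).
    chunks = re.split(r"\n\s*\n", text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]

def matching_paragraphs(text, keyword):
    """Return each whole paragraph that contains keyword."""
    # Case-insensitive substring matching is assumed, not confirmed.
    return [p for p in split_paragraphs(text) if keyword.lower() in p.lower()]

sample = (
    "<P>Here is HTML paragraph one.\n"
    "<P>Here is HTML paragraph two.\n"
    "<P>No blank lines, so this is still the same paragraph.\n"
    "\n"
    "A blank line came first, so this is a second paragraph."
)

# The three <P> lines come back together as a single match.
for paragraph in matching_paragraphs(sample, "two"):
    print(paragraph)
```

If you want each HTML paragraph to be returned on its own, put a blank line before each <P> in the file being searched.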

To search a file called name in your account directory dir for a particular keyword, set up a URL of the basic form:

http://www.art.net/cgi-bin/htgrep/file=/dir/name?keyword

With the URL above, the keyword is hardwired and cannot be entered by the person doing the search. To allow the keyword to be entered, you need to set up a FORM. Here is a sample keyword entry search form:

<FORM ACTION="http://www.art.net/cgi-bin/htgrep/file=/dir/filename">
<P>Search for: <INPUT NAME="isindex" SIZE=32>
<INPUT TYPE="submit" VALUE="Submit">
</FORM>

Searching Multiple Files

You can have htgrep search multiple files by setting the file tag to a comma-separated list of files. For example, if you have two files named file1.txt and file2.txt, you would use the following URL for your form's action:

http://www.art.net/cgi-bin/htgrep/file=/dir/file1.txt,/dir/file2.txt

Note that you cannot use wildcards to specify multiple files for a search. The following URL will NOT work:

http://www.art.net/cgi-bin/htgrep/file=/dir/file*.txt
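
If you generate your pages with a script, the multi-file URL can be assembled from a list of paths. The following is a minimal Python sketch; the helper name action_url is hypothetical, and rejecting "?" and "," along with "*" is an extra precaution assumed here, since those characters already have a meaning in the URL.

```python
BASE = "http://www.art.net/cgi-bin/htgrep"

def action_url(files):
    """Build an htgrep URL that searches every path in files."""
    if not files:
        raise ValueError("at least one file is required")
    for path in files:
        # htgrep does not expand wildcards, so reject them early.
        # "?" and "," are rejected too because they delimit parts of the URL.
        if any(ch in path for ch in "*?,"):
            raise ValueError(f"unsupported character in path: {path}")
    return f"{BASE}/file={','.join(files)}"

print(action_url(["/dir/file1.txt", "/dir/file2.txt"]))
# prints http://www.art.net/cgi-bin/htgrep/file=/dir/file1.txt,/dir/file2.txt
```

Because wildcards are not expanded, every file has to be listed by name.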

Formulating Search Queries

Users may submit queries using either boolean expressions or Perl regular expressions. With boolean expressions, the and, or, and not operators may be used. For example, if you wanted to find instances of paragraphs with both bulls and giraffes, you would enter "bull and giraffe". This is actually the same as the query "bull giraffe" since htgrep defaults to and. If you want all paragraphs containing either word, you would use the or operator: "bull or giraffe".

For paragraphs containing "giraffe" but not containing "bull", you would enter: "giraffe and not bull". You can also use parentheses "(" and ")" to form more complex searches. For example, "bull and (giraffe or cat)" would display paragraphs containing the word "bull" and either the word "giraffe" or the word "cat" or both.
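
The boolean rules above can be illustrated with a small evaluator. This is a sketch written for this page, not htgrep's implementation: it assumes case-insensitive substring matching, gives or the lowest precedence and not the highest, and does not handle Perl regular expressions. The function name matches is hypothetical.

```python
import re

def matches(query, paragraph):
    """Return True if paragraph satisfies the boolean query."""
    tokens = re.findall(r"[()]|[^\s()]+", query.lower())
    text = paragraph.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "or":
            advance()
            rhs = parse_and()  # always parse the right side before combining
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() not in (None, "or", ")"):
            if peek() == "and":  # "and" is optional: adjacent words are and-ed
                advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "not":
            advance()
            return not parse_not()
        if peek() == "(":
            advance()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return value
        if peek() in (None, ")", "and", "or"):
            raise ValueError("expected a search word")
        return advance() in text

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result

print(matches("bull and (giraffe or cat)", "The bull chased the cat."))  # prints True
print(matches("giraffe and not bull", "A bull met a giraffe."))          # prints False
```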


Search Options

There are a number of options that can be used with htgrep. You can set the options by appending &option=value to the end of the search URL in the FORM ACTION= specification. For example:

<FORM ACTION="http://www.art.net/cgi-bin/htgrep/file=/dir/filename&style=pre&max=10">
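
When the ACTION URL is generated by a script, the options can be appended programmatically. Below is a minimal Python sketch; with_options is a hypothetical helper, and it performs no validation or escaping of the values it is given.

```python
def with_options(action, **options):
    """Append &option=value pairs to an htgrep action URL."""
    # Keyword arguments keep their order, so the URL lists them as written.
    return action + "".join(f"&{name}={value}" for name, value in options.items())

url = with_options(
    "http://www.art.net/cgi-bin/htgrep/file=/dir/filename",
    style="pre",
    max=10,
)
print(url)
# prints http://www.art.net/cgi-bin/htgrep/file=/dir/filename&style=pre&max=10
```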

Output Format

The output format can be set with &style=format, where format is one of pre, dl, ol, or ul, corresponding to the standard HTML tags for preformatted text or lists. Use pre when searching plain text files, and dl, ol, or ul when the file to search contains HTML list items (<li>).

Cover and Closing Pages

By default, htgrep produces only a minimal title and introduction to a searchable document. However, if a header file base.hdr exists (where base is the filename being searched), htgrep will print that instead of the default header. In addition, if base.qry exists, it will be used whenever a non-empty query is given. (Normally base.hdr will be a cover page with introductory information, whereas base.qry will only contain the title and main headline.)

You can also have htgrep display footer files in the same way. If you create the files base.hdr_footer and base.qry_footer, they will be automatically appended to the output when a query is absent or present, respectively.

Alternatively, the header and footer pages can be specified in the URL with the options &hdr=file and &qry=file for headers and &hdr_footer=file and &qry_footer=file for footers. Note that the files are assumed to be in the same directory as the file being searched.
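
The header selection described in this section can be sketched as follows. This is only an illustration under stated assumptions: it takes base to be the full filename with the suffix appended (for example animals.txt.hdr), and it assumes htgrep falls back to base.hdr when a query is given but base.qry does not exist. Neither detail is confirmed by this page, and pick_header is a hypothetical name.

```python
def pick_header(filename, query, existing):
    """Choose a header file name, or None for htgrep's default header.

    existing is the set of file names present in the searched file's directory.
    """
    hdr = filename + ".hdr"  # cover page (naming is an assumption)
    qry = filename + ".qry"  # shorter header for query results
    if query.strip() and qry in existing:
        return qry
    if hdr in existing:
        # Assumed fallback: reuse the cover page when no .qry file exists.
        return hdr
    return None

files = {"animals.txt", "animals.txt.hdr", "animals.txt.qry"}
print(pick_header("animals.txt", "", files))      # prints animals.txt.hdr
print(pick_header("animals.txt", "bull", files))  # prints animals.txt.qry
```

The same pattern would apply to the base.hdr_footer and base.qry_footer files.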

Options for Searching Plain Text

There are two options of particular interest if you are searching a plain text file:

&style=pre
If the source document is a plain text file, this will cause special characters to be escaped and each paragraph to be surrounded by <PRE> and </PRE>.

&grab=yes
Causes htgrep to search for URLs and FTP pointers and convert them into hypertext links. This is most interesting in combination with the tag &style=pre to query plain text files.
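
To show what the combination of &style=pre and &grab=yes amounts to, here is a rough Python sketch of escaping a plain text paragraph and linking the URLs in it. It is not htgrep's code: the URL pattern is a simplification that only recognizes http, https, and ftp URLs, and render_pre is a hypothetical name.

```python
import html
import re

# Simplified pattern: a scheme followed by anything up to whitespace, <, > or ".
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>\"]+")

def render_pre(paragraph):
    """Escape special characters, link URLs, and wrap the result in <PRE>."""
    parts = []
    last = 0
    for match in URL_RE.finditer(paragraph):
        # Escape the plain text that precedes this URL.
        parts.append(html.escape(paragraph[last:match.start()], quote=False))
        url = match.group(0)
        parts.append(f'<A HREF="{html.escape(url)}">{html.escape(url, quote=False)}</A>')
        last = match.end()
    parts.append(html.escape(paragraph[last:], quote=False))
    return "<PRE>" + "".join(parts) + "</PRE>"

print(render_pre("Files at ftp://ftp.example.org/pub & more <here>"))
# prints <PRE>Files at <A HREF="ftp://ftp.example.org/pub">ftp://ftp.example.org/pub</A> &amp; more &lt;here&gt;</PRE>
```

Escaping is done piece by piece around each URL so that characters such as < and & in the surrounding text cannot end up inside a link.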

Maximum Matching Records

Normally a maximum of 50 records will be retrieved. This can be controlled with the tag &max=number.

