ht://Dig Copyright © 1995-1999 The ht://Dig Group
Please see the file COPYING for
license information.
<SELECT NAME="search_algorithm"> <OPTION VALUE="exact:1 prefix:0.6 synonyms:0.5 endings:0.1" SELECTED>fuzzy <OPTION VALUE="exact:1">exact </SELECT>
allow_in_form: search_algorithm search_results_header
bad_querystr: forum=private section=topsecret&passwd=required
bad_word_list: ${common_dir}/badwords.txt
The default value of this attribute is determined at compile time.
common_url_parts: http://www.htdig.org/ml/ \
    .html \
    http://dev.htdig.org/ \
    http://www.htdig.org/
The default value of this attribute is determined at compile time.
The default value of this attribute is determined at compile time.
doc_db: ${database_base}documents.db
endings_affix_file: /var/htdig/affix_rules
endings_dictionary: /var/htdig/dictionary
endings_root2word_db: /var/htdig/r2w.db
endings_word2root_db: /var/htdig/w2r.bm
The parser program takes four command-line
parameters, not counting any parameters already
given in the command string:
infile content-type URL configuration-file
Parameter | Description | Example |
---|---|---|
infile | A temporary file with the contents to be parsed. | /var/tmp/htdext.14242 |
content-type | The MIME-type of the contents. | text/html |
URL | The URL of the contents. | http://www.htdig.org/attrs.html |
configuration-file | The configuration-file in effect. | /etc/htdig/htdig.conf |
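As a rough illustration of the calling convention in the table above, an external parser written in Python might pick up its arguments as follows. This is only a sketch: the names `ParserArgs` and `read_parser_args` are hypothetical, and it assumes the four parameters arrive as the last four arguments, after anything given in the command string.

```python
from typing import NamedTuple, Sequence


class ParserArgs(NamedTuple):
    infile: str        # temporary file holding the contents to be parsed
    content_type: str  # MIME type of the contents
    url: str           # URL of the contents
    config_file: str   # configuration file in effect


def read_parser_args(argv: Sequence[str]) -> ParserArgs:
    """Return the four trailing parameters from an argv-style list.

    argv[0] is the program name. Arguments between it and the last four
    are assumed to come from the command string (an assumption of this sketch).
    """
    if len(argv) < 5:
        raise SystemExit(
            "usage: parser [options] infile content-type URL configuration-file"
        )
    return ParserArgs(*argv[-4:])
```

In a real script this would typically be called as `read_parser_args(sys.argv)` after `import sys`.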
The external parser writes its information for
htdig to standard output.
The output consists of records, each terminated
with a newline. Each record is a series of tab-separated
fields, which must be non-empty unless expressly
allowed to be empty. The first field is a single character
that specifies the record type. The remaining fields
are determined by the record type.
Rows that share a record type list the fields of that record in order.
Record type | Fields | Description |
---|---|---|
w | word | A word that was found in the document. |
w | location | A number indicating the normalized location of the word within the document. The number must fall in the range 0-1000, where 0 means the top of the document. |
w | heading level | A heading level that is used to compute the weight of the word depending on its context in the document itself. The level is in the range 0-10. |
u | document URL | A hyperlink to another document that is referenced by the current document. It must be complete and non-relative; use the URL parameter to resolve any relative references found in the document. |
u | hyperlink description | For HTML documents, this would be the text between the <a href...> and </a> tags. |
t | title | The title of the document. |
h | head | The top of the document itself. This is used to build the excerpt. It should contain only normal ASCII text. |
a | anchor | The label that identifies an anchor that can be used as a target in a URL. This really only makes sense for HTML documents. |
i | image URL | A URL that points at an image that is part of the document. |
m | http-equiv | The HTTP-EQUIV attribute of a META tag. May be empty. |
m | name | The NAME attribute of this META tag. May be empty. |
m | contents | The CONTENT attribute of this META tag. May be empty. |
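The record format described above can be sketched in Python as follows. This is a minimal illustration for plain text only, not a reference implementation: the helper names (`record`, `normalized_location`, `records_for_text`) are hypothetical, the way words are split and lowercased is an assumption, and heading level `0` is used purely as a placeholder value.

```python
import re
from typing import Iterator


def normalized_location(index: int, total: int) -> int:
    """Scale a word position into the 0-1000 range, where 0 is the top."""
    if total <= 1:
        return 0
    return min(1000, index * 1000 // (total - 1))


def record(kind: str, *fields: str) -> str:
    """Build one record: type character, tab-separated fields, newline."""
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join((kind, *fields)) + "\n"


def records_for_text(title: str, text: str, head_chars: int = 80) -> Iterator[str]:
    """Yield t, w and h records for a plain-text document."""
    if title:
        yield record("t", title)
    words = re.findall(r"\w+", text)
    for i, word in enumerate(words):
        location = normalized_location(i, len(words))
        # Heading level "0" is a placeholder chosen for this sketch.
        yield record("w", word.lower(), str(location), "0")
    head = " ".join(text.split())[:head_chars]
    if head:
        yield record("h", head)
```

A parser would write each yielded string to standard output, for example with `sys.stdout.writelines(records_for_text(title, text))`.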
external_parsers: text/html /usr/local/bin/htmlparser \
    application/ms-word "/usr/local/bin/mswordparser -w"
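To make the pairing in this example concrete, the sketch below splits such a value into content-type and command pairs. It is only an approximation: it uses Python's `shlex` module to mimic the quoting shown in the example, which may not match ht://Dig's own parsing in every detail, and `parse_external_parsers` is a hypothetical name.

```python
import shlex
from typing import Dict, List


def parse_external_parsers(value: str) -> Dict[str, List[str]]:
    """Map each content type to its parser command, split into arguments."""
    # Join continuation lines (backslash followed by newline) first.
    tokens = shlex.split(value.replace("\\\n", " "))
    if len(tokens) % 2 != 0:
        raise ValueError("expected content-type/command pairs")
    return {
        content_type: shlex.split(command)
        for content_type, command in zip(tokens[::2], tokens[1::2])
    }
```

With the example value, `application/ms-word` would map to the command `/usr/local/bin/mswordparser` with the single extra argument `-w`.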
htnotify_sender: bigboss@yourcompany.com
http_proxy: http://proxy.bigbucks.com:3128
http_proxy_exclude: http://intranet.foo.com/
The default value of this attribute is determined at compile time.
keywords_meta_tag_names: keywords description
limit_normalized: http://www.mydomain.com
local_urls: http://www.foo.com/=/usr/www/htdocs/
local_user_urls: http://www.my.org/=/home/,/www/
metaphone_db: ${database_base}.mp.db
next_page_text: <img src="/htdig/buttonr.gif">
no_page_list_header: <hr noshade size=2>All results on this page.<br>
no_page_number_text: <strong>1</strong> <strong>2</strong> \
    <strong>3</strong> <strong>4</strong> \
    <strong>5</strong> <strong>6</strong> \
    <strong>7</strong> <strong>8</strong> \
    <strong>9</strong> <strong>10</strong>
nothing_found_file: /www/searching/nothing.html
page_number_text: <em>1</em> <em>2</em> \
    <em>3</em> <em>4</em> \
    <em>5</em> <em>6</em> \
    <em>7</em> <em>8</em> \
    <em>9</em> <em>10</em>
The default value of this attribute is determined at compile time, to include the path to the acroread executable.
prev_page_text: <img src="/htdig/buttonl.gif">
remove_default_doc: default.html default.htm index.html index.htm
or
remove_default_doc:
search_algorithm: exact:1 soundex:0.3
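The search_algorithm value is a space-separated list of algorithm:weight pairs. A small Python sketch of reading such a value is shown below; `parse_search_algorithm` is a hypothetical helper, and it assumes every entry carries an explicit numeric weight, as in the examples here.

```python
from typing import Dict


def parse_search_algorithm(value: str) -> Dict[str, float]:
    """Turn 'exact:1 soundex:0.3' into {'exact': 1.0, 'soundex': 0.3}."""
    weights: Dict[str, float] = {}
    for item in value.split():
        name, sep, weight = item.partition(":")
        if not sep or not name:
            raise ValueError(f"expected algorithm:weight, got {item!r}")
        weights[name] = float(weight)
    return weights
```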
search_results_footer: /usr/local/etc/ht/end-stuff.html
search_results_header: /usr/local/etc/ht/start-stuff.html
search_results_wrapper: ${common_dir}/wrapper.html
server_aliases: foo.mydomain.com:80=www.mydomain.com:80 \
    bar.mydomain.com:80=www.mydomain.com:80
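The alias=server form used in this example can be read as a simple lookup table. The Python sketch below shows one way such a table might be applied to a URL. Both function names are hypothetical, and treating a missing port as 80 (or 443 for https) is an assumption of the sketch rather than documented behaviour.

```python
from typing import Dict
from urllib.parse import urlsplit


def parse_server_aliases(value: str) -> Dict[str, str]:
    """Build an alias -> canonical 'host:port' table from the attribute value."""
    aliases: Dict[str, str] = {}
    for item in value.replace("\\\n", " ").split():
        alias, sep, target = item.partition("=")
        if not sep:
            raise ValueError(f"expected alias=server, got {item!r}")
        aliases[alias.lower()] = target.lower()
    return aliases


def canonical_server(url: str, aliases: Dict[str, str]) -> str:
    """Return the canonical 'host:port' for a URL, applying any alias."""
    parts = urlsplit(url)
    # Default ports are an assumption made for this sketch.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    key = f"{parts.hostname}:{port}"
    return aliases.get(key, key)
```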
sort_names: score 'Best Match' time Newest title A-Z \
    revscore 'Worst Match' revtime Oldest revtitle Z-A
star_blank: http://www.somewhere.org/icons/elephant.gif
star_image: http://www.somewhere.org/icons/elephant.gif
star_patterns: http://www.sdsu.edu /sdsu.gif \
    http://www.ucsd.edu /ucsd.gif
start_url: http://www.somewhere.org/alldata/index.html
synonym_dictionary: /usr/dict/synonyms
syntax_error_file: ${common_dir}/synerror.html
template_map: Short short ${common_dir}/short.html \
    Normal normal builtin-long \
    Detailed detail ${common_dir}/detail.html
url_part_aliases: http://search.example.com/~htdig *site \
    http://www.htdig.org/this/ *1 \
    .html *2
url_part_aliases: http://www.htdig.org/ *site \
    http://www.htdig.org/that/ *1 \
    .htm *2
word_list: ${database_base}.allwords.text