If more than one WAIS database has to be queried in parallel, whoever sets up the query interface may be lucky: the fields available may coincide exactly across all databases. The more common case, however, is that different databases offer different fields, i.e. the schemas of the databases differ.
In most cases this is due to the different types of documents stored in the databases, e.g. one database holds references to literature while another holds product descriptions.
SFgate 5.1 is suited to handling heterogeneous databases whose documents do not differ too much in type, especially references to literature such as articles, books and reports.
The task with respect to heterogeneous databases is to find a mapping between the attributes (or field names) used within a given query and the attributes (or field names) available in the different databases.
To understand how to use the attribute mapping facility of SFgate, one has to know what happens to a given query within SFgate. As pointed out in the previous section, SFgate's task is to take the attributes from a given query and map them onto the most suitable attributes in the schema of each database to be queried.
A simple (but insufficient) solution is to rename the attributes within the databases to those used in queries. This works only if all databases include the same attributes (on the semantic level) as those used within the query, so strictly speaking it does not deal with heterogeneity.
The solution implemented in SFgate 5.1 is based on a predefined set of attributes for use within queries. To map attributes, there has to be knowledge of how the attributes are related to each other. We therefore introduced a lattice (see section 6.2 Predefined Attribute Lattice) on these attributes which reflects the specialization relationships between them.
By means of this lattice, the mapping process for attributes can be defined in terms of four operations, applied in the following order: equality, specialization, generalization and ignorance.
The set of attributes used within the lattice is mainly taken from the Scientific and Technical Attribute Set (STAS) maintained by CNIDR. STAS defines standard identifiers for referring to searchable and retrievable fields within scientific and technical databases.
KL-ONE defines a diffs operator which allows for inheritance on attributes. Using the diffs construct, a specialization hierarchy has been introduced on a (small) subset of the STAS attributes:
TOP
 |
 |-keywords
 |  |
 |  |-content
 |  |  |
 |  |  +-full-text
 |  |  |  |
 |  |  |  +-title
 |  |  |  |  |
 |  |  |  |  |-book-title
 |  |  |  |  |-article-title
 |  |  |  |  +-series-title
 |  |  |  |
 |  |  |  |-abstract
 |  |  |  +-subject-descriptor
 |  |  |
 |  |  +-journal-title
 |  |
 |  |-initiator
 |  |  |
 |  |  |-author-name
 |  |  |-editor-name
 |  |  |-corporate
 |  |  +-conference
 |  |
 |  +-publisher
 |     |
 |     |-publisher-name
 |     +-publisher-address
 |
 |-date
 |  |
 |  |-entry-date
 |  +-publication-date
 |
 +-meta
    |
    |-issn
    |-isbn
    |-crc
    |-volume
    |-number
    +-edition
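The four mapping operations can be illustrated with a small sketch. It is not SFgate's implementation, which is not shown in this section. It is a minimal Python model under stated assumptions: equality means the database offers the query attribute itself, specialization means falling back to any more specific attributes it offers, generalization means falling back to the nearest more general attribute (never TOP), and ignorance means dropping the attribute for that database. The names `PARENT`, `AVAILABLE` and `map_attribute` are hypothetical, and `PARENT` encodes only a subset of the hierarchy above, as read from it.

```python
# Hypothetical sketch of lattice-based attribute mapping (not SFgate code).
# PARENT encodes a subset of the specialization hierarchy as child -> parent.
PARENT = {
    "keywords": "TOP",
    "content": "keywords",
    "full-text": "content",
    "title": "full-text",
    "book-title": "title",
    "article-title": "title",
    "series-title": "title",
    "abstract": "full-text",
    "initiator": "keywords",
    "author-name": "initiator",
    "editor-name": "initiator",
    "date": "TOP",
    "entry-date": "date",
    "publication-date": "date",
}


def ancestors(attr):
    """Yield more general attributes, nearest first, stopping before TOP."""
    node = PARENT.get(attr)
    while node is not None and node != "TOP":
        yield node
        node = PARENT.get(node)


def descendants(attr):
    """Return all attributes more specific than attr (depth-first order)."""
    result = []
    for child, parent in PARENT.items():
        if parent == attr:
            result.append(child)
            result.extend(descendants(child))
    return result


def map_attribute(attr, available):
    """Map a query attribute onto the lattice attributes a database offers.

    Returns (operation, attributes), trying the operations in the order
    equality, specialization, generalization, ignorance.
    """
    if attr in available:                          # 1. equality
        return "equality", [attr]
    special = [d for d in descendants(attr) if d in available]
    if special:                                    # 2. specialization
        return "specialization", special
    for general in ancestors(attr):                # 3. generalization
        if general in available:
            return "generalization", [general]
    return "ignorance", []                         # 4. ignorance


# Lattice attributes offered by one hypothetical database.
AVAILABLE = {"title", "author-name", "publication-date", "keywords"}

for query_attr in ("title", "initiator", "abstract", "entry-date"):
    print(query_attr, "->", map_attribute(query_attr, AVAILABLE))
```

With these assumptions, `title` maps by equality, `initiator` by specialization onto `author-name`, `abstract` by generalization onto `keywords`, and `entry-date` is ignored because neither it nor `date` is offered.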
Now, what has to be done to use SFgate's heterogeneous database handling facility? First of all, SFgate must be told to use it, so set up a FORM tag named `attributes':
<INPUT NAME="attributes" TYPE="hidden" VALUE="1">
The next thing is to select a suitable lattice. You need not use the lattice proposed in the previous section; feel free to build your own. If you are not using the proposed one (i.e. the file `$SFgate/lattice', which is installed in the application directory (see section 3.3.1.10 Directory for Application Files)), you have to announce your lattice to SFgate. To do so, set up a FORM tag named `lattice' with the filename of your lattice file as its value (including the complete path if you don't want to install it in the application directory):
<INPUT NAME="lattice" TYPE="hidden" VALUE="lattice">
Now that you have a lattice you can create the query part in your form. Specify the input fields using the attribute names from your lattice only.
The next step is to configure the databases to be queried. In general, field names in WAIS databases are not taken from your lattice, so you have to tell SFgate how the database fields map onto the attributes within the lattice. Furthermore, SFgate needs to know the types of the database fields, since fields from different databases that are mapped onto one lattice attribute need not have the same type.
Let's start with an enumeration of the different possible types:
The types of a field (yes, a database field can have more than one type, e.g. soundex and text for personal names) can easily be derived from the .fmt file (see section `Building a Format Description' in The freeWAIS-sf Manual) used to create the database.
Now, how can SFgate be told what it needs to know about a database? This is done via an external file, which should reside in the application file directory (see section 3.3.1.10 Directory for Application Files). Instead of the database specification described in section 5.4 Databases, the value of a `database' FORM tag must contain the name of that file. If the file doesn't reside in the application file directory, it must be specified with the complete path.
Specifying only the file name of a database configuration file within a `database' FORM tag makes it necessary to configure the server, port and name of the WAIS database within the configuration file itself.
Converters are another point. Different databases contain documents of different types, so documents resulting from different databases may need different converters. This, too, is configured within the database configuration files.
The syntax of database configuration files is taken from Perl. The three examples below form a database configuration file for the WAIS database `bibdb-html' on server `ls6.informatik.uni-dortmund.de'.
Server, port and name of the database are specified as simple Perl variables:
$server = 'local';
$port   = '210';
$name   = 'bibdb-html';
The database fields, their types and their counterparts within the lattice are given in an anonymous hash reference named $attributes. The first part of an entry is the name of a database field followed by a colon and the list of types, separated by commas. The second part is the counterpart within the lattice:
$attributes = {
    'py:numeric'      => 'publication-date',
    'au:text,soundex' => 'author-name',
    'ti:stemming'     => 'title',
    'cc:text'         => 'crc',
    'jt:stemming'     => 'journal-title',
    'vo:numeric'      => 'volume',
    'no:numeric'      => 'number',
    'global:text'     => 'keywords'
};
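To make the entry syntax concrete, the following sketch splits each `field:type,type` key into its field name and type list and indexes the fields by lattice attribute. It is written in Python purely for illustration and is not how SFgate reads the file, since the configuration itself is Perl. The dictionary `ATTRIBUTES` mirrors the example hash above, and the function names `parse_field_spec` and `fields_by_attribute` are hypothetical.

```python
# Hypothetical sketch: interpreting the 'field:type,type' => 'attribute' entries.
# ATTRIBUTES is a Python mirror of the example Perl hash, not an SFgate structure.
ATTRIBUTES = {
    "py:numeric": "publication-date",
    "au:text,soundex": "author-name",
    "ti:stemming": "title",
    "cc:text": "crc",
    "jt:stemming": "journal-title",
    "vo:numeric": "volume",
    "no:numeric": "number",
    "global:text": "keywords",
}


def parse_field_spec(spec):
    """Split 'field:type1,type2' into (field, [type1, type2])."""
    field, _, types = spec.partition(":")
    return field.strip(), [t.strip() for t in types.split(",") if t.strip()]


def fields_by_attribute(attributes):
    """Index database fields by lattice attribute: attribute -> [(field, types)]."""
    index = {}
    for spec, lattice_attr in attributes.items():
        index.setdefault(lattice_attr, []).append(parse_field_spec(spec))
    return index


index = fields_by_attribute(ATTRIBUTES)
print(index["author-name"])  # [('au', ['text', 'soundex'])]
print(index["title"])        # [('ti', ['stemming'])]
```

Indexing by lattice attribute makes it easy to see that a single attribute, such as `author-name`, may be backed by a field carrying several types.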
The (optional) mapping of converters is likewise given as an anonymous hash reference (named $converter). The first part of an entry is the converter name used within the form; the second part is the name of the converter to call:
$converter = {
    'BIBTEX'  => 'bibtex',
    'PRETTY'  => 'label:',
    'DEFAULT' => 'label:'
};
If you want to learn more about the background of SFgate's attribute mapping idea, take a look at: