Searching Strategies on Unicorn


INFORMATION TO DO WITH SEARCHING STRATEGIES AND
PROBLEMS


Subject: Searching Call Numbers

Using the Display1 Item command:
  If you type in a call number and put a $ at the end for truncation,
  you will get a list of similar call numbers, much like a browse.
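
A rough sketch of what trailing-$ truncation amounts to, for anyone
explaining it to users. This is NOT Unicorn code: the function name and
the sample call numbers are made up for illustration, and the only
assumption is that a trailing $ means "starts with" against a sorted list.

```python
import bisect

def truncated_lookup(sorted_call_numbers, query):
    """Return entries matching `query`; a trailing '$' means 'starts with'."""
    if not query.endswith("$"):
        # No truncation symbol: exact match only.
        return [c for c in sorted_call_numbers if c == query]
    prefix = query[:-1]
    # Jump to the first entry that could start with the prefix.
    start = bisect.bisect_left(sorted_call_numbers, prefix)
    hits = []
    for call_number in sorted_call_numbers[start:]:
        if not call_number.startswith(prefix):
            break  # sorted list, so no later entry can match
        hits.append(call_number)
    return hits

# Hypothetical call numbers, sorted as plain strings.
shelf = sorted(["QA76.5 .B3", "QA76.73 .P98", "QA76.9 .D3", "QC21 .F4", "RC681 .A1"])
print(truncated_lookup(shelf, "QA76$"))
```

Because the hits come back as a contiguous run of the sorted list, the
result looks like a browse display, which matches what is described above.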


Subject: Author Keyword search

If you try an AUTHOR KEYWORD search on their example:

        HILL C$ and set the options to PARTIAL--
        via webcat you get 20 hits and no other information

        via chui you get a screen asking if you want the
        full/partial hitlist, selecting full you get 113 hits.

        selecting partial, you get the 20 hits webcat users
        see.
OUR DEFAULT IN THE OPTIONS FOR KEYWORD IS "KEYWORD" AND
IF THE OPTION IS NOT CHANGED, YES THE USER GETS THROWN
INTO A BROWSE LIST.
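
The difference between the two interfaces on the HILL C$ example can be
modelled in a few lines. This is only a sketch of the behaviour reported
above, not SIRSI's logic: the function name, the "webcat"/"chui" labels
and the `limit` parameter are assumptions, and the limit is passed in
because the actual threshold setting is not known here.

```python
def hitlist(hits, limit, interface, choice=None):
    """Return the hits a user would see when a search exceeds `limit`."""
    if len(hits) <= limit:
        return list(hits)              # short result: no prompt needed
    if interface == "webcat":
        return list(hits[:limit])      # no prompt offered; always partial
    if choice == "full":
        return list(hits)              # character interface, user chose FULL
    if choice == "partial":
        return list(hits[:limit])      # character interface, user chose PARTIAL
    return []                          # anything else is treated as CANCEL

all_hits = [f"record {n}" for n in range(1, 114)]   # 113 hits, as in the example
print(len(hitlist(all_hits, 20, "webcat")))
print(len(hitlist(all_hits, 20, "chui", "full")))
print(len(hitlist(all_hits, 20, "chui", "partial")))
```

The point of the sketch is that a webcat user never sees the remaining
93 records and gets no indication that they exist.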
 
HILL C$ is a good example of an instance where even a high truncation
threshold rolls over to a browse, presumably because C$ returns too
many results--potentially a serious retrieval problem for users with
citations in "scientific" form, where the author's first name is
abbreviated to a single initial.
   
   

Subject:      SERIAL hypertext links

This became somewhat longer and more complex than originally
anticipated.
Earlier this week I noticed a loose thread on my ankle, pulled and pulled,
and discovered I had unraveled the entire hem of my trouser leg. I'm
beginning to feel the same way about 78X added title entries for serials.

I was able to get on the system before anyone was active this morning,
to make and test some changes related to "hypertext entries". These
are handled slightly differently for terminal and web interfaces, but
all affect the 780 (earlier title) and 785 (later title) fields in the
SERIALS format.

For the character interface, a "related subfields" attribute governs
what subfields are used to do a lookup from the LIKE button. I believe
that as part of last summer's V7 to V8 migration, these got reset to
ALL, which meant that a LIKE lookup from these fields would always
fail. As of this morning, only subfield t is included in the LIKE
lookup, and it works.
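
To see why ALL made the lookup fail, here is a small sketch. It assumes
(this is a guess about the mechanism, not something confirmed by Sirsi)
that the lookup string is simply the selected subfields joined together
and matched against a title. The function name is made up, and the 780
subfield values are reconstructed for illustration from the HEART record
discussed below.

```python
def lookup_string(subfields, related="ALL"):
    """Join the subfield values selected by the 'related subfields' setting."""
    if related == "ALL":
        chosen = [value for _, value in subfields]
    else:
        chosen = [value for code, value in subfields if code in related]
    return " ".join(chosen)

# Illustrative 780 field: title in $t, control numbers in $w.
field_780 = [
    ("t", "British heart journal"),
    ("w", "(DLC)sc 85001051"),
    ("w", "(OCoLC)1537247"),
]

# A stand-in for the title index: just the titles we hold, lowercased.
titles = {"british heart journal", "heart"}

for setting in ("ALL", "t"):
    query = lookup_string(field_780, setting)
    print(setting, "->", repr(query), "found:", query.lower() in titles)
```

With ALL, the control numbers end up in the search string and no title
matches; with only subfield t, the string is a clean title.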

WebCat uses a different mechanism, since there is no LIKE button.
Instead, in typical web fashion, any entry that can be used for a
hypertext lookup appears as a link (usually represented as blue
underlined text). The subfields displaying in WebCat are controlled
by the "display subfields" attribute, which has also been changed
to include only subfield t for 780 and 785. As of this morning,
these lookups finally work in WebCat.

See the British periodical HEART for a good example.

NOTES:

1. Both character and webcat interfaces are actually doing a
   PARTIAL title lookup (not a periodical title lookup) for
   the hypertext entry. In the character user interface, a
   lookup that returns a large number of hits (although I believe
   the threshold is 24 hits, I haven't yet found out if it is
   configurable) first returns an intermediate screen:

=======================================================================
Partial search results too long
To pick a new button, first return to buttons by pressing TAB(s).
Select one of the partial search result options, then press RETURN or
HELP          GOBACK        STARTOVER     PRINT         UTILITIES
END           CLEAR         TYPE          ACCESS        REQUEST
-----------------------------------------------------------------------
                        PARTIAL SEARCH RESULT OPTIONS


         Result of partial search too long.

         Do you want to CANCEL the search, display PARTIAL results,
         or display FULL results?

=======================================================

   In WebCat, the same thing happens, BUT WebCat doesn't offer the
   PARTIAL/FULL response, and defaults to PARTIAL. This hasn't
   caused any problems in the tests that I have done, but I am
   sure that there will be situations with words more common than
   HEART where the desired title is not included in the 24 records
   returned by WebCat. Sirsi is aware of this problem with PARTIAL
   searches in WebCat.

2. While researching this, I came upon a record with the title
   "Combined cumulative index to cardiology," which leads me to
   ask if similar changes are needed for the 787 (Other Title)
   field in the Serials format. Currently this field is defined
   as a hypertext field, which means it shows up under the LIKE
   button, BUT, it is not included in the FULL entry list, so
   these titles do not show up in WebCat unless one changes the
   view option to ALL. Further, because related/display subfields
   are set to ALL, hypertext lookups fail for 787 titles as they
   did for 780/785 fields prior to today. For the example title,
   the following 787 fields exist:

          Other title: American heart journal
          Other title: American journal of cardiology
          Other title: British heart journal
          Other title: Cardiology
          Other title: Cardiovascular research
          Other title: Circulation
          Other title: Circulation research
          Other title: Journal of molecular and cellular cardiology

   If you feel that hypertext lookups for 787 would be useful,
   policy definitions for this field in the SERIALS format can
   be brought in line with those for 780/785.

3. BROWSE index policies remain unchanged, so ISSN and OCLC numbers
   for these linking fields still appear in the browse indexes. My
   recollection is that this was an express desire of OPAC and serials
   folks when we initially set up browse index policies. However, staff
   should be aware of what this actually means, which may not be
   immediately obvious. A Periodical Title browse on BRITISH HEART
   JOURNAL returns:

                              BROWSING THE CATALOG

 1> BRITISH HEART JOURNAL                                             [2]
 2) BRITISH HEART JOURNAL DLC SC 85001051 OCOLC 1537247               [1]
 3) BRITISH HEART JOURNAL OCOLC 1537247                               [1]

   Browse index entry 2 actually points to the journal HEART, because
   this added title entry (including the various control numbers) is
   encoded in the 780 of this record, and nowhere else. Browse index
   entry 3 actually points to the record for Combined cumulative index
   to cardiology journal, which has a 787 entry for this complete field.
   Unicorn is behaving exactly as it should, given the way we have set
   things up. Whether the way we have set them up is what we actually
   want is a matter for consideration by ESC, and perhaps the OPAC
   working group.

4. Policy changes described in notes 1 and 2 are effective with only a
   halt/restart, and require no re-indexing. Browse index policies
   described in note 3 require a total system rebuild (tentatively
   planned for July 3-4, 1998).


There are other problems regarding the use of hypertext links in WebCat
for library catalogues and certain Z39.50 destinations.
WebCat encloses the hypertext field in double quotes and sends it to the
server. Quoted search strings tell BRS to do a literal search of
everything:

For example, given "John, Smith", BRS also searches for the comma.

I have asked SIRSI to get WebCat to strip out punctuation before sending
the search string.

Another option is to NOT send the string as quoted. This poses problems of 
its own when the string contains an operator like "and".
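
The kind of clean-up being asked for can be sketched as follows. This is
not the WebCat or cgiopac code, and the function names are invented for
the example. It assumes that punctuation can simply be replaced by spaces,
and it uses the operator list given elsewhere in these notes (AND, OR,
NOT, XOR, SAME, NEAR, WITH, ADJ) to show why dropping the quotes is risky.

```python
import re

OPERATORS = {"AND", "OR", "NOT", "XOR", "SAME", "NEAR", "WITH", "ADJ"}

def clean_link_text(text):
    """Replace punctuation with spaces and collapse runs of whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", text)
    return " ".join(no_punct.split())

def contains_operator(text):
    """True if any word would be read as an operator in an unquoted search."""
    return any(word.upper() in OPERATORS for word in clean_link_text(text).split())

def build_query(text):
    """Keep the quotes (so operators stay literal) but drop the punctuation."""
    return f'"{clean_link_text(text)}"'

print(build_query("John, Smith"))
print(contains_operator("Journal of molecular and cellular cardiology"))
```

Stripping punctuation while keeping the quotes avoids the literal comma
search without turning a word like "and" into an operator.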


The TY was programmed to deal with all of the above issues. As Selden
mentions, the WebCat hypertext link should not be equated with the LIKE
operator.

For our in-house workstations I have added JavaScript to do some
reformatting of hypertext linked fields before they are transmitted to
the server. In the meantime, SIRSI has addressed most of these problems
within the cgiopac.
Slavko


Subject: KEYWORD SEARCHING

The default operator delivered by SIRSI for keyword searching is SAME.
This operator acts just like the AND operator, only it narrows the search
universe to a paragraph (tag).
This helps prevent false hits. For example, a search for JOHN AND
KENNEDY using SEARCH EVERYTHING could find JOHN WILSON as a
personal author, and FRED KENNEDY as an added author. This would be
perfectly legitimate when considering the definition of the AND operator
since John AND Kennedy were found.
Submitting the search with the SAME operator forces the two words to be in
the same paragraph, so it is much more likely that you will find Kennedy,
John F., and only Kennedy, John F.
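
The difference is easy to show with a toy model. This is an illustration
only, not how BRS evaluates operators internally: a record is modelled as
a list of (tag, text) pairs, and the tags and names are hypothetical
examples built from the JOHN / KENNEDY case above.

```python
def words(text):
    """Uppercase words of a field, with commas and periods ignored."""
    return set(text.replace(",", " ").replace(".", " ").upper().split())

def match_and(record, terms):
    """AND: every term appears somewhere in the record, in any tag."""
    all_words = set()
    for _, text in record:
        all_words |= words(text)
    return all(term.upper() in all_words for term in terms)

def match_same(record, terms):
    """SAME: every term appears together within a single tag."""
    return any(all(term.upper() in words(text) for term in terms)
               for _, text in record)

false_hit = [("100", "Wilson, John"), ("700", "Kennedy, Fred")]
true_hit = [("100", "Kennedy, John F.")]

for name, record in (("false_hit", false_hit), ("true_hit", true_hit)):
    print(name,
          "AND:", match_and(record, ["john", "kennedy"]),
          "SAME:", match_same(record, ["john", "kennedy"]))
```

AND accepts both records, while SAME accepts only the one where both
words sit in the same field.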

Which default operator to use is a local decision.  Unicorn supports AND,
OR, NOT, XOR, SAME, NEAR, WITH, and ADJ.
Greg

Mack Lundy wrote:

We are on Unicorn98 and use WebCat as our public access method.
Assuming that your 020 and 001 tags are keyword indexed, then you can do
a search on ISBN and OCLC number by plugging the number into the
search field and clicking on "search everything."

I will grant you there is some difference in search results when doing a
search with AND between the keywords in a general search and the same
search without the boolean operator but maybe we should just consider that
a BI opportunity : )

This is the way we configure it to work.
An exact search uses the browse (heading) indexes to start the search.
In our standard configuration we create browse indexes for author, title,
and subject. When the search screens are set up, general browsing and
general exact searching are sent to the subject browse index. Using
systemconfig, you can alter our standard configuration.

Please note that keyword searching works differently: Our standard
configuration provides for hundreds of indexes, usually for specific tags,
as well as a number of grouped indexes such as author, title, subject and
everything.
jim



Subject: Searching strategy

When you search for "Comedy of errors and rodgers"

the system looks for that fully enclosed phrase in ONE bib. tag
in any record.  Your record does NOT have all these words in that
order in one bib. tag.  That is why I search for it by enclosing
the title in quotes only and using the boolean operator to say
this phrase PLUS this word (rodgers) as in the following search
string:


        "COMEDY OF ERRORS" and rodgers

Now the system looks for the phrase in any tag AND rodgers in any
tag, but not necessarily both in the same tag.
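
A small sketch of the two searches, for illustration only. The record
below is invented (the tags and the added-entry name are hypothetical),
the function names are made up, and stop-word handling is deliberately
left out, since how Unicorn treats "of" inside a quoted phrase is the
open question in this thread.

```python
def normalize(text):
    """Lowercase, drop commas and periods, collapse whitespace."""
    return " ".join(text.lower().replace(",", " ").replace(".", " ").split())

def phrase_in_one_tag(record, phrase):
    """True if the whole phrase occurs, in order, inside a single tag."""
    wanted = f" {normalize(phrase)} "
    return any(wanted in f" {normalize(text)} " for _, text in record)

def word_in_any_tag(record, word):
    """True if the word occurs in any tag of the record."""
    return any(word.lower() in normalize(text).split() for _, text in record)

# Hypothetical record: title in one tag, the name in another.
record = [("245", "The comedy of errors"), ("700", "Rodgers, Richard")]

# Everything inside one pair of quotes: must all be in one tag.
print(phrase_in_one_tag(record, "Comedy of errors and rodgers"))

# "COMEDY OF ERRORS" and rodgers: phrase in one tag, word in any tag.
print(phrase_in_one_tag(record, "COMEDY OF ERRORS")
      and word_in_any_tag(record, "rodgers"))
```

The first search fails because no single tag holds the whole string; the
second succeeds because each part is allowed to match in its own tag.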

Does this make sense?

Original question:

 I don't understand why when I enter "Comedy of errors and
 rodgers"  the system doesn't just ignore the "of".  Instead
 it seems that this stop word "stops" the search and no results
 can be found.  If I enter "comedy errors and rodgers" I get right
 to the record.   
 
 This is an important search feature as it could lead to a lot of
 false assumptions that we don't hold items which we do in fact
 have in our collection.  It is critical then to inform our users that
 when doing a keyword search they should leave off "stop" words.