This essay was originally published in the Current Contents print editions of
February 28, 1994, when Thomson Scientific was known as the Institute for Scientific Information (ISI).
In the last two essays, we explained citation indexing and its
usefulness in navigating the research literature.1,2
In this essay we will explore the possibilities of
retrieval using key word and cited reference searches.
Introduction
The Science Citation Index® (SCI®) was originally designed as an alternative approach for retrieval of relevant information; but the concept of relevance is not as simple as it sounds. Relevance, like beauty, is in the eye of the beholder.
Regardless of the initial approach to a search—whether
through a key word index or through a citation index—only
the citation index will easily permit retrieval of subsequent
papers that refer to a specific paper or book that the user
has deemed "relevant."
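To make the idea concrete, here is a minimal sketch of a cited reference search. It is not how SCI itself is built; the paper identifiers and the function names `build_citation_index` and `cited_reference_search` are hypothetical, chosen only for illustration. The sketch inverts each paper's reference list so that a known paper leads forward to the later papers that cite it.

```python
from collections import defaultdict

# Hypothetical toy records: each citing paper maps to the papers it references.
REFERENCES = {
    "paper_B_1990": ["paper_A_1985"],
    "paper_C_1992": ["paper_A_1985", "paper_B_1990"],
    "paper_D_1993": ["paper_B_1990"],
}


def build_citation_index(references):
    """Invert the reference lists: cited paper -> sorted list of citing papers."""
    index = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].add(citing)
    return {cited: sorted(citers) for cited, citers in index.items()}


CITATION_INDEX = build_citation_index(REFERENCES)


def cited_reference_search(cited_id, index=CITATION_INDEX):
    """Return the later papers that cite the paper the user deemed relevant."""
    return index.get(cited_id, [])


print(cited_reference_search("paper_A_1985"))
```

Starting from one paper already judged relevant, the lookup moves forward in time without requiring any shared terminology between the cited and citing papers.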
Systems like SCI rely on the judgment of authors
and referees who choose references for published papers.
In systems like MEDLINE®, the judgment
of indexers determines the terms used, and the systems
are based on the thesaurus called Medical Subject Headings
(MeSH). Since human effort is involved, there is always
the problem of consistency from one article to another.
And, in traditional indexing, there are economic limitations
to the number of headings that can be assigned to each new article.
In any case, thesauri have innate problems in dealing with active,
fast-moving fields in which the terminology changes rapidly.
Following the example set by the online version of SCI
on DIALOG back in 1972, MEDLINE adopted title word indexing
several years ago to partially offset this difficulty.
Nevertheless, a major complaint about MeSH indexing is
that in many cases the generally broader terms retrieve
too much information. However, skilled users of MEDLINE
can use the standard list of subheadings available to reduce
retrieval to a more manageable number of hits. Thus, as
an example, compare a search on cancer with a search on
cancer epidemiology.
Comparative Studies
Studies comparing citation-based retrieval with the use
of MeSH have been conducted, including an early study by
Spencer. She found that in the beginning of a search,
use of SCI provided results in a more rapid
and efficient manner.3
But to obtain a more comprehensive result, back-up with
Index Medicus was necessary. Later studies, including
McCain's,4 focus on
the complementary aspects of the two systems. McCain
found that retrieval by descriptor-based and citation-based
searches does not significantly overlap.4
Depending on the subject matter, there are topics for which
the use of either a single word or citation may capture
90% or more of the "relevant" literature. While a search
on a specific disease can be run by a key word, it is
almost impossible to use key words to retrieve every
paper that uses or modifies a method or theory.
McCain's study considered 11 search topics—such
as interpersonal problem solving, rehabilitation and
therapy for aphasia following stroke, and the classical
conditioning of drug effects—which were suggested by
researchers. McCain also asked the researchers to
identify relevant older contributions that were likely
to be cited in more recent work. In either case, the
search results were evaluated in terms of relevance
and novelty. Interpreting the results, McCain
suggests that "subsets of both literatures may be
relevant to a given researcher's information needs,
serving related rather than identical
functions."4
Relevance
Relevance is a vast subject that deserves a discussion
in its own right. Nevertheless, most evaluation studies
designed to measure relevance do not capture the significance
of "being cited." If you specifically ask whether a particular
author or paper has been cited, then any citing paper is
relevant. However, a paper on topic A could be cited in
a paper on topic B, but the latter might not be deemed
relevant in a traditional comparison of A and B (or other
papers C and D) since they may not be terminologically connected.
There are countless examples in which two or more subsequent
articles will cite a designated paper, but the various citing
titles will not necessarily overlap in the terms used to
describe their content—neither in the title nor in the
key words or abstracts. Whether the citation-based
common thread is methodological, theoretical, or otherwise,
only the searcher can determine its relevance. Indeed,
it is frequently the unexpected connection that may prove
to be most relevant—that is, the most interesting.
This will vary with the purpose of the search. That
is why I often contrast the needs of information recovery
with those of information discovery.
Novelty
If your primary aim is to find the known literature on
a topic, then precision of search may be all-important.
But if you are interested in finding previously unknown
connections, then the system must facilitate your ability
to do this without retrieving everything that is published.
In traditional searching, this is done by using Boolean
combinations of terms.
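As a rough illustration of Boolean combination, the sketch below indexes the words of a few invented titles and combines the resulting sets with AND, OR, and NOT. The titles, the `TITLES` table, and the helper names are hypothetical; a real retrieval system would add stemming, phrase handling, and field restrictions.

```python
import re

# Hypothetical toy collection of article titles keyed by record number.
TITLES = {
    1: "Epidemiology of breast cancer in Europe",
    2: "Cancer chemotherapy and quality of life",
    3: "Epidemiology of thyroid disease",
}


def build_term_index(titles):
    """Map each lowercase title word to the set of records containing it."""
    index = {}
    for doc_id, title in titles.items():
        for word in set(re.findall(r"[a-z]+", title.lower())):
            index.setdefault(word, set()).add(doc_id)
    return index


TERM_INDEX = build_term_index(TITLES)


def docs(term):
    """Return the set of records whose title contains the term."""
    return TERM_INDEX.get(term.lower(), set())


both = docs("cancer") & docs("epidemiology")         # cancer AND epidemiology
either = docs("cancer") | docs("epidemiology")       # cancer OR epidemiology
only_cancer = docs("cancer") - docs("epidemiology")  # cancer NOT epidemiology

print(sorted(both), sorted(either), sorted(only_cancer))
```

The AND combination narrows the broad search on cancer to the single record on cancer epidemiology, while OR widens it to every record that mentions either term.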
Timing
One of the problems with traditional indexing is
the inherent delay introduced by using human indexers.
To overcome this problem, many journals have
implemented author key word indexing. Unfortunately,
only about 25% of published articles contain author-assigned
key words. Thomson Scientific
uses these to augment its unique capability to provide
derivative indexing called KeyWords
Plus®.
KeyWords Plus
KeyWords Plus is called derivative indexing
because the terms are derived from the titles of articles
cited by the author of the article being
indexed.5 KeyWords
Plus augments traditional key word or title retrieval
to a varied extent—anywhere from 10% to over 100%.
For example, using Current Contents on
Diskette®, you can search on
an article such as "The spectrum of autoimmune thyroid
disease with urticaria" from Clinical Endocrinology,
and find that the key words URTICARIA, VASCULITIS, THYROID
DISEASE, and HASHIMOTO THYROIDITIS are expanded to include
the additional KeyWords Plus terms ASSOCIATION and
ANGIOEDEMA. Again, the user is the ultimate filter.
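The derivation of terms from cited titles can be sketched in a few lines. This is only an illustration of the general idea, not the actual KeyWords Plus procedure: the stop list, the rule that a word must appear in at least two cited titles, the function name `derive_terms`, and the sample titles are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed stop list; a production system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "with"}


def words(text):
    """Split text into lowercase words, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def derive_terms(own_title, cited_titles, min_titles=2):
    """Return words found in at least `min_titles` cited titles
    that do not already appear in the article's own title."""
    own = set(words(own_title))
    counts = Counter()
    for title in cited_titles:
        # Count each word at most once per cited title.
        counts.update(set(words(title)) - own)
    return sorted(w for w, n in counts.items() if n >= min_titles)


# Hypothetical article and the titles of the papers it cites.
own_title = "Thyroid disease with skin rash"
cited_titles = [
    "Association of rash and angioedema",
    "Angioedema in thyroid patients",
    "Association studies in autoimmunity",
]

print(derive_terms(own_title, cited_titles))
```

Because the added terms come from the cited literature rather than from the article itself, they can surface vocabulary the author never used in the title.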
When KeyWords Plus® is used
in a weekly or monthly file, as with Current
Contents®, you can readily filter out the noise
from the music. On the other hand, doing an annual
search may require further refinement, as mentioned
above, by combining one or more words and cited references.
Conclusions
Both descriptor-based and citation-based systems have
unique advantages. In the next installment, I will illustrate
by example how these systems can work to narrow or maximize
search results.
Dr. Eugene Garfield
Founder and Chairman Emeritus, ISI
References
1. Garfield, E. The concept of citation indexing:
A unique and innovative tool for navigating the research literature. Current
Contents® (1-4):3-5, 3-24 January 1994.
2. ----------. Where was this paper cited? Current
Contents (5-8):3-5, 31 January - 21 February 1994.
3. Spencer, C C. Subject searching with Science Citation Index®; Preparation of a drug bibliography using Chemical Abstracts, Index Medicus, and Science Citation Index 1961 and 1964. Am. Doc. 18(2):87-96, 1967.
4. McCain, K W. Descriptor and citation retrieval in the medical behavioral sciences literature:
Retrieval overlaps and novelty distribution. J. Amer. Soc. Inform.
Sci. 40(2):110-4, 1989.
5. Garfield, E. KeyWords Plus®:
ISI®'s breakthrough retrieval method. Part I. Expanding
your searching power on Current Contents on Diskette®.
Current Contents (32):5-9, 6 August 1990. (Reprinted in: Essays of
an Information Scientist. Philadelphia: ISI Press®, 1991.
Vol. 13. p. 295-9.)