Drug monitoring through multi-file chemical structure searching
Bob Stewart
Thomson Scientific
May 2006
Thousands of chemical substances are investigated each year for their possible benefits in alleviating human and animal diseases. Private companies, universities and government agencies are all involved in drug research and development. Only by searching multiple pipeline databases, and by searching on chemical structure, can researchers be assured of a comprehensive search, thereby protecting the investment made in current research.
Entering the Drug Pipeline
You often hear about the “drug pipeline,” which is the progress of a drug or compound from the point it is discovered to its market launch. For every compound that makes it to the market as a pharmaceutical, an estimated 5,000 others may be investigated. And every compound undergoes rigorous testing as required by government regulations. In the United States, drug evaluation is overseen by the Food & Drug Administration (FDA). Similar regulatory bodies exist in the European Union and other countries.
Drug pipeline directories compile scientific, patent, regulatory and commercial information, each assembling data from conference papers, peer-reviewed journals, patents, news stories, analyst and market research reports, and interviews with company representatives and industry experts.
The drug pipeline directory databases on Dialog contain similar basic information for a given drug, such as drug name and originating organization. However, there are key differences between directories: each is produced by a different publisher and offers unique features, and the ways in which data are organized and presented also differ.
Consulting multiple databases
Researchers, marketing departments, competitive intelligence professionals and others use drug pipeline databases to monitor:
- competitor profiles
- new drugs in development
- regulatory milestones
- new product launches
- new or existing patents.
For a comprehensive understanding of the current drug research environment, it is vital for these professionals to consult multiple databases. This is reinforced in a study by Diane Q. Webb and John A. Willmore of BizInt Solutions Inc., which shows that searching only one pipeline database will generally retrieve only about 40 percent of the unique compounds in the pipelines of all drug companies.
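The effect of relying on a single source can be illustrated with a small set calculation. The Python sketch below uses invented database names and compound identifiers; it does not reproduce data from the Webb and Willmore study, and only shows how the share of unique compounds retrieved by each source might be computed once results from several databases are in hand.

```python
# Hypothetical compound identifiers retrieved from three pipeline databases.
# The database names and contents are invented for illustration only.
results = {
    "database_a": {"cmpd-1", "cmpd-2", "cmpd-3", "cmpd-4"},
    "database_b": {"cmpd-3", "cmpd-4", "cmpd-5", "cmpd-6", "cmpd-7"},
    "database_c": {"cmpd-1", "cmpd-7", "cmpd-8", "cmpd-9", "cmpd-10"},
}


def coverage_by_source(results):
    """Return each source's share of the union of all retrieved compounds."""
    all_compounds = set().union(*results.values())
    return {
        name: len(found) / len(all_compounds)
        for name, found in results.items()
    }


coverage = coverage_by_source(results)
for name, share in coverage.items():
    print(f"{name}: {share:.0%} of unique compounds")
```

In this made-up example no single source retrieves more than half of the ten unique compounds, which is the kind of gap a multi-database search is meant to close.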
Chemical searching techniques for drug pipeline databases on Dialog
Just as critical as searching multiple drug pipeline databases is the method of the search. Let’s look at four pipeline directories available on Dialog:
- IMS R&D Focus (File 445 on Dialog)
- Pharmaprojects (File 128, 928)
- Prous Drug Data Report (File 452)
- Prous Drugs of the Future (File 453)
Figure 1: Features available in the drug pipeline databases on Dialog
Searching by drug name, laboratory code, trade name or chemical name can be effective, but it is typically a difficult process because many chemical compounds are known by several different names. As an example, the commonly prescribed heart drug Coreg is also known by the generic name carvedilol. According to Chemical Abstracts Service (CAS), the chemical name is 1-(9H-carbazol-4-yloxy)-3-((2-(2-methoxyphenoxy)ethyl)amino)-2-propanol. It is also identified by at least two laboratory codes, several other variations of generic and trade names, and variations on the chemical name, making it difficult to retrieve inclusive results from a name search.
The CAS Registry Number is a unique identifier for a particular chemical compound, but it poses its own problems in pipeline database searches. The Registry Number has no chemical significance, so it cannot be used to find similar substances. One CAS Registry Number is often assigned to the drug itself, while different CAS Registry Numbers are assigned to its various salts and isomers. A comprehensive search for a substance therefore requires knowing all of the relevant CAS Registry Numbers, and even then it may not uncover compounds with similar structures. A further complication is that the CAS Registry Number field in most pipeline databases is not always populated.
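One way to see that a Registry Number carries no structural information is to look at what can be verified from the number alone: its format and its final check digit. The Python sketch below applies the published CAS checksum rule, in which each digit before the check digit, counted from the right, is multiplied by its position and the sum modulo 10 must equal the check digit. The function name is an illustrative choice, and 72956-09-3 is used here as the Registry Number commonly listed for carvedilol.

```python
import re

# Two to seven digits, then two digits, then one check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_number(cas_number):
    """Check the format and check digit of a CAS Registry Number string."""
    match = CAS_PATTERN.match(cas_number.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight each digit by its position counted from the right (1, 2, 3, ...).
    total = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas_number("72956-09-3"))  # True
print(is_valid_cas_number("72956-09-4"))  # False: wrong check digit
```

A check like this can catch a mistyped number, but it says nothing about whether two numbers belong to related substances, which is why salts and isomers have to be looked up one by one.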
Better results through structure searching
The only way to consistently identify chemical compounds is by their structures. Structure searches make it possible to retrieve compounds similar to a known drug simply by focusing on a particular structural feature. Through a chemical structure search, researchers can better determine novelty of a compound and find correlations between structures and biological activity, which allows them to draw conclusions about likely pharmaceutical effects.
Using DialogLink® 5, chemical searchers currently have access to eight structure-searchable databases:
- Beilstein Facts (File 390)
- Derwent Chemistry Resource (File 355)
- IMS R&D Focus (File 445, 955)
- IMS Patent Focus (File 447, 947)
- Index Chemicus (File 302)
- Pharmaprojects (File 128, 928)
- Prous Drug Data Report (File 452)
- Prous Drugs of the Future (File 453)
By performing a chemical structure search directly in one or more drug pipeline databases, the searcher avoids the complications involved with name and CAS Registry Number searching. Unlike other search tools, DialogLink 5 supports chemical structure searching in multiple databases, and searches can be conducted in all eight databases simultaneously.
Figure 2: Highlighted substructures in DialogLink 5 make it easy for searchers to locate their structure within search results.
Chemical structure searching on Dialog is accomplished by uploading an MDL® MOL file to the Dialog search engine. Many chemical structure drawing packages support the creation of MDL MOL files, so the searcher can use one of several drawing packages (including MDL® Draw and MDL® ISIS/Draw). Alternatively, any pre-existing MOL file can be easily uploaded to Dialog.
Figure 3: Researchers can generate search queries for a Dialog chemical structure search using industry standard MDL® MOL files.
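For readers unfamiliar with the format, a MOL file is a small fixed-width text file: three header lines, a counts line, an atom block, a bond block and a closing "M  END" line. The Python sketch below writes and re-reads a minimal V2000 MOL file for ethanol (heavy atoms only, with arbitrary 2D coordinates). The function names are hypothetical, the column layout follows the V2000 connection table convention as understood here, and the output has not been tested against Dialog or any particular drawing package.

```python
def write_molfile(name, atoms, bonds):
    """Build a minimal V2000 MOL file as a string.

    atoms: list of (symbol, x, y, z)
    bonds: list of (first_atom, second_atom, order), using 1-based atom indices
    """
    lines = [
        name,  # line 1: molecule name
        "",    # line 2: program and timestamp (left blank in this sketch)
        "",    # line 3: comment
        # Counts line: atom count and bond count, three columns each.
        f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000",
    ]
    for symbol, x, y, z in atoms:
        # Coordinates take 10 columns each; the symbol starts at column 32.
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0" + "  0" * 11)
    for first, second, order in bonds:
        lines.append(f"{first:3d}{second:3d}{order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


def read_molfile(text):
    """Return (atom symbols, bonds) read from a V2000 MOL file string."""
    lines = text.splitlines()
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    symbols = [line[31:34].strip() for line in atom_lines]
    bonds = [(int(b[0:3]), int(b[3:6]), int(b[6:9])) for b in bond_lines]
    return symbols, bonds


# Ethanol heavy atoms (C-C-O); the coordinates are arbitrary placeholders.
ETHANOL_ATOMS = [("C", 0.0, 0.0, 0.0), ("C", 1.5, 0.0, 0.0), ("O", 2.25, 1.3, 0.0)]
ETHANOL_BONDS = [(1, 2, 1), (2, 3, 1)]

molfile_text = write_molfile("ethanol", ETHANOL_ATOMS, ETHANOL_BONDS)
print(molfile_text)
print(read_molfile(molfile_text))
```

In practice the file would normally come from a drawing package rather than from hand-written code; the sketch is only meant to show that the query is an explicit list of atoms and bonds rather than a name or an identifier.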
Simplified post-processing
DialogLink 5 also simplifies post-processing of search results, enabling researchers to download their search results in XML, a format that has rapidly become the standard for data interchange. With XML output, it is possible to collect information, then transform and compile it to meet individual requirements. DialogLink 5 also has built-in capabilities to transform Dialog XML results into Microsoft® Word or Microsoft Excel reports, or the XML data can be transformed and processed with any third-party application that can use XML. Further, with Dialog’s Electronic Redistribution and Archiving (ERA) feature, knowledge workers can distribute the final results to decision makers in the organization in full compliance with copyright law.
Figure 4: DialogLink 5's integrated report builder allows researchers to create high-quality reports in Microsoft® Word or Excel.
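As an illustration of third-party processing, the Python sketch below flattens a small XML result set into comma-separated values using only the standard library. The element names (results, record, drug_name, originator, phase) and the sample records are invented for this example and are not the actual Dialog XML schema; a real transformation would need to follow the structure of the downloaded file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML: element names and values are placeholders,
# not the real Dialog output format.
SAMPLE_XML = """<results>
  <record>
    <drug_name>compound-a</drug_name>
    <originator>Example Pharma</originator>
    <phase>Phase II</phase>
  </record>
  <record>
    <drug_name>compound-b</drug_name>
    <originator>Sample Labs</originator>
  </record>
</results>"""

FIELDS = ["drug_name", "originator", "phase"]


def xml_to_csv(xml_text, fields=FIELDS):
    """Write one CSV row per <record>, leaving missing fields empty."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)
    for record in root.iter("record"):
        writer.writerow(
            [(record.findtext(field) or "").strip() for field in fields]
        )
    return buffer.getvalue()


csv_text = xml_to_csv(SAMPLE_XML)
print(csv_text)
```

The resulting text can be opened in a spreadsheet or merged with results from other files, and missing fields simply appear as empty cells.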
Conclusions
While chemical name and CAS Registry Number searching have their value, chemical structure searching directly in multiple pipeline databases is the preferable technique. Uniquely, Dialog searchers using DialogLink 5 can quickly and easily perform a chemical structure search across multiple files simultaneously. Additionally, Dialog’s XML output allows information to be easily compiled and shared across the organization.