Generating novel insights in biology
In this interview, first published in vol 2, issue 2 of Next Generation Pharmaceutical Europe (October 2007), Thomson Scientific's Jon Brett Harris discusses the company's vision for bioinformatics.
The Thomson name is familiar to information users throughout the pharmaceutical industry. There can't be many researchers who don't rely on IDdb, the company's storehouse of information on investigational drugs. Thomson Scientific provides intelligence solutions to scientists, research organizations, and everyone from niche and specialty startups to all 20 of the world's largest pharmaceutical organizations. Parent company The Thomson Corporation was one of the first to recognize the Internet's potential for delivering timely, relevant information to the desktop, and now boasts more than 20 million users worldwide.
Earlier this year, Thomson acquired Canada-based life sciences data management company Unleashed Informatics, owner of BIND, the largest repository of value-added biomolecular interaction records in the world. NGP caught up with Jon Brett Harris to learn more about the company's vision for bioinformatics.
NGP. Is content still at the heart of pharmaceutical research?
JBH. Biotechnology companies are helping to drive innovation, which puts even greater importance on the quality of the information available to this market. Shrinking pipelines push pressure back through the discovery chain. We're witnessing a shift away from the idea of creating a library of chemicals and seeing whether any of them has a promising effect, towards starting from target generation. It's this thought-based approach that makes the kind of service Thomson Scientific provides ever more central.
NGP. But in areas such as biological research, there's no shortage of available data.
JBH. The challenge isn't finding information; it's leveraging it intelligently so that it promotes innovation. The volume and variety of biological data being generated in laboratories worldwide can be a problem for researchers. As tools and techniques have improved, the information sources haven't kept up: they've simply accumulated more content. So researchers may be faced with a vast amount of fragmented data held in multiple databases and in incompatible formats. Companies may even be expected to assemble and deploy their own solutions, which discourages data convergence. It's a wood-for-the-trees situation. If all you can see is the data, you're not going to be able to look through it to the solution.
NGP. How important is data integration to Thomson Scientific?
JBH. We're an integration enabler; it's what we do. We take disparate data sources and slot them together to see what insights they can provide. For the companies I just mentioned, the data may be there, and the solution might even be there, but it isn't meaningful: you have to do the analysis yourself. By building intelligent links between data types, performing the analysis, and providing the abstracts and thought leadership that put the information into context, we can promote new ways of thinking.
Our goal is to create a single, fully integrated workflow platform in which any content, including public sources, third-party and proprietary data, can be added, mapped intelligently into the big picture, and delivered as each user needs it. No more than they need, and certainly no less than is significant, whatever the source.
NGP. Why was it crucial to bring Unleashed Informatics into the Thomson solution set?
JBH. Unleashed was already highly respected for its BIND database and its integration with the more than 80 million published, available biological sequences. We believed there was so much more this data could be doing. Our first goal was to integrate the content with the nine million biological sequences in our GENESEQ database, which we've achieved with the BONDplus database we launched in August this year.
At its heart, BONDplus focuses on the biological sequence, of course. It also equips you to look beyond it to the interaction, taxonomy, publication, annotation, domain and cross-reference data that surrounds it. We assembled this data from 15 information types, and you can build your own content seamlessly into it. So in one place, you've got all the relevant biological and IP relationships for your molecule of interest. This will provide a significant advantage to biological researchers and their managers. They can expect to spend less time searching and analysing disparate databases, and more time on bench experiments driven by more focused information.
NGP. What about the longer term?
JBH. BONDplus is a vitally important first step. Our next aim is to map it into our Thomson Pharma workflow solution, which is already unrivalled in covering the latter part of the pipeline, from intellectual property, literature and news to brand optimization and the generics market. With BONDplus, as it continues to develop, becoming part of this wealth of intelligence, Thomson Pharma will become a single, central repository for all information of interest to biological researchers.
Our vision for this data is greater still. We want to do for biological research what IDdb did for chemical drug research: to generate insights, not just store data. So we're constantly building our analytic capabilities and functionality, and looking for ways to make the data work smarter all along the pipeline. We stopped thinking of basic content alone a long time ago.
The really exciting thing will be how the interface between all the different content types evolves, the way we link them together using new analytical tools, predictive mapping and so on. We're creating a research space where smart, focused thinking drives efficiency, innovation and success. That's what information sharing — from our first online Dialog index in 1966 to today's Internet data communities — always intended to achieve, and Thomson continues to lead the way in realizing it.