Blasting Through Bedrock at AstraZeneca


By BIO-IT World

Bio·IT World's Mark D. Uehling spoke with William Hayes, project lead for text mining and bioinformatics at AstraZeneca.


Q: What's changed in the vendor community?
A: Bioinformatics and drug discovery and text mining have started taking off. There are about a hundred companies out there trying to get our attention. Overall, the market is headed in the right direction. There are a lot of very good point solutions. But there aren't any very good integrated solutions.


Q: What do you recommend?
A: You really want to pull in as much text, have as many different sources as you can, and run your analyses across all of them, because the results reinforce each other. The statistics get better the more data you have.


Q: Any surprises?
A: Text search. I thought that would be one of the easiest projects. It was really hard. It seems like a mature technology. I consider 4 gigabytes of text not that large, especially when Google can take the entire Internet and turn it around in a few seconds. We have the requirement that we do a term search and get back the result, on average, in under a second. It was quite daunting to find something that could manage that.


Q: Can vendors deal with the quantities of text you're expecting?
A: They're not able to scale up to all the text we're interested in. The few integrated solutions I've seen so far have been looking at a few hundred to a few thousand documents. We're looking at 40 gigs-plus, over 10 million documents.


Q: What solutions do you like for text categorization?
A: For our particular purposes, Reel Two fitted us quite well. We've not been interested in doing really large-scale categorization. Their interface lends itself well to the domain experts who build up a small, focused, filtering process computationally. We've also been looking at using Reel Two's engine for term disambiguation.


Q: What stands out in what you've seen so far?
A: The NLP [natural language processing] solution we're testing right now is pretty impressive. I consider it the crown jewel of our text-mining capabilities. We can do a protein-protein interaction query and get very accurate results.









