Accelerating Intuition: Reynders on Big Challenges for Informatics



By Catherine Varmazis

May 7, 2008 | BOSTON | Integration and convergence were the main themes of John Reynders’ keynote at the 2008 Bio-IT World Conference & Expo last week. Both have profound implications for informatics and the future of personalized medicine, he said.

Reynders, VP and CIO at Johnson & Johnson (J&J), said the “absolutely insane” amount of data being generated in the life sciences, and the heterogeneity of that data, require new ways of working and connecting with people. Specialists in all fields need to work in the “white space” between fields, and to collaborate more with colleagues within, and partners beyond, their organizations. “The company that thinks it’s sufficient to connect great minds behind its walls – well, it won’t be a great company for very long,” Reynders said.

Reynders identified four critical layers of convergence that are occurring or have yet to happen: 1) converge the data; 2) converge information semantically; 3) converge people; and 4) converge knowledge to lead to reasoning capability.

Converge the Data
The first convergence layer involves the capture, processing, filtering and management of data. When he worked at Los Alamos National Laboratory, Reynders ran a research center that had the largest dedicated unclassified supercomputer in the U.S. -- a 1-teraflop system with a 7-terabyte spinning disk for storing data. When he moved to Celera, which was assembling the human genome, its system had the same computational capacity, but the spinning disk held 150 terabytes, roughly 20 times more.

That amount of data creates storage challenges. Unable to get federal funding for petabytes of storage, Celera built data visualization corridors, but Reynders said these can leave researchers “data-rich but information-poor.”

Hierarchical approaches to data storage are needed because “we can’t keep all the data all the time.” Such approaches involve not just technology solutions but also business models. For example, if you temporarily need a petabyte of storage, do you have the option to lease it?

Reynders illustrated the concept with the image of bears waiting on a riverbank as salmon leap upstream: “The bears grab just the salmon they need. They don’t want all the water, just the fish.” This concept goes by various terms, but “cloud computing” is the latest buzzword. Reynders argued that we must find ways to temporarily store vast amounts of data, mine it and extract the specific information required – then “drain the pool” and start over.
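The bears-and-salmon idea can be sketched in a few lines of Python. This is only an illustration, not anything Reynders showed: the function names, the simulated data source and the score threshold are all hypothetical. Records are generated and filtered one at a time, so the full stream is never stored.

```python
from typing import Callable, Iterable, Iterator


def grab_the_salmon(stream: Iterable[dict], wanted: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield only the records of interest; everything else flows past unstored."""
    for record in stream:
        if wanted(record):
            yield record


def simulated_instrument(n: int) -> Iterator[dict]:
    """Hypothetical data source that produces records lazily, one at a time."""
    for i in range(n):
        yield {"id": i, "score": (i * 37) % 100}


# Keep only the highest-scoring records; the rest of the stream is discarded
# as it passes, which is the "drain the pool" step.
catch = list(grab_the_salmon(simulated_instrument(1000), lambda r: r["score"] >= 99))
```

Because both functions are generators, memory use depends on the size of the catch rather than the size of the stream.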

Find Meaningful Connections
Data exponentials are growing in two dimensions -- size and scale -- as well as in terms of heterogeneity, said Reynders. The challenge facing biomedical scientists is how to converge all this data semantically and find meaningful connections between different kinds of data. Traditional means of finding semantic relationships in text data include latent semantic indexing (LSI) and natural language processing (NLP). While useful, Reynders said, such tools fall short now that we’re storing all kinds of data, including text, compounds, and genetic data.

Ontologies are promising platforms for forming semantic relationships across arbitrary data, he said, but only get you so far. “This is the other shoe dropping. Not only do we have an immense amount of data, but the data is very heterogeneous.”

Reynders said the data integration challenge facing the life science community is similar to that of the intelligence community, which is also grappling with heterogeneous data from disparate sources (satellite data, multi-language text data, signal intelligence, human intelligence) and has to put it all together and make sense of it. “Some of the algorithms that are needed for this very heterogeneous data integration, which typically go beyond ontology to a very large graph-type problem, are very relevant to our challenges,” he said. “It’s not enough that we can navigate in one of these domains because you cannot find those connections if you’re only looking at one class of information.”
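To make the “graph-type problem” concrete, here is a minimal sketch, assuming each piece of data is a typed node and each known relationship is an edge. The node names and links are invented for illustration, and breadth-first search stands in for whatever algorithms Reynders had in mind. The point is that the path from a compound to a disease only appears when several classes of information sit in the same graph.

```python
from collections import defaultdict, deque

# Hypothetical typed nodes: (kind, name). All names are invented.
edges = [
    (("compound", "C-101"), ("gene", "GENE_A")),
    (("gene", "GENE_A"), ("paper", "paper-17")),
    (("paper", "paper-17"), ("disease", "disease-X")),
    (("compound", "C-202"), ("gene", "GENE_B")),
]

# Build an undirected adjacency map across all data types.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def find_connection(start, goal):
    """Breadth-first search; returns a shortest path as a list of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = find_connection(("compound", "C-101"), ("disease", "disease-X"))
```

Here the path runs compound, gene, paper, disease. Drop any one of those data types from the graph and the connection cannot be found.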

Open Innovation
Reynders’ third layer of convergence is relatively new, he said: that of converging people. Innovation requires bringing all the integrated data not just to one scientist, but to a team of scientists in different domains working together. “Where is the MySpace for scientists? Or that in silico watering hole where clinicians can share their ideas?” he asked.

Some organizations are harnessing internal wisdom within their ranks. J&J, for example, has built the LINK (Leverage INternal Knowledge) expert locator system to enable collaboration among 14,000 enrollees across far-flung J&J subsidiaries. LINK scans the emails of everyone who signs in, searching for keywords. “You have a problem, you throw in these terms, and LINK connects you with the person in [say] Belgium who can help you.”
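The talk did not describe how LINK works internally, so the following is only a sketch of the general idea of keyword-based expert location, not J&J's implementation. It assumes each enrollee already has a list of keywords (however those were extracted); the names, keywords and ranking rule are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(profiles):
    """Map each lowercased keyword to the set of people associated with it."""
    index = defaultdict(set)
    for person, keywords in profiles.items():
        for kw in keywords:
            index[kw.lower()].add(person)
    return index


def locate_experts(index, query_terms):
    """Rank people by how many query terms they match; ties sort alphabetically."""
    hits = Counter()
    for term in query_terms:
        for person in index.get(term.lower(), ()):
            hits[person] += 1
    return sorted(hits, key=lambda p: (-hits[p], p))


# Hypothetical enrollees and keywords.
profiles = {
    "ana": ["kinase", "assay"],
    "ben": ["kinase", "crystallography"],
    "chloe": ["genomics"],
}
index = build_index(profiles)
experts = locate_experts(index, ["Kinase", "crystallography"])
```

A query for “kinase” and “crystallography” ranks ben first because he matches both terms, then ana, who matches one.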

Other open innovation business models include Eli Lilly’s InnoCentive website. Originally focused on chemical compounds, it now includes a wider range of problems. NineSigma lets seekers post RFPs and solvers bid against them. YourEncore is a space for retired scientists and engineers who want to get back to solving problems. Online IP markets such as yet2.com help innovators who have great ideas navigate all the steps of locking down their IP.

“More and more often, innovation will be coming from outside the organization,” said Reynders, “so paying close attention to how these open innovation models are evolving is going to be very critical to all of us.”

Reasoning Layer: Accelerate Intuition
It was surprising to hear Reynders assert that one bottleneck in convergence is the human brain's ability to absorb and connect across multiple complex representations of information. For example, in 1997, IBM’s Deep Blue beat world chess champion Garry Kasparov, even though it only ranked in the middle of the TOP500 supercomputer list and had less computing power than a modern laptop. How could a machine with such modest computational power defeat the human brain?

Reynders cited futurist Ray Kurzweil, who predicts that within ten years or so, the “Turing test” will be passed. “You’re [instant messaging] on two computer terminals,” said Reynders. “On the other side of the conversation, in another room, there’s a person at one terminal and a computer at the other, and you can’t tell which is which. That’s passing the Turing test.”

Doctors and clinicians of the future will not be replaced by computers, he said, but they will have the very best digital reasoning capacity available to help them sort through immense amounts of heterogeneous information to find the needle in the haystack. That reasoning layer will be the final layer, said Reynders. And it’s much more than “connecting the dots.” It involves connecting information and forming neural circuitry between concepts that can be traversed by any kind of reasoning platform.

“We can’t even start to think about the next layer until this capacity has been formed,” Reynders concluded. “This kind of convergence in the future – Ah, it gets exciting.”

_________________________________________________

This article appeared in Bio-IT World Magazine. 
