April 12, 2007 | Over the past 20 years, bioinformatics in the workplace has evolved from a flashing green cursor connected to a server via telnet to complex applications with rich graphical user interfaces. Along the way, bioinformatics groups have risen and fallen: first they were built up to develop applications and systems in-house, and now those systems are being replaced with configurable off-the-shelf solutions that are easier to maintain. As discussed in Bio-IT World last year (see Web 2.0: Scientists Need to Mash It Up, Bio-IT World May 2006), there needs to be more of a bottom-up approach to knowledge sharing.
Biopharma companies have many lessons to learn from the emerging and maturing Web 2.0 technologies. Numerous Web 2.0 technologies share the basic principle of giving website visitors the ability to contribute and manage the content (see Fig 1). Essentially, this turns the relationship between a web portal and the web visitor on its head. After the first few seeds of content are planted, users are encouraged to contribute material and expand the content. Over time, the site evolves to the point where the users become the administrators and the site becomes self-organizing. This is hardly a new concept, but it is gaining momentum because of the usability, flexibility, and proliferation of applications of this web-based technology.
A 2006 Booz Allen Hamilton study shows that the use of social networking sites, wikis, and other Web 2.0 technologies is a mass phenomenon. The study found that 71% of U.S. adults surveyed (via the web) have used Web 2.0. In fact, Wikipedia and a popular blog site, Blogger, were among the top five Web 2.0 tools used (see Fig 2).
How does this relate to collaborative research? Wikis (collections of information stored in one place and shared with others) and blogs (personal collections of thoughts, ideas, and information) are the holy grail of most research organizations. They provide an easy means of getting information to those who need it.
Wikis for Science
Last February, the NIH held its first Wiki Fair, showcasing how this technology is being applied within the NIH as well as in other content-rich fields. Wikis are merely collections of text rather than information linked together via relationships in a relational database. Even so, research-focused wiki projects are emerging: www.wikiprofessional.info is planning several scientific wikis, including WikiProteins, which will launch soon. Other examples, such as Fluwiki, and wiki farms (collections of wikis) such as Biowiki, are actively used by global participants.
On the other hand, blogs, or more formally web logs, are more like a diary: typically written by one or more people, they have a distinctive focus and tenor. In the sciences, there are blogs that espouse personal opinions and progress on current research (see www.biowiki.org). Others aggregate and discuss interesting articles, news, and publications around a specific discipline. Scienceblogs is a wide-ranging open forum that hosts 59 weblogs on subjects from archaeology to neuroscience. Blogs typically allow readers to add comments at the discretion of the person “in the driver’s seat,” so that the blog's owner controls the content while still fostering scientific interaction. Positions and theories can be stated and then discussed among interested parties in an iterative manner. One of the major challenges of e-lab notebooks has been to get scientists to enter their personal diary of scientific escapades in a form that can be readily shared. Blogging offers lessons on what users really need to share in their streams of consciousness.
Some organizations stress the need for networking to help members find available expertise. Many create a physical environment where scientists can actively meet, share, and discuss, but this is difficult in larger organizations that are separated across a campus or spread out globally. Social networking sites, which have been growing at an exponential rate, have the potential to complement physical social interactions. Social networking and interaction are growing among professionals, as demonstrated by the 7.5 million members who belong to LinkedIn, a business networking site, or the 120+ million registered users of MySpace. Commercial systems exist (and most likely open source solutions also) so that this concept can be applied within any research organization. Simply allowing or encouraging employees to post their CVs internally and requiring annual research updates would transform many organizations that repeatedly outsource or hire individuals because they cannot keep track of their human capital!
While social networking will help foster the exchange of ideas, sharing data remains one of the biopharma industry’s greatest challenges. Data are often in multiple formats and not usually available for download or manipulation, even when results are published in reputable journals. Professional wikis are seeking to include not just peer-reviewed articles but also the supporting data. Such sharing, especially across disciplines, can truly broaden our ability to turn data into knowledge by fostering collaboration and data repurposing at an unheard-of level. Taking another Web 2.0 example, Flickr (a simple web-based image repository where images are tagged and searchable), one can see how this could be applied in research. As one researcher commented recently: “My grandma can share images on flickr, but I cannot do this in my company!”
Data and content are the staple diet of scientists, but only when combined in interesting ways can insight and knowledge be extracted. Facilitating this is key to fostering innovation, and a number of Web 2.0 tools are emerging to assist in visualizing a wide range of online data. Such “mashups” combine information from multiple sources (datasets, news feeds, etc.) and filter it to provide additional insights or knowledge. Again, this is not a new concept, as researchers have long dispatched robots and crawlers to build collections of information. Because of the complexity of the access and filtering required, one essentially had to be a programmer to send the necessary queries and then parse the data. With content being ubiquitously transformed into XML, the process of combining data feeds from multiple sources is being simplified. The use of web services to generate XML data feeds is growing within organizations and, when part of a Service-Oriented Architecture, has the potential to open up information across an organization.
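To make the idea of a mashup concrete, here is a minimal sketch in Python of combining two XML feeds and filtering the result. The feed contents, the example.org links, and the function names are all hypothetical and chosen only for illustration; a real feed would be fetched from a web service rather than written inline.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents, inlined so the sketch runs without network access.
FEED_A = """<rss><channel>
  <item><title>Kinase inhibitor screen results</title><link>http://example.org/a1</link></item>
  <item><title>Cafeteria menu update</title><link>http://example.org/a2</link></item>
</channel></rss>"""

FEED_B = """<rss><channel>
  <item><title>New kinase assay protocol</title><link>http://example.org/b1</link></item>
  <item><title>Kinase inhibitor screen results</title><link>http://example.org/a1</link></item>
</channel></rss>"""


def parse_items(xml_text):
    """Return a list of {'title', 'link'} dicts, one per <item> in a feed."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def mashup(feeds, keyword):
    """Merge several feeds, skip duplicate links, keep titles containing keyword."""
    seen_links = set()
    merged = []
    for xml_text in feeds:
        for entry in parse_items(xml_text):
            if entry["link"] in seen_links:
                continue  # the same item appeared in an earlier feed
            seen_links.add(entry["link"])
            if keyword.lower() in entry["title"].lower():
                merged.append(entry)
    return merged


if __name__ == "__main__":
    for entry in mashup([FEED_A, FEED_B], "kinase"):
        print(entry["title"], entry["link"])
```

Because both feeds share one simple XML structure, the merge and filter steps take only a few lines; most of the effort in practice would go into agreeing on that structure across sources.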
Where data feeds are not accessible to scientists, there are several Web 2.0 tools that can help combine and share information. Swivel, based on the pivot table commonly used in Excel spreadsheets, and Many Eyes can enable scientists to perform data exploration without customized scripting from supporting informatics scientists. Swivel users can upload a swath of data and then ‘swivel’ it, or slice it in multiple dimensions, and produce multiple graphical representations. This is a departure from the traditional upload, parse, and report format of most data acquisition/visualization tools, offering considerable flexibility to the users. Many Eyes, an even more sophisticated data visualization tool, allows users to upload datasets, visually explore them, share, discuss, and collaborate.
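For readers unfamiliar with pivot tables, the following Python sketch shows the general idea of slicing one dataset along different dimensions. It is not how Swivel or Many Eyes is implemented; the assay records, column names, and the pivot_mean function are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Hypothetical assay measurements; the column names are illustrative only.
ROWS = [
    {"compound": "A", "cell_line": "HeLa", "response": 1.0},
    {"compound": "A", "cell_line": "MCF7", "response": 3.0},
    {"compound": "B", "cell_line": "HeLa", "response": 2.0},
    {"compound": "A", "cell_line": "HeLa", "response": 2.0},
]


def pivot_mean(rows, row_key, col_key, value_key):
    """Average value_key for every (row_key, col_key) combination."""
    buckets = defaultdict(list)
    for record in rows:
        buckets[(record[row_key], record[col_key])].append(record[value_key])

    table = defaultdict(dict)
    for (row_label, col_label), values in buckets.items():
        table[row_label][col_label] = sum(values) / len(values)
    return {row_label: dict(cols) for row_label, cols in table.items()}


if __name__ == "__main__":
    # The same data viewed two ways, simply by swapping the dimensions.
    print(pivot_mean(ROWS, "compound", "cell_line", "response"))
    print(pivot_mean(ROWS, "cell_line", "compound", "response"))
```

Swapping the row and column keys is the "swivel": the data are uploaded once and then re-sliced without writing a new report for each view.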
Mashup.com and pipes.yahoo.com enable data feeds to be integrated with some conditional reasoning, so that all or some output of one data feed can be used to mine a second data feed, and so on. This results in a complex data query and narrowing down of a large dataset to a more focused dataset specific to the person making the query. What is needed is truly a hybrid between the open access of information through web services, visual exploration, and conditional reasoning so that triage can focus research and then be discussed and shared.
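The chaining described above, in which the filtered output of one feed is used to mine a second, can be sketched in a few lines of Python. This is only an illustration of the pattern, not the way Mashup.com or Yahoo Pipes works internally; the records, the score threshold, and the function names are hypothetical.

```python
# Hypothetical records standing in for two already-parsed data feeds.
SCREEN_HITS = [
    {"gene": "EGFR", "score": 0.92},
    {"gene": "TP53", "score": 0.41},
    {"gene": "BRAF", "score": 0.87},
]
LITERATURE = [
    {"title": "EGFR signalling review", "genes": ["EGFR"]},
    {"title": "TP53 mutation survey", "genes": ["TP53"]},
    {"title": "BRAF and EGFR crosstalk", "genes": ["BRAF", "EGFR"]},
    {"title": "Unrelated methods note", "genes": []},
]


def select_hits(hits, min_score):
    """Conditional step: keep only genes whose score meets the threshold."""
    return {hit["gene"] for hit in hits if hit["score"] >= min_score}


def mine_second_feed(records, genes_of_interest):
    """Use the first feed's output to narrow down the second feed."""
    return [
        record for record in records
        if genes_of_interest.intersection(record["genes"])
    ]


def pipeline(hits, records, min_score=0.8):
    """Chain the two steps: feed one's output drives the query on feed two."""
    return mine_second_feed(records, select_hits(hits, min_score))


if __name__ == "__main__":
    for record in pipeline(SCREEN_HITS, LITERATURE):
        print(record["title"])
```

Each stage is a small function whose output feeds the next, so further feeds or conditions can be added to the chain without rewriting the earlier steps.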
If you build it, they will come! This mantra needs to be stricken from the informatics vocabulary, as empowerment of users is what is needed. If THEY build it, THEY will come. But not yet. The technologies are still maturing and, despite promising examples, some have yet to be applied in a hardened research environment.
Furthermore, semantic interoperability is still a developing area that needs to be addressed for this class of technologies. For example, Wikipedia is really a wiki farm working in multiple languages, with little to no sharing of information between the different language versions. Being able to freely combine information in many different ways is an exciting concept, but understanding the context of data and being able to compare and contrast it is crucial.
In addition, the delicate socio-dynamics that exist among people in IT, informatics, and research must be addressed. Informatics scientists have typically been the compilers of the information, providing customized analyses, dataset extractions, and tools to enable research. As more power is put in the hands of the scientists, the role of the informatics scientists may evolve into something that looks more like library science. While that is a crucial role, it is probably not what most bioinformaticians have in mind.
The bottom line is that social networking and information sharing tools are being utilized at increasing rates to interact and share information. It is only logical that scientific communities leverage these to share ideas and exchange data. Web 2.0 is a philosophy enabled by tools rather than a technology; it is driven by changing use of the Internet. Now content creators can start to define how much they share and collaborate, rather than having organizational structures and IT infrastructure dictating the parameters for integration and sharing.
How much of an impact these tools will have on breaking down silos between disciplines, and even companies, may have less to do with sheer numbers of users than with an evolutionary trend.
Martin Leach, Ph.D. is a principal and Michael Tedeschi an associate with Booz Allen Hamilton in New York. Email: firstname.lastname@example.org