WORKFLOW · CEO Yike Guo discusses Shanghai's giant grid computing project to link scientists
By Kevin Davies
March 8, 2005 | InforSense, which was spun out of Imperial College London six years ago by computing science professor Yike Guo, made two noteworthy deals in the past six months — one with Shanghai to build a grid computing network uniting research scientists, the other with GlaxoSmithKline to develop an open workflow platform for enterprise chemoinformatics. But as Guo explains to Kevin Davies, both deals have, at their core, a common goal: sharing data, software, and expertise effectively among large, diverse user communities.
Q: How did you get involved with China?
Our initial involvement was to create a software development base in China — it's where we develop our workflow components, especially for bioinformatics and protein informatics. We have a joint lab with the Shanghai Center for Bioinformation Technology (SCBIT), and with IBM, which provides the hardware facilities. InforSense provides the development projects, including development of components for advanced integrative studies of biological information.
China not only has cheaper developmental costs than Britain, but also has a very good scientific base — plenty of young scientists looking for good places to work. Shanghai has the best bioinformaticians in the whole country. Another reason is that China has big projects in life science research, such as the Human Genome Project, and now the Human Proteome Project. So they need software for large-scale integrative analytics.
Q: What is the new grid computing project in Shanghai?
InforSense was spun out of Imperial College (where Guo is on the faculty) and still has an IP pipeline with the university, so we are deeply involved in grid computing. I've been working with China in the field of high-performance computing and its applications. [Our] software is used extensively in Chinese bioscience, chemistry, and informatics fields.
SILO SMASHER: Guo (center) plans to help provide Shanghai scientists with ready access to software applications, data resources, and automated analytic workflows over the Web.
The Shanghai municipality is advanced in grid computing, and its infrastructure allows Shanghai's scientists in research and industry to share four things: information, computation, instruments, and expertise. It's an extremely ambitious project, which will last about 15 to 20 years. Phase 1 is three years, with 200 million yuan (£15 million) investment. In addition, there will be ongoing research collaborations between Europe and Shanghai in grid computing. The focus isn't on scheduling CPUs but on understanding, at a higher level, service-based computing environments that utilize information, computation, instruments, and expertise; that is, on building a virtual research community.
Q: How will this architecture empower the Shanghai scientific community?
The goal is to establish a virtual organization for Chinese scientists in Shanghai for collaborative research. Efficiency is one priority. If you put your people and resources in silos, it's very inefficient. The key to improving research efficiency is to share, across the organization and across domains, resources, information, and knowledge ...
Our discovery workflow mechanism integrates analytical services, including software applications, data sources, and instruments, in a dynamic and personal way to form protocols or workflows. A workflow is in fact a plan, or record, of the protocol a scientist has created. It can be stored, optimized, and reused.
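To make the idea of a workflow as a stored, reusable record more concrete, here is a minimal Python sketch. It is an illustration only, not InforSense's implementation: the `Workflow` class, the `SERVICES` table, and the three toy services are all hypothetical names invented for this example. A workflow is kept as an ordered list of service names, so it can be saved as text and rerun later.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical analytical services (assumptions for illustration only);
# they stand in for real applications, data sources, or instruments.
def load_sequences(_):
    return ["ACGT", "GGCC", "ATAT"]

def gc_content(seqs):
    # Fraction of G and C bases in each sequence
    return [(s.count("G") + s.count("C")) / len(s) for s in seqs]

def keep_high(values):
    # Keep values of at least 0.5
    return [v for v in values if v >= 0.5]

SERVICES = {
    "load_sequences": load_sequences,
    "gc_content": gc_content,
    "keep_high": keep_high,
}

@dataclass
class Workflow:
    """A protocol recorded as an ordered list of service names."""
    name: str
    steps: list = field(default_factory=list)

    def run(self, data=None):
        # Feed the output of each service into the next one
        for step in self.steps:
            data = SERVICES[step](data)
        return data

    def to_json(self):
        # The record is plain data, so it can be stored and shared
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))

wf = Workflow("gc_filter", ["load_sequences", "gc_content", "keep_high"])
result = wf.run()

stored = wf.to_json()                 # store the protocol
reloaded = Workflow.from_json(stored) # reuse it later
```

Because the record holds only service names, the same stored protocol could in principle be rerun by another scientist, or have its steps reordered or swapped, without touching the services themselves.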
We provide the analyst portal approach for publishing workflows as reusable end-user applications. Web portal deployment technology allows scientists to automatically turn an analytic workflow into a Web application. This is achieved by a simple click, so you don't need scientists to write Web applications themselves. Scientists can easily publish their research results together with the protocols from which the results are derived.
Q: Tell us about your recent partnership with GSK.
The GSK partnership is a micro view of our integrative analytical computing. They have exactly the same problems as Shanghai: sharing data, instruments, expertise, etc. The only difference is that GSK is vertical. It is focused on one area: drug discovery.
The major problem in any pharma is integration — from data to applications and finally to expertise ... In discovery science, how one integrates information is very task driven and personal. It reflects an individual scientist's method of doing a particular scientific job.
On the other hand, in the IT world, after many years, software integration has finally come to a solution using Web services architecture: Publish everything with standard Web service protocols; software can then adopt an SOA (service-oriented architecture). If we put this together, whether information or application, so long as it gets published in a standard way, then we can join them together and provide an environment so that each scientist can also connect them in a flexible way. This is our Open Discovery Workflow concept: We allow scientists to define discovery protocols for a particular task.
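The publish-then-connect idea can be sketched in a few lines of Python. This is a simplified, in-process analogy under stated assumptions, not a real Web services stack and not the InforSense product: the `publish` decorator, the `REGISTRY` dictionary, the `connect` helper, and the two example services are hypothetical. The point it shows is that once every service accepts and returns the same kind of message, any of them can be joined by name.

```python
from typing import Callable, Dict

Message = Dict[str, object]

# Hypothetical registry standing in for a directory of published services
REGISTRY: Dict[str, Callable[[Message], Message]] = {}

def publish(name: str):
    """Register a function under a name, using one standard call shape."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator

@publish("descriptor.weight")
def molecular_weight(msg: Message) -> Message:
    # Toy calculation from element counts with approximate atomic masses
    masses = {"C": 12.011, "H": 1.008, "O": 15.999}
    weight = sum(masses[el] * n for el, n in msg["atoms"].items())
    return {**msg, "weight": round(weight, 3)}

@publish("filter.weight_below_500")
def weight_below_500(msg: Message) -> Message:
    # Illustrative cutoff only
    return {**msg, "passes": msg["weight"] < 500}

def connect(*names: str) -> Callable[[Message], Message]:
    """Join published services, looked up by name, into one protocol."""
    def protocol(msg: Message) -> Message:
        for name in names:
            msg = REGISTRY[name](msg)
        return msg
    return protocol

protocol = connect("descriptor.weight", "filter.weight_below_500")
out = protocol({"atoms": {"C": 2, "H": 6, "O": 1}})  # ethanol, C2H6O
```

Here each scientist's protocol is just a different argument list to `connect`; the services themselves do not need to know about one another.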
GSK Chemistry adopted this concept: they put all the algorithms they had developed over the past 20 years, perhaps a couple of hundred, even those for library design, into databases and Web services, and used us to join them together. Moreover, all the integrative workflows can be stored in our workflow warehouse, so they become the company's knowledge base of informatics processes. This is an enterprisewide deal with GSK. Within 10 years, I expect that every scientist will be able to do their job using this mechanism.
Q: What do you mean by Open Workflow?
We're not a vertical company. We don't do anything vertical. We're purely an IT company. Thus, we have developed the concept of an Open Workflow Partner Network, to help our vertical application software partners make their algorithms and applications compatible with our workflow. Some partners are licensing our workflow technology in-house. Others, such as LION, CCG, Daylight, and MDL, have made their vertical software applications easy for us to integrate. The basic idea is to make software chosen by mutual customers compatible with workflows, so that customers can choose what they need to do their job and easily put the pieces together.