By Martyn Williams, IDG News Service, Tokyo Bureau
August 18, 2004 | TOKYO -- Fujitsu Ltd. and Japan's National Institute of Genetics are working on building what they expect will be the world's fastest database when it opens later this year.
A prototype of the system based on Fujitsu's Shunsaku XML database engine has already been completed and is currently undergoing in-house testing at the genetics institute, which is also known as Idenken in Japan.
Idenken's database is one of the world's three main genetics databases, and it is a repository for data from all genome projects conducted by Japan's government in addition to all public-domain data from the Japan Patent Office. It currently holds 35 million records, containing DNA sequence data totaling 39.8 billion bases, and its size is doubling every year, Fujitsu and Idenken said in a joint statement.
More than 10,000 users consult the database each day, making speedy searches a top priority for Idenken. Its current system is based on a relational database and takes about 10 minutes to complete a two- or three-keyword search, while the prototype system has already slashed the search time to about five seconds, said Osamu Akiba, director of Fujitsu's Triole Business Development Center. He demonstrated the system at the Fujitsu Solution Forum event in Tokyo last week.
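For a sense of scale, the quoted search times can be turned into a speedup factor. The short Python sketch below uses only the approximate figures given in the article; the variable names are illustrative, and the 200-times figure is the target Fujitsu mentions for later tuning, not a measured result.

```python
# Figures quoted in the article; both are approximate ("about").
current_seconds = 10 * 60   # relational system: about 10 minutes
prototype_seconds = 5       # Shunsaku prototype: about 5 seconds

speedup = current_seconds / prototype_seconds
print(f"Prototype speedup: about {speedup:.0f}x")

# Search time implied by a "maybe 200 times faster" target.
target_speedup = 200
target_seconds = current_seconds / target_speedup
print(f"Time at {target_speedup}x: about {target_seconds:.0f} seconds")
```

On those numbers the prototype is roughly 120 times faster than the current system, and a 200-fold improvement would bring a search down to about three seconds.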
The secret to Shunsaku's speed is a search algorithm that doesn't require an index. Each search is done in real time, and new documents can begin appearing in search results as soon as they are added to the database, said Nick Hayashi, a spokesperson for Fujitsu in Tokyo.
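Fujitsu has not described the algorithm in detail here, so the following Python sketch is only a naive illustration of the general idea of index-free search, not Shunsaku's actual method. It assumes a hypothetical `ScanStore` class and made-up sample records: every search reads every XML document directly, so a record becomes searchable the moment it is added.

```python
import xml.etree.ElementTree as ET


class ScanStore:
    """Toy index-free store (hypothetical): every search reads every document."""

    def __init__(self):
        self.documents = []  # raw XML strings, in insertion order

    def add(self, xml_text):
        # No index to maintain, so adding is just an append.
        self.documents.append(xml_text)

    def search(self, *keywords):
        """Return documents whose text contains every keyword (case-insensitive)."""
        wanted = [k.lower() for k in keywords]
        hits = []
        for xml_text in self.documents:
            root = ET.fromstring(xml_text)
            text = " ".join(root.itertext()).lower()
            if all(k in text for k in wanted):
                hits.append(xml_text)
        return hits


# Made-up sample records, for illustration only.
store = ScanStore()
store.add("<record><organism>Homo sapiens</organism><gene>BRCA1</gene></record>")
store.add("<record><organism>Mus musculus</organism><gene>Brca1</gene></record>")

before = store.search("brca1", "sapiens")

# A newly added record is visible to the very next search.
store.add("<record><organism>Homo sapiens</organism><gene>BRCA1 variant</gene></record>")
after = store.search("brca1", "sapiens")

print(len(before), len(after))
```

A plain scan like this gets slower as the data grows, which is why a production engine would need a far more efficient scanning strategy than this toy loop; the sketch only shows why no index update stands between adding a record and finding it.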
Given a database with static contents, a relational database and Shunsaku would be able to complete a search in about the same amount of time. However, the Idenken database is constantly growing, which means the relational database index always needs to be updated. If it can't keep up with the speed at which new information is being added, the result is a much slower search, Hayashi said. But because Shunsaku is always working on the database in real time, such problems do not affect it, he said.
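The article does not say exactly how a lagging index slows a relational search, so the Python sketch below is a toy model under one stated assumption: records that have not yet been indexed must be read in full at query time. The `IndexedStore` class, its `backlog` list, and the sample records are all hypothetical.

```python
from collections import defaultdict


class IndexedStore:
    """Toy keyword index with a backlog of records not yet indexed."""

    def __init__(self):
        self.records = []               # all records, by position
        self.index = defaultdict(set)   # word -> positions of indexed records
        self.backlog = []               # positions waiting to be indexed

    def add(self, text):
        self.records.append(text)
        self.backlog.append(len(self.records) - 1)

    def reindex(self):
        # Move every backlogged record into the index.
        for pos in self.backlog:
            for word in self.records[pos].lower().split():
                self.index[word].add(pos)
        self.backlog.clear()

    def search(self, keyword):
        """Return (matching positions, number of records read in full)."""
        keyword = keyword.lower()
        hits = set(self.index.get(keyword, set()))
        scanned = 0
        for pos in self.backlog:  # assumed fallback scan over unindexed records
            scanned += 1
            if keyword in self.records[pos].lower().split():
                hits.add(pos)
        return sorted(hits), scanned


store = IndexedStore()
for i in range(1000):
    store.add(f"sequence {i} organism mouse")
store.reindex()
hits_fresh, scanned_fresh = store.search("mouse")

# New records arrive faster than the index is rebuilt.
for i in range(500):
    store.add(f"sequence {1000 + i} organism human")
hits_stale, scanned_stale = store.search("human")

print(scanned_fresh, scanned_stale)
```

With a fully built index the query reads no records in full; once 500 unindexed records pile up, the same kind of query has to read all 500. In this model the cost of a search tracks the size of the backlog, which is one plausible reading of Hayashi's point about an index that cannot keep up.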
Part of the ongoing work between Fujitsu and Idenken will cover optimizing Shunsaku, which was originally designed for high-speed processing of text searches, to better handle complex data such as those found in the biotechnology field.
"We created the prototype to copy the functions of the existing database, and are adding functions to it," Hayashi said. "We are going to enhance it further, and it may become faster, maybe 200 times faster [than the current relational database]."
Shunsaku is already available in Japan under the name "Interstage Shunsaku Data Manager Enterprise Edition," and Fujitsu plans to put it on sale in the United States later this year, Hayashi said.