Got PubMed? Pubget Searches and Delivers Scientific PDFs




By Kevin Davies

June 10, 2009 | Imagine a search tool for the life sciences literature that could, with one click, pull up a full-text PDF of any paper. That, in essence, is the attraction of Pubget, the first product of a small Cambridge, Mass., start-up.

Following a quiet launch last year, the company has just announced its first 50 partners, including Caltech, Dartmouth, Harvard, MGH, MIT, NIH, Princeton, UCSF, the University of Michigan, and the University of Virginia. A further 200 organizations are waiting to partner as well. Ryan Jones, Pubget president, says the firm has already enrolled “tens of thousands of users at this point, and we’re doubling every month.” A couple of thousand of those users are inside the Harvard hospitals.

The original Pubget product was developed by one of the three co-founders, a clinical pathologist at Beth Israel Hospital (Harvard Medical School) named Ramy Arnaout. He got his PhD in mathematical biology from Oxford, but was frustrated by the challenge of getting full-text PDF access to science journal articles -- even while working inside well-endowed institutions like Harvard and Oxford. Arnaout joined forces with Ian Connor, formerly with Lotus and IBM, and started building the Pubget search tool.

“Pubget is a platform for life science research,” says Jones, who cites familiar statistics that are propelling the project: the rate at which data are growing exceeds Moore’s Law, and on average two new life sciences journals are launched every day. “A scientist’s tasks are shifting from working with test tubes, reagents and diagnostics equipment to, more and more, interfacing with the data that’s already out there.”

Jones, who was previously with a start-up acquired by Microsoft enterprise search, says Pubget is built on three key components. “One is a search engine that has all the content that Medline or the NIH’s PubMed has in it – 20 million research documents.” Pubget’s open-source search engine uses a relevancy algorithm similar to PubMed’s, Jones explains, except that its content is a little fresher. “We took an initial data dump from PubMed, and now we’ve built direct connections to the publishers themselves, so as soon as research is available, we get that feed from the publisher.”

Second, Pubget built a ‘pathing engine’ that understands the location of the full-text PDFs across all 20,000 journal titles. “It knows exactly where on the web that full-text document lives,” says Jones. “We have crawlers that go out and understand at Nature or Cell or Science where those full-text documents live. In very much the same way that Google finds HTML, we can find the PDF.”

The third component is what Jones calls “a credentials engine, which understands the credentials of the subscriptions you have based on where you are… It can go into a library’s holdings page and interpolate what they have rights to.”
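As a rough illustration of how the three components Jones describes might fit together, here is a minimal Python sketch. It is not Pubget's code: the article records, the placeholder example URLs, the institution name, and the function names (search_by_author, resolve_pdf, search_pdfs) are all hypothetical, and the real engines are certainly far more involved. The sketch only shows the idea of a search hit being passed through a PDF location lookup and a subscription check.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Article:
    pmid: str          # hypothetical identifier for the record
    author: str
    journal: str
    open_access: bool  # True if the full text is freely available


# 1. Search engine: a toy stand-in for the PubMed-derived index.
ARTICLES = [
    Article("1001", "Smith J", "Journal A", open_access=False),
    Article("1002", "Smith J", "Journal B", open_access=True),
    Article("1003", "Lee K", "Journal A", open_access=False),
]

# 2. Pathing engine: where each article's PDF lives (placeholder URLs).
PDF_PATHS = {
    "1001": "https://journal-a.example/pdf/1001.pdf",
    "1002": "https://journal-b.example/pdf/1002.pdf",
    "1003": "https://journal-a.example/pdf/1003.pdf",
}

# 3. Credentials engine: journals each institution subscribes to (made up).
HOLDINGS = {
    "Example University": {"Journal A"},
}


def search_by_author(author: str) -> list[Article]:
    """Return every indexed article by the given author (exact match)."""
    return [a for a in ARTICLES if a.author == author]


def resolve_pdf(article: Article, institution: Optional[str]) -> Optional[str]:
    """Return the PDF URL if the reader may access it, otherwise None."""
    subscribed = HOLDINGS.get(institution, set()) if institution else set()
    if article.open_access or article.journal in subscribed:
        return PDF_PATHS.get(article.pmid)
    return None


def search_pdfs(author: str, institution: Optional[str] = None) -> dict[str, Optional[str]]:
    """Map each matching article's identifier to an accessible PDF URL, or None."""
    return {a.pmid: resolve_pdf(a, institution) for a in search_by_author(author)}


if __name__ == "__main__":
    # An unaffiliated reader gets only the open-access paper.
    print(search_pdfs("Smith J"))
    # A reader at a subscribing institution also gets the paywalled one.
    print(search_pdfs("Smith J", "Example University"))
```

In this toy version, an unaffiliated reader is offered only the free full text, while a reader at a subscribing institution is offered every PDF its holdings cover, which mirrors the two kinds of user Jones distinguishes.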

What this means is that when scientists use Pubget to search by author, for example, the results are delivered in the form of the full-text PDF, without having to navigate through abstracts or publishers’ electronic portals. “The end user sees us in two ways,” says Jones. “If they are not associated with a larger institution, we are the most thorough resource for free full-text documents. We not only have everything that’s in PubMed Central and the other free resources, but we spider the web for other full-text documents that happen to be out there. If you’re at an institution, we’re the fastest way to take advantage of the subscriptions your institution has provided for you.”

Pubget offers various links for functionality, including a Firefox plug-in to download PDFs; access to the publishers’ web page and the equivalent page in PubMed; email forwarding; and tagging (using a virtual cloud-based storage system) to metatag articles and keep them in a ‘locker.’ A widget, which works via RSS, allows continuous updates on topics or authors inside a lab web page.

The First 50

About two-thirds of the first 50 partners are academic organizations; the rest are hospitals and some commercial organizations. Jones says Pubget already has users at all of the top 12 big pharmas, but no formal relationships as yet (“meaning we haven’t turned them on yet”).

Pubget will in time make money in two ways. One will be the provision of premium services. The other will be by aggregating analytics about current life science search topics. “We can help vendors like Agilent or Bio-Rad understand what the community is searching on,” says Jones. “If you do a search on swine flu, and someone did a virus study and in the methods of that study cited a specific type of microscopy, we can present ads relevant to that.” Host institutions can decide if they want those ads presented or, for a fee, they can opt for “a closed, white label site.” Jones says a handful of the first 50 partners are paying.

Jones credits the staff at the Harvard Countway Library for their early assistance. His team was nervous about the reaction of the publishers at first, “but the reaction has been vastly positive.” Those publisher relationships will be nurtured over time. “We strongly believe that search is paramount and that the papers are really the center of science – it’s how scientists communicate with one another. We want to participate most strongly in those two things, search and papers,” Jones says. The company may also partner with groups in the social networking space.

Pubget can be found at pubget.com.
