By Chris Dagdigian
February 15, 2012 | More than a year ago, Oracle made a decision that, while not unexpected within the HPC community, was nonetheless met with no small measure of concern. In December 2010, Oracle announced that Grid Engine (a very popular life science cluster scheduler and distributed resource manager that Oracle inherited via its purchase of Sun Microsystems) would no longer be freely available as an open-source product.
Oracle's decision to make Grid Engine available only to commercially licensed customers left a large community of scientific and high performance computing users questioning the viability of their long term technical planning and HPC roadmaps.
Life science users in particular were affected by the announcement, as Grid Engine has become the de facto standard in many bioinformatics, chemistry and genomics computing environments. The Grid Engine "standard" is pervasive enough that even laboratory instrument vendors target it to address customer concerns about integration into existing enterprise environments.
Is Grid Engine still relevant?
Back in 2010 I was less concerned with the future of resource management software as I was knee deep in several cloud projects and quickly saw firsthand how IaaS cloud platforms upend traditional scientific computing environments. Grid Engine enables multiple users, projects and groups to share the same infrastructure effectively. Why would this even be needed "on the cloud" where dynamically provisioning perfectly sized resources and infrastructure on a per-user or per-workflow basis is so trivial?
I was clearly incorrect. Grid Engine and similar software packages are still being actively deployed on cloud platforms. The ability to replicate "legacy" research computing environments is turning out to be a "must have" cloud capability. Grid Engine even has a place in the "new" style of cloud deployment architectures—it turns out that a scheduling and resource allocation system is very handy for organizations that prefer to keep some amount of their infrastructure persistently running and instantly reconfigurable.
In 2011, my work shifted to a number of infrastructure and datacenter refresh projects with biotech and pharmaceutical customers. This is when I realized that the future of Grid Engine was very much still a pressing concern. By mid-2011, organizations that had simply kept on using existing versions of Grid Engine were beginning to think about "what next". Even groups that were totally happy with their existing scientific computing environments were starting to plan for the future, as major technology advances in multi-core CPU management and GPU computing needed to be reflected in the capabilities of their job schedulers and resource allocation engines. A static, old, or unchanging Grid Engine environment will not be able to handle major advances in HPC, CPU, and GPU computing technologies.
Grid Engine in 2012
Immediately after the Oracle announcement in December 2010, multiple people announced their intent to "fork" the last available open-source codebase that Oracle had released; that is, to take that code and begin independent development of it, creating a distinct piece of software. Of the forks that were created, two in particular were led by individuals with deep familiarity with Grid Engine internals. This provided significant comfort to people interested in the long-term viability of the product, because forking the code is simply not enough: viability depends on people with deep prior experience with the complex codebase.
Another major event occurred in January 2011 when Univa announced that it had recruited a number of Oracle employees, including key members of the Grid Engine development and product management team. These members would now be working on a new version of Grid Engine sold and supported by Univa. Seeing an additional commercial company actively investing in and supporting the future of Grid Engine was the final piece I needed to be personally convinced that Grid Engine still had a future.
So where are we in 2012?
In a pretty good position, actually. Grid Engine users now have two sources for commercially licensed and commercially supported products: both Oracle and Univa supply them. Free software fans and other related open-source projects that depend upon access to an unrestricted resource manager also have two different projects from which to choose.
Even better, a new company called Scalable Logic has announced its intent to provide commercial support and consulting services for one of the free Grid Engine variants. The ability to buy a support contract or even per-incident assistance for a free version of Grid Engine checks off the last item on my personal "must-have" feature wishlist.
This is how I have been handling the "what next?" discussions in my own projects:
- I do not personally expect Grid Engine to survive within Oracle for much longer. The product simply does not generate enough revenue for a company the size of Oracle and mention of Grid Engine has been missing from Oracle discussions regarding plans and roadmaps for high performance computing. It's unclear what Oracle's level of commitment will be moving forward. Oracle Grid Engine is not part of our current plans or projects.
- For organizations wanting explicit commercial support from Day 1, I'm continuing to recommend and use Univa Grid Engine. For me, this is the most suitable commercially licensed version of Grid Engine to get behind. Currently I have several clients using it and many more evaluating it.
- For organizations that prefer to use a freely-available version, I'm recommending the "Open Grid Scheduler" project. It was actually hard to make a concrete selection between the two popular forks (I use both daily in my work). The tipping point was the formation of Scalable Logic and their intent to offer commercial support and per-incident assistance with Open Grid Scheduler. This for me is the best of both worlds: a solid free & open source product that also allows for the possibility of commercial support when (and if) needed.
I'd be interested in hearing your stories. Have you switched from Grid Engine? To what? What other options are people looking at? I know within BioTeam, OpenLava is the next resource manager on our internal list of "to-try" items in the lab.
Available Grid Engine Options
Free & Open Source
Son of Grid Engine
News & Announcements: http://arc.liv.ac.uk/repos/darcs/sge/NEWS
Description: Baseline code comes from the Univa public repo with additional enhancements and improvements added. The maintainer(s) have deep knowledge of SGE source and internals and are committed to the effort. Future releases may start to diverge from Univa as Univa pursues an "open core" development model. Maintainers have made efforts to make building binaries from source easier and the latest release offers RedHat Linux SRPMS and RPM files ready for download. Support: Supported via the maintainers and the users mailing list.
Open Grid Scheduler
Description: Baseline code comes from the last Oracle open source release with significant additional enhancements and improvements added. The maintainer(s) have deep knowledge of SGE source and internals and are committed to the effort. No pre-compiled "courtesy binaries" available at the SourceForge site (just source code and instructions on how to build Grid Engine locally). In November 2011 a new company ScalableLogic announced plans to offer commercial support options for users of Open Grid Scheduler. Support: Supported via the maintainers and the users mailing list. Commercial support from ScalableLogic.
Commercially Supported & Licensed
Univa Grid Engine
Description: Commercial company selling Grid Engine, support and layered products that add features and functionality. Several original SGE developers are now employed by Univa. Evaluation versions and "48 cores for free" are available from the website. Support: Univa supports its own products.
Oracle Grid Engine
Description: Continuation of "Sun Grid Engine" after Oracle purchased Sun. This is the current commercial version of Oracle Grid Engine after Oracle discontinued the open source version of its product and went 100% closed-source. Support: Oracle supports its own products; a web support forum for Oracle customers can be found at https://forums.oracle.com/forums/forum.jspa?forumID=859
Chris Dagdigian is a consultant with the BioTeam. He can be reached at firstname.lastname@example.org.