Dakota is a freely available software framework for large-scale engineering optimization and uncertainty analysis. The Dakota toolkit provides a flexible, extensible interface between analysis codes and iterative systems analysis methods. Dakota contains algorithms for:

  • optimization with gradient and nongradient-based methods;
  • uncertainty quantification with sampling, reliability, stochastic expansion, and epistemic methods;
  • parameter estimation with nonlinear least squares methods; and
  • sensitivity/variance analysis with design of experiments and parameter study methods.


Itzï is a hydrologic and hydraulic model that simulates 2D surface flows on a regular grid using simplified shallow water equations. It uses GRASS GIS as a back-end for reading entry data and writing results. It simulates surface flows from direct rainfall or user-given point inflows, and uses raster time-series as entry data, allowing the use of radar rainfall or varying friction coefficients.

Itzï is developed by Laurent Courty at the engineering institute of the National Autonomous University of Mexico.

Open Science Codefest

The National Center for Ecological Analysis and Synthesis (NCEAS) at UCSB is co-sponsoring the Open Science Codefest 2014, which aims to bring together researchers from ecology, biodiversity science, and other earth and environmental sciences with computer scientists, software engineers, and developers to collaborate on coding projects of mutual interest.

Do you have a coding project that could benefit from collaboration, or software skills you’d like to share? The codefest will be held from September 2-4 in Santa Barbara, CA.

Inspired by hack-a-thons and organized in the participant-driven, unconference style, the Open Science Codefest is for anyone with an interesting problem, solution, or idea that intersects environmental science and computer programming. This is the conference where you will actually get stuff done – whether that’s coding up a new R module, developing an ontology, working on a data repository, creating data visualizations, dreaming up an interactive eco-game, discussing an idea, or any other concrete collaborative goal that interests a group of people.

Looks like a great program!

Stripe’s Open Source Retreat

The Open-Source Retreat that is being sponsored by Stripe looks quite intriguing. Stripe relies on a lot of open source software, and they’ve announced a program to give a grant to a small number of developers to come to San Francisco to work full-time on an open-source project for a period of 3 months. The awardees will have space in Stripe’s SF office, and will be asked to give a couple of internal tech talks over the course of the program, but otherwise it’ll be no-strings-attached.

This is a clever model for supporting open source development, and I hope this idea catches on with other companies that benefit from open source. I can think of a number of academic developers who would love the idea of a sabbatical to work on an open source code project, to meet new people who might use their code, and to get a fresh perspective in new surroundings – an open source sabbatical.  This could be a great way for companies that benefit from open source scientific software to help encourage and influence the development of the tools they use.

The deadline for applying to the Stripe program is May 31st, and the program will run from September 1st through December 1st.

New PLOS Open data policy

PLOS has announced some changes to their publishing policies, and these changes are great news. The new PLOS policies will go a significant way towards encouraging open data and open source. Although the announcement itself is somewhat vague on the subject of source code, the actual PLOS One Sharing Policy is excellent:

…if new software or a new algorithm is central to a PLOS paper, the authors must confirm that the software conforms to the Open Source Definition, have deposited the following three items in an open software archive, and included in the submission as Supporting Information:

  • The associated source code of the software described by the paper. This should, as far as possible, follow accepted community standards and be licensed under a suitable license such as BSD, LGPL, or MIT (see http://www.opensource.org/licenses/alphabetical for a full list). Dependency on commercial software such as Mathematica and MATLAB does not preclude a paper from consideration, although complete open source solutions are preferred.
  • Documentation for running and installing the software. For end-user applications, instructions for installing and using the software are prerequisite; for software libraries, instructions for using the application program interface are prerequisite.
  • A test dataset with associated control parameter settings. Where feasible, results from standard test sets should be included. Where possible, test data should not have any dependencies — for example, a database dump.

However, the one loophole is that they allow for code that runs on closed source platforms in “common use by the readership”  (e.g. MATLAB), although it must run without dependencies on proprietary or otherwise unobtainable ancillary software.  That “common use” loophole could potentially be a mile wide in some fields.  Is Gaussian a common use platform in computational chemistry and therefore exempt from this new policy?   If so, the policy is a bit toothless.  I’d like to see the limits and bounds of the “common use” loophole more clearly stated.

The announcement makes PLOS ONE a much more attractive place to send our next paper.

The RosettaCon 2012 Collection: Rosetta Developers Meet the Challenges in Macromodeling Head On

Reproducibility continues to be one of the major challenges facing computational biologists today. Complicated experiments, massive data sets, scantily described protocols, and constantly evolving code can make experimental documentation and replication very difficult. In addition, the need for specialized knowledge and access to large computational resources can create barriers when trying to design and model macromolecules.

Every year, the Rosetta developer community meets to discuss these challenges and advancements via Rosetta, a software suite that models and helps design macromolecules. In 2010, PLOS announced the RosettaCon2010 Collection, which made the latest research on protocols used to create macromolecular models available to all. Now, the PLOS ONE RosettaCon 2012 Collection continues to tackle issues related to use, reproducibility and documentation by highlighting new scientific developments within the Rosetta community.

The RosettaCon 2012 Collection comprises 14 articles detailing the scientific advancements made by developers that use Rosetta. In order to address reproducibility and documentation challenges, each article within this Collection includes an archive containing links to the exact version of the code used in the paper, all input data, links to external tools and example scripts.

This year’s Collection marks the tenth anniversary of RosettaCon and focuses on three long-term goals of the community: increase the usability of Rosetta, improve its current methods, and introduce completely new protocols.

Increasing the usability of Rosetta – Rosetta still requires specialized knowledge and large computational resources, but this collection features two articles describing advancements that make it easier for non-experts to use its applications. These articles introduce the Rosetta Online Server that Includes Everyone (ROSIE) workflow, which allows for rapid conversion of Rosetta applications into public web servers, and PyRosetta, a new graphical user interface (GUI) which allows users to run standard Rosetta design tasks.

Improving current prediction methods – Several articles describe improvements to Rosetta’s structure prediction capabilities and design methodologies. Examples include improvements to loop conformational sampling and a recently developed ray-casting (DARC) method for small-molecule docking that now enables virtual screening of large compound libraries.

Introducing new protocols – A number of articles featuring new procedures and applications that debuted at the conference are introduced in the Collection. Highlights include new methods for dealing with ligand docking, advancements to pre-refine scaffold proteins prior to computational design of functional sites, and new protocols to drive Rosetta de novo modeling.

The RosettaCon 2012 Collection continues to help serve the Rosetta community in an effort to ensure that newly developed protocols are as usable as more established workflows, are transparent, and are accurately documented even in an active development environment.

This post has been adapted from “The RosettaCon 2012 Special Collection: Code Writ on Water, Documentation Writ in Stone” which serves as a more in-depth overview of the new collection. To read all that this Collection has to offer, click here.

OpenAPIs for scientific instrumentation?

An interesting question from Dale Smith: Are there OpenAPIs for remote sensing and monitoring of scientific instruments? Dale pointed us at this very cool RSOE EDIS alert map as an example of what could be possible with distributed consumer-grade sensors that had OpenAPIs. I can imagine a number of very cool things that could be done with distributed weather or earth motion sensors. Are there software tools out there that make querying these sensors easy?

(One suggestion,  however, would be for the RSOE EDIS to look for a slightly less ominous-sounding motto).

Playing with MultiGraph

I’ve been playing around with a cool JavaScript library called MultiGraph, which lets you interact with graphical data embedded in a blog post. The data format is a simple little XML file called a “MUGL”. Here’s a sample that took all of about 10 minutes to create:

Note that you can pan and zoom in on the data.   For those readers who are interested, this data is the Oxygen-Oxygen pair distribution function, \(g_{OO}(r)\), for liquid water that was inferred from X-ray scattering data from  G. Hura, J. M. Sorenson,  R. M. Glaeser, and  T. Head-Gordon, J. Chem. Phys. 113(20), pp. 9140-9148 (2000).

Inserting this into the blog post involved uploading two files: the JavaScript library itself and the MUGL file. After those were in place, only two lines needed to be added to the blog post:

<script type="text/javascript" src="http://www.openscience.org/blog/wp-content/uploads/2013/05/multigraph-min.js"></script>

<div class="multigraph" data-height="300" data-src="http://www.openscience.org/blog/wp-content/uploads/2013/05/gofrmugl.xml" data-width="500"></div>

One thing that would be nice would be a way to automate the process of going from an xmgrace file directly to the MUGL format.
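As a first step toward that kind of automation, here is a rough Python sketch that pulls the first data set out of an xmgrace project file and emits it as comma-separated rows. It assumes a plain two-column xy set, and it relies only on the usual xmgrace conventions that lines starting with `@` or `#` are commands or comments and that a line starting with `&` ends a data set. The function name `agr_to_csv` is made up for this illustration, and the sample numbers are placeholders, not the data from the paper cited above. Wrapping the rows in the elements that the MUGL schema expects is deliberately left out, since that should be checked against the MultiGraph documentation.

```python
def agr_to_csv(agr_text):
    """Return the first two-column data set of an xmgrace file as CSV text."""
    rows = []
    for line in agr_text.splitlines():
        line = line.strip()
        # Skip blank lines, xmgrace commands (@...) and comments (#...).
        if not line or line.startswith(("@", "#")):
            continue
        # An ampersand terminates a data set; stop after the first one.
        if line.startswith("&"):
            if rows:
                break
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            # Validate that both columns are numeric, but keep the
            # original strings so no precision is lost on output.
            float(fields[0])
            float(fields[1])
        except ValueError:
            continue
        rows.append((fields[0], fields[1]))
    return "\n".join(f"{x},{y}" for x, y in rows)


# Placeholder input, only to show the call.
sample = """# Grace project file
@    title "g_OO(r)"
@target G0.S0
@type xy
2.0 0.01
2.8 2.75
3.5 0.85
&
"""
csv_text = agr_to_csv(sample)
print(csv_text)
```

Keeping the original strings rather than reformatting the floats means the converter never silently rounds the data.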

SimThyr – simulation software for pituitary thyroid feedback

This is a bit outside our normal area of expertise, but it looks interesting.

Thyroid hormones play an important role in metabolism, growth and differentiation. Therefore, exact regulation of thyroid hormone levels is vital for most organisms. The mechanism of the feedback control is known, but the dynamics are still a bit of a mystery. There’s an interesting page on the different models for thyrotropic feedback control at the Medizinische Kybernetik (Medical Cybernetics) site. SimThyr is an open source Pascal-based simulation program for the pituitary thyroid feedback control mechanism that explores these models and makes predictions for dynamics based on parameters of the feedback mechanism.

Reversible Random Number Generators

This news comes by way of John Parkhill, my new colleague here at Notre Dame.

William G. Hoover (of the Nosé-Hoover Thermostat) and Carol G. Hoover issued a $500 challenge on arXiv to generate a time-reversible random number generator. The challenge itself would be quite remarkable news. What’s even better is that the challenge was solved (with source code for an implementation) in 6 days by Federico Ricci-Tersenghi.

Why is this a big deal? Most of the equations in physics that govern time evolution of particles obey time-reversal symmetry; the same differential equations that govern molecular or planetary motion will take you back to your starting point if you suddenly reverse the time variable. This is usually a fantastic way to check whether you are doing the physics correctly in your simulations, and it also means that collections of starting points that are related to each other behave in certain predictable ways when they evolve.
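Here is a minimal sketch of that reversibility check in Python, assuming a single particle in a harmonic well integrated with the velocity Verlet scheme. The force law, the time step, and the function names are all choices made for this illustration. The particle is run forward, its velocity is flipped, and it is run for the same number of steps; it should land back on its starting point to within round-off.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps of velocity Verlet."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v


def harmonic_force(x, k=1.0):
    """Restoring force of a spring with (assumed) force constant k."""
    return -k * x


x0, v0 = 1.0, 0.0

# Run forward in time.
x1, v1 = velocity_verlet(x0, v0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)

# Flip the velocity and run the same number of steps: this retraces the path.
x2, v2 = velocity_verlet(x1, -v1, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)

# Both differences should be at the level of floating-point round-off.
print(abs(x2 - x0), abs(-v2 - v0))
```

If either difference were large, that would be a strong hint of a bug in the integrator or in the forces.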

Stochastic approaches to physical motion introduce an aspect of randomness to mimic the behavior of complex phenomena like the motion of solvent surrounding the molecule we’re interested in, or to mimic the transitions between different electronic states of a molecule.   The introduction of random numbers has meant we had to give up time-reversibility, and we’ve been willing to live with that for a long time because we can study more complicated phenomena.

If we have access to a time-reversible pseudo-random number generator, however, we get that very powerful tool back in our toolbox.
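To get a feel for what "reversible" means for a generator, here is a minimal Python sketch. It is not the solution submitted to the Hoovers' challenge, and it makes no claim to meet that challenge's requirements. It only illustrates that when the update rule is a one-to-one map, the sequence can be stepped backward as well as forward. The example uses a linear congruential generator with the widely published Numerical Recipes constants; the function names are made up for this illustration.

```python
M = 2**32
A = 1664525       # odd multiplier, so it has an inverse modulo 2**32
C = 1013904223
A_INV = pow(A, -1, M)  # modular inverse (requires Python 3.8 or later)


def step_forward(state):
    """Next state of the linear congruential generator."""
    return (A * state + C) % M


def step_backward(state):
    """Previous state: undo the addition, then undo the multiplication."""
    return (A_INV * (state - C)) % M


seed = 12345
state = seed

# Generate a forward sequence.
forward = []
for _ in range(1000):
    state = step_forward(state)
    forward.append(state)

# Walk backward from the final state and recover the sequence in reverse.
backward = []
for _ in range(1000):
    backward.append(state)
    state = step_backward(state)

print(state == seed, backward == forward[::-1])
```

A plain LCG like this one is a poor source of randomness for serious simulations, so treat it purely as a picture of the idea.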

Now, the Langevin equation,

\(m \frac{d^2 x}{dt^2} = F - \gamma(t) \frac{dx}{dt} + R(t)\)


has two things that prevent it from being time-reversible.  Besides the stochastic or random force, \(R(t)\), there’s also a drag or friction force, \(-\gamma(t) \frac{dx}{dt}\), that depends on the velocities of the particles.  There’s no solution yet to time reversibility for this piece (and I have my doubts that there ever will be a way to reverse this).  I suppose if we offer up another $500 prize for time-reversible drag, we’d make some traction on this problem…
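To spell out why the drag term resists reversal, here is a short worked step. It assumes that \(F\) depends only on the positions and sets the random force aside. Substituting \(t' = -t\) flips the sign of every first derivative and leaves second derivatives alone:

\(\frac{dx}{dt} = -\frac{dx}{dt'}, \qquad \frac{d^2 x}{dt^2} = \frac{d^2 x}{dt'^2}\)

so that, without the random force, the Langevin equation becomes

\(m \frac{d^2 x}{dt'^2} = F + \gamma \frac{dx}{dt'}\)

The acceleration and \(F\) are unchanged, but the drag term now has the opposite sign: running backward, it pumps energy into the particle instead of removing it. The reversed equation is therefore not the same equation we started with, no matter how the random numbers are generated.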

(The comic above courtesy of xkcd).