[Originally posted on LinkedIn]

I have just stumbled upon this thread on why one should use Galaxy (https://www.biostars.org/p/50034/). One of the reasons given is reproducibility, but Galaxy solves only one level of reproducibility, "functional reproducibility" (what I did with the data). There are at least two other levels, one "below" Galaxy and one "above" it:
  • Below: the computational environment (operating system, library dependencies, binaries).
  • Above: semantics, i.e. what the data means.
To be completely reproducible, one has to be reproducible at all three levels:
  1. Computational: Docker.
  2. Functional: Galaxy.
  3. Semantics: URIs, RDF, SPARQL, OWL.
How to do this is described in our GigaScience paper, "Enhanced reproducibility of SADI Web Service Workflows with Galaxy and Docker" :-) (http://www.gigasciencejournal.com/content/4/1/59)
Just to emphasize and clarify, the three levels are, from top to bottom:
  3. Semantics: what the data means.
  2. Functional: what I did with the data.
  1. Computational: how I did it.
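To make the three levels concrete, here is a minimal Python sketch of a hypothetical provenance record that documents each level and reports which ones are missing. The class `ReproducibilityRecord`, its fields, the image name, the step names and the example.org URIs are all illustrative assumptions. This is not the approach from the paper, nor a Galaxy or Docker API. The semantic level is serialized as N-Triples `rdf:type` statements, about the simplest form RDF can take.

```python
from dataclasses import dataclass, field


@dataclass
class ReproducibilityRecord:
    """Hypothetical record covering the three levels of reproducibility."""
    # 1. Computational (how I did it): e.g. a Docker image reference.
    container_image: str = ""
    # 2. Functional (what I did with the data): ordered workflow steps,
    #    such as the tools chained in a Galaxy workflow.
    workflow_steps: list[str] = field(default_factory=list)
    # 3. Semantics (what the data means): data item URI -> ontology class URI.
    semantic_types: dict[str, str] = field(default_factory=dict)

    def missing_levels(self) -> list[str]:
        """Return the names of the levels that are not documented."""
        missing = []
        if not self.container_image:
            missing.append("computational")
        if not self.workflow_steps:
            missing.append("functional")
        if not self.semantic_types:
            missing.append("semantics")
        return missing

    def is_fully_reproducible(self) -> bool:
        """True only when all three levels are documented."""
        return not self.missing_levels()

    def semantic_ntriples(self) -> str:
        """Serialize the semantic level as N-Triples rdf:type statements."""
        rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
        return "\n".join(
            f"<{item}> <{rdf_type}> <{cls}> ."
            for item, cls in sorted(self.semantic_types.items())
        )


# Usage with hypothetical names and example.org URIs.
steps = ["load_input", "call_sadi_service", "filter_results"]

galaxy_only = ReproducibilityRecord(workflow_steps=steps)
print(galaxy_only.missing_levels())  # ['computational', 'semantics']

full = ReproducibilityRecord(
    container_image="example/sadi-galaxy:1.0",
    workflow_steps=steps,
    semantic_types={
        "http://example.org/data/result1": "http://example.org/ontology/Result",
    },
)
print(full.is_fully_reproducible())  # True
print(full.semantic_ntriples())
```

A record that only captures the Galaxy workflow is flagged as missing the computational and semantic levels, which is the point made above. The check is deliberately shallow: it only tests that each level is documented, not that the image, the workflow or the ontology terms actually resolve.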