Galaxy tutorial


CBGP 2013


Mikel Egaña Aranguren
http://mikeleganaaranguren.com / mikel.egana.aranguren@gmail.com
Biological Informatics Group (http://wilkinsonlab.info), CBGP, UPM, Madrid



GitHub repo (slides, data, documentation, ...): https://github.com/mikel-egana-aranguren/Galaxy_tutorial


GitHub page (these slides): http://mikel-egana-aranguren.github.io/Galaxy_tutorial/galaxy.html



License: Creative Commons BY-NC-SA

What is Galaxy?

What is Galaxy for?


A web server that offers the usual bioinformatics tools in a central space with ...


... data storage


... history


... workflows


Why is Galaxy so good?


Very complex computational analyses

... easily

... with provenance


Reproducible science!


On reproducible science


From "Ten Simple Rules for Reproducible Computational Research" (Sandve et al., 2013):


Rule 1: For Every Result, Keep Track of How It Was Produced


Rule 5: Record All Intermediate Results, When Possible in Standardized Formats


Rule 9: Connect Textual Statements to Underlying Results


Rule 10: Provide Public Access to Scripts, Runs, and Results


Reproducible science: Galaxy for executable papers


http://usegalaxy.org/u/aun1/p/heteroplasmy

More information


Galaxy main server (limits on data size and computational resources): https://usegalaxy.org/


Our own server at CBGP: http://biordf.org:8983/


Galaxy documentation: http://wiki.galaxyproject.org


Using Galaxy

Main interface


Hands-on 1 (Guided exercise based on Galaxy 101)


http://usegalaxy.org/u/aun1/p/galaxy101


Hands-on 2 (On your own)


Which genes of Arabidopsis thaliana are annotated with terms from the "cell cycle" (GO:0007049) subtree of the Gene Ontology (GO)?


http://biordf.org:8983/


Hints (General steps)


  1. Get Data > Upload File:
    • data/gene_ontology.1_2.obo (GO, in OBO format)
    • data/gene_association.tair (GO annotations for Arabidopsis, in GAF format)
  2. Prepare the GAF file: remove the lines starting with "!" (with a regular expression) and convert the tab-separated content to actual columns
  3. Use OBO ontology manipulation > Get the descendent terms of a given OBO term to obtain the cell cycle subtree from GO
  4. Compare the list of GO ids with the GAF file to extract the rows that match
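
Outside Galaxy, the same steps can be approximated on the command line. The sketch below is only an illustration, not the workflow used in the solution. The function names are hypothetical, it assumes the GO id sits in column 5 of the GAF file, and it follows only is_a links in the OBO file, which may differ from what the Galaxy OBO tool does.

```shell
# Step 2: drop GAF comment lines, i.e. those starting with "!".
strip_gaf_comments() {
  grep -v '^!' "$1"
}

# Step 3: print every term below a given term, following is_a links only.
# The given term itself is not printed.
obo_descendants() {
  awk -v root="$1" '
    /^\[/ { in_term = ($0 == "[Term]"); next }   # a new stanza starts
    in_term && $1 == "id:"   { id = $2 }         # id of the current term
    in_term && $1 == "is_a:" { kids[$2] = kids[$2] " " id }
    END {
      # breadth-first walk from the root over the parent-to-children map
      queue[1] = root; head = 1; tail = 1; seen[root] = 1
      while (head <= tail) {
        n = split(kids[queue[head++]], k, " ")
        for (i = 1; i <= n; i++)
          if (!(k[i] in seen)) { seen[k[i]] = 1; queue[++tail] = k[i]; print k[i] }
      }
    }' "$2"
}

# Step 4: keep GAF rows whose GO id (column 5) is listed in a file of ids,
# one id per line.
filter_gaf_by_go_ids() {
  awk -F'\t' -v idfile="$1" '
    BEGIN { while ((getline line < idfile) > 0) ids[line] = 1 }
    $5 in ids' "$2"
}

# Possible usage (gaf_clean.tsv and cell_cycle_ids.txt are made-up names):
#   strip_gaf_comments data/gene_association.tair > gaf_clean.tsv
#   { echo GO:0007049; obo_descendants GO:0007049 data/gene_ontology.1_2.obo; } > cell_cycle_ids.txt
#   filter_gaf_by_go_ids cell_cycle_ids.txt gaf_clean.tsv
```

The usage lines add GO:0007049 itself to the id list, so that genes annotated directly with "cell cycle" are also kept.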


Solution


Use the Workflow (import and run) or the history


Sharing your stuff


You can make workflows and histories public through a URL, share them with other users, and import/export them


You can also create a Galaxy page


Local installation


It is usually a good idea to install Galaxy locally, even if only for yourself:


... provenance when you sit down and write the paper


... sensitive data


... install/develop other Galaxy tools


Local installation requirements


Some UNIX flavour (GNU/Linux, Mac OS X*)


Python


Mercurial (hg)


Install with Mercurial


hg clone https://bitbucket.org/galaxy/galaxy-dist/


cd galaxy-dist


hg update stable


Use


Run:

  • ./run.sh  # in the foreground
  • nohup ./run.sh &  # in the background; keeps running after the terminal is closed


Open http://127.0.0.1:8080/ and do your thing


Stop (histories, workflows, etc. are kept, but running jobs are interrupted):

  • Close the terminal or press CTRL-C (if running in the foreground)
  • kill -9 PID (if running in the background)
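
kill -9 sends SIGKILL, which gives the process no chance to clean up. A gentler sketch, assuming you already know the PID: send the default SIGTERM first and fall back to kill -9 only if the process is still alive after a few seconds. The function name and the five-second wait are arbitrary choices for illustration, and whether Galaxy shuts down cleanly on SIGTERM is not verified here.

```shell
# Hypothetical helper: stop a process gently, forcing it only as a last resort.
stop_gently() {
  pid="$1"
  kill "$pid" 2>/dev/null || return 0        # SIGTERM; nothing to do if already gone
  for i in 1 2 3 4 5; do
    kill -0 "$pid" 2>/dev/null || return 0   # signal 0 only checks that the process exists
    sleep 1
  done
  kill -9 "$pid" 2>/dev/null                 # still alive after about 5 s: force it
  return 0
}

# Usage: stop_gently PID
```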

Update


hg incoming  # list new changesets without applying them


hg pull -u  # pull them and update the working copy


Installing tools


Add yourself as admin (add admin_users = your_email to universe_wsgi.ini)
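
A quick way to check the edit, sketched as a hypothetical shell helper: it prints the active admin_users line and ignores commented-out ones (lines starting with #). The email address in the usage comment is a placeholder.

```shell
# Hypothetical helper: print the uncommented admin_users setting, if any.
show_admin_users() {
  grep -E '^[[:space:]]*admin_users[[:space:]]*=' "$1"
}

# Usage (from the galaxy-dist directory):
#   show_admin_users universe_wsgi.ini
# prints a line such as: admin_users = you@example.org
```

If nothing is printed, the setting is missing or still commented out.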


Restart, log in, and install tools from a tool shed through the admin interface


or install manually* (Very basic XML and UNIX skills needed)


Finishing remarks


Feel free to use http://biordf.org:8983/ (No security/performance warranty though ;-)


If you need a tool and you can't find it in the tool shed, let us know and perhaps we can develop it (especially if there is a CLI version of the tool and/or the tool is interesting).


This presentation was produced using Reveal.js