name: inverse layout: true class: center, middle, inverse --- layout: false background-image: url("../images/presentation.png")
DBC
From command line to web interface
Fabien Mareuil, Rémi Planel
--- name: plan ## Plan * Quick introduction to Galaxy, a workflow management software * Part 1 - **The web interface**: * Homepage, tools, history, tool interface * Data Library panel * Workflow building panel * Part 2 - **Managing data**: * Data import: * data import using 'upload file' * data import using 'shared data' * Job launching * Data annotation * Part 3 - **Workflow construction**: * NGS workflow * Workflow annotation * Part 4 - **Galaxy and Reproducibility** * Pages * Sharing Galaxy objects --- class: center, middle, inverse ## PART 1 ## Quick introduction to Galaxy --- layout: false ## Galaxy: a workflow management software * Galaxy is a web interface that gives scientists access to tools usually launched through the command line * **Basics**: * A Galaxy tool is a program linked to an XML file * The XML file gathers the options, types, inputs, outputs, formats and the tool command line * Galaxy interprets the XML to generate the web interface * **Advantages++**: * Deals with Big Data * Creates workflows (tool chains) * Enables saving and sharing workflows and data * Galaxy@Pasteur is available at **galaxy.pasteur.fr**: * It has been available to internal users since March 2013 and to external users since October 2016 * It is linked to the Institut Pasteur storage bay, enabling big data upload * We provide support for tool development and tool installation in Galaxy --- ## Galaxy presentation: Homepage, central panel
--- ## Galaxy presentation: Homepage, left panel
--- ## Galaxy presentation: Homepage, right panel
--- ## Data libraries management panel
--- ## Workflow construction panel
--- ## Hands on: It's your turn * **Connection to Galaxy**: * https://galaxy.pasteur.fr (linux/mac/windows) * if you do not have an account, register (menu User/Register) * email address (you will receive an email to activate your account): * use your Pasteur address * password: as you want * public name: as you want --- ## Hands on: It's your turn * **Connection to Galaxy**: * https://galaxy.pasteur.fr (linux/mac/windows) * if you do not have an account, register (menu User/Register) * email address (you will receive an email to activate your account): * use your Pasteur address * password: as you want * public name: as you want * **Exploration of the "Analyse Data" tab**: * Tools * search for your preferred tool and look at its different options * History * use the wheel icon in the history panel to create a new history and rename it * Central panel * load a tool (by clicking on it) and look at the central panel --- ## Hands on: It's your turn * **Connection to Galaxy**: * https://galaxy.pasteur.fr (linux/mac/windows) * if you do not have an account, register (menu User/Register) * email address (you will receive an email to activate your account): * use your Pasteur address * password: as you want * public name: as you want * **Exploration of the "Analyse Data" tab**: * Tools * search for your preferred tool and look at its different options * History * use the wheel icon in the history panel to create a new history and rename it * Central panel * load a tool (by clicking on it) and look at the central panel * **Exploration of 2 other tabs**: * Workflow * we will play with it later * Shared Data * find your library, it should be easy --- class: center, middle, inverse ## PART 2 ## Managing your data with Galaxy --- layout: false ## Upload a dataset
* Each dataset has a **FORMAT**, and **FORMATS** play a central role in Galaxy * **The format defines which programs can be executed on a selected dataset** --- layout: false ## Upload a dataset
* If the user knows the data format, they should set it at the upload step. --- ## Hands on: It's your turn * Go to Galaxy * Search for the category "Get Data" in the tool list * Select "upload file" --> import the two fastq files * The two fastq files should now be in the current history --- ## Launch a job
--- ## Hands on - Get familiar with Galaxy tools (1/2) The aim here is to **filter** duplicate reads from a fastq file (or 2): first to **detect** them, then to **extract** those reads. Finally we want you to **run a mapping** on a reference genome that we will provide. Select a tool in the tool list: * Category *NGS: QC and manipulation*, tool: *fqduplicate* - No input possible because it needs fastqsanger format and not fastq --- ## Hands on - Get familiar with Galaxy tools (1/2) The aim here is to **filter** duplicate reads from a fastq file (or 2): first to **detect** them, then to **extract** those reads. Finally we want you to **run a mapping** on a reference genome that we will provide. Select a tool in the tool list: * Category *NGS: QC and manipulation*, tool: *fqduplicate* - No input possible because it needs fastqsanger format and not fastq * Solution: Choose the *fqconvert* tool in the category *Convert Formats* - RUN *fqconvert* on one of the fastq input files --- ## Hands on - Get familiar with Galaxy tools (1/2) The aim here is to **filter** duplicate reads from a fastq file (or 2): first to **detect** them, then to **extract** those reads. Finally we want you to **run a mapping** on a reference genome that we will provide. Select a tool in the tool list: * Category *NGS: QC and manipulation*, tool: *fqduplicate* - No input possible because it needs fastqsanger format and not fastq * Solution: Choose the *fqconvert* tool in the category *Convert Formats* - RUN *fqconvert* on one of the fastq input files * Go back to *fqduplicate* - RUN with the *single read* parameter --- ## Hands on - Get familiar with Galaxy tools (2/2) To extract the duplicate reads, use: * *fqextract* - parameter settings - **Type of input data**: single read - **Original *fastq* dataset from your history**: *fastqsanger* output from *fqconvert* - **Fastq format**: *illumina1.8/Sanger* - **Ignore pair information for the extraction (-p)**: True - **Dataset from your history containing the list of read names to extract or exclude (-l)**: *fqduplicate* output from *fastqsanger* - **Extract or exclude reads in the list**: Extract listed reads - RUN *fqextract* --- ## Hands on - Get familiar with Galaxy tools (2/2) To extract the duplicate reads, use: * *fqextract* - parameter settings - **Type of input data**: single read - **Original *fastq* dataset from your history**: *fastqsanger* output from *fqconvert* - **Fastq format**: *illumina1.8/Sanger* - **Ignore pair information for the extraction (-p)**: True - **Dataset from your history containing the list of read names to extract or exclude (-l)**: *fqduplicate* output from *fastqsanger* - **Extract or exclude reads in the list**: Extract listed reads - RUN *fqextract* To map the reads, use: * Category NGS: Mapping: Parallel Map with BWA for Illumina * Will you select a reference genome from your history or use a built-in index?: Use one from the history * You need the reference genome; warning: it can be a big file --- ## Upload a dataset from ftp (command line) (1/2) *Command line (Terminal) solution* * To establish the connection with the sftp server from your Linux or Mac computer: ```bash sftp -P 2222 yourlogin@galaxy.pasteur.fr ``` * Remark: if data are on tars, first connect to it (with your pasteurID password): ```bash ssh yourlogin@tars.pasteur.fr ``` .center[**REMARK: Use the Galaxy password for the sftp connection**] --- ## Upload a dataset from ftp (command line) (2/2) *Command line (Terminal) solution* * To upload a file to the sftp server: ```bash sftp>put
PathToYourFile ``` * To upload a directory to the sftp server: * On the sftp server, create a directory with the same name as the directory you wish to upload: ```bash sftp>mkdir mydirectory sftp>put -r mydirectory ``` --- ## Upload a dataset from ftp (Filezilla) *Filezilla solution*: You can use Filezilla to transfer data from your computer to your Galaxy transfer directory * Launch Filezilla * Open the site manager: Ctrl+S * Click on new site * host: galaxy.pasteur.fr * port: 2222 * Protocol: SFTP - SSH File Transfer Protocol * Logon Type: Ask for password (**Galaxy password**) * User: yourlogin --- ## Upload a dataset from ftp (Filezilla) *Filezilla solution*: You can use Filezilla to transfer data from your computer to your Galaxy transfer directory * Launch Filezilla * Open the site manager: Ctrl+S * Click on new site * host: galaxy.pasteur.fr * port: 2222 * Protocol: SFTP - SSH File Transfer Protocol * Logon Type: Ask for password (**Galaxy password**) * User: yourlogin * Drag and drop your data from the left window to the right window
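* Whichever method you use (Filezilla or the terminal), you can check that the file arrived before importing it into Galaxy. A minimal sketch, reusing the same sftp account as above:

```bash
# Connect with your Galaxy password, then list the content of your transfer directory
sftp -P 2222 yourlogin@galaxy.pasteur.fr
sftp> ls -l
sftp> exit
```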
--- ## Upload a dataset from ftp (in Galaxy) (1/2) * Then use the Galaxy web interface to transfer the data into your Galaxy library * Click on your library
* It is possible to create folders to organize your data
* To upload files you transferred to your linked directory (from User Directory):
--- ## Upload a dataset from ftp (in Galaxy) (2/2)
* Then select your data and click on the *to History* icon
--- ## Manage the datatype * The format can be set at upload, but if you forget or set a wrong format, it is possible to change it within Galaxy:
--- ## Data Annotation / Tag * Add tags or annotations to datasets or to histories * Tags can be searched across several histories
--- ## Hands On: Upload from ftp * If data are on Atlas: * In a shell terminal launch the following commands: ```bash ssh mylogin@central-bio.pasteur.fr # or tars.pasteur.fr, then the sftp commands (see slides 19-20) ``` * If data are on your computer, with Filezilla: * Launch Filezilla * Enter Host, Username, ... (see slide 26) * Drag and drop your data, NC_002929_Bpertussis.fasta * If data are on your computer, with the command line: * In a new shell terminal, launch the following commands: ```bash cd Pathofmydata # then the sftp commands (see slides 19-20) ``` --- ## Hands On: Upload from ftp and Launch BWA * When data are on the sftp server: * Go to Galaxy: * On the TAB *shared data/Data libraries* * Click on your library: mylogin@pasteur.fr * Click on add datasets * Upload option: Upload directory of files * Copy files into Galaxy * Upload to library * Check that your data is selected and import it to the current history * Go back to the TAB "*Analyse data*" to check your history * Launch the BWA tool with the imported genome as the reference genome --- class: center, middle, inverse ## PART 3 ## Workflow Construction --- layout: false ## Workflow construction * Create an empty workflow
--- ## Workflow construction
--- ## Workflow construction
--- ## Workflow annotation * Add tags or annotations for a whole workflow * But also for each step of the workflow (description or keywords to easily retrieve the concerned datasets)
--- ## Run your workflow
--- ## Export a dataset to ftp
* Data will be exported to the sftp server. * To copy it to your computer you can use Filezilla or the command line (Terminal): ```bash sftp -P 2222 yourlogin@galaxy.pasteur.fr sftp> get yourData ``` * You can delete data with the cross, and purge them with the wheel menu option *purge deleted datasets* --- ## Hands On: Create a workflow and run it * Create a workflow * Create workflow, give it a name * Search, link and set parameters for each tool * Click on the wheel, save, then RUN the workflow --- ## Hands On: Create a workflow and run it * Create a workflow * Create workflow, give it a name * Search, link and set parameters for each tool * Click on the wheel, save, then RUN the workflow * To run the workflow several times * Add the Input dataset box from the category Workflow control / Inputs * Save, then run * Click on the little file logo near the Input Dataset of Step 1 * Add the 2 input files and RUN. --- ## Hands On: Create a workflow and run it * Create a workflow * Create workflow, give it a name * Search, link and set parameters for each tool * Click on the wheel, save, then RUN the workflow * To run the workflow several times * Add the Input dataset box from the category Workflow control / Inputs * Save, then run * Click on the little file logo near the Input Dataset of Step 1 * Add the 2 input files and RUN. * To finish, delete data * Click on the cross of each file * Click on the history wheel and on include deleted datasets * Data are not really deleted * Click on the wheel and select purge deleted datasets, or on delete permanently to delete the complete history, or delete each dataset individually --- class: center, middle, inverse ## PART 4 ## Galaxy & Reproducibility --- layout: false ## Pages
* Building Pages
--- ## Pages * Examples of Pages: * https://galaxy.pasteur.fr/u/fmareuil/p/formation-galaxy
* https://galaxy.pasteur.fr/u/dcorreia/p/ngphylogenyfr-oneclick-workflows
--- ## Sharing Data with Galaxy * Click on the wheel menu and select *Share or Publish*
--- ## Sharing Workflows with Galaxy
--- class: center, middle, inverse ## PART 5 ## To go a little further --- layout: false ## Toolshed * Pasteur users can also access a Pasteur graphical tool repository: * https://toolshed.pasteur.fr/ * In addition to the official repositories: * http://toolshed.g2.bx.psu.edu/ and http://testtoolshed.g2.bx.psu.edu
--- ## Writing XMLs * 4 parts: * Description, ID and requirements * Command line * Descriptions of the arguments and command-line options * Documentation and functional tests * https://docs.galaxyproject.org/en/master/dev/schema.html --- ## Writing XMLs * Once the software is installed on the cluster, put the XML in a toolshed repository and, at the end, contact us to install it in the Galaxy instance
--- ## From Galaxy to the command line: * Galaxy provides an API to allow scripting and administration of all your tasks (a minimal curl sketch is given on the last slide) * https://docs.galaxyproject.org/en/master/api/quickstart.html * BioBlend is a Python library for interacting with Galaxy's API * https://bioblend.readthedocs.io/en/latest/ * https://c3bi-pasteur-fr.github.io/Galaxy_training_material/pasteur_bioblend/slides/bioblend_api#1 --- ##.center[Thank you for your attention] * Contact: * https://c3bi.pasteur.fr/ask/?ask=Ask+Short+Question *(this address may change but it will appear on the Galaxy Page)* * Useful Addresses: Galaxy at Institut Pasteur: * https://galaxy.pasteur.fr/ * https://toolshed.pasteur.fr/ * Use Galaxy: * https://galaxy.pasteur.fr/tours * http://wiki.galaxyproject.org/Learn * https://vimeo.com/galaxyproject/videos * Write Galaxy XMLs: * https://docs.galaxyproject.org/en/master/dev/schema.html
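--- ## Appendix: a first API call * A minimal sketch of a raw API call with curl; it assumes you have generated an API key in your Galaxy user preferences (YOUR_API_KEY below is a placeholder):

```bash
# List your histories as JSON; replace YOUR_API_KEY with the key from your user preferences
curl "https://galaxy.pasteur.fr/api/histories?key=YOUR_API_KEY"
```

* BioBlend wraps the same API in Python (see the links on the previous slide)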