name: inverse layout: true class: center, middle, inverse --- layout: false background-image: url("../images/presentation.png")
DBC
From command line to web interface
Fabien Mareuil, Rémi Planel
--- name: plan ## Plan * Quick introduction to Galaxy, a workflow management software * Part 1 - **The web interface**: * Homepage, tools, history, tool interface * Data Library panel * Workflow building panel * Part 2 - **Managing data**: * Data import: * data import using 'upload file' * data import using 'shared data' * Job launching * Data annotation * Part 3 - **Workflow construction**: * NGS workflow * Workflow annotation * Part 4 - **Galaxy and Reproducibility** * Pages * Sharing Galaxy objects --- class: center, middle, inverse ## PART 1 ## Quick introduction to Galaxy --- layout: false ## Galaxy: a workflow management software * Galaxy is a web interface that gives scientists access to tools usually launched through the command line * **Basics**: * A Galaxy tool is a program linked to an XML file * The XML file gathers the options, types, inputs, outputs, formats and the tool command line * Galaxy interprets the XML to generate the web interface * **Advantages++**: * Deals with Big Data * Creates workflows (tool chains) * Enables saving and sharing workflows and data * Galaxy@Pasteur is available at **galaxy.pasteur.fr**: * It has been available to internal users since March 2013 and to external users since October 2016 * It is linked to the Institut Pasteur storage bay, enabling big data upload * We provide support for tool development and tool installation in Galaxy --- ## Galaxy presentation: Homepage, central panel
--- ## Galaxy presentation: Homepage, left panel
--- ## Galaxy presentation: Homepage, right panel
--- ## Data libraries management panel
--- ## Workflow construction panel
--- ## Hands on: It's your turn * **Connection to Galaxy**: * https://galaxy.pasteur.fr (linux/mac/windows) * if you do not have an account, register (menu User/Register) * email address (you will receive an email to activate your account): * use your Pasteur address * password: as you want * public name: as you want --- ## Hands on: It's your turn * **Connection to Galaxy**: * https://galaxy.pasteur.fr (linux/mac/windows) * if you do not have an account, register (menu User/Register) * email address (you will receive an email to activate your account): * use your Pasteur address * password: as you want * public name: as you want * **Exploration of the "Analyse Data" tab**: * Tools * search for your preferred tool and look at its different options * History * use the wheel icon in the history panel to create a new history and rename it * Central panel * load a tool (by clicking on it) and look at the central panel --- ## Hands on: It's your turn * **Connection to Galaxy**: * https://galaxy.pasteur.fr (linux/mac/windows) * if you do not have an account, register (menu User/Register) * email address (you will receive an email to activate your account): * use your Pasteur address * password: as you want * public name: as you want * **Exploration of the "Analyse Data" tab**: * Tools * search for your preferred tool and look at its different options * History * use the wheel icon in the history panel to create a new history and rename it * Central panel * load a tool (by clicking on it) and look at the central panel * **Exploration of 2 other tabs**: * Workflow * we will play with it later * Shared Data * find your library, it should be easy --- class: center, middle, inverse ## PART 2 ## Managing your data with Galaxy --- layout: false ## Upload a dataset
* Each dataset has a **FORMAT**, and **FORMATS** play a central role in Galaxy * **The format defines which programs can be executed on a selected dataset** --- layout: false ## Upload a dataset
* If the user knows the data format, they should set it at the upload step. --- ## Hands on: It's your turn * Go to Galaxy * Search for the category "Get Data" in the tool list * Select "upload file" --> import the two fastq files * The two fastq files should now be in the current history --- ## Launch a job
--- ## Hands on - Get familiar with Galaxy tools (1/2) The aim here is to **filter** duplicate reads from a fastq file (or 2): first to **detect** them, then to **extract** those reads. Finally we want you to **run a mapping** on a reference genome that we will provide. Select a tool in the tool list: * Category *NGS: QC and manipulation*, tool: *fqduplicate* - No input possible because it needs fastqsanger format and not fastq --- ## Hands on - Get familiar with Galaxy tools (1/2) The aim here is to **filter** duplicate reads from a fastq file (or 2): first to **detect** them, then to **extract** those reads. Finally we want you to **run a mapping** on a reference genome that we will provide. Select a tool in the tool list: * Category *NGS: QC and manipulation*, tool: *fqduplicate* - No input possible because it needs fastqsanger format and not fastq * Solution: Choose the *fqconvert* tool in the category *Convert Formats* - RUN *fqconvert* on one of the fastq input files --- ## Hands on - Get familiar with Galaxy tools (1/2) The aim here is to **filter** duplicate reads from a fastq file (or 2): first to **detect** them, then to **extract** those reads. Finally we want you to **run a mapping** on a reference genome that we will provide. Select a tool in the tool list: * Category *NGS: QC and manipulation*, tool: *fqduplicate* - No input possible because it needs fastqsanger format and not fastq * Solution: Choose the *fqconvert* tool in the category *Convert Formats* - RUN *fqconvert* on one of the fastq input files * Go back to *fqduplicate* - RUN with the *single read* parameter --- ## Hands on - Get familiar with Galaxy tools (2/2) To extract the duplicate reads, use: * *fqextract* - parameter settings - **Type of input data**: single read - **Original *fastq* dataset from your history**: *fastqsanger* output from *fqconvert* - **Fastq format**: *illumina1.8/Sanger* - **Ignore pair information for the extraction (-p)**: True - **Dataset from your history containing the list of read names to extract or exclude (-l)**: *fqduplicate* output from *fastqsanger* - **Extract or exclude reads in the list**: Extract listed reads - RUN *fqextract* --- ## Hands on - Get familiar with Galaxy tools (2/2) To extract the duplicate reads, use: * *fqextract* - parameter settings - **Type of input data**: single read - **Original *fastq* dataset from your history**: *fastqsanger* output from *fqconvert* - **Fastq format**: *illumina1.8/Sanger* - **Ignore pair information for the extraction (-p)**: True - **Dataset from your history containing the list of read names to extract or exclude (-l)**: *fqduplicate* output from *fastqsanger* - **Extract or exclude reads in the list**: Extract listed reads - RUN *fqextract* To map the reads, use: * Category NGS: Mapping: Parallel Map with BWA for Illumina * Will you select a reference genome from your history or use a built-in index?: Use one from the history * You need the reference genome; warning: it can be a big file --- ## Upload a dataset from ftp (command line) (1/2) *Command line (Terminal) solution* * To establish the connection with the sftp server from your Linux or Mac computer: ```bash sftp -P 2222 yourlogin@galaxy.pasteur.fr ``` * Remark: if data are on tars, first connect to it (with your pasteurID password): ```bash ssh yourlogin@tars.pasteur.fr ``` .center[**REMARK: Use the Galaxy password for the sftp connection**] --- ## Upload a dataset from ftp (command line) (2/2) *Command line (Terminal) solution* * To upload a file to the sftp server: ```bash sftp>put
PathToYourFile ``` * To upload a directory to the sftp server: * On the sftp server, create a directory with the same name as the directory you wish to upload: ```bash sftp>mkdir mydirectory sftp>put -r mydirectory ``` --- ## Upload a dataset from ftp (Filezilla) *Filezilla solution*: You can use Filezilla to transfer data from your computer to your Galaxy transfer directory * Launch Filezilla * Open the site manager: Ctrl+S * Click on new site * host: galaxy.pasteur.fr * port: 2222 * Protocol: SFTP - SSH File Transfer Protocol * Logon Type: Ask for password (**Galaxy password**) * User: yourlogin --- ## Upload a dataset from ftp (Filezilla) *Filezilla solution*: You can use Filezilla to transfer data from your computer to your Galaxy transfer directory * Launch Filezilla * Open the site manager: Ctrl+S * Click on new site * host: galaxy.pasteur.fr * port: 2222 * Protocol: SFTP - SSH File Transfer Protocol * Logon Type: Ask for password (**Galaxy password**) * User: yourlogin * Drag and drop your data from the left window to the right window
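* Whichever method you use (Filezilla or the terminal), you can check that the file arrived before importing it into Galaxy. A minimal sketch, reusing the same sftp account as above:

```bash
# Connect with your Galaxy password, then list the content of your transfer directory
sftp -P 2222 yourlogin@galaxy.pasteur.fr
sftp> ls -l
sftp> exit
```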
--- ## Upload a dataset from ftp (in Galaxy) (1/2) * Then use the Galaxy web interface to transfer the data into your Galaxy library * Click on your library
* It is possible to create folders to organize your data
* To upload files you transferred to your linked directory (from User Directory):
--- ## Upload a dataset from ftp (in Galaxy) (2/2)
* Then select your data and click on the *to History* icon
--- ## Manage the datatype * The format can be set at upload, but if you forget or set a wrong format, it is possible to change it within Galaxy:
--- ## Data Annotation / Tag * Add tags or annotations to datasets or to histories * Tags can be searched across several histories
--- ## Hands On: Upload from ftp * If data are on Atlas: * In a shell terminal launch the following commands: ```bash ssh mylogin@central-bio.pasteur.fr # or tars.pasteur.fr, then the sftp commands (see slides 19-20) ``` * If data are on your computer, with Filezilla: * Launch Filezilla * Enter Host, Username, ... (see slide 26) * Drag and drop your data, NC_002929_Bpertussis.fasta * If data are on your computer, with the command line: * In a new shell terminal, launch the following commands: ```bash cd Pathofmydata # then the sftp commands (see slides 19-20) ``` --- ## Hands On: Upload from ftp and Launch BWA * When data are on the sftp server: * Go to Galaxy: * On the TAB *shared data/Data libraries* * Click on your library: mylogin@pasteur.fr * Click on add datasets * Upload option: Upload directory of files * Copy files into Galaxy * Upload to library * Check that your data is selected and import it to the current history * Go back to the TAB "*Analyse data*" to check your history * Launch the BWA tool with the imported genome as the reference genome --- class: center, middle, inverse ## PART 3 ## Workflow Construction --- layout: false ## Workflow construction * Create an empty workflow
--- ## Workflow construction
--- ## Workflow construction
--- ## Workflow annotation * Add tags or annotations for a whole workflow * But also for each step of the workflow (description or keywords to easily retrieve the concerned datasets)
--- ## Run your workflow
--- ## Export a dataset to ftp
* Data will be exported to the sftp server. * To copy it to your computer you can use Filezilla or the command line (Terminal): ```bash sftp -P 2222 yourlogin@galaxy.pasteur.fr sftp> get yourData ``` * You can delete data with the cross, and purge them with the wheel menu option *purge deleted datasets* --- ## Hands On: Create a workflow and run it * Create a workflow * Create workflow, give it a name * Search, link and set parameters for each tool * Click on the wheel, save, then RUN the workflow --- ## Hands On: Create a workflow and run it * Create a workflow * Create workflow, give it a name * Search, link and set parameters for each tool * Click on the wheel, save, then RUN the workflow * To run the workflow several times * Add the Input dataset box from the category Workflow control / Inputs * Save, then run * Click on the little file logo near the Input Dataset of Step 1 * Add the 2 input files and RUN. --- ## Hands On: Create a workflow and run it * Create a workflow * Create workflow, give it a name * Search, link and set parameters for each tool * Click on the wheel, save, then RUN the workflow * To run the workflow several times * Add the Input dataset box from the category Workflow control / Inputs * Save, then run * Click on the little file logo near the Input Dataset of Step 1 * Add the 2 input files and RUN. * To finish, delete data * Click on the cross of each file * Click on the history wheel and on include deleted datasets * Data are not really deleted * Click on the wheel and select purge deleted datasets, or on delete permanently to delete the complete history, or delete each dataset individually --- class: center, middle, inverse ## PART 4 ## Galaxy & Reproducibility --- layout: false ## Pages
* Building Pages
--- ## Pages * Examples of Pages: * https://galaxy.pasteur.fr/u/fmareuil/p/formation-galaxy
* https://galaxy.pasteur.fr/u/dcorreia/p/ngphylogenyfr-oneclick-workflows
--- ## Sharing Data with Galaxy * Click on the wheel menu and select *Share or Publish*
--- ## Sharing Workflows with Galaxy
--- class: center, middle, inverse ## PART 5 ## To go a little further --- layout: false ## Toolshed * Pasteur users can also access a Pasteur graphical tool repository: * https://toolshed.pasteur.fr/ * In addition to the official repositories: * http://toolshed.g2.bx.psu.edu/ and http://testtoolshed.g2.bx.psu.edu
--- ## Writing XMLs * 4 parts: * Description, ID and requirements * Command line * Descriptions of the arguments and command-line options * Documentation and functional tests * https://docs.galaxyproject.org/en/master/dev/schema.html --- ## Writing XMLs * Once the software is installed on the cluster, put the XML in a toolshed repository and, at the end, contact us to install it in the Galaxy instance
--- ## From Galaxy to the command line: * Galaxy provides an API to allow scripting and administration of all your tasks (a minimal curl sketch is given on the last slide) * https://docs.galaxyproject.org/en/master/api/quickstart.html * BioBlend is a Python library for interacting with Galaxy's API * https://bioblend.readthedocs.io/en/latest/ * https://c3bi-pasteur-fr.github.io/Galaxy_training_material/pasteur_bioblend/slides/bioblend_api#1 --- ##.center[Thank you for your attention] * Contact: * https://c3bi.pasteur.fr/ask/?ask=Ask+Short+Question *(this address may change but it will appear on the Galaxy Page)* * Useful Addresses: Galaxy at Institut Pasteur: * https://galaxy.pasteur.fr/ * https://toolshed.pasteur.fr/ * Use Galaxy: * https://galaxy.pasteur.fr/tours * http://wiki.galaxyproject.org/Learn * https://vimeo.com/galaxyproject/videos * Write Galaxy XMLs: * https://docs.galaxyproject.org/en/master/dev/schema.html
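--- ## Appendix: a first API call * A minimal sketch of a raw API call with curl; it assumes you have generated an API key in your Galaxy user preferences (YOUR_API_KEY below is a placeholder):

```bash
# List your histories as JSON; replace YOUR_API_KEY with the key from your user preferences
curl "https://galaxy.pasteur.fr/api/histories?key=YOUR_API_KEY"
```

* BioBlend wraps the same API in Python (see the links on the previous slide)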