.. _manual-main:

=========
Snakemake
=========

.. image:: file:///usr/share/doc/snakemake/html/_svg/install_with-bioconda-brightgreen.svg
    :target: https://bioconda.github.io/recipes/snakemake/README.html

.. image:: file:///usr/share/doc/snakemake/html/_svg/pyversions_snakemake.svg
    :target: https://www.python.org

.. image:: file:///usr/share/doc/snakemake/html/_svg/snakemake.svg
    :target: https://pypi.python.org/pypi/snakemake

.. image:: file:///usr/share/doc/snakemake/html/_svg/status.svg
       :target: https://quay.io/repository/snakemake/snakemake

.. image:: file:///usr/share/doc/snakemake/html/_svg/stack-overflow-orange.svg
    :target: http://stackoverflow.com/questions/tagged/snakemake

.. image:: file:///usr/share/doc/snakemake/html/_svg/johanneskoester_follow.svg
    :target: https://twitter.com/johanneskoester

The Snakemake workflow management system is a tool to create **reproducible and scalable** data analyses.
Workflows are described via a human readable, Python based language.
They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition.
Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.


.. _manual-quick_example:

-------------
Quick Example
-------------

Snakemake workflows are essentially Python scripts extended by declarative code to define **rules**.
Rules describe how to create **output files** from **input files**.

.. code-block:: python

    rule targets:
        input:
            "plots/dataset1.pdf",
            "plots/dataset2.pdf"

    rule plot:
        input:
            "raw/{dataset}.csv"
        output:
            "plots/{dataset}.pdf"
        shell:
            "somecommand {input} {output}"


* Similar to GNU Make, you specify targets in terms of a pseudo-rule at the top.
* For each target and intermediate file, you create rules that define how they are created from input files.
* Snakemake determines the rule dependencies by matching file names.
* Input and output files can contain multiple named wildcards.
* Rules can either use shell commands, plain Python code or external Python or R scripts to create output files from input files.
* Snakemake workflows can be easily executed on **workstations**, **clusters**, **the grid**, and **in the cloud** without modification. The job scheduling can be constrained by arbitrary resources like e.g. available CPU cores, memory or GPUs.
* Snakemake can automatically deploy required software dependencies of a workflow using `Conda <https://conda.io>`_ or `Singularity <http://singularity.lbl.gov/>`_.
* Snakemake can use Amazon S3, Google Storage, Dropbox, FTP, WebDAV and SFTP to access input or output files and further access input files via HTTP and HTTPS.

.. _main-getting-started:

---------------
Getting started
---------------

To get started, consider the :ref:`tutorial`, the `introductory slides <http://slides.com/johanneskoester/snakemake-tutorial-2016>`_, and the :ref:`FAQ <project_info-faq>`.

.. _main-support:

-------
Support
-------

* First, check the :ref:`FAQ <project_info-faq>`.
* In case of questions, please post on `stack overflow <http://stackoverflow.com/questions/tagged/snakemake>`_.
* To discuss with other Snakemake users, you can use the `mailing list <https://groups.google.com/forum/#!forum/snakemake>`_. **Please do not post questions there. Use stack overflow for questions.**
* For bugs and feature requests, please use the `issue tracker <https://bitbucket.org/snakemake/snakemake/issues>`_.
* For contributions, visit Snakemake on `bitbucket <https://bitbucket.org/snakemake/snakemake>`_ and read the :ref:`guidelines <project_info-contributing>`.

--------
Citation
--------

`Köster, Johannes and Rahmann, Sven. "Snakemake - A scalable bioinformatics workflow engine". Bioinformatics 2012. <http://bioinformatics.oxfordjournals.org/content/28/19/2520>`_

See :doc:`Citations <project_info/citations>` for more information.

---------
Resources
---------

`Snakemake Wrappers Repository <https://snakemake-wrappers.readthedocs.org>`_
    The Snakemake Wrapper Repository is a collection of reusable wrappers that allow to quickly use popular tools from Snakemake rules and workflows.

`Snakemake Workflows Project <https://github.com/snakemake-workflows/docs>`_
    This project provides a collection of high quality modularized and re-usable workflows.
    The provided code should also serve as a best-practices of how to build production ready workflows with Snakemake.
    Everybody is invited to contribute.

`Snakemake Profiles Project <https://github.com/snakemake-profiles/doc`_
    This project provides Snakemake configuration profiles for various execution environments.
    Please consider contributing your own if it is still missing.

`Bioconda <https://bioconda.github.io/>`_
    Bioconda can be used from Snakemake for creating completely reproducible workflows by defining the used software versions and providing binaries.


.. project_info-publications_using:

----------------------------
Publications using Snakemake
----------------------------

In the following you find an incomplete list of publications making use of Snakemake for their analyses.
Please consider to add your own.

* Uhlitz et al. 2017. `An immediate–late gene expression module decodes ERK signal duration <http://msb.embopress.org/content/13/5/928>`_. Molecular Systems Biology.
* Akkouche et al. 2017. `Piwi Is Required during Drosophila Embryogenesis to License Dual-Strand piRNA Clusters for Transposon Repression in Adult Ovaries <http://www.sciencedirect.com/science/article/pii/S1097276517302071>`_. Molecular Cell.
* Beatty et al. 2017. `Giardia duodenalis induces pathogenic dysbiosis of human intestinal microbiota biofilms <>`_. International Journal for Parasitology.
* Meyer et al. 2017. `Differential Gene Expression in the Human Brain Is Associated with Conserved, but Not Accelerated, Noncoding Sequences <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5400397/>`_. Molecular Biology and Evolution.
* Lonardo et al. 2017. `Priming of soil organic matter: Chemical structure of added compounds is more important than the energy content <http://www.sciencedirect.com/science/article/pii/S0038071716304539>`_. Soil Biology and Biochemistry.
* Beisser et al. 2017. `Comprehensive transcriptome analysis provides new insights into nutritional strategies and phylogenetic relationships of chrysophytes <https://peerj.com/articles/2832/>`_. PeerJ.
* Dimitrov et al 2017. `Successive DNA extractions improve characterization of soil microbial communities <https://peerj.com/articles/2915/>`_. PeerJ.
* de Bourcy et al. 2016. `Phylogenetic analysis of the human antibody repertoire reveals quantitative signatures of immune senescence and aging <http://www.pnas.org/content/114/5/1105.short>`_. PNAS.
* Bray et al. 2016. `Near-optimal probabilistic RNA-seq quantification<http://www.nature.com/nbt/journal/v34/n5/abs/nbt.3519.html>`_. Nature Biotechnology.
* Etournay et al. 2016. `TissueMiner: a multiscale analysis toolkit to quantify how cellular processes create tissue dynamics <https://elifesciences.org/content/5/e14334>`_. eLife Sciences.
* Townsend et al. 2016. `The Public Repository of Xenografts Enables Discovery and Randomized Phase II-like Trials in Mice <http://www.cell.com/cancer-cell/abstract/S1535-6108%2816%2930090-3>`_. Cancer Cell.
* Burrows et al. 2016. `Genetic Variation, Not Cell Type of Origin, Underlies the Majority of Identifiable Regulatory Differences in iPSCs <http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005793>`_. PLOS Genetics.
* Ziller et al. 2015. `Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing <http://www.nature.com/nmeth/journal/v12/n3/full/nmeth.3152.html>`_. Nature Methods.
* Li et al. 2015. `Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0843-6>`_. Genome Biology.
* Schmied et al. 2015. `An automated workflow for parallel processing of large multiview SPIM recordings <http://bioinformatics.oxfordjournals.org/content/32/7/1112>`_. Bioinformatics.
* Chung et al. 2015. `Whole-Genome Sequencing and Integrative Genomic Analysis Approach on Two 22q11.2 Deletion Syndrome Family Trios for Genotype to Phenotype Correlations <http://onlinelibrary.wiley.com/doi/10.1002/humu.22814/full>`_. Human Mutation.
* Kim et al. 2015. `TUT7 controls the fate of precursor microRNAs by using three different uridylation mechanisms <http://emboj.embopress.org/content/34/13/1801.long>`_. The EMBO Journal.
* Park et al. 2015. `Ebola Virus Epidemiology, Transmission, and Evolution during Seven Months in Sierra Leone <http://doi.org/10.1016/j.cell.2015.06.007>`_. Cell.
* Břinda et al. 2015. `RNF: a general framework to evaluate NGS read mappers <http://bioinformatics.oxfordjournals.org/content/early/2015/09/30/bioinformatics.btv524>`_. Bioinformatics.
* Břinda et al. 2015. `Spaced seeds improve k-mer-based metagenomic classification <http://bioinformatics.oxfordjournals.org/content/early/2015/08/10/bioinformatics.btv419>`_. Bioinformatics.
* Spjuth et al. 2015. `Experiences with workflows for automating data-intensive bioinformatics <http://www.biologydirect.com/content/10/1/43>`_. Biology Direct.
* Schramm et al. 2015. `Mutational dynamics between primary and relapse neuroblastomas <http://www.nature.com/ng/journal/v47/n8/full/ng.3349.html>`_. Nature Genetics.
* Berulava et al. 2015. `N6-Adenosine Methylation in MiRNAs <http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118438>`_. PLOS ONE.
* The Genome of the Netherlands Consortium 2014. `Whole-genome sequence variation, population structure and demographic history of the Dutch population <http://www.nature.com/ng/journal/v46/n8/full/ng.3021.html>`_. Nature Genetics.
*  Patterson et al. 2014. `WhatsHap: Haplotype Assembly for Future-Generation Sequencing Reads <http://online.liebertpub.com/doi/10.1089/cmb.2014.0157>`_. Journal of Computational Biology.
* Fernández et al. 2014. `H3K4me1 marks DNA regions hypomethylated during aging in human stem and differentiated cells <http://genome.cshlp.org/content/25/1/27.long>`_. Genome Research.
* Köster et al. 2014. `Massively parallel read mapping on GPUs with the q-group index and PEANUT <https://peerj.com/articles/606/>`_. PeerJ.
* Chang et al. 2014. `TAIL-seq: Genome-wide Determination of Poly(A) Tail Length and 3′ End Modifications <http://www.cell.com/molecular-cell/abstract/S1097-2765(14)00121-X>`_. Molecular Cell.
* Althoff et al. 2013. `MiR-137 functions as a tumor suppressor in neuroblastoma by downregulating KDM1A <http://onlinelibrary.wiley.com/doi/10.1002/ijc.28091/abstract;jsessionid=33613A834E2A2FDCCA49246C23DF777E.f04t02>`_. International Journal of Cancer.
* Marschall et al. 2013. `MATE-CLEVER: Mendelian-Inheritance-Aware Discovery and Genotyping of Midsize and Long Indels <http://bioinformatics.oxfordjournals.org/content/29/24/3143.long>`_. Bioinformatics.
* Rahmann et al. 2013. `Identifying transcriptional miRNA biomarkers by integrating high-throughput sequencing and real-time PCR data <http://www.sciencedirect.com/science/article/pii/S1046202312002605>`_. Methods.
* Martin et al. 2013. `Exome sequencing identifies recurrent somatic mutations in EIF1AX and SF3B1 in uveal melanoma with disomy 3 <http://www.nature.com/ng/journal/v45/n8/full/ng.2674.html>`_. Nature Genetics.
* Czeschik et al. 2013. `Clinical and mutation data in 12 patients with the clinical diagnosis of Nager syndrome <http://link.springer.com/article/10.1007%2Fs00439-013-1295-2>`_. Human Genetics.
* Marschall et al. 2012. `CLEVER: Clique-Enumerating Variant Finder <http://bioinformatics.oxfordjournals.org/content/28/22/2875.long>`_. Bioinformatics.


.. toctree::
   :caption: Getting started
   :name: getting_started
   :hidden:
   :maxdepth: 1

   getting_started/installation
   getting_started/examples
   tutorial/tutorial


.. toctree::
  :caption: Executing workflows
  :name: execution
  :hidden:
  :maxdepth: 1

  executable

.. toctree::
    :caption: Defining workflows
    :name: snakefiles
    :hidden:
    :maxdepth: 1

    snakefiles/writing_snakefiles
    snakefiles/rules
    snakefiles/configuration
    snakefiles/modularization
    snakefiles/remote_files
    snakefiles/utils
    snakefiles/deployment


.. toctree::
    :caption: API Reference
    :name: api-reference
    :hidden:
    :maxdepth: 1

    api_reference/snakemake
    api_reference/snakemake_utils


.. toctree::
    :caption: Project Info
    :name: project-info
    :hidden:
    :maxdepth: 1

    project_info/citations
    project_info/more_resources
    project_info/faq
    project_info/contributing
    project_info/authors
    project_info/history
    project_info/license
