
firehose_get v0.3.10 (released 2013_01_31)


To help simplify access to TCGA data and analysis results, we've introduced the firehose_get retrieval script. To use it, download the zip file from here, then perform these 2 steps from a Unix-compatible command line:

        unix%  unzip firehose_get<VERSION>.zip
        unix%  ./firehose_get

and follow the instructions (documentation excerpt below). If you are missing wget, please look here for links to pre-built versions for your system.
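
Before running the two steps above, it can help to confirm that the tools they rely on are installed. The following is a minimal sketch, not part of firehose_get itself: the helper name missing_tools is hypothetical, and the tool list (unzip and wget) is an assumption based only on the steps and the wget note above.

```shell
#!/bin/sh
# missing_tools TOOL...
# Print the name of each given tool that cannot be found on PATH.
# (Hypothetical helper for illustration; not shipped with firehose_get.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# ASSUMPTION: unzip and wget are the tools needed for the steps above.
missing=$(missing_tools unzip wget)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Please install before running firehose_get:" $missing
else
    echo "unzip and wget found"
fi
```

Once both tools are present, the -echo flag documented below offers a further dry run, since it shows the commands that would be run without executing them.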


Please note that downloading data from the Broad TCGA GDAC site constitutes agreement to this data usage policy.

 

 Documentation
firehose_get : retrieve open-access results of Broad Institute TCGA GDAC runs
Version: 0.3.11 (Author: Michael S. Noble)

Usage: firehose_get [flags]  RunType  Date  [disease_cohort, ... ]

Two arguments are required; the first must be one of

    analyses  awg_lgg  awg_luad  awg_pancan8
    awg_skcm  awg_thca  stddata

while the second must EITHER be a date (in YYYY_MM_DD form) of an
existing GDAC run of the given type OR 'latest'; use the -runs flag
to discern what RunType+Date combinations are available.  An optional
3rd, 4th etc argument may be specified to prune the retrieval, given
as a subset of these case-insensitive TCGA disease cohort names:

    BLCA  BRCA  CESC  COAD  COADREAD  DLBC  ESCA  GBM  HNSC  KICH
    KIRC  KIRP  LAML  LGG  LIHC  LUAD  LUSC  OV  PAAD  PANCANCER
    PANCAN8  PANCAN12  PRAD  READ  SARC  SKCM  STAD  THCA  UCEC

Note that as a convenience 'analysis' and 'data' are accepted as
synonyms for the 'analyses' and 'stddata' run types.

Flags:
  -b | -batch         do not prompt: assume YES answer to all queries
  -c | -cohorts       list available disease cohorts
  -e | -echo          show commands that would be run, but do nothing
  -h | -help | --help this message
  -l | -log           write output to log file, instead of stdout
  -p | -platforms     list data platforms available in Firehose runs
                      (not implemented yet)
  -r | -runs          list available Firehose runs
  -t | -tasks <list>  further prune the set of archives retrieved, by
                      INCLUDING only the tasks (pipelines) whose
                      names match the given space-delimited list of
                      patterns; matching is performed with glob-style
                      wildcards, and is case-insensitive; prepending
                      a tilde (i.e. ~) to a task name will cause it
                      to be EXCLUDED from download; when no pattern
                      list is given firehose_get will display all tasks in
                      the selected run.
                      NOTE: not all tasks will execute for all disease
                            cohorts; what tasks are run depends upon the
                            data available for that disease cohort
  -v                  display the version of firehose_get
  -x                  debugging: turn on bash set -x (warning: very verbose)

Broad GDAC website:   http://gdac.broadinstitute.org
Broad GDAC email  :   gdac@broadinstitute.org
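
The -tasks matching rules in the flag list above (glob-style wildcards, case-insensitive comparison, tilde prefix to exclude) can be illustrated with a short sketch. This is not the code used inside firehose_get. The function names lower and task_selected are hypothetical, the task names are made up, and the way include and exclude patterns combine is an assumption: a task is kept if it matches at least one include pattern (or no include pattern was given) and matches no exclude pattern.

```shell
#!/bin/sh
# lower STRING
# Print STRING in lower case, so comparisons ignore case.
lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# task_selected NAME [PATTERN...]
# Return 0 if NAME would be retrieved under the ASSUMED rule:
#   - any matching exclude pattern (leading ~) rejects NAME
#   - if include patterns exist, at least one must match NAME
task_selected() {
    name=$(lower "$1"); shift
    have_include=0
    included=0
    for pat in "$@"; do
        pat=$(lower "$pat")
        case "$pat" in
            "~"*)
                excl=${pat#"~"}          # strip the leading tilde
                # unquoted $excl is interpreted as a glob pattern
                case "$name" in $excl) return 1 ;; esac
                ;;
            *)
                have_include=1
                case "$name" in $pat) included=1 ;; esac
                ;;
        esac
    done
    [ "$have_include" -eq 0 ] || [ "$included" -eq 1 ]
}

# Made-up task names, for illustration only.
for task in Example_Mutation_Report Example_Clustering_CNMF \
            Example_Clustering_Consensus; do
    if task_selected "$task" '*clustering*' '~*consensus*'; then
        echo "$task"
    fi
done
```

With the patterns shown, only Example_Clustering_CNMF is printed: the first name matches no include pattern, and the last is rejected by the tilde-prefixed pattern.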
 Change Log
v0.3.10:    2013_01_31
   remove hardcoded disease names, in favor of downloading from Broad site
   enhanced firehose_get_scan output (as was done for -runs below)
v0.3.9:     2012_12_22
   use firehose_get_scan tool (on Broad servers), and download its output
   to client sites, to speed up discovery for -runs flag, etc
v0.3.8:     2012_11_16
   support potentially any AWG with generic awg_<disease> run type
v0.3.7:     2012_10_21
   support PANCAN8 analysis working group (AWG) runs
v0.3.6:     2012_09_20
   -runs considers ONLY those GDAC runs with ./data subdir
v0.3.5:     2012_09_12
    discontinue use of static run lists, in favor of dynamically querying GDAC
    site to display list of runs, what kinds of runs, etc
    support EXCLUDE in -tasks with tilde/~ prefix
v0.3.4:     2012_09_07
    tweak date regex to correctly detect October months
v0.3.3:     2012_07_12
    fix printf msg emitted when nothing downloaded
    employ --cache=off, so that most up-to-date run lists are always retrieved
v0.3.2:     2012_06_08
    added -b/-batch for headless use
    'latest' now translated to date prior to download
    be less compulsive when cleaning up
v0.3.1:     2012_05_02
    accept --version, too
    use tumor types to subset list of tasks returned, too
    warn user when subsetted runs return nothing for the given tumor(s)
v0.3.0:     2012_04_22
    tweak awkward wording of -tasks help
    allow --help, too
    -runs flag to display list of available runs
    -tasks flag to subset by glob-pattern matching against task names
 Copyright and Disclaimer
#===============================================================================
# This software and its documentation are copyright 2012 by the
# Broad Institute/Massachusetts Institute of Technology. All rights reserved.
#
# This software is supplied without any warranty or guaranteed support whatsoever.
# Neither the Broad Institute nor MIT can be responsible for its use, misuse, or
# functionality.
#===============================================================================