firehose_get   version 0.3.12 (released 2013_06_07)

To help simplify access to TCGA data and analysis results we've introduced the firehose_get retrieval script.  To use it, simply download the zip file from here, then perform these 2 steps from a Unix-compatible command line:

  unix%  unzip firehose_get<VERSION>.zip
  unix%  ./firehose_get

and follow the instructions (documentation excerpt below).   If you are missing wget, please look here for links to pre-built versions for your system, or search the web for one. Finally, rather than leaving firehose_get in the directory where you downloaded and unzipped it, put it somewhere along your $PATH, so that it can be found whenever you want to use it again, no matter which directory you are working in.
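
As a rough sketch of that last step, the shell functions below copy the script into a personal bin directory and check whether that directory is already on $PATH. The directory $HOME/bin and the function names install_script and path_contains are illustrative assumptions; they are not part of firehose_get.

```shell
# Hypothetical helpers (not shipped with firehose_get).

# Copy a script into a directory and mark it executable.
install_script() {
    local src="$1" dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/" || return 1
    chmod +x "$dest_dir/$(basename "$src")"
}

# Succeed if the given directory is already listed in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use, run from the directory where the zip file was unpacked:
#   install_script ./firehose_get "$HOME/bin"
#   path_contains "$HOME/bin" || export PATH="$HOME/bin:$PATH"
```

To make the $PATH change permanent, the export line would normally go into a shell startup file such as ~/.bashrc.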

Please note that downloading data from the Broad TCGA GDAC site constitutes agreement to this data usage policy.

  • firehose_get analyses latest
    Retrieves: every result, for every disease cohort, in the latest GDAC Firehose run

  • firehose_get -tasks mutsig gistic analyses latest brca ucec
    Retrieves: only Gistic and MutSig results for breast and uterine cancer

  • firehose_get -tasks mut analyses latest prad
    Retrieves: all results which have "mut" in their name, such as MutSig, Mutation_Assessor, and any correlations to mutation data

  • firehose_get -tasks rna clinical stddata 2013_05_23
    Retrieves: any data package with (case-insensitive) "rna" or "clinical" in its name, from the May 23, 2013 data run
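
The -tasks examples above can be illustrated with a small shell sketch of the selection rule. It is only our reading of the documented behavior (case-insensitive matching anywhere in the task name, glob wildcards allowed, a leading ~ meaning EXCLUDE) and is not taken from the firehose_get source; the function names lower and task_selected are hypothetical.

```shell
# Lowercase a string so that comparisons are case-insensitive.
lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# task_selected NAME PATTERN...
# Assumed rule: NAME is kept when it matches no ~exclude pattern and either
# matches at least one include pattern or no include pattern was given.
task_selected() {
    local name pat included=no have_include=no
    name=$(lower "$1"); shift
    for pat in "$@"; do
        pat=$(lower "$pat")
        case "$pat" in
            "~"*)
                # Exclusion: drop the leading tilde, then glob-match.
                case "$name" in *${pat#?}*) return 1 ;; esac ;;
            *)
                have_include=yes
                case "$name" in *$pat*) included=yes ;; esac ;;
        esac
    done
    [ "$have_include" = no ] || [ "$included" = yes ]
}

# Example: keep anything with "mut" in its name except Mutation_Assessor.
#   task_selected MutSig mut "~assessor"             (succeeds)
#   task_selected Mutation_Assessor mut "~assessor"  (fails)
```

Running firehose_get with the -echo flag is a safer way to see what a given pattern list would really select, since it shows the commands without executing them.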

%  firehose_get --help

firehose_get : retrieve open-access results of Broad Institute TCGA GDAC runs
Version: 0.3.12 (Author: Michael S. Noble)

Usage: firehose_get [flags]  RunType  Date  [disease_cohort, ... ]

Two arguments are required; the first must be one of

	analyses  awg_gbm  awg_hnsc  awg_lgg  
	awg_luad  awg_pancan8  awg_skcm  awg_stad  
	awg_thca  stddata  

while the second must EITHER be a date (in YYYY_MM_DD form) of an
existing GDAC run of the given type OR 'latest'; use the -runs flag
to discern what RunType+Date combinations are available.  An optional
3rd, 4th etc argument may be specified to prune the retrieval, given
as a subset of these case-insensitive TCGA disease cohort names:


(taken from

Note that as a convenience 'analysis' and 'data' are accepted as
synonyms for the 'analyses' and 'stddata' run types

  -b | -batch         do not prompt: assume YES answer to all queries
  -c | -cohorts       list available disease cohorts
  -e | -echo          show commands that would be run, but do nothing
  -h | -help | --help this message
  -l | -log           write output to log file, instead of stdout
  -p | -platforms     list data platforms available in Firehose runs
                      (not implemented yet)
  -r | -runs          list available Firehose runs
  -t | -tasks <list>  further prune the set of archives retrieved, by
                      INCLUDING only the tasks (pipelines) whose
                      names match the given space-delimited list of
                      patterns; matching is performed with glob-style
                      wildcards, and is case-insensitive; prepending
                      a tilde (i.e. ~) to a task name will cause it
                      to be EXCLUDED from download; when no pattern
                      list is given firehose_get will display all tasks in
                      the selected run.
                      NOTE: not all tasks will execute for all disease
                            cohorts; what tasks are run depends upon the
                            data available for that disease cohort
  -v                  display the version of firehose_get
  -x                  debugging: turn on bash set -x (warning: very verbose)

Broad GDAC website:
Broad GDAC email  :
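
The two required arguments described in the help text can be sanity-checked before a long download is started. The sketch below is a hedged illustration only: normalize_run_type and is_valid_date_arg are hypothetical names, and the checks cover just the form of the arguments (the documented synonyms, and YYYY_MM_DD or 'latest'). Whether a run of that type and date actually exists still has to be confirmed with the -runs flag.

```shell
# Map the documented synonyms onto the canonical run types.
normalize_run_type() {
    case "$1" in
        analysis) echo analyses ;;
        data)     echo stddata ;;
        *)        echo "$1" ;;
    esac
}

# Succeed if the argument is 'latest' or looks like a YYYY_MM_DD date
# with a plausible month (1-12) and day (1-31).
is_valid_date_arg() {
    case "$1" in
        latest) return 0 ;;
        [0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]) ;;
        *) return 1 ;;
    esac
    local month day
    month=${1#*_}; month=${month%_*}   # middle field
    day=${1##*_}                       # last field
    month=${month#0}; day=${day#0}     # drop one leading zero
    [ "$month" -ge 1 ] && [ "$month" -le 12 ] &&
        [ "$day" -ge 1 ] && [ "$day" -le 31 ]
}

# Example use in a wrapper script:
#   run_type=$(normalize_run_type "$1")
#   is_valid_date_arg "$2" || { echo "bad date: $2" >&2; exit 1; }
```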
 Change Log
v0.3.10:    2013_01_31
    remove hardcoded disease names, in favor of downloading from Broad site
    enhanced firehose_get_scan output (as was done for -runs below)
v0.3.9:     2012_12_22
    use firehose_get_scan tool (on Broad servers), and download its output
    to client sites, to speed up discovery for -runs flag, etc
v0.3.8:     2012_11_16
    support potentially any AWG with generic awg_<disease> run type
v0.3.7:     2012_10_21
    support PANCAN8 analysis working group (AWG) runs
v0.3.6:     2012_09_20
    -runs considers ONLY those GDAC runs with ./data subdir
v0.3.5:     2012_09_12
    discontinue use of static run lists, in favor of dynamically querying GDAC
    site to display list of runs, what kinds of runs, etc
    support EXCLUDE in -tasks with tilde/~ prefix
v0.3.4:     2012_09_07
    tweak date regex to correctly detect October months
v0.3.3:     2012_07_12
    fix printf msg emitted when nothing downloaded
    employ --cache=off, so that most up-to-date run lists are always retrieved
v0.3.2:     2012_06_08
    added -b/-batch for headless use
    'latest' now translated to date prior to download
    be less compulsive when cleaning up
v0.3.1:     2012_05_02
    accept --version, too
    use tumor types to subset list of tasks returned, too
    warn user when subsetted runs return nothing for the given tumor(s)
v0.3.0:     2012_04_22
    tweak awkward wording of -tasks help
    allow --help, too
    -runs flag to display list of available runs
    -tasks flag to subset by glob-pattern matching against task names
 Copyright and Disclaimer
# This software and its documentation are copyright 2012-2013 by the
# Broad Institute/Massachusetts Institute of Technology. All rights reserved.
# This software is supplied without any warranty or guaranteed support whatsoever.
# Neither the Broad Institute nor MIT can be responsible for its use, misuse, or
# functionality.