firehose_get v0.3.12 (released 2013_06_07)

To help simplify access to TCGA data and analysis results, we've introduced the firehose_get retrieval script. To use it, download the zip file from here, then perform these 2 steps from a Unix-compatible command line

    unix%  unzip firehose_get<VERSION>.zip
    unix%  ./firehose_get

and follow the instructions (documentation excerpt below). If you are missing wget, please look here for links to pre-built versions for your system.
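
Because the script depends on external tools such as unzip (for the archive) and wget (for retrieval), it can help to confirm both are on your PATH before starting. The following is a minimal sketch, not part of firehose_get itself; the helper name missing_tools is a hypothetical one chosen for illustration.

```shell
#!/bin/sh
# Sketch only: missing_tools is a hypothetical helper, not part of firehose_get.

# Print the name of each requested tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the two tools mentioned in the instructions above.
missing=$(missing_tools unzip wget)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on one line.
    echo "Install these before running firehose_get:" $missing
else
    echo "unzip and wget are available"
fi
```

If the check reports wget as missing, install a pre-built version for your system before running ./firehose_get.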

Please note that downloading data from the Broad TCGA GDAC site constitutes agreement to this data usage policy.


firehose_get : retrieve open-access results of Broad Institute TCGA GDAC runs
Version: 0.3.11 (Author: Michael S. Noble)

Usage: firehose_get [flags]  RunType  Date  [disease_cohort, ... ]

Two arguments are required; the first must be one of

    analyses  awg_lgg  awg_luad  awg_pancan8
    awg_skcm  awg_thca  stddata

while the second must EITHER be a date (in YYYY_MM_DD form) of an
existing GDAC run of the given type OR 'latest'; use the -runs flag
to discern what RunType+Date combinations are available.  An optional
3rd, 4th etc argument may be specified to prune the retrieval, given
as a subset of these case-insensitive TCGA disease cohort names:


Note that, as a convenience, 'analysis' and 'data' are accepted as
synonyms for the 'analyses' and 'stddata' run types.
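
The argument rules above (run type synonyms, and a Date that is either 'latest' or in YYYY_MM_DD form) can be illustrated with a small shell sketch. This is not the code firehose_get uses; normalize_runtype and is_valid_date_arg are hypothetical helper names, and the date pattern checks only the shape of the value, not whether a GDAC run exists for it.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and are not taken from firehose_get.

# Map the documented synonyms onto the run type names; pass others through.
normalize_runtype() {
    case "$1" in
        analysis) echo analyses ;;
        data)     echo stddata ;;
        *)        echo "$1" ;;
    esac
}

# Succeed if the argument is 'latest' or has YYYY_MM_DD shape
# (month 01-12, day 01-31). No calendar check is made, and no check
# that a run exists for that date; use the -runs flag for that.
is_valid_date_arg() {
    if [ "$1" = "latest" ]; then
        return 0
    fi
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}_(0[1-9]|1[0-2])_(0[1-9]|[12][0-9]|3[01])$'
}

# Example: validate a pair of arguments before building a command line.
runtype=$(normalize_runtype data)
if is_valid_date_arg 2013_01_31; then
    echo "arguments look well-formed: $runtype 2013_01_31"
fi
```

A check like this only catches malformed input early; the -runs flag remains the way to discover which RunType+Date combinations are actually available.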

  -b | -batch         do not prompt: assume YES answer to all queries
  -c | -cohorts       list available disease cohorts
  -e | -echo          show commands that would be run, but do nothing
  -h | -help | --help this message
  -l | -log           write output to log file, instead of stdout
  -p | -platforms     list data platforms available in Firehose runs
                      (not implemented yet)
  -r | -runs          list available Firehose runs
  -t | -tasks <list>  further prune the set of archives retrieved, by
                      INCLUDING only the tasks (pipelines) whose
                      names match the given space-delimited list of
                      patterns; matching is performed with glob-style
                      wildcards, and is case-insensitive; prepending
                      a tilde (i.e. ~) to a task name will cause it
                      to be EXCLUDED from download; when no pattern
                      list is given firehose_get will display all tasks in
                      the selected run.
                      NOTE: not all tasks will execute for all disease
                            cohorts; what tasks are run depends upon the
                            data available for that disease cohort
  -v                  display the version of firehose_get
  -x                  debugging: turn on bash set -x (warning: very verbose)

Broad GDAC website:
Broad GDAC email  :
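
The -tasks behavior described above (case-insensitive glob matching, with a leading tilde marking a pattern as an exclusion) can be sketched in shell as follows. This is an illustration under stated assumptions, not the implementation in firehose_get: task_selected and to_lower are hypothetical helpers, the task names are made up, and the sketch assumes that an exclusion always wins and that a list containing only exclusions keeps every task that is not excluded.

```shell
#!/bin/sh
# Sketch only: task_selected and to_lower are hypothetical helpers,
# not code from firehose_get. Assumed semantics: a task is kept when it
# matches no ~exclude pattern and either matches an include pattern or
# no include patterns were given.

to_lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# usage: task_selected TASK_NAME PATTERN...
# The body runs in a subshell, so its variables stay local.
task_selected() (
    task=$(to_lower "$1")
    shift
    have_include=0
    included=0
    for raw in "$@"; do
        pat=$(to_lower "$raw")
        case "$pat" in
            "~"*)
                pat=${pat#?}    # drop the leading tilde
                # $pat is unquoted on purpose so it acts as a glob pattern
                case "$task" in $pat) exit 1 ;; esac
                ;;
            *)
                have_include=1
                case "$task" in $pat) included=1 ;; esac
                ;;
        esac
    done
    [ "$have_include" -eq 0 ] || [ "$included" -eq 1 ]
)

# Example with made-up task names: keep tasks starting with "example_c",
# except any whose name contains "copynumber".
for t in Example_Clustering Example_Mutation Example_CopyNumber; do
    if task_selected "$t" "example_c*" "~*copynumber*"; then
        echo "download: $t"
    else
        echo "skip: $t"
    fi
done
```

Lower-casing both the task name and the pattern before matching is one simple way to get case-insensitive globbing in portable shell.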
 Change Log
v0.3.10:    2013_01_31
    remove hardcoded disease names, in favor of downloading from Broad site
    enhanced firehose_get_scan output (as was done for -runs below)
v0.3.9:     2012_12_22
    use firehose_get_scan tool (on Broad servers), and download its output
    to client sites, to speed up discovery for -runs flag, etc
v0.3.8:     2012_11_16
    support potentially any AWG with generic awg_<disease> run type
v0.3.7:     2012_10_21
    support PANCAN8 analysis working group (AWG) runs
v0.3.6:     2012_09_20
    -runs considers ONLY those GDAC runs with ./data subdir
v0.3.5:     2012_09_12
    discontinue use of static run lists, in favor of dynamically querying GDAC
    site to display list of runs, what kinds of runs, etc
    support EXCLUDE in -tasks with tilde/~ prefix
v0.3.4:     2012_09_07
    tweak date regex to correctly detect October months
v0.3.3:     2012_07_12
    fix printf msg emitted when nothing downloaded
    employ --cache=off, so that most up-to-date run lists are always retrieved
v0.3.2:     2012_06_08
    added -b/-batch for headless use
    'latest' now translated to date prior to download
    be less compulsive when cleaning up
v0.3.1:     2012_05_02
    accept --version, too
    use tumor types to subset list of tasks returned, too
    warn user when subsetted runs return nothing for the given tumor(s)
v0.3.0:     2012_04_22
    tweak awkward wording of -tasks help
    allow --help, too
    -runs flag to display list of available runs
    -tasks flag to subset by glob-pattern matching against task names
 Copyright and Disclaimer
# This software and its documentation are copyright 2012-2013 by the
# Broad Institute/Massachusetts Institute of Technology. All rights reserved.
# This software is supplied without any warranty or guaranteed support whatsoever.
# Neither the Broad Institute nor MIT can be responsible for its use, misuse, or
# functionality.