This is the supplemental Confluence page for the Broad GDAC Firehose Pipeline. Please visit gdac.broadinstitute.org for our main public site.
Born of the desire to systematize analyses from The Cancer Genome Atlas pilot and scale their execution to the dozens of remaining diseases to be studied, Firehose now sits atop ~40 terabytes of TCGA data and reliably executes more than 6000 pipelines per month.
The Broad Institute TCGA GDAC Firehose Provides
Version-stamped, standardized datasets
Precursor to automated analyses: aggregates all available sample batches into a single, uniformly formatted bolus (one per disease × datatype), which can be fed directly to analysis algorithms without further data preparation
Version-stamped packages of standard scientific analysis results
Automatically generated for dozens of algorithms: GISTIC, MutSig, Clustering, Correlation, ...
Version-stamped, biologist-friendly reports
Encapsulating analysis results in a form accessible to a wide audience, online for public browsing, and citable in the literature through DOIs
Version-stamped custom runs for TCGA analysis working groups
Performed by request in support of TCGA marker paper analysis, on a much shorter timescale than the monthly data runs and quarterly analysis runs.
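The batch aggregation behind the standardized datasets can be pictured with a minimal sketch. This is an illustration only, not Firehose's actual code or file format: the function name `aggregate_batches`, the record fields (`disease`, `datatype`, `samples`), and the sample identifiers are all hypothetical.

```python
from collections import defaultdict


def aggregate_batches(batches):
    """Merge per-batch sample lists into one list per (disease, datatype).

    Hypothetical record layout: each batch is a dict with the keys
    "disease", "datatype", and "samples" (a list of sample IDs).
    """
    merged = defaultdict(list)
    for batch in batches:
        key = (batch["disease"], batch["datatype"])
        merged[key].extend(batch["samples"])
    # Sort each merged list so the output order does not depend on
    # the order in which batches arrived.
    return {key: sorted(samples) for key, samples in merged.items()}


# Made-up example input: two batches for one disease/datatype pair,
# one batch for another.
batches = [
    {"disease": "BRCA", "datatype": "mutation", "samples": ["S3", "S1"]},
    {"disease": "BRCA", "datatype": "mutation", "samples": ["S2"]},
    {"disease": "GBM", "datatype": "copy_number", "samples": ["S9"]},
]
result = aggregate_batches(batches)
```

Keying on the (disease, datatype) pair mirrors the "one per disease × datatype" grouping described above, so a downstream algorithm would receive a single merged input per pair rather than one per batch.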
For a discussion of Firehose in the broader context of Big Cancer Data, see Nature Methods 10, 293–297 (2013) doi:10.1038/nmeth.2410