Born of the desire to systematize analyses from the pilot phase of The Cancer Genome Atlas (TCGA) and to scale their execution to the dozens of diseases remaining to be studied, GDAC Firehose now sits atop ~55 terabytes of analysis-ready TCGA data and reliably executes thousands of pipelines per month.
The Broad Institute TCGA GDAC Firehose Provides
Version-stamped, standardized datasets
Precursor to automated analyses: aggregates all available sample batches into a single, uniformly formatted bolus (one per disease × data type), which can be fed directly to analysis algorithms without further data preparation.
Version-stamped packages of standard scientific analysis results
Automatically generated for dozens of algorithms, including GISTIC, MutSig, Clustering, and Correlation.
Version-stamped, biologist-friendly reports
Analysis results encapsulated in a form accessible to a wide audience, available online for public browsing, and citable in the literature through DOIs.
Version-stamped custom runs for TCGA analysis working groups
Performed on request in support of TCGA marker paper analyses, on a much shorter timescale than the monthly data runs and quarterly analysis runs.