Here we describe the Broad GDAC standardized data runs, which aim to produce version-stamped packages representing a frozen snapshot of all TCGA analysis data at a given time. Our goal is that this aggregation and assignment of unique chronological identifiers will help:
- Provide a consistent point of reference for citation by marker papers and users of TCGA data.
- Establish a formal definition of what constitutes a given tumor dataset.
- Deliver data in a form amenable to immediate algorithmic analysis, with no additional data preparation required.
- Minimize redundant effort across centers and groups in downloading and preparing data for further analysis.
- Enhance provenance and reproducibility.
The effort originated in this presentation from the April 2011 TCGA meeting, and was refined in subsequent presentations on May 12th and May 19th, as well as in ongoing discussions with TCGA collaborators. The standardized data packages may be accessed as described here.