Archive Nomenclature

As of August 2017, our archives follow the new nomenclature given below:



Description of Permissible Values


A string of the form


for example: TCGA-ACC-TP.

The <disease_specification> most often refers to a single disease study given by its disease abbreviation, such as GBM for Glioblastoma Multiforme, but may also refer to an aggregate of multiple diseases, such as PANCAN12 (a cohort of 12 diseases created to study pan-cancer trends) or COADREAD (which combines the single diseases COAD and READ into one cohort).

The optional <sample_type> suffix consists of a literal dash followed by a sample type code designating the tissue sample type; for example, the suffix "-TP" indicates that the given archive contains results based upon primary tumor data. As a final example, here is how sample type codes most commonly map to sample sets in Firehose for a single disease study:

Sample Set Name  Description
BLCA             all tumor and normal samples for Bladder Urothelial Carcinoma (union of everything below)
BLCA-TP          only primary tumor samples
BLCA-TM          only metastatic tumor samples (if any)
BLCA-TR          only tumor recurrence samples (if any)
                 only tissue normal samples (if any)
BLCA-NB          only blood normal samples (if any)
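The mapping above can be sketched as a small parser that separates the optional sample type suffix from the rest of a sample set name. This is only an illustrative sketch: the function name `split_sample_set` and the set `KNOWN_SAMPLE_TYPES` are hypothetical, the set contains only the codes listed in the table above (so it may be incomplete), and the sketch assumes a trailing dash-separated token is a sample type only when it matches one of those codes.

```python
# Sample type codes taken from the table above; the list may be incomplete.
KNOWN_SAMPLE_TYPES = {"TP", "TM", "TR", "NB"}


def split_sample_set(name):
    """Split a sample set name into (base, sample_type).

    sample_type is None when the name carries no recognized suffix,
    which per the table means "all tumor and normal samples".
    """
    # rpartition splits on the LAST dash, so any earlier dashes stay in base.
    base, dash, suffix = name.rpartition("-")
    if dash and suffix in KNOWN_SAMPLE_TYPES:
        return base, suffix
    return name, None


print(split_sample_set("BLCA-TP"))
print(split_sample_set("BLCA"))
```

Matching the suffix against a known set, rather than splitting on every dash, keeps aggregate or dash-containing names intact when no sample type is present.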


Tasks should be named as


For example: CopyNumber_Gistic2. The datatypes correspond to columns 2-12 in any of our sample data tables, with several types spelled out in longer form for clarity as follows:

Short Form  Long Form          Description
CN          CopyNumber         SNP6 copy number data
LowP        CopyNumberLowPass  Low pass DNASeqC copy number data
MAF         Mutation           mutation calls
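As a rough illustration of the table above, the short-to-long mapping can be held in a dictionary. The names `LONG_FORM` and `long_datatype` are hypothetical, and the sketch assumes that any datatype not listed in the table is used unchanged.

```python
# Short form -> long form, copied from the table above.
LONG_FORM = {
    "CN": "CopyNumber",
    "LowP": "CopyNumberLowPass",
    "MAF": "Mutation",
}


def long_datatype(short_form):
    """Return the long form of a datatype, or the input if none is listed."""
    return LONG_FORM.get(short_form, short_form)


print(long_datatype("CN"))
```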


Eight numeric characters representing the date. For example, 20170807 indicates August 7, 2017.
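A minimal sketch of validating and parsing such a date field, assuming the eight characters are ordered as year, month, day (as the 20170807 example suggests). The function name `parse_archive_date` is hypothetical.

```python
from datetime import date, datetime


def parse_archive_date(stamp):
    """Parse an eight-digit YYYYMMDD string into a datetime.date."""
    # Reject anything that is not exactly eight ASCII digits.
    if len(stamp) != 8 or not (stamp.isascii() and stamp.isdigit()):
        raise ValueError(f"expected eight numeric characters, got {stamp!r}")
    # strptime raises ValueError for impossible dates such as 20171340.
    return datetime.strptime(stamp, "%Y%m%d").date()


print(parse_archive_date("20170807"))
```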


A small integer (usually single digit) indicating how many times the given <TaskName> was successfully run in the given pass.