Re: [GDAC-users] confusion exon level expression data in firehose.

Subject:   Re: [GDAC-users] confusion exon level expression data in firehose.
From:   David Heiman <hidden>
Date:   Oct 01, 2015 13:42

Hi Andy,
 
Please note that there are two files of exon_quantification data for UCEC:
illuminahiseq_rnaseqv2 and illuminaga_rnaseqv2.
 
The latest stddata run has 201 aliquots in illuminahiseq_rnaseqv2:
% head -n 1 \
gdac.broadinstitute.org_UCEC.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__exon_quantification__data.Level_3.2015082100.0.0/UCEC.rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__exon_quantification__data.data.txt \
| cut -f 2-`head -n 1 gdac.broadinstitute.org_UCEC.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__exon_quantification__data.Level_3.2015082100.0.0/UCEC.rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__exon_quantification__data.data.txt | wc -w` \
| tr "\t" "\n" | sort -u | wc -l
     201
176 of which are TP aliquots.
 
and illuminaga_rnaseqv2 has 381 aliquots:
% head -n 1 \
gdac.broadinstitute.org_UCEC.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__exon_quantification__data.Level_3.2015082100.0.0/UCEC.rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__exon_quantification__data.data.txt \
| cut -f 2-`head -n 1 gdac.broadinstitute.org_UCEC.Merge_rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__exon_quantification__data.Level_3.2015082100.0.0/UCEC.rnaseqv2__illuminaga_rnaseqv2__unc_edu__Level_3__exon_quantification__data.data.txt | wc -w` \
| tr "\t" "\n" | sort -u | wc -l
     381
370 of which are TP aliquots.
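
For reference, the TP counts above can be checked with a small variation of the same pipeline. This is only a sketch under assumptions: the helper names `aliquots` and `count_tp` are hypothetical, the first header column is taken to be a row label, and TP is taken to mean sample type code 01 (the first two characters of the fourth dash-separated barcode field, as in TCGA-AX-A1C7-01A-11R-A137-07).

```shell
# Hypothetical helper: print the unique aliquot barcodes found in the
# header row of a merged data file (column 1 is assumed to be a row label).
aliquots() {
    head -n 1 "$1" | cut -f 2- | tr '\t' '\n' | sort -u
}

# Hypothetical helper: count unique aliquots whose sample type code is 01.
# Assumption: the code is the first two characters of barcode field 4.
count_tp() {
    aliquots "$1" | awk -F- 'substr($4, 1, 2) == "01" { n++ } END { print n + 0 }'
}
```

Calling `count_tp` with the path to either data.txt file shown above should print that file's TP aliquot count.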
 
Combined, I see a total of 581 unique exon_quantification aliquots (545 of
which are TP aliquots). That is one fewer than 201 + 381 because
TCGA-AX-A1C7-01A-11R-A137-07 exists in both sets.
TCGA-EO-A3KW-01A-11R-A22K-07 is in illuminahiseq_rnaseqv2.
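
The combined total and the overlap between the two sets can be reproduced along these lines. This is a rough sketch, not the exact commands used here: `aliquots` and `compare_aliquots` are hypothetical helper names, `mktemp` is assumed to be available, and the first header column is assumed to be a row label.

```shell
# Hypothetical helper: print the unique aliquot barcodes found in the
# header row of a merged data file (column 1 is assumed to be a row label).
aliquots() {
    head -n 1 "$1" | cut -f 2- | tr '\t' '\n' | sort -u
}

# Hypothetical helper: print the size of the union of the two aliquot sets,
# then list the aliquots present in both files.
compare_aliquots() {
    a=$(mktemp)
    b=$(mktemp)
    aliquots "$1" > "$a"
    aliquots "$2" > "$b"
    # Union: merge both sorted lists and drop duplicates.
    echo "union: $(sort -u "$a" "$b" | awk 'END { print NR }')"
    # Intersection: comm -12 keeps only lines common to both sorted inputs.
    echo "shared:"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

Running `compare_aliquots` with the illuminahiseq_rnaseqv2 and illuminaga_rnaseqv2 data.txt paths as its two arguments would print the union size followed by any barcodes found in both files.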
 
Hope this helps,
David
 
 
On Thu, Oct 1, 2015 at 11:48 AM, Andrew Cherniack <hidden> wrote:
 
> Hi GDAC people,
>
>
> I have been working with the exon level expression data from UCEC.
> When I downloaded this file:
> illuminahiseq_rnaseqv2-exon_quantification
> <http://gdac.broadinstitute.org/runs/stddata__2015_08_21/data/UCEC/20150821/gdac.broadinstitute.org_UCEC.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__exon_quantification__data.Level_3.2015082100.0.0.tar.gz>
> from the GDAC website,
> I get 267, which is far fewer than the number of tumors with exon data
> available from the DCC.
>
> If I go into firehose and download this file from *Workspace*:
> analyses__2015_04_02__ucec:
>
>
> GDAC_MergeDataFiles_12986509/UCEC-TP.rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__exon_quantification__data.data.txt
>
> I get a different but overlapping list of 371 tumors.
> See the attached spreadsheet.
>
> There are also cases for which I can get exon-level data from the DCC but
> that are in neither of these files
> (for example, TCGA-EO-A3KW-01A).
>
> This is all very confusing.
> Regards,
> Andy
>
>
> ___________________________
> Andrew Cherniack, PhD
> Group Leader
> Cancer Program
> Broad Institute of Harvard and MIT
> 415 Main Street
> Cambridge, Mass 02142
> email: hidden
> ___________________________
>
 