Re: [GDAC-users] question about CRC MutSig outputs

Subject:   Re: [GDAC-users] question about CRC MutSig outputs
From:   Daniel DiCara <>
Date:   Sep 05, 2012 11:55

Hi Sheila,
This boils down to bad timing. We manually ingested new maf/wig files for
COAD on July 23rd. Unfortunately, a step in our ingestion process was
performed incorrectly, which caused our software to flag the new wig files
as duplicates. Flagged duplicates are removed, so in this case 102 wig
files were deleted. If MutSig doesn't have a corresponding wig for a given
maf, that individual is excluded from the analysis. We caught and corrected
the error, but not in time for our July 25th analysis run. Normally we
would have manually fixed the July 25th run, but this error escaped our
notice. Thank you for pointing this out, and we apologize for the mistake.
Our 08/25 run, which will be released soon, includes all 224 individuals
in the analysis.
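The exclusion rule described above (an individual with mutations in the maf but no corresponding wig is left out of the run) amounts to a set intersection. The Python sketch below is illustrative only: the function names and toy patient IDs are hypothetical and are not MutSig or Firehose code. It simply shows how losing 102 wig files out of 224 individuals leaves 122 in the analysis.

```python
from typing import Iterable, Set


def patients_in_analysis(maf_patients: Iterable[str], wig_patients: Iterable[str]) -> Set[str]:
    """Hypothetical helper: patients kept when a run requires both a maf and a wig.

    Mirrors the rule described in the email: an individual with mutations
    in the maf but no matching wig (coverage) file is dropped.
    """
    return set(maf_patients) & set(wig_patients)


def dropped_patients(maf_patients: Iterable[str], wig_patients: Iterable[str]) -> Set[str]:
    """Hypothetical helper: maf patients silently excluded for lack of a wig."""
    return set(maf_patients) - set(wig_patients)


if __name__ == "__main__":
    # Toy IDs only, sized to match the counts in this thread:
    # 224 individuals with mutations, 102 wig files removed as "duplicates".
    maf = [f"patient_{i:03d}" for i in range(224)]
    wigs = maf[102:]  # pretend the first 102 wig files were deleted
    kept = patients_in_analysis(maf, wigs)
    lost = dropped_patients(maf, wigs)
    print(len(kept), len(lost))  # 122 102
```

Because the filter is silent, comparing the two sets before a run is a cheap way to catch missing coverage files.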
On Wed, Sep 5, 2012 at 10:25 AM, Daniel DiCara <> wrote:
> Hi Sheila,
> You are correct: only 122 individuals made it into the 2012_07_25 run,
> whereas 224 were included in the 2012_06_23 run. Let me check into this and
> get back to you.
> -Dan DiCara
> On Tue, Sep 4, 2012 at 8:34 PM, Sheila Reynolds <> wrote:
>> Hi folks,
>> I was trying to understand why our mutation analysis gives different counts
>> of mutations for a particular gene than the 20120725 Firehose MutSig outputs
>> for COADREAD, even though we seem to be using the same MAF file (the md5sum
>> and the file name match the ones given in your MAF provenance Google doc).
>> It appears that MutSig is only using data for 122 patients, while the MAF
>> file contains mutations for 225 patients. (I get the number 122 by looking
>> at unique barcodes in several of the MutSig files, for example
>> COADREAD.patients.counts_and_rates.txt, and the 122 barcodes are a subset
>> of the 225 unique patients identified in the 08nov11 MAF file.)
>> thanks for looking into this,
>> Sheila
>> --
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Sheila M Reynolds, PhD
>> Research Scientist
>> Institute for Systems Biology
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> --
> Dan DiCara
> The Broad Institute of MIT and Harvard
> 301 Binney Street, Desk 5159-C
> Cambridge, MA 02142
> (617) 714-8281
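Sheila's cross-check (confirming the MAF by md5sum, then comparing the unique patient barcodes in a MutSig output file against those in the MAF) can be reproduced with a short Python sketch. It rests on assumptions, all labeled hypothetical: the MAF is tab-delimited with a `Tumor_Sample_Barcode` column and optional `#` comment lines; a patient ID is the first three hyphen-separated fields of a TCGA barcode; and the MutSig barcode column name (`patient` below) is a placeholder, since the real header in COADREAD.patients.counts_and_rates.txt may differ. The barcodes in the usage section are made up.

```python
import csv
import hashlib
import io
from typing import Set, TextIO


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks (the digest `md5sum` prints)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def patient_id(barcode: str) -> str:
    """Reduce a TCGA sample barcode (TCGA-XX-0001-01A-...) to its patient part."""
    return "-".join(barcode.strip().split("-")[:3])


def unique_patients(handle: TextIO, column: str) -> Set[str]:
    """Collect unique patient IDs from one column of a tab-delimited file.

    Lines starting with '#' (such as a MAF '#version' line) are skipped.
    """
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.DictReader(lines, delimiter="\t")
    return {patient_id(row[column]) for row in rows if row.get(column)}


def compare(maf_patients: Set[str], mutsig_patients: Set[str]) -> dict:
    """Summarize how the MutSig patient set relates to the MAF patient set."""
    return {
        "maf": len(maf_patients),
        "mutsig": len(mutsig_patients),
        "is_subset": mutsig_patients <= maf_patients,
        "missing_from_mutsig": len(maf_patients - mutsig_patients),
    }


if __name__ == "__main__":
    # Toy data only; the MutSig column name "patient" is a placeholder.
    maf_text = io.StringIO(
        "#version 2.2\n"
        "Hugo_Symbol\tTumor_Sample_Barcode\n"
        "APC\tTCGA-XX-0001-01A-01D-0001-10\n"
        "TP53\tTCGA-XX-0001-01A-01D-0001-10\n"
        "KRAS\tTCGA-XX-0002-01A-01D-0001-10\n"
    )
    mutsig_text = io.StringIO("patient\nTCGA-XX-0001\n")
    maf_patients = unique_patients(maf_text, "Tumor_Sample_Barcode")
    mutsig_patients = unique_patients(mutsig_text, "patient")
    print(compare(maf_patients, mutsig_patients))
    # {'maf': 2, 'mutsig': 1, 'is_subset': True, 'missing_from_mutsig': 1}
```

On the real files, a subset result with a nonzero `missing_from_mutsig` count is exactly the symptom reported in this thread: the MAF matches, but some patients never reached the analysis.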