July 2017: FYI on proteomics deliverables from FireCloud CGA team
Per Chet Birger/D.R. Mani meeting:
- FireCloud data workspaces
- one (possibly two - see below) for each of the three CPTAC AWGs (breast, ovarian, colon)
- The workspaces will contain, at a minimum, the end results (protein level quantification) produced by each AWG.
- We may also include the raw MS files and/or the standardized mzML files. However, all of the pipelines used to analyze these files rely on Windows-based software and so cannot be run on FireCloud.
- We will include the TCGA genomic, clinical and biospecimen data as well - this will help researchers who want to conduct correlative analyses. It will mean, however, that we'll want to create both open and controlled access versions of these workspaces, as the BAMs and VCF files are controlled access.
- We may also include the outputs of the CDAP pipeline, which are published on the CPTAC data portal.
- We will aim to get these workspaces in place by the end of August.
- Since all of the workflows that run on either the raw MS files or the mzML files (CDAP) include Windows-based tasks, they cannot be run on FireCloud.
- Mani and Mike's teams are developing workflows for correlative analysis; we agreed to touch base with them at the end of August to see how far along these pipelines are and whether any could be included in our deliverables. If not, so be it. I'm hoping that NCI will see the value of the data workspaces for the future development of workflows.
May 31, 2017 On-Site (Broad Institute, Cambridge MA)
- Mike's slides: here
April 4, 2017 Face-to-Face (Bethesda, MD)