June 11, 2018
Progress report: content due soon
- Upcoming site visit: July 17; material can mostly come from expanding the length/depth of the F2F presentation (which was abbreviated to only 10 minutes)
May 7, 2018
Recap of last week's F2F:
Good FireCloud workshop attendance & feedback
GDC has made progress automating their pipelines
CPTAC genomic data will be HG38 going forward
Phased delivery: CCRC & UCEC first, so that publications can be in the pipeline by the next funding cycle (Gantt chart)
- Two special journals in Fall 2018:
- FOA: Sustained Support for Informatics Resources for Cancer Research and Management (U24)
- Due June 14
- Submit LOI by May 14
- Letters of support: gathering now
- Time Permitting: genomic pathway analyses
- GDAN Lung pathifier analyses
- iCluster: Hailei
April 2, 2018
- Access to the GDAC bucket for reference files
- egress pay: need to turn on requestor_pays bit
- authorization domain
- proxy groups and how to keep track of them in a bucket: currently have 2 proxy groups
- CPTAC3 data in FC may help entice new CPTAC users, but it is also akin to replicating DCC
- so, let's wait until potential CPTAC users make explicit requests
- Mani will explore using auth domain for new CPTAC FC users
- Mike will ping NCI about FireCloud SW session in May F2F
- Mike/Sam will price physical hardware & compare to Google VMs, as potential spend for $25K FC disbursement
March 19, 2018
- Review draft agenda for May F2F
- Decide additional attendees
- (Fire)Cloud costs for CPTAC-wide usage: $25K seems to have effectively been reduced
Batches 1 and 2 of genomic data are located at /xchip/gdac_data/cptac3/genomic_data_mirror
So we should be able, in principle, to proceed with our mRNA pipelines
- Mike has a short, unifying wrapper to all 3 of the DCC upload/download utilities, and can install to Unix server upon request
- Integrative Analyses:
- Karsten: proteomic pathways ...
- Mani: map genomic CN data to LINCS, correlate w/ RNAseq signatures?
- Possibly: multi-omics clustering
- FireCloud workshop scheduling
- Mike: push CPTAC wrapper script to Unix servers
March 5, 2018
Summation of HG19 WashU/GDC workaround & potential recommendations to CPTAC leadership & collaborators.
- Consider: combing the GDC website to see if Dockers are available for their pipelines, and whether these could be instantiated in FC
- Consider: running local MOAP-style pipeline on WXS data, to generate CN, mutation, RNASeq
- Decision: wait for now, it's not fully baked yet
- open edit permissions (on this page) to all viewers
- Identify 2-3 integrative proteo-genomic analyses: but must be on CPTAC3 data
- Combined into iCoMut output
- Planning F2F on May 1, 2, 3:
- Quilts in FC?
- Although CustomProDB (from Baylor/Bing Zhang group) does much the same as Quilts and is already in FC (from Karsten)
- FireCloud workshop
- Karsten & Mani: current instantiation of proteomic pipeline
- On prospective BRCA data
Update to Jan 22, 2018 entry:
Genomic data for CPTAC3 downloaded to: /xchip/gdac_data/cptac3/2018_02_02_genomic_data
- Only 2 cohorts (CRCC and UCEC, i.e. kidney and endometrial) have genomic data available so far
- Proteomic data for the 3rd cohort (LUAD, lung) has not been submitted by Broad yet, so WashU has apparently not processed the genomic data either
Jan 22, 2018
- Brief review of items missed from last meeting
- Proteogenomic Data Commons Steering Committee:
- Held 2nd advisory meeting call last Wed
WashU/GDC workaround: summary of discussion & decisions from 1/19 call
- New science: degradome?
Jan 8, 2018
- Welcome Yifat Geffen, newest member of CGA
- Brief review of latest suite of genomic run reports (total of 830)
- Whither pathology image browser in CPTAC? The GTEx pathology browser was authored here (and we have strong knowledge of the cancer path viewer), so we have a good deal of expertise & code that could in principle be leveraged. I've drafted a suggestion for PAAD dwg here.
- NMF clustering module question (auto-selection of K) from Mani?
- FireCloud hosting of CPTAC data (as partial workaround to lack of CPTAC genomic data at GDC)
- Medblast paper
Dec 11, 2017
- Items from 11/27 meeting that was cancelled
- GDC and CPTAC: summary notes from week of 2017_12_06
- Original plan (and data products) given here
- Impact to CGDAC (the CGA part of proteomics GDAC) sketched below
Initial data generation will be shifting from GDC to WashU
Mutation calls (both WES and WGS)
WashU products deposited to Georgetown DCC
Broad download & remap names as needed/appropriate
FireCloud (as a trusted partner) now being considered as a distribution point
Per Chris Kinsinger feeler conversation on 2017_12_01
So, because these data will be HG19, our CGA/GDAC in CPTAC may be better utilized by shifting gears: from running existing FireCloud HG38 genomic pipelines on HG19 data (which leads to broken results) to loading these HG19 data products from WashU into FireCloud, so that it can serve as a distribution point
Side Q: why is Georgetown DCC not considered for this? Scale? Absence of trusted partner status?
- Status on $25K to fund use of FireCloud across entire CPTAC?
- Any progress? No: there was an attempt to issue as AWS credits ... currently stuck w/r/t Google credits ... stay tuned
- billing project?
Nov 27, 2017
- Timeline for LUAD, UCEC and KIRC projects: given here
Oct 30, 2017: tentative agenda
- Discuss CPTAC-wide use of FireCloud: how to allot funding, make billing projects, add users etc
- Recall supervisor mode in FISSFC:
- Update on DSDE collaboration:
- Show recent CGA/DSDE collaboration proposal
- FISS backbone of Jupyter notebooks in FireCloud
- Code generator progress:
- standalone tool
- works on GTEx
- Swagger2 / FireCloud proof of concept has been done
- Full Swagger2 support is next
- Discussion for Wed 11/1 AWG telecon:
- Thoughts omitted from F2F talk due to time constraints: slides 21-39
- Chet: recent CPTAC2 workspaces ... where to go next?
July 2017: FYI on proteomics deliverables from FireCloud CGA team
- FireCloud data workspaces
- one (possibly two - see below) for each of the three CPTAC AWGs (breast, ovarian, colon)
- The workspaces will contain, at a minimum, the end results (protein level quantification) produced by each AWG.
- We may also include the raw MS files, and/or the standardized mzML files. But all of the pipelines used for analyzing these files rely on Windows-based software, and so cannot be run on FireCloud.
- We will include the TCGA genomic, clinical and biospecimen data as well - this will help researchers who want to conduct correlative analyses. It will mean, however, that we'll want to create both open and controlled access versions of these workspaces, as the BAMs and VCF files are controlled access.
- We may also include the outputs of the CDAP pipeline, which are published on the CPTAC data portal.
- We will aim to get these workspaces in place by the end of August.
- Since all of the workflows that run on either the raw MS files or the mzML files (CDAP) include Windows-based tasks, they cannot be run on FireCloud.
- Mani and Mike's teams are developing workflows for correlative analysis; we agreed to touch base with them at the end of August to see how far along any of these pipelines are and whether they could be included in our deliverables. If not, so be it. I'm hoping that NCI will see the value in the data workspaces for the future development of workflows.
May 31, 2017 On-Site (Broad Institute, Cambridge MA)
- Mike's slides: here
April 4, 2017 Face-to-Face (Bethesda, MD)
- Mike's slides: here