Global compRehensive Atlas of Peptide and Protein Abundance

Study Code / Acronym
GRAPPA
Award Number
BB/T019557/1
Programme
Research Grant
Status / Stage
Active
Dates
1 August 2020 to 31 July 2023
Duration (calculated)
02 years 11 months
Funder(s)
BBSRC (UKRI)
Funding Amount
£328,372.00
Funder/Grant study page
BBSRC UKRI
Contracted Centre
University of Liverpool
Principal Investigator
Professor Andrew Jones
PI Contact
andrew.jones.3@city.ac.uk
PI ORCID
0000-0001-6118-9327
WHO Categories
Understanding Underlying Disease
Disease Type
Dementia (Unspecified)

CPEC Review Info
Reference ID 685
Researcher Reside Team
Published 07/07/2023

Abstract

The world-leading PRIDE database now contains more than 14,000 proteomics datasets. All of them contain raw mass spectrometry (MS) data and some contain standardised lists of protein identifications, but currently none contain quantitative data expressed in a standard format. As such, there is vast untapped potential for re-use of quantitative data by the majority of research groups, who do not have the capability to re-process datasets themselves. In this project, we will develop robust, open, cloud-based data analysis pipelines that will be used to process hundreds of publicly available datasets using standardised data processing and normalisation protocols. All processed datasets will be made available within a new portal, PRIDE Quant, to support computational users, and will be passed to the Expression Atlas database to provide a biologist-friendly view of the data. Data processing will focus largely on human samples, for which the highest data volumes exist. This includes both “baseline” datasets (e.g. to provide cell line or tissue/organ-level estimates of protein abundance) and “differential” expression datasets for various diseases, including cancer, dementia, diabetes and major infectious diseases. We will develop several exemplar applications of the data, including displays showing correlations between gene and protein expression for matched samples, co-expression networks generated from proteomics data, and large-scale maps of peptide-level abundance to support new research in proteome bioinformatics.