Global compRehensive Atlas of Peptide and Protein Abundance

Research Grant
Start Date: 1 March 2021
End Date: 29 February 2024
Duration (calculated): 02 years 11 months
Contracted Centre: EMBL - European Bioinformatics Institute
Principal Investigator: Dr Juan Antonio Vizcaino
WHO Categories: Understanding Underlying Disease
Disease Type: Dementia (Unspecified)

CPEC Review Info
Reference ID: 687
Researcher: Reside Team


Study Code / Acronym: GRAPPA
Award Number: BB/T019670/1
Status / Stage: Active
Funder/Grant study page: BBSRC UKRI
Funding Amount: £671,803.00


The world-leading PRIDE database now contains more than 14,000 proteomics datasets. All of them contain raw mass spectrometry (MS) data and some contain standardised lists of protein identifications, but currently none contain quantitative data expressed in a standard format. As a result, there is vast untapped potential for quantitative data re-use, particularly for the majority of research groups, who do not have the capability to re-process datasets themselves. In this project, we will develop robust, open, cloud-based data analysis pipelines that will be used to process hundreds of publicly available datasets using standardised data processing and normalisation protocols. All processed datasets will be made available within a new portal, PRIDE Quant, to support computational users, and will be passed to the Expression Atlas database to provide a biologist-friendly view of the data. Data processing will largely focus on human samples, for which the highest data volumes exist, including both “baseline” datasets (e.g. to provide cell line or tissue/organ-level estimates of protein abundance) and “differential” expression datasets for various diseases including cancer, dementia, diabetes and major infectious diseases. We will develop several exemplar applications of the data, including displays showing correlations between gene and protein expression for matched samples, co-expression networks generated from proteomics data, and vast maps of peptide-level abundance to support new research in proteome bioinformatics.