115	ProteomeXchange datasets
369	experiments
70,470,125	PSMs
596,839	distinct peptides
18,267	canonical core proteins

About

The Arabidopsis PeptideAtlas provides a compendium of results from uniformly reprocessed mass spectrometry proteomics datasets.

Publicly available Arabidopsis thaliana Columbia-0 datasets were downloaded from ProteomeXchange and reprocessed from the raw files using the Trans-Proteomic Pipeline suite of tools. A publication describing the build is available.

Explore

BROWSE

Chromosome Summary

Chromosome	Entries	Canonical	%	Uncertain	%	Redundant	%	Not Observed	%
M	35	27	77.1%	5	14.3%	0	0.0%	3	8.6%
C	79	63	79.7%	12	15.2%	0	0.0%	4	5.1%
1	7,156	4,730	66.1%	502	7.0%	384	5.4%	1,540	21.5%
2	4,317	2,762	64.0%	290	6.7%	240	5.6%	1,025	23.7%
3	5,460	3,630	66.5%	353	6.5%	296	5.4%	1,181	21.6%
4	4,180	2,788	66.7%	282	6.7%	247	5.9%	863	20.6%
5	6,332	4,267	67.4%	412	6.5%	373	5.9%	1,280	20.2%
2023-10 Total	27,559	18,267	66.3%	1,856	6.7%	1,540	5.6%	5,896	21.4%

Chromosome	Entries	Canonical	%	Uncertain	%	Redundant	%	Not Observed	%
M	122	15	12.3%	10	8.2%	17	13.9%	80	65.6%
C	88	59	67.0%	15	17.0%	7	8.0%	7	8.0%
1	7,156	4,622	64.6%	545	7.6%	397	5.5%	1,592	22.2%
2	4,317	2,695	62.4%	291	6.7%	247	5.7%	1,084	25.1%
3	5,460	3,561	65.2%	365	6.7%	308	5.6%	1,226	22.5%
4	4,180	2,723	65.1%	306	7.3%	230	5.5%	921	22.0%
5	6,332	4,183	66.1%	410	6.5%	394	6.2%	1,345	21.2%
2021-03 Total	27,655	17,858	64.6%	1,942	7.0%	1,600	5.8%	6,255	22.6%

Column descriptions

Entries: Number of entries in each chromosome
Canonical: Proteins seen with 2 distinct, uniquely mapping peptides
Uncertain: Proteins with some evidence that is not sufficient for canonical status
Redundant: Proteins with peptides that map to them, but none uniquely, so the proteins are not needed to explain the observed peptides
Not Observed: No detections at all in PeptideAtlas above our very stringent threshold
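
The percentage columns in the Chromosome Summary can be recomputed from the counts. The following is a minimal sketch that uses only the "2023-10 Total" row shown above; the names `TOTAL_2023_10`, `ENTRIES_2023_10`, and `category_percentages` are illustrative and not part of PeptideAtlas.

```python
# Counts copied from the "2023-10 Total" row of the Chromosome Summary.
TOTAL_2023_10 = {
    "Canonical": 18_267,
    "Uncertain": 1_856,
    "Redundant": 1_540,
    "Not Observed": 5_896,
}
ENTRIES_2023_10 = 27_559


def category_percentages(counts, entries):
    """Return each category's share of `entries` as a percentage (one decimal)."""
    return {name: round(100 * n / entries, 1) for name, n in counts.items()}


# In this row the four category counts add up to the Entries column.
assert sum(TOTAL_2023_10.values()) == ENTRIES_2023_10

for name, pct in category_percentages(TOTAL_2023_10, ENTRIES_2023_10).items():
    print(f"{name}: {pct}%")
```

The same function can be applied to any other row of either table by swapping in that row's counts.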

Publications

Detection of the Arabidopsis Proteome and Its Post-translational Modifications and the Nature of the Unobserved (Dark) Proteome in PeptideAtlas; van Wijk KJ, Leppert T, Sun Z, Kearly A, Li M, Mendoza L, Guzchenko I, Debley E, Sauermann G, Routray P, Malhotra S, Nelson A, Sun Q, Deutsch EW.
J Proteome Res. 2024 Jan 5;23(1):185-214. doi: 10.1021/acs.jproteome.3c00536. Epub 2023 Nov 21. PMID: 38104260

van Wijk et al., The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource, published in The Plant Cell (2021 Aug 19) -- [ PubMed ]

Short ICAR 2021 presentation of the 2021 Arabidopsis PeptideAtlas by Klaas Van Wijk -- [8 minute video on YouTube]

Download

Below are individual Arabidopsis thaliana PeptideAtlas builds available for download in various flat file formats. Note that not all files contain all information from the build. A build subtitled "PSM FDR=0.002" indicates that a PSM FDR threshold of 0.002 (0.2%) is applied to every sample in the build.
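
As a rough illustration of what such a threshold means, the sketch below converts the FDR values of the two builds listed here to percentages and estimates how many false PSMs would be expected among the accepted PSMs of a single sample. The sample size of 100,000 PSMs is a hypothetical value chosen only for the arithmetic, and `expected_false_psms` is an illustrative helper, not a PeptideAtlas or Trans-Proteomic Pipeline function.

```python
def expected_false_psms(accepted_psms, fdr_threshold):
    """Estimate the number of false PSMs among `accepted_psms` at a given FDR."""
    return accepted_psms * fdr_threshold


# Hypothetical number of accepted PSMs in one sample (illustration only).
sample_psms = 100_000

# FDR thresholds as given in the build subtitles on this page.
for build, fdr in [("2023-10", 0.0008), ("2021-03", 0.001)]:
    estimate = expected_false_psms(sample_psms, fdr)
    print(f"{build}: FDR {fdr:.2%} -> about {estimate:.0f} false PSMs per {sample_psms:,}")
```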

A. thaliana 2023-10 PSM FDR = 0.0008 (Latest Build)

Biosequence Set in FASTA format [128MB]
Peptide CDS and chromosomal coordinates [371MB]
Peptide CDS coordinates [218MB]
Peptide sequences in FASTA format [17MB]
Peptide sequences in GFF format [17MB]
Spectral library (HR-HCD) [2GB]
Spectral library (HR-QTOF) [180MB]
Spectral library (LR-HCD) [71MB]
Spectral library (LR-IT-CID) [204MB]
Spectral library (dimethyl_HR-HCD) [162MB]
Spectral library (dimethyl_LR-IT-CID) [1MB]
Spectral library (iTRAQ_HR-HCD) [12MB]
Spectral library (iTRAQ_LR-IT-CID) [13MB]

A. thaliana 2021-03 PSM FDR = 0.001

Biosequence Set in FASTA format [170MB]
Database tables exported as TSV dump file [5GB]
Database tables exported as mysql dump file [4GB]
Peptide CDS and chromosomal coordinates [331MB]
Peptide CDS coordinates [195MB]
Peptide sequences in FASTA format [15MB]

Help

Complete description of each of the available download formats

Arabidopsis Protein Sequences Download

Filename	Size	# Sequences	Description
revised_mito_plastid_edited.fasta	48KB	114	Protein ids and sequences after application of editing for the 79 plastid-encoded and 35 mitochondrial-encoded proteins and pseudogenes; minor frequency edits are not applied but all high frequency edits are included
revised_mito_plastid_edited.peff	52KB	114	Protein ids and their amino acid sequences with all possible variants supplied in PEFF format. PEFF allows for encoding the variants in a compact way in the file. Comet and some other search engines support PEFF.
revised_mito_plastid_pre-edit.fasta	48KB	114	Protein ids and unedited sequences (with the exception of essential edits for start and stop codons that need to be applied to generate a protein) for the 79 plastid-encoded and 35 mitochondrial-encoded proteins and pseudogenes
revised_mito_plastid_all-editing-permutations.fasta	1.2MB	3,818	Protein ids and the >10,000 sequence variants (see Materials and Methods), allowing a complete and exhaustive MS/MS database search of all possible edits, including partial editing (similar to what we did in this study)
Araport11_genes.201606.pep.2.fasta	25MB	48,359	Araport11 2016-06 [ link ]
TAIR10_label_pep_20101214.2.fasta	20MB	35,386	TAIR10 20101214 [ link ]
Refseq_GCF_000001735.4.protein.2.faa	24MB	48,265	Refseq 000001735.4 [ link ]
uniprot-rename-proteome_UP000006548.2.fasta	21MB	39,342	Uniprot UP000006548 [ link ]
araport11_pseudogene_3frame_newid.2.fasta	1.1MB	3,720	Araport11 from Qi Sun, Cornell University, select transcripts
crap_GFP.fasta	4KB	3	Our custom GFP contaminants Klaas Van Wijk 09-2020
crap_CONTAM.2.fasta	44KB	116	Our PeptideAtlas custom contaminants
CONTRIB_LW_peptides.2.fasta	1.3MB	16,809	Contributed peptides LW 16809 very short peptides [ link ]
CONTRIB_SIPs_peptides.2.fasta	44KB	607	Contributed peptides SIPs 607 very short peptides [ link ]
CONTRIB_sORFs_peptides.2.fasta	568KB	7,901	Contributed peptides sORFs 7901 very short peptides [ link ]
Arabidopsis_RNA_Edits.fasta	20KB	50	RNA Edits from Joshua Heazlewood 02-2021
Mito_ORFS_new.fasta	4KB	3	Mito ORFS from Philippe Giege
refseq_var_sloan_30.fasta	2.6MB	8,672	Sloan edits all possible permutations
refseq_var_IS_30.fasta	1.3MB	3,099	Permutations of RNA edits from Ian Small UWA
CONTRIB_Iowa_peptides.2.fasta	1.6MB	7,481	Contributed peptides from Iowa State University, Eve Wurtele
Araport11_CORE.fasta	14MB	27,559	Core proteome
Arabidopsis_PeptideAtlas_search.fasta	57MB	176,064	Full database used in the searches
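
After downloading one of these files, the "# Sequences" column can be checked by counting FASTA header lines. This is a minimal sketch that assumes the file is an uncompressed, plain-text FASTA file stored locally; `count_fasta_records` is an illustrative helper, and the small temporary file below stands in for a real download.

```python
import os
import tempfile


def count_fasta_records(path):
    """Count FASTA records by counting header lines (lines starting with '>')."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Self-contained demonstration with a tiny made-up FASTA file.
demo = ">protein_1\nMKT\nAAV\n>protein_2\nMSS\n"
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(demo)
    demo_path = tmp.name

print(count_fasta_records(demo_path))
os.remove(demo_path)
```

For a downloaded file, pass its local path instead, for example `count_fasta_records("Araport11_CORE.fasta")`, and compare the result with the count listed in the table.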

Other Resources

Acknowledgements

We gratefully acknowledge the support for the Arabidopsis PeptideAtlas from NSF grant 1922871 “TRTech-PGR: A PeptideAtlas for Arabidopsis thaliana and other plant species; harnessing world-wide proteomics data and mining for biological features”.