Full 2022-06 Build Summary
115 ProteomeXchange datasets
596,839 distinct peptides
18,267 canonical core proteins


The Arabidopsis PeptideAtlas provides a compendium of results from uniformly reprocessed mass spectrometry proteomics datasets.

Publicly available Arabidopsis thaliana Columbia-0 datasets were downloaded from ProteomeXchange and reprocessed from the raw files using the Trans-Proteomic Pipeline suite of tools. A publication describing the build is available.


Chromosome Summary

Chromosome     Entries  Canonical        Uncertain       Redundant       Not Observed
M                   35  27 (77.1%)       5 (14.3%)       0 (0.0%)        3 (8.6%)
C                   79  63 (79.7%)       12 (15.2%)      0 (0.0%)        4 (5.1%)
1                7,156  4,730 (66.1%)    502 (7.0%)      384 (5.4%)      1,540 (21.5%)
2                4,317  2,762 (64.0%)    290 (6.7%)      240 (5.6%)      1,025 (23.7%)
3                5,460  3,630 (66.5%)    353 (6.5%)      296 (5.4%)      1,181 (21.6%)
4                4,180  2,788 (66.7%)    282 (6.7%)      247 (5.9%)      863 (20.6%)
5                6,332  4,267 (67.4%)    412 (6.5%)      373 (5.9%)      1,280 (20.2%)
2022-06 Total   27,559  18,267 (66.3%)   1,856 (6.7%)    1,540 (5.6%)    5,896 (21.4%)

Column descriptions
Entries: Number of entries in each chromosome
Canonical: Proteins seen with 2 distinct, uniquely mapping peptides
Uncertain: Proteins with some evidence that is not sufficient for canonical status
Redundant: Proteins that have peptides mapping to them, but none uniquely, so they are not needed to explain the observed peptides
Not Observed: No detections at all in PeptideAtlas above our very stringent threshold
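
The percentages in the Chromosome Summary appear to be each category's count divided by the Entries count for that row, and the total row the column-wise sum. A minimal Python sketch for checking this arithmetic follows; the counts are copied from the table, while the names `ROWS`, `column_totals`, and `percent` are illustrative and not part of PeptideAtlas.

```python
# Counts copied from the Chromosome Summary table:
# chromosome -> (entries, canonical, uncertain, redundant, not_observed)
ROWS = {
    "M": (35, 27, 5, 0, 3),
    "C": (79, 63, 12, 0, 4),
    "1": (7156, 4730, 502, 384, 1540),
    "2": (4317, 2762, 290, 240, 1025),
    "3": (5460, 3630, 353, 296, 1181),
    "4": (4180, 2788, 282, 247, 863),
    "5": (6332, 4267, 412, 373, 1280),
}


def column_totals(rows):
    """Sum each column over all chromosomes."""
    return tuple(sum(col) for col in zip(*rows.values()))


def percent(count, entries):
    """Share of entries in a category, rounded to one decimal as in the table."""
    return round(100.0 * count / entries, 1)


# In every row, the four categories add up to the number of entries.
for name, (n_entries, *categories) in ROWS.items():
    assert sum(categories) == n_entries, name

totals = column_totals(ROWS)
entries, canonical, uncertain, redundant, not_observed = totals

print("Totals:", totals)
print("Canonical share of all entries (%):", percent(canonical, entries))
```

With these counts, the four categories partition the entries on every row, and the column sums reproduce the "2022-06 Total" row.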



Below are the individual Arabidopsis thaliana PeptideAtlas builds available for download in various flat file formats. Note that not all files contain all information from the build. A build subtitled "PSM FDR=0.002" indicates that a PSM FDR threshold of 0.002 (0.2%) was applied to every sample in the build.

A. thaliana 2022-06 PSM FDR = 0.0008 (Latest Build)
  • Biosequence Set in FASTA format [132MB]
  • Peptide CDS and chromosomal coordinates [377MB]
  • Peptide CDS coordinates [223MB]
  • Peptide sequences in FASTA format [17MB]
  • Peptide sequences in GFF format [19MB]
A. thaliana 2021-03 PSM FDR = 0.001
  • Biosequence Set in FASTA format [170MB]
  • Database tables exported as TSV dump file [5GB]
  • Database tables exported as mysql dump file [4GB]
  • Peptide CDS and chromosomal coordinates [331MB]
  • Peptide CDS coordinates [195MB]
  • Peptide sequences in FASTA format [15MB]
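
As a rough illustration of what the PSM FDR thresholds in the build subtitles imply, the sketch below estimates how many accepted PSMs in one sample would be expected to be incorrect at a given threshold. It only restates the definition of a false discovery rate and is not the procedure PeptideAtlas uses to compute or apply the threshold; the function name and the PSM count of 100,000 are hypothetical.

```python
def expected_false_psms(accepted_psms: int, fdr: float) -> float:
    """Approximate number of incorrect PSMs among those accepted at a given FDR."""
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must be between 0 and 1")
    return accepted_psms * fdr


# Hypothetical sample with 100,000 accepted PSMs (not a figure from any build).
for threshold in (0.0008, 0.001):
    estimate = expected_false_psms(100_000, threshold)
    print(f"FDR {threshold}: about {estimate:.0f} incorrect PSMs expected")
```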


Complete description of each of the available download formats

Arabidopsis Protein Sequences Download

Filename                                             Size   # Sequences  Description
revised_mito_plastid_edited.fasta                    48KB           114  Protein IDs and sequences after application of editing for the 79 plastid-encoded and 35 mitochondrial-encoded proteins and pseudogenes; minor-frequency edits are not applied, but all high-frequency edits are included
revised_mito_plastid_edited.peff                     52KB           114  Protein IDs and their amino acid sequences with all possible variants supplied in PEFF format. PEFF allows the variants to be encoded compactly in the file. Comet and some other search engines support PEFF.
revised_mito_plastid_pre-edit.fasta                  48KB           114  Protein IDs and unedited sequences (with the exception of essential edits for start and stop codons that need to be applied to generate a protein) for the 79 plastid-encoded and 35 mitochondrial-encoded proteins and pseudogenes
revised_mito_plastid_all-editing-permutations.fasta  1.2MB        3,818  Protein IDs and the >10,000 sequence variants (see Materials and Methods) to allow for a complete and exhaustive MS/MS database search of all possible edits and to allow for partial editing (similar to what we did in this study)
Araport11_genes.201606.pep.2.fasta                   25MB        48,359  Araport11 2016-06
TAIR10_label_pep_20101214.2.fasta                    20MB        35,386  TAIR10 20101214
Refseq_GCF_000001735.4.protein.2.faa                 24MB        48,265  RefSeq 000001735.4
uniprot-rename-proteome_UP000006548.2.fasta          21MB        39,342  UniProt UP000006548
araport11_pseudogene_3frame_newid.2.fasta            1.1MB        3,720  Araport11 from Qi Sun, Cornell University, select transcripts
crap_GFP.fasta                                       4KB              3  Our custom GFP contaminants (Klaas Van Wijk, 09-2020)
crap_CONTAM.2.fasta                                  44KB           116  Our PeptideAtlas custom contaminants
CONTRIB_LW_peptides.2.fasta                          1.3MB       16,809  Contributed peptides (LW): 16,809 very short peptides
CONTRIB_SIPs_peptides.2.fasta                        44KB           607  Contributed peptides (SIPs): 607 very short peptides
CONTRIB_sORFs_peptides.2.fasta                       568KB        7,901  Contributed peptides (sORFs): 7,901 very short peptides
Arabidopsis_RNA_Edits.fasta                          20KB            50  RNA edits from Joshua Heazlewood (02-2021)
Mito_ORFS_new.fasta                                  4KB              3  Mito ORFs from Philippe Giege
refseq_var_sloan_30.fasta                            2.6MB        8,672  Sloan edits, all possible permutations
refseq_var_IS_30.fasta                               1.3MB        3,099  Permutations of RNA edits from Ian Small, UWA
CONTRIB_Iowa_peptides.2.fasta                        1.6MB        7,481  Contributed peptides from Iowa State University, Eve Wurtele
Araport11_CORE.fasta                                 14MB        27,559  Core proteome
Arabidopsis_PeptideAtlas_search.fasta                57MB       176,064  Full database used in the searches
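
After downloading one of the FASTA files listed above, its record count can be compared with the "# Sequences" column. The following is a minimal sketch that assumes the files are plain FASTA (one header line starting with ">" per record); the helper names `read_fasta` and `count_sequences` are illustrative, and the example records are made up rather than taken from any PeptideAtlas file.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def count_sequences(lines: Iterable[str]) -> int:
    """Count the FASTA records in an iterable of lines."""
    return sum(1 for _ in read_fasta(lines))


# Tiny in-memory example; the IDs and sequences are invented.
example = """>protA hypothetical entry
MKTAYIAK
QRQISFVK
>protB another hypothetical entry
MSDNE
""".splitlines()

print(count_sequences(example))
```

For a downloaded file, the same function can be applied to an open file handle, for example `with open("Araport11_CORE.fasta") as fh: print(count_sequences(fh))`, and the result compared with the value listed in the table.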

We gratefully acknowledge the support for the Arabidopsis PeptideAtlas from NSF grant 1922871 “TRTech-PGR: A PeptideAtlas for Arabidopsis thaliana and other plant species; harnessing world-wide proteomics data and mining for biological features”.