Each PeptideAtlas build is associated with a reference database, usually a combination of several protein sequence databases for the species (Swiss-Prot, IPI, Ensembl, ...) plus a database of contaminants. Any protein in the reference database that contains at least one observed peptide is considered a member of the Atlas. Because the component databases overlap and contain many closely related sequences, the full list of Atlas proteins is highly redundant. We therefore label each Atlas protein using the terminology below.
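As a minimal sketch of the membership rule, assume the reference database is held in memory as a dict from accession to sequence (the names `atlas_members`, `reference`, and `observed` are hypothetical). A protein joins the Atlas if any observed peptide occurs as an exact substring of its sequence. The real build pipeline may treat details such as I/L equivalence differently; this only illustrates why the resulting list is redundant.

```python
def atlas_members(reference: dict[str, str], observed: set[str]) -> dict[str, set[str]]:
    """Map each reference accession to the observed peptides it contains.

    Hypothetical helper: a protein is an Atlas member if it contains at least
    one observed peptide as an exact substring of its sequence.
    """
    members = {}
    for accession, sequence in reference.items():
        hits = {pep for pep in observed if pep in sequence}
        if hits:  # proteins with no observed peptide are not members
            members[accession] = hits
    return members


# Two overlapping entries for the same protein both become members.
reference = {"P1": "MKTAYIAKQR", "P2": "MKTAYIAKQRGG", "P3": "WWWWWW"}
print(atlas_members(reference, {"TAYIAK"}))  # {'P1': {'TAYIAK'}, 'P2': {'TAYIAK'}}
```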
The term "observed peptides" here refers to the set of peptides in the PeptideAtlas build. These peptides are selected by applying a PSM (peptide-spectrum match) FDR threshold to each experiment separately. (In older builds, peptides were selected by applying a single probability cutoff to all PSMs in the Atlas.)
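The difference between per-experiment and global thresholding can be sketched as follows. This assumes a simple target-decoy FDR estimate (decoy count divided by target count at or above a score cutoff), which the text above does not specify; the function names and the `(peptide, score, is_decoy)` tuple layout are hypothetical.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR is <= max_fdr, or None.

    psms: list of (score, is_decoy). FDR at a cutoff is estimated as
    decoys / targets among PSMs scoring at or above it (an assumption).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    best, targets, decoys = None, 0, 0
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last PSM of a run of tied scores.
        end_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if end_of_tie and targets and decoys / targets <= max_fdr:
            best = score
    return best


def select_peptides(experiments, max_fdr=0.01):
    """Union of target peptides passing a threshold computed per experiment."""
    selected = set()
    for psms in experiments.values():
        cutoff = fdr_threshold([(s, d) for _, s, d in psms], max_fdr)
        if cutoff is None:
            continue
        selected.update(p for p, s, d in psms if not d and s >= cutoff)
    return selected


experiments = {
    "A": [("PEPTIDEA", 10.0, False), ("PEPTIDEB", 9.0, False),
          ("DECOY1", 8.0, True), ("PEPTIDEC", 7.0, False)],
    "B": [("PEPTIDEC", 5.0, False)],
}
print(sorted(select_peptides(experiments)))  # ['PEPTIDEA', 'PEPTIDEB', 'PEPTIDEC']
```

In this toy data, PEPTIDEC passes in experiment B, whereas a single threshold computed over all PSMs together (9.0 here) would have rejected it.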
In any Atlas, the proteins that receive a Presence Level label have the property that no two of them share exactly the same set of observed peptides.
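This property can be checked mechanically by treating each protein's observed peptides as a frozen set. A minimal sketch; `has_distinct_peptide_sets` and its input layout are hypothetical.

```python
def has_distinct_peptide_sets(labeled: dict[str, set[str]]) -> bool:
    """True if no two labeled proteins have exactly the same observed peptides."""
    seen = set()
    for peptides in labeled.values():
        key = frozenset(peptides)  # order-independent and hashable
        if key in seen:
            return False
        seen.add(key)
    return True


print(has_distinct_peptide_sets({"A": {"PEPX", "PEPY"}, "B": {"PEPX"}}))  # True
print(has_distinct_peptide_sets({"A": {"PEPX"}, "B": {"PEPX"}}))          # False
```

Note that one set being a subset of another is allowed; only exact duplicates violate the property.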
Label | Technical definition | Practical definition |
---|---|---|
Canonical | Proteins with at least two 9AA or greater peptides with a total extent of 18AA or greater that are uniquely mapping within the core reference proteome (excludes isoforms, contaminants, and other sequences). | Canonical proteins are the most parsimonious, non-redundant list of proteins derived from the set of identified peptides mapped to the core reference proteome for an Atlas. We use the number of canonical proteins as the protein count for the Atlas build. |
Noncore-Canonical | Proteins with at least two 9AA or greater peptides with a total extent of 18AA or greater that do not map in the core reference proteome, but rather to an isoform, contaminant, or other protein missing from the core reference proteome. | Noncore-canonical proteins are a list of proteins that are very well supported by the set of identified peptides, but are not part of the core reference proteome of a species (often isoforms or contaminants, or other messy cases where the reference proteome is incompletely understood). |
Weak | Protein has more unique peptides than shared peptides, and only one uniquely mapping peptide of 9AA or greater. | This set of proteins has some promising evidence but needs another uniquely mapping peptide of 9AA or greater, forming a total extent of 18AA or greater, to reach canonical status. |
Insufficient evidence | Protein has more unique peptides than shared peptides, but none of the unique peptides is 9AA or greater. | These proteins have some apparently uniquely mapping peptides, but with fewer than 9AA there is too high a chance that such a peptide derives from a sequence missing from our search space. |
Marginally Distinguished | Protein has unique peptides, but not more unique peptides than shared peptides, and the extended length of its unique peptides is < 18AA. | These proteins are highly similar to other proteins with better evidence, but have at least one unique peptide that might be evidence of a distinct sequence. |
Indistinguishable Representative | Protein has no unique peptides, and there are several indistinguishable proteins, but this one is assigned to be the Indistinguishable Representative and the others are Indistinguishable. | These proteins are the leaders of groups of indistinguishable proteins. The peptide evidence indicates that at least one protein in each group is present, but we cannot tell which one, so one is assigned to be the leader (the Indistinguishable Representative). |
Subsumed | Protein has no unique peptides, and at least one other protein contains all of its peptides plus additional ones. | Proteins whose observed peptides are a subset of those of a canonical or other higher-ranked protein are considered subsumed. For any pair of leader/subsumed proteins, it is possible that both have been observed, but it is more conservative to claim that only the leader has been observed. Subsumed proteins are not necessary to explain all observed peptides. |
Representative | Protein is selected as a representative in a situation more complex than a group of indistinguishable proteins: several proteins have shared peptides, and at least some of them must have been detected, but it is not possible to determine which ones. | Representative proteins are selected, in cases of substantial complexity, to be the owners of peptides that are not uniquely mapping and cannot find a home within a canonical or other higher-ranking protein. |
Not detected | Protein entry has no peptides at all. | These proteins have no peptides above the build's selected quality threshold mapping to them. |
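The per-protein rules in the table above can be read as a decision procedure. The sketch below is one hypothetical reading, not the PeptideAtlas implementation: it interprets "total extent" as the number of residues covered by the union of the peptides, assumes the unique/shared split has already been computed, and leaves Indistinguishable Representative and Representative to group-level logic. Cases the table does not define (for example, two unique 9AA peptides whose extent falls short of 18AA) fall through to the nearest label, which the real pipeline may resolve differently.

```python
MIN_PEPTIDE_LEN = 9  # "9AA or greater"
MIN_EXTENT = 18      # "total extent of 18AA or greater"


def extent(sequence: str, peptides: set[str]) -> int:
    """Residues of `sequence` covered by at least one peptide (assumed meaning of 'extent')."""
    covered: set[int] = set()
    for pep in peptides:
        start = sequence.find(pep)
        while start != -1:  # count every occurrence
            covered.update(range(start, start + len(pep)))
            start = sequence.find(pep, start + 1)
    return len(covered)


def classify(sequence, unique, shared, in_core=True, other_peptide_sets=()):
    """Hypothetical per-protein reading of the Presence Level table.

    unique: observed peptides mapping only to this protein.
    shared: observed peptides also mapping to other proteins.
    in_core: whether the entry belongs to the core reference proteome.
    other_peptide_sets: peptide sets of other proteins, used only for Subsumed.
    """
    unique, shared = set(unique), set(shared)
    long_unique = {p for p in unique if len(p) >= MIN_PEPTIDE_LEN}
    if not unique and not shared:
        return "Not detected"
    if len(long_unique) >= 2 and extent(sequence, long_unique) >= MIN_EXTENT:
        return "Canonical" if in_core else "Noncore-Canonical"
    if len(unique) > len(shared):
        return "Weak" if long_unique else "Insufficient evidence"
    if unique:
        return "Marginally Distinguished"
    if any(shared < set(other) for other in other_peptide_sets):
        return "Subsumed"
    # Indistinguishable Representative vs. Representative needs group-level logic.
    return "Needs group context"


seq = "ACDEFGHIKLMNPQRSTVWY" + "YWVTSRQPNMLKIHGFEDCA"
print(classify(seq, ["ACDEFGHIK", "LMNPQRSTVW"], []))  # Canonical
print(classify(seq, ["ACDEFGHIK"], []))                # Weak
print(classify(seq, ["ACDEFG"], ["LMNPQR"]))           # Marginally Distinguished
```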
Proteins that are in the Atlas reference set that are redundant to proteins with the above labels are given the labels below.
Label | Technical definition | Practical definition |
---|---|---|
Indistinguishable | Protein has no unique peptides, and there are several indistinguishable proteins, and this one is assigned to be subordinate to its group leader, the Indistinguishable Representative. | One or more other proteins have exactly the same set of observed peptides as this protein, and one of them was selected as the Indistinguishable Representative. |
Identical | Identical in sequence to a protein with any other label. | This protein entry has a sequence identical to another protein entry; it was given the label Identical and does not compete for status with the other entry, since there is no way to distinguish them. For example, if the core proteome contains two entries with identical sequence, one is labeled Identical and the other is free to compete for canonical or other status. Peptides that map to an entry labeled Identical may still be uniquely mapping. |
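A minimal sketch of the two redundancy labels, under stated assumptions: identical entries are collapsed before uniqueness is judged (which is why a peptide hitting an Identical entry can still be uniquely mapping), and within each group the entry whose accession sorts first becomes the leader. The sort-order tie-break and all function names are hypothetical; the text does not say how leaders are chosen.

```python
def collapse_identical(sequences: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split entries into kept ones and Identical ones (accession -> kept twin).

    Hypothetical tie-break: the accession that sorts first is kept.
    """
    kept, identical, first_by_seq = {}, {}, {}
    for acc in sorted(sequences):
        seq = sequences[acc]
        if seq in first_by_seq:
            identical[acc] = first_by_seq[seq]
        else:
            first_by_seq[seq] = acc
            kept[acc] = seq
    return kept, identical


def is_uniquely_mapping(peptide: str, sequences: dict[str, str]) -> bool:
    """Count distinct sequences, not accessions, that contain the peptide."""
    kept, _ = collapse_identical(sequences)
    return sum(peptide in seq for seq in kept.values()) == 1


def indistinguishable_labels(peptides: dict[str, set[str]]) -> dict[str, str]:
    """Label groups of proteins that share exactly the same observed peptides."""
    groups: dict[frozenset, list[str]] = {}
    for acc in sorted(peptides):
        if peptides[acc]:
            groups.setdefault(frozenset(peptides[acc]), []).append(acc)
    labels = {}
    for accs in groups.values():
        if len(accs) > 1:
            labels[accs[0]] = "Indistinguishable Representative"
            for acc in accs[1:]:
                labels[acc] = "Indistinguishable"
    return labels


sequences = {"P1": "MKTAYIAKQR", "P1b": "MKTAYIAKQR", "P2": "MKTAYIGGGG"}
print(collapse_identical(sequences)[1])          # {'P1b': 'P1'}
print(is_uniquely_mapping("TAYIAK", sequences))  # True
print(indistinguishable_labels({"P1": {"A", "B"}, "P2": {"A", "B"}, "P3": {"A"}}))
```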