PeptideAtlas Tiered Human Integrated Search Proteome

In order to provide human proteomics MS/MS search databases that are well defined, comprehensive, and frequently updated, we have developed an automated system that integrates all of the major sources of human protein sequences into a set of search databases. These databases are tiered into several levels of complexity, from which researchers may choose depending on the goal of the experiment and the data processing resources available.

Description of the Databases

On the first of every month, all protein lists are downloaded from their original sources. If any of them have changed, they are integrated as described in Deutsch et al. (J Proteome Res, 2016) and released here. If none of the source databases have changed, there is no new release. Briefly, the individual levels are as follows:
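The monthly "release only if something changed" rule can be sketched as a comparison of content fingerprints against the previous month. This is a minimal sketch, not the actual THISP pipeline: it assumes each source is fetched as a single file, treats any byte-level difference as a change, and uses SHA-256 and the source names shown purely for illustration.

```python
import hashlib
from typing import Dict, List

def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest used to detect whether a downloaded source changed."""
    return hashlib.sha256(data).hexdigest()

def changed_sources(previous: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """Names of sources that are new or whose content differs from last month."""
    return sorted(
        name for name, data in current.items()
        if previous.get(name) != fingerprint(data)
    )

def should_release(previous: Dict[str, str], current: Dict[str, bytes]) -> bool:
    """A new release is built only if at least one source changed."""
    return bool(changed_sources(previous, current))

# Hypothetical example: only the RefSeq download differs from last month.
last_month = {
    "swissprot": fingerprint(b">sp|A\nMKV\n"),
    "refseq": fingerprint(b">NP_1\nMAA\n"),
}
this_month = {"swissprot": b">sp|A\nMKV\n", "refseq": b">NP_1\nMAAG\n"}
print(changed_sources(last_month, this_month))  # ['refseq']
print(should_release(last_month, this_month))   # True
```

Keeping last month's digests rather than full copies is one simple design choice; a real pipeline might instead compare upstream release numbers.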

Level 1	Includes only the core ~20,000 primary isoforms from Swiss-Prot, plus the Universal Protein Contaminants.
Level 2	Level 1 plus all ~22,000 "varsplic" alternative splice isoforms from Swiss-Prot, and immunoglobulin variable region sequences from Swiss-Prot and IMGT.
Level 3	Level 2 plus GENCODE, the UniProt human reference proteome "UP000005640", and additional non-redundant sequences from other small sources, including microbes, external contributions, and additional RefSeq XP sequences.
Level 4	A "kitchen sink" database that includes Level 3 plus all other distinct sequences from UniProtKB/TrEMBL and RefSeq XP that are not already present in lower levels.
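Because each level is its lower level plus only sequences "not already present", the tiers are cumulative and non-redundant. A minimal sketch of that layering follows; it assumes redundancy means exact sequence identity, which is a simplification of the published integration procedure, and the accessions and sequences are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (accession, amino-acid sequence)

def build_tiers(new_per_level: List[List[Record]]) -> List[List[Record]]:
    """Build cumulative, sequence-non-redundant tiers.

    new_per_level[k] holds the candidate records introduced at level k+1.
    Level k+1 contains everything in level k plus any candidate whose exact
    sequence has not been seen at a lower level (or earlier in the same level).
    """
    seen = set()
    level: List[Record] = []
    tiers: List[List[Record]] = []
    for candidates in new_per_level:
        for accession, sequence in candidates:
            if sequence not in seen:
                seen.add(sequence)
                level.append((accession, sequence))
        tiers.append(list(level))  # snapshot: each tier is a superset of the previous one
    return tiers

tiers = build_tiers([
    [("P12345", "MKTAYIAK"), ("CONT_1", "MKWVTFIS")],   # Level 1: canonical + contaminant
    [("P12345-2", "MKTAYIAKQR"), ("DUP", "MKTAYIAK")],  # Level 2: isoform; DUP is redundant
    [("GENCODE_X", "MSEQNNTE")],                        # Level 3: new sequence
])
print([len(t) for t in tiers])  # [2, 3, 4]
```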

Listing of All Source Databases

Database	Date	# Entries	Level 1	Level 2	Level 3	Level 4
Swiss-Prot canonical	2025-04-01	20,402	20,402	20,402	20,402	20,402
Swiss-Prot + varsplic	2025-04-01	42,503	20,402	42,499	42,499	42,499
GENCODE	2024-11-01	112,218			60,321	60,321
UP000005640	2025-04-01	105,486	20,402	42,499	45,325	45,325
UniProtKB + TrEMBL	2025-04-01	227,104	20,402	42,499	45,325	142,906
NCBI RefSeq NP	2025-04-01	67,727			13,447	12,794
NCBI RefSeq XP	2025-04-01	131,315				50,926
IMGT	2025-04-01	781		781	781	781
Microb	2025-04-01	1,608			1,608	1,608
Contrib	2025-04-01	726,331			726,331	726,331
Contaminant	2024-04-19	499	299	299	299	299
Total # entries			24,255	47,133	851,680	999,534

Download THISP Databases

Below are the monthly releases of the THISP databases available for download. The "Base" download is the set of Level 1-4 FASTA files (target and target-decoy). The "Components" download is the set of all individual source components (from neXtProt, RefSeq, IMGT, cRAP, etc.) used to make the FASTA files in "Base", as described in the THISP article.
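This page does not say how the decoy entries in the target-decoy files are generated or labeled. As an illustration of one common scheme only, the sketch below parses a FASTA file, appends a reversed-sequence decoy for each target, and counts targets versus decoys by header prefix. The `DECOY_` prefix and sequence reversal are assumptions; inspect the headers of the downloaded files before relying on either.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)

def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def with_reversed_decoys(targets: List[Record], prefix: str = "DECOY_") -> List[Record]:
    """Append one reversed-sequence decoy per target (an assumed decoy scheme)."""
    return targets + [(prefix + h, s[::-1]) for h, s in targets]

def count_targets_and_decoys(records: List[Record], prefix: str = "DECOY_") -> Tuple[int, int]:
    """Split a combined database into (targets, decoys) by header prefix."""
    decoys = sum(1 for h, _ in records if h.startswith(prefix))
    return len(records) - decoys, decoys

fasta = """>sp|P12345|EXAMPLE_HUMAN hypothetical entry
MKTAYIAK
QR
>CONT_1 hypothetical contaminant
MKWVTFIS
""".splitlines()
targets = list(read_fasta(fasta))
combined = with_reversed_decoys(targets)
print(count_targets_and_decoys(combined))  # (2, 2)
print(combined[2][1])                      # RQKAIYATKM
```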

2025-04-01	2025-03-01	2025-02-01	2025-01-01	2024-12-01	2024-11-01	2024-10-01	2024-09-01
2024-08-01	2024-07-01	2024-06-01	2024-05-01	2024-04-01	2024-02-01	2024-01-01	2023-12-01
2023-11-01	2023-10-01	2023-09-01	2023-08-01	2023-07-01	2023-06-01	2023-05-01	2023-04-01
2023-03-01	2023-02-01	2023-01-01	2022-12-01	2022-11-01	2022-10-01	2022-09-01	2022-08-01
2022-07-01	2022-06-01	2022-05-01	2022-04-01	2022-03-01	2022-02-01	2022-01-01	2021-12-01
2021-10-01	2021-09-01	2021-08-01	2021-07-01	2021-06-01	2021-05-01	2021-04-01	2021-03-01
2021-02-01	2021-01-01	2020-12-01	2020-11-01	2020-10-01	2020-09-01	2020-08-01	2020-07-01
2020-06-01	2020-05-01	2020-04-01	2020-03-01	2020-02-01	2020-01-01	2019-12-01	2019-10-01
2019-09-01	2019-08-01	2019-07-01	2019-06-01	2019-05-01	2019-04-01	2019-03-01	2019-02-01
2019-01-01	2018-12-01	2018-11-01	2018-10-01	2018-09-01	2018-08-01	2018-07-01	2018-06-01
2018-05-01	2018-04-01	2018-03-01	2018-02-01	2018-01-01	2017-12-01	2017-11-01	2017-10-01
2017-09-01	2017-08-01	2017-07-01	2017-06-01	2017-05-01	2017-04-01	2017-03-01	2017-02-01
2017-01-01	2016-12-01	2016-11-01	2016-10-01	2016-09-01	2016-08-01	2016-07-01	2016-06-01
2016-05-01	2016-04-06	2016-03-01	2016-02-01	2016-01-01	2015-12-01	2015-11-01	2015-10-01
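Because a release is made only in months when a source changed, the listing has gaps (for example, there is no 2024-03-01, 2021-11-01, or 2019-11-01 release). To record which release was current when a dataset was searched, a minimal sketch, assuming a release stays current until the next listed one, is a binary search over the sorted release dates:

```python
import bisect
from datetime import date
from typing import List, Optional

def release_in_effect(releases: List[date], when: date) -> Optional[date]:
    """Return the latest release dated on or before `when`, or None if none exists."""
    ordered = sorted(releases)
    i = bisect.bisect_right(ordered, when)
    return ordered[i - 1] if i else None

# A few dates from the listing above; note there is no 2024-03-01 release.
releases = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 4, 1), date(2024, 5, 1)]
print(release_in_effect(releases, date(2024, 3, 15)))  # 2024-02-01
print(release_in_effect(releases, date(2023, 6, 1)))   # None
```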

Cite

If you use this database, please cite us:

General purpose citation: Deutsch et al., "Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics", J Proteome Res. Author manuscript; available in PMC 2016 Nov 4.