Mapping over 74,000 scales from more than 31,500 APA PsycTests Questionnaires, Surveys, and Tests.
This search engine is designed to help researchers, practitioners, and students discover scales from a database of more than 31,500 questionnaires, surveys, and tests.
Two main modes of operation are currently supported:
Item-based search: Enter multiple survey items (i.e., statements or questions) to find related instruments. This is the preferred mode: it uses a fine-tuned model (magnolia-psychometrics/surveybot3000) that predicts item-pair correlations, yielding higher accuracy.
Scale-based search: Describe a construct, trait, or scale name to find relevant instruments. This mode uses a general-purpose model (sentence-transformers/all-mpnet-base-v2), so results may be more prone to jingle-jangle fallacies.
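To make the item-based mode concrete, here is a minimal sketch of the underlying idea. It assumes the surveybot3000 checkpoint loads as a standard sentence-transformers model; the query items, candidate scales, and the mean-pooling step are illustrative placeholders, not the application's actual retrieval code.

```python
# Minimal sketch of item-based search (illustrative, not production code).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("magnolia-psychometrics/surveybot3000")

# Query: the user's survey items (statements or questions).
query_items = [
    "I am the life of the party.",
    "I feel comfortable around people.",
]

# Stand-ins for scale representations already stored in the database.
candidate_scales = {
    "Extraversion": ["I talk to a lot of different people at parties."],
    "Neuroticism": ["I get stressed out easily."],
}

# Average the item embeddings into one vector per scale, then rank
# candidates by cosine similarity to the query.
query_vec = model.encode(query_items, convert_to_tensor=True).mean(dim=0)
for name, items in candidate_scales.items():
    scale_vec = model.encode(items, convert_to_tensor=True).mean(dim=0)
    score = util.cos_sim(query_vec, scale_vec).item()
    print(f"{name}: {score:.2f}")
```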
Behavioral science faces a proliferation problem: thousands of psychological measures exist, but researchers often struggle to determine whether their "new" construct is genuinely novel or merely reinvents existing measures under different labels. This challenge stems from an inherent asymmetry: Creating new measures is easy and reputationally rewarding, while systematically comparing them against thousands of existing ones is cumbersome and time-consuming. Traditional validation methods simply don't scale to the vast landscape of published measures. This search engine addresses this gap by enabling researchers to quickly identify conceptually similar measures, helping to reduce redundancy, improve construct clarity, and ultimately make psychological science more cumulative and efficient.
While earlier approaches using latent semantic analysis have shown promise (Rosenbusch et al., 2020), this work uses specialized state-of-the-art transformer models and a substantially larger dataset to better predict the empirical relationships between psychological measures.
The data are primarily sourced from the APA PsycTests Database with permission from the American Psychological Association (APA). The database includes approximately 30,000 scale representations derived from around 500,000 individual items that were extracted by parsing PDFs using (visual) language models.
Model accuracy: The accuracy of the magnolia-psychometrics/surveybot3000
model used for item-based search is reported in detail in the sources referenced below (Hommel & Arslan, 2025).
In brief, when predicting scale correlations, the model's synthetic estimates converge with empirical correlations from human respondents at r = .83 (95% CI [.81, .85]; manifest correlations).
For scale-based search using construct labels, please see Wulff & Mata (2025) on the accuracy of the pre-trained model (sentence-transformers/all-mpnet-base-v2).
Data trustworthiness: While we strive for high data quality, please note that the items and scales are extracted automatically from source PDFs.
As such, there may be occasional discrepancies or errors in the extracted text. We recommend verifying any critical information against the original source when possible.
The extraction process involved two models: a visual language model (Qwen/Qwen2.5-VL-7B-Instruct)
to parse the PDF layout and a large language model (Qwen/Qwen3-32B) to interpret and structure the content.
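For intuition, the two-stage flow can be sketched as below; the function names, record layout, and warning field are illustrative assumptions, not the project's actual pipeline code.

```python
# Schematic sketch of the two-stage extraction flow (hypothetical names).
from dataclasses import dataclass, field

@dataclass
class ScaleRecord:
    scale_name: str
    items: list[str]
    warnings: list[str] = field(default_factory=list)  # quality flags

def transcribe_pages(pdf_path: str) -> str:
    """Stage 1: in the real pipeline, a visual language model
    (Qwen/Qwen2.5-VL-7B-Instruct) parses each PDF page and transcribes
    it while preserving the survey layout."""
    raise NotImplementedError

def structure_transcript(raw_text: str) -> list[ScaleRecord]:
    """Stage 2: a large language model (Qwen/Qwen3-32B) interprets the
    raw transcript and structures it into scale/item records, attaching
    warning codes for passages it is unsure about."""
    raise NotImplementedError

def extract_scales(pdf_path: str) -> list[ScaleRecord]:
    # Chain the two stages: page transcription, then structuring.
    return structure_transcript(transcribe_pages(pdf_path))
```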
Errors in transcription may occur, especially with complex survey layouts or poor scan quality. Warning codes are provided to flag potential issues (see below).
Warning codes indicate potential data quality issues detected during automated AI-extraction from source PDFs. While these codes do not necessarily imply errors, they highlight areas that may require additional scrutiny. Regardless of warning codes, we recommend verifying any critical information against the original source when possible.
In item-based search, cosine similarity scores are closely aligned with the correlation coefficients one would expect when administering two scales to human respondents. In scale-based search, similarity scores reflect semantic relatedness between the search term and the scale and/or instrument labels in the database.
While cosine similarity theoretically ranges from -1 to 1, sentence vectors rarely yield negative scores because their many abstract linguistic features are predominantly positively correlated.
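As a toy illustration of the score itself (made-up numbers, not real embeddings):

```python
# Cosine similarity between two embedding vectors; toy values only.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sentence embeddings tend to share many positively correlated features,
# so even vectors pointing in rather different directions usually score
# above zero.
a = np.array([0.9, 0.2, 0.4])
b = np.array([0.1, 0.8, 0.3])
print(cosine(a, b))  # ~0.43: positive despite the differing directions
```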
Unfortunately, due to copyright restrictions, we cannot display individual survey items (statements/questions) directly on the platform. To access full instrument details, please follow the DOI link to obtain the items from the original source.
If you spot an error or have feedback, please let us know by opening an issue in this GitHub repository or by contacting us via email.
SynthNet Search uses a specialized embedding model to predict the relatedness of survey items and scales. The database currently includes about 30,000 scale representations obtained by embedding statements from approximately 500,000 items. It is designed to facilitate the discovery of relevant measures in the behavioral sciences.
For the main application, please cite the following work:
The conceptual search functionality is powered by a specialized language model trained to predict relationships between psychological items and scales:
This research is part of the SYNTH research project (#546323839), conducted within the META-REP Priority Program, which aims to improve the replicability, reproducibility, and generalizability of empirical research in the behavioral sciences. SYNTH contributes to these goals by integrating large language models into current research workflows, reducing burdens on scientists while improving transparency and replicability.
Björn E. Hommel
Postdoctoral Researcher
Department of Personality Psychology and Psychological Assessment
Wilhelm Wundt Institute of Psychology | Leipzig University
bjoern.hommel@uni-leipzig.de

We gratefully acknowledge the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) for funding this work (#546323839), and the American Psychological Association (APA) for granting permission to use the APA PsycTests Database for this research.
This web application is maintained by magnolia psychometrics GmbH.