Instructions

This is a search engine designed to help researchers, practitioners, and students discover scales from a database of approximately 30,000 questionnaires, surveys, and tests.

Two main modes of operation are currently supported:

Item-Based Search

Enter multiple survey items (i.e., statements or questions) to find related instruments. This is the preferred search mode: it uses a fine-tuned model (magnolia-psychometrics/surveybot3000) to predict item-pair correlations, yielding higher accuracy.
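
To illustrate the idea, here is a minimal sketch of how such item-pair similarities could be computed locally, assuming the checkpoint loads with the sentence-transformers library; the item texts are invented examples, and the platform's actual pipeline may differ:

```python
from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned embedding model (assumption: the checkpoint is
# compatible with the sentence-transformers API).
model = SentenceTransformer("magnolia-psychometrics/surveybot3000")

# Two hypothetical survey items (statements).
items = [
    "I am the life of the party.",
    "I feel comfortable around people.",
]

# Encode both items; their cosine similarity is trained to approximate
# the correlation one would observe between responses to the two items.
embeddings = model.encode(items)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Predicted item-pair correlation: {similarity.item():.2f}")
```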

Scale-Based Search

Describe a construct, trait, or scale name to find relevant instruments. This mode uses a general-purpose model (sentence-transformers/all-mpnet-base-v2), and its results may be more prone to jingle-jangle fallacies.
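
As a rough sketch, scale-based search can be thought of as ranking scale labels by their embedding similarity to the query; the scale names below are invented stand-ins for the database, and the platform's actual index and ranking logic may differ:

```python
from sentence_transformers import SentenceTransformer, util

# General-purpose embedding model used for scale-based search.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Hypothetical scale/instrument labels standing in for the database.
scale_labels = [
    "Grit Scale",
    "Conscientiousness Inventory",
    "Trait Anxiety Questionnaire",
]

query = "perseverance and passion for long-term goals"

# Rank the labels by cosine similarity to the query embedding.
query_embedding = model.encode(query)
label_embeddings = model.encode(scale_labels)
hits = util.semantic_search(query_embedding, label_embeddings, top_k=3)[0]

for hit in hits:
    print(f"{scale_labels[hit['corpus_id']]}: {hit['score']:.2f}")
```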

Motivation: Why do we need this?

Behavioral science faces a proliferation problem: thousands of psychological measures exist, but researchers often struggle to determine whether their "new" construct is genuinely novel or merely reinvents existing measures under different labels. This challenge stems from an inherent asymmetry: Creating new measures is easy and reputationally rewarding, while systematically comparing them against thousands of existing ones is cumbersome and time-consuming. Traditional validation methods simply don't scale to the vast landscape of published measures. This search engine addresses this gap by enabling researchers to quickly identify conceptually similar measures, helping to reduce redundancy, improve construct clarity, and ultimately make psychological science more cumulative and efficient.

While earlier approaches using latent semantic analysis have shown promise (Rosenbusch et al., 2020), this work uses specialized state-of-the-art transformer models and a substantially larger dataset to better predict the empirical relationships between psychological measures.

FAQ

What are the data sources for the search engine?

The data are primarily sourced from the APA PsycTests Database with permission from the American Psychological Association (APA). The database includes approximately 30,000 scale representations derived from around 500,000 individual items that were extracted by parsing PDFs using (visual) language models.

How trustworthy and accurate are the results?

Model accuracy: The accuracy of the magnolia-psychometrics/surveybot3000 model used for item-based search is reported in detail in the sources referenced below (Hommel & Arslan, 2025). In brief, when predicting scale correlations, the synthetic model estimates converge with empirical correlations from human respondents with an accuracy of r = .83 (95% CI [.81; .85]; manifest).

For scale-based search using construct labels, please see Wulff & Mata (2025) on the accuracy of the pre-trained model (sentence-transformers/all-mpnet-base-v2).

Data trustworthiness: While we strive for high data quality, please note that the items and scales are extracted automatically from source PDFs. As such, there may be occasional discrepancies or errors in the extracted text. We recommend verifying any critical information against the original source when possible. The extraction process involved two models: a visual language model (Qwen/Qwen2.5-VL-7B-Instruct) to parse the PDF layout and a large language model (Qwen/Qwen3-32B) to interpret and structure the content. Errors in transcription may occur, especially with complex survey layouts or poor scan quality. Warning codes are provided to flag potential issues (see below).

What are warning codes and what do they mean?

Warning codes indicate potential data quality issues detected during automated AI-extraction from source PDFs. While these codes do not necessarily imply errors, they highlight areas that may require additional scrutiny. Regardless of warning codes, we recommend verifying any critical information against the original source when possible.

What does the similarity score mean?

In item-based search, cosine similarity scores are closely aligned with the correlation coefficients one would expect when administering two scales to human respondents. In scale-based search, similarity scores reflect semantic relatedness between the search term and the scale and/or instrument labels in the database.

While cosine similarity theoretically ranges from -1 to 1, sentence embeddings rarely yield negative scores because the abstract linguistic features they encode are predominantly positively correlated.
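
For reference, a minimal sketch of the underlying computation, using toy vectors in place of real sentence embeddings:

```python
import numpy as np

# Toy vectors standing in for two sentence embeddings.
a = np.array([0.8, 0.3, 0.5])
b = np.array([0.6, 0.4, 0.7])

# Cosine similarity: the dot product of the vectors divided by the
# product of their Euclidean norms; bounded between -1 and 1.
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"Cosine similarity: {cosine:.2f}")
```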

How can I access the full scales and items? Can I download or export search results?

Unfortunately, due to copyright restrictions, we cannot display individual survey items (statements/questions) directly on the platform. To access full instrument details, please:

  • Click the DOI link in search results to access the original source
  • Contact the instrument authors for permission to use the full measure
  • Check institutional access through your library's APA PsycTests subscription

Found a bug in the app or an error in the data?

Please let us know by opening an issue in this GitHub repository or by contacting us via email.

Research & Citation

SynthNet Search

SynthNet Search uses a specialized embedding model to predict the relatedness of survey items and scales. The database currently includes about 30,000 scale representations derived from the embedded statements of approximately 500,000 items. It is designed to facilitate the discovery of relevant measures in the behavioral sciences.

For the main application, please cite the following work:

Hommel, B. E., Külpmann, A. I., & Arslan, R. C. (2025). The Synthetic Nomological Net: A search engine to identify conceptual overlap in measures in the behavioral sciences. Manuscript in preparation.

Embedding Model: SurveyBot3000

The conceptual search functionality is powered by a specialized language model trained to predict relationships between psychological items and scales:

Hommel, B. E., & Arslan, R. C. (2025). Language models accurately infer correlations between psychological items and scales from text alone. Manuscript submitted for publication.

About the Project

This research is part of the SYNTH research project (#546323839), which is conducted as part of the META-REP Priority Program, an initiative aiming to improve the replicability, reproducibility, and generalizability of empirical research in the behavioral sciences. SYNTH contributes to these goals by integrating large language models into current research workflows to reduce burdens on scientists while improving transparency and replicability.

SYNTH Research Team

Björn E. Hommel

Postdoctoral Researcher

University of Leipzig

Ruben C. Arslan

Professor, Psychological Research Methods

University of Witten

Annika Külpmann

Doctoral Student

University of Witten

Malte Elson

Professor, Psychology of Digitalisation

University of Bern

Jamie Cummins

Senior Postdoctoral Researcher

University of Bern

Beth Clarke

Postdoctoral Researcher

University of Bern

Corresponding Author

Björn E. Hommel

Postdoctoral Researcher

Department of Personality Psychology and Psychological Assessment

Wilhelm Wundt Institute of Psychology | Leipzig University

bjoern.hommel@uni-leipzig.de

Acknowledgements

We gratefully acknowledge the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) for funding this work (#546323839), and the American Psychological Association (APA) for granting permission to use the APA PsycTests Database for this research.

This web application is maintained by magnolia psychometrics GmbH.