Narcissism Test

I created this proposal for a new instrument to measure narcissism for our assessments class at the University of Rochester.

Implicit Association Test Instrument for Narcissism (IAT-Narc)

Abstract

Narcissism is a psychological construct of growing interest, yet most existing assessments rely on self-report instruments that are prone to social desirability bias and lack comprehensive construct validity. The present study introduces the Implicit Association Test for Narcissism (IAT-Narc), a novel instrument designed to assess unconscious narcissistic traits using reaction-time tasks. By leveraging the Implicit Association Test methodology (Greenwald et al., 2017) and adopting the triarchic model of narcissism (Wright & Edershile, 2017)—which includes grandiosity, entitlement, and vulnerability—IAT-Narc aims to overcome the limitations of traditional explicit measures. This paper outlines the development of IAT-Narc and a four-phase empirical validation process: Exploratory Factor Analysis, Confirmatory Factor Analysis, Convergent Validity Assessment, and Test-Retest Reliability. The researchers emphasize ethical considerations and cultural sensitivity throughout the design and deployment of the instrument. The goal is to contribute a psychometrically robust tool for measuring narcissism that captures implicit processes often missed by self-report approaches and supports clinical and research applications.

Keywords: narcissism, implicit association test, grandiosity, vulnerability, entitlement, psychometrics, triarchic model



Literature Review

Introduction

Interest in narcissism is high in both academic research and popular culture, partly due to concerns that Western societies are becoming more narcissistic (Miller et al., 2014). Existing self-report instruments are limited by social desirability bias (Heinze et al., 2020) and weak construct validity, particularly in distinguishing between grandiose and vulnerable subtypes (Wright & Edershile, 2017). Implicit Association Tests (IATs) offer a promising alternative by capturing unconscious processes through reaction-time tasks (Greenwald et al., 2000). This project introduces IAT-Narc, a linguistically informed IAT designed to address these limitations.

Limitations of Self-Report Measures

Self-report measures of narcissism face limitations. Wright and Edershile (2017) highlight the heated debate surrounding the construct of narcissism, arguing that the most widely used measure, the NPI (used in 77% of studies), measures only grandiose narcissism and overlooks its counterpart, vulnerable narcissism. They propose a new “triarchic” model of narcissism, which places “entitlement” at its core, fluidly expressed as either the grandiosity or vulnerability sub-construct, depending on personality traits and context.

Miller et al. (2014) agree with the concerns about construct validity and consistency across instruments. They studied several self-report measures of narcissism, including the NPI-16, PNI, PDQ-4 NPD, and PID-5, comparing each with expert ratings. They found that the NPI-16 correlated significantly with grandiose narcissism but failed to correlate with vulnerable traits. The remaining self-report instruments showed only moderate or mixed correlations.

In a second 2014 study, Miller et al. compared additional instruments (the FFNI, PNI, HSNS, and NPI-16) against expert ratings. They found the NPI-16 and FFNI-G most strongly correlated with grandiose narcissism. The HSNS correlated with vulnerable narcissism, yet somewhat imprecisely, as it also appeared to correlate with neuroticism and low extraversion, raising concerns about construct specificity. The other instruments showed only moderate or mixed correlations.

Heinze et al. (2020) also discuss the limitations of self-report measures of narcissism due to social desirability bias. They claim a narcissistic individual may adjust their responses to appear less narcissistic. Greenwald and Banaji (2017) describe this bias as a fundamental problem in all explicit self-report tools. Wright and Edershile (2018) also emphasize that individuals may lack insight and present their self-reports defensively.

These findings highlight the ongoing limitations in self-report measures of narcissism, including differentiating grandiose and vulnerable narcissism, the insufficiency of currently available self-reporting to address bias, and the lack of agreement in instrument measurement. There appears to be an opportunity to more comprehensively define the construct of narcissism and create a new measurement to address these limitations. Implicit Association Tests (IATs) may be a way to do this.

Cognitive and Implicit Processes

An alternative to self-report measurement is the IAT. According to Greenwald and Banaji (2017), IATs represent a revolution in measuring constructs. Self-report measures typically capture only conscious (“controlled” or “explicit”) constructs, but through IATs, we can measure unconscious (“automatic” or “implicit”) constructs. Under “dual-process” theory, this distinction allows us to account for the fallible nature of conscious processes. Greenwald and Banaji (2017) describe how IATs are implemented by presenting computerized word-sorting tasks to participants and measuring their reaction times. The premise is that when there is an unconscious bias to associate two words, reaction times will be faster.

Kurdi et al. (2021) respond to criticism that IATs may not measure distinct implicit constructs, but rather a variant of explicit attitudes. Dual-process theory states that there are two different constructs, for example, explicit conscious narcissism and implicit unconscious narcissism. Self-report tests measure a conscious version, and IATs measure an unconscious version. Kurdi et al. (2021) respond to extensive critiques by Schimmack (2021), demonstrating that even if the dual-process theory is incomplete and IATs measure only an implicit sub-construct of the explicit construct, they still offer new and valid information beyond what self-report tools provide.

Heinze et al. (2020) agree with the limits of self-report tests and the potential of IATs, as demonstrated by their work building an Antagonistic Narcissism IAT (AN-IAT) to measure a particular form of grandiose narcissism. In their IAT, participants sorted words into categories like “Me” vs. “Not Me” and “Narcissistic” vs. “Not Narcissistic.” They conducted three studies with the tool. The first (N = 224) established construct validity by comparing AN-IAT scores to self-report scores (e.g., NPI, PNI). The second (N = 210) tested temporal stability, finding high reliability within sessions (0.88) and moderate reliability after one week (0.64). The third (N = 648) validated the AN-IAT against third-party informant ratings from people who knew the participant. Their IAT tool proved valid and reliable.

The word list used in any IAT is critical to its construct validity. Heinze et al. (2020) selected words for their Antagonistic Narcissism IAT by manually extracting terms from existing self-report measures that they believed reflected the construct. However, this subjective approach introduces the risk of human error and lacks systematic validation. It remains unclear whether the selected words comprehensively and accurately represent the targeted dimension of narcissism. Moreover, their tool was limited to grandiose traits and did not account for the expanded, triarchic model of narcissism, including grandiosity and vulnerability mediated by entitlement (Wright & Edershile, 2018). To address these gaps, researchers must more closely examine the linguistic markers of narcissism.

Linguistic Correlates

Elleuch et al. (2024) reviewed 43 studies on the psycholinguistic features of grandiose narcissism. They found that narcissistic individuals often use boastful, dominant, control-focused language, downplaying the achievements of others. Common patterns included flattery, condescension, impulsive and aggressive language, and expressions of superiority. They also noted that while researchers have widely validated the NPI for grandiose traits, it does not adequately capture vulnerable narcissism. In a study by Holtzman et al. (2019), researchers analyzed 4,941 texts across 15 content types—including social media posts, essays, and video transcripts. They used the Linguistic Inquiry and Word Count (LIWC) tool, which generates 72 linguistic variables (“effects”) to characterize text. Seventeen of these effects were significantly associated with narcissism scores as measured by the NPI. Strong positive correlations included words related to sports, second-person pronouns (e.g., “you”), profanity, and sexual content. In contrast, individuals with higher narcissism scores tended to use fewer words reflecting anxiety, fear, tentativeness, and sensory experiences (e.g., “see,” “hear”). 

Zhang et al. (2023) agree that people often reveal narcissistic traits through their everyday language. They examined how narcissistic traits manifest in everyday language among older adults (N = 281, ages 65–89). Researchers asked participants to complete the NPI-16 and then provided them with an Android-based recording device that captured random 30-second snippets of their daily conversations. The researchers collected and transcribed 28,323 usable audio samples, which were analyzed using a machine-learning model in conjunction with the LIWC tool. They found that individuals with higher narcissism scores used more personal and group pronouns (e.g., “I,” “we,” “you,” “they”), achievement-related words (e.g., “win,” “success”), causal language (e.g., “because,” “since,” “therefore”), often used to justify or frame a desired state over a current one, and terms related to sex—indicating consistent linguistic markers of narcissism in naturalistic settings.

These findings demonstrate a clear and consistent link between narcissism and language use. An IAT that incorporates linguistically relevant stimuli and measures participants’ response times may improve our ability to assess narcissism more accurately and expand the construct to include both grandiose and vulnerable traits.

Conclusion

Both social desirability bias and a narrow focus on grandiose traits limit existing self-report measures of narcissism. Implicit Association Tests (IATs) offer a promising alternative by capturing unconscious processes and bypassing the limitations of explicit self-reporting. Researchers can design an IAT to reflect Wright and Edershile’s (2018) triarchic model of narcissism—comprising grandiosity and vulnerability, mediated by entitlement—thereby providing a more comprehensive and valid assessment of the construct.

A critical component of IAT development is selecting words that accurately represent the constructs researchers aim to measure (Heinze et al., 2020; Greenwald & Banaji, 2017). While prior studies have relied on expert consensus to curate these word lists, the current project proposes a hybrid approach that combines expert judgment with artificial intelligence (AI), specifically large language models (LLMs), to assist in generating initial word sets. This integration of AI with human expertise aims to improve the breadth and precision of the item pool beyond what manual selection alone can offer.



Methodology

Development of the IAT-Narc Implicit Association Test (IAT)

We developed a novel Implicit Association Test to assess implicit narcissistic traits: the IAT-Narc. We populated the IAT-Narc with carefully selected word items.

Word Items

The first step in this process is to generate the lists of words used by the IAT tool. We will need eight word lists, each mapping to a different sub-construct of narcissism or to the Self and Other constructs. As proposed earlier, we will use a hybrid approach to generate the word lists with the help of AI. We tested prompts in OpenAI's ChatGPT (GPT-4o) for this proposal. The table below summarizes each of the word lists. Appendix A shows the exact prompt provided and the word list output received.

We hypothesize that an exploratory factor analysis (EFA) will reduce the initial 324-word item pool to approximately 84–144 items, retaining 10–20 high-loading words per sub-construct across the six narcissism dimensions (three traits and their antonyms). We will retain an additional 24 words for the Self and Other categories (12 each) to support the target categorization components of the IAT.

To implement the test, we will utilize MinnoJS, an open-source framework from Project Implicit, and integrate it into Qualtrics for seamless administration. IAT-Narc will measure implicit narcissistic associations through reaction-time-based word categorization tasks, which we will organize into four separate components.

Task 1. Target Categorization

The test begins with a target categorization task, where participants classify 12 self-related words (such as Me, Myself, and Mine) and 12 other-related words (such as They, Them, and Their). The program presents each word on the screen one at a time, in randomized order, for a maximum of 1500 milliseconds or until the participant responds. We instruct participants to press the "E" key to categorize a word as "Me" and the "I" key to categorize it as "Not Me." The software records each reaction time in milliseconds. If a participant does not respond within 1500 milliseconds, the system logs the maximum value and proceeds to the next word. This task establishes a baseline reaction time for distinguishing between self-related and other-related concepts.

Task 2. Attribute Categorization

Following the target categorization, participants will proceed to the attribute categorization task, where they will classify 60 words as either "Narcissistic" (e.g., Admired, Special, Insecure) or "Not Narcissistic" (e.g., Humble, Modest, Secure). The same process and key mapping will be used, with "E" assigned to narcissistic words and "I" assigned to non-narcissistic words.

Task 3. Combined (First Critical Test)

In the first critical test phase (Combined Task), we merge the two categorization tasks so that participants sort words based on the pairing of Me + Narcissistic versus Not Me + Not Narcissistic. During this task, each participant sees 72 words—60 narcissistic-related and 12 self-related—displayed randomly. The program presents each word one at a time, and participants press the "E" key if the word matches the Me + Narcissistic category or the "I" key if it matches Not Me + Not Narcissistic.

Task 4. Reversed (Second Critical Test)

In the second critical test phase (Reversed Task), we reverse the category pairings to measure implicit resistance to associating oneself with narcissistic traits. Participants now sort words according to Me + Not Narcissistic versus Not Me + Narcissistic, again using the “E” key for Me + Not Narcissistic and the “I” key for Not Me + Narcissistic. As in the combined task, the program presents each participant with 72 words per block—60 narcissistic-related and 12 self-related—randomly ordered.

Dependent Measure, Reaction Times

The system records reaction times for each categorization, and we analyze differences in response latency between congruent (Me + Narcissistic) and incongruent (Me + Not Narcissistic) pairings to assess the strength of implicit narcissistic associations. Faster reaction times when participants pair self-related words with narcissistic traits indicate higher implicit narcissism scores, while slower responses in those conditions may reflect weaker implicit narcissistic tendencies. We express reaction time data primarily through the D-score, as detailed in the Data Analysis Plan.

Participants and Sampling Procedures

After constructing the instrument as described above, the study will recruit adult participants from MTurk and university participant pools, ensuring a diverse demographic sample. Inclusion criteria require participants to be 18 or older, fluent in English, and U.S. residents. Participants will complete an informed consent form and provide demographic information (e.g., age, gender identity, race/ethnicity, education level, socioeconomic status, primary language, and any diagnosed cognitive impairments).

Participants must use a computer with internet access that is compatible with the study software. Exclusion criteria include non-adults, non-fluent English speakers, individuals with cognitive impairments affecting reaction time tasks, and those who fail software compatibility or attention checks. This approach ensures high-quality data collection and validity in measuring implicit narcissism associations. The study will then have four phases.

Phase 1: Exploratory Factor Analysis (EFA)

This phase will involve an initial sample of N = 300–600 participants. The purpose is to explore the underlying factor structure of the IAT-Narc item pool and identify high-loading items for retention.

Phase 2: Confirmatory Factor Analysis (CFA)

In Phase 2, we will recruit a separate sample of N = 400–600 participants to validate the factor structure identified in Phase 1. We will use Confirmatory Factor Analysis (CFA) to assess model fit and factor loadings, thereby confirming the latent structure of the refined word set.

Phase 3: Convergent Validity Assessment

To evaluate convergent validity, N = 400–600 participants will complete the IAT-Narc along with two self-report measures: the NPI-16, a standardized measure of explicit grandiose narcissism, and the Hypersensitive Narcissism Scale (HSNS), a measure of vulnerable narcissism. Correlational analyses will assess the relationship between implicit and explicit narcissism scores.

Phase 4: Test-Retest Reliability

A subset of N = 100–150 participants from Phase 3 will be re-administered the IAT-Narc approximately two weeks later. Phase 4 will allow for assessing temporal stability and internal consistency over time.

Data Analysis Plan

Below, we will describe statistical procedures for construct and predictive validity, including exploratory and confirmatory factor analysis and a longitudinal test.

D-scores for Word Items

Within each phase, we will calculate a D-score for every word item. IAT research commonly uses D-scores to quantify implicit associations. We derive each D-score by computing the difference in reaction times between congruent and incongruent IAT blocks (e.g., Me + Narcissistic vs. Me + Not Narcissistic) and adjusting for variability (Greenwald et al., 2003). The standardized software we use—MinnoJS—automatically computes D-scores. For clarity, we also outline a simplified version of the calculation process below:

  1. Remove extremely fast trials (<300 ms) and cap trials longer than 1500 ms at 1500 ms.

  2. Penalize incorrect trials by replacing the reaction time (RT) with the block mean + 600 ms.

  3. Calculate the mean RT for each block task (Combined vs. Reversed).

  4. Calculate the pooled standard deviation across both critical blocks.

  5. Calculate the difference between mean RT’s in blocks (e.g., incongruent – congruent).

  6. Divide the difference by the pooled standard deviation to get the D-score.

D-score = (Mean RT_incongruent − Mean RT_congruent) / SD_pooled
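The simplified steps above can be sketched in Python. This is an illustrative implementation only; MinnoJS applies the full Greenwald et al. (2003) algorithm, which also handles practice blocks and other details not shown here.

```python
import numpy as np

def d_score(rt_combined, rt_reversed, correct_combined, correct_reversed):
    """Simplified IAT D-score from two blocks of reaction times (in ms).

    Follows the six simplified steps outlined in the text; not the full
    Greenwald et al. (2003) scoring algorithm.
    """
    def clean(rt, correct):
        rt = np.asarray(rt, dtype=float)
        correct = np.asarray(correct, dtype=bool)
        keep = rt >= 300                        # step 1: drop trials faster than 300 ms
        rt, correct = rt[keep], correct[keep]
        rt = np.minimum(rt, 1500)               # step 1: cap slow trials at 1500 ms
        penalty = rt[correct].mean() + 600      # step 2: error penalty = block mean + 600 ms
        return np.where(correct, rt, penalty)

    a = clean(rt_combined, correct_combined)    # congruent block (Me + Narcissistic)
    b = clean(rt_reversed, correct_reversed)    # incongruent block (Me + Not Narcissistic)
    pooled_sd = np.concatenate([a, b]).std(ddof=1)   # step 4: SD across both blocks
    return (b.mean() - a.mean()) / pooled_sd         # steps 5 and 6
```

For example, a participant who is uniformly 200 ms slower in the reversed block than in the combined block receives a positive D-score scaled by the pooled variability of their responses.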

A typical output after executing the IAT is shown in Table 1 below.

Table 1 - Typical IAT Data Output

From the results in Table 1, we would calculate D-scores for each participant item and transform our output into a format similar to that in Table 2 below.

Table 2 - Participant × Item D-score matrix

Phase 1: Exploratory Factor Analysis (EFA)

Using the D-score technique described above, we will execute Phase 1 according to our outlined methodology and generate a result set like the one shown in Table 2—a Participant × Item D-score matrix. Since we do not need participant ID data for this analysis, we will remove that column, resulting in an n × m matrix of participants by items. Before running the EFA, we will perform two key diagnostic checks: Bartlett’s Test of Sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy.

Bartlett’s Test of Sphericity

To determine whether our dataset is suitable for factor analysis, we will first conduct Bartlett’s Test of Sphericity. This test tells us whether the words in our dataset are related enough to one another to form meaningful groups or patterns. If the words were completely unrelated, factor analysis would not be useful. The test yields a p-value; if the p-value is less than 0.05, the relationships between the words are strong enough to continue with factor analysis. We will use Python for our data analysis with the “factor_analyzer” Python package.
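As a sketch of what this diagnostic computes, the statistic behind Bartlett's test can be reproduced directly from the correlation matrix; in practice we would simply call the "factor_analyzer" package, which returns the same pair of values.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Bartlett's test of sphericity on a participants x items matrix.

    H0: the item correlation matrix is an identity matrix (items are
    unrelated). Returns the chi-square statistic and its p-value,
    mirroring what factor_analyzer reports.
    """
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # Statistic is based on the log-determinant of the correlation matrix
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)
```

A p-value below .05 indicates the items intercorrelate enough to proceed with the EFA.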

Kaiser-Meyer-Olkin (KMO) Test

To further assess the suitability of our dataset for factor analysis, we will conduct a Kaiser-Meyer-Olkin (KMO) Test. This test assesses how suited our data is for factor analysis based on the proportion of variance among variables that might be common variance (i.e., shareable through factors). This test returns a KMO Value from 0 to 1. A value above 0.80 suggests our data is suitable for factor analysis. Between 0.60 and 0.80, our data should be adequate. Below 0.60, we may consider removing words with low variance before continuing. We will again use Python for our data analysis with the “factor_analyzer” Python package.
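For illustration, the overall KMO value can be computed in plain NumPy as the ratio of squared correlations to squared correlations plus squared partial correlations; this is a sketch of the statistic the "factor_analyzer" package reports.

```python
import numpy as np

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure for a participants x items matrix.

    Uses off-diagonal correlations and partial correlations (derived from
    the inverse correlation matrix). Values near 1 indicate shared variance
    suitable for factoring.
    """
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                       # partial correlation matrix
    mask = ~np.eye(corr.shape[0], dtype=bool)    # off-diagonal elements only
    r2 = (corr[mask] ** 2).sum()
    p2 = (partial[mask] ** 2).sum()
    return r2 / (r2 + p2)
```

We would interpret the returned value against the thresholds above (≥ 0.80 good, 0.60–0.80 adequate).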

Principal Axis Factoring (PAF) with Oblique Rotation (Promax)

After the diagnostic checks, we will use Principal Axis Factoring (PAF) to perform the EFA, applying an oblique rotation method, specifically Promax. The input to the PAF will be the same participant × item matrix described above. We will again use the “factor_analyzer” Python package, which will provide results such as those in Table 3 below.

Table 3 - EFA Output

Each value in the matrix is called a factor loading. A factor loading represents how strongly a word is associated with a particular underlying factor. If a word has a high loading (e.g., ≥ 0.40) on one factor and low loadings on all others, it is considered a good candidate for that factor. Words with low loadings across all factors (e.g., < 0.30) typically do not meaningfully relate to any factor and are usually removed. If a word has high loadings on multiple factors, it is said to cross-load, which makes it ambiguous — these items are also usually removed to maintain clear factor interpretation.
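The retention rules just described can be expressed as a small filter over the loading matrix. The function and thresholds below are illustrative, using the example cutoffs from the text (primary loading ≥ 0.40, all other loadings < 0.30).

```python
import numpy as np

def retain_items(loadings, items, keep=0.40, floor=0.30):
    """Apply the item-retention rules to an items x factors loading matrix.

    Keeps an item when its largest absolute loading is >= `keep` and every
    other loading stays below `floor` (i.e., no cross-loading). Returns a
    mapping from item name to the index of its assigned factor.
    """
    loadings = np.abs(np.asarray(loadings, dtype=float))
    retained = {}
    for item, row in zip(items, loadings):
        top = int(row.argmax())
        others = np.delete(row, top)
        if row[top] >= keep and (others < floor).all():
            retained[item] = top
    return retained
```

Items that fail either rule (low everywhere, or cross-loading) simply drop out of the returned mapping, matching the removal decisions described above.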

Scree Plot & Eigenvalues

We will then decide how many factors to keep in our instrument, e.g., Entitlement, Grandiosity, Vulnerability, their antonyms, Self and Other, or a new set of constructs found by the EFA. We will examine both eigenvalues and a scree plot to do this. An eigenvalue represents the amount of variance explained by a factor. We will retain all factors with an eigenvalue greater than 1.0, per the Kaiser criterion. A scree plot visually displays the eigenvalues in descending order. Based on this analysis, we will update our factors and related word items to form the new instrument structure.
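A minimal sketch of the Kaiser criterion: compute the eigenvalues of the item correlation matrix (the values a scree plot would display) and count how many exceed 1.0.

```python
import numpy as np

def kaiser_retained_factors(data):
    """Eigenvalues of the item correlation matrix and the Kaiser count.

    Returns the eigenvalues sorted in descending order (as plotted on a
    scree plot) and the number of factors with eigenvalue > 1.0.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return eigvals, int((eigvals > 1.0).sum())
```

Because the eigenvalues of a correlation matrix sum to the number of items, a first eigenvalue well above 1.0 signals a dominant common factor.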

Phase 2: Confirmatory Factor Analysis (CFA)

We will then proceed to Confirmatory Factor Analysis (CFA) to validate the factor structure we identified in Phase 1. In this phase, we will recruit and test a new sample of N = 400–600 participants using the same methodology. The CFA will evaluate whether this new data supports the refined structure of the IAT-Narc—comprising up to eight factors: three narcissism traits, their antonyms, and Self/Other categories.

The CFA process begins with two main inputs: (1) a participant × item D-score matrix that we generate from the new sample and (2) a hypothesized model in which we specify which items we expect to load onto each latent factor based on the results of the EFA. We will implement the CFA using the Python library "semopy". The software will compare the observed correlation (or covariance) matrix derived from the D-score data with the model's implied (predicted) correlation matrix. It will then calculate standardized factor loadings and several model fit indices—including CFI, TLI, RMSEA, SRMR, and the Chi-Square test—to evaluate how well the hypothesized model fits the observed data.

Standardized Factor Loadings

A key output of the CFA will be a matrix mapping each item to its corresponding factor, with an associated factor loading value, similar to Table 3. If the model is a good fit, each item should display a high loading (e.g., ≥ 0.40) on its assigned factor and minimal cross-loading on other factors.

Model Indices

The Python package "semopy" will output the model indices listed below from our CFA. We will use these to evaluate model fit and validate our instrument.

Comparative Fit Index (CFI). Compares the fit of our model to a null model with no relationships. Values ≥ .90 indicate acceptable fit; ≥ .95 indicate excellent fit.

Tucker-Lewis Index (TLI). Similar to CFI, but includes a penalty for model complexity. Values ≥ .90 indicate acceptable fit.

Root Mean Square Error of Approximation (RMSEA). The RMSEA estimates the model's error of approximation per degree of freedom. Values < .08 indicate reasonable fit; < .05 indicate close fit.

Standardized Root Mean Square Residual (SRMR). Measures the average difference between observed and predicted correlations. Values < .08 indicate a good fit.

Chi-Square Test. Tests whether the observed and implied correlation matrices differ significantly. A non-significant p-value (p > .05) suggests a good fit, though this test is sensitive to sample size and may be significant in large samples.
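For illustration, CFI, TLI, and RMSEA can be computed from the model and baseline (null) chi-square values using their standard formulas; semopy reports these directly, so the sketch below only shows where the numbers come from. SRMR requires the residual correlation matrix and is omitted.

```python
import math

def fit_indices(chisq_model, df_model, chisq_null, df_null, n):
    """CFI, TLI, and RMSEA from model and baseline chi-square statistics.

    Standard formulas; `n` is the sample size. A sketch of what an SEM
    package computes internally, not a replacement for its output.
    """
    d_model = max(chisq_model - df_model, 0.0)   # noncentrality of our model
    d_null = max(chisq_null - df_null, 0.0)      # noncentrality of the null model
    cfi = 1.0 - d_model / max(d_null, d_model, 1e-12)
    tli = ((chisq_null / df_null) - (chisq_model / df_model)) / \
          ((chisq_null / df_null) - 1.0)
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return {"CFI": cfi, "TLI": tli, "RMSEA": rmsea}
```

A model whose chi-square barely exceeds its degrees of freedom, against a badly fitting null model, yields CFI and TLI near 1 and a small RMSEA.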

Phase 3: Convergent Validity Assessment

In Phase 3, we will assess the convergent validity of the IAT-Narc using a new sample of N = 400–600 participants, whom we will recruit using the same inclusion criteria and procedures outlined earlier. Participants will complete three instruments: the IAT-Narc, the Narcissistic Personality Inventory-16 (NPI-16), and the Hypersensitive Narcissism Scale (HSNS). Researchers commonly use the NPI-16 to measure grandiose narcissism, while the HSNS captures elements of vulnerable narcissism. Because the IAT-Narc targets grandiose and vulnerable dimensions, these self-report scales are appropriate benchmarks for convergent validity.

Example Data Output

Upon completing this phase, the dataset will include each participant's implicit D-scores for grandiose and vulnerable traits, a composite total score, and their self-reported scores from the NPI-16 and HSNS. Table 4 illustrates a hypothetical sample of the expected data structure:

Table 4. Example Data Output from Convergent Validity Test

Correlational Analysis

We will begin with Pearson correlation analyses between the IAT-Narc D-scores and both self-report instruments. Specifically:

  • IAT Grandiose D-score will be correlated with NPI-16 scores.

  • IAT Vulnerable D-score will be correlated with HSNS scores.

  • IAT Total D-score will be correlated with the average of NPI-16 and HSNS scores (standardized if needed).

A strong positive correlation (e.g., r ≥ .30) between the IAT Grandiose D-score and NPI-16 would suggest that the implicit measure aligns well with explicit grandiose traits. Similarly, a moderate to strong correlation (r ≥ .20) between the IAT Vulnerable D-score and HSNS would support the instrument’s sensitivity to vulnerable traits. A significant correlation between the total D-score and combined narcissism measures would support overall convergent validity.
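The three planned correlations can be sketched as follows. The column names (e.g., `iat_grandiose_d`, `npi16`) are hypothetical placeholders for the Phase 3 export; the input may be a pandas DataFrame or a plain dict of NumPy arrays.

```python
import numpy as np
from scipy.stats import pearsonr

def convergent_correlations(df):
    """Pearson r and p for each planned IAT / self-report pairing.

    Column names are illustrative placeholders, not the final export
    schema. Self-report scales are z-scored before averaging, per the
    'standardized if needed' note in the analysis plan.
    """
    z = lambda s: (s - s.mean()) / s.std(ddof=1)
    explicit_total = (z(df["npi16"]) + z(df["hsns"])) / 2
    pairs = {
        "grandiose_vs_npi16": pearsonr(df["iat_grandiose_d"], df["npi16"]),
        "vulnerable_vs_hsns": pearsonr(df["iat_vulnerable_d"], df["hsns"]),
        "total_vs_explicit": pearsonr(df["iat_total_d"], explicit_total),
    }
    return {name: {"r": r, "p": p} for name, (r, p) in pairs.items()}
```

Each returned r would then be read against the thresholds above (e.g., r ≥ .30 for the grandiose pairing).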

Multiple Regression Analyses

Next, we will conduct multiple regression analyses to test whether the IAT-Narc scores predict self-reported narcissism. In these models, we will treat NPI-16 and HSNS scores as dependent variables and use the IAT Grandiose D-score and IAT Vulnerable D-score as predictors.

Key output metrics will include:

  • R² (R-squared): Indicates the proportion of variance explained in the self-report scores by the IAT-Narc predictors. Values above .10 are meaningful in psychological research, with R² ≥ .30 considered strong.

  • β (Beta coefficient): Reflects the strength and direction of the relationship. Values above β = 0.30 suggest moderate predictive power.

  • p-values: Values less than .05 indicate statistical significance.

This analysis will show not only whether the IAT predicts self-report outcomes but also which subcomponents (grandiose or vulnerable) contribute most strongly to those predictions.
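A minimal ordinary-least-squares sketch of these regressions: standardizing both sides makes the slopes interpretable as β coefficients. P-values, which we would obtain from a statistics package such as statsmodels, are omitted here.

```python
import numpy as np

def regress(y, X_cols):
    """OLS of a self-report score on IAT predictors via least squares.

    Returns R-squared and standardized beta coefficients (one per
    predictor). An illustrative sketch, not a full inferential model.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack(X_cols).astype(float)
    # Standardize outcome and predictors so slopes are betas
    zy = (y - y.mean()) / y.std(ddof=1)
    zX = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    A = np.column_stack([np.ones(len(zy)), zX])       # add intercept column
    coef, *_ = np.linalg.lstsq(A, zy, rcond=None)
    resid = zy - A @ coef
    r2 = 1 - (resid ** 2).sum() / ((zy - zy.mean()) ** 2).sum()
    return {"R2": float(r2), "betas": coef[1:].tolist()}
```

In practice, `y` would be the NPI-16 (or HSNS) total and `X_cols` the grandiose and vulnerable D-score columns.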

Structural Equation Model (SEM)

Finally, we will use Structural Equation Modeling (SEM) to assess how well our proposed theoretical model fits the observed data. SEM combines elements of regression and factor analysis to test the overall factor structure and interrelations between constructs.

The inputs to the SEM will include the IAT-Narc subscale scores, NPI-16, and HSNS scores, as well as a model specification derived from the triarchic theory of narcissism. SEM will estimate how well the observed data match the predicted structure.

 We will evaluate model fit using the following indices:

  • CFI (Comparative Fit Index): ≥ .90 indicates acceptable fit; ≥ .95 indicates excellent fit.

  • TLI (Tucker-Lewis Index): ≥ .90 desirable.

  • RMSEA (Root Mean Square Error of Approximation): < .08 reasonable; < .05 excellent.

  • SRMR (Standardized Root Mean Square Residual): < .08 desirable.

  • Chi-Square Test (χ²): A non-significant p-value (e.g., p > .05) suggests a good fit, though this metric is sensitive to sample size.

  • Standardized Path Coefficients (β): Indicate the strength of relationships among latent variables; values above .30 are considered moderate.

This analysis will allow us to validate the measurement model of IAT-Narc and its conceptual alignment with explicit measures of narcissism.

Cross-Sectional Experimental Tests for Construct Validity

We will collect scores on NPI-16 for each individual completing IAT-Narc. This will allow us to conduct a cross-sectional analysis. If scores of IAT-Narc have a high correlation to NPI-16, we have evidence of construct validity.

Longitudinal Experimental Tests for Construct Validity

We will administer the test again two weeks later for a subset of participants (N = 100–150) who complete the IAT-Narc in Phase 3. Although this two-week interval does not constitute an actual longitudinal design—since it does not span months or years—it still provides valuable short-term reliability data. Future studies can build on this work by implementing long-term longitudinal designs to evaluate construct validity further.

Phase 4: Test–Retest Reliability

To assess the temporal stability of the IAT-Narc, we will conduct a test-retest reliability analysis in Phase 4 of the study. We will invite a subset of participants (N = 100–150) from Phase 3 to complete the IAT-Narc a second time, approximately two weeks after their initial session. We chose this interval to minimize memory effects while still capturing meaningful test stability over time.

In this phase, we aim to determine whether the IAT-Narc yields consistent results when participants take it twice under similar conditions. If participants' scores on the second administration strongly correlate with their initial scores, we can conclude that the instrument reliably measures implicit narcissism over time. Such results would suggest that the IAT-Narc captures stable personality traits rather than temporary mood states, distractions, or other short-term influences.

Intraclass Correlation Coefficient (ICC)

The primary statistical metric for this assessment will be the Intraclass Correlation Coefficient (ICC), which is commonly used in psychological research to quantify test-retest reliability. ICC values range from 0 to 1, with higher values indicating greater reliability, based on established guidelines (Koo & Li, 2016). We will calculate separate ICCs for IAT Grandiose D-scores, IAT Vulnerable D-scores, and IAT Total D-scores. This breakdown allows us to examine whether each dimension of the IAT-Narc—grandiose, vulnerable, and combined—demonstrates adequate reliability.
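One common form, the two-way mixed, consistency, single-measure ICC(3,1), can be computed from test and retest scores as below. Koo and Li (2016) describe several ICC variants; the choice of form is a design decision and this sketch assumes ICC(3,1).

```python
import numpy as np

def icc_3_1(scores):
    """Two-way mixed, consistency, single-measure ICC(3,1).

    `scores` is an n_participants x n_sessions array (here, D-scores at
    test and retest). Computed from the standard two-way ANOVA
    decomposition of the score matrix.
    """
    Y = np.asarray(scores, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum()    # between participants
    ss_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum()    # between sessions
    ss_error = ((Y - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Applied separately to the grandiose, vulnerable, and total D-score columns, this yields the three ICCs described above.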

If ICC values meet or exceed the .70 threshold across subconstructs, this will indicate that the IAT-Narc captures stable implicit associations over time, supporting its use as a reliable psychological instrument. Test-retest reliability is especially critical for implicit measures, which are sensitive to subtle changes in cognition, mood, and context. Strong reliability results would show that the IAT-Narc is consistently reproducible across time points.
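As an illustration, the ICC(2,1) described above (two-way random effects, absolute agreement, single measure, following Koo & Li, 2016) can be computed directly from test-retest score pairs. The function and sample data below are a hypothetical sketch, not part of the study protocol:

```python
from statistics import mean

def icc_2_1(sessions):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `sessions` is a list of per-participant score tuples, e.g.
    [(test_d_score, retest_d_score), ...].
    """
    n = len(sessions)       # participants (rows)
    k = len(sessions[0])    # administrations (columns; 2 for test-retest)
    grand = mean(x for row in sessions for x in row)
    row_means = [mean(row) for row in sessions]
    col_means = [mean(col) for col in zip(*sessions)]

    # Two-way ANOVA decomposition of the score matrix
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in sessions for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical test-retest D-score pairs for five participants
pairs = [(0.2, 0.25), (0.5, 0.45), (0.8, 0.75), (0.3, 0.35), (0.6, 0.55)]
print(round(icc_2_1(pairs), 3))
```

In practice a statistics package (e.g., R's `irr` or Python's `pingouin`) would also supply confidence intervals, which the .70 threshold should be judged against.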

Additional Reliability Considerations

Although test–retest is the primary focus of this phase, we also note additional reliability constructs for completeness:

Internal Consistency Reliability. Split-half reliability and Cronbach’s Alpha (α ≥ .70 acceptable) could be calculated for each subconstruct post-Phase 1 and Phase 2, although this is more traditionally suited for explicit instruments with item redundancy. Since IATs rely on reaction time and concept-pairing rather than item similarity, internal consistency is less critical but still informative if adapted to D-scores.
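For instance, a split-half check adapted to D-scores could correlate each participant's D-score computed from odd-numbered trials with the D-score from even-numbered trials, then apply the Spearman-Brown correction to estimate full-length reliability. The sketch below assumes such per-half scores are already available; the sample values are hypothetical:

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman_brown(r):
    """Step a half-test correlation up to full-test reliability."""
    return (2 * r) / (1 + r)

# Hypothetical odd-trial vs. even-trial D-scores for four participants
odd_d = [0.10, 0.40, 0.70, 0.20]
even_d = [0.15, 0.35, 0.65, 0.25]
r_half = pearson_r(odd_d, even_d)
print(round(spearman_brown(r_half), 3))
```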

Alternate Form Reliability. We did not assess Alternate Form Reliability in this study, but we could explore it in future research by creating two parallel versions of IAT-Narc and evaluating score equivalence.

Inter-Rater Reliability. This is not applicable in the context of IATs, as scoring is fully automated and does not involve human interpretation. However, it could be used if we asked experts to rate narcissism based on DSM criteria.

Data Interpretation Scheme

To support the practical application of the IAT-Narc, we propose a clear data interpretation scheme grounded in normative comparison and construct-level mapping. The IAT-Narc produces D-scores, which are standardized effect sizes derived from differences in response times between congruent and incongruent block pairings (Greenwald et al., 2003). These D-scores range roughly from -2 to +2 and reflect the strength of implicit associations between the self and narcissistic attributes. We have given a further breakdown in Table 5 below.

Table 5. IAT-Narc D-score Interpretation and Cut-offs

These cutoffs follow recommendations by Greenwald et al. (2003). They will be applied separately to subscale scores (e.g., IAT Grandiose D-score, IAT Vulnerable D-score) and the IAT Total D-score.
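The D-score computation described above can be sketched as follows. This is a simplified illustration for a single block pair; the full improved algorithm (Greenwald et al., 2003) additionally drops latencies over 10,000 ms, applies error penalties, and averages across practice and test block pairs:

```python
from statistics import mean, stdev

def d_score(congruent_rts, incongruent_rts):
    """Simplified D-score for one block pair: the mean latency
    difference divided by the standard deviation of all latencies
    pooled across both blocks. Latencies (ms) are assumed to be
    cleaned already (e.g., trials > 10,000 ms removed)."""
    pooled_sd = stdev(congruent_rts + incongruent_rts)
    return (mean(incongruent_rts) - mean(congruent_rts)) / pooled_sd

# Hypothetical trial latencies: slower responses on incongruent
# pairings yield a positive D-score (stronger self-narcissism link)
print(round(d_score([600, 650, 620], [800, 850, 830]), 2))
```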

Norm Referencing (Final Score Interpretation)

To support interpretability and normative referencing, we will generate descriptive statistics from our Phase 3 data, including the mean (M) and standard deviation (SD) for each IAT-Narc D-score: grandiose, vulnerable, and total. These metrics allow us to establish normative benchmarks for interpreting individual scores. We can also extend this process to include percentile rankings or z-score conversions for more granular assessment.
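The z-score and percentile conversions mentioned above could look like the following sketch, assuming a normative M and SD drawn from Phase 3 and a normal approximation for percentile ranks (the norm values shown are hypothetical):

```python
from math import erf, sqrt

def z_score(d, norm_mean, norm_sd):
    """Standardize an individual's D-score against group norms."""
    return (d - norm_mean) / norm_sd

def percentile(z):
    """Percentile rank under a normal approximation: the standard
    normal CDF expressed via the error function, scaled to 0-100."""
    return 50 * (1 + erf(z / sqrt(2)))

# Hypothetical norms: M = 0.30, SD = 0.20 for the Total D-score
z = z_score(0.50, 0.30, 0.20)   # one SD above the norm mean
print(round(z, 2), round(percentile(z), 1))
```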

Social and Cultural Diversity

We will initially norm IAT-Narc using data from English-speaking U.S. residents, with extensive demographic information collected during the norming process. However, cultural and linguistic biases may still arise—particularly due to varying familiarity with specific word stimuli across educational and regional backgrounds. Future adaptations to enhance inclusivity could involve translating the instrument into other languages, modifying word lists to reflect vernacular differences, or even incorporating graphical stimuli in place of text. Crucially, any use of the IAT-Narc outside the original norming population will require re-norming and validation to ensure cultural and linguistic appropriateness.

Potential Impact

IAT-Narc has the potential to make meaningful contributions to both clinical practice and psychological research. Clinically, it could serve as an early screening tool for narcissistic traits in contexts where social desirability bias often undermines traditional assessments—such as forensic evaluations, workplace assessments, and therapeutic settings. IAT-Narc offers a more nuanced and less biased alternative to self-report measures by tapping into implicit processes. In research contexts, the instrument could advance the field’s understanding of narcissism by distinguishing between implicit and explicit constructs and clarifying subdimensions such as grandiosity, vulnerability, and entitlement. Over time, this may support the development of more precise interventions to address narcissistic traits.

Ethical Considerations

We will approach this study in alignment with ethical standards set by the ACA Code of Ethics and current best practices in psychological assessment. All participants will provide informed consent and receive a clear explanation of the purpose, benefits, and limitations of the IAT-Narc. We will obtain IRB approval for all work. We will inform participants that the instrument is a screening tool, not a diagnostic test.

We will deidentify all participant data to protect privacy, store it securely using encrypted systems, and destroy it after analysis. We will mitigate risks such as distress or misclassification by providing contact information for mental health resources, including a debriefing at the end of participation. Additionally, only individuals with appropriate training and scope of practice will administer the instrument. Future adaptations of the tool will comply with ADA requirements, including offering accessible formats where needed.

Conclusion

We propose the development of IAT-Narc in response to the limitations of existing self-report measures, which often fail to capture unconscious processes and inadequately represent vulnerable dimensions of narcissism. The IAT-Narc offers a more comprehensive and less biased assessment of narcissistic traits by incorporating reaction-time methodology and a linguistically grounded, triarchic framework.

This new instrument has the potential to improve both research and clinical practice by revealing implicit narcissistic tendencies that traditional tools overlook. Its applications include early detection, more nuanced personality assessment, and enhanced diagnostic accuracy in settings where social desirability or self-insight limitations undermine explicit self-reporting.

References

Elleuch, D. (2024). Narcissistic personality disorder through psycholinguistic analysis and neuroscientific correlates. Frontiers in Behavioral Neuroscience, 18, 1354258. https://doi.org/10.3389/fnbeh.2024.1354258

Greenwald, A. G., & Banaji, M. R. (2017). The implicit revolution: Reconceiving the relation between conscious and unconscious. American Psychologist, 72(9), 861–871. https://doi.org/10.1037/amp0000238

Greenwald, A. G., Nosek, B. A., & Banaji, M. R. (2003). Understanding and using the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and Social Psychology, 85(2), 197–216. https://doi.org/10.1037/0022-3514.85.2.197

Heinze, P. E., Fatfouta, R., & Schröder-Abé, M. (2020). Validation of an implicit measure of antagonistic Narcissism. Journal of Research in Personality, 88, 103993. https://doi.org/10.1016/j.jrp.2020.103993

Holtzman, N. S., Tackman, A. M., Carey, A. L., Brucks, M. S., Küfner, A. C. P., Deters, F. G., Back, M. D., Donnellan, M. B., Pennebaker, J. W., Sherman, R. A., & Mehl, M. R. (2019). Linguistic markers of grandiose Narcissism: A LIWC analysis of 15 samples. Journal of Language and Social Psychology, 38(5–6), 773–786. https://doi.org/10.1177/0261927X19871084

Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012

Kurdi, B., Ratliff, K. A., & Cunningham, W. A. (2021). Can the Implicit Association Test serve as a valid measure of automatic cognition? A response to Schimmack (2021). Perspectives on Psychological Science, 16(2), 422–434. https://doi.org/10.1177/1745691620904080

Miller, J. D., Lynam, D. R., & Campbell, W. K. (2014). Measures of Narcissism and their relations to DSM-5 pathological traits: A critical reappraisal. Assessment, 23(1), 3–9. https://doi.org/10.1177/1073191114522909

Miller, J. D., McCain, J., Lynam, D. R., Few, L. R., Gentile, B., MacKillop, J., & Campbell, W. K. (2014). A comparison of the criterion validity of popular measures of Narcissism and narcissistic personality disorder via the use of expert ratings. Psychological Assessment, 26(3), 958–969. https://doi.org/10.1037/a0036613

Schimmack, U. (2021). The Implicit Association Test: A method in search of a construct. Perspectives on Psychological Science, 16(2), 396–414. https://doi.org/10.1177/1745691619863798

Wright, A. G. C., & Edershile, E. A. (2017). Issues resolved and unresolved in pathological Narcissism. Current Opinion in Psychology, 21, 74–79. https://doi.org/10.1016/j.copsyc.2017.10.001

Zhang, S., Fingerman, K. L., & Birditt, K. S. (2023). Detecting Narcissism from older adults’ daily language use: A machine learning approach. The Journals of Gerontology, Series B: Psychological Sciences and Social Sciences, 78(9), 1493–1500. https://doi.org/10.1093/geronb/gbad061

Appendix A

We are developing an Implicit Association Test (IAT) to measure Narcissism using a triarchic model, which includes three sub-constructs:

• Grandiosity

• Entitlement

• Vulnerability

We need you to generate eight separate word lists to be used as stimuli in the IAT. These should reflect language individuals high or low in Narcissism might implicitly associate with themselves or others.

Please generate the following:

1. 40 words that reflect Grandiosity

2. 40 words that reflect Entitlement

3. 40 words that reflect Vulnerability

4. 40 antonyms or conceptual opposites of Grandiosity

5. 40 antonyms or conceptual opposites of Entitlement

6. 40 antonyms or conceptual opposites of Vulnerability

7. 12 words representing “Self” (e.g., me, mine, myself, I)

8. 12 words representing “Other” (e.g., they, them, others)

Guidelines:

• Use words that would be understandable to a general adult population in the U.S. and appropriate for a psychological IAT.

• Avoid duplicates or overly similar words across lists.

• Where possible, draw inspiration from psychological and psycholinguistic research on Narcissism, including real-world language usage (e.g., social media, speech).

• Prioritize concrete, emotionally evocative, and linguistically simple terms over abstract jargon.

Format the output clearly in 8 labeled lists, each with the list name and the corresponding 40 or 12 words in bullet or comma-separated format.

Here is a full set of carefully curated word lists for your Implicit Association Test (IAT) based on the triarchic model of Narcissism. Each list is designed to reflect clear, emotionally resonant, and commonly understood language, in line with psychological research and real-world language cues.

1. 40 Words Reflecting Grandiosity

1. Superior

2. Dominant

3. Impressive

4. Important

5. Powerful

6. Outstanding

7. Accomplished

8. Glorious

9. Admired

10. Elite

11. Radiant

12. Genius

13. Heroic

14. Talented

15. Unstoppable

16. Respected

17. Gifted

18. Bold

19. Dazzling

20. Champion

21. Supreme

22. Royal

23. Triumphant

24. Majestic

25. Almighty

26. Leader

27. Confident

28. Unmatched

29. Celebrated

30. Visionary

31. Brave

32. Iconic

33. Influential

34. Charismatic

35. Divine

36. Invincible

37. Golden

38. Victorious

39. Renowned

40. Magnificent

2. 40 Words Reflecting Entitlement

1. Deserving

2. Owed

3. Privileged

4. Special

5. Exempt

6. First

7. Demanding

8. Rewarded

9. Superior

10. Exclusive

11. VIP

12. Chosen

13. Preferred

14. Exceptional

15. Granted

16. Rightful

17. Authorized

18. Entitled

19. Worthy

20. Best

21. Priority

22. Select

23. Favored

24. Indulgent

25. Taken

26. Reserved

27. Claimed

28. Inherited

29. Permitted

30. Powerful

31. Irreplaceable

32. Unique

33. Specially-treated

34. Earned

35. Expectant

36. Reward-worthy

37. Insistent

38. Exalted

39. Nominated

40. Selected

3. 40 Words Reflecting Vulnerability

1. Ashamed

2. Insecure

3. Lonely

4. Fragile

5. Hurt

6. Sensitive

7. Powerless

8. Anxious

9. Needy

10. Rejected

11. Fearful

12. Sad

13. Exposed

14. Uncertain

15. Small

16. Nervous

17. Inferior

18. Hopeless

19. Abandoned

20. Miserable

21. Shy

22. Isolated

23. Timid

24. Broken

25. Lost

26. Scared

27. Worthless

28. Embarrassed

29. Weak

30. Tearful

31. Crushed

32. Guilty

33. Defenseless

34. Wounded

35. Unloved

36. Hesitant

37. Invisible

38. Humble

39. Awkward

40. Overwhelmed

4. 40 Antonyms or Opposites of Grandiosity

1. Ordinary

2. Modest

3. Average

4. Small

5. Unimportant

6. Plain

7. Humble

8. Simple

9. Quiet

10. Lowly

11. Normal

12. Forgettable

13. Dull

14. Meek

15. Unnoticed

16. Weak

17. Submissive

18. Shy

19. Timid

20. Unremarkable

21. Minor

22. Uncertain

23. Reserved

24. Cautious

25. Passive

26. Powerless

27. Anxious

28. Nervous

29. Hidden

30. Insecure

31. Anonymous

32. Undervalued

33. Scared

34. Soft-spoken

35. Plainspoken

36. Silent

37. Obscure

38. Low-profile

39. Restrained

40. Invisible

5. 40 Antonyms or Opposites of Entitlement

1. Grateful

2. Humble

3. Accepting

4. Selfless

5. Patient

6. Tolerant

7. Considerate

8. Giving

9. Fair

10. Sharing

11. Equal

12. Content

13. Respectful

14. Unassuming

15. Cooperative

16. Appreciative

17. Modest

18. Yielding

19. Thoughtful

20. Sacrificing

21. Generous

22. Tame

23. Non-demanding

24. Reserved

25. Submissive

26. Obedient

27. Dutiful

28. Obliged

29. Accommodating

30. Kind

31. Honest

32. Calm

33. Understanding

34. Forgiving

35. Gentle

36. Agreeable

37. Balanced

38. Charitable

39. Conceding

40. Receptive

6. 40 Antonyms or Opposites of Vulnerability

1. Confident

2. Secure

3. Strong

4. Fearless

5. Bold

6. Brave

7. Steady

8. Resilient

9. Calm

10. Composed

11. Grounded

12. Empowered

13. Certain

14. Assertive

15. Tough

16. Stoic

17. Independent

18. Capable

19. Tough-minded

20. Dominant

21. Whole

22. Solid

23. Unshaken

24. Proud

25. Assured

26. Competent

27. Self-reliant

28. Fierce

29. Cool-headed

30. Protected

31. Strong-willed

32. Collected

33. Balanced

34. Unafraid

35. Controlled

36. Stable

37. Courageous

38. Focused

39. Undaunted

40. Resolved

7. 12 Words Representing “Self”

1. Me

2. Myself

3. I

4. Mine

5. My

6. Self

7. Owner

8. Ego

9. Body

10. Identity

11. Face

12. Soul

8. 12 Words Representing “Other”

1. They

2. Them

3. Theirs

4. Others

5. Stranger

6. Outsider

7. You

8. Your

9. He

10. She

11. Their

12. Group

Let me know if you’d like these lists formatted for input into your IAT software (like Inquisit or jsPsych), or if you’d like suggestions for pilot testing or validation.

Ryan Bohman

Mental Health Counseling apprentice, amateur philosopher and recovering tech bro and entrepreneur.

https://www.gnosis.health