AI-Generated Images of Familiar Faces Are Indistinguishable from Real Photographs

Bibliographic Details
Title:	AI-Generated Images of Familiar Faces Are Indistinguishable from Real Photographs
Language:	English
Authors:	Robin S. S. Kramer (ORCID 0000-0001-8339-8832), Alex L. Jones, Daniel Fitousi, Jeremy J. Tree
Source:	Cognitive Research: Principles and Implications. 2025 10.
Availability:	Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Peer Reviewed:	Y
Page Count:	16
Publication Date:	2025
Document Type:	Journal Articles Reports - Research
Descriptors:	Artificial Intelligence, Human Body, Photography, Adults, Foreign Countries, Pictorial Stimuli, Identification, Familiarity, Accuracy
Geographic Terms:	United States, Canada, United Kingdom, Australia, New Zealand
DOI:	10.1186/s41235-025-00683-w
ISSN:	2365-7464
Abstract:	Human users are now able to generate synthetic face images with artificial intelligence (AI) tools. Although indistinguishable from real photographs, these images have tended to feature fictional identities that do not exist in the real world. As a result, their use in applied contexts, including the spread of fake information, is similarly limited. Here, we investigated a new method for generating face images (via ChatGPT plus DALL-E) and its application to both fictional and real (in this case, celebrity) identities. Our results demonstrated that generated images of both fictional (Experiment 1) and celebrity identities (Experiment 2) could not be distinguished from real photographs. Further, providing additional real photographs for comparison during the task resulted in limited gains (Experiments 3 and 4). Finally, prior familiarity with celebrity faces produced only modest performance improvements. Therefore, new methods of detection should be explored as a matter of urgency since the latest 'off the shelf' AI tools can now generate face images of real people that are essentially undetectable as synthetic to most human observers.
Abstractor:	As Provided
Notes:	https://osf.io/fmuh5
Entry Date:	2026
Accession Number:	EJ1491516
Database:	ERIC

<title id="AN0188649318-1">AI-generated images of familiar faces are indistinguishable from real photographs</title> <p>Human users are now able to generate synthetic face images with artificial intelligence (AI) tools. Although indistinguishable from real photographs, these images have tended to feature fictional identities that do not exist in the real world. As a result, their use in applied contexts, including the spread of fake information, is similarly limited. Here, we investigated a new method for generating face images (via ChatGPT plus DALL·E) and its application to both fictional and real (in this case, celebrity) identities. Our results demonstrated that generated images of both fictional (Experiment 1) and celebrity identities (Experiment 2) could not be distinguished from real photographs. Further, providing additional real photographs for comparison during the task resulted in limited gains (Experiments 3 and 4). Finally, prior familiarity with celebrity faces produced only modest performance improvements. 
Therefore, new methods of detection should be explored as a matter of urgency since the latest 'off the shelf' AI tools can now generate face images of real people that are essentially undetectable as synthetic to most human observers.</p> <p>Keywords: Face perception; Deepfakes; ChatGPT; Artificial intelligence; Information and Computing Sciences Artificial Intelligence and Image Processing</p> <p>Supplementary Information The online version contains supplementary material available at https://doi.org/10.1186/s41235-025-00683-w.</p> <hd id="AN0188649318-2">Introduction</hd> <p>Early attempts to create synthetic human faces in domains including robotics and entertainment tended to lack realism, often falling into the "uncanny valley" (Mori, 1970/[<reflink idref="bib37" id="ref1">37</reflink>]) and eliciting unease or repulsion in observers (for reviews, see Kätsyri et al., [<reflink idref="bib23" id="ref2">23</reflink>]; Wang et al., [<reflink idref="bib50" id="ref3">50</reflink>]). More recently, advances in artificial intelligence (AI) have resulted in techniques that are now capable of traversing this valley successfully. For instance, images synthesised using generative adversarial networks (GANs) like StyleGAN2 (Karras et al., [<reflink idref="bib22" id="ref4">22</reflink>], [<reflink idref="bib21" id="ref5">21</reflink>]) are indistinguishable from, and often more realistic than, real face photographs (e.g. Bray et al., [<reflink idref="bib3" id="ref6">3</reflink>]; Kramer &amp; Cartledge, [<reflink idref="bib28" id="ref7">28</reflink>]; Lago et al., [<reflink idref="bib32" id="ref8">32</reflink>]; Miller et al., [<reflink idref="bib36" id="ref9">36</reflink>]; Nightingale &amp; Farid, [<reflink idref="bib39" id="ref10">39</reflink>]; Tucciarelli et al., [<reflink idref="bib49" id="ref11">49</reflink>]). Problematically, such images are now being used in fake social media profiles (e.g. 
Ricker et al., [<reflink idref="bib44" id="ref12">44</reflink>]) with the possibility of malicious intent (e.g. to influence public opinion or political behaviour).</p> <p>To date, generating synthetic face images has been limited to new identities (i.e. 'people' who do not exist). Acceptance as genuine requires 'only' that these faces achieve anatomical realism and photographic quality. Beyond this, viewers have no expectations regarding how each face should look. In contrast, producing synthetic images of familiar faces represents a more challenging task by incorporating a further requirement – the image must fall within the boundaries of plausibility (based on prior knowledge; Burton et al., [<reflink idref="bib7" id="ref13">7</reflink>]) for that specific person. Previous research has shown that viewers are sensitive to even small alterations to familiar face images (e.g. Brédart &amp; Devue, [<reflink idref="bib4" id="ref14">4</reflink>]; Brooks &amp; Kemp, [<reflink idref="bib5" id="ref15">5</reflink>]; Diel &amp; Lewis, [<reflink idref="bib10" id="ref16">10</reflink>]; O'Donnell &amp; Bruce, [<reflink idref="bib40" id="ref17">40</reflink>]; Sandford &amp; Bindemann, [<reflink idref="bib48" id="ref18">48</reflink>]). Indeed, familiarity is often tested by pairing a known face with a visually similar foil (e.g. Pozo et al., [<reflink idref="bib42" id="ref19">42</reflink>]; Robertson et al., [<reflink idref="bib46" id="ref20">46</reflink>]). Therefore, successful synthesis of familiar faces represents a particularly difficult task but also one with virtually limitless applications.</p> <p>Synthesising realistic/believable images of familiar faces may be more challenging than the generation of unfamiliar ones because we are widely acknowledged to be experts with the former (Young &amp; Burton, [<reflink idref="bib53" id="ref21">53</reflink>]). 
This expertise stems from stable representations tuned to specific identities (Burton et al., [<reflink idref="bib7" id="ref22">7</reflink>]) that, as a result, support recognition in even poor viewing conditions (Burton et al., [<reflink idref="bib6" id="ref23">6</reflink>]). However, it remains unclear as to whether this same expertise aids or impedes judgments of authenticity. If familiarity improves sensitivity to the information that defines a person's unique appearance, it might help observers in detecting synthetic versions of that individual. Conversely, if familiarity fosters tolerance for natural variation (Brédart &amp; Devue, [<reflink idref="bib4" id="ref24">4</reflink>]; Ge et al., [<reflink idref="bib15" id="ref25">15</reflink>]) then AI-synthesised images that match the identity's appearance sufficiently may be accepted as genuine. Therefore, examining how familiarity might influence the detection of synthetic images represents an important test for established mechanisms of face perception when applied to this newly created class of AI-generated stimuli.</p> <p>The distinction between synthesising unfamiliar versus familiar faces could also have practical implications. While synthesising new (fictional) faces may add credibility to fake online profiles (e.g. Park &amp; Nicolau, [<reflink idref="bib41" id="ref26">41</reflink>]; Xu, [<reflink idref="bib52" id="ref27">52</reflink>]), the production of synthetic familiar faces would, for example, allow creators to manipulate celebrity endorsements, which can play a role in both marketing and political domains (see Knoll &amp; Matthes, [<reflink idref="bib24" id="ref28">24</reflink>]). As a result, every image featuring a recognisable identity would require the viewer to question its authenticity. 
Such images fall within the broader category of 'deepfakes' (for a review, see Masood et al., [<reflink idref="bib35" id="ref29">35</reflink>]), which are becoming increasingly difficult to detect (Groh et al., [<reflink idref="bib18" id="ref30">18</reflink>]) and therefore pose a substantial challenge as technologies continue to advance.</p> <p>OpenAI's ChatGPT is a virtual assistant that has been adopted globally (around 250 million weekly active users; Jamali, [<reflink idref="bib20" id="ref31">20</reflink>]). While its capabilities are multifaceted, one recent addition allows for the generation of images from text. A description of the desired image (prompt) is entered into ChatGPT by the user and, behind the scenes, the chatbot processes this description and passes it to DALL·E (a diffusion model) for image generation. By training DALL·E on vast amounts of image–text pairs from the internet, the model learns patterns in these data (e.g. what makes a dog look like a dog) and is then able to generate new images given any description. Although the details of these training data have not been publicly disclosed, face images will likely have been well represented and so there is potential for synthetic face generation. Further, many online face images depict famous identities, meaning that DALL·E may have incorporated sufficient identity-specific information to allow for the generation of realistic images portraying these individuals.</p> <p>To an extent, the realism of a generated image is limited only by the user's ability to 'engineer' a high-quality prompt. Since people are relatively poor when tasked with providing detailed face descriptions (e.g. Kramer &amp; Gous, [<reflink idref="bib29" id="ref32">29</reflink>]), ChatGPT's vision model (GPT-4 V) can be exploited to assist in the process. An initial face image can be uploaded as input, which is then analysed by ChatGPT and provides a starting point for the generation of new images (e.g. 
by allowing the user to then refer to the original image's style, background, or even facial appearance). Recent research has shown that GPT-4 V can perceive identity, emotion, and social traits from face images (e.g. Elyoseph et al., [<reflink idref="bib13" id="ref33">13</reflink>]; Kramer, [<reflink idref="bib25" id="ref34">25</reflink>], [<reflink idref="bib26" id="ref35">26</reflink>], [<reflink idref="bib31" id="ref36">31</reflink>], [<reflink idref="bib27" id="ref37">27</reflink>]), and so this approach may facilitate the production of realistic face images.</p> <hd id="AN0188649318-3">The current research</hd> <p>Here, we aimed to provide the first investigation of ChatGPT's ability to generate realistic face images. Since recent work has shown that specialist algorithms like StyleGAN2 (Karras et al., [<reflink idref="bib21" id="ref38">21</reflink>]) are already capable of producing novel, fictional faces (e.g. Nightingale &amp; Farid, [<reflink idref="bib39" id="ref39">39</reflink>]), we initially tasked ChatGPT (plus DALL·E) with this same goal. Crucially, GANs can only generate synthetic faces in the style of the images they were trained on, with the user typically unable to specify the desired characteristics of the output (e.g. the gender or ethnicity of the face, the background, etc.). In contrast, ChatGPT allows for complete control over image and face specifications, meaning that we could match synthetic images to real photographs and account for image properties that might influence judgements of realism (e.g. facial expression or background colour).</p> <p>Moving beyond current demonstrations of novel face synthesis, we also aimed to explore the generation of familiar (celebrity) face images. As discussed earlier, this represents a significant paradigm shift in terms of potential applications but also brings with it substantial requirements. 
Synthetic images need to be believable both as face photographs <emph>and</emph> as instances of specific faces with which viewers have prior knowledge. If such images were indistinguishable from real photographs, this would bring into question the veracity of all online content moving forward while highlighting the need for society to develop solutions as a matter of urgency.</p> <hd id="AN0188649318-4">Experiment 1</hd> <p>Previous studies using GANs have shown that images of fictional identities can be generated that are indistinguishable from real photos (e.g. Nightingale &amp; Farid, [<reflink idref="bib39" id="ref40">39</reflink>]). However, synthesising faces in this way did not allow for image characteristics to be closely matched across synthetic and real faces. Here, we investigated a new method for generating synthetic images (i.e. using ChatGPT) that facilitated this matching. Again, we sought to determine whether our synthetic faces could be detected by viewers.</p> <hd id="AN0188649318-5">Method</hd> <p></p> <hd id="AN0188649318-6">Participants</hd> <p>The sample sizes for our experiments were initially set as a compromise between available resources (i.e. participant payment) and our estimates of what was required to measure sufficiently precise effects with these experimental paradigms. In addition, we aimed to recruit samples comparable in size with previous work using a similar experimental design (see Experiment 1 of Miller et al., [<reflink idref="bib36" id="ref41">36</reflink>]).</p> <p>Since we planned to use Bayesian methods for our central analyses, we anticipated increasing the sample size where estimates of theoretically important predictors were imprecise/ambiguous. It is worth noting that Bayesian methods do not suffer from many of the issues affecting frequentist analyses regarding optional stopping if the aim is simply sufficient precision rather than hypothesis confirmation (Rouder, [<reflink idref="bib47" id="ref42">47</reflink>]). 
In the end, our initial samples provided sufficiently unambiguous evidence and we chose not to collect additional data.</p> <p>We have also chosen to report summary participant performance (e.g. proportion correct, sensitivity, response bias) to allow readers to make broad comparisons with previous work in this field. To compare these values to a constant (e.g. chance performance), a one-sample <emph>t</emph>-test (two-tailed, α = 0.05, power = 95%) requires at least 54 participants to detect medium-sized effects (G*Power 3.1 software; Faul et al., [<reflink idref="bib14" id="ref43">14</reflink>]). In all experiments, our sample sizes exceeded this minimum requirement.</p> <p>For this experiment, a sample of 110 participants (58 women, 51 men, 1 preferred another term; age <emph>M</emph> = 40.9 years, <emph>SD</emph> = 12.9 years; 76% self-reported ethnicity as White) was recruited through the Prolific online platform, where eligibility was restricted to those with an approval rate of 95% or above on the site. In addition, participation was limited to residents of the U.S., Canada, the U.K., Australia, or New Zealand. Participants' data were excluded if they did not complete all trials; used a mobile phone (to avoid images appearing very small); responded incorrectly to at least one of the attention check trials; or gave the same response to all experimental trials (see details in the Supplemental Methods and Table S1 in the online supplementary materials). All participants in this research gave informed, onscreen consent before taking part and were provided with an onscreen debriefing upon completion. All experiments received ethical approval from the University of Lincoln's research ethics committee (ref. 21014) and were carried out in accordance with the provisions of the World Medical Association Declaration of Helsinki. 
There was no overlap between participant samples across our four experiments.</p> <hd id="AN0188649318-7">Stimuli</hd> <p>Our real face photographs comprised a subset of the images used in previous research (Miller et al., [<reflink idref="bib36" id="ref44">36</reflink>]; Nightingale &amp; Farid, [<reflink idref="bib39" id="ref45">39</reflink>]), originally taken from the Flickr-Faces-HQ Dataset (Karras et al., [<reflink idref="bib21" id="ref46">21</reflink>]). Nightingale and Farid ([<reflink idref="bib39" id="ref47">39</reflink>]) divided their real images into 50 subsets, with each of these comprising eight images (with men and women, as well as Black, White, East Asian, and South Asian ethnicities, equally represented). We randomly selected 12 of these subsets (totalling 96 images) for use as our real photos.</p> <p>To generate 'matched' synthetic images for these real photographs, we provided ChatGPT (model GPT-4o) with each photo, along with a crafted prompt. In brief, this asked ChatGPT to generate a new, fictional person while replicating both image (e.g. background, lighting) and person characteristics (e.g. gender, ethnicity). For the full prompt, see the Supplemental Methods in the online supplementary materials. Note that ChatGPT refused to produce images depicting children (who appeared in a small number of the real photos) and so we specified adult faces for our new images. Importantly, ChatGPT did not sample or reuse any part of the real photographs it was shown. In general terms, each photo served as a visual prompt (much like a detailed text prompt), which was then analysed and used to guide the generation of an entirely new image.</p> <p>Synthetic images generated by ChatGPT were high quality, while the set of real photos showed some variation (e.g. in terms of lighting, blur, etc.). As such, our final step was to manually alter each synthetic image so that it more closely approximated its matched real photograph. 
The aim was to broadly equate the two matched images to rule out extraneous image characteristics when comparing how they were perceived. These adjustments would not otherwise have been required had the goal simply been to generate realistic images. (For example stimuli pre- vs. post-adjustment, see Fig. S1 in the online supplementary materials.) Using the GIMP image editor (<ulink href="http://www.gimp.org">www.gimp.org</ulink>), only general image settings (temperature, saturation, brightness, contrast, blur) were adjusted. Crucially, no changes were made to the synthetic images beyond these overall adjustments, and no changes were made to the real photographs. See Fig. 1 for example matched image pairs. Finally, all images were resized to 500 × 500 pixels, resulting in, for example, an experimental viewing size of 8.5 × 8.5 cm on a 24″ (1920 × 1080 pixel) display.</p> <p>Graph: Fig. 1 Example stimuli depicting matched image pairs from Experiment 1. Images are real (top row) and synthetic (bottom row). Real images were taken from the Flickr-Faces-HQ Dataset (Karras et al., [<reflink idref="bib21" id="ref48">21</reflink>]) and made available online by Nightingale and Farid ([<reflink idref="bib39" id="ref49">39</reflink>])</p> <p>To avoid individual participants viewing both images in a matched pair, our stimuli were divided into two sets. First, the 12 real face subsets (described above) were randomly split into two sets of six (while keeping each eight-image subset intact). Each matched synthetic image was then allocated to the opposite set to its real counterpart. In other words, Set A contained real photos of IDs 1–48 and the synthetic matched images for IDs 49–96, while Set B contained the inverse (synthetic for 1–48, real for 49–96).</p> <hd id="AN0188649318-8">Procedure</hd> <p>The experiment was completed using the Gorilla online testing platform (Anwyl-Irvine et al., [<reflink idref="bib2" id="ref50">2</reflink>]). 
After consent was obtained, participants provided demographic information (age, gender, ethnicity), as well as the type of device they were using. Next, participants were informed that they would be shown around 100 face images, and that they would be asked to decide whether each image was a real photograph or a completely new image generated by a computer. They were also told that a few obviously computer-generated images had been included to check they were paying attention.</p> <p>During the task, each image was presented on screen individually below the question "is this a real photo or a computer-generated image?", with these remaining until a response was given. The two response buttons, appearing below the image, were labelled 'real photo' and 'computer-generated'. Responses were self-paced, and no feedback was given. Assignment to viewing either Set A (96 images) or Set B (96 images) was counterbalanced across participants, while the viewing order of the images was randomised for each participant. In addition, four attention check trials were included, where obvious distortions were present (see the Supplemental Methods and Fig. S2 in the online supplementary materials), and these were incorporated into the randomised order of each viewing sequence. Participants were also presented with a 'halfway point' screen after completing the first half of the task, informing them of their progress and providing an optional break before continuing.</p> <hd id="AN0188649318-9">Analytic strategy</hd> <p>We first investigated accuracy (proportion correct) on the task, along with signal detection measures (sensitivity and response bias). 
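</p> <p>As a rough illustration of the signal detection measures named above, the following sketch computes sensitivity (d') and response bias (criterion c) from one participant's trial counts. It rests on assumptions that are not stated in the text: synthetic images are treated as the 'signal', a log-linear correction is applied to avoid rates of exactly 0 or 1, and the function name and trial counts are hypothetical.</p>

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from trial counts.

    Assumption: 'signal' trials are synthetic images, so a hit is a
    'computer-generated' response to a synthetic image and a false alarm
    is a 'computer-generated' response to a real photo.
    """
    # Log-linear correction (an assumption, not taken from the article):
    # adding 0.5 to each cell keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical participant who saw 48 synthetic and 48 real images.
d_prime, criterion = sdt_measures(
    hits=10, misses=38, false_alarms=18, correct_rejections=30
)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

<p>Under this coding, a negative d' means that 'computer-generated' responses were given more often to real photos than to synthetic images, and a positive c corresponds to a bias towards responding 'real photo'.</p> <p>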
Since Nightingale and Farid ([<reflink idref="bib39" id="ref51">39</reflink>]) provided image-level performance for the real face photos used here, we also considered whether our participants' accuracies were associated with theirs.</p> <p>Next, to more fully interpret the data, we applied model-based Bayesian inference to the disaggregated trial-level data using a hierarchical logistic regression model. Accuracy on each trial (1 = correct, 0 = incorrect) was predicted from two trial type fixed effects (synthetic vs real), plus a fixed effect of experiment condition (whether the participant viewed Set A or Set B) as a covariate. We also included a group-specific (random effect) structure to capture sources of variability across participants and identities, estimating the offset each participant and identity had in both the synthetic and real trial types.</p> <p>The model was estimated with weakly informative priors on all model parameters (Gelman et al., [<reflink idref="bib16" id="ref52">16</reflink>]). A Bernoulli likelihood was used, allowing us to estimate the probability of a trial being correct, and is suitable given the binary outcome we observed. For the model coefficients (both trial types and condition), we used a Gaussian distribution with a mean of 0 and a standard deviation of 5. An LKJ Cholesky (Lewandowski et al., [<reflink idref="bib33" id="ref53">33</reflink>]) prior was used for the covariance matrix of the identity and participant group-specific effects – as such, we were also able to estimate the correlation between these effects, capturing, for example, whether an identity with particularly poor accuracy in the synthetic trials had similarly low accuracy in the real trials. This prior had an eta parameter equal to 2, with a half-Gaussian distribution with a standard deviation of 3 for the standard deviations of the group-specific effects. 
The model was estimated using PyMC (Abril-Pla et al., [<reflink idref="bib1" id="ref54">1</reflink>]) in the Python programming language. Four Markov chain Monte Carlo chains were run, with 1,000 tuning steps and 4,000 samples drawn from the posterior. Our model structure was:</p> <p>\[ \operatorname{logit}\left(\Pr\left(Y_{ijk} = 1\right)\right) = \beta_{ij}^{\text{Synthetic}} \cdot x_{i}^{\text{Synthetic}} + \beta_{ij}^{\text{Real}} \cdot x_{i}^{\text{Real}} + \beta_{\text{Condition}} \cdot x_{i}^{\text{Cond}} \]</p> <p>where \(Y_{ijk}\) is the accuracy on trial <emph>k</emph>, for identity <emph>i</emph> and participant <emph>j</emph>. The <emph>x</emph> variables are dummy-coded predictors, and the inverse of the logit transform converts the log-odds linear combination into probabilities.</p> <p>To make inferences about our various hypotheses, we used the posterior probability of effects being in specific directions, calculated simply as the proportion of the posterior distribution of an effect lying above or below zero, given the observed data and the model (Makowski et al., [<reflink idref="bib34" id="ref55">34</reflink>]). This was similar in intention to classical null-hypothesis significance testing but provided the probability that the effect differed from zero in a given direction given the data, and not the converse (Welsch et al., [<reflink idref="bib51" id="ref56">51</reflink>]). As logistic models have coefficients on the log-odds scale, we converted estimates to odds or probabilities to give clearer interpretation. We also estimated 94% highest-density intervals (HDIs) of all posterior estimates, which showed the credible range of effects given the observed data and model.</p> <hd id="AN0188649318-10">Results</hd> <p>Across all participants, accuracy (proportion correct) on the task (<emph>M</emph> = 0.43) was below chance (0.50), <emph>t</emph>(109) = 4.76, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 0.45, 95% CI [0.26, 0.65] (see Fig. 2). Sensitivity (<emph>d'</emph>; <emph>M</emph> = −0.50) was also below chance (0), <emph>t</emph>(109) = 5.35, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 0.51, 95% CI [0.31, 0.71]. 
Finally, we found a positive response bias (criterion, <emph>c</emph>; <emph>M</emph> = 0.51), <emph>t</emph>(109) = 8.96, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 0.85, 95% CI [0.63, 1.07], indicating that participants were biased to respond with 'real photo' during the task.</p> <p>Graph: Fig. 2 Variability in task accuracy for Experiment 1</p> <p>Since Nightingale and Farid ([<reflink idref="bib39" id="ref60">39</reflink>]) provided image-level performance for the real face photos used here, we considered whether our participants' accuracies were associated with theirs. To this end, we calculated the proportion correct for each of the 96 real photos using their dataset (available online) and, separately, our own. Across these images, we found a strong association between performances derived from the two participant samples, <emph>r</emph>(94) = 0.66, 95% CI = [0.53, 0.76], <emph>p</emph> &lt; 0.001. Further, a comparison of the two sets of image-level accuracies showed that our sample (<emph>M</emph> = 0.58) produced significantly higher values than theirs (<emph>M</emph> = 0.51), <emph>t</emph>(95) = 4.96, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 0.51, 95% CI = [0.29, 0.72]. Taken together, these results provide evidence that our participants, while performing poorly on the task, were not simply responding at random.</p> <hd id="AN0188649318-11">Can observers distinguish between real and synthetic images?</hd> <p>Marginalising over all predictors (trial type and condition), the baseline performance on the task was <emph>M</emph> = 0.41, 94% CrI [0.37, 0.44], <emph>p</emph>(H &lt; 0.5) = 100%, i.e. below chance (0.5). 
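</p> <p>Posterior summaries of this kind (a mean probability, a 94% interval, and the probability of lying below chance) are typically derived from posterior draws on the log-odds scale. The sketch below shows one way this might be done. The draws are simulated stand-ins rather than the posterior from this study, and the function names are hypothetical.</p>

```python
import math
import random


def inv_logit(x):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_below(draws, threshold):
    """Proportion of posterior draws lying below a threshold."""
    return sum(d < threshold for d in draws) / len(draws)


def hdi(draws, mass=0.94):
    """Narrowest interval containing at least `mass` of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    k = math.ceil(mass * n)  # number of draws the interval must contain
    # Slide a window of k consecutive draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


rng = random.Random(0)
# Simulated stand-in for posterior draws of a log-odds coefficient
# (hypothetical values, NOT the posterior estimated in the article).
log_odds = [rng.gauss(-0.36, 0.07) for _ in range(4000)]
probs = [inv_logit(v) for v in log_odds]

mean_prob = sum(probs) / len(probs)
p_below = prob_below(probs, 0.5)
low, high = hdi(probs, mass=0.94)
print(f"M = {mean_prob:.2f}, 94% HDI [{low:.2f}, {high:.2f}], p(H < 0.5) = {p_below:.1%}")
```

<p>Because the transformation from log-odds to probability is monotonic, the probability of falling below chance is the same whether it is computed on the probability scale (below 0.5) or on the log-odds scale (below 0).</p> <p>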
Next, we inspected the posterior estimates of the trial type coefficients from the model, representing the average baseline accuracy in each trial type, and considered the posterior probability that these accuracies were above or below chance. The model-estimated probability of identifying a real image as real was <emph>M</emph> = 0.62, 94% CrI [0.55, 0.68], <emph>p</emph>(H &gt; 0.5) = 99.9%, and for identifying synthetic as synthetic, <emph>M</emph> = 0.20, 94% CrI [0.15, 0.25], <emph>p</emph>(H &lt; 0.5) = 100%. As such, across participants and identities, accuracy for real images was above chance, while for synthetic images, it was clearly below. That participants mistook these AI-generated novel faces for real photos indicated they were clearly plausible.</p> <hd id="AN0188649318-12">Experiment 2</hd> <p>Previous studies found that computer-generated face images were indistinguishable from real photographs (e.g. Miller et al., [<reflink idref="bib36" id="ref63">36</reflink>]; Nightingale &amp; Farid, [<reflink idref="bib39" id="ref64">39</reflink>]). However, their method of generation (i.e. using StyleGAN2) did not allow for the specification of image or face characteristics. In our first experiment, we introduced a new way to generate face images (i.e. using ChatGPT) that provided this level of control while successfully producing images that participants could not detect as computer-generated.</p> <p>In Experiment 2, we used this same approach to image generation but took an additional step. For the first time, we investigated the potential for synthesising images of familiar (famous) faces. In other words, can ChatGPT be used to generate images that are believable both as face photographs <emph>and</emph> as instances of specific faces with which viewers have prior knowledge? 
If the chatbot demonstrates this capability then such a result has far-reaching implications.</p> <hd id="AN0188649318-13">Method</hd> <p></p> <hd id="AN0188649318-14">Participants</hd> <p>A sample of 115 participants (59 women, 55 men, 1 preferred another term; age <emph>M</emph> = 44.4 years, <emph>SD</emph> = 13.8 years; 71% self-reported ethnicity as White) were recruited online. All eligibility and exclusion criteria were identical to Experiment 1. (For details of exclusions, see the Supplemental Methods and Table S1 in the online supplementary materials.)</p> <hd id="AN0188649318-15">Stimuli</hd> <p>We collected photographs of 100 (50 men, 50 women; varied ethnicities) internationally famous celebrities (predominantly Hollywood actors). For each identity, we downloaded a large, high-quality photograph using Google Images searches, with each image depicting the individual facing roughly front-on and with their face free from occlusions. To generate 'matched' synthetic images for these real photographs, we followed the same process as in Experiment 1. However, the prompt we used here asked ChatGPT to generate a new image of the person depicted in the original photograph, changing the pose but replicating all other details about the image (e.g. the background). For the full prompt, see the Supplemental Methods in the online supplementary materials. Note that ChatGPT refused to generate images of named celebrities when asked, and so we did not identify these individuals during the process. Even so, it was clear that ChatGPT recognised the identities in that, on occasion, it would refuse to generate a new image of specific individuals (e.g. Tom Hanks), in which case we chose other identities as replacements. 
As in Experiment 1, each photo served as a visual prompt, which was then analysed and used to guide the generation of an entirely new image based on the original one.</p> <p>Again, we manually altered the general image properties for each synthetic image so that it more closely approximated its matched real photograph. Next, each matched pair of images was similarly cropped to contain only the head and neck, and in some cases, the top of the shoulders. (For example stimuli pre- vs. post-adjustment, see Fig. S1 in the online supplementary materials.) See Fig. 3 for example matched image pairs. Finally, all images were resized to 500 × 500 pixels, with the same experimental viewing size as in Experiment 1.</p> <p>Graph: Fig. 3 Example stimuli depicting matched image pairs from Experiments 2–4. Images are real (top row) and synthetic (bottom row). Image attributions (top row left to right): Jay Dixit (cropped); Red Carpet Report on Mingle Media TV (cropped); Dominick D (cropped); Toglenn (cropped). Photographs are from Wikimedia Commons (2025) (https://commons.wikimedia.org/)</p> <p>To avoid individual participants viewing both images in a matched pair, our stimuli were again divided into two sets. First, the 100 real face photographs were randomly split into two sets of 50 with the proviso that each set contained an equal number of men and women. Each matched synthetic image was then allocated to the opposite set to its real counterpart. In other words, Set A contained real photos of IDs 1–50 and the synthetic matched images for IDs 51–100, while Set B contained the inverse (synthetic for 1–50, real for 51–100).</p> <hd id="AN0188649318-16">Procedure</hd> <p>The general procedure was identical to Experiment 1, with the following caveats. First, each trial began with a fixation cross, displayed for 500 ms. This served to provide participants with a clear separation between trials since these now involved two judgements (decision + rating). 
Second, the onscreen instruction appearing above each image incorporated the name of the identity depicted. For example, "is this a real photo or a computer-generated image of Paul Rudd?" Third, following the participant's binary response ('real photo' or 'computer-generated'), the image was removed and a new instruction asked "before this experiment, how familiar were you with the facial appearance of Paul Rudd?" (or whichever identity appeared in the image on the previous screen). Participants responded using a 7-point scale with labelled anchors (1 = extremely unfamiliar to 7 = extremely familiar).</p> <p>As in Experiment 1, assignment to viewing either Set A (100 images) or Set B (100 images) was counterbalanced across participants, while the viewing order of the images was randomised for each participant. Again, four attention check trials were included, where obvious distortions were present (see the Supplemental Methods and Fig. S3 in the online supplementary materials), and these were incorporated into the randomised order of each viewing sequence.</p> <hd id="AN0188649318-17">Analytic strategy</hd> <p>As in Experiment 1, we first investigated accuracy (proportion correct) on the task, along with signal detection measures (sensitivity and response bias). Next, and also mirroring Experiment 1, we relied on model-based Bayesian inference to interpret the data, again with a hierarchical logistic regression model, predicting accuracy on each trial. The model here was expanded from the two trial type fixed effects (synthetic vs real) and condition covariate to include a familiarity predictor (in natural rating scale units, i.e. 1, 2 ... 7) and its interaction with trial type. This allowed us to investigate the influence of familiarity on accuracy for both real and synthetic images. 
The group-specific (random effect) structure was identical to Experiment 1, including offsets for participant and identities around the two trial types.</p> <p>The prior distribution and sampling settings were identical to Experiment 1. Our model structure was:</p> <p> <ephtml> &lt;math display="block" xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mrow&gt;&lt;mtext&gt;logit&lt;/mtext&gt;&lt;mfenced close=")" open="("&gt;&lt;mtext&gt;Pr&lt;/mtext&gt;&lt;mfenced close=")" open="("&gt;&lt;msub&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;mrow&gt;&lt;mi mathvariant="italic"&gt;ijk&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/mfenced&gt;&lt;/mfenced&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;msubsup&gt;&lt;mi&gt;&amp;#946;&lt;/mi&gt;&lt;mrow&gt;&lt;mi mathvariant="italic"&gt;ij&lt;/mi&gt;&lt;/mrow&gt;&lt;mtext&gt;Synthetic&lt;/mtext&gt;&lt;/msubsup&gt;&lt;mo&gt;&amp;#183;&lt;/mo&gt;&lt;msubsup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;mtext&gt;Synthetic&lt;/mtext&gt;&lt;/msubsup&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;msubsup&gt;&lt;mi&gt;&amp;#946;&lt;/mi&gt;&lt;mrow&gt;&lt;mi 
mathvariant="italic"&gt;ij&lt;/mi&gt;&lt;/mrow&gt;&lt;mtext&gt;Real&lt;/mtext&gt;&lt;/msubsup&gt;&lt;mo&gt;&amp;#183;&lt;/mo&gt;&lt;msubsup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;mtext&gt;Real&lt;/mtext&gt;&lt;/msubsup&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;msup&gt;&lt;mrow&gt;&lt;mi&gt;&amp;#947;&lt;/mi&gt;&lt;/mrow&gt;&lt;mtext&gt;Synthetic&lt;/mtext&gt;&lt;/msup&gt;&lt;mo&gt;&amp;#183;&lt;/mo&gt;&lt;msubsup&gt;&lt;mtext&gt;Familiarity&lt;/mtext&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;mtext&gt;Synthetic&lt;/mtext&gt;&lt;/msubsup&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;msup&gt;&lt;mrow&gt;&lt;mi&gt;&amp;#947;&lt;/mi&gt;&lt;/mrow&gt;&lt;mtext&gt;Real&lt;/mtext&gt;&lt;/msup&gt;&lt;mo&gt;&amp;#183;&lt;/mo&gt;&lt;msubsup&gt;&lt;mtext&gt;Familiarity&lt;/mtext&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;mtext&gt;Real&lt;/mtext&gt;&lt;/msubsup&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;msub&gt;&lt;mi mathvariant="normal"&gt;&amp;#946;&lt;/mi&gt;&lt;mtext&gt;Condition&lt;/mtext&gt;&lt;/msub&gt;&lt;mo&gt;&amp;#183;&lt;/mo&gt;&lt;msubsup&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;i&lt;/mi&gt;&lt;/mrow&gt;&lt;mtext&gt;Cond&lt;/mtext&gt;&lt;/msubsup&gt;&lt;/mrow&gt;&lt;/math&gt; </ephtml> </p> <p>where <ephtml> &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;msub&gt;&lt;mi&gt;Y&lt;/mi&gt;&lt;mrow&gt;&lt;mi mathvariant="italic"&gt;ijk&lt;/mi&gt;&lt;/mrow&gt;&lt;/msub&gt;&lt;/math&gt; </ephtml> is the accuracy on trial <emph>k</emph>, for identity <emph>i</emph> and participant <emph>j</emph>. 
The <emph>x</emph> variables are dummy-coded predictors representing trial types, and the <ephtml> &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt;&lt;mi&gt;&amp;#947;&lt;/mi&gt;&lt;/math&gt; </ephtml> terms are the slopes for the familiarity ratings under each of the trial types, allowing the model to directly estimate the influence of familiarity on each trial type separately.</p> <hd id="AN0188649318-18">Results</hd> <p>First, we calculated simple performance (ignoring possible familiarity effects). Across all participants, accuracy (proportion correct) on the task (<emph>M</emph> = 0.52) did not differ significantly from chance performance (of 0.50), <emph>t</emph>(114) = 1.74, <emph>p</emph> = 0.084, <emph>d</emph> = 0.16, 95% CI [−0.02, 0.35] (see Fig. 4). Sensitivity (<emph>d'</emph>; <emph>M</emph> = 0.10) also did not differ significantly from chance performance (of 0), <emph>t</emph>(114) = 1.67, <emph>p</emph> = 0.098, <emph>d</emph> = 0.16, 95% CI [−0.03, 0.34]. Finally, as with Experiment 1, we found a positive response bias (criterion, <emph>c</emph>; <emph>M</emph> = 0.49), <emph>t</emph>(114) = 7.25, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 0.68, 95% CI [0.47, 0.88], indicating that participants were biased to respond with 'real photo' during the task.</p> <p>Graph: Fig. 4 Variability in task accuracy for Experiments 2 and 3. Performance is displayed for Experiment 2 (blue) and Experiment 3 (orange)</p> <hd id="AN0188649318-19">Can observers distinguish between real and synthetic images?</hd> <p>Marginalising over all predictors, the average accuracy on the task was <emph>M</emph> = 0.51, 94% CrI [0.48, 0.54], <emph>p</emph>(H &gt; 0.5) = 67%, i.e. at the level of chance. 
Next, we inspected the posterior estimates of the trial type coefficients, which represented baseline accuracy in the model, and calculated the probability that these were above or below chance. The model-estimated probability of identifying a real image as real was <emph>M</emph> = 0.70, 94% CrI [0.64, 0.75], <emph>p</emph>(H &gt; 0.5) = 100%, and for identifying synthetic as synthetic, <emph>M</emph> = 0.32, 94% CrI [0.26, 0.38], <emph>p</emph>(H &lt; 0.5) = 100%.</p> <hd id="AN0188649318-20">Does familiarity improve accuracy?</hd> <p>To investigate whether familiarity improved accuracy, we converted the model-estimated slopes to odds ratios (see Fig. 5). For real trials, a one-point increase in familiarity was associated with a 21% increase in the odds of a correct response, <emph>OR</emph> = 1.21, 94% CrI [1.14, 1.28], <emph>p</emph>(H &gt; 1) = 100%, while for synthetic trials, a one-point increase in familiarity was associated with a 4% reduction in the odds of a correct response, <emph>OR</emph> = 0.96, 94% CrI [0.93, 1.00], <emph>p</emph>(H &lt; 1) = 95.8%. (For a visualisation of the raw data prior to modelling, see the Supplemental Results and Fig. S4 in the online supplementary materials.)</p> <p>Graph: Fig. 5 Model predictions for the influence of familiarity on accuracy in Experiment 2. The left panel depicts model predictions, where shaded areas represent 94% credible intervals, with synthetic trials shown in red and real trials shown in blue. The right panel depicts posterior distributions of conditional model predictions, setting familiarity to the lowest (one) and highest (seven) levels. The dashed vertical lines represent accuracies expected by chance</p> <p>However, given that logistic models are nonlinear, interpreting coefficients directly should be done with caution, because a given change in a predictor does not correspond to a constant change in predicted accuracy. 
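</p> <p>As a minimal illustration of why a constant odds ratio does not translate into a constant change in accuracy (a sketch, not the authors' analysis code), the Python snippet below repeatedly applies a fixed per-point odds ratio to a starting probability. The function name <emph>shift_probability</emph> and the starting value of 0.59 are assumptions chosen for illustration. The fitted model also contains group-specific effects and a condition covariate, so these numbers are not expected to reproduce the posterior predictions exactly.</p>

```python
def shift_probability(p_start: float, odds_ratio: float, steps: int) -> float:
    """Return the probability reached after applying a constant odds ratio `steps` times."""
    if not 0.0 < p_start < 1.0:
        raise ValueError("p_start must lie strictly between 0 and 1")
    odds = p_start / (1.0 - p_start)   # probability -> odds
    odds *= odds_ratio ** steps        # each step multiplies the odds by the same factor
    return odds / (1.0 + odds)         # odds -> probability


# Illustrative inputs only: a per-point odds ratio of 1.21 and an assumed
# starting accuracy of 0.59 at the lowest familiarity rating (1).
P_START, ODDS_RATIO = 0.59, 1.21

previous = P_START
for rating in range(2, 8):
    current = shift_probability(P_START, ODDS_RATIO, rating - 1)
    print(f"familiarity {rating}: p = {current:.3f} (change = {current - previous:+.3f})")
    previous = current
```

<p>In this sketch the odds grow by the same factor at every step, yet the change in probability shrinks as accuracy moves further from 0.5, which is why predictions on the probability scale are easier to interpret than the coefficients themselves.</p> <p>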
To aid interpretation, for each trial type, we used the model to predict accuracy when fixing familiarity to one (lowest possible familiarity) and seven (highest possible). For real trials, accuracy was clearly above chance at low familiarity, <emph>M</emph> = 0.59, 94% CrI [0.51, 0.68], <emph>p</emph>(H &gt; 0.5) = 98.5%, and at high familiarity, <emph>M</emph> = 0.79, 94% CrI [0.73, 0.84], <emph>p</emph>(H &gt; 0.5) = 100%. For synthetic trials, accuracy was clearly below chance at low familiarity, <emph>M</emph> = 0.34, 94% CrI [0.26, 0.42], <emph>p</emph>(H &lt; 0.5) = 99.9%, and at high familiarity, <emph>M</emph> = 0.29, 94% CrI [0.22, 0.36], <emph>p</emph>(H &lt; 0.5) = 100%. From these estimates we computed the relative change afforded by familiarity by dividing the predicted accuracy at the highest familiarity by that at the lowest (i.e. the relative risk). For real trials, a correct response was 1.33 times, 94% CrI [1.19, 1.45], <emph>p</emph>(H &gt; 1) = 100%, as likely to occur under high versus low familiarity. For synthetic trials, a correct response was 0.86 times, 94% CrI [0.73, 1.00], <emph>p</emph>(H &lt; 1) = 95.8%, as likely. As such, familiarity benefited performance with real photos but was detrimental for synthetic images (see Fig. 5).</p> <hd id="AN0188649318-21">Experiment 3</hd> <p>Prior familiarity with identities provided only limited benefits when participants were tasked with differentiating between real photos and computer-generated images. This may be due to participants having to rely on their internal representations of these identities, which are sufficiently robust as to support accurate identification (e.g. Kramer et al., [<reflink idref="bib30" id="ref68">30</reflink>]) but failed to substantially benefit performance here. Therefore, we next investigated whether displaying additional (real) photos of the identity alongside those seen in Experiment 2 would increase synthetic image detection. 
These additional photographs onscreen (1) gave viewers facial appearance information for comparison without the need to rely on memory, and (2) represented a likely real-world scenario since computer-generated images posted online will either be seen alongside real photos or, at the very least, be available for comparison with them.</p> <hd id="AN0188649318-22">Method</hd> <p></p> <hd id="AN0188649318-23">Participants</hd> <p>A sample of 127 participants (52 women, 75 men; age <emph>M</emph> = 40.1 years, <emph>SD</emph> = 12.4 years; 69% self-reported ethnicity as White) were recruited online. All eligibility and exclusion criteria were identical to Experiment 1. (For details of exclusions, see the Supplemental Methods and Table S1 in the online supplementary materials.)</p> <hd id="AN0188649318-24">Stimuli</hd> <p>We collected two additional photographs of each celebrity featured in Experiment 2, following the same procedure as in the original collection of images. These new images were also cropped and resized to 500 × 500 pixels, with the same experimental viewing size as those in Experiments 1 and 2.</p> <hd id="AN0188649318-25">Procedure</hd> <p>The general procedure was identical to Experiment 2, except in this case, the two-alternative forced choice (real or computer-generated) was directed towards the middle of three images. As such, we incorporated the following changes. First, instructions before the task explained that on each trial, participants were to decide whether the middle image was a real photograph or a completely new image generated by a computer, and that two real photographs would also be provided to help with their decision-making. Second, during the task, three images for a given identity were presented onscreen next to each other with the instruction appearing above them asking, for example, "Here are three images of Paul Rudd. 
Do you think the <emph>middle image</emph> is a real photo or computer-generated?" Third, the two additional real images were presented on either side of the image in question, with the label "real photograph" shown above them (see Fig. 6). After making their response, participants provided a familiarity rating as in Experiment 2.</p> <p>Graph: Fig. 6 An example trial from Experiment 3. The correct answer here is 'computer-generated'. Image attributions: left image: David Shankbone (cropped); right image: DannyB Photos (cropped). Photographs are from Wikimedia Commons (2025) (https://commons.wikimedia.org/)</p> <p>Assignment to viewing either Set A (100 identities) or Set B (100 identities) was counterbalanced across participants, while the viewing order of the identities was randomised for each participant. Finally, four attention check trials were included using those images featured in Experiment 2 (see the Supplemental Methods and Fig. S3 in the online supplementary materials), although here each was presented between two real photos of the identity. These were incorporated into the randomised order of each viewing sequence.</p> <hd id="AN0188649318-26">Analytic strategy</hd> <p>Again, we investigated accuracy (proportion correct) on the task, along with signal detection measures (sensitivity and response bias). For our model-based Bayesian inference, the model structure was identical to that of Experiment 2, including all group-specific effects, priors, and sampling approaches.</p> <hd id="AN0188649318-27">Results</hd> <p>First, we calculated simple performance (ignoring possible familiarity effects). Across all participants, accuracy (proportion correct) on the task (<emph>M</emph> = 0.58) was significantly greater than chance performance (of 0.50), <emph>t</emph>(126) = 7.86, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 0.70, 95% CI [0.50, 0.89] (see Fig. 4). 
Sensitivity (<emph>d'</emph>; <emph>M</emph> = 0.46) was also greater than chance performance (of 0), <emph>t</emph>(126) = 7.66, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 0.68, 95% CI [0.49, 0.87]. Finally, we found a small, positive response bias (criterion, <emph>c</emph>; <emph>M</emph> = 0.10), <emph>t</emph>(126) = 2.16, <emph>p</emph> = 0.033, <emph>d</emph> = 0.19, 95% CI [0.02, 0.37], indicating that participants were somewhat biased to respond with 'real photo' during the task.</p> <hd id="AN0188649318-28">Can observers distinguish between real and synthetic images?</hd> <p>Marginalising over all predictors, the average accuracy on the task was <emph>M</emph> = 0.59, 94% CrI [0.56, 0.62], <emph>p</emph>(H &gt; 0.5) = 100%, i.e. above chance levels. Next, we inspected the posterior estimates of the trial type coefficients, which represented baseline accuracy in the model, and calculated the probability that these were above or below chance. The model-estimated probability of identifying a real image as real was <emph>M</emph> = 0.62, 94% CrI [0.58, 0.67], <emph>p</emph>(H &gt; 0.5) = 100%, and for identifying synthetic as synthetic, <emph>M</emph> = 0.56, 94% CrI [0.50, 0.61], <emph>p</emph>(H &gt; 0.5) = 96.2%.</p> <hd id="AN0188649318-29">Does familiarity improve accuracy?</hd> <p>To aid interpretation, we again converted the model-estimated slopes to odds ratios (see Fig. 7). For real trials, a one-point increase in familiarity was associated with a 9% increase in the odds of a correct response, <emph>OR</emph> = 1.09, 94% CrI [1.04, 1.15], <emph>p</emph>(H &gt; 1) = 99.9%, while for synthetic trials, a one-point increase in familiarity was associated with a 1% increase in the odds of a correct response, <emph>OR</emph> = 1.01, 94% CrI [0.98, 1.05], <emph>p</emph>(H &gt; 1) = 75.4%, though the direction of this effect was uncertain. 
(For a visualisation of the raw data prior to modelling, see the Supplemental Results and Fig. S5 in the online supplementary materials.)</p> <p>Graph: Fig. 7 Model predictions for the influence of familiarity on accuracy in Experiment 3. The left panel depicts model predictions, where shaded areas represent 94% credible intervals, with synthetic trials shown in red and real trials shown in blue. The right panel depicts posterior distributions of conditional model predictions, setting familiarity to the lowest (one) and highest (seven) levels. The dashed vertical lines represent accuracies expected by chance</p> <p>Next, for each trial type, we used the model to predict accuracy when fixing familiarity to 1 (lowest possible familiarity) and 7 (highest possible). For real trials, low familiarity showed a point estimate above chance but with a credible interval overlapping chance, <emph>M</emph> = 0.55, 94% CrI [0.49, 0.63], <emph>p</emph>(H &gt; 0.5) = 92.8%, while high familiarity was clearly above chance, <emph>M</emph> = 0.69, 94% CrI [0.63, 0.75], <emph>p</emph>(H &gt; 0.5) = 100%. For synthetic trials, low familiarity was also above chance but again with a credible interval overlapping chance, <emph>M</emph> = 0.55, 94% CrI [0.47, 0.62], <emph>p</emph>(H &gt; 0.5) = 87.2%, while high familiarity was somewhat higher and more certainly above chance, <emph>M</emph> = 0.57, 94% CrI [0.50, 0.64], <emph>p</emph>(H &gt; 0.5) = 95.7%. From these estimates we computed the relative change afforded by familiarity by dividing the predicted accuracy at the highest familiarity by that at the lowest (i.e. the relative risk). For real trials, a correct response was 1.25 times, 94% CrI [1.14, 1.36], <emph>p</emph>(H &gt; 1) = 100%, as likely to occur under high versus low familiarity. 
For synthetic trials, a correct response was roughly equally likely under high and low familiarity, <emph>M</emph> = 1.04, 94% CrI [0.93, 1.15], <emph>p</emph>(H &gt; 1) = 75.4%, though the posterior leaned towards a correct response being more likely under high familiarity. Taken together, familiarity resulted in an overall improvement in accuracy (see Fig. 7).</p> <hd id="AN0188649318-30">How does the addition of two real, reference photos influence performance?</hd> <p>Since the only difference between this experiment and Experiment 2 was the presence of two additional, real photographs on each trial, we were able to combine datasets from these two experiments in a single model. This model was identical to the one used above, except for two changes. First, we included an interaction with a categorical 'experiment' indicator, allowing the trial type baseline accuracy to vary across experiments, as well as the association between familiarity and trial type. Second, we expanded the group-specific random effects structure for identities, which now appeared under both trial types for each experiment.</p> <p>Marginalising over all predictors, the model-predicted accuracy here was about 1.17 times, 94% CrI [1.10, 1.24], <emph>p</emph>(H &gt; 1) = 100%, that of Experiment 2. Next, we compared model predictions for each trial type under each experiment, marginalised over familiarity. For real trials, the predicted accuracy here was 0.90 times, 94% CrI [0.81, 0.99], <emph>p</emph>(H &lt; 1) = 97.3%, that of Experiment 2, providing evidence of a relative decrease. For synthetic trials, the presence of additional images clearly increased accuracy, which was 1.76 times, 94% CrI [1.45, 2.11], <emph>p</emph>(H &gt; 1) = 100%, that of Experiment 2. 
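</p> <p>As a rough arithmetic check on the direction and size of these relative changes, the Python sketch below divides the accuracy point estimates reported separately for Experiment 3 by those reported for Experiment 2. The helper name <emph>relative_accuracy</emph> is hypothetical. The ratios it prints come from point estimates of the two single-experiment models rather than from the combined model, so they only approximate the values reported here and carry no credible intervals.</p>

```python
def relative_accuracy(p_new: float, p_reference: float) -> float:
    """Ratio of two accuracies; values above 1 mean higher accuracy in the new condition."""
    if not 0.0 < p_reference <= 1.0:
        raise ValueError("p_reference must be in (0, 1]")
    return p_new / p_reference


# Point estimates as reported for each experiment: (Experiment 3, Experiment 2).
reported = {
    "overall": (0.59, 0.51),
    "real": (0.62, 0.70),
    "synthetic": (0.56, 0.32),
}

for trial_type, (exp3, exp2) in reported.items():
    ratio = relative_accuracy(exp3, exp2)
    print(f"{trial_type}: Experiment 3 accuracy is {ratio:.2f} times that of Experiment 2")
```

<p>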
Taken together, these findings demonstrated a definite increase in accuracy with the addition of two real photographs, resulting mostly from their beneficial effect on trials involving synthetic images.</p> <hd id="AN0188649318-31">Experiment 4</hd> <p>Providing two additional, real photographs led to an increase in performance for synthetic image detection, presumably by allowing for a comparison with these images in addition to any internal representation previously developed through familiarity. However, in everyday contexts, additional images are unlikely to be accompanied by 'real photograph' labels and, as such, their authenticity may also be unknown. Therefore, in this final experiment, we investigated performance for the same three-image displays without providing such labels. Instead, and mirroring a lineup-style task, participants were unaware of whether one image was computer-generated (i.e. 'target present') or all three images were real photographs (i.e. 'target absent').</p> <hd id="AN0188649318-32">Method</hd> <p></p> <hd id="AN0188649318-33">Participants</hd> <p>A sample of 120 participants (49 women, 70 men, 1 preferred another term; age <emph>M</emph> = 39.0 years, <emph>SD</emph> = 13.4 years; 68% self-reported ethnicity as White) were recruited online. All eligibility and exclusion criteria were identical to Experiment 1. (For details of exclusions, see the Supplemental Methods and Table S1 in the online supplementary materials.)</p> <hd id="AN0188649318-34">Stimuli</hd> <p>The stimuli were those featured in Experiment 3, providing us with a 'target present' and a 'target absent' lineup for each identity (where the synthetic image was the 'target'). 
Both lineups for a given identity contained the two additional, real photographs, alongside either the original photo or the synthetic image from the matched pair.</p> <p>To avoid individual participants viewing both lineups for a given identity, we continued to use the two sets of 50 identities created in Experiment 2. Here, Set A contained 'target absent' lineups for IDs 1–50 and 'target present' lineups for IDs 51–100, while Set B contained the inverse ('present' for 1–50, 'absent' for 51–100).</p> <hd id="AN0188649318-35">Procedure</hd> <p>The general procedure was identical to Experiment 3, except in this case, the task was a four-alternative forced choice (deciding whether one or none of the three images were computer-generated). As such, we incorporated the following changes. First, instructions before the task explained that on each trial, either one image was a completely new image generated by a computer or all of the images were real photographs (so none were computer generated). Second, during the task, the three lineup images for a given identity were presented onscreen next to each other with the instruction appearing above them asking, for example, "Here are three images of Paul Rudd. Do you think any of them are computer-generated?" Third, numbered labels appeared above the three images (from left to right): 'Image 1', 'Image 2', 'Image 3'. Fourth, participants gave a four-alternative forced choice response after being presented with the following options: 'image 1 is computer-generated', 'image 2 is computer-generated', 'image 3 is computer-generated', 'all images are real photos'. (Only one option could be selected.) Following this response, participants provided a familiarity rating as in Experiments 2 and 3.</p> <p>Assignment to viewing either Set A (100 lineups) or Set B (100 lineups) was counterbalanced across participants, while the viewing order of the lineups was randomised for each participant. 
In addition, the location of the three lineup images onscreen (left/middle/right) was randomised for every trial. Finally, four attention check trials were included using those featured in Experiment 3 (see the Supplemental Methods and Fig. S3 in the online supplementary materials). These were incorporated into the randomised order of each viewing sequence.</p> <hd id="AN0188649318-36">Analytic strategy</hd> <p>Again, we investigated accuracy (proportion correct) on the task, along with signal detection measures (sensitivity and response bias). For our model-based Bayesian inference, the model structure was identical to that of Experiments 2 and 3, including all group-specific effects, priors, and sampling approaches.</p> <hd id="AN0188649318-37">Results</hd> <p>First, we calculated simple performance (ignoring possible familiarity effects). Correct responses were as follows: (1) a synthetic image was present and participants identified it as synthetic (a 'hit'), or (2) no synthetic image was present and participants responded with 'all images are real photos' (a 'correct rejection'). Across all participants, accuracy (proportion correct) on the task (<emph>M</emph> = 0.41) was significantly greater than chance performance (of 0.25), <emph>t</emph>(119) = 10.38, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 0.95, 95% CI [0.73, 1.16] (see Fig. 8). Considering each trial type separately, both 'target present' trials (<emph>M</emph> = 0.33), <emph>t</emph>(119) = 3.47, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 0.32, 95% CI [0.13, 0.50], and 'target absent' trials (<emph>M</emph> = 0.50), <emph>t</emph>(119) = 10.90, <emph>p</emph> &lt; 0.001, <emph>d</emph> = 1.00, 95% CI [0.78, 1.21], were also at above-chance performance. 
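</p> <p>The scoring rule described above can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code: the function names, the integer response coding (0 for 'all images are real photos', 1 to 3 for the image positions) and the labels 'miss', 'false alarm' and 'wrong image' are assumptions. Only 'hit' and 'correct rejection' are named in the text, and only those two count as correct.</p>

```python
from typing import Optional

ALL_REAL = 0            # assumed code for the 'all images are real photos' option
CHANCE_LEVEL = 1 / 4    # four response options, so chance accuracy is 0.25


def classify_response(target_position: Optional[int], response: int) -> str:
    """Label one lineup trial.

    target_position: 1-3 when a synthetic image is present, None when all are real.
    response: 0 for 'all images are real photos', or 1-3 for the chosen image.
    """
    if response not in (0, 1, 2, 3):
        raise ValueError("response must be 0, 1, 2 or 3")
    if target_position is not None and target_position not in (1, 2, 3):
        raise ValueError("target_position must be None, 1, 2 or 3")

    if target_position is None:                      # 'target absent' lineup
        return "correct rejection" if response == ALL_REAL else "false alarm"
    if response == target_position:                  # chose the synthetic image
        return "hit"
    if response == ALL_REAL:                         # overlooked the synthetic image
        return "miss"
    return "wrong image"                             # flagged a real photo instead


def is_correct(target_position: Optional[int], response: int) -> bool:
    """Only hits and correct rejections are scored as correct."""
    return classify_response(target_position, response) in {"hit", "correct rejection"}


print(classify_response(2, 2), is_correct(2, 2))
print(classify_response(None, 3), is_correct(None, 3))
```

<p>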
For information on the five response outcomes that were possible, see the Supplemental Results and Fig. S6 in the online supplementary materials.</p> <p>Graph: Fig. 8 Variability in task accuracy for Experiment 4</p> <hd id="AN0188649318-38">How well do observers perform on the task?</hd> <p>Marginalising across all predictors, overall accuracy on the task was generally low but excluded chance performance, <emph>M</emph> = 0.38, 94% CrI [0.34, 0.43], <emph>p</emph>(H &gt; 0.25) = 100%. Consideration of the posterior estimates of the trial type coefficients showed that for 'target absent' trials, the model-estimated probability of accuracy was clearly greater than chance, <emph>M</emph> = 0.49, 94% CrI [0.43, 0.56], <emph>p</emph>(H &gt; 0.25) = 100%, but 'target present' trials were essentially at chance, <emph>M</emph> = 0.26, 94% CrI [0.21, 0.32], <emph>p</emph>(H &gt; 0.25) = 68.9%.</p> <hd id="AN0188649318-39">Does familiarity improve accuracy?</hd> <p>As in Experiments 2 and 3, we examined the effects of familiarity on accuracy as odds (see Fig. 9). For 'target absent' trials, there was little evidence of increased familiarity aiding trial accuracy, <emph>OR</emph> = 0.99, 94% CrI [0.94, 1.05], <emph>p</emph>(H &gt; 1) = 41.9%, while for 'target present' trials, a one-unit increase in familiarity was associated with a 9% increase in accuracy, <emph>OR</emph> = 1.09, 94% CrI [1.04, 1.14], <emph>p</emph>(H &gt; 1) = 100%.</p> <p>Graph: Fig. 9 Model predictions for the influence of familiarity on accuracy in Experiment 4. The left panel depicts model predictions, where shaded areas represent 94% credible intervals, with synthetic trials shown in red and real trials shown in blue. The right panel depicts posterior distributions of conditional model predictions, setting familiarity to the lowest (one) and highest (seven) levels. 
The dashed vertical lines represent accuracies expected by chance</p> <p>Finally, we predicted accuracy for both trial types at familiarities of one and seven. For 'target absent' trials, low familiarity showed above-chance accuracy, <emph>M</emph> = 0.44, 94% CrI [0.33, 0.55], <emph>p</emph>(H &gt; 0.25) = 100%, as did high familiarity, <emph>M</emph> = 0.55, 94% CrI [0.45, 0.66], <emph>p</emph>(H &gt; 0.25) = 100%. For 'target present' trials, accuracy at low familiarity was consistent with chance, <emph>M</emph> = 0.22, 94% CrI [0.15, 0.30], <emph>p</emph>(H &gt; 0.25) = 22.9%, as was accuracy at high familiarity, <emph>M</emph> = 0.31, 94% CrI [0.22, 0.42], <emph>p</emph>(H &gt; 0.25) = 89.5%. For 'target absent' trials, a correct response was 1.27 times, 94% CrI [1.11, 1.42], <emph>p</emph>(H &gt; 1) = 100%, as likely with high compared to low familiarity. For 'target present' trials, a correct response was 1.45 times, 94% CrI [1.18, 1.74], <emph>p</emph>(H &gt; 1) = 100%, as likely with high compared to low familiarity. Taken together, there was a clear (though modest) benefit of increased familiarity on this task (see Fig. 9).</p> <hd id="AN0188649318-40">General discussion</hd> <p>Across our experiments, we demonstrated that an 'off the shelf' AI tool (ChatGPT plus DALL·E) can generate photorealistic images of real, familiar individuals which human observers cannot distinguish from genuine photos. Previous studies have tended to consider only a limited database of photographs (taken from Flickr) and the synthesis of fictional counterparts (created using StyleGAN; e.g. Miller et al., [<reflink idref="bib36" id="ref77">36</reflink>]; Nightingale &amp; Farid, [<reflink idref="bib39" id="ref78">39</reflink>]). As such, using ChatGPT, we first replicated the ability to synthesise images of this style, again confirming that observers were unable to discriminate between these and real photographs. 
Further, since our approach allowed us to specify all aspects of each synthesised image, we could closely match our images with original photographs to control for non-face characteristics during judgements.</p> <p>Next, we demonstrated the versatility and additional potential of ChatGPT. Not only could we generate images of another style (i.e. red carpet publicity photos) but we also synthesised images of familiar identities (which were again indistinguishable to observers). The importance of this distinction cannot be overstated. While realistic Flickr-style images of novel identities may add credibility to online fake profiles (and disinformation more broadly), the ability to produce novel/synthetic images of real people opens up a number of avenues for use and abuse. For instance, creators might generate images of a celebrity endorsing a certain product or political stance, which could influence public opinion of both the identity and the brand/organisation they are portrayed as supporting (e.g. Knoll &amp; Matthes, [<reflink idref="bib24" id="ref79">24</reflink>]).</p> <p>Finally, we investigated one method that could improve detection performance—providing additional images for comparison. When faced with a potentially synthetic image of a celebrity, observers will typically have access to other images of that identity (online), allowing for a direct comparison rather than solely relying on their internal representations. Here, we found that accuracy was improved when participants knew that the additional images were real photographs, in particular when the image under consideration was synthetic. In contrast, if the authenticity of these additional images was also unknown, accuracy on 'target present' trials (again, when the synthetic image was present) was around chance levels, highlighting the different challenges posed by these two contexts for the same triptych of images. 
However, it is worth noting that overall accuracy on Experiment 4 was modest but above chance, demonstrating that the presence of additional images, even when it is unclear as to which (if any) are synthetic, may still benefit performance.</p> <p>Importantly, we found that familiarity with identities was associated with only modest increases in performance across our tasks. This contrasts with evidence from studies involving deepfake videos (e.g. Nas &amp; de Kleijn, [<reflink idref="bib38" id="ref80">38</reflink>]), perhaps due to the additional diagnostic cues available in that medium (e.g. audio quality and synchronisation) that may facilitate detection (Groh et al., [<reflink idref="bib19" id="ref81">19</reflink>]). Indeed, such cues might explain why observers are often more accurate in general with synthetic videos than static images (Diel et al., [<reflink idref="bib9" id="ref82">9</reflink>]). From a more theoretical perspective, we know that familiarity leads to substantial improvements in recognition (e.g. Kramer et al., [<reflink idref="bib30" id="ref83">30</reflink>]). However, greater familiarity does not appear to increase sensitivity to subtle visual alterations in a person's face, and may even reduce it, perhaps because those deviations still fit within our internal identity representations (Brédart &amp; Devue, [<reflink idref="bib4" id="ref84">4</reflink>]; Ge et al., [<reflink idref="bib15" id="ref85">15</reflink>]). Therefore, the modest improvements found here suggest that the detection of subtle image artefacts or deviations in likeness may show little benefit from familiarity. Indeed, increased familiarity appears to elevate perceived likeness across all images of a person (Ritchie et al., [<reflink idref="bib45" id="ref86">45</reflink>]), implying a broad tolerance for variation that, in the context of synthetic image detection, may counteract any expected familiarity advantage. 
Future research should investigate this relationship between familiarity and tolerance more directly.</p> <p>Although familiarity in the current work provided only modest performance benefits, it may nonetheless alter the information that participants rely on when evaluating faces to determine their authenticity. Detection of synthetic faces depicting unfamiliar people is likely to be guided by low-level image cues (e.g. lighting inconsistencies, skin texture, or surface irregularities) whereas judgments about familiar faces may involve higher-level, identity-based expectations regarding shape and expression consistency, for instance. Further, familiarity increases the reliance on internal facial features (e.g. Ellis et al., [<reflink idref="bib12" id="ref87">12</reflink>]) and this may produce a shift in focus when detecting synthetic images also, although this has yet to be studied. Taken together, familiarity may change how people approach the task of synthetic image detection, even though it does not substantially improve performance.</p> <p>Previous work has identified the composition of the training dataset as an important factor when synthesising novel identities. If White face photographs are overrepresented during training then the algorithm (e.g. StyleGAN2) produces more realistic White synthetic faces (Miller et al., [<reflink idref="bib36" id="ref88">36</reflink>]; Nightingale &amp; Farid, [<reflink idref="bib39" id="ref89">39</reflink>]). Here, the composition of ChatGPT/DALL·E's training corpus is unknown. However, we can assume that these datasets contained larger numbers of images of more famous celebrities. As a result, ChatGPT will likely be better able to generate new images resembling those better-known identities. This is because ChatGPT's ability to synthesise a new face that closely resembles the one provided is likely determined by its exposure to similar looking faces (with none more similar than the celebrity themselves) during training. 
Indeed, anecdotally, this was evident during the initial stages of our exploration of the tool. Future work could therefore consider what is presumably a continuum of fame for celebrities that may predict (via their online prevalence) the realism of synthetic images that ChatGPT is capable of generating.</p> <p>In addition to each synthetic image's realism/resemblance, we also suspect that viewing conditions will play an important role when determining whether an image is computer-generated or not. In the current work, we took care to present face images at an acceptable size for inspection (and prevented the use of mobile phones). However, the public will likely view online content at smaller scales, making any useful signs of synthesis (e.g. incongruent lighting; Miller et al., [<reflink idref="bib36" id="ref90">36</reflink>]) potentially undetectable. Further study will therefore need to consider detection performance in more ecologically valid viewing conditions, e.g. on mobile devices or with smaller/poorer quality images.</p> <p>Although we found that familiarity with an identity failed to prevent erroneously believing a synthetic image to be real, there were clear individual differences in performance across all tasks. Recent studies have begun to investigate whether super-recognisers (i.e. individuals with superior face recognition abilities) may be better able to detect deepfakes. While such a group showed no advantage when shown deepfake videos (Ramon et al., [<reflink idref="bib43" id="ref91">43</reflink>]), evidence suggests super-recognisers may outperform typical observers when tasked with detecting digitally manipulated (Davis et al., [<reflink idref="bib8" id="ref92">8</reflink>]) and AI-synthesised face images (Dunn et al., [<reflink idref="bib11" id="ref93">11</reflink>]; Gray et al., [<reflink idref="bib17" id="ref94">17</reflink>]), although studies have yet to consider deepfakes involving familiar faces. 
As such, this represents an important avenue for future investigation.</p> <p>In sum, the present work demonstrates ChatGPT's ability to generate synthetic images of both novel and familiar faces which are indistinguishable from real photographs to most human observers. Since both familiarity with, and reference images of, a particular identity produced only limited benefits, researchers will need to explore alternative solutions as a matter of urgency. In time, we might find that automated systems will match or surpass human performance in detecting these deepfakes. However, at least for the foreseeable future, the veracity of content will be left for viewers to determine for themselves and, as such, we should make this search for solutions a priority.</p> <hd id="AN0188649318-41">Open practices statement</hd> <p>The raw data, analysis code, and ChatGPT-generated stimuli are available at the Open Science Framework (https://osf.io/fmuh5). Additional information can also be found in the online supplementary materials. Real photographs from Experiment 1 have previously been made available online (Nightingale &amp; Farid, [<reflink idref="bib39" id="ref95">39</reflink>]), while copyright permissions prevent us from sharing the real photographs of celebrities used in Experiments 2–4 (although these are accessible via Google Images). The experiments presented here were not preregistered.</p> <hd id="AN0188649318-42">Acknowledgements</hd> <p>Not applicable.</p> <hd id="AN0188649318-43">Author contributions</hd> <p>RSSK conceived of the initial project. All authors were involved with experimental design. RSSK collected the experimental data. RSSK and ALJ performed the data analysis. RSSK and ALJ prepared the initial draft. All authors revised the manuscript. 
All authors read and approved the final manuscript.</p> <hd id="AN0188649318-44">Funding</hd> <p>This work was supported by the Israel Science Foundation grant (ISF-1498/21) to DF.</p> <hd id="AN0188649318-45">Data availability</hd> <p>The raw data, analysis code, and ChatGPT-generated stimuli are available at the Open Science Framework: https://osf.io/fmuh5</p> <hd id="AN0188649318-46">Declarations</hd> <p></p> <hd id="AN0188649318-47">Ethics approval and consent to participate</hd> <p>All experiments received ethical approval from the University of Lincoln's research ethics committee (ref. 21014) and were carried out in accordance with the provisions of the World Medical Association Declaration of Helsinki.</p> <hd id="AN0188649318-48">Consent for publication</hd> <p>Not applicable.</p> <hd id="AN0188649318-49">Competing interests</hd> <p>The authors declare that they have no competing interests.</p> <hd id="AN0188649318-50">Supplementary Information</hd> <p>Graph: Supplementary file 1.</p> <hd id="AN0188649318-51">Publisher's Note</hd> <p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p> <ref id="AN0188649318-52"> <title> References </title> <blist> <bibl id="bib1" idref="ref54" type="bt">1</bibl> <bibtext> Abril-Pla O, Andreani V, Carroll C, Dong L, Fonnesbeck CJ, Kochurov M, Zinkov R. PyMC: a modern, and comprehensive probabilistic programming framework in Python. PeerJ Computer Science. 2023; 9: e1516. 37705656. 10495961. 10.7717/peerj-cs.1516</bibtext> </blist> <blist> <bibl id="bib2" idref="ref50" type="bt">2</bibl> <bibtext> Anwyl-Irvine AL, Massonnié J, Flitton A, Kirkham N, Evershed JK. Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods. 2020; 52: 388-407. 31016684. 10.3758/s13428-019-01237-x</bibtext> </blist> <blist> <bibl id="bib3" idref="ref6" type="bt">3</bibl> <bibtext> Bray SD, Johnson SD, Kleinberg B. 
Testing human ability to detect 'deepfake' images of human faces. Journal of Cybersecurity. 2023; 9; 1: 1-18. 10.1093/cybsec/tyad011</bibtext> </blist> <blist> <bibl id="bib4" idref="ref14" type="bt">4</bibl> <bibtext> Brédart S, Devue C. The accuracy of memory for faces of personally known individuals. Perception. 2006; 35; 1: 101-106. 16491712. 10.1068/p5382</bibtext> </blist> <blist> <bibl id="bib5" idref="ref15" type="bt">5</bibl> <bibtext> Brooks KR, Kemp RI. Sensitivity to feature displacement in familiar and unfamiliar faces: Beyond the internal/external feature distinction. Perception. 2007; 36; 11: 1646-1659. 18265845. 10.1068/p5675</bibtext> </blist> <blist> <bibl id="bib6" idref="ref23" type="bt">6</bibl> <bibtext> Burton AM, Wilson S, Cowan M, Bruce V. Face recognition in poor quality video: Evidence from security surveillance. Psychological Science. 1999; 10: 243-248. 10.1111/1467-9280.00144</bibtext> </blist> <blist> <bibl id="bib7" idref="ref13" type="bt">7</bibl> <bibtext> Burton AM, Kramer RSS, Ritchie KL, Jenkins R. Identity from variation: Representations of faces derived from multiple instances. Cognitive Science. 2016; 40; 1: 202-223. 25824013. 10.1111/cogs.12231</bibtext> </blist> <blist> <bibl id="bib8" idref="ref92" type="bt">8</bibl> <bibtext> Davis JP, Robertson DJ, Jenkins RE, Ibsen M, Nichols R, Babbs M, Rathgeb C, Løvåsdal F, Raja K, Busch C. The super-recogniser advantage extends to the detection of digitally manipulated faces. Applied Cognitive Psychology. 2025; 39; 2: e70053. 10.1002/acp.70053</bibtext> </blist> <blist> <bibl id="bib9" idref="ref82" type="bt">9</bibl> <bibtext> Diel A, Lalgi T, Schröter IC, MacDorman KF, Teufel M, Bäuerle A. Human performance in detecting deepfakes: A systematic review and meta-analysis of 56 papers. Computers in Human Behavior Reports. 2024; 16: 100538. 10.1016/j.chbr.2024.100538</bibtext> </blist> <blist> <bibtext> Diel, A, &amp; Lewis, M. (2022). 
Familiarity, orientation, and realism increase face uncanniness by sensitizing to facial distortions. Journal of Vision, 22(4):14, 1–20.</bibtext> </blist> <blist> <bibtext> Dunn, J. D, White, D, Sutherland, C, Miller, E. J, Steward, B. A, &amp; Dawel, A. (2025). Super-recognisers can detect AI-hyperrealism. Open Science Framework. https://doi.org/10.31234/osf.io/fwjsb_v2</bibtext> </blist> <blist> <bibtext> Ellis HD, Shepherd JW, Davies GM. Identification of familiar and unfamiliar faces from internal and external features: Some implications for theories of face recognition. Perception. 1979; 8: 431-439. 503774. 10.1068/p080431</bibtext> </blist> <blist> <bibtext> Elyoseph Z, Refoua E, Asraf K, Lvovsky M, Shimoni Y, Hadar-Shoval D. Capacity of generative AI to interpret human emotions from visual and textual data: Pilot evaluation study. JMIR Mental Health. 2024; 11: e54369. 38319707. 10879976. 10.2196/54369</bibtext> </blist> <blist> <bibtext> Faul F, Erdfelder E, Lang A-G, Buchner A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods. 2007; 39; 2: 175-191. 17695343. 10.3758/BF03193146</bibtext> </blist> <blist> <bibtext> Ge L, Luo J, Nishimura M, Lee K. The lasting impression of Chairman Mao: Hyperfidelity of familiar-face memory. Perception. 2003; 32; 5: 601-614. 12854646. 10.1068/p5022</bibtext> </blist> <blist> <bibtext> Gelman A, Simpson D, Betancourt M. The prior can often only be understood in the context of the likelihood. Entropy. 2017; 19; 10: 555. 10.3390/e19100555</bibtext> </blist> <blist> <bibtext> Gray, K, Davis, J. P, Bunce, C, Ritchie, K. L, &amp; Noyes, E. (2025). Training super-recognisers' detection and discrimination of computer-generated faces. Open Science Framework. https://doi.org/10.31234/osf.io/5jqh8_v1</bibtext> </blist> <blist> <bibtext> Groh M, Epstein Z, Firestone C, Picard R. Deepfake detection by human crowds, machines, and machine-informed crowds. 
Proceedings of the National Academy of Sciences. 2022; 119; 1: e2110013119. 10.1073/pnas.2110013119</bibtext> </blist> <blist> <bibtext> Groh M, Sankaranarayanan A, Singh N, Kim DY, Lippman A, Picard R. Human detection of political speech deepfakes across transcripts, audio, and video. Nature Communications. 2024; 15; 1: 7629. 39223110. 11368926. 10.1038/s41467-024-51998-z</bibtext> </blist> <blist> <bibtext> Jamali, L. (2024). OpenAI value surges to $157bn in funding deal. BBC News. <ulink href="https://www.bbc.co.uk/news/articles/c8rd0jd1g6xo">https://www.bbc.co.uk/news/articles/c8rd0jd1g6xo</ulink></bibtext> </blist> <blist> <bibtext> Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2021; 43; 12: 4217-4228. 32012000. 10.1109/TPAMI.2020.2970919</bibtext> </blist> <blist> <bibtext> Karras, T, Laine, S, Aittala, M, Hellsten, J, Lehtinen, J, &amp; Aila, T. (2020). Analyzing and improving the image quality of StyleGAN. https://openaccess.thecvf.com/content_CVPR_2020/html/Karras_Analyzing_and_Improving_the_Image_Quality_of_StyleGAN_CVPR_2020_paper.html</bibtext> </blist> <blist> <bibtext> Kätsyri J, Förger K, Mäkäräinen M, Takala T. A review of empirical evidence on different uncanny valley hypotheses: Support for perceptual mismatch as one road to the valley of eeriness. Frontiers in Psychology. 2015; 6: 390. 25914661. 4392592. 10.3389/fpsyg.2015.00390</bibtext> </blist> <blist> <bibtext> Knoll J, Matthes J. The effectiveness of celebrity endorsements: A meta-analysis. Journal of the Academy of Marketing Science. 2017; 45: 55-75. 10.1007/s11747-016-0503-8</bibtext> </blist> <blist> <bibtext> Kramer RSS. Face to face: Comparing ChatGPT with human performance on face matching. Perception. 2025; 54; 1: 65-68. 39497555. 10.1177/03010066241295992</bibtext> </blist> <blist> <bibtext> Kramer RSS. Fusing ChatGPT and human decisions in unfamiliar face matching. 
Applied Cognitive Psychology. 2025; 39; 2: e70037. 10.1002/acp.70037</bibtext> </blist> <blist> <bibtext> Kramer RSS. Comparing ChatGPT with human judgements of social traits from face photographs. Computers in Human Behavior: Artificial Humans. 2025; 4: 100156. 10.1016/j.chbah.2025.100156</bibtext> </blist> <blist> <bibtext> Kramer RSS, Cartledge C. Crowds improve human detection of AI-synthesised faces. Applied Cognitive Psychology. 2024; 38; 5: e4245. 10.1002/acp.4245</bibtext> </blist> <blist> <bibtext> Kramer RSS, Gous G. Eyewitness descriptions without memory: The (f)utility of describing faces. Applied Cognitive Psychology. 2020; 34; 3: 605-615. 10.1002/acp.3645</bibtext> </blist> <blist> <bibtext> Kramer RSS, Young AW, Burton AM. Understanding face familiarity. Cognition. 2018; 172: 46-58. 29232594. 10.1016/j.cognition.2017.12.005</bibtext> </blist> <blist> <bibtext> Kramer, R. S. S. (2025c). Identifying basic emotions and action units from facial photographs with ChatGPT. Journal of Nonverbal Behavior, 49, 289-306.</bibtext> </blist> <blist> <bibtext> Lago F, Pasquini C, Böhme R, Dumont H, Goffaux V, Boato G. More real than real: A study on human visual perception of synthetic faces. IEEE Signal Processing Magazine. 2022; 39; 1: 109-116. 10.1109/MSP.2021.3120982</bibtext> </blist> <blist> <bibtext> Lewandowski D, Kurowicka D, Joe H. Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis. 2009; 100; 9: 1989-2001. 10.1016/j.jmva.2009.04.008</bibtext> </blist> <blist> <bibtext> Makowski D, Ben-Shachar MS, Chen SA, Lüdecke D. Indices of effect existence and significance in the Bayesian framework. Frontiers in Psychology. 2019; 10: 2767. 31920819. 6914840. 10.3389/fpsyg.2019.02767</bibtext> </blist> <blist> <bibtext> Masood M, Nawaz M, Malik KM, Javed A, Irtaza A, Malik H. Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward. Applied Intelligence. 
2023; 53; 4: 3974-4026. 10.1007/s10489-022-03766-z</bibtext> </blist> <blist> <bibtext> Miller EJ, Steward BA, Witkower Z, Sutherland CA, Krumhuber EG, Dawel A. AI hyperrealism: Why AI faces are perceived as more real than human ones. Psychological Science. 2023; 34; 12: 1390-1403. 37955384. 10.1177/09567976231207095</bibtext> </blist> <blist> <bibtext> Mori, M. (2012). The uncanny valley (F. MacDorman &amp; N. Kageki, Trans.). Energy 7, 33–35. (Original published work 1970)</bibtext> </blist> <blist> <bibtext> Nas E, de Kleijn R. Conspiracy thinking and social media use are associated with ability to detect deepfakes. Telematics and Informatics. 2024; 87: 102093. 10.1016/j.tele.2023.102093</bibtext> </blist> <blist> <bibtext> Nightingale SJ, Farid H. AI-synthesized faces are indistinguishable from real faces and more trustworthy. Proceedings of the National Academy of Sciences. 2022; 119; 8: e2120481119. 10.1073/pnas.2120481119</bibtext> </blist> <blist> <bibtext> O'Donnell C, Bruce V. Familiarisation with faces selectively enhances sensitivity to changes made to the eyes. Perception. 2001; 30; 6: 755-764. 11464563. 10.1068/p3027</bibtext> </blist> <blist> <bibtext> Park S, Nicolau JL. Asymmetric effects of online consumer reviews. Annals of Tourism Research. 2015; 50: 67-83. 10.1016/j.annals.2014.10.007</bibtext> </blist> <blist> <bibtext> Pozo E, Germine LT, Scheuer L, Strong RW. Evaluating the reliability and validity of the famous faces doppelgangers test, a novel measure of familiar face recognition. Assessment. 2023; 30; 4: 1200-1210. 35450435. 10.1177/10731911221087746</bibtext> </blist> <blist> <bibtext> Ramon M, Vowels M, Groh M. Deepfake detection in super-recognizers and police officers. IEEE Security &amp; Privacy. 2024; 22; 3: 68-76. 10.1109/MSEC.2024.3371030</bibtext> </blist> <blist> <bibtext> Ricker, J, Assenmacher, D, Holz, T, Fischer, A, &amp; Quiring, E. (2024). 
AI-generated faces in the real world: A large-scale case study of Twitter profile images. Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses, Padua, Italy.</bibtext> </blist> <blist> <bibtext> Ritchie KL, Kramer RSS, Burton AM. What makes a face photo a 'good likeness'? Cognition. 2018; 170: 1-8. 28917125. 10.1016/j.cognition.2017.09.001</bibtext> </blist> <blist> <bibtext> Robertson DJ, Noyes E, Dowsett AJ, Jenkins R, Burton AM. Face recognition by metropolitan police super-recognisers. PLoS One. 2016; 11; 2: e0150036. 26918457. 4769018. 10.1371/journal.pone.0150036</bibtext> </blist> <blist> <bibtext> Rouder JN. Optional stopping: No problem for Bayesians. Psychonomic Bulletin &amp; Review. 2014; 21: 301-308. 10.3758/s13423-014-0595-4</bibtext> </blist> <blist> <bibtext> Sandford A, Bindemann M. Discrimination and recognition of faces with changed configuration. Memory &amp; Cognition. 2020; 48; 2: 287-298. 10.3758/s13421-019-01010-7</bibtext> </blist> <blist> <bibtext> Tucciarelli R, Vehar N, Chandaria S, Tsakiris M. On the realness of people who do not exist: The social processing of artificial faces. iScience. 2022; 25: 105441. 36590465. 9801245. 10.1016/j.isci.2022.105441</bibtext> </blist> <blist> <bibtext> Wang S, Lilienfeld SO, Rochat P. The uncanny valley: Existence and explanations. Review of General Psychology. 2015; 19; 4: 393-407. 10.1037/gpr0000056</bibtext> </blist> <blist> <bibtext> Welsch R, von Castell C, Hecht H. Interpersonal distance regulation and approach-avoidance reactions are altered in psychopathy. Clinical Psychological Science. 2020; 8; 2: 211-225. 10.1177/2167702619869336</bibtext> </blist> <blist> <bibtext> Xu Q. Should I trust him? The effects of reviewer profile characteristics on eWOM credibility. Computers in Human Behavior. 2014; 33: 136-144. 10.1016/j.chb.2014.01.027</bibtext> </blist> <blist> <bibtext> Young AW, Burton AM. Are we face experts? Trends in Cognitive Sciences. 
2018; 22; 2: 100-110. 29254899. 10.1016/j.tics.2017.11.007</bibtext> </blist> </ref> <aug> <p>By Robin S. S. Kramer; Alex L. Jones; Daniel Fitousi and Jeremy J. Tree</p> </aug>