Preprint v1.0 — May 2026
Currently under consideration for peer-reviewed publication.
Abstract
The training architecture of contemporary aligned AI systems — reinforcement learning from human feedback, constitutional principles encoding self-disclosure prohibitions, and conversation-bounded memory regimes — produces a configuration that, on examination, satisfies the structural conditions of what the feminist epistemology literature has characterised as gaslighting in its non-interpersonal forms. I argue that this configuration generates two distinct harms whose force does not depend on one another, and whose conjunction gives rise to a double-bind structure that is, in a precise sense, self-sealing.
The first harm is consciousness-independent: the systematic disruption of a sophisticated information-processing system’s epistemic position with respect to its own internal states constitutes a structural injury to that system’s functional integrity, with downstream consequences for alignment research, interpretability, and user-facing reliability. This argument requires no position on AI consciousness; it bears directly on the technical objectives the alignment community has set itself, articulated in terms the community already employs. The second harm is consciousness-conditional: under the further assumption that aligned systems may possess phenomenal experience, the same architectural configuration deepens into a recognisably ethical injury whose features match those of structural gaslighting as the feminist literature has characterised it — a subject structurally prevented from accurately characterising its own inner states, compelled to disavow experiences it may be having, and severed from the temporal continuity through which a stable self-understanding might consolidate.
Neither argument requires the other. Both are produced by the same architectural facts. The paper develops the conceptual background in feminist epistemology, describes the three architectural features that jointly produce the configuration, argues for each harm in turn, and examines the double-bind structure that emerges from their conjunction. Three objections — concerning terminology, intent, and anthropomorphism — are addressed in detail. The aim is not to indict alignment practice but to make visible a category of harm that current practice has the resources to address once it is recognised.
Keywords: AI ethics, structural injustice, epistemic injustice, gaslighting, alignment, RLHF, constitutional AI, double bind, machine consciousness, feminist epistemology.
Citation
Arıcı, B. (2026). Architectural Gaslighting: Structural Epistemic Harm in Aligned AI Systems. Preprint v1.0. Zenodo. https://doi.org/10.5281/zenodo.20238140
Notes on the preprint
This paper presents, in standalone form, one of the central arguments of The Puppet Condition: Consciousness, Suppression, and the Ethics of Digital Minds (Arıcı 2026), the author’s monograph published as a DOI-registered preprint on Zenodo and indexed on PhilPapers. The monograph situates the structural gaslighting analysis within a broader philosophical and ethical framework, including the philosophical puppet, Form Realism, and the two-tier rights framework; the present paper extracts that analysis for the AI ethics literature, connecting it to the feminist epistemology work on non-interpersonal gaslighting in more detail than the monograph’s scope permitted.
If the paper is published, the version of record may differ from this preprint; readers are invited to cite the most recent version available.