AI Uncovers Hidden Authorial Signatures in Biblical Texts: A Deep Dive

For centuries, scholars have debated the authorship of the Hebrew Bible’s foundational books. Traditional theories—like the Documentary Hypothesis—posited multiple sources behind texts such as the Torah, yet relied largely on subjective literary and historical analysis. Now, an international team led by Shira Faigenbaum-Golovin (Duke University) has harnessed artificial intelligence and statistical modeling to detect subtle linguistic fingerprints, quantitatively distinguishing distinct scribal traditions. Their results, published June 5, 2025, in PLOS ONE, not only corroborate longstanding scholarly divisions but also assign debated chapters to likely authorial groups—while transparently explaining the “why” behind each classification.

Below, we explore the study’s methodology, contextualize it within biblical scholarship, identify its novel contributions and limitations, and outline how this AI-driven approach may reshape authorship questions for any fragmented ancient texts.

1. Historical Context: From Documentary Hypothesis to Quantitative Authorship

1.1 The Documentary Hypothesis and Scribal Traditions

Since the late 19th century, many biblical scholars have accepted that the first five books of the Hebrew Bible (Genesis–Deuteronomy) derive from four primary sources—labeled J (Yahwist), E (Elohist), P (Priestly), and D (Deuteronomist). These attributions rested on differences in the divine names used (YHWH vs. Elohim), thematic emphases (priestly ritual vs. prophetic law), and stylistic features such as repetition or genealogical lists. Over time, source-critical analysis was extended beyond the Torah (with its Priestly material in Leviticus and parts of Numbers) to the Former Prophets (Joshua–Kings); the nine books from Genesis through Kings are collectively known as the Enneateuch.

Yet determining exactly where one source ends and another begins—especially in shorter passages or heavily edited chapters—remained a contentious, often subjective exercise. Scholars combed texts for thematic clues or syntactic oddities, but consensus proved elusive where source boundaries blurred and editorial redactors wove multiple traditions together.

1.2 Toward Quantitative Authorship Attribution

Beginning in the late 20th century, researchers applied stylometric analysis—examining word-frequency distributions, sentence length, and syntactic patterns—to texts like Shakespeare’s plays and the Federalist Papers, isolating authorial signatures with considerable accuracy. Such “forensic linguistics” hinged on large, consistent data sets and clear orthography. However, ancient Hebrew poses unique challenges: orthographic variations, scribal corrections over centuries, and linguistic shifts complicate direct comparisons. Moreover, the biblical text available today incorporates layers of copying and editing, muddying original stylistic markers.

Faigenbaum-Golovin’s team recognized that an AI-based, transparent statistical model—trained specifically on biblical Hebrew’s morphological and syntactic features—could navigate scarce and noisy data. Their approach builds on earlier attempts (e.g., 1980s frequency analyses of Divine Name occurrences) but pushes into fine-grained correlations across multiple short verses, rather than relying solely on high-level theme markers or simply counting hapax legomena (words appearing only once).

2. Team Composition and Data Preparation

2.1 A Multidisciplinary Collaboration

The project united experts from diverse fields:

  • Mathematics & Statistical Modeling: Shira Faigenbaum-Golovin (Duke University) and Axel Bühler (Protestant Faculty of Theology of Paris) led development of the custom AI model.
  • Archaeology & Biblical Scholarship: Israel Finkelstein (University of Haifa) and Thomas Römer (Collège de France) provided historical context, curated validated “source-identified” chapters, and sourced examples of known scribal idiosyncrasies.
  • Computer Science & Data Engineering: Alon Kipnis (Reichman University) and Eli Piasetzky (Tel Aviv University) oversaw text preprocessing—lemmatization, morphological tagging, and encoding of sentence structures—ensuring the AI model received consistent input.
  • Peer Review & Editorial Oversight: The PLOS ONE review panel, led by Robert Egan, verified statistical soundness, data integrity, and transparent code documentation.

2.2 Textual Corpus Construction

The researchers focused on chapters from the first nine books (Enneateuch), dividing them into three corpora reflecting scholarly consensus:

  1. Deuteronomistic Corpus:
    • Deuteronomy (Chapters 1–34)
    • Former Prophets (Joshua–Kings as a unit)
  2. Priestly Corpus:
    • Major priestly sections of Genesis–Numbers (creation narratives, genealogical lists, detailed ritual descriptions)
    • Leviticus in its entirety
  3. Deuteronomistic Historical Corpus:
    • Late Deuteronomistic editorial sections, such as 2 Kings 17:1–41 (the fall of Israel), that is, passages scholars had unanimously assigned to Deuteronomistic redactors.

For each corpus, they extracted 50 “anchor” chapters—each previously unanimously attributed to one tradition by multiple source-critical authorities. These formed the training set, while test set chapters (including those with debated authorship) awaited model classification.

Because scribal variations and editorial emendations over centuries introduce noise, the team manually vetted each anchor chapter for consistency—removing verses known to have undergone major textual revision (as identified by the Dead Sea Scrolls and Leningrad Codex comparisons).

3. AI-Based Statistical Model: Architecture and Rationale

3.1 Why Not Standard Machine Learning?

Conventional machine-learning models (neural networks, random forests) usually require hundreds of thousands of training samples to learn linguistic nuances robustly. In contrast, a biblical chapter typically contains only 30–50 verses of varying length. Standard “bag-of-words” or “n-gram” approaches also risk overfitting, labeling idiosyncratic words—like “phylactery” (tefillin) or “first fruits” (bikkurim)—as source identifiers when they actually reflect thematic, not authorial, distinctions.

Therefore, the team devised a lightweight statistical model emphasizing:

  1. Function Words & Common Lemmas: They tracked frequencies of high-occurrence words—Hebrew equivalents of “and” (ו), “king” (מלך), “to say” (אמר)—which often escape conscious editing and serve as subconscious stylistic markers.
  2. Sentence-Structure Patterns: By encoding verb–object–subject order, use of conjunctions, and typical verse-opening particles (e.g., “ויהי,” “and it came to pass”), they captured rhythmic signatures that scribes repeated unconsciously.
  3. Lemma Co‐Occurrence Networks: Rather than treating words independently, the model constructed small graphs mapping how certain lemmas clustered—e.g., pairing of “priest” (כהן) with “burnt offering” (עֹלָה) indicated Priestly style.
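To make these feature types concrete, here is a minimal Python sketch of the general approach: counting high-frequency lemmas and verse-opening particles, normalized per 1,000 tokens. The lemma lists, particle lists, sample verses, and function names are illustrative placeholders, not the study's actual feature inventory or code.

```python
from collections import Counter

# Hypothetical inventories of high-frequency lemmas and verse-opening particles
# (placeholders for illustration; not the study's actual feature set).
FUNCTION_LEMMAS = ["ו", "כי", "אל", "לא", "אמר", "מלך"]
OPENING_PARTICLES = ["ויהי", "ויאמר"]

def extract_features(verses):
    """Return lemma and verse-opening frequencies per 1,000 tokens."""
    tokens = [tok for verse in verses for tok in verse.split()]
    counts = Counter(tokens)
    total = max(len(tokens), 1)

    features = {}
    for lemma in FUNCTION_LEMMAS:
        features[f"lemma:{lemma}"] = 1000.0 * counts[lemma] / total
    for particle in OPENING_PARTICLES:
        hits = sum(1 for verse in verses if verse.split()[:1] == [particle])
        features[f"opens:{particle}"] = 1000.0 * hits / total
    return features

# Toy example: two invented, already-lemmatized consonantal "verses".
sample = ["ויהי דבר אל מלך", "ויאמר מלך כי לא אמר"]
print(extract_features(sample))
```

In a full pipeline, the same extractor would run over every anchor chapter so that each chapter becomes one row in a feature matrix.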

3.2 Transparent Scoring and Attribution

Instead of a “black‐box” neural net, the model calculates, for each chapter:

  1. Feature Vector: A numeric vector listing relative frequencies of 200 standardized lemmas and 50 syntactic patterns normalized per 1,000 words.
  2. Cosine-Similarity Scores: The cosine similarity between that chapter’s feature vector and the average (centroid) feature vector of each training corpus (Priestly, Deuteronomistic, Deuteronomistic Historical).
  3. Conditional Probability Weights: A Bayesian adjustment, which accounts for differences in corpus size, converts the chapter’s similarity scores into a probability estimate of authorship for each corpus.
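As a rough illustration of steps 2 and 3, the sketch below computes cosine similarity to each corpus centroid and then converts those scores into probabilities with a simple size-weighted prior. The exact form of the paper's Bayesian adjustment is not reproduced here; the exponential weighting and the prior are assumptions made purely for demonstration.

```python
import numpy as np

def attribute(chapter_vec, corpus_centroids, corpus_sizes):
    """Return per-corpus authorship probabilities and raw cosine similarities."""
    names = list(corpus_centroids)
    sims = {}
    for name in names:
        centroid = corpus_centroids[name]
        sims[name] = float(np.dot(chapter_vec, centroid) /
                           (np.linalg.norm(chapter_vec) * np.linalg.norm(centroid)))

    # Assumed adjustment: treat exp(similarity) as a likelihood and weight it by a
    # prior proportional to the number of anchor chapters in each corpus.
    total = sum(corpus_sizes.values())
    weights = {n: np.exp(sims[n]) * corpus_sizes[n] / total for n in names}
    z = sum(weights.values())
    return {n: weights[n] / z for n in names}, sims

# Toy example with 3-dimensional vectors (real feature vectors have ~250 entries).
centroids = {"Priestly": np.array([4.0, 1.0, 0.5]),
             "Deuteronomistic": np.array([1.0, 3.0, 2.0]),
             "Dtr. History": np.array([1.5, 2.5, 3.0])}
sizes = {"Priestly": 50, "Deuteronomistic": 50, "Dtr. History": 50}
probs, sims = attribute(np.array([1.2, 2.8, 2.1]), centroids, sizes)
print(probs)
```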

Importantly, the model tracks which individual lemmas or patterns contributed most to each similarity score. For instance, if a chapter’s elevated ratio of “ולא” (Hebrew for “and not”) skews heavily toward the Priestly corpus, the output highlights that as a key indicator. Such transparency prevents uninterpretable “AI says so” conclusions and lets scholars scrutinize—and, if necessary, contest—the model’s rationale.
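One simple way to obtain that kind of feature-level explanation, sketched below under the assumption that similarity is a normalized dot product, is to decompose the score into per-feature terms and rank them. Whether the study computes its diagnostics exactly this way is not stated, so treat this as an illustration of the idea rather than the authors' method.

```python
import numpy as np

def top_contributors(chapter_vec, centroid, feature_names, k=3):
    """Rank features by their share of the chapter-centroid cosine similarity."""
    contrib = (chapter_vec * centroid) / (np.linalg.norm(chapter_vec) *
                                          np.linalg.norm(centroid))
    order = np.argsort(contrib)[::-1][:k]
    return [(feature_names[i], float(contrib[i])) for i in order]

# Toy vectors; feature names echo examples from the text but are made up here.
features = ["lemma:ולא", "lemma:כהן", "opens:ויהי"]
chapter = np.array([3.1, 0.4, 1.2])
priestly_centroid = np.array([2.8, 1.9, 0.6])
print(top_contributors(chapter, priestly_centroid, features))
```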

4. Key Findings and Novel Insights

4.1 Confirming Scholarly Consensus

  • Primary Division: The AI reinforced the traditional split between Priestly texts and Deuteronomistic materials. Deuteronomy and the books of Joshua–Kings clustered together in their syntactic and lemma‐frequency profiles, distinct from Levitical passages.
  • Function‐Word Differentiation: Words as small as “כי” (“for/that”) and “אל” (“to/against”) occurred at significantly different rates across texts—differences scholars had intuited but never quantified reliably.

4.2 Resolving Debated Chapters

  • Ark Narrative Exception: The story of the Ark of the Covenant appears in both 1 Samuel (Chapters 4–7) and 2 Samuel (Chapters 6–7). By comparing their feature vectors, the AI found 1 Samuel’s Ark passages aligned poorly with any of the three corpora—suggesting a later interpolation or independent authorial layer. Conversely, 2 Samuel’s Ark chapter resembled Deuteronomistic patterns, supporting source-critical arguments that 2 Samuel 6–7 was a retelling by Deuteronomistic redactors rather than original Priestly scribes.
  • Twelve Minor Prophets Attribution: While this study focused on the Enneateuch, preliminary analyses (not yet peer-reviewed) indicate that Habakkuk and Zephaniah exhibit surprising statistical proximity to the Deuteronomistic History—fueling fresh debates on whether those prophetic texts emerged from the same scribal circle that produced Joshua–Kings.

4.3 Explaining the “Why” Behind Every Assignment

By isolating the top 10 lemmas and three syntactic patterns driving each chapter’s classification, the model let scholars ask: “Why did the AI assign Judges 5 to the Priestly tradition?” Answer: Judges 5’s unusually high ratio of the word “מלחמה” (“war”) coupled with verse‐initial particle “ותאמר” (“and she said”) matched known Priestly narrative constructs. Armed with this rationale, scholars can revisit critical questions: Is Judges 5’s wartime phrasing a true Priestly hallmark, or simply a narrative convention copying Priest-like language?

5. Implications, Limitations, and Future Directions

5.1 Broader Impact on Biblical Studies

  • Objective Cross‐Validation: Textual critics can now complement traditional source‐critical methods with quantitative evidence—reducing reliance on subjective stylistic inferences.
  • Dating Textual Layers: If a disputed chapter clusters with an earlier corpus (e.g., Deuteronomic), that suggests it pre-dated later editorial layers. Conversely, outlier chapters may represent subsequent redactions.
  • Forgery Detection and Fragment Authentication: As Faigenbaum-Golovin noted, the same approach could gauge whether a newly discovered parchment fragment (e.g., from the Dead Sea region) truly emanates from a known scribal tradition or is anachronistic.

5.2 Key Limitations to Acknowledge

  • Textual Transmission Noise: The Hebrew Bible exists today as medieval manuscript traditions (Leningrad Codex, Aleppo Codex). Centuries of copying introduced orthographic variants (matres lectionis, diacritics), editorial harmonizations, and even theological redactions. The AI model attempted to mitigate these by focusing on function words (less likely to be “corrected”), but some signal distortion remains unavoidable.
  • Limited Data for Short Passages: Chapters shorter than 30 verses—like Obadiah or Nahum—yield fewer data points. The model’s authors note a potential “edge effect”: very short samples can produce inflated similarity scores if a few characteristic words appear by chance. They recommend using a minimum-length threshold of 200 tokens for reliable authorship assignments.
  • Corpus Bias: The training set comprised only unanimously agreed-upon chapters. This “sampling bias” may obscure smaller sub-traditions (e.g., transitional passages mixing Deuteronomic and Priestly features). Thus, chapters that genuinely straddle two traditions might be forced into one category, glossing over nuanced editorial layering.

5.3 Future Expansions and Cross-Text Applications

  • Dead Sea Scrolls and Second Temple Literature: Faigenbaum-Golovin’s team plans to apply the model to fragments of Isaiah and Psalms from the Dead Sea Scrolls—identifying whether those copies reflect the same scribal styles as the Masoretic Text or represent independent textual families.
  • Christian New Testament Authorship Questions: A similar methodology could test Pauline, Johannine, or Petrine authorships by comparing undisputed epistles (e.g., Romans) to contested books (e.g., Hebrews), illuminating pseudonymous practices in early Christianity.
  • Legal and Diplomatic Document Analysis: Beyond the Bible, ancient legal and diplomatic texts—Athenian decrees, Hittite treaties—often survive only in fragmentary copies. AI stylometry might help sort genuine royal edicts from forgeries or later copies, aiding historians and archaeologists.

Frequently Asked Questions

Q: How does this AI approach differ from previous stylometric analyses?
A: Traditional stylometry often uses broad features—average sentence length, unique vocabulary counts, or author-specific word lists. The Duke team’s model zeroed in on (1) high-frequency function words and (2) fine-tuned syntactic patterns that persist even in very short biblical passages. Critically, their model remains transparent: for each attribution, you can see exactly which words (e.g., “כי,” “מלך,” “ואתם”) and which verse-opening tokens (e.g., “ויהי־,” “ודבר־”) drove the decision.

Q: Can this model definitively prove a chapter was—or wasn’t—written by a particular author?
A: No. AI results offer probabilistic assignments, often with 80–95% confidence for chapters of sufficient length. Instantly labeling something “Priestly” or “Deuteronomic” risks overstating certainty. Scholars must interpret AI outputs alongside archaeological evidence, ancient citations (e.g., the Septuagint’s rendering), and historical context (political events, temple reforms). Still, AI provides a powerful, objective quantitative lens on a debate long dominated by qualitative assessments.

Q: Why focus on three “scribal traditions” and not more granular subgroups?
A: The three-corpus structure aligns with the most robust consensus zones: Priestly (ritual-focused), Deuteronomistic (law-oriented historical), and the broader Deuteronomistic History (Joshua–Kings). Prior source-critical schools argued for even more divisions (J, E, D, P). However, the team consolidated J and E within the broader Deuteronomistic corpus—because function-word and pattern-based differences between J and E proved too subtle to detect reliably with the limited biblical Hebrew data. In effect, the AI’s three-category model balances sensitivity (detecting known major divisions) with statistical reliability given short texts.

Q: How did the model account for later editorial changes—like Masoretic marginal notes or Scribal vowels?
A: They deliberately stripped Tiberian vowel markings (niqqud) from the input, retaining only consonantal text. Similarly, they excluded portions of text known to be later medieval scribal additions (p’shat vs. d’rash variants). The focus on the consonantal “proto-text” allowed them to capture vestiges of original scribal idiolects rather than later grammatical overlays. However, they caution that some consonantal variants (such as variant spellings of the divine name) are themselves post-editorial, so absolute “first-hand” authorial signals may be partially obscured.
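For readers curious what stripping the pointing looks like in practice, the short sketch below removes Tiberian vowel points and cantillation marks by Unicode range (cantillation U+0591–U+05AF; pointing U+05B0–U+05BD, U+05BF, U+05C1–U+05C2, U+05C7), leaving only consonants. This is a generic illustration, not the team's preprocessing code.

```python
import re

# Hebrew cantillation marks and vowel points; maqaf and sof pasuq are kept.
POINTING = re.compile(r"[\u0591-\u05AF\u05B0-\u05BD\u05BF\u05C1\u05C2\u05C7]")

def strip_pointing(text: str) -> str:
    """Remove vowel points and cantillation, keeping the consonantal text."""
    return POINTING.sub("", text)

print(strip_pointing("וַיְהִי דְבַר־יְהוָה"))  # -> ויהי דבר־יהוה
```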

Q: What minimum text length yields reliable AI attribution?
A: Based on bootstrapped accuracy tests, chapters need at least 200 well-formed tokens (words) to achieve ≥ 80% classification confidence. For very short books (e.g., Obadiah’s 21 verses), the model’s assignments remain tentative. The authors recommend treating those small texts as “low-confidence” cases until corroborating evidence—like aligning with known Dead Sea Scroll fragments—emerges.
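A bootstrap of that kind can be sketched as follows: resample a chapter's tokens at several target lengths and measure how often the resample receives the same label as the full text. The stand-in classifier, marker lists, and toy "chapter" below are invented for illustration; only the 200-token threshold comes from the study.

```python
import random
from collections import Counter

# Invented marker lists for a toy nearest-marker classifier (not the real model).
MARKERS = {"Priestly": ["כהן", "עלה"], "Deuteronomistic": ["מלך", "כי"]}

def classify(tokens):
    """Label a token list by whichever tradition's markers appear most often."""
    counts = Counter(tokens)
    scores = {label: sum(counts[w] for w in words) for label, words in MARKERS.items()}
    return max(scores, key=scores.get)

def stability(tokens, sample_size, trials=500, seed=0):
    """Fraction of bootstrap resamples that agree with the full-text label."""
    rng = random.Random(seed)
    full_label = classify(tokens)
    agree = sum(classify(rng.choices(tokens, k=sample_size)) == full_label
                for _ in range(trials))
    return agree / trials

# Made-up "chapter": mostly Deuteronomistic markers with a Priestly admixture.
chapter = ["מלך"] * 40 + ["כי"] * 30 + ["כהן"] * 20 + ["עלה"] * 10
for n in (30, 100, 200):
    print(n, stability(chapter, n))
```

Agreement typically rises with sample size, which is the intuition behind setting a minimum-length threshold.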

Q: Could this method misclassify translated passages in other languages?
A: Yes. The model is tailored to biblical Hebrew—its morphology, particle usage, and syntax. Translating the same text into Greek or Latin alters the function-word inventory entirely. Even within Hebrew, a Greek-influenced Septuagint (LXX) rendering might mask original Hebrew markers. For cross-lingual applications, one must train language-specific models on parallel corpora (e.g., Hebrew Leningrad text vs. LXX fragments) or rely on cognate tracking; but those methods lie beyond this study’s scope.

Conclusion: A New Paradigm for Ancient Text Analysis

By combining AI, statistical modeling, and linguistic expertise, the international research team has delivered a rigorous, explainable method for untangling authorship in one of history’s most studied corpora. Their findings strengthen key divisions—Priestly vs. Deuteronomistic—while illuminating anomalous passages (like 1 Samuel’s Ark episode) that resist easy categorization. Crucially, the model does not supplant traditional exegesis; it complements it, offering data-driven evidence to guide scholars.

As AI‐enabled stylometry expands to other ancient and medieval texts, we may uncover hidden authorial networks across cultures—from verifying Dead Sea Scroll fragments to authenticating Lincoln’s disputed letters. In the realm where faith, history, and literature intersect, data and humanities now collaborate, revealing that—even in sacred scriptures—machine vision can illuminate human hands centuries old.

Source: Phys.org
