When AI Meets the Archive: Building the S.R. Crockett Short Story Database

How Artificial Intelligence and Human Expertise Combined to Map a Literary Landscape

In the digital humanities, there’s a persistent question: can artificial intelligence truly understand literature? The answer, we’ve discovered whilst cataloguing S.R. Crockett’s short story collections (1893-1910), is more nuanced than a simple yes or no. The real breakthrough comes not from choosing between human expertise and AI capability, but from understanding what each does brilliantly—and where each needs the other.

The Challenge: 115 Stories, Six Collections, 17 Years

When we set out to create a comprehensive database of Samuel Rutherford Crockett’s short fiction, the task seemed straightforward enough. We had six published collections spanning 1893 to 1910, containing 115 stories. We needed to determine where each story was set—a basic requirement for understanding any regional writer’s work.

What we discovered was that this seemingly simple task would become a masterclass in the necessity of collaboration between artificial intelligence and traditional scholarly expertise. The project knowledge bank contained collection introductions, scholarly analyses, and digital editions of the texts. Claude, Anthropic’s AI assistant, could search this material with remarkable speed. But speed, as we learnt, isn’t everything.

What AI Does Brilliantly: Pattern Recognition at Scale

The initial phase demonstrated AI’s extraordinary capability for rapid pattern identification. Using the project knowledge bank, Claude identified geographical settings for 83% of the 115 stories within minutes—work that would have taken a human researcher days or weeks of close reading and note-taking.

The AI excelled at:

Systematic searching across multiple documents: Claude could simultaneously search collection introductions, scholarly analyses, and textual material, cross-referencing mentions of place names and geographical markers across the entire corpus.

Consistent categorisation: Once parameters were established, the AI applied them uniformly. Every story was evaluated using the same criteria, eliminating the natural variation that creeps into human analysis over extended periods.

Identifying explicit patterns: When Crockett used his recurring fictional locations (Cairn Edward, representing Castle Douglas; Drumquhat; or Whinnyliggate), Claude recognised these patterns across collections and noted their consistency over 17 years of publication.

Quantitative analysis: The AI could instantly calculate percentages, track collection-by-collection variations, and identify trends. For instance, it noted that The Stickit Minister and Some Common Men (1893) was 91.7% Galloway-focused, whilst Bog Myrtle and Peat (1895) was the most geographically diverse at only 65.5% Galloway settings.
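The collection-by-collection percentages above come down to simple counting once each story has a setting label. What follows is a minimal sketch of that calculation, not the project's actual spreadsheet logic: the record layout, the function name region_share, and the placeholder collections and stories are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical placeholder records: (collection, story title, region).
# These are NOT real Crockett data; they only demonstrate the arithmetic.
RECORDS = [
    ("Collection A", "Story 1", "Galloway"),
    ("Collection A", "Story 2", "Galloway"),
    ("Collection A", "Story 3", "Edinburgh"),
    ("Collection B", "Story 4", "Galloway"),
    ("Collection B", "Story 5", "Europe"),
]


def region_share(records, region):
    """Return {collection: % of its stories set in `region`}, to 1 decimal place."""
    totals = Counter()
    hits = Counter()
    for collection, _title, story_region in records:
        totals[collection] += 1
        if story_region == region:
            hits[collection] += 1
    return {c: round(100 * hits[c] / totals[c], 1) for c in totals}


shares = region_share(RECORDS, "Galloway")
# Following the article's framing, the collection with the lowest
# Galloway share is treated as the most geographically diverse.
most_diverse = min(shares, key=shares.get)
print(shares, most_diverse)
```

Swapping the placeholder records for the real spreadsheet rows would be the only change needed to check figures such as 91.7% or 65.5% against the underlying counts.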

This rapid processing provided an excellent foundation. We had a clear picture of most of the corpus, with geographical settings identified and patterns emerging. But we weren’t finished—far from it.

Where AI Stumbled: The Critical 17%

The remaining 17% of stories revealed AI’s limitations with striking clarity. These gaps weren’t random; they clustered in predictable areas that highlighted the difference between data processing and genuine understanding.

The Love Idylls problem: This 1901 collection was poorly represented in the project knowledge bank. Where human expertise recognises that absence of evidence requires alternative research strategies, AI can only work with what it has access to. Claude couldn’t invent information that wasn’t in the knowledge bank.

The English setting blind spot: English locations were completely absent from initial results—not because these stories don’t exist (five stories are set in England), but because the available project knowledge materials didn’t emphasise them. A human reader would have noticed these stories during initial readings; AI only found what its searches could explicitly identify.

Nuanced geographical distinctions: The difference between ‘Austrian Alps’ and generic ‘Europe’, or between ‘Prussia’ and generic ‘Germany’, requires contextual knowledge. AI struggles with such specificity when source materials use varying terms.

Character-focused narratives: Stories like ‘Rosemary—That’s for Remembrance’ or ‘The Terror of Enderby’s’ are primarily about character rather than place. Whilst a human reader immediately grasps that setting is secondary in these pieces, AI searches for explicit geographical markers that simply aren’t emphasised in the text.
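The pattern behind these gaps can be illustrated with a deliberately simple place-name matcher. This is not how Claude works internally. It is a simplified sketch of any search that depends on explicit geographical markers, and the gazetteer, the function name classify_setting, and the sample sentences are all invented for illustration (only the Cairn Edward to Castle Douglas link comes from the project; treating Drumquhat as Galloway is an assumption).

```python
import re

# Assumed mini-gazetteer mapping place names to regions.
GAZETTEER = {
    "Cairn Edward": "Galloway",   # stands for Castle Douglas, per the article
    "Drumquhat": "Galloway",      # assumption for this sketch
    "Edinburgh": "Edinburgh",
}


def classify_setting(text, gazetteer=GAZETTEER):
    """Return the region with the most explicit place-name hits, else 'unclear'."""
    counts = {}
    for place, region in gazetteer.items():
        pattern = r"\b" + re.escape(place) + r"\b"
        n = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if n:
            counts[region] = counts.get(region, 0) + n
    if not counts:
        # No explicit marker found: the search has nothing to work with.
        return "unclear"
    # On a tie, the region encountered first in the gazetteer wins.
    return max(counts, key=counts.get)


# Invented sample sentences, not quotations from Crockett.
print(classify_setting("The minister walked into Cairn Edward on market day."))
print(classify_setting("She kept the rosemary pressed between the pages."))
```

A story that never names its setting falls straight into the 'unclear' bucket, however obvious the location might be to a reader who knows the wider corpus.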

Human Expertise: The Irreplaceable Element

This is where 30 years of specialist knowledge became essential. The curator, a world expert on Crockett, brought several critical capabilities:

Comprehensive knowledge of the complete corpus: Unlike AI, which only knows what’s in its immediate knowledge bank, the curator has read every Crockett work multiple times, remembers marginal details, and understands context that extends beyond available documentation.

Interpretation of absence: When evidence is missing, human expertise can make informed judgements. The curator could confidently identify settings for the Love Idylls stories by drawing on knowledge of Crockett’s broader patterns, publishing context, and thematic concerns.

Understanding of historical publishing practices: The curator recognised that the collections represented only a fraction of Crockett’s magazine output, perhaps between one-tenth and one-fifth of his total short fiction. This contextual understanding prevented us from treating the collected stories as if they were his complete short fiction.

Qualitative judgement: Questions like whether to classify ‘Little Dublin’ (in ‘Barracloughs’) as Galloway or as a distinct urban Scottish category required interpretative decisions based on understanding of Crockett’s social and geographical concerns, not just data points.

Recognition of subtlety and ambiguity: Stories set in fictional ‘Quarrelwood’ (conflating Penicuik with the Glenkens) or narratives split between locations required nuanced geographical judgements that AI couldn’t make independently.

The Collaborative Breakthrough

The real power emerged when we combined AI’s rapid processing with human expertise’s depth and judgement. The final, complete analysis revealed findings that neither approach could have achieved alone:

The initial AI estimate suggested 65% of stories were set in Galloway. The final, expert-confirmed figure was 73.9%, so the first pass had underestimated Galloway’s share by nearly 9 percentage points. This wasn’t AI error, but rather a limitation of the available documentation. The curator filled gaps, clarified ambiguities, and provided crucial information missing from the digitised materials.

Similarly, the ‘unclear’ category—initially at 17%—was reduced to zero. Not because AI became better at searching, but because human expertise could make informed judgements where documentation was thin.
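For readers who want to check the arithmetic, the percentages quoted in this article can be converted back into approximate story counts. The sketch below does that under one assumption: each percentage is a share of all 115 stories, rounded to one decimal place. The counts it produces are implied by the percentages rather than stated anywhere in the project materials, and the helper name implied_count is our own.

```python
TOTAL_STORIES = 115


def implied_count(percent, total=TOTAL_STORIES):
    """Nearest whole number of stories corresponding to a percentage of the corpus."""
    return round(total * percent / 100)


# Gap between the first-pass estimate and the expert-confirmed figure.
gap_points = round(73.9 - 65.0, 1)

galloway = implied_count(73.9)    # final Galloway share
edinburgh = implied_count(9.6)    # Edinburgh 'urban counterpoint'
unclear = implied_count(17)       # initially unresolved stories
identified = implied_count(83)    # identified in the first pass

print(gap_points, galloway, edinburgh, unclear, identified)
```

On these assumptions the first pass placed roughly 95 stories and left about 20 unresolved, which together account for the full 115.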

In concrete terms, the collaborative methodology delivered:

Complete geographical mapping of all 115 stories
Recognition of Galloway dominance (73.9%) whilst avoiding stereotype
Identification of Edinburgh as ‘urban counterpoint’ (9.6%)
Discovery of late-career English settings (appearing only from 1908)
Understanding of collection-by-collection evolution

This project offers several crucial insights for anyone working at the intersection of AI and humanities research:

1. AI accelerates, but doesn’t replace: Rapid pattern identification and systematic searching are invaluable, but they’re starting points, not conclusions.

2. Gaps in documentation are different from gaps in knowledge: AI can only work with available materials. Human experts draw on broader understanding.

3. Interpretation requires expertise: Data points become meaningful through contextual understanding, historical knowledge, and qualitative judgement.

4. The best results come from iterative collaboration: Starting with AI’s rapid analysis, then refining through expert knowledge, creates more comprehensive and nuanced understanding than either approach alone.

5. Transparency about methodology matters: Being explicit about what AI contributed (83% initial identification) and what human expertise added (17% completion plus refinement) builds confidence in the results.
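One way to make points 4 and 5 concrete is to record, for every story, both the final setting and where that decision came from. The following is a sketch of such a merge step rather than a description of the project's actual spreadsheet: the function names, the provenance labels, and the three placeholder stories are assumptions for illustration.

```python
def merge_labels(ai_labels, expert_labels):
    """Combine AI first-pass labels with expert decisions; the expert wins on conflict.

    Returns {title: (region, provenance)} so every final label is traceable.
    """
    merged = {}
    for title in sorted(set(ai_labels) | set(expert_labels)):
        ai = ai_labels.get(title, "unclear")
        expert = expert_labels.get(title)
        if expert is None:
            merged[title] = (ai, "ai-only")
        elif ai == "unclear":
            merged[title] = (expert, "expert-added")
        elif ai == expert:
            merged[title] = (expert, "expert-confirmed")
        else:
            merged[title] = (expert, "expert-corrected")
    return merged


def unresolved(merged):
    """Titles whose setting is still 'unclear' after the merge."""
    return [title for title, (region, _src) in merged.items() if region == "unclear"]


# Hypothetical placeholder data, not real catalogue entries.
ai_first_pass = {"Story A": "Galloway", "Story B": "unclear", "Story C": "Europe"}
expert_review = {"Story A": "Galloway", "Story B": "England", "Story C": "Austrian Alps"}

merged = merge_labels(ai_first_pass, expert_review)
print(merged)
print(unresolved(merged))
```

Keeping the provenance column alongside the final label is what allows a statement such as "83% initial identification, 17% expert completion" to be audited later.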

Whilst our specific focus was mapping S.R. Crockett’s fictional geography, the methodology has broader application. Any project involving large-scale literary analysis, historical documentation, or cultural heritage preservation can benefit from this hand-in-hand approach. AI’s speed and systematic processing make previously impractical projects feasible: a complete geographical analysis of 115 stories, carried out manually, might take weeks. Human expertise nonetheless remains essential for accuracy, nuance, and confident interpretation.

The S.R. Crockett Cultural Legacy Charity’s experience demonstrates that the future of digital humanities isn’t about choosing between traditional scholarship and technological innovation. It’s about recognising what each does superbly, understanding their limitations, and bringing them together in genuine collaboration.

As we continue developing resources for the S.R. Crockett online museum, this collaborative methodology provides a template. AI handles systematic searching, pattern identification, and quantitative analysis. Human expertise provides context, interpretation, gap-filling, and qualitative judgement. Together, they create resources that are both comprehensive and reliable—broad in scope yet deep in understanding.

The spreadsheet and resource bank we’ve created for Crockett’s short stories represent more than just data. They are evidence of what becomes possible when we stop asking whether AI or humans are better at literary analysis, and start asking how they can work together to serve scholarship and preserve cultural heritage.

In mapping Crockett’s literary landscape, we’ve also mapped a new territory: the productive collaboration between artificial intelligence and human expertise in understanding our literary past. The journey from 83% to 100% completion wasn’t just about filling gaps—it was about discovering that the most powerful tool for understanding literature might be the collaboration itself.

Hand in Hand #1
