Introducing Digital Dostoevsky

What is Digital Dostoevsky?

Digital Dostoevsky is a computational text analysis project on a corpus of five novels and two novellas by Fyodor Dostoevsky. It is a digital humanities project that emerges out of our long-standing interest in traditional philological analysis. We are excited by how digital approaches such as TEI encoding, machine reading, and natural language processing can help to answer questions about the deep structure of Dostoevsky’s novels: questions about speech, character, space, temporality, affect, and fictionality, among other areas.

From 2019 to 2024, the project was hosted at the University of Toronto, led by Dr. Kate Holland, and supported by an Insight Grant from the Social Sciences and Humanities Research Council of Canada (SSHRC). At U of T, our funding supported a large team of undergraduate and graduate research assistants who contributed to our corpus encoding process. We published an article about our training methods as minimalist DH pedagogy in late 2025. In 2025, the project team was awarded a new Insight Grant from SSHRC. This funding will support the next phase of the project: finalizing our TEI edition of Dostoevsky’s works and completing our book manuscript, Computational Dostoevsky. With the new funding, the project moved from U of T to the University of British Columbia, where it is now led by Dr. Katherine Bowers. Longtime team member Dr. Braxton Boyer also joined the project as a Postdoctoral Fellow in its new home at UBC.

Background

Computational text analysis has flourished in the last few years, and many 19th-century writers now have their own digital editions and digital archives. In the Russian context, computational text analysis seems like a natural fit, since Russian scholarship has a long tradition of textology; academic editions of canonical Russian works were produced with painstaking care by teams of editors throughout the Soviet period and beyond. Russia also has a strong tradition of computational methods in linguistics. The research questions that motivate our project are the same ones scholars have been asking about Dostoevsky’s works for decades. Machine reading opens up possibilities for examining Dostoevsky’s corpus using technologies that neither the Formalists nor Bakhtin had at their disposal.

Dostoevsky’s works are already available online. There is a wonderful digital edition of Dostoevsky’s Complete Works based at Petrozavodsk State University in Karelia. This edition includes a digital concordance that can be used to parse the corpus. The academic Complete Works of Dostoevsky (both the 1972-1990 Soviet Academy of Sciences edition and the more recent Russian Academy of Sciences edition that is still being created) is also online at the Russian Academy of Sciences (Pushkin House) and elsewhere. One aim of the Digital Dostoevsky project is to create a TEI digital edition of Dostoevsky’s works that prepares the ground for scholars beginning to work with computational methods. In addition to our analysis of the corpus, we hope that this project will serve as a resource for future projects like it.

Our corpus

Our plain text corpus documents are taken from the canonical Soviet Academy of Sciences 30-volume edition of the Complete Works of Dostoevsky. We stripped the texts of their commentary and converted them to plain text files. So far, our corpus consists of five novels and two novellas: The Double, Notes from Underground, Crime and Punishment, The Idiot, Demons, The Adolescent, and The Brothers Karamazov. We may eventually add the rest of Dostoevsky’s works, as well as translations into English and possibly even French.

Our encoding

As of summer 2025, we have fully encoded our corpus using TEI-XML (click here to find out more about this methodology). This includes The Double, Notes from Underground, and Dostoevsky’s five large novels (Crime and Punishment, The Idiot, Demons, The Adolescent, and The Brothers Karamazov). For our “basic” encoding, we started by tagging formal structures such as paragraphs and speech, as well as named entities, and have moved on to places, direct and indirect speech, addressers and addressees, and liminal spaces and states. As we work through research questions using our corpus, we will create specially tagged files to address those questions. We published our first research article, on liminality in The Double, in October 2025.
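
For readers new to TEI, here is a minimal sketch of how speech tagging of this kind can be queried once a text is encoded. It is an illustration only, not our actual schema or pipeline: the sample fragment is invented (it is not a quotation from the corpus), the character IDs are hypothetical, and the function name count_speech is our own for this example. The sketch assumes the standard TEI elements said, persName, and placeName, with the who, toWhom, and direct attributes on said, and uses only the Python standard library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample fragment (assumption): the tag set, IDs, and wording are
# placeholders and do not reproduce the project's files or the novel's text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><said who="#goliadkin" toWhom="#petrushka" direct="true">Petrushka!</said>
       called <persName ref="#goliadkin">Mr. Goliadkin</persName>.</p>
    <p><said who="#petrushka" toWhom="#goliadkin" direct="true">Yes, sir?</said></p>
    <p><said who="#goliadkin" direct="false">He said he would go to
       <placeName>the bridge</placeName>.</said></p>
  </body></text>
</TEI>"""


def count_speech(tei_xml):
    """Count <said> elements per (speaker, direct/indirect) pair."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for said in root.iterfind(".//tei:said", TEI_NS):
        # Strip the leading '#' from the pointer to get a bare speaker ID.
        speaker = said.get("who", "unknown").lstrip("#")
        # A missing 'direct' attribute is treated as direct speech here.
        kind = "direct" if said.get("direct", "true") == "true" else "indirect"
        counts[(speaker, kind)] += 1
    return counts


if __name__ == "__main__":
    for (speaker, kind), n in sorted(count_speech(SAMPLE).items()):
        print(speaker, kind, n)
```

The same pattern extends to addressees (the toWhom attribute) or to places (placeName elements), which is why a consistent basic encoding makes later, question-specific tagging easier to build on.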

Future plans

We are also exploring other computational methods beyond TEI tagging. We were part of an NEH-funded institute based at Princeton, New Languages for NLP, which helped us to use natural language processing to build models to analyze Dostoevsky’s novels from the perspective of named entity recognition, named entity disambiguation, and other methodologies. We have done work in stylometry and plan to explore other methods moving forward.

Originally published in July 2021, updated in February 2026
