Tagging Speech in Dvoinik

In our most recent blog post, “Encoding Dostoevsky,” you read that one of the first elements of Dvoinik we decided to tag (apart from purely structural components like paragraphs) was speech. At the time, this seemed like a fairly straightforward endeavor (how hard can talking be, right?), but in hindsight this may have been the wrong thing to cut our TEI teeth on. As anyone who has read Dvoinik even halfway attentively can tell you, speech in this work can get a little complicated.

I’ll get back to the problems of speech, though. First I want to go over what the process of marking up a text for speech looks like.

On a basic level, speech tagging keeps track of a few things: whether a character is thinking inwardly or speaking out loud; whom a character is speaking to (even if to himself or herself); and whether any particular moment of thought or speech is direct or indirect. A text tagged for speech may look something like Figure 1.

Here you have dialogue between Goliadkin and Petrushka. Each individual bit of speech is enclosed in a <said> tag, and everything we record about that speech goes into attributes inside the opening tag. Since the characters are speaking out loud to each other, we’ve marked aloud="true". Then, depending on which character is speaking, we mark who="#gol" or who="#pet". Next comes whether the speech is direct or indirect; since the speech in this instance is direct, we have marked direct="true". Who is being spoken to works the same way as who is speaking, just using the toWhom attribute. Finally, we close off each bit of speech with a closing </said> tag. (The <p> and </p> in this example are paragraph tags.)
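To make the shape of a single tag concrete, here is a minimal Python sketch that assembles one <said> element with the four attributes described above, using the standard library’s xml.etree.ElementTree. It is only an illustration of the markup, not how we worked (we tagged Dvoinik by hand). The helper name make_said and the placeholder dialogue text are hypothetical, and the IDs "gol" and "pet" are assumed to point to character entries declared elsewhere in the TEI file.

```python
import xml.etree.ElementTree as ET


def make_said(text, who, to_whom, aloud=True, direct=True):
    """Build one TEI <said> element (hypothetical helper, for illustration).

    `who` and `to_whom` are assumed to be xml:id values declared elsewhere
    in the file; the leading '#' turns them into pointers to those IDs.
    """
    said = ET.Element("said", {
        "who": "#" + who,
        "toWhom": "#" + to_whom,
        "aloud": str(aloud).lower(),    # TEI expects the strings "true"/"false"
        "direct": str(direct).lower(),
    })
    said.text = text
    return said


# Wrap one placeholder line of dialogue in a paragraph tag.
p = ET.Element("p")
p.append(make_said("[line of dialogue]", who="gol", to_whom="pet"))
print(ET.tostring(p, encoding="unicode"))
```

Passing aloud=False would mark the same passage as thought rather than speech, and direct=False would mark it as indirect.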

At this point in our project, we’ve almost completed our mark-up of Dvoinik, and not just for speech — in future blog posts you’ll also read about tagging for all of the names that appear in the text, as well as physical and liminal spaces. This means that we’re currently in the phase of figuring out what complex computer analysis can do with the document we’ve created (stay tuned!). But even relatively simple output based on our tagging can be interesting, as in the table below.

This table displays a count of the information contained in all the speech tags throughout the whole text of Dvoinik. It shows how many times each character (listed in the leftmost column) speaks directly or indirectly, and out loud or not out loud (read: thinks).
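A table like this can be produced with a short script. The sketch below is not the project’s actual tooling, which this post doesn’t describe; it assumes a TEI file in the standard TEI namespace and uses a tiny hypothetical sample (with "..." standing in for the text) in place of the full marked-up Dvoinik. For each speaker it counts <said> elements by their direct and aloud values, so every tag contributes to two columns. A tag that leaves one of those attributes unset is simply not counted in that column, rather than being assigned a default.

```python
from collections import Counter
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Map (attribute, value) pairs onto the four table columns.
COLUMNS = {
    ("direct", "true"): "direct",
    ("direct", "false"): "indirect",
    ("aloud", "true"): "aloud",
    ("aloud", "false"): "not aloud",
}


def tally_speech(tei_xml):
    """Return {speaker_id: Counter(column -> count)} for every <said> tag."""
    root = ET.fromstring(tei_xml)
    table = {}
    for said in root.iterfind(".//tei:said", TEI_NS):
        speaker = said.get("who", "unknown").lstrip("#")
        row = table.setdefault(speaker, Counter())
        for attr in ("direct", "aloud"):
            column = COLUMNS.get((attr, said.get(attr)))
            if column:  # unset or unexpected values are skipped
                row[column] += 1
    return table


# Hypothetical sample: two lines of dialogue and one indirect thought.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p><said who="#gol" toWhom="#pet" aloud="true" direct="true">...</said>
<said who="#pet" toWhom="#gol" aloud="true" direct="true">...</said></p>
<p><said who="#gol" aloud="false" direct="false">...</said></p>
</body></text></TEI>"""

for speaker, row in tally_speech(SAMPLE).items():
    print(speaker, dict(row))
```

Run on a full marked-up file, the returned dictionary would hold one row per speaker ID (gol, imiz, imdbl, and so on), ready to be laid out as a table.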

As you might expect, Goliadkin speaks more than any other character. What was a little shocking to see, however, was how much more. You’ll notice as well that no character other than Goliadkin has thoughts presented in the text, except for two imaginary characters (“imiz” stands for “imagined izvozchik,” i.e. an imagined cab driver, and “imdbl” stands for “imaginary double”); but since they’re part of Goliadkin’s imagination, they can’t truly be considered to be thinking anyway.

So what can we do with this information?

If you read our blog post titled “Introducing Digital Dostoevsky,” you will have seen that one of our main interests in computer analysis is how it can shed new light on questions or ideas from earlier, more traditional scholarship. The above table is, admittedly, quite simple and isn’t really a result or a finding in and of itself, but it hints at the potential that marking up Dostoevsky’s corpus for speech offers for thinking about, to take one example, polyphony and dialogism. For instance, what might the distribution of speech tell us about Bakhtin’s concept of equally valid voices in Dostoevsky’s later novels? Or, in an avenue we haven’t yet pursued for Dvoinik but that may be especially important for texts like Demons, how might speech marking help us analyze Dostoevsky’s narrator figure?

Aside from whatever results come from computer analysis, though, there is something else that is promising about marking up a text in TEI, especially if you are interested in thinking about how any of this technology could be used in a literature classroom.

We marked up Dvoinik “by hand,” so to speak. That means that we manually entered every single speech tag (again, see Figure 1) for every single occurrence of speech or thought in the text. This was a whole new level of close reading, and it forced us to pay attention to speech in ways we hadn’t before.

To take just one example, distinguishing Goliadkin’s voice from the narrator’s was, at times, a torturous endeavor; this was one of those “problems” of tagging speech in Dvoinik I mentioned earlier. For every instance in Figure 2 above where we have Goliadkin speaking or thinking indirectly, we had to identify and mark key words, phrases, or even sentiments that signal we are listening to Goliadkin’s voice and not the narrator’s. These words, phrases, and sentiments that give away a character’s indirect voice are called “clues” in TEI, and if scouring sections of narration for the specific “clues” that pinpoint Goliadkin’s voice isn’t an exercise in close reading, then I don’t know what is. To get a sense of the possibilities of close reading through TEI, you can find a great example of an entire undergraduate literature course structured around marking a text for speech here.

There were other problems besides the narrator/Goliadkin relationship. How do you mark a section of speech where you’re unsure whether Goliadkin is talking out loud or thinking (this happens more than you might expect!)? What do you do when Goliadkin is talking to a character in his imagination (Klara Olsufyevna, Petrushka, a magician) and that imaginary character talks back to him? Or what about when objects (a samovar, for instance) or glances are presented as speaking? And, above all, one has to wrestle with the fundamental question of whether, and when, Goliadkin’s double is real or imagined, a question with implications for speech marking no matter which interpretation you choose.

In the end, and even without the high level of computer analysis we are working towards, TEI tagging opens up some interesting possibilities in answer to the seemingly simple question of “who is speaking?” We’re excited to explore these possibilities further, and especially across Dostoevsky’s entire corpus. Be on the lookout for what comes next.

Digital Dostoevsky

A blog chronicling the Computational Dostoevsky project
