Mind Your P’s and B’s: The Digital Humanities and Interpretation

Stanley Fish


The question I posed at the conclusion of my last post was how do the technologies wielded by digital humanities practitioners either facilitate the work of the humanities, as it has been traditionally understood, or bring about an entirely new conception of what work in the humanities can and should be? I’m going to sneak up on that question by offering a piece of conventional (i.e., non-digital) literary analysis that deals, as the digital humanities do, with matters of statistical frequency and pattern.

Halfway through “Areopagitica” (1644), his celebration of freedom of publication, John Milton observes that the Presbyterian ministers who once complained of being censored by Episcopalian bishops have now become censors themselves. Indeed, he declares, when it comes to exercising a “tyranny over learning,” there is no difference between the two: “Bishops and Presbyters are the same to us both name and thing.” That is, not only are they acting similarly; their names are suspiciously alike.

In both names the prominent consonants are “b” and “p” and they form a chiasmic pattern: the initial consonant in “bishops” is “b”; “p” is the prominent consonant in the second syllable; the initial consonant in “presbyters” is “p” and “b” is strongly voiced at the beginning of the second syllable. The pattern of the consonants is the formal vehicle of the substantive argument, the argument that what is asserted to be different is really, if you look closely, the same. That argument is reinforced by the phonological fact that “b” and “p” are almost identical. Both are “bilabial plosives” (a class of only two members), sounds produced when the flow of air from the vocal tract is stopped by closing the lips.

There is more. (I know that’s not what you want to hear.) In the sentences that follow the declaration of equivalence, “b’s” and “p’s” proliferate in a veritable orgy of alliteration and consonance. Here is a partial list of the words that pile up in a brief space: prelaty, pastor, parish, Archbishop, books, pluralists, bachelor, parishioner, private, protestations, chop, Episcopacy, palace, metropolitan, penance, pusillanimous, breast, politic, presses, open, birthright, privilege, Parliament, abrogated, bud, liberty, printing, Prelatical, people.
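The pattern can even be counted. Here is a minimal sketch of the sort of tally a digital tool would produce — my own toy illustration, using the sentence quoted above and a handful of the listed words; nothing about it is Milton’s design or any published program:

```python
from collections import Counter

# Milton's sentence from "Areopagitica," quoted above.
sentence = "Bishops and Presbyters are the same to us both name and thing."

# Tally the bilabial plosives -- the two-member class of "b" and "p."
plosives = Counter(ch for ch in sentence.lower() if ch in "bp")
print(plosives)  # Counter({'b': 3, 'p': 2})

# A few of the words listed above that carry at least one "b" or "p."
words = ["prelaty", "pastor", "parish", "Archbishop", "books", "pluralists", "liberty"]
print([w for w in words if set("bp") & set(w.lower())])  # all of them
```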

Even without the pointing provided by syntax, the dance of the “b’s” and “p’s” carries a message, and that message is made explicit when Milton reminds the presbyters that their own “late arguments … against the Prelats” should tell them that the effort to block free expression “meets for the most part with an event utterly opposite to the end which it drives at.” The stressed word in this climactic sentence is “opposite.” Can it be an accident that a word signifying difference has two “p’s” facing and mirroring each other across the weak divide of a syllable break? Opposite superficially, but internally, where it counts, the same.

To my knowledge, I am the first critic to put forward this interpretation of the sequence. However, that claim, the claim of originality, brings with it its own problems, at least in the context of literary criticism as it has been practiced since the late 1930s. Doesn’t the fact that for 368 years only I have noticed the b/p pattern suggest that it is without significance, an accidental concatenation of consonants? Aren’t I being at best over-ingenious and at worst irresponsibly arbitrary?

In order to answer such questions, I would have to demonstrate that Milton self-consciously put the pattern there and made it the formal bearer of his argument. I would have to build a chain of inference that led from the undoubted, countable presence of the “b’s” and “p’s” in the passage to Milton’s intention and back again. Were I to attempt to fashion that chain (don’t worry!), I would begin by citing the last line of a Milton sonnet — “New Presbyter is but old Priest writ large” — and go on to instance other places in his poetry and prose where Milton plays with sounds in a manner he would have learned from the rhetorical manuals we know he studied at school.

The requirement I would have to satisfy illustrates the problem of formalist analysis, analysis that wants to move from the noting of formal properties to the drawing of interpretive conclusions: given that there are only 26 letters (and 21 consonants) in the alphabet, it is inevitable that in a text of any size patterns of repetition and frequency will abound. The trick is to separate the patterns produced by the scarcity of alphabetic resources (patterns to which meaning can be imputed only arbitrarily) from the patterns designed by an author.

The usual way of doing this is illustrated by my example: I began with a substantive interpretive proposition — Milton believes that those who suffered under the tyrannical censorship of episcopal priests have turned into their oppressors despite apparent differences in worship and church structure — and, within the guiding light, indeed searchlight, of that proposition, I noticed a pattern that could, I thought, be correlated with it. I then elaborated the correlation.

The direction of my inferences is critical: first the interpretive hypothesis and then the formal pattern, which attains the status of noticeability only because an interpretation already in place is picking it out.

The direction is the reverse in the digital humanities: first you run the numbers, and then you see if they prompt an interpretive hypothesis. The method, if it can be called that, is dictated by the capability of the tool. You have at your disposal an incredible computing power that can bring to analytical attention patterns of sameness and difference undetectable by the eye of the human reader. Because the patterns are undetectable, you don’t know in advance what they are and you cannot begin your computer-aided search (called text-mining) in a motivated — that is, interpretively directed — way. You don’t know what you’re looking for or why you’re looking for it. How then do you proceed?

The answer is, proceed randomly or on a whim, and see what turns up. You might wonder, for example, what place or location names appear in American literary texts published in 1851, and you devise a program that will tell you. You will then have data.
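A minimal sketch of such a program, under assumptions of my own (a folder of plain-text novels called corpus_1851 and spaCy’s off-the-shelf named-entity recognizer; neither comes from the essay), might look like this:

```python
# A toy sketch of such a program: scan a folder of plain-text novels and
# tally the place names a named-entity recognizer finds in them.
# The directory path and model name are illustrative assumptions.
from collections import Counter
from pathlib import Path

import spacy  # requires: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
place_counts = Counter()

for novel in Path("corpus_1851").glob("*.txt"):
    text = novel.read_text(encoding="utf-8")
    for doc in nlp.pipe(text.split("\n\n")):  # process paragraph by paragraph
        place_counts.update(
            ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")
        )

print(place_counts.most_common(20))  # the "data": the most frequently named places
```

What it prints is data, and only data.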

But what do you do with the data?

The example is not a hypothetical one. It is put forward by Matthew Wilkens in his essay “Canons, Close Reading, and the Evolution of Method” (“Debates in the Digital Humanities,” ed. Matthew Gold, 2012). And Wilkens does do something with the data. He notices that “there are more international locations than one might have expected” — digital humanists love to be surprised because surprise at what has been turned up is a vindication of the computer’s ability to go beyond human reading — and from this he concludes that “American fiction in the mid-nineteenth century appears to be pretty diversely outward looking in a way that hasn’t received much attention.”

More international locations named than we would have anticipated; therefore mid-19th-century American fiction is outward-looking, a fact we would not have “discovered” were it not for the kind of attention a computer, as opposed to a human reader, is capable of paying.

But does the data point inescapably in that direction? Don’t we have to know in what novelistic situations foreign lands are alluded to and by whom? If the international place names are invoked by a narrator, it might be with the intention not of embracing a cosmopolitan, outward perspective, but of pushing it away: yes, I know that there is a great big world out there, but I am going to focus in on a landscape more insular and American. If a character keeps dropping the names of towns and cities in Europe, Africa and Asia, the novelist could be alerting us to his pretentiousness and admonishing the reader to stay close to home. If a more sympathetic character daydreams about Paris, Istanbul and Moscow, she might be understood as caressing the exotic names in rueful recognition of the experiences she will never have.

The list of possible contextual framings is infinite, but some contextual framing is necessary if we are to move from noticing the naming of international locations to the assigning of significance. Otherwise we are asserting, without justification, a correlation between a formal feature the computer program just happened to uncover and a significance that has simply been declared, not argued for. (Frequency is not an argument.) Don’t we have to actually read the books, before saying what the patterns discovered in them mean?

No, says Wilkens (and many in the field agree with him). We have been working, he declares, with too few texts — a handful of “purportedly … representative works” — and we have drawn from that small sample conclusions we might radically revise were we to have in our contemplation the totality of texts produced in 19th-century America. The problem is that no reader could possibly process that totality, never mind discern the patterns that exist in it on a level too minute and deep for human apprehension.

This is where the computer comes to the rescue. Digitize the entire corpus and you can put questions to it and get answers in a matter of seconds. We can, says Wilkens, “look for potentially interesting features without committing months and years to extracting them via close reading.” The Stanford scholar Franco Moretti calls this method of analyzing huge bodies of data “distant reading” (“Graphs, Maps, Trees,” 2005). The Shakespearian scholar Martin Mueller briskly urges humanists to “stop reading” (“Digital Shakespeare or Toward a Literary Informatics”). So much for the old humanist program.

Wilkens acknowledges that we may “still need to read some of the texts closely,” and he admits that the more humanists turn to “algorithmic and quantitative analysis of piles of texts,” the “worse close readers” they will become. He sees it as a trade-off between a skill practiced on small samples by a priesthood of ivory-tower academics and a larger-scale enterprise that has the promise of encompassing all of knowledge. Wilkens thinks that’s a good bargain — “a few more numbers in return for a bit less text” — and declares that “We gain a lot by having available to us the kinds of evidence text-mining … provides.” The result, he predicts, will be “a more useful humanities scholarship.”

Words like “useful” and “evidence” indicate that Wilkens is still holding out for an interpretive payoff (evidence has to be evidence of something), although he concedes that as yet that payoff has been “pretty limited.” Quite another route to success is imagined by Stephen Ramsay, perhaps the most sophisticated theorist of the burgeoning field. Ramsay is not concerned that computer-assisted analysis has not yet delivered an interpretive method, a way of pruning the myriad paths that are opened up by the generation of data. He doesn’t want to narrow interpretive possibilities; he wants to multiply them.

When another scholar worries that if one begins with data, one can “go anywhere,” Ramsay makes it clear that going anywhere is exactly what he wants to encourage. The critical acts he values are not directed at achieving closure by arriving at a meaning; they are, he says, “ludic” and they are “distinguished … by a refusal to declare meaning in any form.” The right question to propose “is not ‘What does the text mean?’ but, rather, ‘How do we ensure that it keeps on meaning’ — how … can we ensure that our engagement with the text is deep, multifaceted, and prolonged?” (“Toward an Algorithmic Criticism,” Literary and Linguistic Computing, vol. 18, no. 2, 2003).

The answer is not to go to the text “armed with a hypothesis” but “with a machine that is ready to reorganize the text in a thousand different ways instantly.” Each reorganization (sometimes called a “deformation”) creates a new text that can be reorganized in turn and each new text raises new questions that can be pursued to the point where still newer questions emerge. The point is not to get to a place you had in mind and then stop; the point is to keep on going, as, aided by the data-generating machine, you notice this and then notice that which suggests something else and so on, ad infinitum.
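By way of illustration only (these two functions are mine, not Ramsay’s), a “deformation” can be as simple as letting a machine reorder or sieve a line of text and handing the result back as a new object of attention:

```python
# Two toy "deformations" of a text -- illustrative sketches, not Ramsay's own procedures.
# Each one reorganizes the original and hands back a new text to puzzle over.

def sort_words(text: str) -> str:
    """Reorder each line's words alphabetically, scrambling syntax while keeping diction."""
    return "\n".join(
        " ".join(sorted(line.split(), key=str.lower)) for line in text.splitlines()
    )

def keep_word_length(text: str, length: int) -> str:
    """Keep only the words of a given length, an arbitrary sieve on the text."""
    return "\n".join(
        " ".join(w for w in line.split() if len(w.strip(".,;:!?")) == length)
        for line in text.splitlines()
    )

line = "New Presbyter is but old Priest writ large"
print(sort_words(line))            # -> "but is large New old Presbyter Priest writ"
print(keep_word_length(line, 3))   # -> "New but old"
```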

It is, he explains, like browsing in a store. The salesclerk asks, “Can I help you?”, a question that assumes you came in with a definite purpose. You say, “No, I’m just browsing,” which Ramsay glosses as “(a) I don’t know what’s here and (b) I don’t know what I’m looking for.” In effect, he concludes, “I’m just screwing around,” picking up this item, and moving randomly to another and another and another. “Look at this. Then look at that.” That’s the method or anti-method; just try one algorithm and then another and see what the resulting numbers suggest (not prove) in the way of an interpretive hypothesis. And then do it again. Are we ready, he asks, “to accept surfing and stumbling — screwing around, broadly understood — as a research methodology?” (“The Hermeneutics of Screwing Around; or What You Do With a Million Books”) If we are ready, computer programs are ready to help us.

Ramsay accepts the criticism of those who say that readings of texts cannot “be arrived at algorithmically” (“Reading Machines,” 2011). This incapacity, however, doesn’t worry him, because the value of numbers for him is not that they produce or confirm readings, but that they provoke those who look at them to flights of interpretive imagination: “algorithmic transformations can provide the alternative visions that give rise to … readings” (“Reading Machines”). There is, he says, “no end of our understanding” of texts and concepts. There are only new “noticings which … are practically discernible only through algorithmic means” (“Reading Machines”).

Ramsay presents these ideas in two tonal registers. At times he argues that however alien algorithmic criticism may seem, it is really a technologically ramped-up version of what literary criticism has always been. Although the rhetoric of traditional literary criticism emphasizes getting at the truth about a text as its end point, in practice what critics do is try out one hypothesis, and then another, and in the process re-characterize or deform the text. We say about a poem, let’s look at this as an erotic poem, or a poem about markets, or a poem about literary imagination, and then, under the impetus of these various hypotheses, we rewrite the poem again and again. We produce new poems. “All criticism and interpretation is deformance.” What computers do is multiply the ways in which this “readerly process of deformation,” this opening up of “serendipitous paths,” can be performed. We should understand computer-based criticism to be what it has always been: “human-based criticism with computers” (“Reading Machines”).

But in another mood, Ramsay is more messianic. By embracing rather than warding off alternative interpretive paths, algorithmic criticism “may come to form the basis for new kinds of critical acts,” acts that do not merely facilitate literary analysis but “build a platform for social networking and self-expression” (“Reading Machines”). Prompted by the numbers, you try out something and you call across the room to a co-worker or to a colleague in another country and ask, “Here is what I found, what did you find?” (“The Hermeneutics of Screwing Around”). And that colleague asks another who asks another who, well, you get the point. The anti-methodology that refuses closure and insists on fecundity facilitates — no, demands — sharing, and builds an ever-expanding community of digital fellowship, an almost theological community in which everyone explores “the inexhaustible nature of divine meaning” (“Reading Machines”).

These two visions of the digital humanities project — the perfection of traditional criticism and the inauguration of something entirely new — correspond to the two attitudes digital humanists typically strike: (1) we’re doing what you’ve always been doing, only we have tools that will enable you to do it better; let us in, and (2) we are the heralds and bearers of a new truth and it is the disruptive challenge of that new truth that accounts for your recoiling from us. It is the double claim always made by an insurgent movement. We are a beleaguered minority and we are also the saving remnant.

But whatever vision of the digital humanities is proclaimed, it will have little place for the likes of me and for the kind of criticism I practice: a criticism that narrows meaning to the significances designed by an author, a criticism that generalizes from a text as small as half a line, a criticism that insists on the distinction between the true and the false, between what is relevant and what is noise, between what is serious and what is mere play. Nothing ludic in what I do or try to do. I have a lot to answer for.