Word Length Landscapes: 4.7 4.2 3.7


Data is turned into a landscape, with the mean word length measurements giving rise to area-chart mountains. Shelley's Zastrozzi (1810) has a mean word length of 4.7 characters, Dickens's A Tale of Two Cities (1859) represents the middle at 4.2 characters, and Stevenson's The Beach of Falesá (1892) has a mean of 3.7 characters. "Undistinguishable" is the longest word at 17 characters.

Art piece (Carrie Roy): Framed print 23” x 11”

Encounters with data, storms and changing environments.

Literary Perspective (Catherine DeRose): “Word Length Landscapes: 4.7 4.2 3.7” indicates that the vocabulary of the nineteenth-century novel changed as the century progressed. On average, the words used in novels become steadily shorter over time. Percy Bysshe Shelley, a Romantic poet and novelist, uses the longest words: his Gothic novel Zastrozzi (1810) has the highest mean word length. At the other end of the spectrum is Robert Louis Stevenson, a late nineteenth-century writer, whose short story The Beach of Falesá (1892) contains the shortest words on average. The novel that falls directly in the middle of the list is Charles Dickens’s A Tale of Two Cities (1859). Because his novels span the middle of the nineteenth century, Dickens’s corpus functions as a check on our methodology and the results we’re seeing. We can track his writing career by looking at word length: his earlier novels, such as The Pickwick Papers (1837) and Nicholas Nickleby (1839), have longer words on average, while his later novels, like Bleak House (1853) and Great Expectations (1861), have shorter words.

This finding raises provocative questions and avenues for literary exploration. Why would shorter words become more popular in the novel? What does this change reflect? One possibility is that it illustrates the rising popularity of the novel and its dissemination to an increasingly literate and diverse readership. The change in vocabulary might also reflect changing genres (the “Golden Age” of children’s literature began in the late nineteenth century) and an increase in serialization.

Statistics Perspective (Fred Boehm): "Word Length Landscapes" is inspired by word length histograms from our analysis of Victorian novels. In our histograms, we illustrate the relationships between the frequencies of words of lengths 1, 2, 3, 4, and so on. We portray each histogram as a Victorian landscape with heights corresponding to word length frequencies. For instance, in Shelley's "Zastrozzi", the mean word length is 4.7, but the varied heights in the corresponding landscape show the spread of word lengths across the novel.
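To make the histogram idea concrete, here is a minimal Python sketch of counting words by length. It is not the R code used for the piece. The function names are hypothetical, and the tokenization rule (a word is an unbroken run of letters) is an assumption, so the counts it gives for a full novel may differ slightly from ours.

```python
import re
from collections import Counter


def word_length_histogram(text):
    """Count how many words of each length appear in the text."""
    # Assumption: a word is an unbroken run of letters, so apostrophes
    # and hyphens split words. The original analysis may tokenize differently.
    words = re.findall(r"[A-Za-z]+", text)
    return Counter(len(word) for word in words)


def relative_frequencies(histogram):
    """Convert raw counts to shares of all words, ordered by word length."""
    total = sum(histogram.values())
    return {length: count / total for length, count in sorted(histogram.items())}


# The opening words of A Tale of Two Cities, used only as a tiny sample.
sample = "It was the best of times, it was the worst of times"
hist = word_length_histogram(sample)

# Print a crude text "landscape": one row per word length.
for length, share in relative_frequencies(hist).items():
    print(length, "#" * hist[length], round(share, 2))
```

Plotting the shares against word length as a filled area chart would give a mountain-like outline for each text.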


1. We ran the R code mean_wordlength to generate a database of mean word lengths and standard deviations for the corresponding texts (listed in chronological order).

2. We then selected the highest (Shelley, Zastrozzi), lowest (Stevenson, The Beach of Falesá), and middle (Dickens, A Tale of Two Cities) examples of these values.

3. We created specific breakdowns of word length for the three novels with the R code word length.

4. We then charted the novel-specific values to generate shapes (mountains) for the word length landscape art piece, and identified representative sentences from each novel to feature in the background.

* "Undistinguishable" was the longest word at 17 characters
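
The steps above can be sketched roughly in Python. This is an illustration under stated assumptions, not the R code itself: the function names are hypothetical, words are taken to be runs of letters, the standard deviation is the population form, and "middle" is read as the text at the median rank by mean word length. The toy snippets stand in for the novels and are not drawn from the corpus.

```python
import re
from statistics import mean, pstdev


def tokenize(text):
    # Assumption: a word is an unbroken run of letters.
    return re.findall(r"[A-Za-z]+", text)


def summarize(texts):
    """Return (title, mean word length, standard deviation) for each text."""
    rows = []
    for title, text in texts.items():
        lengths = [len(word) for word in tokenize(text)]
        # Assumption: population standard deviation; the source does not
        # say which form was used.
        rows.append((title, mean(lengths), pstdev(lengths)))
    return rows


def pick_lowest_middle_highest(rows):
    """Rank texts by mean word length and pick the two ends and the median rank."""
    ranked = sorted(rows, key=lambda row: row[1])
    return {
        "lowest": ranked[0][0],
        "middle": ranked[len(ranked) // 2][0],
        "highest": ranked[-1][0],
    }


def longest_word(text):
    """Return the longest word in the text (the first one found if tied)."""
    return max(tokenize(text), key=len)


# Placeholder snippets, not passages from the novels.
toy_texts = {
    "text_a": "undistinguishable shadows gathered",
    "text_b": "it was the best of times",
    "text_c": "a cat sat on a mat",
}

rows = summarize(toy_texts)
selection = pick_lowest_middle_highest(rows)
longest = longest_word(toy_texts["text_a"])
print(selection, longest, len(longest))
```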