Roll of the Topics: 5 10 20


Art Piece (Carrie Roy): Wood sculpture, black walnut, cherry, 26" x 15" x 13"

Dynamics of dice and numbers: one number sets new iterations in motion.

Literary Perspective (Catherine DeRose): “Roll of the Topics: 5 10 20” was inspired by our topic modeling of nineteenth-century novels according to three different groupings (5, 10, and 20 topics). The sheer number of results was, at first, overwhelming. It was unclear how to proceed and how to analyze them without “cherry picking.” The results reminded me that digital analysis requires openness in how I approach the data and in what conclusions I intend to draw from it. I was hoping to find distinct differences between decades; instead, I was confronted with widespread similarities. Perhaps unsurprisingly, the family unit (and mothers particularly), nobility, the country, and work all feature prominently in topics from every decade.

Statistics Perspective (Fred Boehm): Topic modeling is a statistical and computational approach to text analysis that aims to identify collections of related words, known as topics, that are not known in advance. One version of topic modeling is called latent Dirichlet allocation (LDA). UC Berkeley's Michael Jordan and colleagues have used LDA to identify topics in collections of newspaper articles.

In the ten years since Jordan's team first reported LDA, researchers have extended the initial LDA approach to account for correlations among topics, changes in topics over time, and other elaborations. Furthermore, statisticians and computer scientists have applied topic modeling approaches to data that fall outside the traditional definition of a collection of texts. For instance, researchers at the University of Wisconsin-Madison have applied LDA to genetic data sets.

View samples of 5 topic word clouds
View samples of 10 topic word clouds
View samples of 20 topic word clouds
