PLOS BLOGS EveryONE

A Way with Words: Data Mining Uncloaks Authors’ Stylistic Flair

As any writer or wordsmith knows, searching for the right word can be a painful struggle. Here’s comforting news: word choice may be the key to understanding your stylistic flair.

New research in the field of text mining suggests that distinct writing styles are discernible by word selection and frequency. Even the use of common words, such as “you” and “say,” can help distinguish one writer from another. To learn more about style, the authors of a recent PLOS ONE paper turned to the famed lord of language, William Shakespeare.

The researchers assembled a pool of 168 plays written during the 16th and 17th centuries. After accounting for duplicates, 55,055 unique words were identified and then cross-referenced against the work of four writers from that time period: William Shakespeare, Ben Jonson, Thomas Middleton, and John Fletcher. The researchers counted how often these writers used words from the pool and ranked words by their frequency. Lists of twenty of the most-used and least-used words were then compiled for each writer and considered “markers” of their individual styles.
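The counting-and-ranking step described above can be illustrated with a short sketch. This is not the statistic used in the paper; it assumes a much simpler measure, the difference between a word's relative frequency in one writer's texts and in everyone else's, and the function names (`tokenize`, `relative_frequencies`, `marker_words`) and the toy sentences are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(texts):
    """Return each word's share of all tokens across the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def marker_words(author_texts, other_texts, top_n=20):
    """Rank words by how much more (or less) often the author uses them.

    Assumption: the score is a plain difference in relative frequency,
    a stand-in for the paper's actual method.
    Returns (most over-used words, most under-used words).
    """
    author = relative_frequencies(author_texts)
    others = relative_frequencies(other_texts)
    vocabulary = set(author) | set(others)
    diff = {w: author.get(w, 0.0) - others.get(w, 0.0) for w in vocabulary}
    # Sort from most over-used to most under-used; break ties alphabetically.
    ranked = sorted(vocabulary, key=lambda w: (-diff[w], w))
    return ranked[:top_n], ranked[::-1][:top_n]


# Toy example with made-up sentences, not lines from the plays.
overused, underused = marker_words(
    ["ye come ye go ye say"],
    ["thou come thou go all say"],
    top_n=2,
)
print("over-used:", overused)
print("under-used:", underused)
```

On a real corpus, the same idea would be applied to each writer's plays against the rest of the pool, yielding one list of over-used and one list of under-used candidate markers per writer.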

Fletcher, for one, frequently used the word “ye” in his plays, so a relatively high frequency of “ye” would be a strong marker of Fletcher’s particular writing style. Similarly, Middleton often used “that” in the demonstrative sense, and Jonson favored the word “or.” Shakespeare himself used “thou” the most frequently, and the word “all” the least.

In addition to looking at individual word use, the researchers analyzed specific works where the writer’s style changed significantly, such as Middleton’s political satire “A Game at Chess,” which was notably different from his other works. They also compared word choice between writers. Their findings indicate that, unlike the styles of his contemporaries, Shakespeare’s style was marked more by his underuse of words than by his overuse. Take, for example, Shakespeare’s use of “ye.” Whereas Fletcher used this word liberally, “ye” is one of Shakespeare’s least frequently used words.

Such analyses, the researchers suggest, may help resolve authorship controversies and disputes, but they can also address other concerns. In a post in The Conversation, the authors of this paper suggest that the mathematical method used to identify words as markers of style may also help identify biomarkers in medical research. In fact, the research team currently uses these methods to study cancer and the selection of therapeutic combinations, multiple sclerosis, and Alzheimer’s disease.

Citation: Marsden J, Budden D, Craig H, Moscato P (2013) Language Individuation and Marker Words: Shakespeare and His Maxwell’s Demon. PLoS ONE 8(6): e66813. doi:10.1371/journal.pone.0066813

Image: First Folio – Folger Shakespeare Library – DSC09660, Wikimedia Commons
