A data science and machine learning approach to continuous analysis of Shakespeare's plays

Swisher, Charles; Shamir, Lior

doi:10.46298/jdmdh.10829

This content will become publicly available on July 13, 2024

The availability of quantitative text analysis methods has provided new ways of analyzing literature in a manner that was not available in the pre-information era. Here we apply comprehensive machine learning analysis to the work of William Shakespeare. The analysis shows clear changes in the style of writing over time, with the most significant changes in the sentence length, frequency of adjectives and adverbs, and the sentiments expressed in the text. Applying machine learning to make a stylometric prediction of the year of the play shows a Pearson correlation of 0.71 between the actual and predicted year, indicating that Shakespeare's writing style as reflected by the quantitative measurements changed over time. Additionally, it shows that the stylometrics of some of the plays is more similar to plays written either before or after the year they were written. For instance, Romeo and Juliet is dated 1596, but is more similar in stylometrics to plays written by Shakespeare after 1600. The source code for the analysis is available for free download.
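The abstract reports a Pearson correlation between the actual and the predicted year of each play. As a minimal sketch of how such a figure could be computed, the Python snippet below implements the standard Pearson formula. The function name `pearson_r` and the two short lists of years are hypothetical placeholders chosen for illustration; they are not the authors' code, data, or results, and the snippet does not reproduce the reported 0.71.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)


# Placeholder values for illustration only; NOT taken from the paper.
actual_years = [1591, 1594, 1596, 1599, 1603, 1606, 1610]
predicted_years = [1593, 1595, 1601, 1598, 1602, 1604, 1608]

r = pearson_r(actual_years, predicted_years)
print(f"Pearson r = {r:.2f}")
```

A value near 1 would mean the predicted years track the actual years closely; a play whose predicted year falls well after its actual year (as the abstract describes for Romeo and Juliet) pulls the coefficient down.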

Award ID(s): 2148878

NSF-PAR ID: 10491973

Author(s) / Creator(s): Swisher, Charles; Shamir, Lior

Publisher / Repository: EPIsciences

Date Published: 2023-07-13

Journal Name: Journal of Data Mining & Digital Humanities

Volume: 2023

ISSN: 2416-5999

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.46298/jdmdh.10829