MARGENTO/Paget/Inkpen Paper Accepted to FLAIRS-29

March 8th, 2016 margento

Automatic Classification of Poetry by Meter and Rhyme
MARGENTO (Chris Tanasescu), Bryan Paget, and Diana Inkpen

In this paper, we focus on large-scale poetry classification
by meter. We repurposed an open-source poetry
scanning program (the Scandroid by Charles O. Hartman)
as a feature extractor. Our machine learning experiments
show a useful ability to classify poems by
poetic meter. We also made our own rhyme detector
using the Carnegie Mellon University Pronouncing Dictionary
as our primary source of pronunciation information.
Future work will involve classifying rhyme and
assembling a graph (or graphs) as part of the Graph
Poem Project depicting the interconnected nature of poetry
across history, geography, genre, etc.
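
To give a rough idea of how a pronouncing dictionary can support rhyme detection, here is a minimal sketch in Python. It is not the detector described in the paper. It assumes a common simplification: two words form a perfect rhyme when their phonemes match from the last primary-stressed vowel to the end. The small word table is a hand-copied excerpt in the dictionary's ARPAbet style, and the names SAMPLE_PRONUNCIATIONS, rhyme_part, and is_perfect_rhyme are hypothetical.

```python
# Illustrative excerpt in CMU Pronouncing Dictionary (ARPAbet) style.
# Digits on vowels mark stress: 1 = primary, 2 = secondary, 0 = unstressed.
# Assumption: one pronunciation per word; the full dictionary lists variants.
SAMPLE_PRONUNCIATIONS = {
    "night": ["N", "AY1", "T"],
    "light": ["L", "AY1", "T"],
    "delight": ["D", "IH0", "L", "AY1", "T"],
    "poem": ["P", "OW1", "AH0", "M"],
    "meter": ["M", "IY1", "T", "ER0"],
}


def rhyme_part(phones):
    """Return the phonemes from the last primary-stressed vowel to the end.

    Returns None when the pronunciation has no primary-stressed vowel.
    """
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return tuple(phones[i:])
    return None


def is_perfect_rhyme(word_a, word_b, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Check whether two words rhyme under the simplification stated above.

    Unknown words and identical pronunciations are treated as non-rhymes.
    """
    phones_a = pronunciations.get(word_a.lower())
    phones_b = pronunciations.get(word_b.lower())
    if phones_a is None or phones_b is None or phones_a == phones_b:
        return False
    part_a = rhyme_part(phones_a)
    part_b = rhyme_part(phones_b)
    return part_a is not None and part_a == part_b


if __name__ == "__main__":
    for pair in [("night", "light"), ("delight", "night"), ("poem", "meter")]:
        print(pair, is_perfect_rhyme(*pair))
```

A real detector would also need to handle words missing from the dictionary, multiple pronunciations, and near rhymes, none of which this sketch attempts.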

The huge amount of data available in the digital age has attracted
the attention of major scholars and has developed
into its own research paradigm. There is no consensus as
to when data are large or complex enough to qualify as the
object of data-intensive research, especially since huge or
massive may mean completely different things in different
fields and disciplines, but Levallois, Steinmetz, and Wouters
advance a relevant and potentially useful definition: “data-intensive
research [is] research that requires radical changes
in the discipline” involving “new, possibly more standardized
and technology-intensive ways to store, annotate, and
share data,” a concept that therefore “may point toward quite
different research practices and computational tools” (Levallois,
Steinmetz, and Wouters 2012). This paper introduces
our endeavour to redefine the scholarly approach to poetry
analysis by applying data-intensive research methods and
eventually mathematical graph theory.
The earliest stages of the Graph Poem Project (MARGENTO
2015) resulted in…

