What is the value of knowing how many times an author uses a certain word? Or asking if an author used a word positively or negatively and how that changed the impact of their statement? These are questions researchers using text analysis methods might ask when exploring collections of books, articles or social media posts to gain a deeper understanding of the connections between and meaning within texts.
The Libraries Digital Scholarship department hosted a text analysis showcase to highlight how text analysis might be used within the research process. The showcase took place on November 14, 2024, and included staff, faculty, graduate and undergraduate students. The presenters were Sara Bautista, a staff member in Decker College and former Digital and Data Studies undergraduate; Huiping Wang, a visiting scholar in the Translation Research and Instruction Program; Furkan Emer, a graduate student in Political Science; and Evan Liu, an undergraduate student in Statistics minoring in Digital and Data Studies.
Bautista’s presentation, “Memorializing South African Apartheid Victims via Text Mining Analyses,” analyzed firsthand accounts of victims who suffered violence during Apartheid. The accounts were gathered by the Truth and Reconciliation Commission of South Africa. Bautista began researching the data under the hypothesis that the violence described in the accounts would be racially or ethnically motivated. However, as she worked across different exploratory text analysis methods, an entirely different view of the data appeared. Using word counts and topic modeling, Bautista found that accounts of violence were more closely linked to internal disputes between rival anti-Apartheid movements. Bautista explained that the text analysis methods she used were particularly helpful for uncovering the obscure narratives within the texts she explored and for acknowledging often undiscussed truths of victims’ experiences and stories.
Liu’s presentation used similar text analysis methods to explore Stormfront, a forum created and used by white supremacists. Liu applied topic modeling to posts gathered from the forum to pull out the topics that recur across them. This allowed Liu to quickly identify common groupings among the posts and begin formulating a hypothesis for further research. For example, Liu found that topics pulled from the data coincided with concepts of self-victimization, Asians as a model minority and the villainization of black people, among others. Text analysis therefore gave Liu a starting point and a broad understanding of the common themes and of how hate speech is formulated within the Stormfront forums. From there, Liu was able to create an initial hypothesis about what white supremacists believe and why, and about potential approaches to combating hate speech through better detection as well as education and outreach.
Taking a different approach to text analysis methods, Emer’s presentation discussed his statistical research on military speeches in public media. Emer analyzed 160 media appearances of military speeches, trying to uncover when and how military leaders utilize their country’s media. Through the analysis, Emer found that military leaders were more likely to take advantage of their relationships with the media when their agenda did not align with that of their political leaders. In these cases, the media provides an essential outlet for military leaders to put forth their own agendas or their disagreements with current political leaders. Emer argued that military leaders’ presence in and use of media is an understudied subject. His analysis, he argued, shows the value of paying attention to how military leaders interact with their country’s public through media and what that shows about their relative influence.
Bautista, Liu and Emer used text analysis on articles and smaller forms of text, such as statements and forum posts, to explore the political or social relationship between language and the people using it. Huiping Wang was interested in a slightly different question: how different translations of the same text compare to each other, and what comparisons can be made by using text analysis methods. To answer this question, Wang analyzed six translations of the Chinese classic Daxue, looking at the simplicity, readability and qualitative aspects of the translators’ language. She argued that each translator’s motivations for the text shaped the language of their translation, whether that be the 19th century missionary David Collie trying to learn Chinese or late 20th century translator Chan Wing-tsit trying to popularize Chinese texts and educate Western audiences about them.
Throughout the showcase, the presenters showed the value of text analysis methods and tools for delving deeper into many different aspects and kinds of textual data, from short form posts and stories to entire corpuses.
If you are interested in starting a text analysis project or learning more about the text analysis tools available to you through the Libraries, reach out to the Libraries Digital Scholarship team! We’d love to talk with you about your ideas and how we can support them.