For my current project, which automates an approach to understanding the decision-making processes of elites first developed by Robert Axelrod in the 1970s, I use two newly developed computational linguistic tools (see my GitHub profile).
Rather than discussing the specifics of my current project, which you can read about here, I will use this space as an attempt to stimulate a discussion about the study of texts and language as a means to understanding culture, norms and collective behavior.
There is a long-standing tradition in all of the social sciences of analyzing the language that people use to communicate ideas and express themselves as a means of understanding their motives, desires and actions. In economics, proponents of the Historical School analyzed language to study the unfolding of economic events of a particular time and place. This school of thought died out in the early 20th century (ironically, for historical reasons), but new theories of language and economics are starting to sprout (here and here), although they all seem to be conceived by the same person: Ariel Rubinstein (here is a review). In Political Science, Gary King and Justin Grimmer have made noteworthy contributions to advancing our social and political understanding by systematically studying texts. In Sociology, spoken language and texts, mostly studied through a practice known as "discourse analysis", are probably still the main sources of data, and this has been so since the early days of Max Weber and Emile Durkheim. The same seems to be true of Anthropology; please correct me if I'm wrong (although the criticism below most often applies). I would greatly appreciate it if someone could point me to more scientifically rigorous approaches to the study of texts and/or utterances in these fields (Anthropology and Sociology)!
The main critiques that I am aware of regarding the use of spoken and written language to gain an understanding of social forces relate to one of two things: 1) The selection of evidence seems arbitrary and anecdotal or, worse, evidence most often seems to be purposely selected to make a particular point (it is hard to check how evidence was selected, and the data collection process is hardly transparent). 2) Even if the texts or utterances were selected in a transparent and unbiased way, the amount of evidence is usually small because it has to be processed, analyzed, or interpreted by humans, which seems to allow only very small bits of text or utterances to be subjected to analysis. This hardly amounts to a serious method; it seems to be a collage, an artistic or poetic expression, rather than a science. As a means of transparently gaining a shared understanding of the social world, this modus operandi is no doubt unacceptable; but this is not to say that texts and utterances produced by humans cannot serve as rich data sources for testing interesting hypotheses. To the contrary, it seems to me that texts and transcribed utterances are some of the richest sources of data that are perpetually produced by the social world (in very large amounts)!
Hence, it seems natural to me that text analysis, as practiced in the 21st century (by computational linguists and artificial intelligence researchers), should be an integral part of the social sciences. It continues to surprise me that I can still find astonishingly little use of NLP (Natural Language Processing) in the social sciences, and practically no use of the more cutting-edge tools that have been developed in Computational Linguistics over the last few years, for example at Stanford, MIT, or Carnegie Mellon University.
Just stumbled on this and wanted to say thanks to John McCreery for the pointer to the Tambayong & Carley piece -- and also thanks to Johannes Castner for bringing up this topic :)
Johannes, are you aware of Network Text Analysis in Computer-Intensive Rapid Ethnography Retri... by
Laurent Tambayong California State University at Fullerton; firstname.lastname@example.org and Kathleen M. Carley Carnegie Mellon University; email@example.com
Abstract: Advances in text analysis, particularly the ability to extract network based information from texts, is enabling researchers to conduct detailed socio-cultural ethnographies rapidly by retrieving characteristic descriptions from texts and fusing the results from varied sources. We describe this process and illustrate it in the context of conflict in the Sudan. We show how network information can be extracted from vast quantities of unstructured texts-based information using computer assisted processes. This is illustrated by an examination of changes in the political networks in Sudan as extracted from the Sudan Tribune. We find that this approach enables rapid high level assessment of a socio-cultural environment, generates results that are viewed as accurate by subject matter experts, and match actual historical events. The relative value of this socio-cultural analysis approach is discussed.
P.S. Just heard a talk by Loet Leydesdorff, suggesting that while geographers work in Euclidean space and social network analysts work in a graph-analytical space, semantic network analysis requires a vector space and, ultimately, a hyper vector space in which perspectives can be taken into account.
It will be particularly interesting, to the general reader as well as to scholars, to see how your results compare with the conclusions reached by George Lakoff in Moral Politics, the book that introduced the idea of framing to public political discourse in the USA.
Johannes, check out the work of Loet Leydesdorff and his group in Amsterdam. I will also ask around at the Sunbelt conference in Hamburg that I will be attending next weekend.
There is an important dimension that I did not discuss earlier. There are probably many, but one comes to mind that is particularly important in economics and political science. While it goes far beyond the study of language, it touches this area as well, and it becomes very obvious when thinking in parallel about the works of Ariel Rubinstein on the one hand and Gary King and Justin Grimmer on the other: the difference in emphasis on either empirics or theory. Ariel Rubinstein mostly theorizes, while Gary King and Justin Grimmer are mostly concerned with using texts as data sources with which to confront theories, although the theories that are confronted are often not the same as those proposed by pure theorists such as Rubinstein (this is actually a problem worth writing about in a separate blog post at some point).