Using Natural Language Processing and Social Network Analysis to study ancient Babylonian society

Publication Date: March 10, 2009
Patrick Schmitz, IST–Data Services

In Near Eastern Studies, as in other areas of the Humanities, researchers often study corpora of administrative and legal texts to understand economic, administrative, and societal structures by examining the activities of individuals and their interactions with one another. This is often painstaking work. To study ancient Babylonian texts, for example, scholars must first be able to read Akkadian and must then assemble all the references to people and activities by hand. This process is formally known as prosopography and is used by scholars across a range of Humanities research. Now, Professor Niek Veldhuis and Dr. Laurie Pearce are working with Patrick Schmitz of IST–Data Services to apply more modern approaches to the problem. They are using techniques from Natural Language Processing (NLP) and Social Network Analysis (SNA) to extract the names and basic familial relationships of people mentioned in the texts, and then to assemble the social network of those people based on the activities described.

The new project, dubbed Berkeley Prosopography Services (BPS), will use an XML representation of each text, borrowing ideas from the Text Encoding Initiative (TEI) for marking persons and roles in the texts. A probabilistic engine will collate all the person references in the corpus, together with some basic world knowledge such as the typical length of adult activity. It will then associate the names with individual persons and, finally, relate the people to one another by the kinds of activities they engaged in. The resulting graph model can be used to produce a variety of reports and visualizations, including simple name lists and family trees as well as interactive models. By integrating graph visualization tools, the project will let researchers explore the network of associations and activities interactively. They can focus on an individual, on a given type of activity (e.g., real-estate sales), or on other aspects of the model. This should enable researchers to answer many complex questions more easily, and with a visual response.
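
To make the pipeline described above more concrete, here is a minimal Python sketch of its three steps: reading person references from XML, associating names with individuals, and linking people by shared activities. It is an illustration only and not the BPS implementation. The XML element and attribute names, the sample names and dates, and the 40-year activity limit are all invented for the example, and a simple deterministic rule stands in for the probabilistic engine.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations

# Invented sample data. The markup is only loosely TEI-inspired; the element
# and attribute names are assumptions. Years are negative for BCE.
SAMPLE = """
<corpus>
  <text id="T1" year="-250" activity="real-estate sale">
    <persName role="seller" father="Anu-uballit">Nidintu-Anu</persName>
    <persName role="buyer" father="Kidin-Anu">Labashi</persName>
  </text>
  <text id="T2" year="-240" activity="lease of temple office">
    <persName role="lessor" father="Anu-uballit">Nidintu-Anu</persName>
    <persName role="lessee" father="Dumqi-Anu">Anu-ab-utir</persName>
  </text>
  <text id="T3" year="-150" activity="real-estate sale">
    <persName role="seller" father="Anu-uballit">Nidintu-Anu</persName>
    <persName role="buyer" father="Kidin-Anu">Labashi</persName>
  </text>
</corpus>
"""

MAX_ACTIVE_YEARS = 40  # assumed "typical length of adult activity"


def extract_references(xml_text):
    """Collect every person reference together with its text's metadata."""
    refs = []
    for text in ET.fromstring(xml_text.strip()).iter("text"):
        for p in text.iter("persName"):
            refs.append({
                "text": text.get("id"),
                "year": int(text.get("year")),
                "activity": text.get("activity"),
                "name": p.text.strip(),
                "father": p.get("father"),
                "role": p.get("role"),
            })
    return refs


def assign_persons(refs, max_active_years=MAX_ACTIVE_YEARS):
    """Deterministic stand-in for the probabilistic engine: two references
    are the same person if name and father's name match and the later one
    falls within the activity span that starts at the first attestation."""
    persons = []
    for ref in sorted(refs, key=lambda r: r["year"]):
        match = None
        for person in persons:
            same_name = (person["name"], person["father"]) == (ref["name"], ref["father"])
            if same_name and ref["year"] - person["first"] <= max_active_years:
                match = person
                break
        if match is None:
            match = {"id": f"P{len(persons) + 1}", "name": ref["name"],
                     "father": ref["father"], "first": ref["year"]}
            persons.append(match)
        ref["person"] = match["id"]  # annotate the reference in place
    return persons


def build_network(refs):
    """Link every pair of persons appearing in the same text, labelling the
    edge with the activity that text records."""
    by_text = defaultdict(list)
    for ref in refs:
        by_text[ref["text"]].append(ref)
    edges = defaultdict(set)
    for members in by_text.values():
        ids = sorted({m["person"] for m in members})
        for a, b in combinations(ids, 2):
            edges[(a, b)].add(members[0]["activity"])
    return edges


refs = extract_references(SAMPLE)
persons = assign_persons(refs)
edges = build_network(refs)
for (a, b), activities in sorted(edges.items()):
    print(a, b, sorted(activities))
```

In this toy data, the two attestations of the same name ten years apart are merged into one person, while the attestation a century later is treated as a different individual with the same name and patronym. A probabilistic engine would instead weigh such evidence and report a likelihood for each candidate match.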


[Figure: BPS Architecture Diagram]

The initial application will be to a corpus of approximately 700 cuneiform tablets that record transactions, such as sales and leases of temple offices and of real estate, among members of a relatively small group of elite Mesopotamian citizens during the Hellenistic period (331-46 BCE) in the city of Uruk (southern Iraq). The electronic Uruk text corpus has been prepared and validated by Dr. Pearce as part of the international Cuneiform Digital Library consortium (CDL). Dr. Steve Tinney of the University of Pennsylvania is contributing parsing tools to convert the corpus to XML. A recent HART grant supports several graduate students on the team (two from Near Eastern Studies and one from the School of Information) who are helping with data preparation and development of the tool.

The Uruk corpus is sufficiently rich and complex for BPS to produce good results and to serve as a prototype for processing larger and more problematic corpora, both within the cuneiform tradition and from other disciplines. The project also serves as a demonstration for broader initiatives (e.g., Project Bamboo) that aim to apply technology effectively in support of Humanities research.

For more information about the BPS project, contact Patrick Schmitz, Professor Niek Veldhuis, or Dr. Laurie Pearce.