Examining Travel Writing about Southeast Asia with AntConc

One of the first papers I ever researched and wrote as a graduate student was about European perceptions of mainland Southeast Asia in the early nineteenth century. How did I research it? I went to the library and got whatever books I could find and read through them.

At that time, Mary Louise Pratt’s Imperial Eyes: Travel Writing and Transculturation had recently been published, and Edward Said’s Orientalism was still very popular. Having read and been influenced by these two books, I examined some travel accounts of mainland Southeast Asia by Western writers and sought to determine to what degree (if any) I could detect an “imperial desire” on the part of these authors to characterize the people of the region as inferior “Others” in need of “uplifting” through colonization.

If I were a graduate student today and were going to research a paper like this, I would do it somewhat differently. I would still read Imperial Eyes and whatever studies of travel writing have been written since that work was published. And I would still base my work on travel accounts written by Westerners. However, for some of my research I would look at those works in a different manner.

In particular, I would use a tool like AntConc, which I discussed in the post below and in this post on my other blog.

Several years ago some wonderful people at the Cornell University Library digitized many Western travel accounts of Southeast Asia and created a web page called “Southeast Asia Visions” where one can search, browse and read these texts.

In the time since that site was developed, many (if not all) of the texts there have been digitized by other people, and can now be found in places like archive.org. Indeed, I now prefer the archive.org versions of these titles as they are offered in various formats (read online, full text, etc.).

What is still nice about the “Southeast Asia Visions” site, however, is that it is a convenient place to see what travel accounts exist. One can, for instance, browse by time period and see which texts were published at a given time.

So if I were going to research a topic that employed travel writings, I would start at the “Southeast Asia Visions” site to determine what texts exist (they don’t claim to have digitized every travel account, but they have digitized a lot).

I would then go look for those texts on archive.org. Rather than viewing them online at this point, I would first click on the “full text” link, and then copy and paste the text of the book into a Notepad file, which I would save in UTF-8 encoding (that makes AntConc happy).
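For anyone who would rather script the "save as UTF-8" step than do it by hand in Notepad, here is a minimal Python sketch. The helper names `save_as_utf8` and `reencode_to_utf8` are hypothetical, and the sketch assumes you already have the book's text either as a string or as a file in some known encoding; it does not download anything from archive.org.

```python
from pathlib import Path


def save_as_utf8(text, out_path):
    """Write a string to out_path as a UTF-8 encoded .txt file."""
    path = Path(out_path)
    path.write_text(text, encoding="utf-8")
    return path


def reencode_to_utf8(src_path, src_encoding, out_path):
    """Read a text file saved in src_encoding and write a UTF-8 copy.

    src_encoding is something you have to know or guess (for example
    "cp1252" for many files saved on Windows); this sketch does not detect it.
    """
    text = Path(src_path).read_text(encoding=src_encoding)
    return save_as_utf8(text, out_path)
```

A call such as `reencode_to_utf8("book_raw.txt", "cp1252", "book_utf8.txt")` (both file names invented for illustration) would leave the original untouched and produce a second file that can be loaded into AntConc.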

Having done that for all of the books that I want to examine (and that could be many more than I would ever have the time to actually read), I would then load them into AntConc and start searching for terms that might relate to whatever it is that I am interested in examining.

There is in fact already a search function at the “Southeast Asia Visions” site. However, I would still create my own corpus of texts and use AntConc to search through them, as that tool is much faster and easier to use than the search function on the “Southeast Asia Visions” web site.
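To give a sense of what a concordance search over a home-made corpus involves, here is a rough Python sketch of a keyword-in-context search across a folder of UTF-8 text files. It is not how AntConc itself is implemented; the function names `kwic` and `search_corpus`, the folder layout, and the default context width are all assumptions made for illustration.

```python
import re
from pathlib import Path


def kwic(text, term, width=40):
    """Return (left context, match, right context) for each occurrence of term.

    The search is case-insensitive and matches the term anywhere it occurs.
    """
    hits = []
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


def search_corpus(folder, term, width=40):
    """Run kwic() on every .txt file in folder; return {file name: hits}."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Collapse line breaks and repeated spaces so each context reads as one line.
        text = " ".join(text.split())
        hits = kwic(text, term, width)
        if hits:
            results[path.name] = hits
    return results
```

Calling `search_corpus("travel_accounts", "Siamese")` on a folder of saved texts (the folder name is invented) would return, for each book, every occurrence of the term with a little text on either side, which is enough to decide which passages deserve a closer reading.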

Finally, after having identified something interesting to examine, I would then read more deeply into the texts (yes, I think it is still important to read. . .). I could do this in AntConc (since the files I created contain the entire texts), or I could read scans of the original text online at archive.org or the “Southeast Asia Visions” site.

Examining the Đại Việt sử ký toàn thư with AntConc

When it comes to things like databases, digitized texts and all of the various digital tools that scholars in many fields use today, the field of Vietnamese history is far behind, as very little has been done to move Vietnamese history (and particularly premodern Vietnamese history) into the digital age.

As frustrating as that is, there is only one solution to this problem and that is to “DIY” (“do it yourself”).

I spent much of the day today DIM (“doing it myself”) and have finally found a good way to use a free concordance tool to examine (much of) the Đại Việt sử ký toàn thư (大越史記全書).

Laurence Anthony, a professor at Waseda University in Tokyo, has developed a concordance tool called AntConc that can be downloaded for free here (I’m using the 3.3.5w beta version).

It is very simple to use. You start the program, load text (.txt) files that have been saved in UTF-8 encoding, and then search the files for words – just input a word and click “start.”

What is wonderful about this piece of software is that it can handle Asian languages. However, for it to work with Chinese text you must click the box called “Regex” (and it took me hours today to figure that out. . .).

So, for instance, you can load text files of the Hán (i.e., classical Chinese) text of the Đại Việt sử ký toàn thư and do a search for a word like 殺 (sát, i.e., giết = “to kill”) – I think I’m in a violent mood after spending so much time trying to figure out how to get AntConc to work with Chinese text. . .
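One possible reason a plain word search struggles with Chinese text is that the text has no spaces between words, so a search that expects word boundaries finds nothing. That is only an assumption about the general problem, not a description of AntConc's internals. The Python sketch below illustrates the difference; the function names are hypothetical, and the sample string is an invented phrase, not a quotation from the Đại Việt sử ký toàn thư.

```python
import re


def find_term(text, term, width=6):
    """Return (position, context) for every occurrence of term, ignoring word boundaries."""
    hits = []
    for m in re.finditer(re.escape(term), text):
        start = max(0, m.start() - width)
        hits.append((m.start(), text[start:m.end() + width]))
    return hits


def find_whole_word(text, term):
    """Return positions where term stands alone between word boundaries."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return [m.start() for m in re.finditer(pattern, text)]


# Invented sample text for illustration only.
sample = "王殺其臣。後又殺之。"

print(find_term(sample, "殺", width=2))   # pattern search finds both occurrences
print(find_whole_word(sample, "殺"))      # whole-word search finds none
```

In the sample, the character is always flanked by other characters, so the whole-word search comes back empty while the pattern search finds both occurrences.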

The screen then shows you each of the sentences in which that term appears (the search term appears in blue). When you click on any of those blue terms, AntConc will then take you to the passage in the text in which that particular sentence appears.

In the “File” menu you can then click “Close All Files” and “Clear All Tools,” and then load the Vietnamese-language files for the Đại Việt sử ký toàn thư. After doing that, you should un-click the “Regex” box and do a regular word search for a word like “giết” (still feeling kind of violent).

So by doing simple word searches like these one can start to examine the past in interesting ways. From the two searches above, for instance, I could easily imagine someone going on to research and write a paper on “Killing in Vietnamese History.” A tool like AntConc would make a paper like that much easier to research and write.

I prepared the files that I use here by copying and pasting the text from these two web sites (here and here). There are some problems to note. First, the Vietnamese language text is not “clean.” The footnote text is not included, but the footnote numbers are still in the main text, and this might make some searches inaccurate. So to really do this well, one would need to clean up the text.
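As a starting point for that kind of cleanup, here is a small Python sketch that strips stray footnote numbers. It rests on an assumption about how the markers look: one to three digits glued directly onto the end of a word (as in "kinh2."). If the markers in a given file look different, the pattern would need adjusting. The function name `strip_footnote_markers` is hypothetical.

```python
import re
import unicodedata

# One to three digits that directly follow a letter and are themselves followed
# by whitespace, punctuation, or the end of the text. Free-standing numbers
# such as years are left alone because they do not follow a letter.
FOOTNOTE_MARKER = re.compile(r"(?<=[^\W\d_])\d{1,3}(?=[\s.,;:!?)]|$)")


def strip_footnote_markers(text):
    """Remove digits attached to the end of words (assumed footnote markers)."""
    # Normalize to NFC so accented Vietnamese letters are single characters.
    text = unicodedata.normalize("NFC", text)
    return FOOTNOTE_MARKER.sub("", text)
```

This will also remove genuine cases where a digit legitimately follows a letter, so it is worth spot-checking the output against the source text before relying on search counts.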

As for the Hán text, there are some issues with it as well. First of all, it is incomplete, as the text is being gradually uploaded to that web page.

Second, the text that is being uploaded to that web page is Chen Jinghe’s collated version of the Đại Việt sử ký toàn thư. That is a very good text, but Chen Jinghe did make some mistakes in compiling it. A more serious problem, however, is that the people who are OCR-ing and uploading Chen Jinghe’s version are making further mistakes. As a result, this text that is ending up on the Internet is “two steps away” from any (“original”) source text.

As such, the texts here are not substitutes for primary sources. They are tools that can help us engage in research in new ways. But in the end, one should always consult a primary/source text as well.

Finally, if anyone discovers something interesting in using AntConc with a text like the Đại Việt sử ký toàn thư, please leave a comment here and share what you find.

Have fun!!