Examining the Đại Việt sử ký toàn thư with AntConc

When it comes to databases, digitized texts, and the other digital tools that scholars in many fields use today, the field of Vietnamese history is far behind. Very little has been done to move Vietnamese history (and particularly premodern Vietnamese history) into the digital age.

As frustrating as that is, there is only one solution to this problem and that is to “DIY” (“do it yourself”).

I spent much of the day today DIM (“doing it myself”) and have finally found a good way to use a free concordance tool to examine (much of) the Đại Việt sử ký toàn thư (大越史記全書).


Laurence Anthony, a professor at Waseda University in Tokyo, has developed a concordance tool called AntConc that can be downloaded for free here (I’m using the 3.3.5w beta version).


It is very simple to use. You start the program, load text (.txt) files that have been saved in UTF-8 encoding, and then search the files for words – just input a word and click “start.”
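Since the files have to be saved in UTF-8, it can help to check them before loading. Here is a minimal Python sketch of such a check. It is my own illustration and not part of AntConc, and the function names and the folder argument are hypothetical.

```python
from pathlib import Path


def is_valid_utf8(data):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_folder(folder):
    """Map each .txt file name in `folder` (a hypothetical path) to True/False."""
    return {
        path.name: is_valid_utf8(path.read_bytes())
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

Any file that comes back as False would need to be re-saved as UTF-8 in a text editor before the searches will work as expected.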

[Screenshot: Hán search in AntConc]

What is wonderful about this piece of software is that it can handle Asian languages. However, for it to work with Chinese text you must tick the box labeled “Regex” (and it took me hours today to figure that out...).

So, for instance, you can load text files of the Hán (i.e., classical Chinese) text of the Đại Việt sử ký toàn thư and do a search for a word like 殺 (sát, i.e., giết = “to kill”) – I think I’m in a violent mood after spending so much time trying to figure out how to get AntConc to work with Chinese text. . .


The screen then shows you each of the sentences in which that term appears (the search term appears in blue). When you click on any of those blue terms, AntConc will then take you to the passage in the text in which that particular sentence appears.
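For anyone curious about what a concordance display like this is doing, here is a rough Python sketch of a "keyword in context" search. It is only an approximation of the idea, not AntConc's actual code. The function name and the `width` parameter are my own, and the sample string in the usage lines is made up for illustration rather than quoted from the chronicle.

```python
def kwic(text, term, width=10):
    """Return a (left, term, right) tuple for every occurrence of `term`.

    `width` is the number of characters of context kept on each side,
    which suits Hán text because it has no spaces between words.
    """
    hits = []
    start = text.find(term)
    while start != -1:
        end = start + len(term)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        hits.append((left, term, right))
        # Continue searching after the current match.
        start = text.find(term, end)
    return hits


# Made-up sample text, for illustration only.
sample = "帝殺之。王不殺。"
for left, term, right in kwic(sample, "殺", width=2):
    print(left, "[" + term + "]", right)
```

Counting context in characters rather than words is a choice made for the Hán text. For the Vietnamese text, splitting on spaces would probably be the more natural approach.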

[Screenshot: 殺 (sát) in context]

In the “File” menu you can then click “Close All Files” and “Clear All Tools,” and then load the Vietnamese-language files for the Đại Việt sử ký toàn thư. After doing that, you should uncheck the “Regex” box and do a regular word search for a word like “giết” (still feeling kind of violent).
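One thing to keep in mind with Vietnamese text copied from the web is that accented letters can be stored in Unicode in either a precomposed or a decomposed form, and a search term typed in one form will not match text stored in the other. I have not checked which form these particular files use, so the following Python sketch is only a precaution under that assumption: it normalizes both the text and the search word before counting whole-word matches. The function name is my own.

```python
import re
import unicodedata


def count_word(text, word):
    """Count whole-word, case-insensitive matches after NFC normalization."""
    text = unicodedata.normalize("NFC", text).lower()
    word = unicodedata.normalize("NFC", word).lower()
    # The lookarounds keep "giết" from matching inside a longer word.
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return len(pattern.findall(text))
```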


So by doing simple word searches like these one can start to examine the past in interesting ways. From the two searches above, for instance, I could easily imagine someone going on to write a paper on “Killing in Vietnamese History.” A paper like that would be much easier to research with the aid of a tool like AntConc.


I prepared the files that I use here by copying and pasting the text from these two web sites (here and here). There are some problems to note. First, the Vietnamese-language text is not “clean.” The footnote text is not included, but the footnote numbers are still in the main text, and this might make some searches inaccurate. So to really do this well, one would need to clean up the text.
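As a starting point for that kind of cleanup, here is a small Python sketch. It rests on an assumption I have not verified against the files: that a footnote marker is either a number in square brackets or a short number stuck directly onto the end of a word, while real numbers such as years are preceded by a space. The function name is my own, and the sample sentence in the usage line is made up.

```python
import re

# Assumed marker shapes: "[12]" or up to four digits glued to a letter.
FOOTNOTE_MARKER = re.compile(r"\[\d{1,4}\]|(?<=[^\W\d_])\d{1,4}\b")


def strip_footnote_markers(text):
    """Remove assumed footnote markers but leave free-standing numbers alone."""
    return FOOTNOTE_MARKER.sub("", text)


# Made-up sample sentence, for illustration only.
print(strip_footnote_markers("Vua dời đô ra Thăng Long12 năm 1010."))
```

It would be wise to compare the output against the original on a few pages before trusting it, since any marker that follows punctuation or a space would be missed by this pattern.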

As for the Hán text, there are some issues with it as well. First of all, it is incomplete, as the text is being gradually uploaded to that web page.

Second, the text that is being uploaded to that web page is Chen Jinghe’s collated version of the Đại Việt sử ký toàn thư. That is a very good text, but Chen Jinghe did make some mistakes in compiling it. A more serious problem, however, is that the people who are OCR-ing and uploading Chen Jinghe’s version are making further mistakes. As a result, the text that ends up on the Internet is “two steps away” from any (“original”) source text.

As such, the texts here are not substitutes for primary sources. They are tools that can help us engage in research in new ways. But in the end, one should always consult a primary/source text as well.

Finally, if anyone discovers something interesting in using AntConc with a text like the Đại Việt sử ký toàn thư, please leave a comment here and share what you find.

Have fun!!