Researching Vietnamese History in the Digital Age

There was recently a debate in Vietnamese cyberspace between the author of a new book on early Vietnamese history, Tạ Đức, and a critic of that book, Trần Trọng Dương.

I haven’t read the book in question (Nguồn gốc người Việt – người Mường, The Origins of the Việt and the Mường), so I can’t comment on its contents. However, the debate between Tạ Đức and Trần Trọng Dương has brought up some interesting points about conducting research in the digital age, points that I have thought about before.


Trần Trọng Dương has criticized Tạ Đức for using sources from the Internet. Apparently Tạ Đức himself stated in his book that around 50% of the sources that he used to research his book came from the web.

The reason why this bothers Trần Trọng Dương is, to quote, “Because information on the internet (such as Wikipedia) is a type of source-less information that can be put on the web by anyone, with no need to take academic responsibility for that information.”

[Bởi lẽ, thông tin trên mạng (như wikipedia) là kiểu thông tin không nguồn gốc, có thể do bất kỳ ai đưa lên mà không phải chịu trách nhiệm khoa học về những thông tin đó.]

Trần Trọng Dương notes further that an author can use his knowledge to vet (kiểm soát bằng tri thức) such information. However, the fact that Tạ Đức apparently used information from the web page of a Chinese tourism company suggests that he did not always do so.


Tạ Đức responded to this critique unapologetically by stating that the Internet is a fantastic resource. He listed various academic publications that he found on the web, and he also defended his use of Wikipedia by stating that “. . . many Wikipedia pages, especially those in English, are essays that collect and spread serious information, that are presented in a condensed manner and that clearly record the sources of their information.”

[. . . nhiều mục wikipedia, đặc biệt bằng tiếng Anh, là những bài viết tổng hợp và phổ biến thông tin nghiêm túc, được trình bày cô đọng và có ghi rõ nguồn gốc tư liệu.]

As examples, Tạ Đức points to a couple of Wikipedia pages that he found valuable in conducting his research; one of these was the page on the Tanka people.


I went to this web page and randomly checked a footnote, and this is what I found. There is a section called “Note on the Term” which says: “The term Tanka is now considered derogatory and no longer in common use.”

The source that is cited is the following: “Farewell to Peasant China: Rural Urbanization and Social Change in … – Page 75 Gregory Eliyu Guldin – 1997 “In Dongji hamlet, most villagers were originally shuishangren (boat people) [Also known in the West by the pejorative label, “Tanka” people. — Ed.] and settled on land only in the 1950s. Per-capita cultivated land averaged only 1 mu …”


This is a very strange source to cite in order to demonstrate that the term “Tanka” now has a pejorative meaning. I would expect the citation to be a sociolinguistic study of the term itself.

Another important point to note is the “ – Ed.” in this citation. This means that the editor for this Wikipedia page put this information there. Whoever originally wrote this entry did not provide a citation for this statement. Someone who read this page later did.

How can we know that? By clicking on the “Talk” tab at the top of the Wikipedia page.


When we do that, we find that someone noted that the author needs to provide evidence that the term is pejorative.



This person provided some sources, and the editor used the first one listed. But are any of these sources authoritative studies of the use of the term “Tanka”? No! They are all about other topics and merely mention that the term is pejorative (presumably on the basis of some authoritative study, but the person who added this information did not indicate where those authors got it).

What is more, if you Google the sentences that are cited from these books you will find that they are all available online.

What this means is that whoever put that information there did so simply by searching the Internet for the terms “Tanka” and “pejorative.” That person then found some academic studies that mention this, but none of those studies is actually about the word “Tanka.” Therefore, none of these sources are valid evidence that the term “Tanka” has become pejorative.

So while I agree with Tạ Đức that the Internet is a wonderful resource, what matters most is that people know how to use the information on it effectively. I think that Trần Trọng Dương has a sense of this, but the problem is actually much more complex than he perhaps realizes.

Wikipedia certainly can be helpful, but it is important to know how Wikipedia works and to examine it closely.


Visualizing Colonial Police Deaths & Casualties in 19th-Century Burma

In the 7 May 1900 issue of The Rangoon Gazette (p. 10) I came across a “List of Civil Police Officers killed, wounded or died from 1886-1898.” For each entry the list gives the battalion, name, date of death or casualty, and nature of death or casualty.


I entered that information into an Excel spreadsheet and then looked up the latitude and longitude of the places where the battalions were based. For places like “Lower Chindwin” I used Monywa, as that was the administrative center at the time. In the case of Shwegyin, there is more than one place with that name, so I’m not sure that the one I chose is correct.

After inputting that information, and cleaning up some mistakes in the text from The Rangoon Gazette, I tried to map out and visualize this information.

There are various programs that you can use to visualize data. All you have to do is save the Excel file as a CSV (comma-separated values) file and then drag and drop the file into a visualization program.
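For anyone who would rather script this step than use Excel’s Save As dialog, here is a minimal Python sketch of the same CSV export using only the standard library. The column names mirror the fields in the Gazette list plus the coordinates that were looked up. The sample row is a hypothetical placeholder, not an entry from the list, and the coordinates are only approximate values for Monywa.

```python
import csv
import io

# Columns mirror the Gazette list (battalion, name, date, nature of death or
# casualty) plus the coordinates looked up for each battalion's base.
FIELDS = ["battalion", "name", "date", "nature", "lat", "lon"]


def records_to_csv(records):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical placeholder row; lat/lon are approximate values for Monywa,
# which stands in for "Lower Chindwin".
rows = [
    {"battalion": "Lower Chindwin", "name": "(placeholder)", "date": "1890-01-01",
     "nature": "killed", "lat": 22.12, "lon": 95.13},
]

csv_text = records_to_csv(rows)
print(csv_text)
# Save csv_text to a .csv file (opened with newline="") to drag into a mapping tool.
```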


I created the above map by doing this with Google Maps Engine. One problem that I encountered is that when you have more than one entry for one geographic place, the map just shows one entry and then ignores the rest.

If you click on the “data” icon, you can see that information, but it doesn’t all get represented on the map.
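One possible workaround for the one-marker-per-place problem is to merge co-located rows before uploading, so that each place becomes a single row with a count and a combined label. The sketch below assumes rows are keyed by battalion plus latitude/longitude. The sample data are hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict


def one_row_per_place(csv_text):
    """Merge rows that share a battalion and coordinates into one row per place.

    Tools that draw a single marker per coordinate pair show only one of
    several co-located rows; folding them together keeps a count and a
    combined label so no entry is silently hidden.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[(row["battalion"], row["lat"], row["lon"])].append(row)

    merged = []
    for (battalion, lat, lon), rows in groups.items():
        label = "; ".join(f"{r['name']} ({r['date']}, {r['nature']})" for r in rows)
        merged.append({"battalion": battalion, "lat": lat, "lon": lon,
                       "count": len(rows), "entries": label})
    return merged


# Hypothetical sample: two placeholder entries at one place, one at another.
sample = """battalion,name,date,nature,lat,lon
Place A,Officer 1,1887-03-01,killed,20.0,96.0
Place A,Officer 2,1889-06-15,wounded,20.0,96.0
Place B,Officer 3,1891-01-10,died,21.0,95.0
"""
for place in one_row_per_place(sample):
    print(place["battalion"], place["count"], place["entries"])
```

The merged rows can then be written back out as CSV, with the count or the combined label shown in each marker’s pop-up.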


I then used Gephi to make these visualizations.


I used only the information about battalion names and years, because the visualization I got when I input all the information was too confusing. I also had to join multi-word place names, writing “Lower Chindwin” as “LowerChindwin,” because Gephi treated “Lower” and “Chindwin” as separate pieces of data.
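Gephi can import an edge table from a spreadsheet file. This sketch assumes that its edges-table import recognizes columns named Source, Target and Weight, which is worth confirming in the import dialog. It builds battalion-to-year edges, weighting each edge by the number of casualties. It also assumes dates are stored as YYYY-MM-DD, and the records are hypothetical placeholders.

```python
import csv
import io
from collections import Counter


def battalion_year_edges(records, squash_spaces=True):
    """Build a weighted battalion -> year edge table as CSV text.

    Assumes each record's "date" starts with a four-digit year (YYYY-MM-DD).
    """
    counts = Counter()
    for r in records:
        battalion = r["battalion"]
        if squash_spaces:
            # "Lower Chindwin" -> "LowerChindwin", the workaround described above.
            battalion = battalion.replace(" ", "")
        counts[(battalion, r["date"][:4])] += 1

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(counts.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Hypothetical placeholder records.
records = [
    {"battalion": "Lower Chindwin", "date": "1887-03-01"},
    {"battalion": "Lower Chindwin", "date": "1887-09-12"},
    {"battalion": "Place B", "date": "1890-05-05"},
]
print(battalion_year_edges(records))
```

Whether the space-removal step is needed may depend on the import format and separator chosen in Gephi, so it is left as an optional flag.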

In the end, this visualization looks cool, but I don’t find it to be all that helpful.


So I then used RAW and got the above visualization for battalions and years of deaths or casualties. This one is a little bit clearer, but still kind of difficult to see.


Ultimately I found that the clearest things to visualize were single items, like years when people were killed or wounded.


The battalions they belonged to.


And what happened to them. Here “died” means that the person simply died while in the service of the police, and not in some kind of active police mission.
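Single-field charts like these come down to counting records by one column. This minimal sketch does that and prints a crude text bar chart, which is a quick way to check the numbers before handing them to a charting tool. The records are hypothetical placeholders, and slicing the year from the date assumes the YYYY-MM-DD format.

```python
from collections import Counter


def tally(records, field, key=None):
    """Count records by one field, optionally transforming the value first."""
    key = key or (lambda value: value)
    return Counter(key(r[field]) for r in records)


def text_bars(counter):
    """Return a crude text bar chart, one line per label, sorted by label."""
    return "\n".join(f"{label} {'#' * n} ({n})" for label, n in sorted(counter.items()))


# Hypothetical placeholder records.
records = [
    {"battalion": "Place A", "date": "1887-03-01", "nature": "killed"},
    {"battalion": "Place A", "date": "1889-06-15", "nature": "wounded"},
    {"battalion": "Place B", "date": "1887-11-20", "nature": "died"},
]
by_year = tally(records, "date", key=lambda d: d[:4])
print(text_bars(by_year))
print(text_bars(tally(records, "nature")))
```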

Trying to visualize data is fun, but it is definitely a challenge to create visualizations that actually show someone something significant. I would say that the above visualizations all pretty much “fail” to do that, but you do learn things in trying to make visualizations.

If anyone wants to play around with this data, I’m attaching the Excel file here.


Postscript: I just made the above map in OpenHeatMap. I think that one was more successful.

Colonial Police Deaths & Casualties in Burma


Exploring Southeast Asia with Omeka and Neatline

I just spent a very long day trying to learn how to install and use Omeka and Neatline.

For those who don’t know, Omeka is (to quote Wikipedia because I’m too tired to think right now. . .) “a free, open source content management system for online digital collections” while Neatline is (quoting the Neatline web page) “a geotemporal exhibit-builder that allows you to create beautiful, complex maps and narrative sequences from collections of archives and artifacts.”


Omeka and Neatline are free, but you need access to a server in order to install and use them. Server space is not free, though there are inexpensive options to choose from.

Omeka can be used to create a “digital collection” of whatever digitized materials you want to collect, and you can then create displays of those materials.

Neatline, meanwhile, enables one to connect texts or images with online maps.


Today I tried to create Omeka and Neatline exhibits using a report that James McCarthy, Superintendent of Surveys in Siam, presented to members of the Royal Geographical Society on 14 November 1887. This report was published in the Proceedings of the Royal Geographical Society and Monthly Record of Geography, New Monthly Series, Vol. 10, No. 3 (Mar., 1888): 117-134.

I used a passage from this report about a trip that McCarthy made to the northwest of Siam, into the area of what is today Laos.


The documentation that Omeka and Neatline provide is not as detailed as one would like. There are some companies that provide server space and offer “one-click installation” of Omeka, and they are listed on the Omeka web page. That’s probably the easiest way to start.

Once Omeka is installed, the Neatline plugin has to be uploaded to the server and activated. The documentation for that is fine, and I used Web Disk to do that.


Finally, when it comes to building an exhibit, again, the documentation is not as clear as one would like. In the end I found that trial and error worked.

This is the result of my labors: an Omeka exhibit of McCarthy’s expedition to Laos, and the beginnings of a Neatline exhibit of the same expedition (click on the circles on the map).

I can definitely see a lot of potential in using these platforms to present information in interesting and helpful ways, but the learning curve at the beginning is a little steep. It’s not impossible, but it does take time.

Stitching Together Historical Maps of Southeast Asia

There are a lot of historical maps of different areas of Southeast Asia that have been digitized. One problem that I have encountered, however, is that in many cases maps are too big to scan in one image, so people have to scan such maps in parts.

The result is that you have multiple images that you have to look at (and the place I always seem to need to see is right where one scan ends and another begins!).


Today I just succeeded in “stitching” some digitized images of maps together.

Recently I came across an article from the late nineteenth century that had a nice detailed color map of Siam in it. The map, however, had been scanned into 8 images.

Using the professional version of Adobe (which I don’t own, but have access to), I first cut off a white portion at the bottom of each image. I then saved each image as a TIFF file.


I then downloaded the (free) Microsoft Image Composite Editor (ICE). It is incredibly easy to use. I first dragged and dropped the four TIFF images from the top half of the map into it, and ICE aligned them perfectly.

I then saved this as a TIFF file, and repeated the process for the 4 images from the bottom of the map.


I then dragged and dropped these two files into ICE and it likewise aligned them very well. This image I then saved again as a TIFF file (but it can save in other formats too), and the quality of the final map was very good, much better than the image that I have here.
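ICE does feature-based alignment and blending, which is hard to reproduce in a few lines. This sketch only illustrates the simpler idea behind the steps above: trim the white strip from each scan, join the tiles of each row side by side, then stack the rows. It assumes the tiles abut exactly, with no overlap. To stay dependency-free, images are modeled as lists of rows of grayscale values (0 to 255). Real TIFFs would need an image library.

```python
from itertools import chain


def trim_white_bottom(tile, threshold=250):
    """Drop trailing rows whose pixels are all near-white (>= threshold)."""
    rows = [list(row) for row in tile]
    while rows and all(p >= threshold for p in rows[-1]):
        rows.pop()
    return rows


def join_row(tiles):
    """Place tiles side by side; every tile must have the same height."""
    height = len(tiles[0])
    if any(len(t) != height for t in tiles):
        raise ValueError("tiles in one row must have equal height")
    return [list(chain.from_iterable(t[y] for t in tiles)) for y in range(height)]


def join_rows(strips):
    """Stack already-joined strips top to bottom; widths must match."""
    width = len(strips[0][0])
    if any(len(row) != width for strip in strips for row in strip):
        raise ValueError("strips must have equal width")
    return [row for strip in strips for row in strip]


# Toy "scans": 2 columns of content with a white (255) strip along the bottom.
tile = [[10, 20], [30, 40], [255, 255]]
top = join_row([trim_white_bottom(tile) for _ in range(4)])
bottom = join_row([trim_white_bottom(tile) for _ in range(4)])
full_map = join_rows([top, bottom])
print(len(full_map), "rows x", len(full_map[0]), "columns")
```

Because this version never looks for overlapping detail, any misregistration between scans would show as a seam. That is exactly what ICE’s alignment avoids.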

Using RAW to Create Visualizations for Southeast Asian History

A new tool called RAW has just been released that allows users to easily create visualizations from information in a spreadsheet.

I decided to visualize some information from a US State Department report from October 1945 about nationalists in Vietnam. To do this, I created a simple Excel spreadsheet with the names of people and their political affiliations.
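If the name-and-affiliation list lives somewhere other than a spreadsheet, a short script can produce tab-separated text ready to paste. This assumes that RAW’s paste box accepts tab-separated text, as data copied from Excel is. The names and parties below are hypothetical placeholders, not people from the 1945 report.

```python
import csv
import io


def to_tsv(pairs, header=("Name", "Affiliation")):
    """Render (name, affiliation) pairs as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(pairs)
    return buffer.getvalue()


# Hypothetical placeholders, not names from the State Department report.
pairs = [("Person A", "Party X"), ("Person B", "Party X"), ("Person C", "Party Y")]
print(to_tsv(pairs))
```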


I pasted it into RAW.


Then I chose a layout and how I wanted to map it.


And I got my visualization.


Digitized Historical Maps of Southeast Asia

Not all that long ago, if you were an academic and you wanted a map, then you pretty much had to ask a professional cartographer to help you. These days that’s no longer the case.

With open source tools like Quantum GIS (QGIS), and with the help of training manuals, you can start making your own maps in a matter of hours.


That said, you will quickly figure out that it is a lot easier to take maps or GIS files that someone else has already made and adapt them, for example by using an existing map as a base map and then adding layers of new information for your own project on top of it.

This is where life can get frustrating, because not all digitized maps are the same. People digitize maps in different ways and this creates incompatibilities between some files and some GIS programs.

The US military made very nice topographic maps of Vietnam, for instance, and they have been digitized, but they are in GeoPDF which QGIS doesn’t accept, and which is a pain to convert to another format.
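One commonly used route is GDAL’s gdal_translate. GDAL can read GeoPDF when it has been built with a PDF backend, and gdal_translate can write a GeoTIFF that QGIS opens. This sketch only builds the command. The file name is hypothetical, and the GDAL_PDF_DPI setting (GDAL’s default is 150) trades file size for detail.

```python
import shlex
from pathlib import Path


def geopdf_to_geotiff_cmd(pdf_path, dpi=300):
    """Build a gdal_translate command that rasterizes a GeoPDF to GeoTIFF.

    Assumes a GDAL build with a PDF backend (e.g. Poppler or PDFium);
    GDAL_PDF_DPI controls the rasterization resolution.
    """
    src = Path(pdf_path)
    dst = src.with_suffix(".tif")
    return ["gdal_translate", "--config", "GDAL_PDF_DPI", str(dpi),
            "-of", "GTiff", str(src), str(dst)]


cmd = geopdf_to_geotiff_cmd("vietnam_sheet.pdf")  # hypothetical file name
print(shlex.join(cmd))
# To run it (requires GDAL on the PATH): import subprocess; subprocess.run(cmd, check=True)
```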


So this being the case, it’s always nice to come across digitized maps that are “user friendly.” I found this blog a while ago. It has some nice historical maps of Cambodia.

Then today I found some that the Library of Congress (LOC) has digitized.

Today I was reading a 1903 issue of the British North Borneo Herald about a guy who was prospecting for coal in the Serudong Valley. I wanted to see where that was, and I found that the LOC has digitized a map of British North Borneo in 1903 where I was easily able to locate the place I was looking for.


What is even better is that you can download the map files from the LOC web page, and I was easily able to add the JP2 and TIFF files there to QGIS as raster layers. I could therefore use these maps as base maps and create layers of my own information on top of them. For history projects, that is fantastic.
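If a downloaded scan carries no georeferencing of its own, QGIS (through GDAL) will also read a plain-text “world file” placed next to the image (for example map.tfw beside map.tif). This sketch writes one from the map’s corner coordinates. It relies on strong simplifying assumptions: the image is north-up with no rotation, and the latitude/longitude grid is roughly linear in pixels. Many historical maps violate that last assumption, and in that case QGIS’s Georeferencer with control points is the better tool. The image size and corner values below are hypothetical.

```python
def world_file_lines(width_px, height_px, west, south, east, north):
    """Return the six lines of a world file for a north-up image.

    Line order: x pixel size, two rotation terms (0 here), negative y pixel
    size, then x and y of the center of the upper-left pixel.
    """
    x_size = (east - west) / width_px
    y_size = -(north - south) / height_px  # negative: image rows run north to south
    values = [x_size, 0.0, 0.0, y_size, west + x_size / 2, north + y_size / 2]
    return [f"{v:.10f}" for v in values]


# Hypothetical 4000 x 3000 pixel scan with made-up corner coordinates in degrees.
lines = world_file_lines(4000, 3000, 115.0, 4.0, 119.0, 7.5)
print("\n".join(lines))
# Save these lines as map.tfw beside map.tif, then set the layer CRS to EPSG:4326.
```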


In looking around a bit more on the LOC site, I came across this beautiful map of Vietnam in 1890.

Also, the LOC site allows you to view the map in full screen mode and to pan and zoom in. All of that is wonderful too.

Oh, and the first picture above is of a Russian topographic map of Bangkok. During the Cold War, the Russians and the Americans both “mapped the world.” Those maps are all more or less “out there,” but again, finding them and finding them in file formats that you can work with is not always easy.

Visualizing the Telephone Network in French Indochina (or The Beauty of Boring Books for the Digital Humanities)

When historians engage in historical scholarship, I think it is safe to say that they want to “visualize” or “see” the past in one way or another, and there are various techniques that they use to do this.

First, knowledge helps one see the past. The more one knows about the past, the more visible it becomes.

Second, knowing languages also helps one visualize the past. The greater one’s ability to read historical sources in their original form, the more one is able to see.

Third, theory can also help. Looking at information from a different theoretical perspective can help one see the past more clearly as well.

Finally, these days the emerging field known as the Digital Humanities (DH) is providing various ways to visualize the past, one of which is by mapping.


These days there are numerous free, open source GIS programs (such as Quantum GIS and uDig) that make it relatively easy to create complex maps that one can then analyze and “query.”

I will write more specifically about those tools later, but here I want to point out that we now also have fabulous sources with which to build interesting maps.

In particular, thanks to the digitization of texts by Google and various libraries around the world we can now easily access numerous (formerly obscure) books that are filled with valuable data that can be mapped.

All that is required is some imagination and effort.


In writing about Alexander Grossman, the inventor of the “Manhood Creator,” I came across an obscure book that Google has digitized called the International Chinese Business Directory of the World For the Year 1913.

This book lists the names of Chinese businesses around the world in 1913 and what they sold. Further, in many cases (but not all), it even names the street where each business was located.

For people who study the history of the Chinese in Southeast Asia, a source like this is fantastic, because with a little work (with historical maps and Quantum GIS), one could easily “map out” where the Chinese businesses were in a place like Batavia (or for all of Southeast Asia!!).

What is more, one could create a GIS map with different layers of information. The opium sellers could be on one layer, and the grocers on another, and one could examine relationships between different types of merchants and different communities, etc.
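The time-consuming part is matching each directory entry to a location on a historical base map. Once that is done, splitting the entries by trade gives one layer per type of business. This minimal sketch builds one GeoJSON FeatureCollection per trade, a format QGIS can load as a vector layer. The entries and their coordinates are hypothetical placeholders. Note that GeoJSON stores positions as [longitude, latitude].

```python
import json
from collections import defaultdict


def layers_by_trade(entries):
    """Group geocoded directory entries into one GeoJSON FeatureCollection per trade."""
    layers = defaultdict(lambda: {"type": "FeatureCollection", "features": []})
    for e in entries:
        layers[e["trade"]]["features"].append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [e["lon"], e["lat"]]},
            "properties": {"name": e["name"], "street": e["street"]},
        })
    return dict(layers)


# Hypothetical placeholder entries with made-up coordinates.
entries = [
    {"name": "Firm 1", "trade": "grocer", "street": "Street A", "lat": -6.13, "lon": 106.81},
    {"name": "Firm 2", "trade": "grocer", "street": "Street B", "lat": -6.14, "lon": 106.82},
    {"name": "Firm 3", "trade": "opium seller", "street": "Street A", "lat": -6.13, "lon": 106.80},
]
layers = layers_by_trade(entries)
for trade, collection in layers.items():
    filename = trade.replace(" ", "_") + ".geojson"
    text = json.dumps(collection, indent=2)  # write `text` to `filename` for QGIS
    print(filename, len(collection["features"]), "features")
```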

This would be fascinating to “see,” and with DH tools and techniques this is now possible.


Similarly, I was looking at some of the materials that the French National Library has digitized. There are many books there that are filled with fascinating data.

I was looking at one book called Indochine adresses, 1ère année 1933-1934: Annuaire complet (européen et indigène) de toute l’Indochine, commerce, industrie, plantations, mines, adresses particulières.

It contains information like a list of telephone numbers in the various cities of French Indochina. That is fascinating because one could use that information to “map out” the telephone network in the 1930s, and then think about where communication could “move” and where it couldn’t.

It would then be interesting to think about how historical events (like the First Indochina War) related to something like the telephone network. If we mapped the events of that war against a map of the telephone network, what would we see?
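One simple way to begin asking that question is to measure the distance from each event location to the nearest town that appears in the directory’s telephone lists. This sketch uses the haversine great-circle formula. It reduces “reachable by telephone” to a distance threshold, which is an assumption for illustration only. The town names, coordinates and event location are hypothetical placeholders.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_telephone_town(event_lat, event_lon, towns):
    """Return (town, distance_km) for the listed town closest to an event."""
    return min(
        ((name, haversine_km(event_lat, event_lon, lat, lon)) for name, (lat, lon) in towns.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical towns with telephone listings and one hypothetical event.
towns = {"Town A": (21.0, 105.8), "Town B": (16.5, 107.6), "Town C": (10.8, 106.7)}
town, km = nearest_telephone_town(20.5, 106.0, towns)
print(f"nearest: {town}, {km:.0f} km; within 50 km: {km <= 50}")
```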


Ten years ago, if I had come across either of these books in a library I would have probably laughed at how “obscure” and “boring” they were.

With the ways in which DH techniques are enabling historians to visualize the past, however, such “boring books” now have great potential.