The Mandoku project

Table of Contents

About Mandoku

Mandoku is a framework for supporting collaborative research on premodern Chinese texts. This includes the tasks of

  • Philologically establishing a text, possibly by referencing and collating multiple versions of a text, a digital facsimile and a text can be displayed side by side
  • Reading, annotating, analyzing and translating a text
  • Working with repositories of texts, compare, study and analyze differences among these texts
  • Establish and maintain metadata about such texts, for example concerning the dates, authors, place of creation etc. of the texts
  • Compare notes, share information with other researchers working on these texts

To achieve all this, Mandoku defines a file format, directory layout, establishes text repositories and develops software to operate with these. Documentation of the Mandoku text format:


[2017-02-09 Thu] I now renamed the package to Sinomacs, the repository is on Github here. There is also a bundle for Windows users, it contains the Emacs program and some other supporting programs in one big archive. The download of the latest version is here: See the updated installation instrucions under Sinomacs.

[2017-01-09 Mon] New installation procedure: To make installation easier, I am trying a new procedure with a slightly modified version of Easymacs by Peter Heslin. For the time being, it is just a fork with a few changes to install the Mandoku package, but I might also change some of the keybindings in the future. The forked package is here: Easymacs-Mandoku.

[2016-03-01 Tue] updated installation instructions and a new version is available. Installation instructions:

[2015-11-27 Fri] updated installation instructions and a new preview version available Installation instructions:

[2014-09-27 Sat] a new preview version is available, see Installation instructions

[2014-01-13 Mon] updated Installation instructions for a preview version.

[2013-09-26 Thu] Installation instructions for a preview version are now available.

[2013-07-09 Tue] A built of Emacs 24.3 that allows the display of Unicode Ext. B characters can be found here.

[2011-06-17 Fri] The project has not yet been publically announced, because at the moment, there is not much to be seen here. Once that changes, I will make an announcement and remove this silly notice.


The software for using Mandoku texts falls in two broad categories: text processing software and a frontend to the user. For both of these categories there are multiple possibilities, but not all of them will be realized right now.

What is there at the moment is pre-alpha quality, not documented and full of quirks. It is made available as is as open source under a BSD-style license, see Use it as you like and drop me a line if you are stuck.

Interface software

Emacs (UPDATE: see Sinomacs and the instructions above)

A set of Emacs lisp files, that together form a working environment for comfortable dealing with the texts. The advantage of using Emacs lies in the fact, that it can make use of the large amount of modules relevant for researchers. These include for example

  • Bibliography management (org-bibtex, reftex)
  • Publication (org-exp, many conversion targets, including HTML, PDF, ODT, DocBook [TEI to come])
  • Task management (org-mode)
  • Dictionary lookup (lookup)

Mandoku is thus standing on the shoulders of a giant, but by adding a few little pieces here and there, it becomes even more useful for specific tasks.


A web interface, mainly geared at casual users without the need to have the files locally, is under development, a first test version is currently in internal testing.

Text processing software

Many tasks require operating in some way on the texts, for example extracting certain parts, comparing them with other, collate texts etc. In the context of the Mandoku project, a special processing model has been developed to make these tasks possible. Several programs and scripts have been developed for such tasks.

Another set of scripts works with digital facsimiles, which have to be cleared up, cropped, deskewed, scaled, and cut in such a way that they can be presented alongside a transcribed text.

All these programs have been written in Python (Python Programming Language - Official Website), these are available in the python directories.

Author: Christian Wittern

Created: 2017-02-09 Thu 14:13