DocuSky (BETA)
A small tool for constructing text-only DocuXml files and using them to build a database
Note: This tool takes the UTF-8 encoded .txt files you designate and exports them as a local XML file, which you can then use to build a database.
 You may package multiple .txt files into a corpus and designate a name for this collection.
  * What do I do if the file is not UTF-8 formatted?
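One possible way to handle a file that is not UTF-8, outside this tool, is to re-encode it locally before choosing it. The sketch below is a minimal illustration, not part of DocuSky: the function names `is_utf8` and `convert_to_utf8` are hypothetical, and it assumes you already know the original encoding (Big5 is used as the default only as an example).

```python
from pathlib import Path


def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src, dst, source_encoding="big5"):
    """Read src in its original encoding and write a UTF-8 copy to dst.

    source_encoding is an assumption supplied by the caller; this
    function does not try to detect the encoding.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return dst
```

For example, `convert_to_utf8("chapter1_big5.txt", "chapter1_utf8.txt")` would write a UTF-8 copy that can then be checked with `is_utf8` (both file names here are placeholders).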

1. Constructing a database from text-only UTF-8 files:
Please designate the name of your corpus before choosing which files the collection will include:
  • Name of text corpus:
  • .txt file
Uploaded corpus:
List of uploaded files:
Name of exported database:
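The packaging step described above (several UTF-8 .txt files gathered under one corpus name and exported as a single local XML file) can be sketched roughly as follows. This is only an illustration of the idea: the element and attribute names (`corpus`, `document`, `name`, `filename`) and the function `build_corpus_xml` are hypothetical and are not the DocuXml schema, which this page does not show.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_corpus_xml(corpus_name, txt_paths, out_path):
    """Wrap several UTF-8 text files in one XML file.

    NOTE: the tag names below are placeholders for illustration,
    not the real DocuXml format.
    """
    root = ET.Element("corpus", {"name": corpus_name})
    for path in txt_paths:
        path = Path(path)
        # One element per source file; the file name is kept as an attribute.
        doc = ET.SubElement(root, "document", {"filename": path.name})
        # ElementTree escapes characters such as & and < when writing.
        doc.text = path.read_text(encoding="utf-8")
    ET.ElementTree(root).write(str(out_path), encoding="utf-8", xml_declaration=True)
    return out_path
```

Reading each file with an explicit `encoding="utf-8"` means a wrongly encoded file fails early with a `UnicodeDecodeError` instead of producing garbled text in the exported file.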


2. Upload a DocuXml file to construct your personal text database (requires a DocuSky account)
Click to construct or delete a database (maximum file size: 100 MB)


Below are some sample files that have already been converted to DocuXml format. You may download these .xml files and then click the button above to construct a database. (Note: they are all in Chinese.)
  1. Dream of the Red Chamber in 120 Chapters: full text of Dream of the Red Chamber, without metadata or tagging.
  2. Hanlin Version of the Elementary School Chinese Textbooks: contains the textbook text, without metadata or tagging.
  3. Entries related to Taiwan from the Kangxi reign in the Veritable Records of the Qing Dynasty: also includes entries that Dr. Weng Chi-an considers related to Taiwan; has metadata but no tagging, and may be used to test post-classification of search results.
  4. 200 old land deeds in the Taiwan Historical Digital Library (THDL): contains the text of these 200 land deeds, with metadata and tagging; may be used to test post-classification of search results and termstats.