DocuSky BETA: Widget for building plain-text DocuXml and uploading the XML file to the library (v2.4 Beta)
Explanation 1: This tool takes plain-text files in UTF-8 encoding and writes them to a local XML file, which makes it easier to build a personal database.
 You can package multiple text files into one document set and give the document set a name.
  * What should I do if the text files are not in UTF-8 encoding?
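One common way to handle a file that is not UTF-8 is to re-encode it before loading it into the tool. The following is a minimal sketch, not part of DocuSky itself; the function name `convert_to_utf8` and the default source encoding `big5` are assumptions for illustration, so pass whatever encoding your files actually use.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "big5") -> None:
    """Re-encode a text file as UTF-8.

    source_encoding is an assumption; replace it with the real encoding
    of your file (for example "gbk" or "shift_jis").
    """
    # Decoding is strict by default, so a wrong encoding guess raises
    # UnicodeDecodeError instead of silently producing garbled text.
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")
```

Writing to a separate output path keeps the original file untouched in case the encoding guess turns out to be wrong.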
Explanation 2: (a) Automatic splitting: 1. Type four # characters (i.e. ####) in the text file at each point where the file should be split; the file is then split there automatically.
         2. By default, each resulting file is named after the original file name followed by a sequence number (i.e. original_xxx).
         3. File names and content can be changed in the view; if no changes are needed, skip this step and click Save.
 (b) Automatic paragraph segmentation: 1. Press Enter or Return to leave a blank line at each point where the text should be segmented (this can be omitted at the beginning and end of the text); the text is then segmented there automatically.
         2. By default, a line that begins with more than 2 full-width (4 half-width) spaces does not start a new segment and is treated only as a line break, so this behavior can also be used to control automatic segmentation.
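The two rules in Explanation 2 can be sketched roughly as follows. This is an illustration only, not the tool's actual code: the function names are hypothetical, the three-digit numbering is an assumption read from the `original_xxx` pattern, line endings are assumed to be `\n`, and the leading-space rule in (b) 2 is not modeled.

```python
import re
from pathlib import Path

SPLIT_MARKER = "####"


def split_documents(original_name: str, text: str) -> list[tuple[str, str]]:
    """Split text at each '####' marker and name the parts original_001, original_002, ..."""
    stem = Path(original_name).stem
    parts = [part.strip("\n") for part in text.split(SPLIT_MARKER)]
    parts = [part for part in parts if part.strip()]  # drop empty pieces
    # Three-digit numbering is an assumption based on the "original_xxx" pattern.
    return [(f"{stem}_{i:03d}", part) for i, part in enumerate(parts, start=1)]


def split_paragraphs(text: str) -> list[str]:
    """Split text into segments at blank lines, ignoring blank lines at either end."""
    blocks = re.split(r"\n\s*\n", text.strip("\n"))
    return [block for block in blocks if block.strip()]
```

For example, `split_documents("novel.txt", "Chapter one\n####\nChapter two\n")` yields two parts named `novel_001` and `novel_002`.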

1. Convert UTF-8 plain text files into a library file:
Specify the document set name first, then select the text files you want to include in the document set:
  • The name of the document set:
  • Choose my text file:
The loaded document set:
List of loaded files (in items):
» View and modify the file name and content of a UTF-8 plain text file:
Total: 0 items
Current collection name:
Instructions for use
Current original file name:
Instructions for use
Retain all spaces (including leading spaces before paragraphs): Yes / No
The name of the output library file:
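
The conversion step above (a document set name plus a list of text files, written out as one XML file) can be sketched roughly as follows. This is a hedged illustration rather than the tool's implementation: the function name `build_docuxml` is hypothetical, and the element and attribute names (`ThdlPrototypeExport`, `documents`, `document`, `filename`, `corpus`, `doc_content`) are assumptions that should be checked against the official DocuXml specification before use.

```python
import xml.etree.ElementTree as ET


def build_docuxml(corpus_name: str, documents: list[tuple[str, str]]) -> ET.Element:
    """Build an XML tree holding one document set.

    documents is a list of (filename, content) pairs.
    All element and attribute names below are assumptions for illustration.
    """
    root = ET.Element("ThdlPrototypeExport")
    docs_el = ET.SubElement(root, "documents")
    for filename, content in documents:
        doc_el = ET.SubElement(docs_el, "document", filename=filename)
        ET.SubElement(doc_el, "corpus").text = corpus_name
        # ElementTree escapes &, < and > in text when serializing.
        ET.SubElement(doc_el, "doc_content").text = content
    return root


def write_docuxml(root: ET.Element, output_path: str) -> None:
    """Write the tree to output_path as UTF-8 with an XML declaration."""
    ET.ElementTree(root).write(output_path, encoding="utf-8", xml_declaration=True)
```

A call such as `write_docuxml(build_docuxml("MyCorpus", [("novel_001", "Chapter one")]), "MyCorpus.xml")` would then produce the local XML file to upload in step 2.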


2. Upload a DocuXml file to build a personal text library (DocuSky account required):
Click here to create or delete the library (maximum file size 100MB)


Below are some sample files that have already been converted to DocuXml (you can download an XML file directly and click the button above to build a library from it):
  1. The Dream of the Red Chamber in 120 chapters: the full text of The Dream of the Red Chamber, without metadata or tagging.
  2. Hanlin Edition National Primary School Texts: the Hanlin edition national primary school texts, without metadata or tagging.
  3. Entries related to Taiwan during the Kangxi period of the Qing Records: in addition to the entries in the "Album of Qing Records on Taiwan History," this file contains entries that Dr. Ji-An Weng of the Department of History considers to be related to Taiwan. It has metadata but no tagging, so it can be used to test the post-categorization function.
  4. THDL Ancient Deeds in 200 sections: 200 ancient deeds exported from the THDL system. It has metadata and tagging, so it can be used to test post-categorization and word frequency analysis.
Instructions for use
(1) Select the collection or document you want to view from the drop-down menu on the upper left.
(2) Original document/sub-file: the file name (double-click to edit) and the text can be changed as needed.
(3) "Add" sub-file:
  a. To specify where the new sub-file should be inserted, check the box of the sub-file directly above the insertion point. (E.g. if there are two sub-files, a.txt and b.txt, and you want to insert the new sub-file between them, check the box of a.txt.)
  b. If you do not specify an insertion point, the system appends the new sub-file at the end by default.
(4) "Delete" sub-file: check the box of each sub-file you want to delete, then delete it.
(5) "Auto-split": check the text file you want to split, and type four # characters (i.e. ####) in the text at each point where it should be split.
(6) "Range Selection": check the first and last items of the range you want to select, then click "Partial Selection".
(7) "Batch Rename": a. Renames all selected files based on the first file name, appending a sequence number in order.
         b. By default numbering starts from 001; to start from a different number, append _xxx (three digits) to the first file name.
(8) "Confirmed Documents": confirmed documents are shown in gray in the drop-down list so that users do not review them again;
using this feature is optional and does not affect the final output.
(9) "Save": saves the current splitting progress as a JSON file, so that you do not need to reload all the text files the next time you want to make changes.
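
The insertion rule in (3) and the numbering rule in (7) can be sketched as follows. This is an illustration under stated assumptions, not the tool's own code: the function names are hypothetical, and it is assumed that every selected file (including the first) receives a numbered name.

```python
import re
from typing import Optional


def insert_new(files: list[str], new_name: str, checked: Optional[str] = None) -> list[str]:
    """Insert new_name right after the checked file, or at the end if none is checked."""
    position = files.index(checked) + 1 if checked is not None else len(files)
    return files[:position] + [new_name] + files[position:]


def batch_rename(first_name: str, count: int) -> list[str]:
    """Return `count` names based on first_name, numbered in order.

    If first_name ends in _NNN (three digits), numbering starts at NNN;
    otherwise it starts at 001. Numbering every file is an assumption.
    """
    match = re.fullmatch(r"(.*)_(\d{3})", first_name)
    if match:
        base, start = match.group(1), int(match.group(2))
    else:
        base, start = first_name, 1
    return [f"{base}_{start + i:03d}" for i in range(count)]
```

With these definitions, `batch_rename("deed", 3)` gives `deed_001`, `deed_002`, `deed_003`, while `batch_rename("deed_010", 2)` gives `deed_010`, `deed_011`.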