Batch Tagging Tool (also known as ContentTagging, CT for short):
Brief description of the specifications
Specifications for the input format of the Excel sheet:
- The first row specifies the purpose of each column (field). It must contain at least two fields, tagName and tagVal (case sensitive, so tagname and tagval are not accepted).
Each data row from the second row down specifies that occurrences of the term tagVal in the text content should be tagged with the tag tagName.
- The first row may also specify that the output tags should carry the special attribute Term or RefId,
i.e., you can specify the value of the tag's Term attribute via a column named @Term or attribute:Term (both work the same way),
and the value of the tag's RefId attribute via a column named @RefId or attribute:RefId (both work the same way).
The value of @RefId is usually of the form <auth>_<id>, where <auth> is an abbreviation for an open authority file and
<id> is a unique identifier within the authority file.
For example, cbdb_1762 points to the record for "Wang Anshi" (王安石) in CBDB.
DocuSky currently has limited support for @RefId:
it supports only the cbdb and dila open authority files for the PersonName tag,
and the tgaz, dila, and twgis open authority files for the LocName tag.
Note: Under the PersonName or LocName tags, @RefId can also hold a full URL (starting with http:// or https://).
- The cell value in the tagName field must conform to the DocuXml tagging specification
(only alphanumeric characters, underscores, periods, and hyphens are allowed;
other characters are discarded).
Apart from the tags predefined by DocuXml (PersonName, LocName, SpecificTerm, and Date),
all tag names are custom tags. Custom tags should be prefixed with "Udef_",
e.g. Udef_DrugName. If a custom tag name in the tagName column lacks the "Udef_" prefix,
the tool adds it automatically
(i.e. entering DrugName or Udef_DrugName in a cell of the tagName column has the same effect).
- A filter:<metadata> field can be added to restrict matching to texts whose <metadata> satisfies a given condition.
For example, if you add a field named filter:filename with the value f001|f011~f050|f099,
only texts with the filenames f001, f011~f050, and f099 are matched
(and only those texts are tagged).
- If you want to add extra metadata tags to every text in which a particular term appears,
you can add a column to the Excel sheet whose first-row name is extraMetaTags.
If the tagVal of a row is t and its extraMetaTags value is A:a;B:b,
then the tool adds the tags MetaTags/Udef_A (value a) and MetaTags/Udef_B (value b)
to each text in which the term t appears.
- The following is a simple example table. With it, the tool produces these tags:
- "滬尾" is labeled as「<LocName Term="淡水">滬尾</LocName>」
- "淡水" is labeled as「<LocName Term="淡水">淡水</LocName>」
- "胡麻" is labeled as「<Udef_DrugName Term="芝麻">胡麻</Udef_DrugName>」
- "芝麻" is labeled as「<Udef_DrugName>芝麻</Udef_DrugName>」
- "王安石" is labeled as「<PersonName RefId="cbdb_1762">王安石</PersonName>」
In addition, metatags/Udef_NameStage with the value "早期名稱" is added to each text in which "滬尾" appears;
metatags/Udef_NameStage with the value "現今名稱" is added to each text in which "淡水" appears;
metatags/Udef_Usage with the value "Common" is added to each text in which "胡麻" appears;
and metatags/Udef_Usage with the value "Common" and metatags/Udef_NameStage with the value "現今名稱" are added to each text in which "芝麻" appears.
tagName | tagVal | @Term | @RefId | extraMetaTags |
LocName | 滬尾 | 淡水 | | NameStage:早期名稱 |
LocName | 淡水 | 淡水 | | NameStage:現今名稱 |
DrugName | 胡麻 | 芝麻 | | Usage:Common |
DrugName | 芝麻 | | | Usage:Common;NameStage:現今名稱 |
PersonName | 王安石 | | cbdb_1762 | |
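
The way a row of the example table maps to a tag can be sketched in Python as follows. This is only an illustrative sketch, not the tool's actual implementation: the function names normalize_tag_name, parse_extra_meta_tags, and build_tag are hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits and that extraMetaTags keys receive the "Udef_" prefix whenever they lack it.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Tags predefined by DocuXml, as listed in the specification above.
PREDEFINED_TAGS = {"PersonName", "LocName", "SpecificTerm", "Date"}


def normalize_tag_name(name: str) -> str:
    """Discard disallowed characters and prefix custom tags with Udef_."""
    # Assumption: "alphanumeric" is read as ASCII letters and digits.
    cleaned = re.sub(r"[^A-Za-z0-9_.\-]", "", name)
    if cleaned in PREDEFINED_TAGS or cleaned.startswith("Udef_"):
        return cleaned
    return "Udef_" + cleaned


def parse_extra_meta_tags(spec: str) -> dict:
    """Turn 'A:a;B:b' into {'Udef_A': 'a', 'Udef_B': 'b'}."""
    result = {}
    for pair in spec.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition(":")
        key = key.strip()
        # Assumption: keys that lack the Udef_ prefix receive it.
        if not key.startswith("Udef_"):
            key = "Udef_" + key
        result[key] = value.strip()
    return result


def build_tag(row: dict) -> str:
    """Build the tag markup for one data row of the sheet."""
    tag = normalize_tag_name(row["tagName"])
    attrs = ""
    for attr in ("Term", "RefId"):
        # "@Term" and "attribute:Term" are treated as equivalent column names.
        value = (row.get("@" + attr) or row.get("attribute:" + attr) or "").strip()
        if value:
            attrs += f" {attr}={quoteattr(value)}"
    return f"<{tag}{attrs}>{escape(row['tagVal'])}</{tag}>"


row = {"tagName": "DrugName", "tagVal": "胡麻", "@Term": "芝麻",
       "extraMetaTags": "Usage:Common"}
print(build_tag(row))  # <Udef_DrugName Term="芝麻">胡麻</Udef_DrugName>
print(parse_extra_meta_tags(row["extraMetaTags"]))  # {'Udef_Usage': 'Common'}
```

Empty cells are skipped, so a row without @Term or @RefId yields a tag with no attributes, matching the "芝麻" row in the example.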