Introduction to TBX-ISO-TML
TBX-ISO-TML is the name of the terminology markup language (TML) that was developed for use in ISO standards documents and has been adopted (with a few backwards-compatible changes) as part of NISO STS (ANSI/NISO Z39.102-2017 Standards Tag Suite) for use in a wide variety of standards documents.
TBX-ISO-TML is used to represent the terminology that is described in standards documents (in ISO standards, this is typically under Section 3). Based on XML, TBX-ISO-TML is compliant with ISO 30042 - TermBase eXchange (TBX). However, to accommodate some unique features of ISO documents, ISO 30042 was extended, and it has been further extended to work in the context of NISO STS. Conversely, because not all the data categories included in ISO 30042 are necessary for describing standards terminology, the unnecessary data categories have been removed to simplify the format.
TBX-ISO-TML adheres to the best practices of terminology management as described in the standards produced by ISO Technical Committee 37, "Terminology and other language and content resources." In particular, the following standards are normative in TBX-ISO-TML:
- ISO 704 - Terminology Work - Principles and methods
- ISO 10241-1 - Terminological entries in standards - Part 1: General requirements and examples of presentation
- ISO 30042 - TermBase eXchange
- ISO 16642 - Terminological Markup Framework
- ISO 26162 - Design, implementation and maintenance of terminology management systems
Terminological data categories (datcats) refer to the various types of terminological information that can occur in a glossary or a terminology database, such as term and definition. The datcats in TBX and in TBX-ISO-TML are taken from the ISO TC37 Data Category Registry (DCR), which is accessible at the following Web site: www.isocat.org
The Terms and Definitions section
Many standards define the key terms that they use. The purpose is to ensure that users of the standard interpret the key terms the way they were meant to be interpreted; otherwise, the standard would not be implementable.
The location of these definitions may vary from standards organization to standards organization. In most ISO standards, the terms and definitions are contained in Section 3, “Terms and definitions”. However, in ISO standards, terms can also be defined in a separate document; this approach is taken, for instance, when the terms and definitions need to be produced in multiple languages in a single document, or when it is decided to publish a collection of terms from a set of related standards in order to provide a wider view of the terminology used in a specific domain.
NISO STS provides two ways to encode terms and definitions: TBX-ISO-TML and <term-display>. This document describes TBX-ISO-TML; <term-display> is described in the NISO STS Tag Library.
DCT Format
TBX-ISO-TML uses an XML vocabulary style called the DCT format, or “Data Category as Tag name”. The DCT format is compliant with, but quite different from, the XML vocabulary style used in TBX. The latter can, by comparison, be called the DCA format, or “Data Category as Attribute value”. The following table provides two examples to show the differences.
DCT | DCA |
---|---|
<definition> | <descrip type="definition"> |
<partOfSpeech value="noun"/> | <termNote type="partOfSpeech">noun</termNote> |
Since TBX-ISO-TML is used within the larger XML format called NISO STS, a “tbx” namespace prefix is required on all elements, for instance: <tbx:definition>.
In DCT format, data categories that take free text as their content, such as definitions, notes, examples, and so forth, are implemented as elements with textual content. Data categories that have limited values, such as part of speech and normative authorization, are implemented as empty elements with attribute values.
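As a minimal sketch of the two element styles, the fragment below pairs a free-text data category with two picklist data categories. The term, the definition wording, and the enclosing <tbx:langSet> wrapper are illustrative assumptions, as is the namespace URI, which is the one commonly bound to the tbx prefix in ISO and NISO STS documents; confirm the details against the NISO STS Tag Library.

```xml
<!-- Hypothetical fragment: the term, definition, and wrapper element are illustrative only. -->
<tbx:langSet xmlns:tbx="urn:iso:std:iso:30042:ed-1" xml:lang="en">
  <!-- Free-text data category: an element with textual content -->
  <tbx:definition>device that converts one form of energy into another</tbx:definition>
  <tbx:tig>
    <tbx:term>transducer</tbx:term>
    <!-- Limited-value data categories: empty elements with a value attribute -->
    <tbx:partOfSpeech value="noun"/>
    <tbx:normativeAuthorization value="preferredTerm"/>
  </tbx:tig>
</tbx:langSet>
```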
Guidelines: Terms and Definitions section of ISO standards
The content and layout of the Terms and Definitions section of an ISO standard, or of a separate standalone ISO glossary, are governed by well-established terminology management principles which are defined by ISO Technical Committee 37. These principles are outlined in this section. For more detailed information, refer to ISO 704: Terminology Work - Principles and methods.
Each entry describes one concept
The terms and definitions are organized into numbered clauses, each of which describes one and only one concept. Each clause, which is called an “entry”, corresponds to one <tbx:termEntry>. If multiple terms are used for the same concept, these synonymous terms are all documented within the same entry, each within its own <tbx:tig> element.
If a term has multiple meanings, each meaning shall be documented in a separate entry. In this case, the entries will contain the same term. Normally, each entry would be restricted to a different subject field.
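The one-concept-per-entry rule might look like the following sketch, in which two synonyms share a single entry. The terms, the definition, the id value, the <tbx:langSet> wrapper, and the namespace URI are assumptions made for illustration, not content taken from a real standard.

```xml
<!-- Hypothetical entry: one concept, two synonymous terms. -->
<tbx:termEntry xmlns:tbx="urn:iso:std:iso:30042:ed-1" id="term_3.1">
  <tbx:langSet xml:lang="en">
    <tbx:definition>small mechanical part used in an assembly</tbx:definition>
    <!-- Each synonym is documented in its own tig within the same entry -->
    <tbx:tig>
      <tbx:term>widget</tbx:term>
    </tbx:tig>
    <tbx:tig>
      <tbx:term>gadget</tbx:term>
    </tbx:tig>
  </tbx:langSet>
</tbx:termEntry>
```

A second meaning of “widget” would go in a separate <tbx:termEntry> with its own definition, rather than in a second definition inside this entry.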
Choosing a preferred term
When more than one term is used for a concept (i.e., there are synonyms), one of the terms shall be chosen as the preferred term and shall be marked as such by associating a <tbx:normativeAuthorization> element with this term. This preferred term shall be predominantly used to refer to the concept in the body of the standard. The other terms may also carry a normativeAuthorization value, such as admittedTerm or deprecatedTerm, to indicate their usage; however, this is not required.
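A sketch of how the three usage levels could be marked on a set of synonyms follows. The terms are invented, and the value preferredTerm is an assumption modelled on the admittedTerm and deprecatedTerm values named above.

```xml
<!-- Fragment of a hypothetical entry; the tbx prefix is assumed to be declared on an ancestor element. -->
<tbx:tig>
  <tbx:term>widget</tbx:term>
  <!-- Assumed value for the single preferred term -->
  <tbx:normativeAuthorization value="preferredTerm"/>
</tbx:tig>
<tbx:tig>
  <tbx:term>gadget</tbx:term>
  <tbx:normativeAuthorization value="admittedTerm"/>
</tbx:tig>
<tbx:tig>
  <tbx:term>doohickey</tbx:term>
  <tbx:normativeAuthorization value="deprecatedTerm"/>
</tbx:tig>
```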
Avoiding ambiguous terms
It is extremely important to avoid ambiguous terms, not only in the Terms and Definitions section of a standard but throughout the body of the standard. For instance, using the same term to designate different concepts is strongly discouraged, especially when the different usages are not differentiated by subject field. Ambiguity may also result when one or more words in a multi-word term are dropped in an effort to be concise. For instance, shortening “class attribute” to simply “attribute” would result in ambiguity in a standard where attributes are associated with things other than classes.
Assigning a term type value
Synonyms are often abbreviations or spelling variants of a term, or even a representation of the concept in the form of a formula, symbol, or equation. In such cases it is necessary to identify the “type” of term by using the <tbx:termType> element.
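For example, a full form and its abbreviation could be distinguished as in the sketch below. The values fullForm and abbreviation are assumed picklist values; the permitted list should be checked in the NISO STS Tag Library.

```xml
<!-- Fragment of a hypothetical entry; the tbx prefix is assumed to be declared on an ancestor element. -->
<tbx:tig>
  <tbx:term>terminology management system</tbx:term>
  <tbx:termType value="fullForm"/>      <!-- assumed picklist value -->
</tbx:tig>
<tbx:tig>
  <tbx:term>TMS</tbx:term>
  <tbx:termType value="abbreviation"/>  <!-- assumed picklist value -->
</tbx:tig>
```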
Defining the term
Each entry requires a definition. The definition is extremely important and shall therefore be drafted very carefully, in accordance with ISO 704. Definitions shall contain only the information essential to describing the concept, and therefore shall not contain notes, examples, usage notes, or other extraneous information. Dedicated elements are available for each of these types of information. Definitions are normally expressed in a single sentence. If the definition describes a concept that is restricted to a specific area or application, use the <tbx:subjectField> element to specify the subject field.
If the definition originates from another standard, this source shall be added after the definition, in the <tbx:source> element.
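The sketch below combines a subject field, a single-sentence definition, and a source. The source reference is a placeholder, not a real citation, and the position of <tbx:subjectField> relative to the definition, like the <tbx:langSet> wrapper, is an assumption to verify against the Tag Library.

```xml
<!-- Hypothetical fragment; the tbx prefix is assumed to be declared on an ancestor element. -->
<tbx:langSet xml:lang="en">
  <tbx:subjectField>terminology management</tbx:subjectField>
  <tbx:definition>database specifically designed to store terms and information about terms</tbx:definition>
  <!-- Placeholder source: replace with the standard the definition was taken from -->
  <tbx:source>ISO 0000:0000, 3.1</tbx:source>
  <tbx:tig>
    <tbx:term>terminology database</tbx:term>
  </tbx:tig>
</tbx:langSet>
```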
Adding cross references
Cross references are used to point readers to the entries of terms that have a similar, related meaning. To add a cross reference, use <tbx:crossReference> for pointers to other entries in the same standard, and <tbx:externalCrossReference> to point to entries in other standards.
Cross references will appear after the words “Related term: ”, after the definition.
To highlight a term in a definition that is also defined in Section 3, use <tbx:entailedTerm> around the term, within the definition.
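The three referencing elements might be combined as in the following sketch. The target attribute, the id values it points to, and the external reference text are all assumptions made for illustration; consult the NISO STS Tag Library for the attributes each element actually takes.

```xml
<!-- Hypothetical fragment; the tbx prefix is assumed to be declared on an ancestor element. -->
<tbx:langSet xml:lang="en">
  <!-- entailedTerm marks a term inside the definition that has its own entry -->
  <tbx:definition>software program designed for managing a <tbx:entailedTerm target="term_3.2">terminology database</tbx:entailedTerm></tbx:definition>
  <!-- Pointer to a related entry in the same standard -->
  <tbx:crossReference target="term_3.4">data category</tbx:crossReference>
  <!-- Pointer to an entry in another standard (placeholder reference) -->
  <tbx:externalCrossReference>ISO 0000:0000, 3.5</tbx:externalCrossReference>
  <tbx:tig>
    <tbx:term>terminology management system</tbx:term>
  </tbx:tig>
</tbx:langSet>
```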
Referencing other sections of the standard
Sometimes it is useful to point the reader to another section of the standard, such as a table or figure, for additional information about the term and concept. For this purpose, use the <tbx:see> element.
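A pointer to a figure could be written roughly as follows. Both the target attribute and the id fig_1 are hypothetical; the Tag Library defines the actual attribute and content model.

```xml
<!-- Hypothetical fragment: points the reader to a figure elsewhere in the same standard. -->
<!-- The tbx prefix is assumed to be declared on an ancestor element. -->
<tbx:see target="fig_1"/>
```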
Content Types
The elements in TBX-ISO-TML that can contain text can be formatted in various ways to support presentational styles such as bold, italics, and superscript. Some elements allow no formatting at all.
No formatting
The following elements allow no formatting at all, because their content is meant to be a plain value:
Formatted text but not pointers to other terms
The following elements may have most of the structures and formats allowed in NISO STS text, but may not reference other terms in the document and thus may not contain <tbx:entailedTerm>
Glossary
Term | Definition |
---|---|
data category | A type of information stored in a terminology database. Data categories typically correspond to the fields in a terminology management system. Examples include definition and part of speech. |
picklist | A list of values from which only one value can be selected for a given data category. For instance, the partOfSpeech data category allows only one of the following values: noun, verb, adj (adjective), and adv (adverb). |
terminology database | A database specifically designed to store terms and information about terms. |
terminology management system (TMS) | A software program designed for managing a terminology database. |