ISO 24627-3:2021 Language resource management, Comprehensive annotation framework (ComAF), Part 3: Diagrammatic semantic authoring (DSA)

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.

This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee SC 4, Language resource management.

A list of all parts in the ISO 24627 series can be found on the ISO website.

Introduction

Graphs (diagrams consisting of nodes and links) have been used for decades to represent and visualize both documents (instance data) and data schemas. This document concerns graph-based representation (not visualization) of documents (not data schemas).

Graph-based representation and visualization of documents are addressed by concept maps,[15] mind maps, argument maps, and so on. Theoretical linguistics and artificial intelligence have also used graph-based content visualization associated with semantic networks, mental spaces,[10] discourse representation structures,[13] and so forth.

Graph-based visualization of data schemas (or ontologies, terminologies, metamodels, etc.) is a more usual practice. Ontologies are often visualized as graphs in which nodes are classes (and datatypes) and links are properties (relations). ISO 24156-1 specifies a UML-based visualization of concept modelling. Other metamodels are usually represented as similar diagrams, too.

This document gives a data schema of graph documents to facilitate composition and comprehension by making logical document structure explicit. It covers neither the visualization or manipulation of graphs nor the annotation of existing documents; rather, it addresses graphical/diagrammatic representation of documents for the sake of semantic authoring, i.e., for people to directly view and manipulate syntactic/semantic structures on computer displays or their future alternatives. The linearity of traditional text documents stems from the linearity of spoken language; it constrains the interaction between people and documents and makes such documents hard to read and write. DSA defines graphical/diagrammatic documents with more explicit structure than text has, in order to make them easier for people to read and write. Documents based on DSA, together with user interfaces that offer appropriate visualizations and easy operations, can enhance collaboration among people and between people and machines.

DSA mainly deals with syntactic or document structures. It addresses some fragmentary semantic structures as well, but more systematic semantics (formal mapping between documents and their meanings or logical forms) can be provided by another specification so that machines better ‘understand’ DSA-based documents and thereby better assist information sharing and consensus building among people.

Figure 1 shows a workflow involving DSA and other types of documents. The DSA-based documents in the upper half can be automatically converted (while preserving propositional content) to and from machine-understandable documents based on appropriate standards on semantic representations and annotations. It is possible to automatically generate traditional text documents from these machine-understandable documents (while preserving the propositional content, too), though the inverse conversion cannot generally be automated. Since DSA-based documents (together with some appropriate user interfaces) are easier for people to compose and interpret than text documents, people can usually touch and see DSA-based documents whereas traditional documents could be used for legacy procedures (such as patent applications) and oral presentations.

Figure 1—Document workflow involving DSA
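The one-way generation step in this workflow can be illustrated with a toy sketch. It is not taken from the standard: the node texts, the `cause` relation label, the `CONNECTIVES` table and the `linearize` function are all hypothetical, and a plain dictionary stands in for a machine-understandable document. The sketch only shows why producing linear text from explicit links is mechanical, whereas recovering the links from the resulting text would require interpreting the language.

```python
from typing import Dict, List, Tuple

# Hypothetical graph document: node id -> clause text, plus labelled links
# written as (source id, relation label, target id).
nodes: Dict[str, str] = {
    "n1": "speech is linear",
    "n2": "traditional text is linear",
    "n3": "text is hard to read and write",
}
links: List[Tuple[str, str, str]] = [
    ("n1", "cause", "n2"),
    ("n2", "cause", "n3"),
]

# Illustrative mapping from relation labels to connectives (an assumption;
# a real mapping would come from a semantic annotation standard).
CONNECTIVES: Dict[str, str] = {"cause": "Therefore,", "contrast": "However,"}


def linearize(nodes: Dict[str, str], links: List[Tuple[str, str, str]]) -> str:
    """Emit one sentence per node, following links depth-first from root nodes."""
    targets = {target for _, _, target in links}
    outgoing: Dict[str, List[Tuple[str, str]]] = {}
    for source, relation, target in links:
        outgoing.setdefault(source, []).append((relation, target))

    sentences: List[str] = []
    seen = set()

    def visit(node_id: str, connective: str = "") -> None:
        if node_id in seen:  # each node is verbalized once
            return
        seen.add(node_id)
        text = nodes[node_id]
        if connective:
            sentences.append(f"{connective} {text}.")
        else:
            sentences.append(f"{text[0].upper()}{text[1:]}.")
        for relation, target in outgoing.get(node_id, []):
            visit(target, CONNECTIVES.get(relation, ""))

    # Roots are nodes with no incoming link; a graph made only of cycles
    # would yield no text in this simplified version.
    for node_id in nodes:
        if node_id not in targets:
            visit(node_id)
    return " ".join(sentences)


text = linearize(nodes, links)
print(text)
```

The explicit `cause` links survive only as the word "Therefore," in the output, which is the kind of structural information a reader of linear text has to reconstruct.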

DSA is a minimal metamodel for ISO/TS 24617-5 (SemAF-DS), which in turn is based on ISO/IEC 15938-5/Amd.1 (MPEG-7 MDS AMD1, Linguistic description scheme). The machine-understandable documents in Figure 1 are assumed to use other standards including ISO 24615 (SynAF), ISO 24612 (LAF) and ISO 24617 (SemAF), while also incorporating insights from other relevant literature [1][8][9][10][11][12][13][14][15][16][17][18].

1 Scope

This document specifies how to represent (not visualize) documents (instance data, not data schemas) as graphs. It does not specify how to visualize or operate on document data, but it aims at making documents easier for people to compose and comprehend by allowing for various graph-based flexible user interfaces, possibly incorporating document-visualization practices (see Introduction). In this connection, this document does not specify annotations to existing documents either, but rather it specifies a schema of documents with explicit logical structures.

2 Normative references

There are no normative references in this document.

3 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

3.1

hypernode

node which is a graph segment

3.2

segment

referenceable part of a DSA-based document, which is either a graph segment or a data segment (text, image, audio, video, etc.)

3.3

semantic authoring

composition of documents while making their logical structures explicit
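The terms in 3.1 and 3.2 can be pictured with a small data-structure sketch. This preview does not reproduce the schema itself, so every class and field name below (`DataSegment`, `GraphSegment`, `Link`, `media_type`, `relation`) is a hypothetical illustration of the definitions, not the normative model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class DataSegment:
    """A referenceable piece of content (text, image, audio, video, ...)."""
    id: str
    media_type: str  # illustrative label such as "text" or "image"
    content: str     # inline text, or a locator for non-text media


@dataclass
class Link:
    """A labelled, directed link between two segments, referenced by id."""
    source: str
    target: str
    relation: str    # hypothetical relation label, e.g. "cause"


@dataclass
class GraphSegment:
    """A referenceable graph whose nodes are segments."""
    id: str
    nodes: Dict[str, "Segment"] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def add(self, segment: "Segment") -> None:
        self.nodes[segment.id] = segment

    def hypernodes(self) -> List["GraphSegment"]:
        """Nodes that are themselves graph segments (compare term 3.1)."""
        return [n for n in self.nodes.values() if isinstance(n, GraphSegment)]


# A segment is either a graph segment or a data segment (compare term 3.2).
Segment = Union[DataSegment, GraphSegment]

# Build a small document: one text node plus a nested graph used as a node.
doc = GraphSegment(id="doc")
doc.add(DataSegment("s1", "text", "Text documents are linear."))

argument = GraphSegment(id="g1")
argument.add(DataSegment("s2", "text", "Speech is linear."))
argument.add(DataSegment("s3", "text", "Writing follows speech."))
argument.links.append(Link("s2", "s3", "cause"))

doc.add(argument)  # g1 now acts as a hypernode inside doc
doc.links.append(Link("g1", "s1", "evidence"))

hypernode_ids = [h.id for h in doc.hypernodes()]
print(hypernode_ids)
```

Referring to nodes by `id` in `Link` keeps every segment referenceable, so a link can point at a whole nested graph as easily as at a single text fragment.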

Bibliography

[1]	ISO 24156-1:2014, Graphic notations for concept modelling in terminology work and its relationship with UML — Part 1: Guidelines for using UML notation in terminology work
[2]	ISO 24615-2:2018, Language resource management — Syntactic annotation framework (SynAF) — Part 2: XML serialization (Tiger vocabulary)
[3]	ISO 24612:2012, Language resource management — Linguistic annotation framework (LAF)
[4]	ISO 24617-1:2012, Language resource management — Semantic annotation framework (SemAF) — Part 1: Time and events (SemAF-Time, ISO-TimeML)
[5]	ISO 24617-2:2012, Language resource management — Semantic annotation framework (SemAF) — Part 2: Dialogue acts
[6]	ISO/TS 24617-5:2014, Language resource management, Semantic annotation framework (SemAF), Part 5: Discourse structure (SemAF-DS)
[7]	ISO/IEC 15938-5:2003/Amd.1:2004, Information technology — Multimedia content description interface — Part 5: Multimedia description schemes/ — Amendment 1: Multimedia description schemes extensions
[8]	Asher N., Lascarides A. (2003) Logics of Conversation. Cambridge University Press.
[9]	Carlson L., Marcu D., Okurowski M. E. (2003) Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory. In J. van Kuppevelt & R. Smith (eds.) Current Directions in Discourse and Dialogue, 85-112, Kluwer Academic Publishers.
[10]	Fauconnier G. (1994) Mental Spaces: Aspects of Meaning Construction in Natural Language. New York: Cambridge University Press.
[11]	Hajič J. et al. (2006) Prague Dependency Treebank 2.0. Linguistic Data Consortium, Philadelphia.
[12]	Hasida K. (1991) Reducing Complexity of Constraint-Based Grammars. In Barwise J., Gawron J. M., Plotkin G., Tutiya S. (eds.) Situation Theory and Its Applications, Volume 2, 405-424.
[13]	Kamp H. (1981) A theory of truth and semantic representation. In Groenendijk J. et al. (eds.) Formal Methods in the Study of Language. Mathematical Centre Tract 135, University of Amsterdam.
[14]	Mann W., Thompson S. (1988) Rhetorical Structure Theory: A Theory of Text Organisation. Text, 8(3), 243-281.
[15]	Novak J. D. (2010) Learning, Creating, and Using Knowledge: Concept maps as facilitative tools in schools and corporations. Journal of e-Learning and Knowledge Society, 6(3), 21-30.
[16]	Palmer M., Gildea D., Kingsbury P. (2005) The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1), 71-105.
[17]	Prasad R., Dinesh N., Lee A., Miltsakaki E., Robaldo L., Joshi A. et al. (2008) The Penn Discourse Treebank 2.0. Proceedings of the 6th International Conference on Language Resources and Evaluation.
[18]	The Penn Treebank Project, https://www.cis.upenn.edu/~treebank/
