ISO/TR 19358:2002 Ergonomics - Construction and application of tests for speech technology

Contents of this standard preview page

Foreword
Introduction
1 Scope
2 Terms and definitions

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

In exceptional circumstances, when a technical committee has collected data of a different kind from that which is normally published as an International Standard ("state of the art", for example), it may decide by a simple majority vote of its participating members to publish a Technical Report. A Technical Report is entirely informative in nature and does not have to be reviewed until the data it provides are considered to be no longer valid or useful.

Attention is drawn to the possibility that some of the elements of this Technical Report may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO/TR 19358 was prepared by Technical Committee ISO/TC 159, Ergonomics, Subcommittee SC 5, Ergonomics of the physical environment.

Introduction

This Technical Report advises on methods for determining the performance of speech-technology systems (automatic speech recognizers, text-to-speech systems and other devices that make use of the speech signal) and on selecting appropriate test procedures.

Human-to-human speech communication is not included in this Technical Report but is covered by ISO 9921.

1 Scope

This Technical Report deals with the testing and assessment of speech-related products and services, and is intended for use by specialists active in the field of speech technology, as well as purchasers and users of such systems.

Advanced users are referred to the detailed evaluation chapters of the EAGLES Handbook of Standards and Resources for Spoken Language Systems (Gibbon et al. 1997) and the EAGLES Handbook of Multimodal and Spoken Dialogue Systems. EAGLES was a research project partly sponsored by the European Community.

2 Terms and definitions

For the purposes of this Technical Report, the following terms and definitions apply.

2.1

Automatic Speech Recognition

ASR

ability of a system to accept human speech as a means of input

2.2

dialogue

interactive exchange of information between the speech system and the human speaker

2.3

dialogue management

control of the dialogue between the speech system and the human

2.4

Natural Language Processing

NLP

automatic processing of text originating from humans

2.5

objective assessment

assessment without direct involvement of human subjects during measurement, typically using prerecorded speech

2.6

performance measures

means used to assess the system performance, typically by diagnostic or relative performance methods

2.7

speaker-dependent system

need of a speech-recognition system to be trained with the speech of the specific user

2.8

speaker identification

identification of a particular speaker from a closed set of possible speakers

2.9

speaker-independent system

system not trained for a specific user but applicable for any user of a selected group (native speakers, adults, etc.)

2.10

speaker recognition

general term for technology which identifies or verifies the identity of a speaker

2.11

speaker verification

verification of the identity of a person by assessment of specific aspects of his/her speech
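
The difference between speaker identification (2.8) and speaker verification (2.11) can be illustrated with a minimal sketch. It is not part of this Technical Report: the enrolled speakers, the three-dimensional feature vectors, the cosine-similarity score and the 0.95 threshold are all hypothetical values chosen only for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical closed set of enrolled speakers with made-up voice features.
ENROLLED = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
    "carol": [0.2, 0.2, 0.9],
}


def identify(sample, enrolled=ENROLLED):
    """Identification: choose the best-matching speaker from the closed set."""
    return max(enrolled, key=lambda name: cosine(sample, enrolled[name]))


def verify(sample, claimed, enrolled=ENROLLED, threshold=0.95):
    """Verification: accept or reject a single claimed identity."""
    return cosine(sample, enrolled[claimed]) >= threshold


sample = [0.85, 0.15, 0.25]
print(identify(sample))         # best match within the closed set
print(verify(sample, "alice"))  # is the claim "I am alice" accepted?
print(verify(sample, "bob"))    # is the claim "I am bob" accepted?
```

Identification always returns one member of the closed set, whereas verification answers yes or no for one claimed identity and therefore depends on the chosen threshold.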

2.12

speaking style

speech may be isolated or continuous, read or spontaneous, or dictated

2.13

speech communication

conveying or exchanging information using speech, speaking, and hearing modalities

Note 1 to entry: Speech communication may involve brief texts, sentences, groups of words, isolated words, hums and parts of words.

2.14

speech recognizer

process in a machine capable of converting spoken language to recognized words

Note 1 to entry: This is the process by which a computer transforms an acoustic speech signal into text.

2.15

speech synthesis

generation of speech from data

2.16

speech understanding

technology that extracts the semantic contents of speech

2.17

subjective assessment

assessment with the direct involvement of human subjects during measurement

2.18

text-to-speech synthesis

generation of audible speech from a text

2.19

vocabulary

set of words used in a particular context

2.20

vocabulary size

number of words in a vocabulary of the speech recognizer
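
As a small illustration of 2.19 and 2.20, the sketch below derives a vocabulary from a handful of transcripts and reports its size. The transcripts, the lower-casing and the whitespace tokenization are assumptions made for this example; a real recognizer defines its vocabulary in its own way.

```python
def vocabulary(transcripts):
    """Return the set of distinct words used in the given transcripts."""
    words = set()
    for line in transcripts:
        # Assumption: words are separated by whitespace and case is ignored.
        words.update(line.lower().split())
    return words


# Hypothetical transcripts for a voice-dialling context.
transcripts = ["call home", "call the office", "Call Home now"]

vocab = vocabulary(transcripts)
print(sorted(vocab))  # the vocabulary (2.19)
print(len(vocab))     # the vocabulary size (2.20)
```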

Bibliography

[1]	ISO 9921:— ¹⁾ , Ergonomics — Assessment of speech communication
[2]	Cohen J., A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20, pp. 37-46, 1960
[3]	Cohen J., Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit, Psychological Bulletin, 70(4), pp. 213-220, 1968
[4]	Barnett, J., Bamberg, P., Held, M., Huerta, J., Manganaro, L. and Weiss, A. (1995), Comparative performance in large vocabulary isolated word recognition in five European languages. Proc. Eurospeech ’95 Madrid, Spain, pp. 189-192
[5]	ELRA (European Language Resources Association), ELRA/ELDA, "http://www.icp.grenet.fr/ELRA/home.html"
[6]	Gibbon, Dafydd, Inge Mertins & Roger Moore, eds. (2000). Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology and Product Evaluation. Boston, Dordrecht, London: Kluwer Academic Publishers
[7]	Gibbon, Dafydd, Roger Moore & Richard Winski, eds. (1997). Handbook of Standards and Resources for Spoken Language Systems. Berlin: Mouton de Gruyter
[8]	King, M. et al., Evaluation of Natural Language Processing Systems - EAGLES Final Report, EAG-WEG-PR.2, (October 1996), ISBN-87-90708-00-8
[9]	Krippendorff, K., Content Analysis: An Introduction to Its Methodology, Sage Publications, Beverly Hills, CA, 1980
[10]	LDC (Linguistic Data Consortium), "http://www.ldc.upenn.edu"
[11]	Leeuwen, D.A. van, and Steeneken, H.J.M., Handbook of Standards and Resources for Spoken Language Systems, Chapter: Assessment of recognition systems, pp. 381-407. Mouton de Gruyter, Berlin, New York (1997)
[12]	Leeuwen, D.A. van, and Steeneken, H.J.M., Handbook of Multimodal and Spoken Dialogue Systems, Chapter: Consumer off-the-shelf (COTS) speech technology product and service evaluation, pp. 204-239. Kluwer Academic Publishers. Berlin, New York (2000), ISBN 0-7923-7904-7
[13]	Sparck Jones, K., Galliers, J.R., Evaluating Natural Language Processing Systems, Springer-Verlag (1995), ISBN-3-540-61309-9
[14]	Steeneken, H.J.M. Digital Speech Processing, Chapter 6, Quality evaluation of speech processing systems. Kluwer Academic Publishers Boston/Dordrecht/London (1992)
[15]	Walker, M., Kamm, C. and Litman, D., Towards Developing General Models of Usability with PARADISE, Natural Language Engineering, Best Practice in Spoken Language Dialogue System Engineering, Special Issue, Volume 6, 3, October 2000
[16]	Potentials of speech and language technology systems for military use: an application and technology-oriented survey. Ed. H.J.M. Steeneken, NATO-RTO, Neuilly sur Seine, (1996)
