ISO/IEC 30137-4:2021 Information technology - Use of biometrics in video surveillance systems - Part 4: Ground truth and video annotation procedures



Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC list of patent declarations received (see patents.iec.ch).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.

This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 37, Biometrics.

A list of all parts in the ISO/IEC 30137 series can be found on the ISO and IEC websites.

Introduction

Considerable improvements in the performance of automated face recognition (AFR) have resulted in applications such as automated border controls, where facial images encoded in ePassports are compared with the face presented by a traveller at a control point. The success of these first generation AFR systems has encouraged suppliers to consider other applications, where the subject is not necessarily aware of the use of biometric comparison and where the environment for collection of images can be far from optimal. The inferior performance in such less-controlled identification applications can necessitate a greater involvement by trained personnel.

The ISO/IEC 30137 series provides guidance on the use of biometric technologies (primarily automated face recognition) in video surveillance systems (VSS) for several scenarios, including real-time operation against watchlists and post-event analysis of video data. The ISO/IEC 30137 series includes guidance on the selection and placement of cameras through to system specification, testing and maintenance. The ISO/IEC 30137 series uses the term VSS to replace the older but commonly used term, closed circuit television (CCTV).

The ISO/IEC 30137 series addresses the annotation of human beings. It is not intended to provide for annotation of non-human objects such as cars, animals, or luggage.

Records conformant to this document can be produced from video in either of the following ways:

automatically, in which software analyses video and estimates quantities defined in this document, or
manually, in which human reviewers annotate video with a goal of producing ground truth video annotation, which can be used by a receiving system (i.e. any service or device that decodes, interprets and uses standardized data).

This supports several applications, including:

People counting:
- stating of the number of people present in a location,
- stating of the number of people traversing a given point or volume,
- stating of population density (e.g. in crowds),
- measurement of crowd densities,
- performance of crowd behavioural analyses.
Automated detection and tracking:
- automated enrolment (addition) of subjects to a watchlist, exhaustively or after behavioural analysis,
- detection of subjects, and parts of subjects (e.g. faces),
- tracking of subjects through time, e.g. following motion in a single video,
- tracking of subjects appearing through camera networks, including cases where a subject is viewed simultaneously by different cameras, and cases where the subject appears sequentially before several cameras,
- re-identification, the process of connecting an identity of a subject across two or more video sequences.
Automated identification:
- law enforcement, looking for subjects of interest present on watchlists (negative identification, blacklists),
- law enforcement, applications in review of post-event VSS video from one or multiple cameras against watchlists,
- private commercial settings, looking for individuals to be given preferential service,
- identification of cooperative enrolled subjects (positive access control, whitelists).
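
As a rough illustration of the people-counting application listed above, the number of people present at a given instant can be derived from per-subject appearance intervals. This is a minimal sketch and not the encoding defined by the standard: the tuple layout `(subject_id, start_s, end_s)` and the function name `count_present` are assumptions made for this example.

```python
from typing import Iterable, Tuple

# Hypothetical record layout: (subject_id, start_time_s, end_time_s).
Interval = Tuple[str, float, float]


def count_present(intervals: Iterable[Interval], t: float) -> int:
    """Count subjects whose appearance interval covers time t (bounds inclusive)."""
    return sum(1 for _, start, end in intervals if start <= t <= end)


# Three annotated subjects with made-up appearance intervals (seconds).
tracks = [("s1", 0.0, 4.0), ("s2", 2.5, 9.0), ("s3", 5.0, 6.0)]
```

Treating both bounds as inclusive is a design choice of the sketch; an annotation policy could equally treat the end time as exclusive.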

This document includes annotation of the following information:

Imaging type: single camera, sequential cameras, stereo cameras, combination, camera capture spectrum.
When the subject appears in the video (start time) and when they leave (end time).
- Brief description of the subject (what can be seen in the video?).
Where and when the face of the subject appears.
- Brief description of the face (pose, orientation, expression, occlusion).
Intermediate tracking points between the start and end times, for subject and face.
Absolute description of the subject:
- estimated age, sex,
- hair and eye colour,
- estimated height and corpulence,
- clothing and clothing colour,
- glasses/hat,
- best subject image or best subject face image.
Subject interactions with other subjects and groups.
Subject interactions with other video elements (bag, car, etc.).
Known identity of the subject.
The presence of other subjects who are not annotated.
Regions of interest, outside of which an algorithm or receiving system would not operate.
Absence: Where items of interest, including subjects, are known to be absent.
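
To make the list above more concrete, the following sketch shows one way a per-subject annotation could be held in memory: a start and end time, a brief description, an optional known identity, and intermediate tracking points for the subject and the face. Every class and field name here is hypothetical, and the box layout `(x, y, width, height)` is an assumption; the standard's own encoding is not reproduced.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TrackPoint:
    """One tracking sample (hypothetical layout)."""
    time_s: float
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class SubjectAnnotation:
    """Illustrative per-subject record; names are not taken from the standard."""
    subject_id: str
    start_s: float                      # when the subject appears
    end_s: float                        # when the subject leaves
    description: str = ""
    known_identity: Optional[str] = None
    subject_track: List[TrackPoint] = field(default_factory=list)
    face_track: List[TrackPoint] = field(default_factory=list)

    def duration_s(self) -> float:
        return self.end_s - self.start_s

    def is_consistent(self) -> bool:
        """Start does not follow end, and every track point lies in [start, end]."""
        points = self.subject_track + self.face_track
        return self.start_s <= self.end_s and all(
            self.start_s <= p.time_s <= self.end_s for p in points
        )


ann = SubjectAnnotation(
    subject_id="s1",
    start_s=12.0,
    end_s=20.5,
    description="adult, red coat, walking left to right",
    subject_track=[TrackPoint(12.0, (40, 80, 60, 180)),
                   TrackPoint(16.0, (200, 82, 62, 181))],
    face_track=[TrackPoint(14.0, (120, 85, 24, 24))],
)
```

A consistency check of this kind is one simple way to catch tracking points that fall outside the subject's annotated appearance interval.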

Standardized annotation supports evaluation, research and development, and operational deployment.

1 Scope

This document establishes requirements for the annotation of humans, human faces and other body parts, and arbitrary objects appearing in imagery. It specifies the following:

metadata to be inserted in a video stream;
encoding of full and partial spatial and temporal ground truth information for:
- objects present in a video, and
- objects absent in a video;
procedures for different annotation of known and unknown subjects.

This document does not specify:

encoding of video data.

2 Normative references

There are no normative references in this document.

3 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

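ISO and IEC maintain terminological databases for use in standardization at the following addresses:

ISO Online browsing platform: available at www.iso.org/obp
IEC Electropedia: available at www.electropedia.org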

3.1

annotation

process of generating annotation data from imagery

3.2

annotation data

metadata associated with a subject traversing the field of view of a specific VSS camera

Note 1 to entry: An annotator preparing instances in accordance with this document should document the criteria under which a subject annotation was made. For example, it can be policy to not annotate faces for which interocular distance is below 12 pixels.
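
The example policy in Note 1 can be sketched as a simple filter. The 12-pixel value comes from the note; the function names, the eye-centre inputs and the choice to annotate a face at exactly 12 pixels (since the note excludes only distances below 12) are assumptions of this sketch.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

MIN_INTEROCULAR_PX = 12.0  # example policy value from Note 1


def interocular_distance(left_eye: Point, right_eye: Point) -> float:
    """Euclidean distance in pixels between the two eye centres."""
    return math.dist(left_eye, right_eye)


def should_annotate_face(left_eye: Point, right_eye: Point,
                         min_px: float = MIN_INTEROCULAR_PX) -> bool:
    """Apply the policy: skip faces whose interocular distance is below min_px."""
    return interocular_distance(left_eye, right_eye) >= min_px
```

Recording the threshold alongside the annotations documents the criteria under which they were made, as the note recommends.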

Note 2 to entry: If annotations are made by following a strict, tightly constrained or narrow set of criteria, then a detection, tracking or recognition algorithm is expected to be more accurate than if more permissive or general criteria had been used.

Note 3 to entry: An evaluation of a tracking algorithm, for example, might exclude subjects that traverse in a non-conformant way. This could include factors such as the subject’s direction of travel, obscuration by other people or objects, operational functionalities of the camera (such as correct focus) or environmental conditions (e.g. operation during night or day).

3.3

bounding box

rectangular region enclosing annotated object

Note 1 to entry: The major and minor axes of the rectangle are parallel to the edges of the images. For rotated boxes, the polygon annotation is to be used.
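
The note above implies that a rotated rectangle has to be expressed as a polygon. The sketch below converts a rotated rectangle into its four corner points and checks whether the result is still axis-aligned. The centre/size/angle parameterisation and both function names are assumptions for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotated_box_to_polygon(cx: float, cy: float, width: float, height: float,
                           angle_deg: float) -> List[Point]:
    """Return the four corners of a rectangle rotated about its centre."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Corner offsets from the centre before rotation.
    half = [(-width / 2, -height / 2), (width / 2, -height / 2),
            (width / 2, height / 2), (-width / 2, height / 2)]
    return [(cx + x * cos_a - y * sin_a, cy + x * sin_a + y * cos_a)
            for x, y in half]


def is_axis_aligned(polygon: List[Point], tol: float = 1e-9) -> bool:
    """True if every edge of the polygon is horizontal or vertical."""
    n = len(polygon)
    return all(
        abs(polygon[i][0] - polygon[(i + 1) % n][0]) <= tol
        or abs(polygon[i][1] - polygon[(i + 1) % n][1]) <= tol
        for i in range(n)
    )
```

With an angle of zero the corners describe an ordinary bounding box; any other orientation that is not a multiple of 90 degrees produces a polygon annotation.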

3.4

bounding polygon

arbitrary region enclosing annotated object
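
To relate this term to the bounding box of 3.3, the sketch below derives the smallest axis-aligned box around a polygon and compares the two areas. It assumes a simple (non-self-intersecting) polygon given as a list of `(x, y)` vertices; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def enclosing_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the smallest axis-aligned box."""
    if not polygon:
        raise ValueError("polygon must have at least one vertex")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(polygon: List[Point]) -> float:
    """Area by the shoelace formula; valid for simple polygons only."""
    n = len(polygon)
    twice = sum(
        polygon[i][0] * polygon[(i + 1) % n][1]
        - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )
    return abs(twice) / 2.0


poly = [(10, 10), (30, 5), (40, 25), (15, 30)]
x_min, y_min, x_max, y_max = enclosing_box(poly)
box_area = (x_max - x_min) * (y_max - y_min)
```

The polygon never covers more area than its enclosing box, which is why a polygon can describe an object more tightly than a rectangle.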

3.5

video surveillance system

system consisting of camera equipment, monitoring and associated equipment for transmission and controlling purposes, which can be necessary for the surveillance of a protected area

3.6

random access

ability to access arbitrary parts of a media item

3.7

recognition

process of assigning a biometric identifier to a subject

3.8

identification

process of determining a subject’s identity by comparing imagery of a biometric mode against a database formed from imagery of individuals

Note 1 to entry: This generally does not include assigning an identifier when the target subject is not found in the database.
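
The behaviour described in the note can be sketched as follows: identification returns a database identity when a comparison is strong enough, and otherwise returns nothing rather than assigning a new identifier. The similarity scores, the threshold value of 0.8 and the function name `identify` are all assumptions of this example and are not taken from the standard.

```python
from typing import Dict, Optional

# Hypothetical threshold on a similarity score in [0, 1].
MATCH_THRESHOLD = 0.8


def identify(scores: Dict[str, float],
             threshold: float = MATCH_THRESHOLD) -> Optional[str]:
    """Return the best-matching database identity, or None if no score
    reaches the threshold (no identifier is assigned in that case)."""
    if not scores:
        return None
    best_id = max(scores, key=scores.get)
    return best_id if scores[best_id] >= threshold else None
```

Here `scores` stands in for the result of comparing a probe image against each database entry; how those scores are produced is outside the scope of the sketch.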

