ISO/IEC 15938-15:2019 Information technology - Multimedia content description interface - Part 15: Compact descriptors for video analysis

3 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

3.1

image descriptor

descriptor extracted from a single key frame (3.6) sampled from the input video (3.8), which contains a global descriptor (3.2), local feature descriptors (3.3) and a deep feature descriptor (3.4)

Note 1 to entry: Image descriptors are encoded as described in Clause 6.

3.2

global descriptor

aggregation of local feature descriptors into a compact representation of the image (3.5)

Note 1 to entry: The aggregation is as described in subclause 6.1.2.
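As an informal illustration of what "aggregation into a compact representation" means, the Python sketch below uses VLAD-style residual aggregation. This is a swapped-in generic technique, not the normative procedure of subclause 6.1.2. The function name `vlad_aggregate` and the codebook `centroids` are hypothetical; in practice such a codebook would be learned offline.

```python
import math
from typing import List, Sequence


def vlad_aggregate(local_descs: Sequence[Sequence[float]],
                   centroids: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate a variable number of local descriptors into one vector.

    Hypothetical VLAD-style sketch: each local descriptor is assigned to its
    nearest centroid, residuals (descriptor - centroid) are summed per
    centroid, then concatenated and L2-normalised.
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in local_descs:
        # Nearest centroid by squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((x[i] - centroids[c][i]) ** 2 for i in range(d)))
        for i in range(d):
            acc[j][i] += x[i] - centroids[j][i]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    # An all-zero residual (e.g. no local descriptors) is returned unnormalised.
    return [v / norm for v in flat] if norm > 0 else flat
```

However many local feature descriptors an image yields, the output length is fixed at `len(centroids)` times the descriptor dimension. That fixed length is what makes the aggregate usable as a compact per-image descriptor.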

3.3

local feature descriptor

descriptor of a local region, extracted around an interest point (a point in an image (3.5) showing detection stability under local and global perturbations in the image domain, including perspective transformations, changes in image scale, and illumination variations)

Note 1 to entry: The extraction is as described in subclause 6.1.3.
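The sketch below shows, under simplifying assumptions, a local region descriptor computed around a given interest point: a mean-subtracted, L2-normalised intensity patch. It is a deliberately simple stand-in and not the detector or descriptor of subclause 6.1.3. The name `patch_descriptor`, the radius `r` and the list-of-rows greyscale image layout are hypothetical. The example illustrates one of the stabilities named in the definition: the result is unchanged under a gain-and-offset illumination change.

```python
import math
from typing import List, Sequence


def patch_descriptor(gray: Sequence[Sequence[float]], x: int, y: int,
                     r: int = 2) -> List[float]:
    """Describe the (2r+1) x (2r+1) neighbourhood centred on interest point (x, y).

    Subtracting the patch mean and dividing by the L2 norm makes the result
    invariant to illumination changes of the form I' = a*I + b with a > 0.
    """
    h, w = len(gray), len(gray[0])
    if not (r <= x < w - r and r <= y < h - r):
        raise ValueError("interest point too close to the image border")
    patch = [gray[yy][xx]
             for yy in range(y - r, y + r + 1)
             for xx in range(x - r, x + r + 1)]
    mean = sum(patch) / len(patch)
    centred = [p - mean for p in patch]
    norm = math.sqrt(sum(c * c for c in centred))
    # A perfectly flat patch has no structure to normalise.
    return [c / norm for c in centred] if norm > 0 else centred
```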

3.4

deep feature descriptor

feature descriptor extracted from a layer of a trained convolutional neural network

Note 1 to entry: The extraction is as described in subclause 6.1.4.
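The following is a minimal sketch of how a layer output becomes a descriptor. It assumes the activations of some convolutional layer are already available as a channels x rows x columns array. The network, layer and pooling specified in subclause 6.1.4 are not reproduced here. Global max pooling over the spatial positions, used below, is an assumed choice, and `pool_deep_feature` is a hypothetical name.

```python
import math
from typing import List, Sequence

FeatureMap = Sequence[Sequence[Sequence[float]]]  # indexed [channel][row][col]


def pool_deep_feature(fmap: FeatureMap) -> List[float]:
    """Global max-pool a C x H x W activation map to a C-dim, L2-normalised vector."""
    # One value per channel: the strongest response anywhere in the map.
    pooled = [max(max(row) for row in channel) for channel in fmap]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled
```

Because each channel keeps only its maximum response, the vector length depends on the number of channels, not on the frame resolution.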

3.5

image

input key frame (3.6) to the image descriptor (3.1) encoder

Note 1 to entry: The image is as described in Clause 6.

3.6

key frame

frame extracted from the input video segment (3.7) by the colour histogram frame difference process

Note 1 to entry: The extraction is as described in subclause 6.2.
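Subclause 6.2 gives the normative key frame extraction. The Python sketch below only illustrates the general idea of a colour histogram frame difference process. All of the following are hypothetical assumptions:

- frames are flat lists of 8-bit RGB tuples;
- the histogram is a joint RGB histogram with `bins` levels per channel;
- the difference is the L1 distance between normalised histograms;
- a frame becomes a key frame when it differs from the most recent key frame by more than `threshold`.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]   # one 8-bit (R, G, B) sample
Frame = Sequence[RGB]        # a non-empty frame as a flat list of samples


def colour_histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Joint RGB histogram with bins**3 cells, normalised to sum to 1."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Quantise each 0..255 channel value to 0..bins-1.
        qr, qg, qb = (r * bins) // 256, (g * bins) // 256, (b * bins) // 256
        hist[(qr * bins + qg) * bins + qb] += 1.0
    return [h / len(frame) for h in hist]


def select_key_frames(frames: Sequence[Frame], threshold: float = 0.5,
                      bins: int = 4) -> List[int]:
    """Indices of frames whose histogram L1 distance (range 0..2) from the
    last selected key frame exceeds `threshold`; frame 0 is always selected."""
    if not frames:
        return []
    keys = [0]
    ref = colour_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        hist = colour_histogram(frames[i], bins)
        if sum(abs(a - b) for a, b in zip(hist, ref)) > threshold:
            keys.append(i)
            ref = hist
    return keys
```

Comparing against the last key frame rather than the immediately preceding frame is a design choice of this sketch. It also catches gradual changes, such as slow fades, that never produce a large frame-to-frame difference.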

3.7

input video segment

time range (temporal segment) of a video from which a descriptor is extracted

3.8

input video

image sequence, containing one or more input video segments (3.7), that is processed by the system as input to the CDVA extraction process

Note 1 to entry: Input video is as described in Clause 6.

3.9

segment descriptor

descriptor extracted from the sampled key frames (3.6) of an input video segment (3.7)

Note 1 to entry: Segment descriptors are encoded as described in Clause 6. They are constructed from the image descriptors (3.1) of the sampled key frames of the input video segment.
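The relationship between segment descriptors and image descriptors can be modelled in memory as below. This is a hypothetical data model, not the coded bitstream syntax of Clause 6. The class names, the field names and the half-open frame range are assumptions, and the descriptor payloads are placeholder float lists.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ImageDescriptor:
    """Per-key-frame descriptor (3.1); payload layouts are placeholders."""
    frame_index: int
    global_desc: List[float]
    local_descs: List[List[float]] = field(default_factory=list)
    deep_desc: List[float] = field(default_factory=list)


@dataclass
class SegmentDescriptor:
    """Descriptor of one input video segment (3.9), built from its key frames."""
    start_frame: int
    end_frame: int  # exclusive
    images: List[ImageDescriptor]


def build_segment_descriptor(start: int, end: int,
                             key_frame_descs: Sequence[ImageDescriptor]) -> SegmentDescriptor:
    """Collect, in temporal order, the image descriptors of key frames in [start, end)."""
    inside = [d for d in key_frame_descs if start <= d.frame_index < end]
    inside.sort(key=lambda d: d.frame_index)
    return SegmentDescriptor(start, end, inside)
```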

3.10

representative frame

frame of an input video segment (3.7) for which an uncompressed descriptor is represented and which is used as the basis for differential encoding
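The sketch below shows why a representative frame is useful as a basis for differential encoding. It assumes binary (bit-string) descriptors stored as Python integers. The representative frame's descriptor is kept uncompressed, and every frame is coded as its XOR residual against it. The XOR scheme and the names `diff_encode` and `diff_decode` are illustrative assumptions; the differential coding actually used is specified in Clause 6 of the standard.

```python
from typing import List, Sequence, Tuple


def diff_encode(descs: Sequence[int], rep_index: int) -> Tuple[int, List[int]]:
    """Return the representative descriptor unchanged, plus every frame's
    descriptor as an XOR residual against it (the representative's own is 0)."""
    rep = descs[rep_index]
    return rep, [d ^ rep for d in descs]


def diff_decode(rep: int, residuals: Sequence[int]) -> List[int]:
    """Invert diff_encode: XOR each residual with the representative descriptor."""
    return [r ^ rep for r in residuals]
```

When the key frames in a segment look alike, the residuals are mostly zero bits, which typically makes them cheaper to entropy code than the raw descriptors.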

3.11

pixel

indexable element on an integer grid of the original image or the converted image, comprising spatial coordinates, a luminance value and, optionally, chrominance values
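The following is a minimal data model for this definition. It assumes the converted image is obtained from RGB input using the full-range ITU-R BT.601 (JFIF) luma and colour-difference equations. That conversion, the `Pixel` class and `from_rgb` are assumptions for illustration; the definition itself does not fix a colour space.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Pixel:
    """Element on the integer grid: coordinates, luma, optional chroma."""
    x: int
    y: int
    luma: float
    chroma: Optional[Tuple[float, float]] = None  # (Cb, Cr) when present


def from_rgb(x: int, y: int, r: int, g: int, b: int,
             keep_chroma: bool = False) -> Pixel:
    """Convert one 8-bit RGB sample using BT.601 weights (assumed conversion)."""
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    chroma = None
    if keep_chroma:
        # Full-range (JFIF) colour-difference signals centred on 128.
        cb = 128 + (b - luma) / 1.772
        cr = 128 + (r - luma) / 1.402
        chroma = (cb, cr)
    return Pixel(x, y, luma, chroma)
```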
