ISO/IEC 11172-3:1993 Information technology — Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s — Part 3: Audio



Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.

International Standard ISO/IEC 11172-3 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Sub-Committee SC 29, Coded representation of audio, picture, multimedia and hypermedia information.

ISO/IEC 11172 consists of the following parts, under the general title Information technology — Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s:

Part 1: Systems
Part 2: Video
Part 3: Audio
Part 4: Compliance testing

Annexes A and B form an integral part of this part of ISO/IEC 11172. Annexes C, D, E, F, G and H are for information only.

Introduction

Note: Readers interested in an overview of MPEG Audio should read this Introduction and then proceed to annex A (Diagrams) and annex C (The encoding process) before reading the normative clauses 1 and 2.

To aid in the understanding of the specification of the stored compressed bitstream and its decoding, a sequence of encoding, storage and decoding is described.

0.1 Encoding

The encoder processes the digital audio signal and produces the compressed bitstream for storage. The encoder algorithm is not standardized, and may use various means for encoding such as estimation of the auditory masking threshold, quantization, and scaling. However, the encoder output must be such that a decoder conforming to the specifications of clause 2.4 will produce audio suitable for the intended application.

Figure 1—Sketch of the basic structure of an encoder

Figure 1 illustrates the basic structure of an audio encoder. Input audio samples are fed into the encoder. The mapping creates a filtered and subsampled representation of the input audio stream. The mapped samples may be called either subband samples (as in Layer I or II, see below) or transformed subband samples (as in Layer III). A psychoacoustic model creates a set of data to control the quantizer and coding. These data differ depending on the actual coder implementation. One possibility is to use an estimate of the masking threshold for this quantizer control. The quantizer and coding block creates a set of coding symbols from the mapped input samples. Again, this block can depend on the encoding system. The 'frame packing' block assembles the actual bitstream from the output data of the other blocks, and adds other information (e.g. error correction) if necessary.
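The data flow through the four blocks of Figure 1 can be sketched as follows. This is a minimal illustration only: the encoder algorithm is not standardized, so every function name here is hypothetical, the toy stages are placeholders rather than real MPEG algorithms, and feeding the psychoacoustic model directly from the input samples is an assumption of this sketch.

```python
from typing import Callable, List, Sequence

Samples = Sequence[float]


def encode_frame(
    pcm: Samples,
    mapping: Callable[[Samples], List[float]],
    psychoacoustic_model: Callable[[Samples], List[float]],
    quantize_and_code: Callable[[List[float], List[float]], List[int]],
    pack_frame: Callable[[List[int]], bytes],
) -> bytes:
    """Wire the four encoder blocks together for one block of input samples."""
    mapped = mapping(pcm)                  # filtered and subsampled representation
    control = psychoacoustic_model(pcm)    # data that steers the quantizer
    symbols = quantize_and_code(mapped, control)
    return pack_frame(symbols)             # assemble the bitstream


# Toy stand-ins so the sketch runs. None of them is a real MPEG algorithm.
def toy_mapping(pcm: Samples) -> List[float]:
    return [float(x) for x in pcm]         # identity instead of a filterbank


def toy_model(pcm: Samples) -> List[float]:
    return [0.5] * len(pcm)                # fixed quantizer step per sample


def toy_quantizer(mapped: List[float], control: List[float]) -> List[int]:
    # Divide by the step size, round, and keep each symbol within one byte.
    return [int(round(m / step)) & 0xFF for m, step in zip(mapped, control)]


def toy_packer(symbols: List[int]) -> bytes:
    return bytes(symbols)


frame = encode_frame(
    [0.0, 0.5, 1.0, 1.5], toy_mapping, toy_model, toy_quantizer, toy_packer
)
```

Passing the stages in as callables mirrors the point made in the text: the blocks are fixed in role, but their contents depend on the actual coder implementation.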

Four different modes are possible: single channel, dual channel (two independent audio signals coded within one bitstream), stereo (left and right signals of a stereo pair coded within one bitstream), and Joint Stereo (left and right signals of a stereo pair coded within one bitstream with the stereo irrelevancy and redundancy exploited).
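As a small illustration, the four modes can be modelled as an enumeration. The class and helper names are hypothetical and chosen for this sketch; the bitstream encoding of the mode is defined in the normative clauses and is not reproduced here.

```python
from enum import Enum


class Mode(Enum):
    """The four modes listed in the Introduction (names are illustrative)."""
    SINGLE_CHANNEL = "single channel"
    DUAL_CHANNEL = "dual channel"
    STEREO = "stereo"
    JOINT_STEREO = "joint stereo"


def channel_count(mode: Mode) -> int:
    """Number of audio signals carried in one bitstream for this mode."""
    return 1 if mode is Mode.SINGLE_CHANNEL else 2


def is_stereo_pair(mode: Mode) -> bool:
    """True when the two signals are the left and right of a stereo pair.

    Dual channel carries two independent signals, so it is excluded.
    """
    return mode in (Mode.STEREO, Mode.JOINT_STEREO)
```

The helpers make the distinction in the text explicit: dual channel and stereo both carry two signals, but only the stereo modes treat them as a related pair.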

0.2 Layers

Depending on the application, different layers of the coding system with increasing encoder complexity and performance can be used. An ISO/IEC 11172-3 Audio Layer N decoder is able to decode bitstream data which has been encoded in Layer N and all layers below N.
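The compatibility rule stated above reduces to a single comparison. The function name below is hypothetical, and layers are represented as the integers 1 to 3 for convenience.

```python
def can_decode(decoder_layer: int, stream_layer: int) -> bool:
    """Return True if a Layer `decoder_layer` decoder can decode a bitstream
    encoded in Layer `stream_layer` (Layer N decodes Layer N and all below)."""
    for layer in (decoder_layer, stream_layer):
        if layer not in (1, 2, 3):
            raise ValueError(f"layer must be 1, 2 or 3, got {layer}")
    return stream_layer <= decoder_layer
```

For example, a Layer II decoder accepts Layer I and Layer II bitstreams but not Layer III.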

Layer I

This layer contains the basic mapping of the digital audio input into 32 subbands, fixed segmentation to format the data into blocks, a psychoacoustic model to determine the adaptive bit allocation, and quantization using block companding and formatting. The theoretical minimum encoding/decoding delay for Layer I is about 19 ms.

Layer II

This layer provides additional coding of bit allocation, scalefactors and samples. Different framing is used. The theoretical minimum encoding/decoding delay for Layer II is about 35 ms.

Layer III

This layer introduces increased frequency resolution based on a hybrid filterbank. It adds a different (nonuniform) quantizer, adaptive segmentation and entropy coding of the quantized values. The theoretical minimum encoding/decoding delay for Layer III is about 59 ms.

Joint Stereo coding can be added as an additional feature to any of the layers.
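The delay figures quoted for the three layers can be collected in a small lookup. The selection helper is a hypothetical illustration, not part of the standard, and the values are the approximate theoretical minimums stated above, so delays in a real implementation may be larger.

```python
from typing import Optional

# Approximate theoretical minimum encoding/decoding delay per layer, in ms,
# as quoted in the layer descriptions.
LAYER_MIN_DELAY_MS = {1: 19, 2: 35, 3: 59}


def highest_layer_within(delay_budget_ms: float) -> Optional[int]:
    """Return the highest layer whose theoretical minimum delay fits the
    budget, or None if even Layer I does not fit."""
    candidates = [
        layer
        for layer, delay in LAYER_MIN_DELAY_MS.items()
        if delay <= delay_budget_ms
    ]
    return max(candidates) if candidates else None
```

With a budget of 40 ms, for instance, Layers I and II qualify and the helper returns 2.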

0.3 Storage

Various streams of encoded video, encoded audio, synchronization data, systems data and auxiliary data may be stored together on a storage medium. Editing of the audio will be easier if the edit point is constrained to coincide with an addressable point.

Access to storage may involve remote access over a communication system. Access is assumed to be controlled by a functional unit other than the audio decoder itself. This control unit accepts user commands, reads and interprets database structure information, reads the stored information from the media, demultiplexes non-audio information, and passes the stored audio bitstream to the audio decoder at the required rate.

0.4 Decoding

The decoder accepts the compressed audio bitstream in the syntax defined in 2.4.1, decodes the data elements according to 2.4.2, and uses the information to produce digital audio output according to 2.4.3.

Figure 2—Sketch of the basic structure of a decoder

Figure 2 illustrates the basic structure of an audio decoder. Bitstream data are fed into the decoder. The bitstream unpacking and decoding block performs error detection if error checking is applied in the encoder (see 2.4.2.4). The bitstream data are unpacked to recover the various pieces of information. The reconstruction block reconstructs the quantized version of the set of mapped samples. The inverse mapping transforms these mapped samples back into uniform PCM.
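The decoder blocks of Figure 2 can be sketched in the same spirit. All names are hypothetical and the toy stages are placeholders: the actual syntax, decoding, and output process are those defined in 2.4.1 to 2.4.3, and the error check here is only an optional hook, not the mechanism of 2.4.2.4.

```python
from typing import Callable, List, Optional, Tuple


def decode_frame(
    frame: bytes,
    unpack: Callable[[bytes], Tuple[List[int], float]],
    reconstruct: Callable[[List[int], float], List[float]],
    inverse_mapping: Callable[[List[float]], List[float]],
    error_check: Optional[Callable[[bytes], bool]] = None,
) -> List[float]:
    """Run one frame through unpacking, reconstruction, and inverse mapping."""
    # Error detection applies only if the encoder added error-check data.
    if error_check is not None and not error_check(frame):
        raise ValueError("error detected in frame")
    symbols, step = unpack(frame)          # recover the pieces of information
    mapped = reconstruct(symbols, step)    # quantized version of mapped samples
    return inverse_mapping(mapped)         # back to uniform PCM


# Toy stand-ins so the sketch runs. None of them is a real MPEG algorithm.
def toy_unpack(frame: bytes) -> Tuple[List[int], float]:
    return list(frame), 0.5                # one symbol per byte, fixed step size


def toy_reconstruct(symbols: List[int], step: float) -> List[float]:
    return [s * step for s in symbols]


def toy_inverse_mapping(mapped: List[float]) -> List[float]:
    return list(mapped)                    # identity instead of a synthesis filterbank


pcm = decode_frame(bytes([0, 1, 2, 3]), toy_unpack, toy_reconstruct, toy_inverse_mapping)
```

Keeping the error check as an optional argument reflects the text: detection happens only when error checking was applied in the encoder.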
