ISO/IEC 13818-3:1998 Information technology - Generic coding of moving pictures and associated audio information - Part 3: Audio



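Introduction

ISO/IEC 13818 was prepared by SC29/WG11, also known as MPEG (Moving Pictures Expert Group). MPEG was formed in 1988 to establish a standard for the coded representation of moving pictures and associated audio stored on digital storage media.

ISO/IEC 13818 is published in three parts. Part 1 (systems) specifies the system coding layer of the standard. It defines a multiplexed structure for combining audio and video data and means of representing the timing information needed to replay synchronised sequences in real time. Part 2 (video) specifies the coded representation of video data and the decoding process required to reconstruct pictures. Part 3 (audio) specifies the coded representation of audio data and the decoding process required to decode audio signals.

The technical changes in this second edition compared to the first publication of ISO/IEC 13818-3 (1995) are:

1. In the first publication, certain combinations of dynamic crosstalk and prediction were not prohibited, although they could not be implemented in practice. In this second edition, these combinations are explicitly prohibited.
2. In the first publication, a low-pass filter was to be applied to the monophonic surround signal in matrix mode 2 (analogue surround mode). This filter is omitted in this edition, greatly simplifying the decoder.
3. The description of the syntax of the LFE channel was ambiguous. This description has been clarified.

In addition to these technical changes, many editorial changes have been made, improving readability and clarity.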

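0.1 Extension of ISO/IEC 11172-3 Audio Coding to Lower Sampling Frequencies

In order to achieve better audio quality at very low bit rates (less than 64 kbit/s per audio channel), in particular when compared with ITU-T (formerly CCITT) Recommendation G.722 performance, three additional sampling frequencies are provided for ISO/IEC 11172-3 Layers I, II and III. The additional sampling frequencies (Fs) are 16 kHz, 22.05 kHz and 24 kHz. This allows corresponding audio bandwidths of approximately 7.5 kHz, 10.3 kHz and 11.25 kHz. The syntax, semantics, and coding techniques of ISO/IEC 11172-3 are maintained except for a new definition of the sampling frequency field, the bitrate index field, and the bit allocation tables. These new definitions are valid if the ID bit in the ISO/IEC 11172-3 header equals zero. To obtain the best audio performance, the parameters of the psychoacoustic model used in the encoder have to be changed accordingly.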

With these sampling frequencies, the duration of the audio frame corresponds to:

Layer	16 kHz	22.05 kHz	24 kHz
I	24 ms	17.41... ms	16 ms
II	72 ms	52.24... ms	48 ms
III	36 ms	26.12... ms	24 ms

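0.2 Low bitrate coding of multichannel audio

0.2.1 Universal multichannel audio system

A standard on low bit rate coding for mono or stereo audio signals was established by MPEG-1 Audio in ISO/IEC 11172-3. This standard is applicable to the carriage of high quality digital audio signals, with or without associated picture information, on storage media or transmission channels with limited capacity.

The ISO/IEC 11172-3 audio coding standard can be used together with both MPEG-1 and MPEG-2 Video as long as only two-channel stereo is required. MPEG-2 Audio (ISO/IEC 13818-3) provides the extension up to 3/2 multichannel audio and an optional low frequency enhancement channel (LFE).

This part of ISO/IEC 13818 describes an audio subband coding system called ISO/MPEG-Audio Multichannel, which can be used to transfer high quality digital multichannel and/or multilingual audio information on storage media or transmission channels with limited capacity. One of the basic features is the backwards compatibility to ISO/IEC 11172-3 coded mono, stereo or dual channel audio programmes. It is designed for use in different applications as considered by the ISO/MPEG audio group and the specialist groups TG 10/1, 10/2 and 10/3 of the ITU-R (previously CCIR).

Multichannel audio systems provide enhanced stereo performance compared to conventional two channel audio systems. It is recognised that improved presentation performance is desirable not only for applications with accompanying picture but also for audio-only applications. A universal and compatible multichannel audio system seems to be very attractive to the manufacturer, producer and consumer. Such a system is applicable to satellite or terrestrial television broadcasting, digital audio broadcasting (terrestrial and satellite), as well as other non-broadcasting media, e.g.:

CATV	Cable TV Distribution
CDAD	Cable Digital Audio Distribution
DAB	Digital Audio Broadcast
DVD	Digital Versatile Disc
ENG	Electronic News Gathering (including Satellite News Gathering)
HDTV	High Definition Television
IPC	Interpersonal Communications (video conference, videophone, etc.)
ISM	Interactive Storage Media (optical disks, etc.)
NDB	Network Database Services (via ATM, etc.)
DSM	Digital Storage Media (digital VTR, etc.)
EC	Electronic Cinema
HTT	Home Television Theatre
ISDN	Integrated Services Digital Network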

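0.2.2 Representation of multichannel audio

0.2.2.1 The 3/2-stereo plus LFE format

Regarding stereophonic presentation, specialist groups of ITU-R, SMPTE, and EBU recommend the use of an additional centre loudspeaker channel C and two surround loudspeaker channels LS and RS, augmenting the front left and right loudspeaker channels L and R. This reference audio format is referred to as "3/2-stereo" (3 front / 2 surround loudspeaker channels) and requires the transmission of five appropriately formatted audio signals.

For audio accompanying picture applications (e.g. HDTV), the three front loudspeaker channels ensure sufficient directional stability and clarity of the picture related frontal images, according to the common practice in the cinema. The dominant benefit is the "stable centre", which is guaranteed at any location of the listener and important for most of the dialogue.

Additionally, for audio-only applications, the 3/2-stereo format has been found to be an improvement over two-channel stereophony. The addition of one pair of surround loudspeaker channels allows improved realism of auditory ambience.

A low frequency enhancement channel (in this part of ISO/IEC 13818 called LFE channel) can, optionally, be added to any of these configurations. The purpose of this channel is to enable listeners to extend the low frequency content of the reproduced programme in terms of both frequency and level. In this way it is the same as the LFE channel proposed by the film industry for their digital sound systems.

The LFE channel should not be used for the entire low frequency content of the multichannel sound presentation. The LFE channel is optional at the receiver, and thus should only carry low frequency sound effects, which may have a high level. The LFE channel is not included in any dematrixing operation in the decoder. The sampling frequency of the LFE channel corresponds to the sampling frequency of the main channels, divided by a factor of 96. This provides 12 LFE samples within one audio frame. The LFE channel is capable of handling signals in the range from 15 Hz to 120 Hz.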

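0.2.2.2 Compatibility

Extension from 2/0-stereo towards multichannel sound.

As a result of the widespread use of conventional two-channel stereo (2/0-stereo) reproduction, compatibility with existing 2/0-stereo sound reproduction systems or with existing matrixed surround sound receivers has to be maintained. This means that for many applications a basic stereo signal which contains an appropriate downmix of the audio information of the multichannel programme has to be transmitted together with the multichannel audio information. Appropriate downmix equations are given by equation pairs (1,2), (3,4), (5,6) and (7,8).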

Lo = L + ½√2 ∗ C + ½√2 ∗ LS	(1)
Ro = R + ½√2 ∗ C + ½√2 ∗ RS	(2)

Lo = L + ½√2 ∗ C + ½ ∗ LS	(3)
Ro = R + ½√2 ∗ C + ½ ∗ RS	(4)

Lo = L	(5)
Ro = R	(6)

Lo = L + ½√2 ∗ C − ½√2 ∗ jS	(7)
Ro = R + ½√2 ∗ C + ½√2 ∗ jS	(8)

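where jS is derived from LS and RS by calculation of the mono component. Then, a dynamic range compression and 90 degrees phase shift are applied to this component. The downmix (7,8) is suitable for existing matrixed surround decoders.

The format of an ISO/IEC 13818-3 bit stream is such that an ISO/IEC 11172-3 audio decoder properly decodes the basic stereo information according to one of the sets of downmix equations above (see 0.2.3.1). Compatibility with existing surround sound decoders by use of equations (7) and (8) has not been verified at the time of printing of this part of ISO/IEC 13818.

In the case of this part of ISO/IEC 13818, three different possibilities can be identified to provide to the user a basic stereo downmix together with the multichannel audio information: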

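1. Transmitting the 2/0-stereo sound inherently with the multichannel information in one bit stream in a backwards compatible way with ISO/IEC 11172-3, thus avoiding simulcast. This allows for the most efficient use of the bit rate required for both the 2/0-stereo and the multichannel audio signal. Additional advantages are that both programmes are strictly synchronized on a PCM audio sample basis, and that audio programme associated data carried in the ancillary data field of the MPEG-Audio bit stream have to be transmitted only once. The stereo downmix from the multichannel audio signal is handled by the ISO/IEC 13818-3 encoder. For this downmix, a number of matrix options according to equations (1) and (2) and equations (3) and (4) are provided by this part of ISO/IEC 13818 (see 2.5.2.13).
2. Simulcast of the multichannel audio signal, coded according to this part of ISO/IEC 13818, together with the 2/0-stereo signal coded according to ISO/IEC 11172-3. This solution requires two independent bit streams which can be multiplexed and transmitted by ISO/IEC 13818-1. The programme provider has to make provisions if a synchronization of both bit streams is required. Further, the simulcast option requires a significantly higher bit rate because instead of 5 channels in the case of 3/2 multichannel sound, altogether 7 audio channels have to be transmitted. However, the simulcast option allows for an individual, i.e. dynamic, downmix to 2/0-stereo sound which can be controlled by a sound engineer.
3. Transmitting only the multichannel signal, by using the non-matrixed mode (downmix equations (5) and (6)). Each stereo decoder then has to be able to decode all five channels and to make a stereo downmix. Although the downmix can be applied before the filtering operation in the decoder, so that the filter only needs to be run on two channels, this complicates the decoder significantly.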

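If compatibility with existing matrixed surround sound decoders is required, this part of ISO/IEC 13818 again provides three solutions:

1. To ensure a high efficiency regarding the bit rate required for both the 3/2-multichannel and the matrixed surround signal, this surround signal can be transmitted in the backwards compatible stereo channel. The matrix-option '10' according to equations (7) and (8) provides an appropriate compatible signal which is transmitted in the basic stereo channels. A matrixed surround signal, suitable for existing matrixed surround decoders, can be obtained at the receiver by using an ISO/IEC 11172-3 two-channel decoder. The corresponding 3/2-channel output can be derived by using an ISO/IEC 13818-3 decoder.
2. A higher bit rate is necessary for simulcast of a matrixed surround signal using ISO/IEC 11172-3 and a 3/2-multichannel audio signal using this part of ISO/IEC 13818. This simulcast option allows for an independent mix of the matrixed surround signal which can be controlled by a sound engineer. The drawback of this solution is the additional bit rate necessary for transmitting 7 audio channels instead of only five channels if matrix-option '10' (see 2.5.2.13) is used.
3. Transmitting only the multichannel signal, by using the non-matrixed mode. Each stereo decoder then has to be able to decode all five channels and to make the downmix according to equations (7) and (8). Although the downmix can be applied before the filtering operation in the decoder, so that the filter only needs to be run on two channels, this complicates the decoder significantly.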

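Downwards compatibility.

A hierarchy of audio formats providing a lower number of loudspeaker channels and reduced presentation performance (down to 2/0-stereo or even mono) and a corresponding set of downwards mixing equations are recommended in ITU-R Recommendation 775. Alternative lower level audio formats, which may be used in circumstances where economic or channel capacity constraints apply, are 3/1, 3/0, 2/2, 2/1, 2/0, and 1/0. Corresponding loudspeaker arrangements are 3/2, 3/1, 3/0, 2/2, 2/1, 2/0, and 1/0.

Backwards compatibility.

For several applications, the intention is to extend the existing 2/0-stereo sound system by transmitting additional audio channels (centre, surround) without making use of simulcast operation. This provision of backwards compatibility with existing receivers implies the use of compatibility matrices: the decoder of the previous generation must reproduce the two conventional basic stereo signals L'o/R'o, and the multichannel decoder produces the complete 3/2-stereo presentation L'/C'/R'/LS'/RS' from the basic stereo signal and the extension signals.

It is recognised that backward compatibility may not be required for all applications of MPEG-2 Audio.

Therefore, non-backward compatible (NBC) audio coding systems free of the constraints of backwards compatibility are being evaluated for optional use with this part of ISO/IEC 13818.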

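0.2.2.3 Multilingual capability

Particularly for HDTV applications, multichannel stereo performance and bilingual programmes or multilingual commentaries are required. This part of ISO/IEC 13818 provides for alternative audio channel configurations in the five-channel sound system, for example a bilingual 2/0 stereo programme, or one 2/0 or 3/0 stereo sound plus accompanying services (e.g. "clean dialogue" for the hard-of-hearing, commentary for the visually impaired, multilingual commentary etc.). An important configuration is the reproduction of commentary dialogue (e.g. via the centre loudspeaker) together with the common music/effect stereo downmix (examples are documentary films and sports reports).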

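0.2.3 Basic Parameters of the Multichannel Audio Coding System

The transmission of the five audio signals of a 3/2 sound system requires five transmission channels (although, in the context of bitrate reduced signals, these are not necessarily independent). In order that two of the transmitted signals can provide a stereo service on their own, the source sound signals are generally combined in a linear matrix prior to encoding. These combined signals (and their transmission channels) are identified by the notation T0, T1, T2, T3 and T4.

0.2.3.1 Compatibility with ISO/IEC 11172-3

The ISO/MPEG-Audio Multichannel system provides full compatibility with ISO/IEC 11172-3. For a multichannel audio bit stream, backwards compatibility means that an ISO/IEC 11172-3 audio decoder properly decodes the basic stereo information (see 0.2.2.2). Forwards compatibility means that an MPEG-2 multichannel audio decoder is able to properly decode an ISO/IEC 11172-3 audio bit stream.

The backwards compatibility is realised by coding the basic stereo information in conformance with ISO/IEC 11172-3 and exploiting the ancillary data field of the ISO/IEC 11172-3 audio frame (base frame, in the context of this part of ISO/IEC 13818) plus an optional extension frame for the multichannel extension.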

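The complete ISO/IEC 11172-3 audio frame incorporates four different types of information:

Header information within the first 32 bits of the ISO/IEC 11172-3 audio frame.
Cyclic Redundancy Check (CRC), consisting of 16 bits, just after the header information (optional).
Audio data, for Layer II consisting of bit allocation (BAL), scalefactor select information (SCFSI), scalefactors (SCF), and the subband samples.
Ancillary data. Due to the large number of different applications which will use this part of ISO/IEC 13818, the length and usage of this field are not specified.

The variable length of the ancillary data field enables packing the complete extension information of the channels T2/T3/T4 into the first part of the ancillary data field. If the MC encoder does not use all of the ancillary data field for the multichannel extension information, the remaining part of the field can be used for other ancillary data.

The bit rate required for the multichannel extension information may vary on a frame by frame basis, depending on the sound signals. The overall bit rate may be increased above that provided for in ISO/IEC 11172-3 by the use of an optional extension bit stream. The maximum bit rate, including the extension bit stream, is given by the following table:

Sampling Frequency	Layer	Maximum Total Bit Rate
32 kHz	I	903 kbit/s
32 kHz	II	839 kbit/s
32 kHz	III	775 kbit/s
44.1 kHz	I	1075 kbit/s
44.1 kHz	II	1011 kbit/s
44.1 kHz	III	947 kbit/s
48 kHz	I	1130 kbit/s
48 kHz	II	1066 kbit/s
48 kHz	III	1002 kbit/s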

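This part of ISO/IEC 13818 describes the combinations of the basic Lo, Ro stereo of Layer I, II and III and the multichannel extension of Layer II mc and Layer III mc. The following combinations are possible:

Basic Lo, Ro Stereo	Multichannel Extension
Layer II	Layer II mc
Layer III	Layer III mc
Layer I	Layer II mc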

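0.2.3.2 Audio Input/Output Format

Sampling frequencies: 48, 44.1 or 32 kHz

Quantisation: up to 24 bits/sample PCM resolution

The following combinations of audio channels can be applied as inputs to the audio encoder:

a) Five channels, using the 3/2 configuration: L, C, R plus two surround channels LS, RS
b) Four channels, using the 3/1 configuration: L, C, R plus single surround channel S
c) Three channels, using the 3/0 configuration: L, C, R without surround
d) Five channels, using the 3/0 + 2/0 configuration: L, C, R of first programme plus L2, R2 of second programme
e) Four channels, using the 2/2 configuration: L, R plus two surround channels LS, RS
f) Three channels, using the 2/1 configuration: L, R with single surround channel S
g) Two channels, using the 2/0 (or 1/0 + 1/0) configuration: stereo (or dual channel mode) as in ISO/IEC 11172-3
h) Four channels, using the 2/0 + 2/0 (or 1/0 + 1/0 + 2/0) configuration: L, R (or channel I and channel II) of first programme plus L2, R2 of second programme
i) One channel, using the 1/0 configuration: single channel mode (as in ISO/IEC 11172-3)
j) Three channels, using the 1/0 + 2/0 configuration: single channel mode (as in ISO/IEC 11172-3) plus L2, R2 of second programme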

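The different combinations of audio input signals are encoded and transmitted within the up to five available transmission channels T0, T1, T2, T3 and T4, of which channels T0 and T1 are the two basic channels of ISO/IEC 11172-3 and convey the backwards compatible signals Lo and Ro. Transmission channels T2, T3 and T4 together form the multichannel extension information, which is compatibly transmitted within the ISO/IEC 11172-3 ancillary data field and an optional extension bit stream.

After multichannel decoding, up to five audio channels are recovered and can be presented in any convenient format at the choice of the listener:

a) Five channels, using the 3/2 configuration: front: left (L) and right (R) channels plus centre channel (C); surround: left surround (LS) and right surround (RS)
b) Four channels, using the 3/1 configuration: front: left (L) and right (R) channels plus centre channel (C); surround: mono surround (S)
c) Three channels, using the 3/0 configuration: front: left (L) and right (R) channels plus centre channel (C); surround: no surround
d) Four channels, using the 2/2 configuration: front: left (L) and right (R) channels; surround: left surround (LS) and right surround (RS)
e) Three channels, using the 2/1 configuration: front: left (L) and right (R) channels; surround: mono surround (S)
f) Two channels, using the 2/0 configuration: front: left (L) and right (R) channels; surround: no surround
g) One channel, using the 1/0 configuration: front: mono channel (Mo); surround: no surround

Optionally, a low frequency enhancement channel can be added to any of these configurations except the 1/0 configuration.

The outputs may be required to provide the discrete signals, or to be combined according to downward mixing or upward conversion equations, as defined in ITU-R Recommendation 775.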

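0.2.3.3 Composite coding modes

Dynamic transmission channel switching

To provide better orthogonality between the two compatible signals T0 and T1 and the three additionally transmitted signals T2, T3 and T4, flexibility is required in the choice of the channels T2, T3 and T4. In this part of ISO/IEC 13818, a number of combinations of three out of the five signals L, C, R, LS, RS can be selected for transmission in T2, T3 and T4, independently for a number of frequency regions.

Dynamic crosstalk

According to a binaural hearing model, it is possible to determine the portion of a stereophonic signal which is irrelevant with respect to the spatial perception of the stereophonic presentation. The stereo-irrelevant signal components are not masked, but they do not contribute to the localisation of sound sources. They are ignored in the binaural processor of the human auditory system. Therefore, stereo-irrelevant components of any stereo signal (L, C, R, LS or RS) may be reproduced via any loudspeaker, or via several loudspeakers of the arrangement, without affecting the stereophonic impression. This can be done independently for a number of frequency regions.

Adaptive multichannel prediction

In order to exploit statistical inter-channel dependencies, adaptive multichannel prediction is used for redundancy reduction. Instead of transmitting the actual signals in the transmission channels T2, T3 and T4, the corresponding prediction error signals are transmitted. Predictors of up to second order with delay compensation are used.

Phantom coding of centre

Because the human auditory system uses only intensity cues of the audio signal for localisation at higher frequencies, it is possible to transmit the high frequency part of the centre channel in the front left and right channels, constituting a phantom source at the location of the centre loudspeaker.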

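0.2.3.4 Encoder and decoder parameters

Encoding and decoding:	similar to ISO/IEC 11172-3.
Coding modes:	3/2, 3/1, 3/0, 2/2, 2/1, 2/0, 1/0+1/0, 1/0 (+2/0)
	second stereo programme,
	up to 7 additional multilingual or commentary channels,
	associated services.
Subband filter transform:	Number of subbands:	32
	Sampling frequency:	Fs/32
	Bandwidth of subbands:	Fs/64
Additional decomposition by MDCT (Layer III only):
	Frequency resolution:	6 or 18 components per subband
LFE channel filter transform:	Number of LFE channels:	1
	Sampling frequency:	Fs/96
	Bandwidth of LFE channel:	125 Hz
Dynamic range:	more than 20 bits.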
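
As a rough illustration of the subband filter figures listed above (32 subbands, subband sampling frequency Fs/32, subband bandwidth Fs/64), the following sketch evaluates them for a given main sampling frequency. The function name and the returned keys are illustrative choices, not part of the standard.

```python
def subband_parameters(fs_hz: float, subbands: int = 32) -> dict:
    """Per-subband figures for a uniform filter bank with `subbands` bands."""
    return {
        # Each subband is critically sampled at Fs / 32.
        "subband_sampling_hz": fs_hz / subbands,
        # The audio band 0..Fs/2 is split evenly, so each band spans Fs / 64.
        "subband_bandwidth_hz": fs_hz / (2 * subbands),
    }


for fs in (32000, 44100, 48000):
    print(fs, subband_parameters(fs))
```

At Fs = 48 kHz this gives a subband sampling frequency of 1500 Hz and a subband bandwidth of 750 Hz.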

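Annex E

(informative)

Use of ancillary data

Introduction

Many of the existing applications of MPEG Audio, including international standards (e.g. DAB [1], ITU-T J.52 [2]), have defined the format of the ancillary data field according to their specific requirements. This annex gives some examples which may be useful for future applications.

Each ISO/IEC 13818-3 frame may contain a number of ancillary data bytes. These data can be carried in two separate fields of the ISO/IEC 13818-3 encoded frame. To remain compatible with the ancillary data definition of ISO/IEC 11172-3, one field is located at the end of the base frame and the other at the end of the extension frame.

The most common use of ancillary data is Programme Associated Data (PAD), i.e. data closely related to the audio signal.

Typical programme associated data

Typical examples of programme associated data are music or speech indication (music/speech flags), programme related text (ITTS [1]), Universal Product Code/European Article Number (UPC/EAN [1]), special commands to the receiver/decoder delivered synchronously with the audio programme, and dynamic range control information (DRC). The DRC signal may optionally be used in the receiver to compress the dynamic range of the audio signal. It is an example of data that would become useless if delayed by queuing in a shared data service.

All functions provided by PAD, and the length of the PAD field, can be defined by the user. Transmitting information in the PAD field is therefore not mandatory.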

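Dynamic range control

It has long been recognised that, for many listeners in less than ideal environments, it is not practical to make full use of the dynamic range carried by a digital audio signal. A method of carrying data in the coded bit stream to allow the reproduced dynamic range of the audio signal to be restricted has already been defined for ISO/IEC 11172-3 Layer II for Digital Audio Broadcasting (DAB [1]).

With the help of Dynamic Range Control (DRC), the receiver can reduce the dynamic range of the audio signal. The purpose is to adapt the dynamic range of the audio signal to listening in a noisy environment, or to audio sources whose dynamic range is too high for domestic listening (typically film soundtracks). An ISO/IEC 13818-3 decoder may optionally provide such compression of the audio dynamic range, using a process that obtains its control information either from the audio signal itself or from an appropriate DRC signal transmitted in the ancillary data field. Transmitting a DRC signal is an option for the programme provider; it is not a requirement of the system.

In the DAB specification, part of the additional data carried with the audio ("F-PAD") can carry, among other things, a 6-bit DRC data field used to modify the gain applied to the reproduced audio signal. In the current proposal [1], when dynamic range control is signalled, the 6 bits represent the gain to be applied to the reconstructed audio, in steps of 0.25 dB over the range 0 dB to 15.75 dB. The step size of 0.25 dB has been found by experiment to be the maximum acceptable for providing smooth gain control during slow gain changes on classical music. The upper limit of 15.75 dB as maximum signal gain is considered sufficient to allow adequate dynamic range reduction under moderately difficult listening conditions. If further dynamic range reduction is required for very adverse conditions, the signalled values can be scaled, and the resulting increase in step size will hardly be audible. When dynamic range control data is being transmitted, the 6-bit value has to be sent once every 24 ms. This represents a bit usage of 250 bit/s (not counting the overhead needed to signal the use of DRC data).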
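
The arithmetic in the paragraph above can be sketched as follows: a 6-bit code maps to a gain of 0 dB to 15.75 dB in 0.25 dB steps, and one such code every 24 ms costs 250 bit/s. The function names are hypothetical, and the conversion from dB to a linear amplitude factor is the usual 20·log10 convention rather than something stated in this annex.

```python
def drc_gain_db(code: int) -> float:
    """Gain in dB signalled by a 6-bit DRC code (0..63), 0.25 dB per step."""
    if not 0 <= code <= 63:
        raise ValueError("DRC code must fit in 6 bits")
    return code * 0.25


def db_to_linear(gain_db: float) -> float:
    """Amplitude factor for a gain in dB (standard 20*log10 convention)."""
    return 10 ** (gain_db / 20)


def drc_bit_rate(bits: int = 6, interval_ms: int = 24) -> float:
    """Bit rate in bit/s for `bits` sent once every `interval_ms` milliseconds."""
    return bits * 1000 / interval_ms


print(drc_gain_db(63))   # largest signalled gain in dB
print(drc_bit_rate())    # payload bit rate, excluding signalling overhead
```

A receiver that wanted stronger compression could multiply the decoded dB value by a scale factor before converting it to a linear gain, as the text suggests for very adverse listening conditions.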

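Music/speech indication

These two flags indicate whether the transmitted sound consists of music or speech. The receiver can use this information to control its audio processing circuits. One special combination of the flags signals that no indication is given. The music/speech indication typically requires 2 bits, repeated approximately 10 times per second.

Commands to the receiver/decoder

A channel may be provided to convey special commands to the receiver/decoder synchronously with the audio signal. Such commands can be used, for example, to trigger the read-out of a picture from a buffer memory which has been filled asynchronously beforehand. This channel can carry a few bytes within 0.2 s to 0.5 s, at irregular intervals.

Programme related text

Coded text may be carried along with the audio to elucidate the transmitted audio signal (song, programme item). This text may be created on-site by the programme provider, read from digitally pre-recorded software, relayed more or less transparently, or assembled from a combination of sources. The channel capacity required for the text depends on how comprehensive and attractive the service is to be.

In-house information

A channel may be provided for both short synchronous commands and long strings of asynchronous data. The meaning of these commands is intended only for internal use within a particular application.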

References:

[1]	European Telecommunication Standard prETS 300 401: 1995, Radio Broadcasting system; Digital Audio Broadcasting (DAB) to mobile, portable and fixed receivers.
[2]	ITU-T Recommendation J.52: 1995, Digital transmission of high-quality sound-programme signals using one, two or three 64 kbit/s channels per mono signal (and up to six per stereo signal).

Introduction

ISO/IEC 13818 was prepared by SC29/WG11, also known as MPEG (Moving Pictures Expert Group). MPEG was formed in 1988 to establish a standard for the coded representation of moving pictures and associated audio stored on digital storage media.

ISO/IEC 13818 is published in three parts. Part 1 (systems) specifies the system coding layer of the standard. It defines a multiplexed structure for combining audio and video data and means of representing the timing information needed to replay synchronised sequences in real time. Part 2 (video) specifies the coded representation of video data and the decoding process required to reconstruct pictures. Part 3 (audio) specifies the coded representation of audio data and the decoding process required to decode audio signals.

The technical changes in this 2nd edition compared to the first publication of ISO/IEC 13818-3 (1995) are:

1. In the first publication, certain combinations of dynamic crosstalk and prediction were not prohibited, although they could not be implemented in practice. In this second edition, these combinations are explicitly prohibited.
2. In the first publication, a low-pass filter was to be applied to the monophonic surround signal in matrix mode 2 (analogue surround mode). This filter is omitted in this edition, greatly simplifying the decoder.
3. The description of the syntax of the LFE channel was ambiguous. This description has been clarified.

In addition to these technical changes, many editorial changes have been made, improving readability and clarity.

0.1 Extension of ISO/IEC 11172-3 Audio Coding to Lower Sampling Frequencies

In order to achieve better audio quality at very low bit rates (less than 64 kbit/s per audio channel), in particular when compared with ITU-T (formerly CCITT) Recommendation G.722 performance, three additional sampling frequencies are provided for ISO/IEC 11172-3 Layers I, II and III. The additional sampling frequencies (Fs) are 16 kHz, 22.05 kHz and 24 kHz. This allows corresponding audio bandwidths of approximately 7.5 kHz, 10.3 kHz and 11.25 kHz. The syntax, semantics, and coding techniques of ISO/IEC 11172-3 are maintained except for a new definition of the sampling frequency field, the bitrate index field, and the bit allocation tables. These new definitions are valid if the ID bit in the ISO/IEC 11172-3 header equals zero. To obtain the best audio performance, the parameters of the psychoacoustic model used in the encoder have to be changed accordingly.

With these sampling frequencies, the duration of the audio frame corresponds to:

Layer	16 kHz	22.05 kHz	24 kHz
I	24 ms	17.41... ms	16 ms
II	72 ms	52.24... ms	48 ms
III	36 ms	26.12... ms	24 ms
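
The durations in this table follow from dividing the number of samples in one frame by the sampling frequency. The sketch below reproduces them under the assumption that a frame holds 384 samples in Layer I, 1152 in Layer II and 576 in Layer III at these lower sampling frequencies; those counts are taken from the MPEG audio frame definitions and are not stated in this introduction.

```python
# Samples per audio frame at the lower sampling frequencies (assumed values,
# not given in the text above).
SAMPLES_PER_FRAME = {"I": 384, "II": 1152, "III": 576}


def frame_duration_ms(layer: str, fs_hz: float) -> float:
    """Duration of one audio frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME[layer] / fs_hz


for layer in ("I", "II", "III"):
    row = [round(frame_duration_ms(layer, fs), 2) for fs in (16000, 22050, 24000)]
    print(layer, row)
```

The 22.05 kHz column is not a whole number of milliseconds, which is why the table shows truncated decimals there.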

0.2 Low bitrate coding of multichannel audio

0.2.1 Universal multichannel audio system

A standard on low bit rate coding for mono or stereo audio signals was established by MPEG-1 Audio in ISO/IEC 11172-3. This standard is applicable to the carriage of high quality digital audio signals, with or without associated picture information, on storage media or transmission channels with limited capacity.

The ISO/IEC 11172-3 audio coding standard can be used together with both MPEG-1 and MPEG-2 Video as long as only two-channel stereo is required. MPEG-2 Audio (ISO/IEC 13818-3) provides the extension up to 3/2 multichannel audio and an optional low frequency enhancement channel (LFE).

This part of ISO/IEC 13818 describes an audio subband coding system called ISO/MPEG-Audio Multichannel, which can be used to transfer high quality digital multichannel and/or multilingual audio information on storage media or transmission channels with limited capacity. One of the basic features is the backwards compatibility to ISO/IEC 11172-3 coded mono, stereo or dual channel audio programmes. It is designed for use in different applications as considered by the ISO/MPEG audio group and the specialist groups TG 10/1, 10/2 and 10/3 of the ITU-R (previously CCIR).

Multichannel audio systems provide enhanced stereo performance compared to conventional two channel audio systems. It is recognised that improved presentation performance is desirable not only for applications with accompanying picture but also for audio-only applications. A universal and compatible multichannel audio system seems to be very attractive to the manufacturer, producer and consumer. Such a system is applicable to satellite or terrestrial television broadcasting, digital audio broadcasting (terrestrial and satellite), as well as other non-broadcasting media, e.g.:

CATV	Cable TV Distribution
CDAD	Cable Digital Audio Distribution
DAB	Digital Audio Broadcast
DVD	Digital Versatile Disc
ENG	Electronic News Gathering (including Satellite News Gathering)
HDTV	High Definition Television
IPC	Interpersonal Communications (video conference, videophone, etc.)
ISM	Interactive Storage Media (optical disks, etc.)
NDB	Network Database Services (via ATM, etc.)
DSM	Digital Storage Media (digital VTR, etc.)
EC	Electronic Cinema
HTT	Home Television Theatre
ISDN	Integrated Services Digital Network

0.2.2 Representation of multichannel audio

0.2.2.1 The 3/2-stereo plus LFE format

Regarding stereophonic presentation, specialist groups of ITU-R, SMPTE, and EBU recommend the use of an additional centre loudspeaker channel C and two surround loudspeaker channels LS and RS, augmenting the front left and right loudspeaker channels L and R. This reference audio format is referred to as "3/2-stereo" (3 front / 2 surround loudspeaker channels) and requires the transmission of five appropriately formatted audio signals.

For audio accompanying picture applications (e.g. HDTV), the three front loudspeaker channels ensure sufficient directional stability and clarity of the picture related frontal images, according to the common practice in the cinema. The dominant benefit is the "stable centre", which is guaranteed at any location of the listener and important for most of the dialogue.

Additionally, for audio-only applications, the 3/2-stereo format has been found to be an improvement over two-channel stereophony. The addition of one pair of surround loudspeaker channels allows improved realism of auditory ambience.

A low frequency enhancement channel (in this part of ISO/IEC 13818 called LFE channel) can, optionally, be added to any of these configurations. The purpose of this channel is to enable listeners to extend the low frequency content of the reproduced programme in terms of both frequency and level. In this way it is the same as the LFE channel proposed by the film industry for their digital sound systems.

The LFE channel should not be used for the entire low frequency content of the multichannel sound presentation. The LFE channel is optional at the receiver, and thus should only carry low frequency sound effects, which may have a high level. The LFE channel is not included in any dematrixing operation in the decoder. The sampling frequency of the LFE channel corresponds to the sampling frequency of the main channels, divided by a factor of 96. This provides 12 LFE samples within one audio frame. The LFE channel is capable of handling signals in the range from 15 Hz to 120 Hz.
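
The LFE figures in this paragraph can be checked with a few lines of arithmetic. The sketch assumes an audio frame of 1152 samples per main channel, which is what makes "12 LFE samples within one audio frame" come out; that frame length is not stated in this subclause, and the function name is illustrative.

```python
def lfe_parameters(fs_hz: float, samples_per_frame: int = 1152):
    """Return (LFE sampling frequency in Hz, LFE samples per frame, LFE Nyquist limit in Hz)."""
    lfe_fs = fs_hz / 96                      # main sampling frequency divided by 96
    lfe_samples = samples_per_frame // 96    # one LFE sample per 96 main-channel samples
    return lfe_fs, lfe_samples, lfe_fs / 2


for fs in (32000, 44100, 48000):
    print(fs, lfe_parameters(fs))
```

Even at the lowest main sampling frequency of 32 kHz the LFE Nyquist limit stays above the 120 Hz upper edge quoted in the text.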

0.2.2.2 Compatibility

Extension from 2/0-stereo towards multichannel sound.

As a result of the widespread use of conventional two-channel stereo (2/0-stereo) reproduction, compatibility with existing 2/0-stereo sound reproduction systems or with existing matrixed surround sound receivers has to be maintained. This means that for many applications a basic stereo signal which contains an appropriate downmix of the audio information of the multichannel programme has to be transmitted together with the multichannel audio information. Appropriate downmix equations are given by equation pairs (1,2), (3,4), (5,6) and (7,8).

Lo = L + ½√2 ∗ C + ½√2 ∗ LS	(1)
Ro = R + ½√2 ∗ C + ½√2 ∗ RS	(2)

Lo = L + ½√2 ∗ C + ½ ∗ LS	(3)
Ro = R + ½√2 ∗ C + ½ ∗ RS	(4)

Lo = L	(5)
Ro = R	(6)

Lo = L + ½√2 ∗ C − ½√2 ∗ jS	(7)
Ro = R + ½√2 ∗ C + ½√2 ∗ jS	(8)

where jS is derived from LS and RS by calculation of the mono component. Then, a dynamic range compression and 90 degrees phase shift are applied to this component. The downmix (7,8) is suitable for existing matrixed surround decoders.
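
As a minimal sketch of how equation pairs (1,2), (3,4) and (5,6) act on one sample per channel, consider the function below. Its name and the string used to select the pair are illustrative only. Pair (7,8) is left out on purpose, because jS requires dynamic range compression and a 90 degree phase shift that a per-sample formula cannot express.

```python
import math

G = math.sqrt(2) / 2  # the factor written as "½√2" in the equations


def downmix(l, c, r, ls, rs, pair="1,2"):
    """Return (Lo, Ro) for one sample per channel."""
    if pair == "1,2":   # surround attenuated by ½√2
        return l + G * c + G * ls, r + G * c + G * rs
    if pair == "3,4":   # surround attenuated by ½
        return l + G * c + 0.5 * ls, r + G * c + 0.5 * rs
    if pair == "5,6":   # non-matrixed mode: Lo and Ro are simply L and R
        return l, r
    raise ValueError("unsupported equation pair: " + pair)


print(downmix(0.2, 0.4, 0.1, 0.3, 0.5, pair="1,2"))
print(downmix(0.2, 0.4, 0.1, 0.3, 0.5, pair="5,6"))
```

Note that the sums can exceed the range of the input signals, so a practical encoder would also have to guard against overload; that step is outside this sketch.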

The format of an ISO/IEC 13818-3 bit stream is such that an ISO/IEC 11172-3 audio decoder properly decodes the basic stereo information according to one of the sets of downmix equations above (see 0.2.3.1). Compatibility with existing surround sound decoders by use of equations (7) and (8) has not been verified at the time of printing of this part of ISO/IEC 13818.

In the case of this part of ISO/IEC 13818, three different possibilities can be identified to provide to the user a basic stereo downmix together with the multichannel audio information:

1. Transmitting the 2/0-stereo sound inherently with the multichannel information in one bit stream in a backwards compatible way with ISO/IEC 11172-3, thus avoiding simulcast. This allows for the most efficient use of bit rate required for both, the 2/0-stereo and the multichannel audio signal. Additional advantages are that both programmes are strictly synchronized on a PCM audio sample basis, and that audio programme associated data carried in the ancillary data field of the MPEG-Audio bit stream have to be transmitted only once. The stereo downmix from the multichannel audio signal is handled by the ISO/IEC 13818-3 encoder. For this downmix, a number of matrix options according to equations (1) and (2) and equations (3) and (4) are provided by this part of ISO/IEC 13818 (see 2.5.2.13).
2. Simulcast of the multichannel audio signal, coded according to this part of ISO/IEC 13818, together with the 2/0-stereo signal coded according to ISO/IEC 11172-3. This solution requires two independent bit streams which can be multiplexed and transmitted by ISO/IEC 13818-1. The programme provider has to make provisions if a synchronization of both bit streams is required. Further, the simulcast option requires a significantly higher bit rate because instead of 5 channels in the case of 3/2 multichannel sound, altogether 7 audio channels have to be transmitted. However, the simulcast option allows for an individual, i.e. dynamic downmix to 2/0-stereo sound which can be controlled by a sound engineer.
3. Transmitting only the multichannel signal, by using the non-matrixed mode (downmix equations (5) and (6)). Each stereo decoder then has to be able to decode all five channels and to make a stereo downmix. Although the downmix can be applied before the filtering operation in the decoder, so that the filter only needs to be run on two channels, this complicates the decoder significantly.

If compatibility with existing matrixed surround sound decoders is required, this part of ISO/IEC 13818 again provides three solutions:

1. To ensure a high efficiency regarding the bit rate required for both, the 3/2-multichannel and the matrixed surround signal, this surround signal can be transmitted in the backwards compatible stereo channel. The matrix-option '10' according to equations (7) and (8) provides an appropriate compatible signal which is transmitted in the basic stereo channels. A matrixed surround signal, suitable for existing matrixed surround decoders, can be obtained at the receiver by using an ISO/IEC 11172-3 two-channel decoder. The corresponding 3/2-channel output can be derived by using an ISO/IEC 13818-3 decoder.
2. A higher bit rate is necessary for simulcast of a matrixed surround signal using ISO/IEC 11172-3 and a 3/2-multichannel audio signal using this part of ISO/IEC 13818. This simulcast option allows for an independent mix of the matrixed surround signal which can be controlled by a sound engineer. The drawback of this solution is the additional bit rate necessary for transmitting 7 audio channels instead of only five channels if matrix-option '10' (see 2.5.2.13) is used.
3. Transmitting only the multichannel signal, by using the non-matrixed mode. Each stereo decoder has then to be able to decode all the five channels, and to make the downmix according to equation (7,8). Although the downmix can be applied before the filtering operation in the decoder, and the filter only needs to be done on two channels, this complicates the decoder significantly.

Downwards compatibility.

A hierarchy of audio formats providing a lower number of loudspeaker channels and reduced presentation performance (down to 2/0-stereo or even mono) and a corresponding set of downwards mixing equations are recommended in ITU-R Recommendation 775: "Multichannel stereophonic audio system with and without accompanying picture", November 1992. Alternative lower level audio formats, which may be used in circumstances where economic or channel capacity constraints apply, are 3/1, 3/0, 2/2, 2/1, 2/0, and 1/0. Corresponding loudspeaker arrangements are 3/2, 3/1, 3/0, 2/2, 2/1, 2/0, and 1/0.

Backwards compatibility.

For several applications, the intention is to extend the existing 2/0-stereo sound system by transmitting additional audio channels (centre, surround) without making use of simulcast operation. This provision of backwards compatibility with existing receivers implies the use of compatibility matrices: the decoder of the previous generation must reproduce the two conventional basic stereo signals L'o/R'o, and the multichannel decoder produces the complete 3/2-stereo presentation L'/C'/R'/LS'/RS' from the basic stereo signal and the extension signals.

It is recognised that backward compatibility may not be required for all applications of MPEG-2 Audio.

Therefore, nonbackward compatible (NBC) audio coding systems free of the constraints of backwards compatibility are being evaluated for optional use with this part of ISO/IEC 13818.

0.2.2.3 Multilingual capability

Particularly for HDTV applications, multichannel stereo performance and bilingual programmes or multilingual commentaries are required. This part of ISO/IEC 13818 provides for alternative audio channel configurations in the five-channel sound system, for example a bilingual 2/0 stereo programme, or one 2/0 or 3/0 stereo sound plus accompanying services (e.g. "clean dialogue" for the hard-of-hearing, commentary for the visually impaired, multilingual commentary etc.). An important configuration is the reproduction of commentary dialogue (e.g. via the centre loudspeaker) together with the common music/effect stereo downmix (examples are documentary films and sports reports).

0.2.3 Basic Parameters of the Multichannel Audio Coding System

The transmission of the five audio signals of a 3/2 sound system requires five transmission channels (although, in the context of bitrate reduced signals, these are not necessarily independent). In order that two of the transmitted signals can provide a stereo service on their own, the source sound signals are generally combined in a linear matrix prior to encoding. These combined signals (and their transmission channels) are identified by the notation T0, T1, T2, T3 and T4.
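
To make the idea of a linear matrix concrete, the sketch below forms T0 and T1 with downmix equations (1) and (2) from 0.2.2.2 and then inverts the matrix on the decoder side. The fixed assignment T2 = C, T3 = LS, T4 = RS is an assumption made for illustration; the standard lets the encoder choose which signals travel in T2, T3 and T4, and any level normalisation is ignored here.

```python
import math

G = math.sqrt(2) / 2  # "½√2"


def matrix(l, c, r, ls, rs):
    """Encoder side: combine the five source signals into T0..T4."""
    t0 = l + G * c + G * ls      # Lo, equation (1)
    t1 = r + G * c + G * rs      # Ro, equation (2)
    return t0, t1, c, ls, rs     # hypothetical assignment T2=C, T3=LS, T4=RS


def dematrix(t0, t1, t2, t3, t4):
    """Decoder side: recover L, C, R, LS, RS from T0..T4."""
    c, ls, rs = t2, t3, t4
    l = t0 - G * c - G * ls      # undo equation (1)
    r = t1 - G * c - G * rs      # undo equation (2)
    return l, c, r, ls, rs


source = (0.2, 0.4, 0.1, 0.3, 0.5)
recovered = dematrix(*matrix(*source))
print(recovered)
```

A two-channel decoder would use only T0 and T1, while a multichannel decoder applies the inverse matrix to all five transmission channels.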

0.2.3.1 Compatibility with ISO/IEC 11172-3

The ISO/MPEG-Audio Multichannel system provides full compatibility with ISO/IEC 11172-3. For a multichannel audio bit stream, backwards compatibility means that an ISO/IEC 11172-3 audio decoder properly decodes the basic stereo information (see 0.2.2.2). Forwards compatibility means that an MPEG-2 multichannel audio decoder is able to properly decode an ISO/IEC 11172-3 audio bit stream.

The backwards compatibility is realised by coding the basic stereo information in conformance with ISO/IEC 11172-3 and exploiting the ancillary data field of the ISO/IEC 11172-3 audio frame (base frame, in the context of this part of ISO/IEC 13818) plus an optional extension frame for the multichannel extension.

The complete ISO/IEC 11172-3 audio frame incorporates four different types of information:

Header information within the first 32 bits of the ISO/IEC 11172-3 audio frame.
Cyclic Redundancy Check (CRC), consisting of 16 bits, just after the header information (optional).
Audio data, for Layer II consisting of bit allocation (BAL), scalefactor select information (SCFSI), scalefactors (SCF), and the subband samples.
Ancillary data. Due to the large number of different applications which will use this part of ISO/IEC 13818, the length and usage of this field are not specified.
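
As a hedged sketch of how a decoder might read the first fields of the 32-bit header, the code below extracts the ID bit, the layer, the CRC flag and the sampling frequency. The bit layout and the sampling frequency codes are recalled from ISO/IEC 11172-3 and the lower sampling frequency extension; they are not reproduced in this introduction, so treat them as assumptions. The function and key names are illustrative.

```python
# Sampling frequency in kHz, indexed by ID bit and 2-bit sampling_frequency code
# (layout assumed from ISO/IEC 11172-3 and its lower sampling frequency extension).
SAMPLING_KHZ = {
    1: {0: 44.1, 1: 48.0, 2: 32.0},     # ID = 1
    0: {0: 22.05, 1: 24.0, 2: 16.0},    # ID = 0: lower sampling frequencies
}
LAYER = {3: "I", 2: "II", 1: "III"}      # 2-bit layer code


def parse_header(word: int) -> dict:
    """Decode selected fields from a 32-bit audio frame header."""
    if (word >> 20) & 0xFFF != 0xFFF:
        raise ValueError("syncword not found")
    id_bit = (word >> 19) & 0x1
    layer_code = (word >> 17) & 0x3
    protection_bit = (word >> 16) & 0x1   # 0 means a 16-bit CRC follows the header
    sf_code = (word >> 10) & 0x3
    if layer_code == 0 or sf_code == 3:
        raise ValueError("reserved field value")
    return {
        "id": id_bit,
        "layer": LAYER[layer_code],
        "crc_present": protection_bit == 0,
        "sampling_khz": SAMPLING_KHZ[id_bit][sf_code],
    }


print(parse_header(0xFFFD9000))  # Layer II frame with ID = 1
print(parse_header(0xFFF58800))  # Layer II frame with ID = 0
```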

The variable length of the ancillary data field enables packing the complete extension information of the channels T2/T3/T4 into the first part of the ancillary data field. If the MC encoder does not use all of the ancillary data field for the multichannel extension information, the remaining part of the field can be used for other ancillary data.

The bit rate required for the multichannel extension information may vary on a frame by frame basis, depending on the sound signals. The overall bit rate may be increased above that provided for in ISO/IEC 11172-3 by the use of an optional extension bit stream. The maximum bit rate, including the extension bit stream, is given by the following table:

Sampling Frequency	Layer	Maximum Total Bit Rate
32 kHz	I	903 kbit/s
32 kHz	II	839 kbit/s
32 kHz	III	775 kbit/s
44.1 kHz	I	1075 kbit/s
44.1 kHz	II	1011 kbit/s
44.1 kHz	III	947 kbit/s
48 kHz	I	1130 kbit/s
48 kHz	II	1066 kbit/s
48 kHz	III	1002 kbit/s
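
The table can be read as a base-frame bit rate plus headroom for the extension bit stream. The sketch below is illustrative only: the totals are copied from the table, the base-layer maxima (448, 384 and 320 kbit/s for Layers I, II and III) are the highest bit rates defined in ISO/IEC 11172-3, and the names used are hypothetical.

```python
# Maximum total bit rates from the table, in kbit/s, keyed by (Fs in Hz, layer).
MAX_TOTAL_KBIT_S = {
    (32000, "I"): 903, (32000, "II"): 839, (32000, "III"): 775,
    (44100, "I"): 1075, (44100, "II"): 1011, (44100, "III"): 947,
    (48000, "I"): 1130, (48000, "II"): 1066, (48000, "III"): 1002,
}

# Highest bit rate of each layer in ISO/IEC 11172-3, in kbit/s.
BASE_MAX_KBIT_S = {"I": 448, "II": 384, "III": 320}


def extension_headroom_kbit_s(fs_hz: int, layer: str) -> int:
    """Bit rate left for the extension bit stream once the base frame is at its maximum."""
    return MAX_TOTAL_KBIT_S[(fs_hz, layer)] - BASE_MAX_KBIT_S[layer]
```

The headroom depends only on the sampling frequency: 455, 627 and 682 kbit/s at 32, 44.1 and 48 kHz. These values agree with an extension of at most 2047 bytes per 1152 samples (for example 2047 × 8 × 48000 / 1152 ≈ 682 kbit/s), but that reading is an inference from the numbers, not a statement made in this clause.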

This part of ISO/IEC 13818 describes the combinations of the basic Lo, Ro stereo of Layer I, II and III and the multichannel extension of Layer II mc and Layer III mc. The following combinations are possible:

Basic Lo, Ro Stereo	Multichannel Extension
Layer II	Layer II mc
Layer III	Layer III mc
Layer I	Layer II mc

0.2.3.2 Audio Input/Output Format

Sampling frequencies: 48, 44.1 or 32 kHz

Quantisation: up to 24 bits/sample PCM resolution

The following combinations of audio channels can be applied as inputs to the audio encoder:

a) Five channels, using the 3/2 configuration: L, C, R plus two surround channels LS, RS
b) Four channels, using the 3/1 configuration: L, C, R plus single surround channel S
c) Three channels, using the 3/0 configuration: L, C, R without surround
d) Five channels, using the 3/0 + 2/0 configuration: L, C, R of first programme plus L2, R2 of second programme
e) Four channels, using the 2/2 configuration: L, R plus two surround channels LS, RS
f) Three channels, using the 2/1 configuration: L, R with single surround channel S
g) Two channels, using the 2/0 (or 1/0 + 1/0) configuration: Stereo (or dual channel mode) as in ISO/IEC 11172-3
h) Four channels, using the 2/0 + 2/0 (or 1/0 + 1/0 + 2/0) configuration: L, R (or channel I and channel II) of first programme plus L2, R2 of second programme
i) One channel, using the 1/0 configuration: Single channel mode (as in ISO/IEC 11172-3)
j) Three channels, using the 1/0 + 2/0 configuration: Single channel mode (as in ISO/IEC 11172-3) plus L2, R2 of second programme
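
The front/surround labels in the list above encode the channel count directly: each "m/n" term contributes m front and n surround channels. A minimal sketch of that bookkeeping follows; the function name channel_count and the dictionary are hypothetical, and the optional LFE channel is not counted.

```python
import re


def channel_count(configuration: str) -> int:
    """Sum the front and surround digits of a label such as '3/0 + 2/0'."""
    return sum(int(digit) for digit in re.findall(r"\d", configuration))


# Channel counts as stated in the list of encoder input configurations.
INPUT_CONFIGURATIONS = {
    "3/2": 5, "3/1": 4, "3/0": 3, "3/0 + 2/0": 5, "2/2": 4,
    "2/1": 3, "2/0": 2, "2/0 + 2/0": 4, "1/0": 1, "1/0 + 2/0": 3,
}
```

For every entry, channel_count(label) reproduces the stated number of channels, so the label alone is enough to size the input buffers of an encoder front end.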

The different combinations of audio input signals are encoded and transmitted within the up to five available transmission channels T0, T1, T2, T3 and T4, of which channels T0 and T1 are the two basic channels of ISO/IEC 11172-3 and convey the backwards compatible signals Lo and Ro. Transmission channels T2, T3 and T4 together form the multichannel extension information, which is compatibly transmitted within the ISO/IEC 11172-3 ancillary data field and an optional extension bit stream.

After multichannel decoding, the up to five audio channels are recovered and can then be presented in any convenient format at the choice of the listeners:

a) Five channels, using the 3/2 configuration. Front: Left (L) and right (R) channel plus centre channel (C). Surround: Left surround (LS) and right surround (RS)
b) Four channels, using the 3/1 configuration. Front: Left (L) and right (R) channel plus centre channel (C). Surround: Mono surround (S)
c) Three channels, using the 3/0 configuration. Front: Left (L) and right (R) channel plus centre channel (C). Surround: No surround
d) Four channels, using the 2/2 configuration. Front: Left (L) and right (R) channel. Surround: Left surround (LS) and right surround (RS)
e) Three channels, using the 2/1 configuration. Front: Left (L) and right (R) channel. Surround: Mono surround (S)
f) Two channels, using the 2/0 configuration. Front: Left (L) and right (R) channel. Surround: No surround
g) One channel output, using the 1/0 configuration. Front: Mono channel (Mo). Surround: No surround

A low frequency enhancement channel can, optionally, be added to any of these configurations, except for the 1/0 configuration.

Outputs may be required to provide discrete signals, or may be combined in accordance with downward mixing, or upwards conversion equations, as defined in ITU-R Recommendation 775.
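
As an illustration of downward mixing, the sketch below folds a 3/2 signal into 2/0 stereo one sample at a time. It is a sketch under stated assumptions, not the text of the Recommendation: the weight 1/√2 (about 0.7071) for the centre and surround channels is the value commonly quoted for the 3/2 to 2/0 downmix of ITU-R Recommendation BS.775, any overall attenuation needed to avoid overload is omitted, and the function name is hypothetical.

```python
import math

# Assumed downmix weight for the centre and surround channels.
G = 1 / math.sqrt(2)


def downmix_3_2_to_2_0(l, c, r, ls, rs, g=G):
    """Return one (Lo, Ro) sample pair from one 3/2 sample set."""
    lo = l + g * c + g * ls
    ro = r + g * c + g * rs
    return lo, ro
```

A centre-only input, for example downmix_3_2_to_2_0(0, 1, 0, 0, 0), appears with equal level in both outputs, which is what places it between the two loudspeakers on stereo playback.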

0.2.3.3 Composite Coding Modes

Dynamic Transmission Channel Switching

In order to provide a better orthogonality between the two compatible signals T0 and T1, and the three additionally transmitted signals T2, T3 and T4, it is necessary to have flexibility in the choice of the channels T2, T3 and T4. This part of ISO/IEC 13818 allows, independently for a number of frequency regions, the selection of a number of combinations of three out of the five signals L, C, R, LS, RS to be transmitted in T2, T3 and T4.
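
Why only "a number of" combinations are selectable can be seen with a small check. The sketch assumes, for illustration only, that the compatible signals are formed as Lo = L + g·C + g·LS and Ro = R + g·C + g·RS with g = 1/√2; the standard's own matrix factors and allocation tables are not reproduced here, and all names are hypothetical. A choice of three signals for T2, T3 and T4 is usable only if the two signals that are not sent can be solved from Lo and Ro.

```python
import math
from itertools import combinations

G = 1 / math.sqrt(2)  # assumed matrix weight
SIGNALS = ("L", "C", "R", "LS", "RS")
LO_WEIGHTS = {"L": 1.0, "C": G, "LS": G}  # contribution of each signal to Lo
RO_WEIGHTS = {"R": 1.0, "C": G, "RS": G}  # contribution of each signal to Ro


def recoverable(sent) -> bool:
    """True if the two signals not in `sent` can be solved from Lo and Ro."""
    a, b = [s for s in SIGNALS if s not in sent]
    # Determinant of the 2x2 system that links (a, b) to (Lo, Ro).
    det = (LO_WEIGHTS.get(a, 0.0) * RO_WEIGHTS.get(b, 0.0)
           - LO_WEIGHTS.get(b, 0.0) * RO_WEIGHTS.get(a, 0.0))
    return abs(det) > 1e-12


usable = [combo for combo in combinations(SIGNALS, 3) if recoverable(combo)]
```

Of the ten ways to pick three signals out of five, eight pass this check. The two that fail leave both missing signals on the same side (L with LS, or R with RS), so a single equation would have to yield two unknowns.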

Dynamic Crosstalk

According to a binaural hearing model, it is possible to determine the portion of the stereophonic signal which is irrelevant with respect to the spatial perception of the stereophonic presentation. The stereo-irrelevant signal components are not masked, but they do not contribute to the localisation of sound sources. They are ignored in the binaural processor of the human auditory system. Therefore, stereo-irrelevant components of any stereo signal (L, C, R, LS or RS) may be reproduced via any loudspeaker, or via several loudspeakers of the arrangement, without affecting the stereophonic impression. This can be done independently for a number of frequency regions.

Adaptive Multichannel Prediction

In order to make use of the statistical inter-channel dependencies, adaptive multichannel prediction is used for redundancy reduction. Instead of transmitting the actual signals in the transmission channels T2, T3, T4, the corresponding prediction error signals are transmitted. A predictor of up to 2nd order with delay compensation is used.
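
The principle of sending a prediction error instead of the signal itself can be sketched as follows. This is a simplified illustration, not the predictor of the standard: it works on plain sample lists rather than subband samples, takes unquantised coefficients as given, and treats the number of taps and the delay as free parameters. All function names are hypothetical.

```python
def predict(reference, coeffs, delay=0):
    """Predict each sample as a weighted sum of delayed reference samples."""
    out = []
    for n in range(len(reference)):
        acc = 0.0
        for k, coeff in enumerate(coeffs):
            i = n - delay - k
            if i >= 0:  # samples before the start of the block count as zero
                acc += coeff * reference[i]
        out.append(acc)
    return out


def encode_residual(target, reference, coeffs, delay=0):
    """Encoder side: the prediction error that would be transmitted."""
    prediction = predict(reference, coeffs, delay)
    return [t - p for t, p in zip(target, prediction)]


def decode_residual(residual, reference, coeffs, delay=0):
    """Decoder side: rebuild the signal from the error and the same prediction."""
    prediction = predict(reference, coeffs, delay)
    return [e + p for e, p in zip(residual, prediction)]
```

When the extension signal is close to a scaled and delayed copy of a compatible channel, the residual is small and costs fewer bits than the signal; the decoder forms the same prediction from the already decoded reference and adds the residual back.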

Phantom Coding of Centre

Because the human auditory system uses only intensity cues of the audio signal for localisation at higher frequencies, it is possible to transmit the high frequency part of the centre channel in the front left and right channels, constituting a phantom source at the location of the centre loudspeaker.

0.2.3.4 Encoder and Decoder Parameters

Encoding and decoding:	similar to ISO/IEC 11172-3.
Coding modes:	3/2, 3/1, 3/0 (+ 2/0), 2/2, 2/1, 2/0 (+ 2/0), 1/0+1/0 (+ 2/0), 1/0 (+ 2/0)
	second stereo programme,
	up to 7 additional multilingual or commentary channels,
	associated services.
Subband filter transforms:	Number of subbands:	32
	Sampling frequency:	Fs/32
	Bandwidth of subbands:	Fs/64
Additional decomposition by MDCT (Layer III only):
	Frequency Resolution:	6 or 18 components per subband
LFE channel filter transform:	Number of LFE channels:	1
	Sampling frequency:	Fs/96
	Bandwidth of LFE channel:	125 Hz
Dynamic range:	more than 20 bits.
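
The filter bank figures above are simple ratios of the sampling frequency Fs, and a short worked calculation makes them concrete. The function name and dictionary keys below are hypothetical; the ratios themselves are those listed in the parameter table.

```python
def filterbank_parameters(fs_hz: float) -> dict:
    """Derive the per-subband and LFE rates from the sampling frequency."""
    return {
        "subband_sampling_hz": fs_hz / 32,   # each of the 32 subbands
        "subband_bandwidth_hz": fs_hz / 64,  # width of one subband
        "lfe_sampling_hz": fs_hz / 96,       # LFE channel
    }
```

At Fs = 48 kHz this gives a subband sampling frequency of 1500 Hz, a subband bandwidth of 750 Hz and an LFE sampling frequency of 500 Hz. The 32 subbands of 750 Hz together cover 24 kHz, that is, Fs/2.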

Annex E

(informative)

Ancillary Data Use

Introduction

A number of existing applications of MPEG Audio, including international standards (e.g. DAB [1], ITU-T J.52 [2]), have defined formats for the ancillary data field according to their specific requirements. This Annex illustrates some examples that might be of interest for future applications.

Each ISO/IEC 13818-3 frame may contain a number of ancillary data bytes. This data can be carried in two separate fields of an ISO/IEC 13818-3 encoded frame. One field is located at the end of the base frame, in order to be compatible with the ancillary data definition of ISO/IEC 11172-3, the other is located at the end of the extension frame.

The most common use of ancillary data is Programme Associated Data (PAD), data intimately related to the audio signal.

Typical Programme Associated Data

Typical examples of Programme Associated Data are the indication of music or speech (Music/Speech flags), programme related text (ITTS [1]), Universal Product Code/European Article Number (UPC/EAN [1]), special commands to a receiver/decoder which are provided synchronously to the audio programme, and Dynamic Range Control information (DRC). The DRC signal can optionally be used in the receiver to compress the dynamic range of the audio signal. It is an example of data that would become useless if delayed by queuing in a shared data service.

All functions provided by PAD and the length of the PAD fields are user definable. Therefore, it is not mandatory to send any information in the PAD field.

Dynamic Range Control

It has long been recognised that for many listeners in less than perfect environments it is impractical to make use of the full dynamic range which may be carried by a digital audio signal. Methods of carrying data in a coded bitstream in order to allow a limitation of the reproduced dynamic range of an audio signal have already been defined for ISO/IEC 11172-3 Layer II for Digital Audio Broadcasting (DAB [1]).

With the help of Dynamic Range Control (DRC), the receiver may reduce the dynamic range of the audio signal. The purpose is to adapt the dynamic range either to listening in a noisy environment or to an audio source whose dynamic range is too high for domestic listening (typically movie sound tracks). ISO/IEC 13818-3 decoders may, optionally, provide such a compression of the audio dynamic range, using a process which derives its control information either from the audio signal itself or from a suitable DRC signal transmitted in the ancillary data field. It is an option for the programme provider to transmit a DRC signal; it is not a requirement of the system.

In the DAB specification, part of the extra data carried with the audio (the "F-PAD") can, amongst other things, carry a six-bit DRC data field which is used to vary the gain applied to the reproduced audio signal. In current proposals [1], when Dynamic Range Control is signalled, the six bits represent a gain to be applied to the recovered audio in the range of 0 to 15.75 dB in steps of 0.25 dB. The step size of 0.25 dB has been found by experimentation to be the largest acceptable to provide smooth gain control during slow gain changes in classical music. The upper limit of 15.75 dB as the maximum signalled gain is thought to be adequate to allow suitable dynamic range reduction under not too difficult listening conditions. If a further reduction in dynamic range is required due to extremely adverse conditions, the signalled values can be scaled, and the resulting increase in step size is unlikely to be audible. If Dynamic Range Control data is being sent, the six-bit value is required to be sent once every 24 ms. This represents a bit usage of 250 bit/s (not counting the overhead required to signal the use of the DRC data).
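
The figures in this paragraph follow from simple arithmetic, sketched below. The mapping of the six-bit value to a gain (value × 0.25 dB) and the 24 ms repetition period are taken from the paragraph; the function names are hypothetical, and the conversion to a linear amplitude factor uses the usual 20·log10 convention, which is an assumption about how a receiver would apply the gain.

```python
def drc_gain_db(code: int) -> float:
    """Gain in dB signalled by a six-bit DRC value (0..63), in 0.25 dB steps."""
    if not 0 <= code <= 63:
        raise ValueError("DRC value must fit in six bits")
    return code * 0.25


def drc_linear_factor(code: int) -> float:
    """Amplitude factor corresponding to the signalled gain."""
    return 10 ** (drc_gain_db(code) / 20)


def drc_bit_rate(bits: int = 6, period_ms: float = 24) -> float:
    """Net bit rate in bit/s of a field of `bits` bits sent every `period_ms`."""
    return bits * 1000 / period_ms
```

The largest six-bit value, 63, gives 15.75 dB, and six bits every 24 ms give 250 bit/s, matching the figures quoted above.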

Music/Speech Indication

These two flags indicate whether the transmitted sound consists of music or speech. The receiver may use this information to control any sound processing circuitry. One special combination of the flags signals that no indication is given. The Music/Speech indication typically requires two bits, repeated about 10 times per second.

Commands to a Receiver/Decoder

A channel can be provided to convey, synchronously to the audio signal, special commands to the receiver/decoder. Such commands may be used, for instance, to trigger the read-out of a picture from a buffer memory that was filled, asynchronously, in advance. This channel is able to carry a few bytes within 0.2 to 0.5 seconds, at irregular intervals.

Programme related text

To elucidate the transmitted audio signal (a song or a programme item), coded text may be carried together with the audio. This text may be made on-site by the programme provider, or it may be read from digital pre-recorded software and relayed more or less transparently; various sources can also be combined. The channel capacity required for text depends on how comprehensive and attractive the service is made.

In-house Information

Channels can be provided for both short, synchronous commands and long strings of asynchronous data. These commands are intended for internal use within a specific application only, and their meaning is defined by that application.

References:

[1]	European Telecommunication Standard pr ETS 300 401: 1995, Radio Broadcasting system; Digital Audio Broadcasting (DAB) to mobile, portable and fixed receivers.
[2]	ITU-T Recommendation J.52: 1995, Digital Transmission of High Quality Sound Programme Signals Using One, Two or Three 64 kbit/s Channels per Mono Signal (and up to Six per Stereo Signal).
