Network Working Group S. Pfeiffer
Internet-Draft C. Parker
Expires: December 7, 2003 CSIRO
June 8, 2003
Specification of the ANNODEX(TM) annotation format for
time-continuous bitstreams, Version 1.0
draft-pfeiffer-annodex-00
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 except that the right to
produce derivative works is not granted.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 7, 2003.
Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract
This specification defines a file format for annotating and indexing
time-continuous bitstreams for the World Wide Web. The format has
been named ANNODEX(TM) for annotating and indexing. The ANNODEX(TM)
format enables the specification of named anchor points in
time-continuous bitstreams together with textual annotations and
hyperlinks in URI [4] format. These anchor points are merged
time-synchronously with the time-continuous bitstreams on authoring a
file in ANNODEX(TM) format. The ultimate aim of the ANNODEX(TM)
format is to enable the integration of time-continuous bitstreams into
Pfeiffer & Parker Expires December 7, 2003 [Page 1]
Internet-Draft ANNODEX June 2003
the browsing and searching functionality of the World Wide Web.
At this point in time, the right to produce derivative works is not
granted to the IETF as the authors are uncertain about the necessity
to create a working group. The specification is not encumbered by
patents. The ANNODEX(TM) format is protected by a trademark to
prevent the use of the term "annodex" for any related but
non-conformant and therefore non-interoperable technology.
Table of Contents
1. Introduction
2. The architecture of a Continuous Media Web
3. Overview of the ANNODEX(TM) bitstream format
4. The annotation bitstream
4.1 The 'head' frame
4.2 The 'a' frame
5. Media encapsulation format
5.1 Media mapping for Ogg encapsulation
5.2 The format of the ANNODEX(TM) media mapping bos
5.3 The format of the media and annotation bitstream media mapping bos
6. Handling time in the ANNODEX(TM) format bitstream
7. MIME media type registration for 'application/annodex'
8. Security considerations
References
Authors' Addresses
A. Head frame DTD
B. Anchor frame DTD
C. Definitions of terms and abbreviations
D. Glossary of acronyms
E. Acknowledgements
Intellectual Property and Copyright Statements
1. Introduction
When searching the World Wide Web, continuous media files such as
audio and video files are still treated as "dark matter". It is not
possible to look inside such files, search for their content through
common text-based search engines, and actually directly access points
of interest inside them. The file can only be consumed in its
entirety until the point of interest is reached. In addition, such
files are "dead ends" in that by consuming their content the
hyperlinking functionality of the Web is left behind.
This document specifies a file format for interleaving XML markup
with time-continuous data, giving ANNODEX(TM) format media. The
ANNODEX(TM) format, together with the Continuous Media Markup Language
(CMML) [14] and the URI standard [4] extended by temporal URI
references [13], forms the basic technology to enable searching and
surfing of time-continuous data via existing Web infrastructure. The
ANNODEX(TM) format enables encapsulation of any type of streamable
time-continuous bitstream format and is thus independent of current or
future compression formats. The XML tags were chosen to be very
similar to XHTML to enable a simple transfer of knowledge for HTML
authors.
The file extension of ANNODEX(TM) files is ".anx". This document
also requests registration of the MIME type "application/annodex"
for ANNODEX(TM) format bitstreams.
The structure of this document is as follows: sections 2 and 3
describe the architecture of a Continuous Media Web based on
ANNODEX(TM) format media files and give an overview of the
ANNODEX(TM) media file format. The XML tags required to create
ANNODEX(TM) format media consist of two types of frames: header and
anchor frames. They form the annotation bitstream and are described
in section 4. Section 5 on media encapsulation then describes the
multiplexing format. The handling of the different time constructs in
ANNODEX(TM) format media is quite complex and is discussed in
detail in section 6. The MIME type registration and security
considerations form the final sections. Finally, the
appendices give the actual specifications of the "head" and "a" DTDs
as well as definitions and acronyms.
Please note that this document assumes that the reader has a fluent
working knowledge of XML [1], HTML [2], XHTML [3] and the World Wide
Web. Deep knowledge of the Ogg encapsulation format version 0 [10]
is also a prerequisite for understanding this specification. This
document is also a sister document to the specification of the
Continuous Media Markup Language (CMML Version 1.0) [14] for authoring
ANNODEX(TM) format bitstreams.
2. The architecture of a Continuous Media Web
As with Webpages, ANNODEX(TM) format bitstreams first have to be
authored and then published on a Server. Authoring includes the
creation of the media bitstream plus the creation of annotations
(i.e. textual data descriptions), indexes (i.e. anchor points) and
hyperlinks (i.e. URIs [4]) for fragments of the media data.
Annotations, indexes and hyperlinks are encoded in XML [1] according
to the DTDs given in the appendix and interleaved into the media
document to create ANNODEX(TM) format bitstreams in a
time-synchronous fashion. This procedure can be performed both on
files and live streams. The collection of ANNODEX(TM) format
bitstreams on the Internet is called the Continuous Media Web as it
builds a Web of time-continuous resources.
Distribution of ANNODEX(TM) format bitstreams happens via a network
protocol such as HTTP [5] or RTP/RTSP [6]. The basic process is the
following: The client dispatches a download or streaming request to
the server with a certain URI. The server resolves the URI and
starts packetising ANNODEX(TM) format bitstreams, taking into account
temporal URI fragment specifications.
In case of packet loss due to an unreliable transport, media data or
anchor data may get lost; this may be important to the application or
not. Both media and mark-up data are treated with the same
importance. If a user doesn't care whether the media data is
completely received, then the mark-ups will be regarded the same way.
Anchors are typically treated as state changes; if an anchor tag gets
lost, the next anchor tag will restore the proper state. We
envisage, however, that a client may require the current state
information, so there should be a protocol request for sending the
current state again. This will be delivered by the server by
inserting another copy of the currently active anchor into the
ANNODEX(TM) bitstream.
To access the Continuous Media Web, a client such as a conformant Web
browser is required. A client can link to an ANNODEX(TM) bitstream
via a URI. A URI can point to a temporal offset in the ANNODEX(TM)
bitstream using temporal URI fragment identifiers [13] or to a named
offset by using the id attribute of an anchor frame as a URI fragment
identifier. In this way, direct access to points of interest in the
media document is enabled. While playing back ANNODEX(TM) format
bitstreams, a user is offered hyperlinks (URIs) to other Web
resources which (in the author's view) are related to the currently
displayed media content.
A client may be a special player or a browser plugin. This
application must split an ANNODEX(TM) format bitstream into its
constituent header and anchor frames, and the media document. An
appropriate media decoder is required to decode the encapsulated media
document before it can be displayed. While playing back
the media document, the application displays the hyperlinks and the
annotations for the active anchor frames.
Search engines can include published ANNODEX(TM) format files into
their search repertoire by finding annotations in the anchor frames
in a standard way independent of the encoding and packetising format
of the media document. This allows any media format to be spidered.
In addition, the protocol should allow downloading only the CMML
mark-up from a published ANNODEX(TM) format file. This will stop
spiders from creating extensive network loads as they do not need to
download the media bitstream to gain the necessary information. It
also reduces the size of search archives, even for large amounts of
published ANNODEX(TM) format files, because a CMML file contains all
searchable annotations for the media fragments of its ANNODEX(TM)
format file.
3. Overview of the ANNODEX(TM) bitstream format
The format of ANNODEX(TM) bitstreams consists of a bitstream of
time-continuous data interspersed with structured XML mark-up of an
annotation bitstream. It is designed to be used both as a persistent
file format and as a streaming format. Any encoding format for
time-continuous data can be encapsulated in the ANNODEX(TM) format as
long as it is streamable and is based on a regular data sampling rate
(called granulerate). XML mark-up is inserted between media packets
at the synchronised point in time.
There are two types of XML mark-up that are inserted: a header frame
("head"), and an arbitrary number of anchor frames ("a"). There is
only one head at the start of an annotation bitstream. It contains
structured and unstructured meta data describing the complete
time-continuous data bitstream. In the simple case, an anchor frame
contains information on the fragment of media between the current
anchor and the next one (or the end of the document if none follows).
The following figure gives an example of such an ANNODEX(TM) format
bitstream and the temporal regions during which the "head" and "a"
frames are valid. It describes the simple case where anchor frames
don't overlap in time and there is only one media bitstream.
Annodexed media file (default annotation track only)
________________________________________________________________________
|    |   | |           | |             | |                  | |        |
|head|   |a|           |a|             |a|                  |a|        |
|    |   | |           | |             | |                  | |        |
________________________________________________________________________
         |-------------|
                       |---------------|
                                       |--------------------|
                                                            |----------|
|----------------------------------------------------------------------|
There is also a more complex scheme of authoring anchors for
ANNODEX(TM) format bitstreams. In it, anchors are grouped together
by giving them a type. Anchors of one type are not allowed to
overlap in time, but anchors of different type may overlap. This
enables the creation of different tracks of annotation. The
advantage is that it gives the author the choice to describe a
specific media file from different aspects, e.g. by giving different
language tracks. The client application then has the choice to
display only the default track or offer all existing tracks to the
user.
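The track rule above can be illustrated with a short Python sketch. It is not part of this specification: the "Anchor" tuple, its field names and the "active_anchors" helper are hypothetical, and times are assumed to be given in seconds.

```python
from typing import Dict, List, NamedTuple


class Anchor(NamedTuple):
    # Hypothetical in-memory form of an "a" frame; not defined by this document.
    track: str    # annotation track (anchor type) the anchor belongs to
    start: float  # insertion time in seconds
    text: str     # annotation text


def active_anchors(anchors: List[Anchor], t: float) -> Dict[str, Anchor]:
    """Return, per track, the anchor that is active at time t.

    An anchor stays active until the next anchor on the same track
    starts, so the latest anchor with start <= t wins on each track.
    """
    active: Dict[str, Anchor] = {}
    for anchor in sorted(anchors, key=lambda a: a.start):
        if anchor.start <= t:
            active[anchor.track] = anchor
    return active


anchors = [
    Anchor("default", 0.0, "opening"),
    Anchor("german", 5.0, "Eroeffnung"),
    Anchor("default", 12.0, "no spoon"),
]
current = active_anchors(anchors, 13.0)
```

A client that shows only the default track would look up current["default"]; a client that offers all tracks would iterate over the whole mapping.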
4. The annotation bitstream
The annotation bitstream consists of a "head" frame and an arbitrary
number of "a" frames. These tags are briefly explained in this
section.
4.1 The 'head' frame
A "head" frame is an XML document that contains information about the
complete ANNODEX(TM) format bitstream. It is enclosed in "head"
tags. The DTD for the "head" frame can be found at
http://www.annodex.net/DTD/anxhead_1_0.dtd and can be used for
validation of a "head" frame.
An example for a "head" frame is the following:
The Matrix
The XML declaration and the reference to the DTD make the "head"
frame a proper XML document. The DTD of the "head" frame is given in
the appendix and technically fully specifies the "head" frame. The
semantic meaning of each of the tags and attributes is the same as in
the CMML [14]. Please refer to the CMML specification document for
details.
4.2 The 'a' frame
An anchor frame is an XML document that contains information about a
fragment of the encapsulated time-continuous bitstream. It is active
from the time instant in the time-continuous bitstream at which it is
inserted until the time instant at which it is deactivated either
through another anchor frame (on the same annotation track) or
through the end of the file. It is enclosed in "a" tags. The DTD
for the "a" frame can be found at
http://www.annodex.net/DTD/anxa_1_0.dtd and can be used for validation
of an "a" frame.
An example for an "a" frame is the following:
There is no spoon: Neo is waiting to see the Oracle in a room
full of children doing seemingly impossible things. One is making
spoons bend through telekinesis. Neo tries to do it himself, but
fails. Spoon boy: "Do not try and bend the spoon that's impossible,
instead only try to realize the truth." Neo: "What truth?" Spoon
boy: "There is no spoon." Neo: "There is no spoon?" Spoon boy: "Then
you'll see that it is not the spoon that bends, it is only
yourself." Neo tries again...
Den Löffel gibt es nicht: Neo entdeckt beim Besuch
des Orakels wie unwirklich seine Welt ist. Beim Versuch, einen
Löffel durch Telekinese zu verbiegen, bekommt er von dem Kind den
Rat: "Den Löffel gibt es nicht."
Unlike the "head" frame, the anchor frame contained within an
ANNODEX(TM) format media file contains neither an XML declaration nor
a reference to the DTD. This information can be derived from
the information stored in the "head" frame and would create
unnecessary overhead if included in every anchor frame. It is,
however, necessary when extracting the anchor frame into a proper XML
document. The XML version in use can be taken from the related
"head" frame; the DTD reference is the same as the one for the "head"
frame with every occurrence of "head" replaced by "a". For example,
the DTD reference http://www.annodex.net/DTD/anxhead_1_0.dtd of the
"head" frame becomes http://www.annodex.net/DTD/anxa_1_0.dtd for the
anchor frame.
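The derivation of the prolog can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper name "standalone_anchor" is hypothetical, and the head and anchor frame strings are invented sample markup, not the examples of this document. Only the two DTD locations are taken from the text.

```python
import re


def standalone_anchor(head_frame: str, anchor_frame: str) -> str:
    """Prefix an anchor frame with a prolog derived from the head frame."""
    # Reuse the XML declaration of the head frame, if it has one.
    decl = re.search(r"<\?xml[^>]*\?>", head_frame)
    # Reuse the DOCTYPE of the head frame with "head" replaced by "a".
    doctype = re.search(r"<!DOCTYPE[^>]*>", head_frame)
    prolog = []
    if decl:
        prolog.append(decl.group(0))
    if doctype:
        prolog.append(doctype.group(0).replace("head", "a"))
    return "\n".join(prolog + [anchor_frame])


# Hypothetical sample frames, used for illustration only.
head = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE head SYSTEM "http://www.annodex.net/DTD/anxhead_1_0.dtd">\n'
    "<head><title>The Matrix</title></head>"
)
doc = standalone_anchor(head, '<a id="no_spoon">There is no spoon</a>')
```

The plain string replacement mirrors the rule as stated; a production tool would more likely parse the prolog than rely on text substitution.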
Although the "a" element can be considered the root element of an
anchor frame described as an XML document, it also does not contain an
xmlns attribute. The reason is that the same namespace as in the
associated "head" frame is valid for all "a" frames in an ANNODEX(TM)
format bitstream, and a repetition would only waste bandwidth
and be a source of errors. Similarly, the default
language and directionality specified in the "head" frame are also
valid for the anchor frames.
The DTD of the anchor frame is given in the appendix and technically
fully specifies the frame format. The semantic meaning of each of
the tags and attributes is the same as in the CMML [14]. Please
refer to the CMML specification document for details.
5. Media encapsulation format
An ANNODEX(TM) format bitstream consists of XML markup in the
annotation bitstream interleaved with the related media frames of the
media bitstreams into a single bitstream.
It is not possible to use straight XML as encapsulation because XML
cannot enclose binary data except encoded as Unicode. The use of
Unicode would introduce too much overhead. Therefore, an
encapsulation format that could handle binary bitstreams and textual
frames was required.
The following list gives a summary of the requirements for the
ANNODEX(TM) format bitstream:
o framing for binary time-continuous data and XML.
o temporal synchronisation between time-continuous media bitstreams
and XML on interleaving.
o temporal resynchronisation after parsing error.
o detection of corruption.
o seeking landmarks for direct random access.
o streaming capability (i.e. the information required to parse and
decode a bitstream part is available at the time at which the
bitstream part is reached and does not come e.g. at the end of
the stream).
o small overhead.
o simple interleaving format with a track paradigm.
The Ogg encapsulation format version 0 [10] was chosen as the
encapsulation format for ANNODEX(TM) format bitstreams as it provides
for all the requirements and has proven reliable and stable.
5.1 Media mapping for Ogg encapsulation
This section specifies the way the Ogg media encapsulation framework
is used for creating ANNODEX(TM) format bitstreams. As such,
knowledge of the Ogg bitstream format as specified in the Ogg RFC
[10] is presumed. Please also refer to that document for
descriptions of the terms used in this document. This section
describes the specific media mapping that is used for ANNODEX(TM)
format bitstreams.
ANNODEX(TM) format bitstreams consist of one or more time-continuous
media bitstreams and an XML annotation bitstream concurrently
interleaved (in Ogg terms: multiplexed) into an Ogg bitstream.
Sequential multiplexing is allowed, but can only happen with complete
ANNODEX(TM) format bitstreams.
Every ANNODEX(TM) format bitstream consists of at least two logical
bitstreams: the ANNODEX(TM) media mapping bitstream and the
annotation bitstream that contains the "head" and the "a" tags. The
bos pages of these two (in order) are followed by the bos pages of
any number of media bitstreams. Then all the secondary header pages
of all the media bitstreams follow, including a packet of the
annotation bitstream containing the "head" tag as secondary header
for the annotation bitstream. Then, the bitstream data is
multiplexed in time-synchronous fashion.
5.2 The format of the ANNODEX(TM) media mapping bos
The ANNODEX(TM) media mapping bitstream consists only of one bos page
which contains information for the complete physical bitstream. The
bos page has the following format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'Annodex\0' |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version major | Version minor |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timebase numerator |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timebase denominator |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| UTC |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The LSb (least significant bit) comes first in the Bytes. Fields
with more than one byte length are encoded LSB (least significant
byte) first.
The fields in the ANNODEX(TM) bos page have the following meaning:
1. Identifier: an 8 Byte field that identifies this file to be of
ANNODEX(TM) format. It contains the magic numbers:
0x41 'A'
0x6e 'n'
0x6e 'n'
0x6f 'o'
0x64 'd'
0x65 'e'
0x78 'x'
0x00 '\0'
2. Version major: 2 Byte short integer number signifying the major
version number of the ANNODEX(TM) format bitstream. This
document specifies the major version 1.
3. Version minor: 2 Byte short integer number signifying the minor
version number of the ANNODEX(TM) format bitstream. This
document specifies the minor version 0.
4. Timebase numerator & denominator: an 8 Byte integer number each.
Together they represent the timebase of the ANNODEX(TM) format
bitstream given as a rational number. The denominator represents
the temporal resolution at which the timebase is given. E.g. a
numerator of 5 and a denominator of 1000 result in a timebase of
0.005 sec. This enables a very high temporal resolution without
having to store floating point numbers.
5. UTC: a 20 Byte string containing a UTC time in the form of
YYYYMMDDTHHMMSS.sssZ. It associates a calendar date and a
wall-clock time with the timebase. It is a zero length string if
not in use.
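The field list above can be turned into a small Python parsing sketch. It makes several assumptions that this document does not state: the input is the packet payload of the bos page (without the Ogg page header), the 8 Byte integers are signed, and an unused UTC field is filled with NUL bytes. The names "ANNODEX_BOS" and "parse_annodex_bos" are hypothetical.

```python
import struct
from fractions import Fraction

# Little-endian layout: 8-byte identifier, two 2-byte version numbers,
# two 8-byte timebase integers, 20-byte UTC string (48 bytes in total).
# Signedness is not stated in the text; signed 64-bit is assumed here.
ANNODEX_BOS = struct.Struct("<8sHHqq20s")


def parse_annodex_bos(payload: bytes) -> dict:
    """Decode the fixed-size fields of an ANNODEX bos packet payload."""
    ident, major, minor, num, den, utc = ANNODEX_BOS.unpack_from(payload)
    if ident != b"Annodex\x00":
        raise ValueError("not an ANNODEX bos packet")
    return {
        "version": (major, minor),
        # Timebase in seconds; raises ZeroDivisionError if den is 0.
        "timebase": Fraction(num, den),
        # Empty string if the field holds only NUL bytes (assumed encoding
        # of "not in use").
        "utc": utc.rstrip(b"\x00").decode("ascii"),
    }


# Build a sample payload with a timebase of 5/1000 sec and parse it again.
payload = ANNODEX_BOS.pack(b"Annodex\x00", 1, 0, 5, 1000,
                           b"20030608T120000.000Z")
info = parse_annodex_bos(payload)
```

Keeping the timebase as a Fraction preserves the exact rational value that the numerator and denominator fields are meant to carry.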
Please note: The possible temporal resolution of the timebase is on
the order of 2^-64. However the time formats in use for media that
are described in this document range from 1/24 to 1/60 for the
different smpte formats and to 1/1000 for npt. Thus, this resolution
is sufficient for any one of them. What's more, this resolution is
expected to accommodate any future needs of time resolution for any
other time format (and time-continuous sampled data).
5.3 The format of the media and annotation bitstream media mapping bos
The media and annotation bitstreams each start with one bos page
containing information required for the decoding of the bitstream.
After that, secondary header pages follow that contain information to
set up the decoder for the bitstream and other stream-specific
information. Then, the pages that contain the actual data follow.
For the annotation bitstream, the "head" frame is encapsulated into
one (or more) secondary header pages. The anchor frames represent
the actual data of the annotation bitstream.
The bos page of a media or annotation bitstream has the following
format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'AnxData\0' |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granule rate numerator |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granule rate denominator |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of secondary header pages |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of bytes used for mime type string |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Mime type string |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of bytes used for identifier string |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier string |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The LSb (least significant bit) comes first in the Bytes. Fields
with more than one byte length are encoded LSB (least significant
byte) first.
The fields in the bos page of a media or annotation bitstream have
the following meaning:
1. Identifier: an 8 Byte field that identifies this logical bitstream
as an input bitstream with encoded information. It contains
the magic numbers:
0x41 'A'
0x6e 'n'
0x78 'x'
0x44 'D'
0x61 'a'
0x74 't'
0x61 'a'
0x00 '\0'
2. Granule rate numerator & denominator: 8 Byte integer number each.
They represent the temporal resolution of the logical bitstream
in Hz given as a rational number in the same way as the timebase
attribute above.
3. Number of secondary header pages: a 4 Byte integer number that
contains the number of secondary header pages of that particular
logical bitstream following after this bos page.
4. Number of bytes used for mime type string: a 4 Byte integer
number giving the length of the Mime type field following
directly after this field and including the final NUL character.
5. Mime type string: a character sequence containing the mime [8]
type of the data encoded in this logical bitstream. E.g. for
the annotation bitstream it will be "text/cmml" as upon
extraction of the annotation bitstream from the ANNODEX(TM)
format bitstream a CMML document [14] results.
6. Number of bytes used for identifier string: a 4 Byte integer
number giving the length of the identifier string following
directly after this field, including the final NUL character.
7. Identifier string: a character sequence containing an XML ID
identifier text for this logical bitstream.
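The variable-length layout described in this list can be read as in the following Python sketch. It follows the field order of the list above and assumes that the input is the packet payload of the bos page (without the Ogg page header), that the 8 Byte integers are signed, and that the strings are ASCII. All function and variable names are hypothetical.

```python
import struct
from fractions import Fraction

# Fixed part: 8-byte identifier, two 8-byte granule rate integers and the
# 4-byte count of secondary header pages, all little-endian.
ANXDATA_FIXED = struct.Struct("<8sqqI")


def _read_string(payload: bytes, offset: int):
    """Read a 4-byte length followed by a NUL-terminated string."""
    (length,) = struct.unpack_from("<I", payload, offset)
    start = offset + 4
    raw = payload[start:start + length]
    return raw.rstrip(b"\x00").decode("ascii"), start + length


def parse_anxdata_bos(payload: bytes) -> dict:
    """Decode an AnxData bos packet payload into its fields."""
    ident, num, den, n_headers = ANXDATA_FIXED.unpack_from(payload)
    if ident != b"AnxData\x00":
        raise ValueError("not an AnxData bos packet")
    mime, offset = _read_string(payload, ANXDATA_FIXED.size)
    stream_id, _ = _read_string(payload, offset)
    return {"granulerate": Fraction(num, den), "secondary_headers": n_headers,
            "mime": mime, "id": stream_id}


def _packed(text: str) -> bytes:
    """Encode a string with its length prefix; the length includes the NUL."""
    data = text.encode("ascii") + b"\x00"
    return struct.pack("<I", len(data)) + data


# Sample payload for an annotation bitstream with a granulerate of 1000 Hz.
payload = (ANXDATA_FIXED.pack(b"AnxData\x00", 1000, 1, 1)
           + _packed("text/cmml") + _packed("annotations"))
info = parse_anxdata_bos(payload)
```

The stream identifier "annotations" is an invented value; the MIME type "text/cmml" is the one named for the annotation bitstream.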
6. Handling time in the ANNODEX(TM) format bitstream
ANNODEX(TM) format bitstreams inherently represent one timeline only,
where the different media and the annotation bitstream can be thought
of as content tracks on that timeline. All of these tracks relate to
the same timeline which starts at a certain time point and ends when
the last bitstream ends. An example bitstream can be seen in the
following figure. It consists of an ANNODEX(TM) format bitstream
that contains 4 media bitstreams and the annotation bitstream. In
the flat representation these will be multiplexed such that the data
frames of each of these bitstreams occur at the correct time.
The following bitstream is a conceptual representation of the time
intervals covered by the different logical bitstreams:
t0 tn
|------------------------------------------------------------------->|
----------------------------------------------------------------------
| a1 | a2 | a3 | a4 |
----------------------------------------------------------------------
annotation bitstream
---------------------------------------------
| audio bitstream 1 |
---------------------------------------------
--------------------------------------------------------------
| video bitstream 1 |
--------------------------------------------------------------
-----------------------------------------------------
| audio bitstream 2 |
-----------------------------------------------------
------------------------------
| video bitstream 2 |
------------------------------
The time point at which the ANNODEX(TM) format bitstream starts (t0
in the above example) is called the "timebase" and represents the
playback time in seconds associated with the beginning of the
ANNODEX(TM) format bitstream. This start time may but does not have
to be 0 - it can be any positive time offset. This time offset is
stored in the ANNODEX(TM) bitstream bos page.
Each one of the encapsulated media bitstreams and the annotation
bitstream have their own temporal resolution at which they can
provide data to cover the given timeline. This temporal resolution
is usually given through the sampling rate of the particular
bitstream. For example, a raw audio bitstream at CD quality is
sampled with a sampling rate of 44100 Hz. A video bitstream may be
sampled with a frame rate of 25 frames per second. This temporal
resolution is stored in the "granulerate" field of the bos page of
the bitstream.
The "granulerate" is used for the calculation of the time position
for which a data packet of the media bitstreams contains data. The
"granulepos" field in an Ogg page when divided by the "granulerate"
of that page's logical bitstream provides the time position that is
reached in that bitstream after decoding all data packets finished on
this page. E.g. if an audio bitstream has a granulerate of 44100
and starts at 0, then a granulepos of 88200 signifies that the
bitstream has reached a time position of 2 seconds after decoding
this page's packets.
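The calculation can be written down as a short Python sketch. The function name "page_time" is hypothetical, and adding the timebase to the result is an assumption based on the remark that the bitstream in the example starts at 0.

```python
from fractions import Fraction


def page_time(granulepos: int, granulerate: Fraction,
              timebase: Fraction = Fraction(0)) -> Fraction:
    """Time position in seconds reached after decoding a page's packets."""
    # Adding the timebase is an assumption; the text only gives an
    # example in which the bitstream starts at 0.
    return timebase + Fraction(granulepos) / granulerate


# Audio example from the text: granulerate 44100, granulepos 88200.
seconds = page_time(88200, Fraction(44100))
```

Using exact fractions avoids rounding errors for rates such as 30/1.001 that cannot be represented exactly as floating point numbers.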
The annotation bitstream's "granulerate" can be chosen arbitrarily by
the bitstream multiplexer. One option is to choose the least common
multiple of the granulerates of all the media bitstreams to gain at
least the resolution of the bitstreams. However, that resolution may
not be enough compared to the one that the author of anchors
asks for at insertion time. One solution is to accommodate all
possible time schemes of the anchors. Thus, a time resolution of the
least common multiple of the resolutions of all the npt and smpte time
schemes is another option.
The possible time schemes with their respective resolutions are:
o npt: 1000
o smpte-24: 24
o smpte-24-drop: 24/1.001 = 23.976 (approx. as per SMPTE)
o smpte-25: 25
o smpte-30: 30
o smpte-30-drop: 30/1.001 = 29.970 (approx. as per SMPTE)
o smpte-50: 50
o smpte-60: 60
o smpte-60-drop: 60/1.001 = 59.940 (approx. as per SMPTE)
To get to integer values, it is necessary to multiply all resolutions
by 1000 and then take the least common multiple: lcm(1000000, 24000,
23976, 25000, 30000, 29970, 50000, 60000, 59940) = 2997000000.
Dividing by 1000 again, the "granulerate" would therefore be 2997000.
This provides for a
temporal resolution on the order of 10^-6, accommodating for a mixed
use of all the above given time schemes.
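The least common multiple given above can be checked with a few lines of Python. The variable names are illustrative only; the input values are the scaled resolutions listed in the text.

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# Resolutions of the time schemes listed above, multiplied by 1000.
scaled = [1000000, 24000, 23976, 25000, 30000, 29970, 50000, 60000, 59940]
scaled_lcm = reduce(lcm, scaled)

# Undo the scaling by 1000 to obtain the granulerate.
granulerate = scaled_lcm // 1000
```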
The "granulepos" of the (set of) page(s) holding an anchor frame of
the annotation stream has to signify the start time of that anchor
frame. E.g. if the "granulerate" of the annotation bitstream is
1000, the "timebase" is 0, and an anchor is to be inserted at
npt=12.020, its "granulepos" will be 12020. Anchors can be repeated
in the ANNODEX(TM) format bitstream, which will be signified by
having the same "track" attribute and the same page_sequence_number
as the previous anchor frame.
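The granulepos of an anchor frame can be computed as in the following Python sketch. The function name "anchor_granulepos" is hypothetical, and subtracting the timebase from the npt time is an assumption that is consistent with the example above, where the timebase is 0.

```python
from fractions import Fraction


def anchor_granulepos(npt: str, granulerate: int, timebase: str = "0") -> int:
    """Granulepos for an anchor inserted at the given npt time in seconds."""
    # Fractions built from decimal strings avoid float rounding errors.
    offset = Fraction(npt) - Fraction(timebase)
    position = offset * granulerate
    if position.denominator != 1:
        raise ValueError("time is not representable at this granulerate")
    return int(position)


# Example from the text: granulerate 1000, timebase 0, npt=12.020.
granulepos = anchor_granulepos("12.020", 1000)
```

With the granulerate of 2997000 derived earlier, the same anchor would be placed at granulepos 36023940.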
7. MIME media type registration for 'application/annodex'
This section contains the registration information for the
'application/annodex' media type. Until this media type is
approved by the IANA, 'application/x-annodex' may be used.
To: ietf-types@iana.org
Subject: Registration of MIME media type application/annodex
MIME media type name: application
MIME subtype name: annodex
Required parameters: none
Optional parameters: none
Encoding Considerations: the ANNODEX(TM) format enables encapsulation
of any type of encoding format. The authoring software has to provide
the encoders, while the client software has to provide the matching
decoders.
Security considerations: see next section.
Interoperability considerations: the ANNODEX(TM) bitstream format is
a free specification that is independent of any media encoding
format. It is designed to provide interoperability with the existing
World Wide Web. Its specification is not patented and can be
implemented by third parties without patent considerations.
Additional information:
Magic numbers: "OggS" identifies an Ogg page, "Annodex" identifies
an Ogg page with an ANNODEX(TM) format bitstream, and "AnxData"
signifies an Ogg page with a media or annotation bitstream
File extension: .anx
Macintosh File Type Code: "ANDX"
Intended usage: COMMON
Fragment identifiers: Any named element, i.e. any element that
contains an "id" attribute, may be referenced through a fragment
identifier of a URI. The values of the "id" attributes of the anchor
tags are the most important ones, since they address the identified
"a" tags in the ANNODEX(TM) format bitstream. Also, the generic temporal
URI fragment addressing scheme [13] can be used as a fragment
identifier on ANNODEX(TM) format bitstreams and then relates to that
specific time offset in the ANNODEX(TM) format bitstream, calculated
with respect to the "timebase" of the ANNODEX(TM) bos page.
An example of a URI to a named media fragment is the following:
http://www.foo.bar/matrix.anx#no_spoon
Examples of URIs to temporal fragments are the following:
http://www.foo.bar/matrix.anx#@npt=21.4
http://www.foo.bar/matrix.anx#@smpte-25=01:00:21:10
http://www.foo.bar/matrix.anx#@utc=20030601T240000Z
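The two kinds of fragment identifiers shown in these examples can be
told apart with a few lines of Python. This is only a sketch based on
the example URIs above: it treats a fragment starting with "@" as a
temporal fragment of the form "@scheme=value" and anything else as a
named fragment. It does not validate the time value against the
temporal fragment syntax of [13]. The function name classify_fragment
is hypothetical.

```python
from urllib.parse import urlsplit


def classify_fragment(uri):
    """Return (kind, scheme, value) for the fragment part of a URI.

    kind is "none", "named" or "temporal". For temporal fragments,
    scheme is the time scheme (e.g. "npt") and value is the raw,
    unvalidated time specification.
    """
    fragment = urlsplit(uri).fragment
    if not fragment:
        return ("none", None, None)
    if fragment.startswith("@"):
        scheme, separator, value = fragment[1:].partition("=")
        if not separator or not scheme or not value:
            raise ValueError("malformed temporal fragment: %r" % fragment)
        return ("temporal", scheme, value)
    return ("named", None, fragment)
```

For instance, classify_fragment applied to
http://www.foo.bar/matrix.anx#@smpte-25=01:00:21:10 returns
("temporal", "smpte-25", "01:00:21:10"), while the "#no_spoon"
example returns ("named", None, "no_spoon").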
8. Security considerations
ANNODEX(TM) format bitstreams contain several multiplexed binary
media bitstreams and one XML annotation bitstream. There is no
generic encryption or signing mechanism provided for the complete
bitstream or any one of its parts. As the format of the encapsulated
media bitstreams is not prescribed and is identified through the
"mimetype" field in that bitstream's bos page, it is possible to
encrypt or sign a media bitstream and then mark it with a MIME type
that signifies the encryption. It is up to the applications that use
this bitstream to provide an appropriate codec to handle such
bitstreams.
As ANNODEX(TM) format bitstreams contain binary media bitstreams, it
is possible to include executable content in them. This can be an
issue with applications that decode these bitstreams, especially when
they are used in a network scenario. Such applications have to
handle manipulated bitstreams correctly, guarding against buffer
overflows and similar faults.
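As an illustration of such defensive handling, the following Python
sketch reads one Ogg page and checks every length field against the
available data before using it. The page header layout is the one
defined for Ogg in [10]; the function name parse_ogg_page and the
returned dictionary are hypothetical. The sketch does not verify the
page checksum, which a real decoder would also have to do.

```python
import struct

# Fixed part of an Ogg page header: capture pattern, version, header
# type, granule position, serial number, page sequence number,
# checksum, number of segments (27 bytes, little-endian).
OGG_HEADER = struct.Struct("<4sBBqIIIB")


def parse_ogg_page(buf, offset=0):
    """Parse one Ogg page from buf, rejecting truncated or bogus input."""
    if offset < 0 or len(buf) - offset < OGG_HEADER.size:
        raise ValueError("truncated page header")
    (capture, version, header_type, granulepos,
     serial, sequence, checksum, segments) = OGG_HEADER.unpack_from(buf, offset)
    if capture != b"OggS":
        raise ValueError("missing OggS capture pattern")
    if version != 0:
        raise ValueError("unsupported Ogg version")
    table_start = offset + OGG_HEADER.size
    table_end = table_start + segments
    if table_end > len(buf):
        raise ValueError("truncated segment table")
    # The body length is the sum of the lacing values in the segment table.
    body_end = table_end + sum(buf[table_start:table_end])
    if body_end > len(buf):
        raise ValueError("page body exceeds available data")
    return {
        "granulepos": granulepos,
        "serial": serial,
        "sequence": sequence,
        "body": bytes(buf[table_end:body_end]),
        "next_offset": body_end,
    }
```

The point of the design is that no slice or offset derived from the
bitstream is trusted until it has been compared with the actual
buffer length, so a manipulated length field leads to a clean error
instead of a read past the end of the data.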
References
[1] World Wide Web Consortium, "Extensible Markup Language (XML)
1.0", W3C XML, October 2000.
[2] World Wide Web Consortium, "HTML 4.01 Specification", W3C HTML,
December 1999.
[3] World Wide Web Consortium, "XHTML(TM) 1.0 The Extensible
HyperText Markup Language", W3C XHTML, January 2000.
[4] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax", RFC 2396, August
1998.
[5] Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L.,
Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol --
HTTP/1.1", RFC 2616, June 1999.
[6] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming
Protocol (RTSP)", RFC 2326, April 1998.
[7] Alvestrand, H., "Tags for the Identification of Languages", RFC
1766, March 1995.
[8] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046, November
1996.
[9] Whitehead, E. and M. Murata, "XML Media Types", RFC 2376, July
1998.
[10] Pfeiffer, S., "The Ogg encapsulation format version 0", RFC
3533, May 2003.
[11] The Society of Motion Picture and Television Engineers, "SMPTE
STANDARD for Television, Audio and Film - Time and Control
Code", ANSI 12M-1999, September 1999.
[12] ISO, TC154., "Data elements and interchange formats --
Information interchange -- Representation of dates and times",
ISO 8601, 2000.
[13] Pfeiffer, S. and C. Parker, "Syntax of temporal URI fragment
specifications (work in progress)", I-D
draft-pfeiffer-temporal-fragments-00.txt, February 2003.
[14] Pfeiffer, S. and C. Parker, "Specification of the Continuous
Media Markup Language (CMML), Version 1.0 (work in progress)",
I-D draft-pfeiffer-cmml-00.txt, June 2003.
Authors' Addresses
Silvia Pfeiffer
Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
Locked Bag 17
North Ryde, NSW 2113
Australia
Phone: +61 2 9325 3141
EMail: Silvia.Pfeiffer@csiro.au
URI: http://www.cmis.csiro.au/Silvia.Pfeiffer/
Conrad D. Parker
Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
Locked Bag 17
North Ryde, NSW 2113
Australia
Phone: +61 2 9325 3133
EMail: Conrad.Parker@csiro.au
URI: http://www.cmis.csiro.au/Conrad.Parker/
Appendix A. Head frame DTD
Appendix B. Anchor frame DTD
Appendix C. Definitions of terms and abbreviations
Anchor frame: XML data containing information on a fragment of a
time-continuous bitstream.
Fragment: a subpart of a media document covering some temporal
interval.
Mark-up: XML tags and their content used to describe a media
document.
ANNODEX(TM) media: encapsulated time-continuous bitstream with Head
and Anchor frames.
Annotating: the task of giving textual descriptions to fragments of
media documents.
Indexing: the task of identifying index points for media documents or
fragments thereof.
Hyperlinking: the task of linking from one Web resource to another.
If a link has an offset into the resource, this is sometimes
called deep hyperlinking.
Head frame: XML data containing information on a media file in
ANNODEX(TM) format.
media packet: a block of digital data that represents a temporal
subpart of a stream of continuous media. Media packets of one
continuous media file do not overlap in time.
bitstream: a sequence of time-continuous data.
Appendix D. Glossary of acronyms
CMML: Continuous Media Markup Language.
DTD: Document Type Definition.
XML: eXtensible Markup Language.
CMWeb: Continuous Media Web.
Web: World Wide Web.
URI: Uniform Resource Identifier.
Appendix E. Acknowledgements
The authors gratefully acknowledge the contributions of Andre Pang,
Andrew Nesbit, and Simon Lai in developing this standard.
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
intellectual property or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; neither does it represent that it
has made any effort to identify any such rights. Information on the
IETF's procedures with respect to rights in standards-track and
standards-related documentation can be found in BCP-11. Copies of
claims of rights made available for publication and any assurances of
licenses to be made available, or the result of an attempt made to
obtain a general license or permission for the use of such
proprietary rights by implementors or users of this specification can
be obtained from the IETF Secretariat.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights which may cover technology that may be required to practice
this standard. Please address the information to the IETF Executive
Director.
Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.