An introduction to networked media concepts and services

This page describes the basic technical issues behind networked media services (particularly audio/video or A/V services delivered over computer networks).

The presentation begins with descriptions of activities which may be enhanced by networked media services, and then moves on to describe the basic and specific services that may be of use in more detail. These descriptions are followed by a short discussion of a simplified approach to strategic planning.

Possible audiences include

Audio/video services can be of value to many members of the broad educational community, as well as to participants outside that community who wish to interact with individuals or groups within that community:

Activities to be facilitated and/or enhanced:

Technology in general, and audio/video technology in particular, can play, and has played, a role in the following general activities related to the educational mission:

These activities are simple rubrics for collections of component activities that vary greatly from environment to environment and practitioner to practitioner. Often, however, they entail the same or similar, more specific component activities. For example, "teaching" and "learning" usually include a "collaboration" component along with various data and lecture material presentation components, so that it may be more useful to categorize community activities in the following manner:

Services to be provided

Audio/video technology can make three basic contributions to the participant experience in these activities. First, it can enable participation across distance. Second, it can enhance participant experience by adding multiple supportive communications media. Third, it can enable "asynchronous" interaction.

For example, audio/video conferencing can bring teachers and students together across vast distances, audio/video demonstrations are often used to enhance lectures, and lectures may easily be recorded for viewing at a later date.

The biggest problem facing educators and technologists is not whether to use technology but which technologies, how, when and in what combinations. Each of the activities above may derive some benefits from the application of various technologies, but participants must determine which tools to employ and in what combinations and modes.

The next sections will describe the very basic audio/visual tools available to educators and then discuss combinations of these basic tools and finally specific implementations of such combinations. Note that it will be assumed that non-AV tools will be integrated with these video tools as needed. For example, the Web, middleware facilities, desktop applications, phones and faxes may all be used in concert with AV tools, but will be treated as ancillary technologies in this discussion.

Basic audio- and video-related capabilities:

Although there exists a plethora of products designed to facilitate the activities listed earlier, these products all represent different combinations of a few basic capabilities expressed in slightly different ways and with differing degrees of quality.

For example, here is one sequence of events that can occur while moving a video signal from one point to another. To begin with, a scene is shot using a video camera, which converts it to an S-Video signal stream. That stream is converted to digital form (or "captured") in a compressed format such as H.261 (among many others), conveyed across a network embedded in some transport protocol such as RTP (the Real-time Transport Protocol), and converted back into S-Video at the receiving end, where it can be displayed by a standard video projector. The process of conversion to a compressed format (such as H.261) usually involves some loss of image quality to accommodate limited disk storage and/or limited bandwidth during transport.
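As a rough illustration of the transport step in this sequence, the sketch below wraps one chunk of already-encoded video in the 12-byte fixed RTP header. Payload type 31 and the 90 kHz timestamp clock are the static assignments for H.261 in the RTP audio/video profile; the function names, the SSRC value, and the placeholder payload are assumptions made for illustration, and the H.261-specific payload header that a real sender adds is omitted.

```python
import struct

RTP_VERSION = 2
H261_PAYLOAD_TYPE = 31    # static payload type for H.261 in the RTP A/V profile
VIDEO_CLOCK_HZ = 90000    # RTP timestamp clock rate used for video


def build_rtp_packet(payload, seq, timestamp, ssrc, marker=False):
    """Prefix payload with a minimal RTP fixed header (no CSRC list, no extension)."""
    byte0 = RTP_VERSION << 6                          # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | H261_PAYLOAD_TYPE    # marker bit + payload type
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_header(packet):
    """Decode the fixed header fields written by build_rtp_packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical usage: one frame's worth of encoded bytes, sent as a single packet.
# At 30 frames per second the timestamp advances by 90000 / 30 = 3000 per frame.
packet = build_rtp_packet(b"encoded-frame-bytes", seq=1, timestamp=3000,
                          ssrc=0x1234ABCD, marker=True)
```

The receiver uses the sequence number to detect lost or reordered packets and the timestamp to pace playback, which is why both travel with every packet.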

In general, such processes provide renditions of the original sounds and images with varying degrees of verisimilitude, which correspond to variations in product "quality." For example, an NTSC camera scans a video image as 525 separate scan lines, which can be captured at relatively low or high video "resolutions," usually specified as a rectangular collection of picture elements, or "pixels." The NTSC image is considered to have a quality roughly equivalent to 680 by 483 pixels, whereas one popular video conferencing format (H.261) converts NTSC images into collections of 352 by 288 pixels (CIF, the Common Intermediate Format) or 176 by 144 pixels (QCIF, or quarter CIF). MPEG-1 streams typically encode CIF images considered similar to the quality yielded by VHS recording technology, but can produce an image of up to 4095 by 4095 pixels. MPEG-2 is typically used to produce 720 by 576 images, but can be used to create 1920 by 1080 images when fed a High Definition input video signal (2200 horizontal samples by 1125 scan lines).

H.261, as used for video conferencing, typically generates 64kbps to 1.5Mbps of traffic. MPEG-1 streams use up to 1.5Mbps. MPEG-2 is used by cable and satellite systems, generating 4 to 9Mbps for SDTV and 19.2Mbps for HDTV over cable, and is also used for DVDs, which are stored at 720 by 576.
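To give a feel for how much compression these bit rates imply, the following sketch compares the uncompressed rate of a CIF or QCIF picture with the stream rates quoted above. It assumes 12 bits per pixel (8-bit samples with 4:2:0 chroma subsampling) and 30 frames per second; both figures, and the function names, are assumptions made for illustration rather than values taken from this page.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=12):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


def compression_ratio(width, height, fps, target_bps, bits_per_pixel=12):
    """How many times smaller the target stream is than the raw video."""
    return raw_bitrate_bps(width, height, fps, bits_per_pixel) / target_bps


cif_raw = raw_bitrate_bps(352, 288, 30)                  # raw CIF at 30 fps
cif_ratio = compression_ratio(352, 288, 30, 1_500_000)   # CIF squeezed into 1.5 Mbps
qcif_ratio = compression_ratio(176, 144, 30, 64_000)     # QCIF squeezed into 64 kbps

print(f"raw CIF: {cif_raw / 1e6:.1f} Mbps")
print(f"CIF at 1.5 Mbps: about {cif_ratio:.0f}:1")
print(f"QCIF at 64 kbps: about {qcif_ratio:.0f}:1")
```

Under these assumptions raw CIF video runs at about 36.5 Mbps, so fitting it into 1.5 Mbps implies roughly 24:1 compression, and fitting QCIF into 64 kbps implies well over 100:1, which is why low-rate conferencing streams show visible quality loss.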

In addition, image capture can occur at differing frame rates (3 or 4 frames per second up to 32 or so), and frames are sometimes lost during transport, all of which can significantly reduce video quality.

Ancillary services

To help users make contact with one another, a variety of whitepages schemes have been developed. For example, the H.323 suite includes directory capabilities that enable users to "call" one another by clicking through a contact list.

To help users find each other in a mobile environment with multiple communications devices under their control, the Session Initiation Protocol (SIP) has been designed to keep track of potential endpoint devices and, along with the H.350 LDAP schema, even facilitate device configuration and password management. In effect SIP can act like a standardized "buddy list," and at least one observer (Larry Amiott of Northwestern) has suggested that managing users' availability for communication may itself be a "killer app," even without managing communication devices.
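The device-tracking idea can be sketched as a small lookup table. The toy class below mimics only the bookkeeping of a SIP registrar's location service: each user address maps to the set of devices currently registered for it, each with an expiry time. It does not parse or send SIP messages, and the class name, method names, and example.edu addresses are all hypothetical.

```python
import time


class LocationService:
    """Toy stand-in for the location database behind a SIP registrar."""

    def __init__(self):
        # user address ("address of record") -> {device contact URI: expiry time}
        self._bindings = {}

    def register(self, user, contact, expires_s=3600, now=None):
        """Record that `user` can currently be reached at `contact`."""
        now = time.time() if now is None else now
        self._bindings.setdefault(user, {})[contact] = now + expires_s

    def unregister(self, user, contact):
        """Remove one device binding, if present."""
        self._bindings.get(user, {}).pop(contact, None)

    def lookup(self, user, now=None):
        """Return the contacts for `user` whose registrations have not expired."""
        now = time.time() if now is None else now
        bindings = self._bindings.get(user, {})
        return sorted(c for c, expiry in bindings.items() if expiry > now)


# Hypothetical usage: one user reachable on two devices.
service = LocationService()
service.register("sip:alice@example.edu", "sip:alice@office-phone.example.edu")
service.register("sip:alice@example.edu", "sip:alice@laptop.example.edu")
devices = service.lookup("sip:alice@example.edu")
```

An empty lookup result is itself useful information: it is the "not currently available" state that a buddy list would display.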

Others have suggested that truly effective SIP-like services could finally propel both VoIP and video conferencing into common use.

Instant messaging offers both an alternative channel for communications while attending a video conference, and also a "back channel" for dealing with technical difficulties. It can also assist video conferees during cross cultural and/or multilingual exchanges.

In addition, several groups are trying to develop "middleware" products that will support user authentication that can be useful in many video applications. The Internet2 MiddleWare Initiative is the most relevant to Kansas universities, but other initiatives will be necessary for different types of organizations. Authentication and security requirements for video services appear to be somewhat different from other applications, mostly due to the (possibly multi-party) real-time nature of video and VoIP.

Other ancillary services include whiteboards, desktop sharing, and shared virtual- or mediated-reality applications.

Combinations of basic technologies

Some of these basic services may be requested and delivered as is. For example, some groups (such as commercial sports or news networks) affiliated with campus offices may wish to transport analog audio/video signals over the campus fiber infrastructure without any other processing. In addition, some legacy video equipment still relies on such transport.

Video conferencing

More often, however, these basic services are deployed in a wide variety of combinations. For example, so-called "video-conferencing" is actually a bi-directional combination of synchronized audio- and video-capture and subsequent streaming to remote sites where it is displayed in an appropriate form. In addition, audio and video employed for video conferencing will usually be augmented by text-based chat, whiteboard sharing, web document sharing, and desktop application sharing (PowerPoint, spreadsheet, etc.).

Furthermore, different combinations of these basic tools will be integrated to different degrees. Some products will include all of these tools as a collection of completely free-standing desktop applications, for example, while others (such as ClickToMeet Express) may integrate them as a single application, providing a (usually) easier-to-use interface that readily exploits synergies resulting from the combination of tools. Such products are sometimes referred to as "rich presence" solutions, though there are certainly degrees of "richness" and different combinations of tools.

Another interpretation of integrated communications tools was demonstrated at a recent Internet2 Joint Techs conference, which included "SIP-based voice, video, and instant messaging over wireless fidelity (WiFi), and SIP voice conferencing - all in the context of rich presence derived from WiFi location service and enterprise calendaring." Participants were able to place SIP voice calls to any user at a SIP.edu-enabled institution (http://voip.internet2.edu/SIP.edu/) and were able to eavesdrop on meeting sessions by calling special "room buddies."

This demonstration involved contributions from the Internet Real-Time Lab (IRT) in the Department of Computer Science at Columbia University, the Internet2 Presence and Integrated Communications Working Group, and several other groups. For more information on SIP see this.

The use of multiple devices in a dynamic environment like a conference venue also illustrates the need for databases listing devices assigned to each user, as well as their operating characteristics and configurations, including authentication information and preferences. The ViDe Consortium in cooperation with the Southern Universities Regional Association (SURA) has recently spearheaded the development of the H.350 LDAP schema which provides a standardized storage format for such data. Used with SIP, H.350 should make it possible for device manufacturers to design devices that will interoperate seamlessly in demanding mobile environments. For more information on H.350 see this.

These issues hint at the need for centralized organizational support for video conferencing. Users can get good mileage out of simple video conferencing tools such as NetMeeting and Microsoft Messenger, but campus administration must consider

Certain kinds of integration probably require central support because they require special hardware, centralized databases, and/or other facilities likely to be used by a large proportion of the user community. For example, probably require some sort of centralized organizational support.

The ClickToMeet Express server also demonstrates a different kind of integration. ClickToMeet Express is a web-based service that allows (Wintel) users to simply start their desktop browsers and connect to a ClickToMeet server which downloads the video conferencing client to their browsers in the form of a web "plug-in". No software beyond the browser is required to use this tool.

Although tool integration is more readily apparent on the desktop side of AV, it occurs on the server side as well. For example, there exist tools for conducting video conferences that record designated portions of a conference (or the entire exchange), automatically archive the recording, and then stream it to users on demand after the event. This kind of capability could be used to provide review materials to students who attend such a conference, or to "include" students who miss the original.

Approaches to integration constitute much of the thrust of research and innovation in the area of audio-video product development, deployment and use. They also tend to inject some confusion into the choice of tools, as users are tempted to apply tools specialized for one purpose to another, or as purchasers attempt to decide among competing products with ill-defined sets of goals or expectations.

One way this plays out in the video conferencing arena is that video conferencing products are sometimes designed around collaboration groups of particular sizes, distributions, and levels of technical ability. For example, most desktop tools are designed for face-to-face meetings. On the other hand, one complex system, the Access Grid, developed at Argonne National Laboratory, was explicitly designed to bring several (up to 10 or so) small groups together in an environment minimally disruptive to the participant experience, but amply supported by technical staff who could handle the details.

Video streaming

Products in the video streaming arena also incorporate basic capabilities in different combinations and/or in different ways. The major streaming services utilize different formats for stream encoding and delivery, as well as different systems for embedding video content within text documents. One example is the Real Media approach (SureStream), which provides streams that present the same material simultaneously at different levels of quality that can be selected dynamically by receiving clients.
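The client-side choice among simultaneous streams can be sketched in a few lines. This is a generic illustration of picking the best rendition that fits the measured bandwidth, not a description of how SureStream itself makes the decision; the bit rates, the headroom factor, and the function name are assumptions.

```python
def pick_stream(available_kbps, measured_kbps, headroom=0.8):
    """Choose the highest-rate stream that fits within a fraction of measured bandwidth.

    Falls back to the lowest-rate stream when none fits.
    """
    usable = measured_kbps * headroom
    candidates = [rate for rate in available_kbps if rate <= usable]
    return max(candidates) if candidates else min(available_kbps)


# Hypothetical set of renditions of the same material, in kbps.
renditions = [56, 128, 300, 700]

choice_on_dsl = pick_stream(renditions, measured_kbps=400)    # picks 300
choice_on_modem = pick_stream(renditions, measured_kbps=40)   # falls back to 56
```

A client would repeat this choice as its bandwidth measurements change, which is what makes the selection dynamic.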

Video streaming is used most frequently to display events of interest such as athletic events or lectures that have a relatively short duration. However, streaming can also be used to drive "video signage," when a continuous stream is used to feed one or more video displays located in high-traffic areas where they are encountered by numerous passers-by.

Video archiving and Video on Demand (VoD)

As an alternative (or aid) to video streaming, captured video material may be stored after capture rather than streamed to other systems. It may then be provided "on demand" to streaming clients who are responsible for display.

This combination of basic functionalities provides an opportunity to develop a variety of asynchronous services, particularly systems for embedding video content within text documents, but also including various kinds of navigational aids, associated descriptions of content ("metadata"), synchronized closed captioning, scene analysis, etc.

Note that video streaming and archiving may require ancillary services as part of the delivery infrastructure. The distribution of commercial video content over IP connections, for example, must usually include tools for discovery, authentication, access control and accounting.

Also, some streaming systems might employ computers to display content, while others allow users to view content on standard televisions by using a set-top box to convert incoming signals to a form suitable for televisions. Evaluating the alternatives and choosing among them can be a daunting task.

Examples of VoD sites delivering educational content include Georgia Public Television (900 hours of educational content recorded at 384Kbps) and ResearchTV (with 2000 hours of MPEG-2 content).
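A quick calculation shows what collections of this size mean for storage. The sketch below treats 1 Kbps as 1000 bits per second and 1 GB as 10^9 bytes and ignores container overhead. The 384Kbps figure is the one quoted above; the 4Mbps rate used for the MPEG-2 case is purely an assumption (the low end of the SDTV range mentioned earlier), since the actual encoding rate of the ResearchTV material is not given here.

```python
def archive_size_gb(hours, bitrate_kbps):
    """Approximate storage for `hours` of video encoded at `bitrate_kbps`."""
    total_bits = hours * 3600 * bitrate_kbps * 1000
    return total_bits / 8 / 1e9


# 900 hours at 384 Kbps, as quoted for the Georgia collection.
low_rate_gb = archive_size_gb(900, 384)

# 2000 hours of MPEG-2 at an ASSUMED 4 Mbps (rate not stated in the text).
mpeg2_gb = archive_size_gb(2000, 4000)

print(f"900 h at 384 Kbps: about {low_rate_gb:.0f} GB")
print(f"2000 h at an assumed 4 Mbps: about {mpeg2_gb:.0f} GB")
```

The first case comes to roughly 156 GB, while the second, under the assumed rate, comes to about 3600 GB, which illustrates how strongly the choice of encoding rate drives archive cost.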

There is at least one open source initiative in this area. The SURF/net Video Portal system supports a unified video database serving multiple streaming servers which, in turn, serve user client requests.

Image analysis

Image analysis offers interesting future applications. Video streams have been used for facial recognition, product quality assurance, automobile license identification, etc. At KU, work has been done to build tools that split video streams into component scenes that can be indexed for later asynchronous access.
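One simple way to split a stream into scenes is to look for abrupt changes between consecutive frames. The sketch below illustrates that general idea only; it is not the method used in the KU work, and the frame representation (a flat list of grayscale values per frame), the threshold, and the function names are all assumptions.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two equal-sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def find_scene_cuts(frames, threshold=40.0):
    """Return the indices of frames that start a new scene."""
    return [i for i in range(1, len(frames))
            if mean_abs_diff(frames[i - 1], frames[i]) > threshold]


def split_into_scenes(frames, threshold=40.0):
    """Return (start, end) index pairs, end exclusive, one pair per scene."""
    boundaries = [0] + find_scene_cuts(frames, threshold) + [len(frames)]
    return [(start, end) for start, end in zip(boundaries, boundaries[1:])
            if end > start]


# Hypothetical four-frame clip: two dark frames followed by two bright ones.
clip = [[10, 10, 10, 10], [12, 11, 10, 12], [200, 198, 201, 199], [198, 199, 200, 200]]
scenes = split_into_scenes(clip)   # [(0, 2), (2, 4)]
```

Real footage with fades, camera motion, or lighting changes needs more robust measures (histogram comparison, for example), but the output, a list of scene boundaries, is what an indexing system consumes.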

As a more esoteric example, there exist tools for converting facial images into a shorthand representation of facial gesture. Streams of facial shorthand can then be shipped across a network and used to drive an avatar image presented to collaborators. Such an approach could allow video conferencing at much lower bandwidths than normally required for an effective exchange.

Image analysis also finds use in data reduction, surveillance, navigation and remote instrumentation applications.

Cataloging and Indexing

Video capture can generate large amounts of data that is sometimes difficult to organize for subsequent retrieval. For example, one recent pilot project produced approximately 9GB of video data spread over about 40 files retrievable only by lecture date. This material would have been much more useful if indexed by lecture topic, keywords, etc.

There have been numerous attempts to develop database systems to simplify retrieval of stored video content. The KU Digital Jayhawk employs an innovative, automated indexing system that receives audio/visual information in the form of the daily KUJH televised newscast, breaks the AV stream into parts using scene analysis as described above, stores each portion on a web site for later use, and indexes each part using words appearing in text broadcast as a closed-captioned stream or displayed in the teleprompters read by on-screen announcers.
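Indexing video segments by the words in their captions amounts to building an inverted index. The sketch below shows the general technique under simple assumptions; it is not the Digital Jayhawk implementation, and the segment identifiers, caption text, stop-word list, and function names are hypothetical.

```python
import re
from collections import defaultdict

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def tokenize(text):
    """Lower-case the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def build_index(segments):
    """Map each caption word to the set of segment ids whose captions contain it."""
    index = defaultdict(set)
    for segment_id, caption in segments.items():
        for word in tokenize(caption):
            index[word].add(segment_id)
    return index


def search(index, query):
    """Return the segments whose captions contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical caption text for three segments of one newscast.
segments = {
    "seg1": "The city council approved the new budget",
    "seg2": "Basketball team wins as council member attends",
    "seg3": "Rain is expected in the afternoon",
}
index = build_index(segments)
hits = search(index, "council budget")   # {"seg1"}
```

Because each segment id can point to a stored clip, a query result leads directly to the portions of the newscast worth retrieving.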

Nationwide, there are several projects underway that aim to provide cataloged and/or indexed video materials for networked distribution: Georgia Public Television, Wisconsin Public Television, KCPT's Chalkwaves, etc. This approach appears to be replacing satellite as the preferable method of video delivery. Georgia has T1 or fiber to every K-12 school in the state, over which they hope to deliver targeted video material. Such customization would be prohibitively expensive over satellite.

There are also several standards for cataloging video resources: MPEG-7, MARC, Dublin Core, MODS, LOM, etc., though most video collections seem to be using modified versions of "standard" approaches.
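As a small illustration of what a catalog entry under one of these schemes can look like, the sketch below serializes a few Dublin Core elements as XML. The element names (title, creator, date, format, subject) and the namespace URI come from the Dublin Core element set; the wrapping `record` element, the function name, and all field values are hypothetical and do not describe any real collection.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values as an XML string."""
    record = ET.Element("record")            # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical description of one archived lecture video.
xml_text = dublin_core_record({
    "title": "Introductory lecture, week 1",
    "creator": "Example University",
    "date": "2004-07-01",
    "format": "video/mpeg",
    "subject": "networked media",
})
print(xml_text)
```

Collections that extend a standard scheme typically add local fields alongside elements like these, which is one reason cross-collection portals have to map between schemes.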

At least one "meta-collection" or collection of collections exists; the Moving Image Collections portal provides access to the catalogs describing multiple archives using disparate cataloging schemes. MIC is a collaborative effort of Rutgers, Georgia Tech, and the University of Washington.

Integrating video materials with other documents

Some video encoding formats, such as MPEG-4, allow ancillary textual information to be encoded along with the video stream. This kind of capability can allow users to select hyperlinks within a video display just as they select hyperlinks within text or single image documents displayed by Web browsers. Similar linkages can be constructed using streaming tools such as the SMIL language in association with media streaming.

Animation and hybrid video/animation

Although not explicitly AV traffic, some animation software generates animation imagery so similar to AV traffic that it should probably be grouped with AV services. For example, some game simulations and even 3-D imaging applications may fall in this category.

In addition, there are some instances where video and animation streams might be merged to produce a hybrid result, as with combining a real-time video "head shot" with an animated or goniometer-driven avatar body, or projecting directions or descriptions onto video images in real time, so-called "augmented reality". For example, some groups are experimenting with "eyeglasses" that project video streams captured by a camera mounted on the eyeglass frame pointed at the scene that would normally be seen by the unaided eye. Overlays show directions to a destination site, captions on objects in the scene, and (woe is me) advertisements. Other applications include virtual furniture or laboratory instruments, and internal component views of bodies, machines, geologic formations, buildings, etc. For example, X-ray and/or ultrasound images of internal organs could be laid over real-time patient views.

Other work has projected costumes and/or additional appendages onto a stick-figure model of a subject put together from video capture images. This approach has already been used for dramatic effect in theatrical performances demonstrated by Internet2 member institutions.

(Long-term) archival storage

Much video material is of transient value, but some has at least the potential for long-term use. For example, many historians of dance and theatre value cinematic records of various aspects of dance, and are interested in long-term archival storage of video records and/or the conversion of other cinematic records into digital form for storage.

The use of digital formats and media for such purposes continues to be debated. Some formats, such as MPEG-2, have been well standardized and commoditized to the point where they may be able to resist various ravages of time and therefore be safely used for long-term archiving.

Not all commentators agree, however. For a dissenting (or at least cautionary) opinion see this article by VidiPax, which focuses on the need to make high-quality digital versions of analog materials. Also, even if the digitized form of a video record survives, it is unclear whether the software required to read and display the digitized format will be available (although it seems highly likely at this point in time that such software would be reconstructable if not continuously available).

Activities and equipment required to deliver A/V services:

Specialized equipment, software, and network connectivity are required to provide these services. Specialized event production, computing, logistics and operations skills are required to provide a complete suite of video services, though there is considerable overlap among services, which makes it important to match services and resources so as to waste neither.

The following list of resources required to provide services listed above has been prepared to give readers a general sense of the requirements:

Video conferencing can require:

Video streaming can require:

Video capture can require:

Video transport can require:

Video archiving and Video on Demand can require:

Commercial video delivery

Commercial video will involve the same kinds of staffing and equipment issues involved in video archiving and VoD, but overall resource requirements are more demanding for several reasons. First, commercial video must be delivered with relatively high quality. Second, video streams must be carefully restricted to authorized users and usage accounting is probably required. This requires access control arrangements not necessarily required for serving non-commercial content.

Adjustments must be made to the networking infrastructure to deliver high volume traffic protected from congestion, and video streams must probably be accessible through commodity display systems (such as standard television sets controlled by hand-held remotes). Using commodity display equipment requires the acquisition of Set Top Boxes (STBs), which can convert IP streams to TV-ready signals and assist with access control.

STBs can add expense, but also add capabilities that may be of interest to users. In particular, they may be used to pause programs during viewing, record programs for later viewing, and access the web as a simple web browser.

Alternative organizational structures for A/V service delivery

The video and video-related services listed above may be provided via a number of organizational approaches. For example, they may be centralized under the control of a single organizational entity or they may be federated among organizational departments and/or individual groups within departments.

Some activities, such as video conferencing, will probably be used widely enough and have been commoditized to the point that they are suitable for federation. The role of central IT groups in supporting such distributed services could be limited to consulting about network provisioning and equipment acquisition, basic training in the use of conferencing equipment, and occasional emergency assistance.

Other services, such as streaming, could probably better be provided by servers operated by centralized IT groups in protected environments with high-speed network connectivity, high reliability, and secure settings.

Keep in mind that there will always be some customers who would rather hire outside providers or central IT groups to provide video conferencing services straight away, and that some departments will be (and already are) equipped to operate their own stream servers. Presumably there will always be a varied mix of customer needs.

Also, there will always be reasons to implement portions of services within different groups. For example, streaming video might be captured by camera operators and technicians in any campus department, but streamed from servers managed and operated by central IT services. And, of course, other partitions can be easily imagined.

Outline of a strategic approach

Establishing a long-term strategy for audio/video services will entail a number of steps. What follows is an outline of activities required to generate a broad plan.

Summary

There exist several basic processes for capturing and transporting audio/video information among remote locations. These processes can be used alone or in various combinations to construct tools useful to the wider educational community.

This paper has attempted to prepare a semi-technical groundwork for discussing, choosing, and implementing audio/video related services in educational settings.

Michael Grobe
July 2004