Scientific Computing and Visualization Group
Boston University

DAFFIE is a software package and network architecture for easily building distributed, collaborative tele-immersive environments from pre-existing data; dynamic behaviors are supported through networked agents. DAFFIE represents participants as avatars and provides multiple virtual spaces, animated models, 3D localized sound using two to eight audio channels, and network-based telephony. It currently runs on SGI workstations and supports ImmersaDesks, CAVEs, SGI graphics workstations, and head-mounted displays; ports to a number of other Unix and NT platforms are underway. DAFFIE is designed to run on high-speed, low-latency networks, such as the vBNS and Internet2.

The DAFFIE environment consists of virtual spaces, called rooms, populated with compound virtual objects. The virtual world is manipulated and/or presented to participants by networked clients. Each object consists of a number of components, including a set of geometric representations, positional information, and auditory information. The values of these components represent the state of the virtual world. Clients manipulate the world by acquiring ownership of a particular component and then broadcasting value changes across the network to other clients.
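As a rough illustration of this component model, the C sketch below shows how a client might represent one object's state; the structure layout and field names are assumptions for exposition, not DAFFIE's actual data structures.

    typedef struct {
        float x, y, z;              /* position within the current room */
        float heading, pitch, roll; /* orientation                      */
    } Transform;

    typedef struct {
        int model_index;  /* which geometric model is currently shown */
        int sound_index;  /* sound tied to that model, or -1 for none */
    } Representation;

    typedef struct {
        int            id;        /* world-unique object identifier          */
        int            owner;     /* client holding the lock, -1 if unlocked */
        Transform      position;  /* positional information                  */
        Representation visual;    /* current visual/auditory state           */
    } ObjectComponent;

A client mutates the world by acquiring the lock on such a component, changing the value locally, and broadcasting the new value so that every other client applies the same change.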

DAFFIE includes a simple scripting language for building and loading a virtual environment. The scripting language allows one to specify the various parameters for objects to be loaded into the virtual world. Some of the key elements contained within an object are a number of geometric model files, animation parameters (e.g. timing information), and sound files. At any point in time, an object is represented visually by one of its geometric models, and animated effects may be created by sequencing through a set of geometric models. Each model may have an associated sound file which is played whenever that model is displayed. The geometry and sound files may be in any of a number of standard formats, allowing them to be created in any convenient application outside of the DAFFIE environment. Typical tools that have been used to build DAFFIE virtual environments include Power Animator, Maya, 3D Studio, Lightwave, and ProAudio.

The virtual world is organized into discrete spaces called rooms. Rooms are used to constrain the objects with which one can interact and to reduce the visual complexity, thereby reducing the computational demands on the display generators. Avatar objects move between rooms through a special type of object called a portal, which may interconnect any two spaces.

A running DAFFIE world includes an event server, immersive viewer clients, sound generator clients (e.g. for telephony), and sound player clients. Additional sites may participate using viewer clients, sound clients, and autonomous agents, joining or leaving at any time. The clients communicate through a message-passing system which provides reliable, sequenced message broadcasting.

A participant interacts with the virtual environment using specialized output hardware for visual display and audio presence, including CAVEs, ImmersaDesks, head-mounted displays, and workstation monitors. Specialized input hardware is used for head and hand tracking and user interaction, including the CAVE/ImmersaDesk wand, Ascension Flock of Birds trackers, and 6DOF mice.

The software for visual immersion reads in a large number of geometric models, maintains a database of all objects' geometry and position information, adjusts the simulated view based on the viewer's head position and navigation input, and renders the images the viewer sees. Currently, all geometry and sound files reside on the viewer's local disk; the next version will support transparent downloading. Tracked head and hand data, along with button presses, allow movement through the virtual scene and interaction with objects. Users can essentially fly: the 6DOF tracking data is translated into a velocity, giving arbitrary heading and motion control. Animated sequences of models are played locally, and messages are sent to an associated audio program to synchronize visual and audio events. Avatar and object locations and other state are continually updated and broadcast to all other participating sessions. The viewers use the protocol described below to request and release ownership of objects to be moved, triggered, or otherwise modified. The viewer software is built on top of the CAVE library and SGI Performer.
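The fly-through navigation can be sketched in a few lines of C: the tracked wand pose supplies a heading, and user input (e.g. joystick deflection) scales it into a velocity that is integrated into the viewer position each frame. The function names and the azimuth/elevation parameterization are illustrative, not the CAVE library's actual interface.

    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    /* Unit vector in the direction the wand points, from its tracked
       orientation (azimuth and elevation in radians). */
    static Vec3 wand_direction(float azimuth, float elevation)
    {
        Vec3 d;
        d.x = cosf(elevation) * sinf(azimuth);
        d.y = sinf(elevation);
        d.z = -cosf(elevation) * cosf(azimuth);
        return d;
    }

    /* Per-frame update: 6DOF tracking data becomes a velocity, giving
       arbitrary heading and motion control ("flying"). */
    void update_viewer_position(Vec3 *pos, float azimuth, float elevation,
                                float speed, float dt)
    {
        Vec3 dir = wand_direction(azimuth, elevation);
        pos->x += dir.x * speed * dt;
        pos->y += dir.y * speed * dt;
        pos->z += dir.z * speed * dt;
    }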

Sound player clients receive commands and state information from the local viewer software as well as from the event server. Commands supported by sound players include loading sound files, triggering localized sounds, and altering sound parameters such as pitch and amplitude. Location information for avatars and other objects in the environment is monitored and used by sound players to localize sound via amplitude control, which gives listeners cues about object direction as well as distance. Because this form of localization is computationally inexpensive, many localized sounds may be played at once. Speaker arrays containing as few as two and as many as eight speakers are supported; in the larger configurations, sounds may be located above or below the listener. An approach based on the HRTF (Head-Related Transfer Function) is currently under investigation.
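One plausible form of the amplitude-control localization described above is sketched below: each speaker's gain combines a distance falloff with the alignment between the source direction and that speaker. Computing one gain per speaker scales naturally from two up to eight speakers. This is an assumed formulation for illustration, not DAFFIE's actual algorithm.

    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    static float vdot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    /* Gain for one speaker, given the source position relative to the
       listener and the unit vector toward the speaker. */
    float speaker_gain(Vec3 source, Vec3 speaker_dir)
    {
        float dist = sqrtf(vdot(source, source));
        if (dist < 1.0f) dist = 1.0f;          /* clamp the near field */
        float atten = 1.0f / dist;             /* distance cue         */
        Vec3 s = { source.x / dist, source.y / dist, source.z / dist };
        float pan = 0.5f * (1.0f + vdot(s, speaker_dir)); /* direction cue */
        return atten * pan;
    }

Because this requires only a dot product and a division per speaker, many simultaneously localized sounds remain affordable, as noted above.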

Sound generator clients produce live audio streams which can be used for networked telephony. Digitized audio input is sent to the event server software and broadcast to all sound player clients. Telephony streams are “attached” to avatars via the local viewer. Sound players monitor avatar position information and use this to localize telephony streams (as above).
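A sound generator client might therefore look like the following loop: capture a block of digitized audio and broadcast it as an audio event tagged with the avatar it is attached to. Both capture_block and send_event are placeholders standing in for the audio device interface and the event-server connection; none of these names come from DAFFIE itself.

    #include <stdint.h>
    #include <stddef.h>

    #define BLOCK_SAMPLES 512

    typedef struct {
        uint32_t avatar_id;              /* avatar this stream is attached to */
        int16_t  samples[BLOCK_SAMPLES]; /* one block of PCM audio            */
    } AudioEvent;

    /* Placeholder: a real client would read from the audio hardware. */
    static size_t capture_block(int16_t *buf, size_t n)
    { (void)buf; (void)n; return 0; }

    /* Placeholder: a real client would send this to the event server. */
    static void send_event(uint32_t type, const void *p, size_t len)
    { (void)type; (void)p; (void)len; }

    void telephony_loop(uint32_t avatar_id, uint32_t audio_event_type)
    {
        AudioEvent ev;
        ev.avatar_id = avatar_id;
        for (;;) {
            if (capture_block(ev.samples, BLOCK_SAMPLES) == 0)
                break;                   /* audio input closed */
            send_event(audio_event_type, &ev, sizeof ev);
        }
    }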

The networked clients use an API and an application level network protocol to manipulate the virtual world, and all clients are equally empowered to do so. This allows one to develop specialized applications to control the behavior of various objects within the virtual world.

Clients communicate by sending messages, also called events. Events may be sent point-to-point between two clients or broadcast to all clients. Each event has an event type, and the sequence of events of a given type constitutes an event stream. Clients may dynamically select which event streams they are interested in receiving. For example, audio data is sent as a particular event stream, and any client may choose whether or not to receive that stream. Each event stream has an associated service class. In the current version only one service class, reliable delivery, is supported. However, work is underway to support multiple service classes to take advantage of differentiated network services (i.e. QoS).
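The event abstraction might be captured by a header like the one below; the field names, the broadcast convention, and the control event types are illustrative assumptions rather than DAFFIE's wire format.

    #include <stdint.h>

    typedef enum {
        DELIVERY_RELIABLE = 0   /* the only class in the current version */
        /* QoS-differentiated service classes would be added here        */
    } ServiceClass;

    typedef struct {
        uint32_t     type;      /* identifies the event stream           */
        uint32_t     sender;    /* originating client id                 */
        uint32_t     dest;      /* 0 = broadcast, otherwise a client id  */
        ServiceClass service;   /* delivery class of this stream         */
        uint32_t     length;    /* payload bytes following the header    */
    } EventHeader;

    /* Stream selection can itself be expressed as events: a client sends
       a control event naming the stream type it wants the server to
       start or stop forwarding (hypothetical type values). */
    enum { EV_SUBSCRIBE = 1, EV_UNSUBSCRIBE = 2 };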

Clients do not directly communicate over the network. Instead, the clients maintain point-to-point connections with an event server. The event server is responsible for managing the event streams, sequencing the events and broadcasting event streams to those clients which have subscribed to them. The server guarantees that clients receive all the events within a given event service class in exactly the same order. This serialization of events is the basis for guaranteeing state consistency across distributed clients.
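The serializing broadcast can be sketched as follows: the server stamps each incoming event with the next global sequence number for its service class and forwards it to every subscriber, so all clients observe an identical order. The structures and the send_to_client placeholder are illustrative.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_CLIENTS 64
    #define MAX_TYPES   256

    typedef struct {
        uint32_t type;               /* event stream this event belongs to */
        uint32_t sender;             /* originating client                 */
        /* ... payload ... */
    } Event;

    typedef struct {
        bool connected;
        bool subscribed[MAX_TYPES];  /* streams this client has selected   */
    } Client;

    static Client   clients[MAX_CLIENTS];
    static uint64_t next_seq;        /* one global order per service class */

    /* Placeholder for the client's reliable, in-order connection. */
    static void send_to_client(int c, uint64_t seq, const Event *ev)
    { (void)c; (void)seq; (void)ev; }

    /* Sequence the event, then broadcast it to all subscribers. */
    void serve_event(const Event *ev)
    {
        uint64_t seq = next_seq++;
        for (int c = 0; c < MAX_CLIENTS; c++)
            if (clients[c].connected &&
                ev->type < MAX_TYPES && clients[c].subscribed[ev->type])
                send_to_client(c, seq, ev);
    }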

Clients use a simple locking scheme to maintain a consistent, distributed state. The state variables are the various components of the objects in the world. To modify a state variable, a client first obtains a lock on that component by broadcasting a request to acquire the lock. All clients maintain the state of any locks in which they are interested and grant the lock of an unlocked variable upon receipt of the first acquisition request. Since all clients see all such requests in exactly the same order, all clients will grant the lock to the same client. The client requesting the lock only needs to see its own request, without intervening requests from other clients, to know that it has the lock. All clients ignore state change messages from a client that does not own the lock on the particular component. This prevents errant clients from creating inconsistencies and allows a client to begin broadcasting state changes immediately after requesting a lock, with the caveat that these changes may be ignored if another client obtains the lock first.
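Because the server delivers every lock request to every client in the same order, each client can grant a lock deterministically without any extra grant message, as in the sketch below (all names illustrative):

    #include <stdbool.h>

    #define UNLOCKED (-1)

    typedef struct {
        int owner;    /* client id holding the lock, or UNLOCKED */
    } ComponentLock;

    /* Applied for every lock-request event, in the serialized order the
       server delivers them: the first requester wins on every client. */
    void on_lock_request(ComponentLock *lock, int requester)
    {
        if (lock->owner == UNLOCKED)
            lock->owner = requester;
        /* requests for a held lock are simply ignored */
    }

    /* Applied for every lock-release event. */
    void on_lock_release(ComponentLock *lock, int releaser)
    {
        if (lock->owner == releaser)
            lock->owner = UNLOCKED;
    }

    /* State changes from non-owners are discarded, so an errant client
       cannot corrupt shared state, while a client that has just
       requested a lock may broadcast changes optimistically. */
    bool accept_state_change(const ComponentLock *lock, int sender)
    {
        return lock->owner == sender;
    }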