Virtual Reality: State-Of-The-Art and Key Challenges
WESCON '95
San Francisco
November 9, 1995
© 1995 General Reality Company
Virtual Reality (VR) is the popular term for a class of technologies that promise to dramatically redefine the ways humans interact with computers. The simplest way to conceptualize VR is to think of VR's ultimate goal: first-person point-of-view networked simulations in which humans navigate about a 3D environment, interact with the environment and with computer-generated human forms, and cannot differentiate between the simulation and the real world. Perhaps the best-known example of such an experience is the "Holodeck" on Star Trek's Starship Enterprise.
To achieve such a goal, major engineering advances are required in a variety of disciplines, in addition to a few basic scientific breakthroughs which may never occur. However, as the tools of VR inexorably advance, more and more applications will be found where even current "imperfect" VR systems provide order-of-magnitude improvements over conventional approaches.
This paper is intended to present a brief overview of current VR technology and applications, key areas of near-term research, and key scientific challenges.
What Is Virtual Reality?
Virtual reality is nothing more than an experientially-immersive 3D graphics simulation. Today's typical VR system consists of a powerful graphics computer running a real-time 3D visual simulation program, input peripherals to control the simulation, and output peripherals to display the simulation.
What sets VR apart from desktop visualization systems is that the input and output peripherals are designed to be immersive and intuitive, providing a means for interacting with the simulation in natural, human ways rather than computer-centric ways. Thus VR can be described as a human interface paradigm which changes the interface from using your fingertips and 2% of your visual field to using your entire body and all of your senses.
VR emerged from the military simulator industry about thirty years ago, when flight-simulator pioneer Ivan Sutherland first proposed that simulations could be made more experientially immersive by mounting a video display on the user's head and sensing the head's orientation, so that real-time 3D computer graphics appropriate to each instantaneous orientation could be generated. So emerged the world's first head-mounted display, or HMD.
Over the years, an entire industry has formed around the VR concept, driven by advances in a variety of high-technology industries such as computer graphics, displays, 3D modeling, computer peripherals, and behavioral dynamics software. While in technological terms the quality and performance of many VR simulations still leave a lot to be desired, there is already a sizable and growing list of applications where today's VR systems add value significantly greater than their cost. This indicates that VR is not a fad, but a true paradigm shift in human/computer interaction.
Computer Graphics Hardware
Since vision is the highest-capacity information-gathering sense in most humans, the most basic requirement for simulating reality is a powerful graphics computer.
Graphics computing power is typically specified in terms of polygons/sec and frame rate, where the simulation is rendered using a graphics pipeline that 1) constructs a 3D model from individual polygons, 2) arrays those polygons in a series of 3D meshes that fill the display's instantaneous field-of-view, 3) eliminates polygons or portions of polygons that are hidden by others, 4) fills the faces of remaining polygons with texture-map images and/or shading, 5) lights the polygons as real light sources would, 6) clips the edges of the scene to fit the selected display device, and 7) saves the resulting data as a bit map in the computer's display buffer. Polygon rendering rates are usually lower for textured polygons, which are critical for creating rich, realistic-looking worlds.
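Step 3, hidden-surface elimination, is commonly implemented with a depth buffer (Z-buffer). The sketch below is a minimal illustration, assuming polygons have already been broken into per-pixel "fragments" and that a smaller depth value means closer to the viewer; the function name and fragment format are hypothetical, not any vendor's API.

```python
import math

def rasterize_with_zbuffer(width, height, fragments):
    """Resolve visibility per pixel with a Z-buffer.

    `fragments` is a list of (x, y, depth, color) tuples, one per pixel a
    polygon covers. Smaller depth = closer to the viewer (assumed convention).
    """
    depth = [[math.inf] * width for _ in range(height)]  # nearest depth so far
    frame = [[None] * width for _ in range(height)]      # color of that surface
    for x, y, z, color in fragments:
        if 0 <= x < width and 0 <= y < height and z < depth[y][x]:
            depth[y][x] = z      # this fragment is closer: keep its depth
            frame[y][x] = color  # and overwrite the pixel's color
    return frame

# Two polygons cover pixel (0, 0); the nearer one wins regardless of draw order.
print(rasterize_with_zbuffer(2, 1, [(0, 0, 5.0, "far"), (0, 0, 2.0, "near")]))
```

Because the depth test is done per pixel, polygons can be submitted in any order, which is why hardware favors this approach over sorting.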
To perform the rendering process at the highest possible image fidelity, a number of advanced real-time image processing techniques are required, including bit map interpolation, texture map scaling/warping, partially transparent overlays, antialiasing, Gouraud shading, and Z-buffering. While there is significant disagreement about the performance level required for ultimate fidelity, many in the VR field believe that reality can be visually simulated with a 1,000,000-polygon textured model updated at 30 frames per second, for a total of 30,000,000 polygons/sec. In comparison, current and near-term graphics computers span a range of 25,000-750,000 polygons/sec, as shown in Figure 1.
| Vendor & Model | Performance (textured polys/sec) | Price (approx.) | Available |
|---|---|---|---|
| Silicon Graphics Reality Engine II | 220,000-300,000 | $250,000 | 1994 |
| Creative Labs Reality Blaster | 25,000-300,000 (3D Labs Glint IC) | $300 | 1995/6 |
| Lockheed Martin R3D-PRO-1000 | 750,000 | $37,500 | 1996 |
Given Moore's Law, which suggests that semiconductor performance doubles approximately every eighteen months, it is not difficult to project consumer-priced computers capable of generating graphics indistinguishable from reality within ten years. In the meantime, when selecting an image rendering solution, be aware that each vendor specifies performance differently, and that some systems are optimized for large numbers of small polygons while others are optimized for small numbers of large polygons. In addition, some systems require the host computer to transform polygon vertices, which can become a bottleneck no matter how many polygons the graphics system can process.
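The ten-year projection can be checked with simple arithmetic: the number of doublings needed is log2(target / current), and each doubling takes eighteen months. This sketch assumes the 30,000,000 polygons/sec target and the Figure 1 rates quoted above, and treats polygon throughput as doubling at the same pace as semiconductor performance (an assumption, not a measured trend).

```python
import math

def years_to_reach(target_rate, current_rate, doubling_months=18.0):
    """Years until current_rate reaches target_rate if it doubles every doubling_months."""
    doublings = math.log2(target_rate / current_rate)
    return doublings * doubling_months / 12.0

TARGET = 1_000_000 * 30  # 1M textured polygons x 30 frames/sec

print(round(years_to_reach(TARGET, 300_000), 1))  # $300 board, upper figure
print(round(years_to_reach(TARGET, 750_000), 1))  # R3D-PRO-1000
```

From the consumer board's upper figure this gives roughly ten years (about 6.6 doublings), and from the 750,000 polygons/sec system about eight, consistent with the projection in the text.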
Head-Mounted Displays
Projecting the state-of-the-art for the devices that carry these images from the frame buffer to the eyes is not as straightforward as projecting computer advances.
The most popular way to view VR environments is the head-mounted display (HMD). Conventional HMDs use a display source such as a cathode ray tube (CRT) or liquid crystal display (LCD) mounted on the user's head, with conventional glass or plastic optics forming an optical image of the display that appears distant from the wearer. By using a separate display for each eye and slightly varying the horizontal position of each simulated object's image within each display field, a sense of depth, or stereoscopy, results.
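How large is that horizontal shift? For an object straight ahead at distance d, each eye's line of sight rotates inward by atan((IPD / 2) / d). The sketch below converts that angle into display pixels, assuming a 64 mm interpupillary distance (IPD), a uniform pixels-per-degree mapping, and, for illustration only, the 789-pixel, 40-degree figures listed for the CyberEye 200 treated as horizontal values.

```python
import math

def per_eye_offset_pixels(distance_m, ipd_m=0.064, h_fov_deg=40.0, h_pixels=789):
    """Horizontal image shift (pixels, toward the nose) in one eye's display
    for an object straight ahead at distance_m.

    Assumes a 64 mm IPD and pixels spread evenly across the field of view.
    """
    angle_deg = math.degrees(math.atan((ipd_m / 2.0) / distance_m))
    return angle_deg * h_pixels / h_fov_deg

for d in (0.5, 1.0, 10.0, 100.0):
    print(d, "m:", round(per_eye_offset_pixels(d), 1), "pixels")
```

Near objects need shifts of tens of pixels, while beyond a few tens of meters the shift falls below one pixel, which is one reason stereoscopic depth cues fade for distant virtual objects.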
The performance of such HMDs is limited by both the display source and the optics. First, the human visual system provides us with a field-of-view approximately 180 degrees horizontal by 130 degrees vertical, over which the eye can resolve approximately 64 million pixels of information. Currently, small CRTs are limited to about 1.2 million pixels, while small LCDs are limited to about 300,000 pixels. This limited number of pixels can either be arrayed over a narrow field-of-view to provide minimally-immersive high-quality images, or may be spread out over a wider field-of-view to provide very immersive but low-resolution images.
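The trade-off can be quantified as the angle each pixel subtends. This sketch assumes pixels are spread evenly across the field-of-view (the real eye concentrates acuity in the fovea, so the "eye" figure is only an average) and uses the 64-million-pixel, 180 x 130 degree figures from the text plus two displays from Figure 2, treating their listed FOV as horizontal.

```python
import math

def arcmin_per_pixel(pixels, fov_deg):
    """Average angle subtended by one pixel along one axis, in arcminutes."""
    return 60.0 * fov_deg / pixels

# ~64 million "pixels" over 180 x 130 degrees, spread uniformly (simplification).
eye_px_per_deg = math.sqrt(64e6 / (180 * 130))

print(round(60.0 / eye_px_per_deg, 2))        # human eye, average
print(round(arcmin_per_pixel(1280, 90), 2))   # Fakespace Boom 3C
print(round(arcmin_per_pixel(789, 40), 2))    # CyberEye 200
```

The eye works out to roughly one arcminute per pixel, while the displays are three to four times coarser even over fields-of-view well below 180 degrees.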
While display sources (especially LCDs or analogous solid-state display devices) may or may not follow Moore's Law and achieve reality-level resolutions within a decade, the optics for relaying photons from the display to the eye are not advancing as quickly. In fact, unless new laws of physics are discovered, conventional geometric optics simply cannot provide a 180-degree field-of-view from a small display in a form factor suitable for wearing on a human head. As a result, the current state-of-the-art in HMDs ranges from a relatively wide field-of-view, high-resolution display that costs $95,000 and is so heavy it must be mounted on an articulated boom, to a low-cost device with an attractive form factor that provides just 60,000 pixels over a 30-degree field-of-view.
Perhaps twenty companies currently offer HMDs for VR applications, with three example offerings shown in Figure 2.
| Vendor & Model | Resolution | FOV | Price |
|---|---|---|---|
| Fakespace Boom 3C | 1280 x 1024 | 90° | $95,000 |
| General Reality Co. CyberEye 200 | 789 x 230 | 40° | $1,995 |
| Virtual I/O i-Glasses | 789 x 230 | 30° | $799 |
A key challenge for advancing VR is thus to develop a new approach to the HMD, one that does not depend on conventional optics. Perhaps the only active effort in this area is being pursued by the Human Interface Technology Lab (HITL) in Seattle, which is demonstrating HMD devices that scan a small laser beam across the user's retina at high speed and modulate the laser spot to form an image directly on the retina. This approach, if successful and proven safe, could relieve the current bottleneck in VR display performance within approximately five years.
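"High speed" can be made concrete: a single scanned spot must visit every pixel once per frame. This sketch computes the required spot-modulation and line rates for a hypothetical 1280 x 1024 image refreshed at 60 frames/sec, assuming one spot, no interlacing, and one horizontal sweep per line (a mirror that draws lines in both directions would need half the sweep frequency).

```python
def raster_scan_rates(h_pixels, v_lines, frames_per_sec):
    """Rates a single scanned spot must sustain to paint a full raster frame
    frames_per_sec times per second (one spot, no interlace)."""
    pixel_rate = h_pixels * v_lines * frames_per_sec  # spot modulations per second
    line_rate = v_lines * frames_per_sec              # horizontal sweeps per second
    return pixel_rate, line_rate

print(raster_scan_rates(1280, 1024, 60))  # hypothetical target resolution and refresh
```

Even this modest target requires modulating the laser tens of millions of times per second and sweeping tens of thousands of lines per second.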
3D Audio
Often treated as an afterthought, 3D audio has been shown to significantly increase a user's sense of immersion in the virtual environment. 3D, or spatialized, audio consists of sounds played through two speakers (or a pair of headphones) that nonetheless appear to come from whatever direction the virtual environment designer chooses.
The conventional approach to generating 3D audio in virtual environments has been to use a high-end spatial audio board such as the Convolvotron or Alphatron by Crystal River Engineering. These systems take conventional sound sources (such as a wave file) and generate spatialized audio in real time through methods such as acoustic raytracing of virtual rooms, addition of reverberation and Doppler shifts, and slightly delaying the audio to one ear to simulate a sound wave traveling across the user's head. For high-end applications, the user's head-related transfer function (HRTF, which represents the subtle changes in sound induced by the shape of the head and ears) can be incorporated in the processing for even greater realism.
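The interaural delay can be estimated with Woodworth's spherical-head approximation, ITD = (a / c)(θ + sin θ) for a distant source at azimuth θ. The sketch below applies that delay to a mono signal; it is a swapped-in textbook model, not the algorithm used by any product named above, and the 8.75 cm head radius, 343 m/s speed of sound, and whole-sample delay are assumptions. It models timing only, with no HRTF filtering.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature
HEAD_RADIUS = 0.0875    # m, an assumed average head radius

def woodworth_itd(azimuth_deg):
    """Interaural time difference (s) for a distant source, 0 (ahead) to 90 deg."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

def spatialize_itd(mono, azimuth_deg, sample_rate=44100):
    """Return (left, right) sample lists, delaying the far ear by the ITD
    rounded to whole samples. Positive azimuth = source to the right."""
    delay = round(woodworth_itd(abs(azimuth_deg)) * sample_rate)
    delayed = [0.0] * delay + list(mono)  # far ear hears the sound later
    padded = list(mono) + [0.0] * delay   # near ear, padded to equal length
    return (delayed, padded) if azimuth_deg > 0 else (padded, delayed)

print(round(woodworth_itd(90) * 1000, 3), "ms")  # source directly to one side
```

At 90 degrees the delay is about 0.66 ms, or roughly 29 samples at 44.1 kHz, small but enough for the brain to localize the source.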
Over the past two years, several vendors have released low-end, software-driven audio spatialization technologies that can run on delivery systems as inexpensive as the popular Creative Labs Sound Blaster board. In addition, Microsoft plans to incorporate software-only 3D spatialization designed by Crystal River in Windows 95. These low-end systems provide adequate spatialization for most applications, being limited primarily by the number of simultaneous sounds that can be spatialized and by their lack of HRTF support.
As a result of the "consumerization" of 3D audio, spatialized audio should cease to be a significant barrier to most high-performance VR solutions within months.
Trackers
Before the computer can render an image corresponding to the user's instantaneous line-of-sight in the virtual environment, it has to know what that line-of-sight is. This requires a three-degree-of-freedom (3DOF) tracking device attached to the HMD. If the user is also free to move about in the physical world and corresponding movements in the virtual world are required, then a six-degree-of-freedom (6DOF) tracker is needed. Additional trackers are often required if the user's hand or body positions are to be sensed and rendered within the virtual environment. Without experiencing it yourself, it is difficult to appreciate how important seeing your own hands becomes when a sense of immersion in the virtual world is desired.
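Turning tracker readings into a line-of-sight is a small coordinate conversion. This is a minimal sketch under an assumed convention (y up, yaw 0 / pitch 0 looking along +z, positive yaw toward +x, positive pitch up); real trackers each define their own axes. A 3DOF tracker supplies only the angles, so the eye position stays fixed; a 6DOF tracker also supplies the position.

```python
import math

def line_of_sight(yaw_deg, pitch_deg, position=(0.0, 0.0, 0.0)):
    """Return (eye_position, unit view direction) from tracker readings.

    Assumed convention: y up, yaw 0 / pitch 0 looks along +z, positive yaw
    turns toward +x, positive pitch looks up. Roll rotates the image about
    this direction but does not change where it points.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    direction = (math.cos(pitch) * math.sin(yaw),
                 math.sin(pitch),
                 math.cos(pitch) * math.cos(yaw))
    return position, direction

print(line_of_sight(30.0, 10.0, position=(1.0, 1.7, 0.0)))  # 6DOF example
```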
For any number of trackers and degrees of freedom, each tracker must perform its sensing, data processing, and output in a fraction of the time the graphics computer requires to render one frame of imagery, or else head-tracking lag will be apparent to the user. Such lag shows up as "swimming" of the image when the user turns his/her head, and depending upon severity, its effects can range from clumsy interaction to outright nausea.
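A first-order estimate of the swimming: if the head turns at a constant rate while the displayed image reflects a pose that is `latency` seconds old, the scene is misplaced by rate times latency. The head-turn rate below is a hypothetical value chosen for illustration.

```python
def swim_error_deg(head_rate_deg_per_s, latency_s):
    """Approximate angular misregistration of the displayed scene when the
    head turns at a constant rate and the image is latency_s out of date."""
    return head_rate_deg_per_s * latency_s

for lag_ms in (100, 40, 10):
    print(lag_ms, "ms lag:", round(swim_error_deg(100.0, lag_ms / 1000.0), 1), "deg")
```

At an assumed 100 degrees/sec head turn, 100 msec of lag displaces the world by about 10 degrees, a large fraction of a typical HMD field-of-view.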
To respond to this challenge, several tracking techniques are currently available. Historically, the most popular trackers use an active electromagnetic system, in which a fixed transmitter generates a pulsed magnetic field and one or more small receivers sense changes in the field as they move through it. Such systems are provided commercially by Polhemus and Ascension and deliver 6DOF data at update rates of up to 100 per second. Unfortunately, such systems are priced at $1,000 and up and are quite susceptible to jitter, as well as to interference from metallic items as seemingly innocuous as the corner braces on wooden table legs. In addition, wireless versions are not yet available, resulting in inconvenient tethers between the computer and the user.
With the advent of VR-based point-of-view games such as "Rise of the Triad", many low-cost consumer HMDs and arcade systems have incorporated a less expensive 3DOF tracking technique that requires no transmitter. This technique senses the earth's magnetic field with a fluxgate compass or 3-axis magnetometer to provide yaw data, and senses the earth's gravitational field to provide pitch and roll data. The latter is done with a liquid-filled capsule containing an electrode array that detects changes in the angle of the liquid. While these 3DOF systems are now manufacturable at low cost in high-volume, consumer-priced devices, most suffer from high lag times and spurious readings when the head is accelerated rapidly. For lower-volume, higher-performance applications, General Reality Company offers the $850 CyberTrack(tm) 3DOF sourceless tracker, which avoids these problems by averaging 15,000 samples/sec before outputting data at a 30 Hz update rate.
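Averaging a high sample rate down to a lower output rate is a form of decimation. This sketch is not CyberTrack's actual method; it assumes all 15,000 samples/sec are heading readings from one channel and that each output is the mean of a block of 15,000 / 30 = 500 samples. Headings are averaged as angles on a circle, because a plain arithmetic mean of 179 and -179 degrees would wrongly give 0.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean of angles in degrees that handles the +/-180 degree wraparound."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

def decimate_heading(samples_deg, in_rate=15000, out_rate=30):
    """Average each block of in_rate // out_rate raw heading samples into one
    output reading (500 samples per output at the rates in the text)."""
    block = in_rate // out_rate
    return [circular_mean_deg(samples_deg[i:i + block])
            for i in range(0, len(samples_deg) - block + 1, block)]

print(circular_mean_deg([179.0, -179.0]))  # near +/-180, not 0
```

If sensor noise is independent from sample to sample, averaging 500 samples reduces its standard deviation by a factor of about sqrt(500), or roughly 22.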
A third tracking technology is currently under development using inertial devices such as solid-state gyroscopes and accelerometers. These devices show promise as a means to provide low-cost, wireless 6DOF tracking at very high update rates, but suffer from typical inertial device problems such as drift, hysteresis, cross-axis coupling, and temperature instability. Several groups are working to resolve these issues and provide a complete tracking solution, but none have yet demonstrated success, and no such VR trackers are yet commercially available.
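Drift is the key obstacle because inertial sensors are integrated: a constant gyro bias grows linearly into heading error, and a constant accelerometer bias grows quadratically into position error (integrated twice). This sketch holds the true motion at zero and integrates only the biases; the bias values are hypothetical, and hysteresis, cross-axis coupling, and temperature effects are not modeled.

```python
def integrate_drift(gyro_bias_dps, accel_bias_mps2, duration_s, dt=0.01):
    """Integrate constant sensor biases with the true motion held at zero.

    Returns (heading_error_deg, position_error_m): gyro bias integrated once,
    accelerometer bias integrated twice (semi-implicit Euler steps).
    """
    heading = velocity = position = 0.0
    for _ in range(round(duration_s / dt)):
        heading += gyro_bias_dps * dt
        velocity += accel_bias_mps2 * dt
        position += velocity * dt
    return heading, position

heading_err, position_err = integrate_drift(0.1, 0.01, 60.0)  # hypothetical biases
print(round(heading_err, 2), "deg,", round(position_err, 1), "m after one minute")
```

With these assumed biases the heading is off by about 6 degrees and the position by about 18 m after a single minute, which illustrates why inertial trackers need periodic correction from another sensor.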
Looking ahead, research has indicated that to simulate reality effectively, the total lag from head movement to image display must be held to approximately 40 msec. This 40 msec budget must be distributed among tracking latency, model updating, and image rendering, and can only be approached once each of these three tasks can be performed at rates approaching 100 frames/sec.
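Why 100 frames/sec? Under the simplifying assumption that each of the three stages adds up to one full period of its own update rate, the latencies add directly. This sketch compares the 40 msec budget against stages running at 30 and at 100 updates/sec.

```python
def pipeline_latency_ms(*stage_rates_hz):
    """Worst-case added latency if each stage contributes one full period
    of its own update rate (a simplifying assumption)."""
    return sum(1000.0 / rate for rate in stage_rates_hz)

BUDGET_MS = 40.0
for rates in [(30, 30, 30), (100, 100, 100)]:  # tracking, model update, rendering
    total = pipeline_latency_ms(*rates)
    verdict = "within" if total <= BUDGET_MS else "over"
    print(rates, round(total, 1), "ms,", verdict, "budget")
```

Three 30 Hz stages add up to about 100 msec, well over budget, while three 100 Hz stages add up to about 30 msec and leave a small margin.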
Controllers
Given systems for tracking head/hand locations, rendering virtual environments at high speed, and displaying them at high-resolution and wide field-of-view, it becomes important to allow users to interact with objects residing in the simulated environment. Historically, this has been performed with a mouse, which is fundamentally a 2DOF control tool. Numerous attempts have been made to scale the mouse peripheral to 6DOF. A typical example is the Spaceball, which resembles a tennis ball mounted on a flexure capable of sensing forces in 6DOF. By grasping and applying forces to the ball, a cursor or viewpoint can be made to move about in 6DOF. Then, a momentary switch such as a mouse button can be used to select objects.
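Force-sensing devices of this kind are commonly used in rate-control mode, where the applied force or torque sets a velocity rather than a position. The sketch below is a generic illustration, not any device's driver: six sensed values map to three translational and three rotational velocities, with a dead zone so a resting hand does not cause drift. The gain and dead-zone values are hypothetical.

```python
def rate_control(forces, gain=0.5, dead_zone=0.1):
    """Map six sensed force/torque values to six velocities.

    Inputs inside the dead zone are ignored; outside it, velocity ramps up
    from zero in proportion to the excess force. Gain and dead zone are
    hypothetical tuning values.
    """
    velocities = []
    for f in forces:
        if abs(f) <= dead_zone:
            velocities.append(0.0)
        else:
            sign = 1.0 if f > 0 else -1.0
            velocities.append(gain * (f - sign * dead_zone))
    return velocities

def step_pose(pose, forces, dt):
    """Advance a 6-element pose (x, y, z, yaw, pitch, roll) by one time step."""
    return [p + v * dt for p, v in zip(pose, rate_control(forces))]

print(step_pose([0.0] * 6, [1.1, 0.0, 0.0, 0.0, 0.0, -0.6], dt=0.1))
```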
While such devices enable navigation and interaction within a 3D environment, they are obviously a long way from providing the natural, intuitive interface that characterizes the ultimate VR experience. The next step in this quest is undoubtedly the DataGlove(tm) finger-bend sensing glove, pioneered in the late 1980s by VPL Research and now available commercially from General Reality Company at prices as low as $495 per hand. Glove peripherals sense finger bend angles using fiber optics or strain gauges and transmit the data to the host computer, which can then render a polygonal hand that mirrors the user's hand. When the user makes a gesture in the real world, the corresponding gesture is rendered in the virtual one. By adding collision detection capabilities to the software and mounting a tracker on the glove, hand movements and gestures can then be used to grasp virtual objects, rotate them, move them, and release them, creating a more natural means of interaction.
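A minimal sketch of the glove-to-grasp chain, assuming one bend sensor per finger (as on the low-cost gloves described here), a linear calibration between a flat-hand pose and a fist pose, a hypothetical 60-degree "closed" threshold, and a bounding-sphere collision test against the tracked hand position. All names and thresholds are illustrative.

```python
def calibrate(raw, flat, fist, max_angle_deg=90.0):
    """Linearly map raw bend-sensor readings to finger angles using two
    per-finger calibration poses: hand flat (0 deg) and fist (max_angle_deg)."""
    angles = []
    for r, lo, hi in zip(raw, flat, fist):
        t = (r - lo) / (hi - lo)
        t = min(max(t, 0.0), 1.0)  # clamp readings outside the calibrated range
        angles.append(t * max_angle_deg)
    return angles

def is_grasp(angles, threshold_deg=60.0):
    """Treat the hand as closed when all fingers bend past the threshold."""
    return all(a >= threshold_deg for a in angles)

def can_pick_up(hand_pos, obj_pos, obj_radius, angles):
    """Grasp succeeds only if the tracked hand is inside the object's bounding
    sphere (simple collision test) while making the grasp gesture."""
    dist_sq = sum((h - o) ** 2 for h, o in zip(hand_pos, obj_pos))
    return dist_sq <= obj_radius ** 2 and is_grasp(angles)

flat, fist = [100] * 5, [900] * 5  # hypothetical raw calibration readings
angles = calibrate([880, 870, 900, 860, 850], flat, fist)
print(angles, can_pick_up((0.0, 0.0, 0.0), (0.05, 0.0, 0.0), 0.1, angles))
```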
While low-cost, high-performance gloves are now a reality, significant obstacles remain. First, it is desirable to sense every motion possible in a real hand, which has never been done. Today, a $500 glove senses only gross bend of each of five fingers, while a $10,000 glove provides bend data for two of each finger's three joints and adds abduction (finger spreading). Meanwhile, another manufacturer provides a glove that senses only wrist motions, for disability rehabilitation applications.
Over the next five years, additional degrees of freedom can be expected within a single low-cost glove, while the obtrusiveness of early gloves will subside. In addition, entire body suits have been prototyped, and may emerge as commercial products. Further out, it would be desirable to sense human hand and body positions without encumbering the user with gloves or suits, but such techniques have yet to be developed to the commercial level.
Tactile & Force Feedback
Today's gloves also fail to mimic the force and tactile sensations a human perceives when grasping a real object. This tends to make virtual object control a less-than-satisfying experience, in which even successfully grasping a virtual object can take several attempts. While alternative forms of feedback such as playing a tone to denote a collision are feasible, such workarounds do little to make the control interface more intuitive than the simple glove.
To address this issue, a number of researchers are exploring tactile and force feedback interfaces. These differ in that tactile feedback applies tiny forces over a small area (such as pressure and/or friction on a fingertip when an object is touched), while force feedback applies larger forces over larger parts of the body when objects are moved or grasped. A typical force feedback system resembles a robot arm, with computer controlled actuators operating to resist movement of a user's arm, and today costs several tens of thousands of dollars. One system on the horizon fits inside the palm of the user's hand, and provides resistance against closing the hand, in order to mimic an object being grasped.
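One common way to compute what a force-feedback actuator should output is a penalty ("virtual spring") model: no force while the hand is outside a virtual surface, and a restoring force proportional to penetration depth once it is inside. This is a swapped-in illustrative technique, not a description of the systems mentioned above; the stiffness and force limit are hypothetical.

```python
def wall_force(position_m, wall_m=0.0, stiffness_n_per_m=500.0, max_force_n=10.0):
    """Penalty-based force (N) for a virtual wall occupying position < wall_m.

    Outside the wall no force is applied. Inside it, a spring pushes back in
    proportion to penetration depth, clamped to what a (hypothetical)
    actuator can deliver.
    """
    penetration = wall_m - position_m
    if penetration <= 0.0:
        return 0.0
    return min(stiffness_n_per_m * penetration, max_force_n)

for x in (0.01, -0.005, -0.01, -0.05):
    print(x, "m:", wall_force(x), "N")
```

The clamp reflects the point made below: every force must come from a real actuator pushing against a real anchor, so the simulated wall is only as stiff as the hardware allows.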
Unfortunately, the area of tactile and force feedback is still in its infancy, and may never achieve the levels of fidelity projected for visual and audio displays. This is because force requires mechanical systems, and each force applied in each direction at each point of application requires not only an actuator, but a physical anchor opposing the force. Until the concept of a force field moves from science fiction to reality, generic-use force feedback for VR will therefore involve large, complex, ungainly, and potentially dangerous mechanisms. In the meantime, we can look for emergence of special-purpose commercial systems, such as feedback to two or three fingers for telesurgery applications.
Other Sensory Inputs
To completely simulate reality, a wide range of additional sensory inputs is desirable, including smell, taste, and heat. Example applications that require such inputs include fire fighting simulations (is the doorknob hot?) and telesurgery (does the patient's liver smell right?). Research is just beginning in these areas, and it is too early to predict when, if ever, useful, unencumbered systems will evolve.
A more critical near-term issue is acceleration sensing. The human body is quite adept at sensing accelerations generated by actions such as falling, turning, and running into objects. Today's motion simulators attempt to fool the body into accepting small-amplitude motions as large ones, but it is well accepted that mismatches between the user's visual and vestibular (balance) systems are a primary cause of simulator sickness. Unless a means for simulating gravity is invented, these mismatches will always prevent ultimate fidelity.
Virtual Reality Software
All of the processes and devices described above require software to operate, and the field of VR software is advancing rapidly.
Originally, VR software emerged from simulation software, in which a 3D model can be manipulated in real time based on control device input and then rendered on visual and audio displays for the user. A number of commercial software packages now routinely provide such capability for VR applications, although a lack of standards still causes problems such as the need to write a new software driver for every peripheral/software combination. This leaves standards as a key challenge to advancing the state-of-the-industry, if not the state-of-the-art. Leading commercial packages include WorldToolKit (Sense8), Superscape (Superscape), and VRCreator (VREAM).
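The driver problem is one of combinatorics: with N peripherals and M software packages, per-combination drivers mean N x M pieces of code, whereas a shared device interface needs only N drivers plus M applications written against it. The sketch below illustrates the idea with a hypothetical `Tracker` interface and two stand-in devices whose data formats are invented for illustration; it is not an existing standard.

```python
from abc import ABC, abstractmethod

class Tracker(ABC):
    """Hypothetical device-neutral tracker interface."""
    @abstractmethod
    def read_orientation(self):
        """Return (yaw, pitch, roll) in degrees."""

class SerialTrackerA(Tracker):
    """Stand-in for one vendor's device reporting comma-separated text."""
    def __init__(self, line):
        self.line = line  # e.g. a record read from a serial port
    def read_orientation(self):
        yaw, pitch, roll = (float(v) for v in self.line.split(","))
        return yaw, pitch, roll

class BinaryTrackerB(Tracker):
    """Stand-in for another vendor's device reporting hundredths of a degree."""
    def __init__(self, counts):
        self.counts = counts
    def read_orientation(self):
        return tuple(c / 100.0 for c in self.counts)

def update_camera(tracker: Tracker):
    # Application code depends only on the interface, never on the device.
    yaw, pitch, roll = tracker.read_orientation()
    return f"camera yaw={yaw:.1f} pitch={pitch:.1f} roll={roll:.1f}"

print(update_camera(SerialTrackerA("10,5,0")))
print(update_camera(BinaryTrackerB((1000, 500, 0))))
```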
Remaining challenges in the software arena abound. For example, most current demonstrations do not include collision detection, which requires significant processing horsepower in all but the simplest environments. More challenging yet is real-time object behavior modeling, so that (for example) when an object is dropped, it falls at a rate governed by gravity and wind resistance, then deforms, bounces, or splats depending upon object composition, packaging, and mass. Additional challenges arise in arenas such as military simulation, where a complex object may need to explode not into its constituent polygons, but into many odd-shaped smaller objects with independent trajectories.
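A minimal sketch of both ideas, under illustrative assumptions: an axis-aligned bounding-box (AABB) overlap test, a common cheap first check in collision detection, and a dropped object simulated with gravity, linear air drag, and a floor bounce that keeps a fixed fraction (the coefficient of restitution) of its impact speed. Deformation and "splatting" are not modeled, and all parameter values are hypothetical.

```python
def aabb_overlap(a_min, a_max, b_min, b_max):
    """Axis-aligned boxes overlap iff their extents overlap on every axis."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for a_lo, a_hi, b_lo, b_hi in zip(a_min, a_max, b_min, b_max))

def drop(height_m, restitution=0.5, drag_per_s=0.1, dt=0.001,
         duration_s=3.0, g=9.81, rest_speed=0.05):
    """Simulate an object dropped onto a floor at y = 0.

    Gravity plus linear drag, integrated with semi-implicit Euler steps.
    On each floor collision the impact speed is recorded and the object
    rebounds with `restitution` of that speed, until it is too slow to bounce.
    Returns the list of impact speeds (m/s).
    """
    y, v, impacts = height_m, 0.0, []
    for _ in range(round(duration_s / dt)):
        v += (-g - drag_per_s * v) * dt  # drag opposes the velocity
        y += v * dt
        if y <= 0.0 and v < 0.0:         # collision with the floor
            impacts.append(-v)
            if -v < rest_speed:          # too slow to bounce: comes to rest
                break
            y, v = 0.0, -v * restitution
    return impacts

print(aabb_overlap((0, 0, 0), (1, 1, 1), (0.5, 0.5, 0.5), (2, 2, 2)))
print([round(s, 2) for s in drop(1.0)])
```

Even this toy version must run every frame for every object, which hints at why richer behavior modeling demands so much processing.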
In addition to improving fidelity of model objects and behaviors, significant room exists to improve developer interfaces. Current development tools favor creation of either simple, unrealistic worlds using friendly GUI interfaces, or creation of complex, interactive, dynamic worlds using line-editor-based programmer interfaces. In the long term, combining these two approaches to enable the average computer user to create rich virtual environments is necessary, and will require incorporation of a great deal more artificial intelligence into VR software tools.
Communications
One of the key attractions of VR is the ability to interact with other humans within virtual environments. This requires two or more separate virtual environment simulations running on two or more computers, with an appropriate communications interface between them to transmit data such as telemetry. All communications must occur at a sufficiently high rate to avoid latency-induced artifacts, such as one person's image walking through another's before the computers realize that they have collided.
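One technique widely used in networked military simulation, including the DIS (IEEE 1278) protocols, to keep update rates manageable is dead reckoning: each sender transmits position and velocity, receivers extrapolate, and the sender transmits again only when the receivers' extrapolation would drift past an error threshold. This is a swapped-in technique offered as a hedged sketch; the class name and the 0.1 m threshold are hypothetical.

```python
class DeadReckoner:
    """Sender-side dead reckoning: transmit position and velocity only when
    the receiver's straight-line extrapolation would drift more than
    `threshold` (same units as position) from the true position."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold
        self.last_sent = None  # (time, position, velocity) last transmitted

    def predicted(self, t):
        """Position the receiver is currently extrapolating to at time t."""
        t0, p0, v0 = self.last_sent
        return [p + v * (t - t0) for p, v in zip(p0, v0)]

    def update(self, t, position, velocity):
        """Return a packet to send, or None if the receiver's guess is good enough."""
        if self.last_sent is not None:
            error_sq = sum((a - b) ** 2 for a, b in zip(self.predicted(t), position))
            if error_sq <= self.threshold ** 2:
                return None
        self.last_sent = (t, list(position), list(velocity))
        return self.last_sent

dr = DeadReckoner()
print(dr.update(0.0, [0, 0, 0], [1, 0, 0]))          # first state: always sent
print(dr.update(1.0, [1.0, 0.0, 0.0], [1, 0, 0]))    # on track: nothing sent
print(dr.update(2.0, [1.0, 1.0, 0.0], [0, 1, 0]))    # turned: update sent
```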
This area is now developing at an explosive rate, driven in large part by the rapid growth of the Internet. Over the past year, a new standard called the Virtual Reality Modeling Language (VRML) has rapidly emerged to enable distribution of 3D object-based information over relatively low-bandwidth Internet connections. With the addition of dynamic and behavior information expected over the next couple of years, the ability to interact naturally with others in computer-generated 3D environments is likely not far off.
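Part of VRML's appeal over low-bandwidth links is that a scene is sent as a compact text description and rendered locally, rather than as images. This sketch generates a minimal VRML 1.0 file containing one colored box; the Python helper is hypothetical, while the `#VRML V1.0 ascii` header and the `Separator`, `Material`, and `Cube` nodes are part of VRML 1.0. Consistent with the text, VRML 1.0 scenes are static: there are no behaviors.

```python
def vrml_cube(width, height, depth, rgb):
    """Return a minimal VRML 1.0 scene containing one colored box."""
    r, g, b = rgb
    return (
        "#VRML V1.0 ascii\n"
        "Separator {\n"
        f"  Material {{ diffuseColor {r} {g} {b} }}\n"
        f"  Cube {{ width {width} height {height} depth {depth} }}\n"
        "}\n"
    )

scene = vrml_cube(2, 1, 1, (1, 0, 0))
print(scene)
print(len(scene.encode("ascii")), "bytes")
```

The whole scene is about a hundred bytes, small enough to transmit quickly even over a dial-up modem.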
While the above discussion is of necessity just a brief overview of the current and future state of the art, the promise of VR can best be appreciated by an even briefer overview of currently emerging applications.
The most widely publicized application for VR is entertainment, which today is spelled VIDEO GAMES. Because donning an HMD to immerse oneself in a simulated environment creates a new and exciting interface for gaming, 3D point-of-view games are now the fastest-growing segment of the video game market, at both the consumer and arcade levels. As of Christmas 1995, a complete home game system including dedicated high-performance 300,000 polygons/sec image rendering, an HMD, and several game experiences can be had for under $1,000, with prices likely to drop by a factor of two during 1996. A key reason gaming is emerging as the first major VR application is the relatively low fidelity required for entertainment compared with higher-end commercial applications.
On the commercial side, a number of applications are now making the transition from demonstrations and prototypes to routine use in the course of day-to-day business. These include architectural walkthroughs, medical imagery visualization, factory worker training, immersive kitchen remodeling design, visualization of complex communications networks, and simulation of dismounted infantry exercises.
Conclusions
To summarize, the key component technologies of visual and audio VR have been evolving at a rapid and increasing rate, with recent emphasis on delivering basic but compelling experiences at consumer price points. As a result, effective tools are now available not just for games but for a wide range of valuable commercial applications. Many of these applications have been demonstrated and are now being commercialized at high price points, suggesting that continued advances in the state-of-the-art will quickly lead to high-volume distribution on cost-justifiable platforms. Advanced component technologies such as tactile/force feedback, smell, taste, and acceleration remain unproven, so the Holodeck will likely have to wait until the next century to find widespread application.