4.1 Introduction
A broad, scattered set of interface designs and experimental results influenced the design of the props-based interface described in the previous chapter. My experiences from implementing the system and performing user observations suggest some general issues which span this set of research. The present chapter integrates these issues into a survey of free-space 3D input techniques. This survey also suggests some general research areas which interface designers and researchers have only begun to explore and which are in need of further work.
The design issues presented in this chapter are not scientifically demonstrated principles of design or ready-to-go solutions. Rather, they are issues to be aware of and some different approaches to try. I explore some of these design issues, particularly those involving two-handed interaction, in the context of formal experiments described in subsequent chapters of this document. The other design issues are supported only by informal user observations, which may not be representative. Nonetheless, this chapter serves as a useful guide for the community of designers and researchers who wish to explore spatial input techniques.
4.2 Understanding 3D space vs. experiencing 3D space
Anyone who has tried to build a stone wall knows how difficult it is to look at a pile of available stones and decide which stone will best fit into a gap in the wall. There are some individuals, such as experienced stone masons, who have become proficient with this task, but most people simply have to try different stones until one is found that fits reasonably well.
From a perceptual standpoint, one could argue that our difficulty in building stone walls, and in performing abstract 3D tasks in general, results from the distinction between creating and manipulating actual images (or objects) and mentally conjuring images and transforming them. For example, the Shepard-Metzler mental rotation study [151] suggests that for some classes of objects, people must mentally envision a rigid body transformation on the object to understand how it will look from different viewpoints; that is, humans must perceive the motion to understand the effect of the transformation.
Badler repeated the experiment with a real (as opposed to imaginary) object. He digitized a plastic spaceship and allowed the user to specify the virtual camera view of the corresponding wireframe spaceship by positioning and orienting the wand relative to the real-world plastic spaceship. With this single change, Badler's "consciously calculated activity" suddenly became "natural and effortless" for the operator to control.
4.4 Relative gesture vs. absolute gesture
In Galyean's 3D sculpting interface [61], the user deforms a 3D model by positioning a single tracker in an absolute, fixed volume in front of a monitor. This leads to an interface which is not entirely intuitive. Galyean reports that "controlling the tool position is not easy. Even though the Polhemus pointer is held in a well-defined region, it is often difficult to correlate the position of the pointer in space with the position of the tool on the screen."
Compare this to Sachs's 3-Draw computer-aided design tool [141], which allows the user to hold a stylus in one hand and a palette in the other (both objects are tracked by the computer). These tools serve to draw and view a 3D virtual object which is seen on a desktop monitor. The palette is used to view the object, while motion of the stylus relative to the palette is used to draw and edit the curves making up the object.
3-Draw's use of the stylus for editing existing curves and Galyean's use of the "Polhemus pointer" for deforming a sculpture represent nearly identical tasks, yet the authors of 3-Draw do not report the difficulties which Galyean encountered. This difference may result from the palette-relative gesture employed by 3-Draw, as opposed to the abstract, absolute-space gesture required by Galyean's sculpting interface. As Sachs notes, "users require far less concentration to manipulate objects relative to each other than if one object were fixed absolutely in space while a single input sensor controlled the other" [141].
Thus, users may have trouble moving in a fixed, absolute coordinate frame. A spatial interface could instead base its interaction techniques upon relative motion, including motion relative to a spatial reference or the user's own body.
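One way to implement palette-relative gesture in software is to express the stylus position in the palette's coordinate frame instead of the tracker's fixed world frame. The sketch below assumes that each tracker reports a position and a 3x3 rotation matrix. The function names are hypothetical, and the code is not a description of how 3-Draw was implemented.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]  # row-major 3x3 rotation; its columns are the frame's axes

def rot_z(theta: float) -> Mat3:
    """Rotation by theta radians about the z axis (helper for the example)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def stylus_in_palette_frame(stylus_pos: Vec3, palette_pos: Vec3, palette_rot: Mat3) -> Vec3:
    """Express the stylus position in the palette's local frame: R^T (p_stylus - p_palette)."""
    offset = [s - p for s, p in zip(stylus_pos, palette_pos)]
    # (R^T v)[col] = sum over rows of R[row][col] * v[row]
    return tuple(sum(palette_rot[row][col] * offset[row] for row in range(3))
                 for col in range(3))

# Palette rotated 90 degrees about z; stylus one unit along world +y from it.
local = stylus_in_palette_frame((1.0, 1.0, 0.0), (1.0, 0.0, 0.0), rot_z(math.pi / 2))
# local is approximately (1.0, 0.0, 0.0): one unit along the palette's own x axis.
```

Only the relative pose matters here. If the user moves palette and stylus together, for example by shifting posture, the local position does not change.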
4.5 Two-handed interaction
Enabling the use of both hands can allow users to ground themselves in the interaction space; in essence, the user's own body becomes a spatial reference. Regarding two-handed interaction in free space, Sachs observes that "the simultaneous use of two [spatial input] sensors takes advantage of people's innate ability--knowing precisely where their hands are relative to each other" [141]. For example, during informal user observations of a virtual reality interface, our user interface lab noted that users of two-handed interaction are less likely to become disoriented than users who interact with only one hand [130]. Even when manipulating just a single object in 3D, using two hands can be useful and natural: in a classic Wizard-of-Oz experiment, Hauptmann [76] observed test subjects spontaneously using two hands for single-object translation, rotation, and scaling tasks.
Based on an analysis of skilled human bimanual action [67], Guiard has proposed an insightful theoretical framework and principles governing two-handed manipulative action, as discussed in section 2.6.1 on page 31. Note, however, that the application of Guiard's principles to bimanual interface design has not been formally demonstrated, and these principles may represent an incomplete set of conditions for usable two-handed interfaces. For example, Kabbash [95] describes a two-handed interface (the "palette menu") in which the user moves an opaque menu using a trackball in the left hand and a selection cursor using a mouse in the right hand. Although this interface apparently conforms to Guiard's principles, Kabbash's results suggest that the palette menu may impose an additional cognitive load. Because the palette is opaque, it occludes the context of the working area, and this apparently leaves users uncertain of what strategy to use when placing the palette. The transparent menu used by the two-handed Toolglass [15] technique, which Kabbash also analyzed, does not exhibit this problem.
4.5.1 Working volume of the user's hands
Guiard's observations of subjects performing writing tasks [67] as well as my own observations of the props-based interface [80] suggest that people tend to move their hands in a surprisingly small working volume. This volume is not only small, but also tends to move over time as the user changes body posture. For example, Guiard's analysis of handwriting tasks suggests that the writer tends to define an active volume relative to his or her non-dominant hand. Guiard also reports that "the writing speed of adults is reduced by some 20% when instructions prevent the nonpreferred hand from manipulating the page" [67]. This suggests that users of a spatial interface which requires movements relative to a fixed frame-of-reference in their environment may experience reduced task performance due to cognitive load, fatigue, or both.
4.6 Multisensory feedback
A key challenge facing spatial interaction is identifying aspects of the proprioceptive senses that designers can take advantage of when users interact in real space. Interacting with imaginary, computer-generated worlds can easily bewilder users; presumably, providing a wide range of sensory feedback might help users to more readily perceive their virtual environment. Psychologist J. J. Gibson has long argued that information from a variety of feedback channels is crucial to human understanding of space [62].
Brooks [19] discusses interfaces which employ multisensory feedback techniques, including force feedback [21][90][120], space exclusion (collision detection), and supporting auditory feedback. I add physical manipulation of tools with mass to these techniques.
For example, in our user interface lab [173] we have experimented with a virtual reality interface in which users wear a glove to grab and position a virtual flashlight. During public demo sessions, however, we found that users had inordinate difficulty grasping and manipulating the virtual flashlight with the glove. When we replaced the glove with a tracked physical flashlight, users could position the virtual flashlight with ease. For this application, physical manipulation of a flashlight worked well, while glove-based manipulation of a virtual flashlight was a disaster.
For example, Schmandt describes an interface for entering multiple layers of VLSI circuit design data in a 3D stereoscopic work space [142]. The user enters the data by pressing a stylus on a stationary 2D tablet; the user can adjust the depth of the image so that the desired plane-of-depth lines up with the 2D tablet. Versions of the interface which constrained the 3D stylus position to lie on grid points via software mapping were less successful; the physical support of the tablet proved essential. Other useful 2D constraining surfaces include the physical surface of the user's desk, the glass surface of the user's monitor, or even a hand-held palette or clipboard.
4.8 Head tracking techniques
In a non-immersive spatial interface, desktop-based head tracking can allow the interface to "give back" some of the information lost by displaying 3D objects on a flat display, via head motion parallax depth cues. Previous research [118][46][178][111] discusses the advantages of head tracking and the implementation issues. An additional user study [131] shows performance improvement for a generic search task using an immersive head-tracked, head-mounted display vs. a non-head-tracked display.
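The geometry behind desktop head tracking can be sketched briefly. On each frame, the tracked eye position relative to the physical screen determines an asymmetric (off-axis) viewing frustum. This keeps scene objects apparently fixed in space as the head moves. The sketch below makes three assumptions: the screen is centered at the origin in the plane z = 0, +z points toward the viewer, and the eye position has already been converted into these screen coordinates. The function names are hypothetical. The returned bounds follow the argument convention of OpenGL's glFrustum; the matching view transform would translate the scene by minus the eye position.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

def off_axis_frustum(eye: Vec3, screen_w: float, screen_h: float, near: float):
    """Return (left, right, bottom, top) at the near plane, as passed to glFrustum.

    Assumes the screen is centered at the origin in the plane z = 0 and eye[2] > 0.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the screen (z > 0)")
    scale = near / ez  # similar triangles: screen-plane edges -> near-plane edges
    left = (-screen_w / 2 - ex) * scale
    right = (screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top = (screen_h / 2 - ey) * scale
    return left, right, bottom, top

def project_to_screen(eye: Vec3, point: Vec3) -> Tuple[float, float]:
    """Where the line from the eye through a scene point crosses the screen plane z = 0."""
    ex, ey, ez = eye
    px, py, pz = point
    t = ez / (ez - pz)  # line parameter at which z reaches 0 (point must not be at eye depth)
    return ex + t * (px - ex), ey + t * (py - ey)
```

project_to_screen shows the resulting motion parallax. As the eye moves right, a point behind the screen plane shifts right on the screen and a point on the plane stays put. A point in front of the plane shifts left.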
4.9 Related versus independent input dimensions
The Jacob and Sibert study [91] compares user performance for two tasks: the first asks the user to match (x, y, size) parameters of two squares, while the second task requires matching (x, y, greyscale) parameters of two squares. Both tasks require the control of three input dimensions, but Jacob reports that user task performance time for the (x, y, size) task is best with a 3D position tracker, while performance for the (x, y, greyscale) task is best with a mouse (using an explicit mode to change just the greyscale).
Jacob argues that the 3D tracker works best for the (x, y, size) task since the user thinks of these as related quantities ("integral attributes"), whereas the mouse is best for the (x, y, greyscale) task because the user perceives (x, y) and (greyscale) as independent quantities ("separable attributes"). The underlying design principle, in Jacob's terminology, is that "the structure of the perceptual space of an interaction task should mirror that of the control space of its input device" [91].
This result points away from the standard notion of logical input devices. It may not be enough for the designer to know that a logical task requires the control of three input parameters (u, v, w). The designer should also know if the intended users perceive u, v, and w as related or independent quantities. In general it may not be obvious or easy to determine exactly how the user perceives a given set of input dimensions.
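Jacob's distinction becomes concrete when the two control mappings are written side by side. The sketch below is hypothetical; the Square type and the update functions do not come from the study. Both mappings control the same three parameters, but they are structured differently:

- The tracker mapping drives all three integral attributes with one motion.
- The mouse mapping routes the same 2D motion to either position or greyscale through an explicit mode.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Square:
    x: float
    y: float
    size: float
    grey: float  # 0.0 = black, 1.0 = white

def tracker_update(sq: Square, dx: float, dy: float, dz: float) -> Square:
    """Integral mapping: one 3D motion changes x, y and size together."""
    return replace(sq, x=sq.x + dx, y=sq.y + dy, size=max(0.0, sq.size + dz))

def mouse_update(sq: Square, dx: float, dy: float, mode: str) -> Square:
    """Separable mapping: the same 2D motion edits either (x, y) or greyscale."""
    if mode == "position":
        return replace(sq, x=sq.x + dx, y=sq.y + dy)
    if mode == "grey":
        # Hypothetical choice: vertical motion adjusts grey level, clamped to [0, 1].
        return replace(sq, grey=min(1.0, max(0.0, sq.grey + dy)))
    raise ValueError(f"unknown mode: {mode}")
```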
4.10 Extraneous degrees of freedom
Most spatial input devices sense six dimensions of input data, but this does not mean that all six dimensions should be used at all times. If, for example, the user's task consists only of orienting an object, it makes little sense to allow simultaneous translation, since this only makes the user's task more difficult: the user must simultaneously orient the object and keep it from moving beyond their field of view. Extraneous input dimensions should be constrained to some meaningful value; designing an interface that is useful does not necessarily require realistically imitating the behavior of objects in the physical world.
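A simple way to constrain extraneous input dimensions is to filter the tracker pose before it reaches the object. In the sketch below, each input dimension either passes through from the device or is held at the object's current value, which is one possible choice of "meaningful value". It assumes poses are represented as a position plus a unit quaternion. Pose, constrain_pose, and ORIENT_ONLY are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)

@dataclass(frozen=True)
class Pose:
    position: Vec3
    orientation: Quat

def constrain_pose(current: Pose, device: Pose,
                   translate_axes: Tuple[bool, bool, bool] = (True, True, True),
                   rotate: bool = True) -> Pose:
    """Pass through only the degrees of freedom the task needs.

    Masked translation axes keep the object's current coordinate; if rotate is
    False, the object's current orientation is kept.
    """
    position = tuple(d if allowed else c
                     for d, c, allowed in zip(device.position, current.position, translate_axes))
    orientation = device.orientation if rotate else current.orientation
    return Pose(position, orientation)

# Orientation-only task: the object turns with the device but stays in view.
ORIENT_ONLY = dict(translate_axes=(False, False, False), rotate=True)
```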
4.11 Coarse versus precise positioning tasks
In two dimensions, the direct manipulation paradigm allows rapid, imprecise object placement. But to perform useful work in the context of a complex application such as a document editor, direct manipulation often needs to be constrained by techniques such as gridding or snap-dragging [13][14]. Corresponding three-dimensional constraint techniques and feedback mechanisms need to be developed.
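Gridding extends directly to three dimensions and is one possible starting point. The sketch below snaps each coordinate of a 3D cursor to a regular grid. When a tolerance is given, a coordinate is snapped only if it already lies close to a grid plane, which is a simple form of "gravity". This is a hypothetical illustration and much simpler than Bier's snap-dragging, which snaps to alignment objects derived from the scene rather than to a fixed grid.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

def snap_to_grid(p: Vec3, spacing: float, tolerance: Optional[float] = None) -> Vec3:
    """Snap each coordinate to the nearest multiple of spacing.

    With a tolerance, only coordinates within that distance of a grid plane
    are snapped; the others pass through unchanged.
    """
    out = []
    for c in p:
        # floor(x + 0.5) rounds halves upward consistently (unlike Python's round()).
        nearest = math.floor(c / spacing + 0.5) * spacing
        if tolerance is None or abs(c - nearest) <= tolerance:
            out.append(nearest)
        else:
            out.append(c)
    return tuple(out)
```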
Users may have difficulty controlling an interface which requires simultaneous, precise control of an object's position and orientation. The biomechanical constraints of the hands and arms prevent translations from being independent of rotations, so rotation will be accompanied by inadvertent translation, and vice versa. Even in the real world, people typically break down many six degree-of-freedom tasks, such as docking, into two subtasks: translating to the location and then matching orientations [21].
The design hurdle is to provide an interface which effectively integrates rapid, imprecise, multiple degree-of-freedom object placement with slower, but more precise object placement, while providing feedback that makes it all comprehensible. As Stu Card has commented, a major challenge of the post-WIMP interface is to find and characterize appropriate mappings from high degree-of-freedom input devices to high degree-of-freedom input tasks.
4.12 Control metaphors
Ware [177] identifies three basic control metaphors for 3D interaction:
Eyeball-in-hand metaphor (camera metaphor): The view the user sees is controlled by direct (hand-guided) manipulation of a virtual camera. Brooks has found this metaphor to be useful when used in conjunction with an overview map of the scene [18][19].
Scene-in-hand metaphor: The user has an external view of an object, and manipulates the object directly via hand motion. Ware suggests this metaphor is good for manipulating closed objects, but not for moving through the interior of an object [177].
Flying vehicle control (flying metaphor): The user flies a vehicle to navigate through the scene. Ware found flying to be good for navigating through an interior, but poor for moving around a closed object [177]. Special cases of flying include the "car driving metaphor," as well as the "locomotion metaphor," where the user walks through the scene [18].
A fourth metaphor can be appended based on subsequent work:
The selection of an appropriate control metaphor is very important: the user's ability to perform 3D tasks intuitively, or to perform certain 3D tasks at all, can depend heavily on the types of manipulation which the control metaphor affords. Brooks addresses this issue under the heading "metaphor matters" [19].
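The eyeball-in-hand and scene-in-hand metaphors differ mainly in which transform the hand pose drives. The minimal sketch below assumes the hand pose is a rigid transform (rotation matrix R, translation t). For scene-in-hand, it also assumes a fixed camera at the origin.

- In eyeball-in-hand, the hand pose is the camera pose, so a point's eye coordinates come from the inverse of the hand pose.
- In scene-in-hand, the hand pose acts as the model transform.

The function names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]  # row-major rotation matrix

IDENTITY: Mat3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def apply(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Rigid transform: R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def invert(R: Mat3, t: Vec3) -> Tuple[Mat3, Vec3]:
    """Inverse of a rigid transform: (R^T, -R^T t)."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    return Rt, tuple(-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3))

def eyeball_in_hand(hand_R: Mat3, hand_t: Vec3, world_point: Vec3) -> Vec3:
    """Camera rides on the hand: eye coordinates = hand_pose^-1 * point."""
    view_R, view_t = invert(hand_R, hand_t)
    return apply(view_R, view_t, world_point)

def scene_in_hand(hand_R: Mat3, hand_t: Vec3, world_point: Vec3) -> Vec3:
    """Scene rides on the hand, viewed from a fixed camera at the origin."""
    return apply(hand_R, hand_t, world_point)
```

The same hand motion produces opposite apparent motion under the two metaphors. Moving the hand one unit to the right shifts a point one unit left in eye coordinates under eyeball-in-hand, but one unit right under scene-in-hand.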
4.13 Issues in dynamic target acquisition
The term dynamic target acquisition refers to target selection tasks such as 3D point selection, object translation, object selection, and docking. There are several issues related to dynamic target acquisition tasks:
Other example uses of transparency to aid target acquisition include use of a 3D cone for object selection [111], use of a semi-transparent tool sheet in the Toolglass interface [15], or the use of the semi-transparent cutting plane in the props interface [80].
The 3D points selectable by casting a ray are constrained to lie on the surface of virtual objects in the scene. In many circumstances this is exactly what is desired. If it is necessary to select points on objects which are inside of or behind other objects in the scene, the ray casting can be augmented with a mechanism for cycling through the set of all ray-object intersection points. For disconnected 3D points, 3D snap-dragging techniques [14] can be used if the disconnected points are related to existing objects in the scene. If the disconnected points are on the interior of objects, ray casting can be combined with a "cutting plane" operator, which is used to expose the interior of the objects [80][111].
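Ray casting with intersection cycling can be sketched in a few lines. The example below approximates the scene with spheres, a hypothetical simplification; a real system would use a per-object intersection test. It collects every ray-surface intersection, including exit points, so that surfaces inside or behind other objects remain reachable. The hits are sorted by distance and stepped through one at a time, as a user might do by repeatedly pressing a button. The names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Hit = Tuple[float, str, Vec3]  # (distance along ray, object name, 3D point)

def ray_sphere_hits(origin: Vec3, direction: Vec3, center: Vec3, radius: float) -> List[float]:
    """Ray parameters t >= 0 where the ray meets the sphere; direction must be unit length."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return []  # the ray misses this sphere
    root = math.sqrt(disc)
    return [t for t in (-b - root, -b + root) if t >= 0.0]

def all_intersections(origin: Vec3, direction: Vec3, spheres) -> List[Hit]:
    """Every ray-surface intersection (entry and exit points), nearest first."""
    hits: List[Hit] = []
    for name, center, radius in spheres:
        for t in ray_sphere_hits(origin, direction, center, radius):
            point = tuple(o + t * d for o, d in zip(origin, direction))
            hits.append((t, name, point))
    hits.sort(key=lambda h: h[0])
    return hits

class IntersectionCycler:
    """Each call to advance() selects the next-farther hit, wrapping to the nearest."""

    def __init__(self, hits: Sequence[Hit]):
        self.hits = list(hits)
        self.index = -1

    def advance(self) -> Optional[Hit]:
        if not self.hits:
            return None
        self.index = (self.index + 1) % len(self.hits)
        return self.hits[self.index]
```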
Digitizing points on the surface of a real object is an instance where ray casting may not be helpful. In this case, the real object provides a spatial reference for the user as well as physical support of the hand; as a result, direct 3D point selection works well [129].
4.13.3 Cone casting versus ray casting
For gross object selection, ray casting may become less appropriate, especially when the object is distant. One could instead use a translucent 3D cone to indicate a region of interest; distance metrics can then be used to choose the closest object within the cone. Note that the "spotlighting" visual effects afforded by many graphics workstations can provide real-time feedback for this task. An implementation of this strategy is reported by Liang [111].
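Cone casting reduces to two steps: an angular containment test and a ranking. The sketch below treats each object as a single center point and keeps the objects whose direction from the cone apex lies within the cone's half-angle. It then picks one of them using either distance from the apex or angular distance from the cone axis. Both ranking metrics are illustrative assumptions and are not necessarily the metric Liang used. cone_select is a hypothetical name.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def cone_select(apex: Vec3, axis: Vec3, half_angle_deg: float,
                objects: Sequence[Tuple[str, Vec3]],
                metric: str = "distance") -> Optional[str]:
    """Return the name of the best object whose center lies inside the cone.

    metric="distance" picks the object nearest the apex; metric="angle"
    picks the object closest to the cone axis.
    """
    axis_len = math.sqrt(sum(a * a for a in axis))
    cos_limit = math.cos(math.radians(half_angle_deg))
    best: Optional[Tuple[float, str]] = None
    for name, center in objects:
        v = tuple(c - a for c, a in zip(center, apex))
        dist = math.sqrt(sum(x * x for x in v))
        if dist == 0.0:
            continue  # direction undefined at the apex; ignored in this sketch
        cos_angle = sum(x * y for x, y in zip(v, axis)) / (dist * axis_len)
        if cos_angle < cos_limit:
            continue  # outside the cone (this also rejects objects behind the apex)
        if metric == "distance":
            score = dist
        elif metric == "angle":
            score = math.acos(min(1.0, cos_angle))
        else:
            raise ValueError(f"unknown metric: {metric}")
        if best is None or score < best[0]:
            best = (score, name)
    return None if best is None else best[1]
```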
4.14 Clutching mechanisms
Most spatial interfaces incorporate some type of clutching mechanism, that is, a software mode which allows the spatial input device to be moved without affecting the 3D cursor. In my experience, some of the most confounding (for the user) and hard-to-fix (for the implementor) usability problems and ergonomic difficulties can arise due to poor clutch design. Section 3.8 of the previous chapter discusses some of the issues I have encountered in designing clutching mechanisms.
4.14.1 Recalibration mechanisms
At a low level, all spatial input devices provide the software with an absolute position in a global coordinate frame. The user interface should provide a recalibration mechanism for mapping this absolute position to a new logical position, which allows the user to specify a comfortable resting position in the real world as a center point for the interaction space. There are at least three basic recalibration strategies:
Ratcheting: Many spatial interfaces (e.g. [41], [176]) utilize the notion of ratcheting, which allows the user to perform movements in a series of grab-release cycles. The user presses a clutch button, moves the input device, releases the clutch button, returns his or her hand to a comfortable position, and repeats the process.
Continuous: In some cases recalibration can be made invisible to the user. For example, in a virtual reality system, when users move their body or head, the local coordinate system is automatically updated to keep their motions body-centric. Another example is provided by my props-based interface, where the nonpreferred hand defines a dynamic frame-of-reference relative to which other tools may be moved with the dominant hand, as discussed in section 3.6.2 ("The natural central object") of the previous chapter.
These strategies can be composed. In a virtual reality application, for instance, the position of the hands will be continuously recalibrated to the current position of the head, but an object in the virtual environment might be moved about via ratcheting, or brought to the center of the user's field of view by a homing command.
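Ratcheting can be sketched as a small clutch state machine. While the clutch is held, the cursor follows the device plus an offset. On each new press, the offset is recomputed so that the cursor resumes from where it was left instead of jumping to the device's absolute position. The sketch below handles translation only, and RatchetClutch is a hypothetical name. A full implementation would re-anchor orientation in the same way.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

class RatchetClutch:
    """Translation-only ratcheting: the cursor follows the device only while the clutch is held."""

    def __init__(self, start: Vec3 = (0.0, 0.0, 0.0)):
        self.engaged = False
        self.offset: Vec3 = (0.0, 0.0, 0.0)
        self.cursor: Vec3 = start

    def press(self, device_pos: Vec3) -> None:
        # Re-anchor so the cursor resumes from where it was left, with no jump.
        self.offset = tuple(c - d for c, d in zip(self.cursor, device_pos))
        self.engaged = True

    def release(self) -> None:
        # While released, the hand can return to a comfortable position freely.
        self.engaged = False

    def update(self, device_pos: Vec3) -> Vec3:
        if self.engaged:
            self.cursor = tuple(d + o for d, o in zip(device_pos, self.offset))
        return self.cursor
```

Continuous recalibration fits the same structure. The offset is replaced by a reference frame, such as the head or the nonpreferred hand, that is updated every frame instead of only on a button press.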
4.15 Importance of ergonomic details in spatial interfaces
Manipulating input devices in free space can easily fatigue the user. The designer of a spatial interface must take special pains to avoid or reduce fatigue wherever possible. A poor design risks degraded user performance, user dissatisfaction, and possibly even injury to the user. Some issues to consider include the following: