
Hitching our research to someone else's driving problems, and solving those problems on the owners' terms, leads us to richer computer science research.

Fred Brooks [20]



Chapter 3

System Description


3.1 Overview

I describe a 3D neurosurgical visualization system which incorporates a 3D user interface based on the two-handed physical manipulation of hand-held tools in free space. These user interface props facilitate transfer of the neurosurgeon's skills for manipulating tools with two hands to the operation of a user interface for visualizing 3D medical images, without need for training.

From the surgeon's perspective, the interface is analogous to holding a miniature head in one hand which can be "sliced open" or "pointed to" using a cutting-plane tool or a stylus tool, respectively, held in the other hand. The interface also includes a touchscreen which allows facile integration of 2D and 3D input techniques. Informal evaluations of over fifty neurosurgeons, as well as hundreds of non-neurosurgeons, have shown that with a cursory introduction, users can understand and use the interface within about one minute of touching the props.

3.2 The application domain: neurosurgery and neurosurgeons

Neurosurgeons are driven by a single goal: deliver improved patient care at a lower cost. While improving quality of care and reducing costs might seem to be at odds, in practice one can achieve both ends by reducing the time required to perform surgery. Operating room time itself is of course very expensive. But more importantly, the longer a patient's brain is exposed during a procedure, the greater the chance for expensive and life-threatening complications.

The key to reducing operating room time is superior surgical planning. Neurosurgery is inherently a three-dimensional activity; it deals with complex structures in the brain and spine which overlap and interact in complicated ways. To formulate the most effective surgical plan, the neurosurgeon must be able to visualize these structures and understand the consequences of a proposed surgical intervention, both to the intended surgical targets and to surrounding, viable tissues.

A certain class of procedures known as stereotaxy, which use a metal stereotactic frame bolted to the patient's skull, are particularly amenable to computer-assisted surgical planning because the frame provides a known and fixed reference coordinate system. The frame also serves to guide surgical instruments to the target with millimeter accuracy. For procedures which cannot use a frame, such as aneurysm or blood vessel surgeries, the surgeon typically views the surgical theatre through a microscope while operating directly on the brain with hand-held tools.

3.2.1 Traditional practice

Traditionally, neurosurgeons have planned surgery based on 2D slices, acquired through scanning techniques such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). This restriction to 2D slices is not necessarily by preference. MRI is acquired as 3D volumetric data, and its presentation as a set of 2D slices is an artifact of limited computer technology. Even CT is usually acquired as multiple slices that are closely spaced, and thus often can be treated as volumetric data.

The 2D slice paradigm has imposed a further restriction. The images are typically restricted to appear in planes orthogonal to canonical axes through the patient's head. These orthogonal planes are known as the sagittal, coronal and axial planes and are shown in figure 3.1, reading clockwise from the top. The orthogonal planes form the frame of reference in which medical students learn their anatomy, and as such they are the planes in which physicians can best understand and reason about the anatomy. But many structures within the brain, and many surgical paths to these structures that are clinically useful, are oblique to these canonical views. For example, to reach a deep-seated structure, the neurosurgeon might follow a fold of the brain to reduce the amount of transected brain tissue. Such an approach is often not possible using the canonical planes.

Even for experienced neurosurgeons, the development of a three-dimensional mental model from 2D slices of the anatomy remains a challenging task. Traditionally, following an oblique trajectory to a target has been risky since it has been difficult or impossible to produce appropriate visualizations. This is why visualization of oblique slices is so important: it is difficult to understand the anatomy at oblique angles, so the surgeon wants to be able to see these views and relate them back to the more familiar canonical views.

Figure 3.1 The canonical sagittal, coronal, and axial orthogonal planes.

3.2.2 Computer-assisted surgery

Neurosurgeons have recently become increasingly interested in computer-based surgical planning systems which allow them to quantify and visualize the three-dimensional information available from medical imaging studies. By making better use of this three-dimensional information, and by allowing the surgeon to quickly and intuitively access it, computer-based visualization and planning systems can positively impact both cost-of-care and patient outcome.

A typical computer-assisted surgery consists of the following elements:

Stereotactic frame placement: The stereotactic frame is bolted to the patient's skull prior to imaging, to ensure that it stays rigidly fixed in place during imaging and surgery. At the University of Virginia, we use the Leksell stereotactic frame [109]. The frame serves two main purposes: it provides a coordinate system for the patient's head, and it carries mechanical guides that serve as a rigid mounting platform for surgical instruments. During medical image acquisition, the frame is fitted with a system of fiducial markers. These markers form a distinctive pattern in the digital patient images which can be used to calculate a transformation between the digital image coordinates and the stereotactic frame coordinates, allowing transfer of the computer-based surgical plan to the operating room.

Medical image acquisition: Our neurosurgeons principally depend upon MRI (fig. 3.2, left) for anatomic information due to its high soft tissue contrast and its capability for true 3D data acquisition. When necessary, the surgeon may also request digital subtraction angiography images (fig. 3.2, center), which detail the vascular structures (veins or arteries) within the brain, or CT (fig. 3.2, right), which is sometimes needed for bony detail or visualization of calcifications. In general, however, acquiring multiple image modalities is expensive, time-consuming, and stressful for the patient, so the surgeon will only order the minimum necessary set of image modalities.

Figure 3.2 Example MR, angiography, and CT images.

Image segmentation and classification: We employ computer algorithms to identify the surface and major structures of the brain, to delineate pathology, and to identify major blood vessels [92][97][157][158][159]. Robust identification of these structures is often helpful during visualization and is necessary for quantitative evaluations (such as tracking the volume of a tumor over time). Manual approaches are possible, but are time-consuming and error prone, making them impractical for surgical planning.

Pre-surgical planning: The user interface discussed in this dissertation focuses on the pre-surgical planning phase, which usually takes place on the morning of surgery. Planning is typically done in a separate surgical planning room rather than in the operating room itself. To develop a plan, the surgeon uses visualization, measurement, and other planning tools to select the surgical targets and to select a path to those targets that produces the least possible damage to viable tissue. To plan the best possible trajectory, the surgeon needs to understand the spatial relationships between the targets, surrounding structures, functional areas which must not be damaged, and feasible entry points. Feasible entry points must avoid cortical vessels or skin surface areas which are inaccessible due to mechanical limitations of the stereotactic frame. Visualizations of the cortical surface, the stereotactic frame, proposed surgical trajectories, and volume cross-sections at both orthogonal and oblique angles can all help the neurosurgeon to make informed decisions. These visualizations also serve as visual checks that there have not been human or software-generated errors during imaging, frame registration, or image segmentation. For example, if the visualization of the stereotactic frame does not line up with the registration markers in the image, then the registration is clearly incorrect.

Surgery: Finally, the surgery itself is ready to proceed. At this point, the surgeon has carefully studied the intended target(s) and the surgical trajectory, so the main difficulty is accurate navigation along the selected trajectory. Intra-operative guidance for surgical navigation is a current research area [48][106], but not a topic of this dissertation. In some rare cases, the surgeon may elect to modify the plan intra-operatively based on information which was not visible in the medical images.

3.2.3 Some system requirements

All the activity leading up to the surgery itself must occur during a span of approximately 3-4 hours on the morning of surgery1. To be clinically useful, a computer-based surgical planning system must be able to produce all of its results within this time window. Since the principal neurosurgeon is extremely busy and may be providing care for several other patients, the actual time available for planning may be as little as fifteen minutes for the more straightforward cases.

Thus the user interface for a neurosurgical planning and visualization system must permit the surgeon to work quickly: the morning of surgery is perhaps the least optimal time for a surgeon to be fussing with a maze of slider bars and command prompts. Also, the surgeon must cope with frequent distractions, and therefore must be able to quickly detach from the user interface, both physically and cognitively. Thus, the interface must not encumber the surgeon with devices such as gloves or head-mounted displays that will be difficult to remove, and it must not have explicit modes that are easily forgotten during a phone call or a discussion with a colleague.

Software usability is crucial to get neurosurgeons to actually use advanced visualization software in the clinical routine. I have sought to design interaction techniques which facilitate use of the software by surgeons, without need for technical assistance. The manipulative capabilities of input devices such as mice and keyboards are poorly matched to the volumetric manipulation and visualization tasks of interest to the neurosurgeon. Rather than typing in commands or moving sliders with a mouse, the neurosurgeon thinks in terms of real objects in real space; a three-dimensional user interface should allow the neurosurgeon to work and think in these same terms. As one surgeon put it, "I want a skull I can hold in my hand."

Our laboratory has worked closely with the neurosurgeons at the University of Virginia throughout the design of our 3D neurosurgical planning system and the associated 3D user interface. My work on the user interface, in particular, has necessarily been heavily collaborative, relying on the advice and opinions of neurosurgeons to provide goals and specifications throughout the design process.

Note that the neurosurgeon's existing visualization and planning tools are almost exclusively two-dimensional. This is an artifact of historical technological limitations rather than preference; computer-assisted three-dimensional surgical planning can allow neurosurgeons to view and explore the individual patient's anatomy in ways that previously have not been possible. Our laboratory's initial clinical experience suggests that computer-assisted three-dimensional surgical planning can allow surgeons to approach old problems in new and more efficient ways, and to treat borderline cases that may have been considered too risky to treat with traditional techniques.

3.3 System design philosophy

In our everyday lives, we are constantly confronted with tasks that involve physical manipulation of real objects. We typically perform these tasks with little cognitive effort, with both hands [27], and with total confidence in our movements. For many applications, a three-dimensional user interface should offer equally facile interaction.

I propose a 3D interface which permits the user to manipulate familiar objects in free space. These passive interface props act as tools which help users reason about their tasks. With six degree-of-freedom magnetic trackers [137] unobtrusively embedded within the props, the computer can observe the user's gestures. This results in a human-computer dialog where the system watches the user [132], in contrast to the traditional approach where the user generates input tokens in a contrived dialog.


Figure 3.3 User selecting a cutting-plane with the props.

An interface which requires the neurosurgeon to wear an instrumented glove and make grabbing gestures to manipulate imaginary objects would not offer this style of interaction. No matter how realistic the on-screen graphics are, the user does not experience the visceral kinesthetic and tactile feedback which comes from grasping a real-world object. When the user holds a physical tool, he or she has passive haptic feedback to guide the motion of the hand, allowing all the degrees-of-freedom of the fingers, thumb, and palm to participate in the manipulation of the tool.

Compared to "3D widgets" [43][77][78] (as shown in figure 2.1 on page 15) a props-based interface offers several advantages. With props, there is no need to make a widget's behavior explicit or to make the user realize the widget is an active interface component. The appearance of the props indicates their use and their palpability makes users immediately and continuously aware they exist. Drawing a widget without cluttering the scene becomes trivial, since there is no widget. Also, for casual users such as surgeons, manipulating a real tool is familiar and natural, whereas an abstract widget, no matter how well designed, is not. Instead of having to expend cognitive effort on the basic acts of manipulating objects, my approach allows the user to employ the normal perceptual and motor systems to manipulate objects, so that the user can properly focus on the intellectually challenging portion of the task.

In the domain of neurosurgical planning and visualization, the props-based interface has proven successful and has elicited enthusiastic comments from users. Approximately 50 neurosurgeons have tried the interface, representing about 1% of the total user population of 4,000 neurosurgeons in the United States2. With a cursory introduction, neurosurgeons who have never before seen the interface can understand and use it without training.

3.4 Real-time interaction

Without the ability to render and manipulate images of the brain in real time, my approach to the interface would be infeasible. The system software has been designed to achieve high performance: typical interactive update rates are approximately 15-18 frames per second3. A frame rate of 10 frames per second is usually considered to be the minimum update rate at which humans can still fuse separate frames into apparent motion [38].

During each frame, the system renders a simplified brain surface representation consisting of approximately 9,000 polygons and displays a volumetric cross-section from data which typically consists of 256 x 256 x 128 voxels (volume elements), each 2 bytes wide, for a total of 16 megabytes of volume data. To support selection of points on the brain surface, during each frame the system must also compute the intersection of a ray with the 9,000 polygon brain surface. And of course, the system must also communicate with the actual input devices, calculate matrix transformations, and compute geometrical relationships as a precursor to rendering the actual graphics for a frame.
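The storage figure and the per-frame time budget implied by these numbers can be checked with a few lines of arithmetic. The sketch below uses only the dimensions and frame rates quoted in this section; the function names are illustrative and are not taken from the system software.

```python
# Volume dimensions and sample width as quoted in the text.
WIDTH, HEIGHT, DEPTH = 256, 256, 128
BYTES_PER_VOXEL = 2

def volume_bytes(width, height, depth, bytes_per_voxel):
    """Total storage for a dense voxel grid, in bytes."""
    return width * height * depth * bytes_per_voxel

def frame_budget_ms(frames_per_second):
    """Time available for all per-frame work, in milliseconds."""
    return 1000.0 / frames_per_second

total = volume_bytes(WIDTH, HEIGHT, DEPTH, BYTES_PER_VOXEL)
print(total // (1024 * 1024), "megabytes of volume data")

# Budgets at the perceptual minimum and at the reported update rates.
for fps in (10, 15, 18):
    print(fps, "frames/s ->", round(frame_budget_ms(fps), 1), "ms per frame")
```

At 15 to 18 frames per second, rendering, cross-sectioning, ray intersection, and device communication together have to fit in roughly 55 to 67 milliseconds.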

3.5 Props for neurosurgical visualization

3.5.1 Viewing patient data with a head prop

The surgeon uses a head prop to manipulate the individual patient's head data. The prop is a small doll's head which can be held comfortably in one hand. I have tried head props of various sizes: if the input device is too small, it is easy to drop it by mistake, and the connecting cable significantly impedes rotating the device. If the input device is too large, it becomes difficult to tumble with the fingers of one hand. The right size seems to be roughly 2.0 to 2.5 inches in diameter.

I have also tried using a small rubber ball, but users prefer the doll's head because it is much richer in tactile orientation cues. The orientational cues help users to better understand what the input device does, and suggest appropriate behavior for three-dimensional manipulation: people's first instinct is to roll a ball on the desk, but they will pick up the head. The doll's head itself also provides a certain amount of "marketing appeal" and serves as a memorable icon for the interface.

The doll's head prop is an absolute rotation controller: rotating the doll's head always causes a polygonal model of the patient's brain to rotate correspondingly on the screen. The user can control the image zoom factor by moving the prop towards or away from his or her body. Note, however, that the software does not in fact know where the user is sitting, so the zoom factor is actually based on the distance between the doll's head and the front of the screen. Also, since the angle subtended by the virtual object on the screen grows geometrically with linear translation towards the user, the virtual distance moved is actually a log transform of the physical front-back translation. Without this log transform, a front-back translation near the screen produces almost no effect on the zoom factor, whereas a translation near the user's body suddenly shoots the zoom factor beyond a useful range.
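One way to picture the log transform is as a mapping from the prop's distance from the screen to a normalized virtual depth. The sketch below is only an illustration of that idea: the text does not give the exact formula, so the function name, the working-range limits `d_near` and `d_far`, and the normalization to the range 0..1 are all assumptions.

```python
import math

def virtual_depth(d, d_near=0.05, d_far=0.80):
    """Map the prop's distance from the screen (metres) to a 0..1 virtual depth.

    d_near and d_far are hypothetical working-range limits, not measured values.
    """
    d = min(max(d, d_near), d_far)  # clamp to the working range
    return math.log(d / d_near) / math.log(d_far / d_near)

# Equal *ratios* of physical distance give equal virtual steps, so the
# mapping is most sensitive near the screen and least sensitive near the body.
for d in (0.05, 0.10, 0.20, 0.40, 0.80):
    print(f"{d:.2f} m -> {virtual_depth(d):.2f}")
```

Under this assumed mapping, each doubling of the physical distance advances the virtual depth by the same amount, which evens out the response across the front-back range. How the virtual depth is then turned into a zoom factor is left open here.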

The doll's head provides only four degrees-of-freedom: three degrees-of-freedom for rotation plus one degree-of-freedom for the zoom factor. In the context of surgical visualization, moving the object left-right or up-down is typically not useful, so it is helpful to constrain the polygonal brain to appear at the center of the screen. This simplifies the task and users find it natural.

The original interface design envisioned a realistic skull-shaped prop, but retreated from this approach for the following reasons:

3.5.2 Slicing the patient data with a cutting-plane prop

The surgeon can also employ a cutting-plane prop to specify the position and orientation of an arbitrary slice through the patient's anatomy. The prop itself is a rectangular plate with a housing for the tracker (fig. 3.4, left). Users can spread their fingers across the plate to get a direct haptic sense of how it is oriented in space. The appearance of the cutting-plane prop differentiates it from the head prop and makes its purpose immediately obvious.

Note that the cutting-plane prop is used in concert with the head prop rather than as a separate tool. The user holds the cutting-plane against the head to indicate a slice through the brain data. The computer shows a corresponding virtual tool intersecting the virtual head, along with a cross-section of the volumetric head data (fig. 3.4, right). The reader can easily approximate this interface. Seat yourself in a chair with armrests. Grasp a ball in one hand and a small book in the other. While supporting your elbows with the armrests, hold the book up to the ball, and orient each as deemed necessary. This is all that the interface requires for 3D manipulation.

Figure 3.4 User indicating a cross-section.

There are three distinct clinical uses for the cutting-plane prop as I have implemented it:

Volume Exploration: The user can interactively sweep the cutting plane through the volume. Because of the interactive update rate, users can quickly develop a sense of the objects embedded in the volume by sweeping the plane back and forth. For example, structures which are difficult to visualize when viewing orthogonal slices can now be easily found and inspected: figure 3.5 shows a user moving the cutting-plane prop, over a period of a few seconds, to expose the optic nerves.

Volume Dissection: Once the plane is selected, a portion of the volume can be permanently cut away. I have implemented a version of the system with texture-mapping hardware which allows the polygonal object to be "capped" with a texture map of the exposed data. The texture map affords selection of a further cross-sectioning plane which passes through structures revealed by the initial cut, as discussed further in section 3.7.2 of this chapter.

Measuring Distances: A grid pattern on the computer rendering of the plane can be used as a ruler. I had not anticipated that the cutting-plane prop could be used as a ruler, but much to my surprise some neurosurgeon test users started employing it for this purpose. When the user manipulates real objects in real space, new or unusual ideas can readily be expressed; the user is not artificially bound by an abstraction or a metaphor.

Figure 3.5 User positioning the cutting-plane prop along an oblique plane.

To provide visual correspondence, the virtual representation of the cutting-plane prop mirrors all six degrees-of-freedom of the physical tool. But several of these degrees-of-freedom do not affect the cross-section of the object, because (mathematically) the resulting plane has only four degrees of freedom. For example, rotation about the axis normal to the cutting-plane does not affect the cross section. Similarly, if the tool is moved left-to-right or front-to-back in the current plane, this does not affect the resulting plane equation. In this regard, the cutting-plane prop acts a bit like a flashlight: the virtual plane is much larger than the physical cutting-plane prop, so when one holds the input device to the side of the doll's head, on the screen the plane still virtually intersects the brain, even though the two input devices don't physically intersect.
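The invariances described above can be checked numerically. The following is a minimal sketch, assuming the tracker reports a position and a 3x3 rotation matrix and that the prop's local z axis is the plane normal; neither convention is stated in the text, and all names are illustrative.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return tuple(
        tuple(sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3))
        for r in range(3)
    )

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return ((1.0, 0.0, 0.0), (0.0, c, -s), (0.0, s, c))

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def plane_from_pose(position, rotation):
    """Return (normal, offset) such that dot(normal, x) == offset on the plane."""
    normal = mat_vec(rotation, (0.0, 0.0, 1.0))  # assumed: local z is the normal
    return normal, dot(normal, position)

def close(p, q, tol=1e-9):
    (n1, d1), (n2, d2) = p, q
    return all(abs(a - b) < tol for a, b in zip(n1, n2)) and abs(d1 - d2) < tol

position = (0.1, 0.2, 0.3)
rotation = rot_x(0.4)
base = plane_from_pose(position, rotation)

# Spin the prop about its own normal: the plane is unchanged.
spun = plane_from_pose(position, mat_mul(rotation, rot_z(1.0)))

# Slide the prop within its own plane: the plane is unchanged.
x_axis = mat_vec(rotation, (1.0, 0.0, 0.0))
y_axis = mat_vec(rotation, (0.0, 1.0, 0.0))
slid_pos = tuple(p + 0.05 * x - 0.02 * y for p, x, y in zip(position, x_axis, y_axis))
slid = plane_from_pose(slid_pos, rotation)

# Move the prop along its normal: the plane does change.
lifted_pos = tuple(p + 0.05 * n for p, n in zip(position, base[0]))
lifted = plane_from_pose(lifted_pos, rotation)

print(close(base, spun), close(base, slid), close(base, lifted))
```

Only motions that tilt the plate or move it along its normal alter the cross-section; the remaining motions change where the virtual rectangle is drawn but not the slice it selects.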

The question of exactly how many degrees-of-freedom the user is manipulating at once is somewhat ill-defined. It can be as many as 12 or as few as 8, depending on how one likes to count. As mentioned above, the visual representation of the plane moves in 6 degrees of freedom, but the resulting cross-section has only 4 degrees of freedom. The doll's head has at least 4 degrees of freedom (rotation plus the zoom factor), but in fact all 6 degrees of freedom of the doll's head are used to compute the mapping of input device motion to virtual object motion (as described further in section 3.6.2). Thus, all 12 degrees of freedom from both input devices influence the display, but mathematically only 8 degrees of freedom are important.

The virtual representation of the cutting-plane prop is a semi-transparent rectangle. The transparency helps users to acquire a desired target: it provides a simple occlusion cue while maintaining the context of what is in front of or behind the plane [188]. This task is much more difficult if the plane is opaque.

3.5.3 Indicating surgical paths with a trajectory prop

The trajectory selection prop is a stylus-shaped tool (fig. 3.6) that allows the surgeon to specify 3D vectors and points. Moving the trajectory prop relative to the head prop specifies the position and orientation of a cylindrical virtual probe relative to the polygonal brain model. In previous work, Chung has implemented an interface for a similar task (radiotherapy treatment planning) using a head-mounted display, but his results were inconclusive: using a head-mounted display to select the trajectory of the radiotherapy beam did not have clear task performance advantages over hand-guided rotation, which is more similar to my approach.

Figure 3.6 User selecting a trajectory.

In neurosurgery, a trajectory is defined as a three-dimensional path from the exterior of the head to a surgical target inside the brain. A linear trajectory is adequate for most cases, but occasionally a nonlinear trajectory is required to avoid vasculature or healthy brain tissue. Our laboratory's surgical planning software does not currently support nonlinear trajectories, nor does the props-based interface.

A linear trajectory consists of a target point inside the brain and a vector to that point. The trajectory prop indicates the vector by its orientation relative to the head prop. The target of the trajectory is indicated by the intersection of a ray cast from the virtual probe and the brain model's surface. Points which lie on the interior of the brain model can be selected by first bisecting the volume with the cutting plane to expose the contents of the volume, and then selecting a point on the exposed surface. Note that in this case the plane not only exposes the interior of the data, but also constrains the point indicated by the trajectory prop to a plane, without requiring an explicit mode to do so.
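The target-selection step, casting a ray from the probe and taking the nearest hit on the surface, can be sketched as follows. The text does not say which intersection algorithm the system uses; this sketch substitutes the standard Möller-Trumbore ray-triangle test and a brute-force loop over triangles, and every name in it is illustrative.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Distance along the ray to the triangle, or None if the ray misses."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:            # ray is parallel to the triangle
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the probe tip

def pick_target(origin, direction, triangles):
    """Nearest intersection point over all triangles, or None."""
    best = None
    for v0, v1, v2 in triangles:
        t = ray_triangle(origin, direction, v0, v1, v2)
        if t is not None and (best is None or t < best):
            best = t
    if best is None:
        return None
    return tuple(o + best * d for o, d in zip(origin, direction))

# Two parallel triangles; the probe points along +z from the origin.
surface = [
    ((-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (0.0, 1.0, 2.0)),
    ((-1.0, -1.0, 1.0), (1.0, -1.0, 1.0), (0.0, 1.0, 1.0)),
]
print(pick_target((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), surface))
```

Selecting a point on an exposed cross-section fits the same pattern: the cap created by the cutting plane is simply one more surface for the ray to hit.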

3.6 Two-handed interaction

Guiard proposed that humans use the preferred and nonpreferred hands to control frames of reference which are organized in a hierarchy. For right handers, the left hand specifies a base frame of reference relative to which the right hand expresses a second active frame of reference. The props interface assigns the base frame of reference to the doll's head and the active frame of reference to the cutting plane. Since the neurosurgeon's task is to specify a cutting plane relative to a particular desired view of the brain, the interface's frames-of-reference assignment matches the surgeon's mental model of the task, resulting in an easily understood two-handed interface.

During the early stages of the interface design, I felt some concern that users might not be able to effectively control the four degrees-of-freedom provided by the doll's head using only their "weak" hand. In practice, however, informal evaluations have confirmed that the non-dominant hand is well suited to this task. The nonpreferred hand is not merely a poor approximation of the preferred hand, but can bring skilled manipulative capabilities to a task [94], especially when it acts in concert with the preferred hand.

Traditionally, two-handed input has been viewed as a technique which allows the user to save time by performing two sub-tasks in parallel [27]. For 3D input, however, two-handed interaction may be of even greater importance. Most everyday manipulative tasks, such as peeling an apple or cutting a piece of paper with scissors, involve both hands [67]. Previous work [76] has also shown that people often naturally express spatial manipulations using two-handed gestures. Based on my user observations and design experience, I can suggest some additional potential advantages for using two hands in 3D:

3.6.1 Two-handed input and the task hierarchy

One might argue that using two hands to operate the interface only adds complexity and makes an interface harder, not easier, to use; after all, it is difficult to "rub your head and pat your stomach at the same time." Rubbing your head and patting your stomach are independent subtasks which bear no relation to one another. There are many compound tasks, however, such as navigation and selection in a text document or positioning and scaling a rectangle, which users perceive as integral attributes [91] that are aspects of a single cognitive chunk [28]. When designed appropriately, a two-handed interface for integral compound tasks does not necessarily impose a cognitive burden, and can help users to reason about their tasks.

Figure 3.7 illustrates how the props-based interface simplifies the compound task of selecting a cutting-plane relative to a specific view of the polygonal brain. Cutting relative to a view consists of two sub-tasks: viewing and cutting. Viewing can further be subdivided into orienting the brain and specifying a zoom factor, and so forth. At the lowest level, there are ten separate control parameters (yaw, pitch, roll, and zoom for the view; x, y, z, yaw, pitch, and roll for the cutting tool) that can be specified. In a sliders or knob-box implementation of this interface, the user would have to perform ten separate one-dimensional tasks to position the cutting plane relative to a view, resulting in a user interface which is nearly impossible for a surgeon to use. Using the props with both hands, however, reduces this entire hierarchy into a single transaction (cognitive chunk) which directly corresponds to the task that the user has in mind. As a result, the user perceives the interface as being much easier to use.

Figure 3.7 Task hierarchy for selecting a cut relative to a specific view.

This framework, suggested by Buxton's work on chunking and phrasing [28], is useful for reasoning about the differences between one- and two-handed interfaces. With a unimanual interface, View and Cut would always have to be performed as purely sequential subtasks. There is also the need to switch back and forth between viewing and cutting, so this implies a third sub-task, that of changing modes. Changing modes might involve acquiring another input device, speaking a voice command, or moving the mouse to another region of the screen (the exact interface is irrelevant to this discussion), but all of these mode switching techniques take a non-zero amount of time. This process can be modelled as a simple state diagram (fig. 3.8).

Figure 3.8 State diagram for unimanual subtasks.
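The cost structure of the unimanual state diagram can be made concrete with a toy model. The sketch below is purely illustrative: the step and switch durations are made-up placeholders rather than measurements, and the function names are hypothetical.

```python
def mode_switches(steps):
    """Count passes through the Change Modes state in a sequence of subtasks."""
    return sum(1 for prev, cur in zip(steps, steps[1:]) if prev != cur)

def unimanual_time(steps, step_time, switch_time):
    """Total time when View and Cut are strictly sequential subtasks."""
    return len(steps) * step_time + mode_switches(steps) * switch_time

# A short exploratory session that alternates between the two subtasks.
session = ["view", "cut", "view", "cut", "cut", "view"]

print(mode_switches(session))             # passes through Change Modes
print(unimanual_time(session, 1.0, 0.5))  # placeholder durations, in seconds
```

The more a user interleaves viewing and cutting, the more of the session is spent in Change Modes, which is exactly the kind of exploration that a surgeon's task calls for.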

A two-handed interface changes the syntax for this task. Under bimanual control, a new meta-task with a single Cut Relative to View state becomes possible. The simultaneous Cut Relative to View task is not the same thing as the serial combination of the sub-tasks. The simultaneous task allows for hierarchical specialization of the hands, and there is no cost (or need) to switch between View and Cut subtasks. Thus, there is the potential for bimanual control to impact performance at the cognitive level: it can change how users think about the task. Since the View and Cut subtasks can be integrated without cost, this encourages exploration of the task solution space. And since the user never has to engage in a Change Modes sub-task, there is no possibility for this extraneous sub-task to interfere with the user's primary goal of viewing and cutting. Chapter 8, "The Bimanual Frame-of-Reference," further explores the hypothesis that bimanual control can impact performance at the cognitive level.

3.6.2 The natural central object

As mentioned earlier, the polygonal brain is constrained to appear at the center of the screen. The brain is the natural central object of the manipulation and exploration supported by the interface. This design decision interacts with two-handed control and leads to an interaction technique which does not strictly copy physical reality, yet nonetheless seems quite natural. The key design principle is not to maintain a direct 1:1 correspondence between physical and virtual object motion, but rather to maintain the nonpreferred hand as a dynamic frame-of-reference, as described below.

Users do expect the real-world relationship between the props to be mirrored by their on-screen graphical representations. Simplifying control of the virtual brain by centering it on the screen, however, requires a software mapping of its real-world position to its centered position by constraining the x and y translations (note that no such mapping is required for the orientation of the prop). Define the position of the head prop in the real world as (HRx, HRy, HRz). If the center point of the screen is defined as (Cx, Cy), then the virtual constrained head position is given by (HVx, HVy, HVz) = (Cx, Cy, HRz)4.

When the user moves the cutting plane prop relative to the doll's head, the user expects to see this relative motion mirrored on the screen. This implies that the virtual representation of the cutting plane prop is drawn relative to the virtual position of the head prop. That is, the virtual position of the plane is equal to the virtual position of the head plus the real-world offset between the head prop and the cutting plane prop. Define the position of the cutting plane prop in the real world as (PRx, PRy, PRz). The offset is:

(Dx, Dy, Dz) = (PRx - HRx, PRy - HRy, PRz - HRz)

The virtual position of the plane is then given by:

(PVx, PVy, PVz) = (HVx + Dx, HVy + Dy, HVz + Dz) = (Cx + PRx - HRx, Cy + PRy - HRy, PRz)

This mapping results in the following non-correspondence artifact: if the user holds the cutting-plane prop still and translates only the head prop, the polygonal brain will remain centered and the virtual plane will move in the opposite direction. This violates the generally accepted design principle that an interface should always maintain a direct 1:1 correspondence between physical and virtual object motion. But it adheres to the design principle that the object in the nonpreferred hand (the doll's head) should form the base frame of reference. In hundreds of informal user trials, I have found that users almost never discover this artifact, because they typically hold and orient the head prop in a relatively stable location while moving the cutting plane prop relative to it. The net effect is that the interaction behaves as users expect it would; the mapping is the software embodiment of Guiard's principle that the nonpreferred hand sets the frame of reference while the preferred hand articulates its motion relative to the nonpreferred hand.
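
As a minimal sketch of the mapping described in this section, the following Python fragment pins the head to the screen center and draws the plane at the head's virtual position plus the real-world offset. The function names and the sample coordinates are illustrative assumptions, not taken from the actual system, and the sketch ignores the log transform on the zoom factor.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def virtual_head_position(head_real: Vec3, screen_center: Tuple[float, float]) -> Vec3:
    """Pin the head to the screen center in x and y; keep its real z."""
    cx, cy = screen_center
    return (cx, cy, head_real[2])


def virtual_plane_position(head_real: Vec3, plane_real: Vec3,
                           screen_center: Tuple[float, float]) -> Vec3:
    """Virtual head position plus the real-world head-to-plane offset."""
    offset = tuple(p - h for p, h in zip(plane_real, head_real))
    head_virtual = virtual_head_position(head_real, screen_center)
    return tuple(hv + o for hv, o in zip(head_virtual, offset))


# Hypothetical sample values: the plane prop is held still while
# the head prop moves by +2 units in x.
center = (0.0, 0.0)
plane = (5.0, 0.0, 10.0)
before = virtual_plane_position((0.0, 0.0, 10.0), plane, center)
after = virtual_plane_position((2.0, 0.0, 10.0), plane, center)
print(before, after)
```

With these sample values the virtual head stays at the screen center while the virtual plane shifts by 2 units in the negative x direction, which is the non-correspondence artifact described above.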

Centering the reference object also has some other subtle effects on the interaction techniques and resulting user behavior. Since the nonpreferred hand now defines a dynamic frame of reference relative to which all manipulation occurs, the user is not forced to work relative to the screen itself or relative to some center point within the environment, as is required by unimanual desktop 3D interfaces [46][111]. Users are free to shift their body posture, to hold their hands on the desk surface, or to hold them in their laps. There is also no need for a "homing" or re-centering command to move the center point, since the nonpreferred hand automatically and continuously performs this function just by holding the doll's head.

In a scenario where the user is working with multiple objects, and not a single natural central object as is the case for neurosurgical visualization, I speculate that the natural central object technique described above could still be used if the notion of a current central object were introduced. For example, the user might hold a clipboard in the nonpreferred hand that represents the current object of interest. Whenever the user selects a new object for manipulation from the environment (using an object selection technique suited to the application), the selected object might fade in and appear on the clipboard. When the user finishes manipulating an object, the object could be detached from the clipboard, saved for later use, or perhaps replaced by the next object for manipulation.

3.7 Interactive volume cross-sectioning

There are several possibilities to consider for presenting an interactive update of the volume cross-section. The data from the volume cross-section could be superimposed on the polygonal view of the brain model, allowing the slice data to be seen directly in the context of the 3D object. The slice data could also be shown in a separate window off to the side of the polygonal graphics display area, which removes the slice from the context of the 3D object, but which thereby allows for a "map view" or "bird's eye view" of the slice from directly above. The selection of an appropriate technique depends on the specific task as well as the practical constraints of what can be implemented with real-time interaction.

3.7.1 Frames-of-reference for the cross section display

In the original concept of the interface, the surgeons wanted to interactively clip away a portion of the object and paste the volumetric cross-section data onto the resulting capping polygon, so that the cross-section would always be seen directly in the context of the polygonal brain model. Without texture mapping hardware, the implementations I attempted could only render the volume slice in context at a maximum of about 2-3 frames per second, compared to about 15-18 frames per second for a display in a separate window. Thus I pursued a separate, out-of-context display approach.

The separate display can take the slice data that would have been superimposed on the polygonal model and draw it in a separate window, giving a perspective view of the slice data, or alternatively the previously mentioned map view from directly above could be used. Figure 3.9 compares these approaches by showing how the separate cross-section display behaves for the perspective view and map view techniques as the user changes the view of the polygonal brain over time.

Figure 3.9 Comparison of perspective and map views of the cross-section data.

As seen in the figure, when the user tilts or moves the plane, the perspective view changes accordingly. Thus, if the user is primarily visually attending to the cross-section display (and not to the 3D view of the polygonal objects), the perspective view technique provides useful visual feedback of the motion of the plane. However, if the user holds the plane so that it is seen roughly edge-on in the 3D view of the polygonal objects, the perspective view conveys essentially no information. This imposes certain restrictions on how one can hold the cutting plane prop: it has to be held roughly vertical so that the cross-section data can be seen during interactive manipulation. Some users initially find this requirement to be confusing, though it is easy to adapt to.

The map view technique (fig. 3.9, bottom row) does not impose any restrictions on the orientation of the cutting plane prop relative to the view: even when the user holds the plane edge-on, he or she can still see the data resulting from the cross-section. But the map view technique lacks the motion cues provided by the perspective view; as seen in figure 3.9, the map view essentially does not change at all over the first three frames of the time sequence, despite a large angle of rotation. Thus, each technique can be considered to have its own strengths and weaknesses. For the dynamic view of the cross-section, the current version of the props interface defaults to the perspective view and has the map view available as an option. When the user selects a plane (by releasing the clutch button on the cutting-plane prop), a separate static view of the cross-section is saved for subsequent 2D manipulation, and this static view is always displayed using the map view technique.
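
The edge-on limitation of the perspective view can be illustrated with a simple foreshortening factor: the absolute cosine of the angle between the plane normal and the viewing direction. This is only a sketch under an orthographic approximation; the function name and the sample vectors are assumptions for illustration and do not describe how the system itself renders the views.

```python
import math


def foreshortening(plane_normal, view_dir):
    """Absolute cosine of the angle between the plane normal and the view direction.

    1.0 means the slice faces the viewer; 0.0 means it is seen exactly edge-on.
    """
    dot = sum(n * v for n, v in zip(plane_normal, view_dir))
    norm = (math.sqrt(sum(n * n for n in plane_normal))
            * math.sqrt(sum(v * v for v in view_dir)))
    return abs(dot) / norm


view = (0.0, 0.0, -1.0)                           # viewer looks down the -z axis
facing = foreshortening((0.0, 0.0, 1.0), view)    # slice faces the viewer
edge_on = foreshortening((1.0, 0.0, 0.0), view)   # slice seen edge-on
tilted = foreshortening((1.0, 0.0, 1.0), view)    # slice tilted by 45 degrees
```

As the factor approaches zero the perspective view collapses toward a line, whereas the map view is unaffected because it always looks along the plane normal.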

3.7.2 Texture mapping hardware

Texture mapping hardware allows for the possibility of in-context viewing of the cross-section data, as seen in the prototype implementation of figure 3.10. Standard texture mapping hardware does not help to calculate the cross section itself; this still must be calculated in software to compute the initial texture map. Unfortunately, selecting an arbitrary plane through a volume is probably a worst case for traditional texture mapping. Loading a texture map to texture memory is an expensive operation (see footnote 5). Texture mapping hardware typically assumes that most textures are known when an application begins, and that they will be used repeatedly, so the system performs pre-computations and creates data structures when a texture is first loaded. But when users move the cutting plane prop to interactively select a volume cross-section, the texture map changes every frame and the exact same plane is not usually accessed more than once.

Figure 3.10 Texture mapping shows a slice in the context of the 3D brain model.

Thus, commonly available texture mapping hardware still does not allow an in-context presentation of the cross-section while the cross-section itself is changing (see footnote 6). However, once the user selects a cross-section, this cross-section can be turned into a texture map and integrated with the view of the polygonal objects. With the cross-section available in context, users find it much easier to select a subsequent plane which passes through a target revealed by a previous cut. For example, it would be easy to select a cut which passes through the lens of the eye already revealed in figure 3.10.
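
The software step of computing a cross-section through the volume, which must happen before any texture can be loaded, can be sketched as a nearest-neighbor resampling along an arbitrary plane. This is an illustrative sketch only: the function name, the volume[z][y][x] layout, and the sampling scheme are assumptions, and the actual system may compute its slices quite differently.

```python
def extract_slice(volume, origin, u_axis, v_axis, width, height, fill=0):
    """Sample volume[z][y][x] on a plane by nearest-neighbor lookup.

    origin is the slice corner in voxel coordinates (x, y, z); u_axis and
    v_axis are the per-pixel steps along the slice's columns and rows.
    Samples that fall outside the volume receive the fill value.
    """
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    image = []
    for j in range(height):
        row = []
        for i in range(width):
            x, y, z = (round(origin[k] + i * u_axis[k] + j * v_axis[k])
                       for k in range(3))
            inside = 0 <= x < cols and 0 <= y < rows and 0 <= z < depth
            row.append(volume[z][y][x] if inside else fill)
        image.append(row)
    return image


# Hypothetical 4x4x4 test volume whose voxel value encodes its own position.
volume = [[[100 * z + 10 * y + x for x in range(4)]
           for y in range(4)] for z in range(4)]
axial = extract_slice(volume, (0, 0, 2), (1, 0, 0), (0, 1, 0), 4, 4)
oblique = extract_slice(volume, (0, 0, 0), (1, 0, 0), (0, 1, 1), 4, 4)
```

Because every new plane orientation yields a completely new image, a routine like this must run again on each frame while the plane is moving, which is why repeatedly loading the result as a texture is so costly.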

3.7.3 Disappearing object problem

When clipping away the polygonal model is enabled, it is possible to position the cutting plane such that it slices away the entire object (fig. 3.11, top), which leads to confusion. One possible solution is to draw a wireframe wherever the object has been cut away (fig. 3.11, bottom). This solution works well with a separate cross-section display. With an integrated display using texture mapping hardware, however, this solution is not ideal because the wireframe obscures the cross-section.

Figure 3.11 The disappearing object problem and wireframe solution
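
One way an application might detect the disappearing object condition is to test whether every vertex of the model lies on the side of the plane that is cut away. The sketch below assumes, purely for illustration, that the positive side of the plane normal is the side removed; the function names are hypothetical and the source does not state that the system performs such a test.

```python
def signed_distance(point, plane_point, plane_normal):
    """Distance of point from the plane, scaled by the length of plane_normal."""
    return sum(n * (p - q) for n, p, q in zip(plane_normal, point, plane_point))


def fully_clipped(vertices, plane_point, plane_normal):
    """True when every vertex is on the cut-away (positive, by assumption) side."""
    return all(signed_distance(v, plane_point, plane_normal) > 0 for v in vertices)


# Hypothetical model: the eight corners of a unit cube.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
whole_object_gone = fully_clipped(cube, (-1, 0, 0), (1, 0, 0))
partly_cut = fully_clipped(cube, (0.5, 0, 0), (1, 0, 0))
```

When the test succeeds nothing of the model remains on screen, which is the case where the wireframe fallback matters most.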

3.8 Clutching mechanisms

A mechanism is often needed to tell the computer to "stop watching" a particular prop. This allows the surgeon to freeze the image in a desired configuration and put down the props. In the original design, a foot pedal was provided to "clutch" the head prop and a thumb button to clutch the cutting-plane prop. The foot pedal behaved like a gas pedal: the user held it down to move. Similarly, the cutting-plane prop only allowed motion while the thumb button was held down. Sellen [145] has shown such tension can reduce mode errors.

I have also experimented with voice control of the clutch. Saying move {head | plane} enables motion, while saying stop {head | plane} disables motion. Since the user is engaged in a real-time manipulation task, the time to speak and recognize a voice command causes an irritating delay. It is not clear if this problem would persist with a more sophisticated voice recognizer than the low-cost unit [174] which I used on an experimental basis, but I expect it would; the delay introduced by speaking the command might itself prove intolerable. Under some conditions voice input can also interfere with short term memory [96], which poses another possible difficulty.

Further user testing has suggested that the interface is easiest to use when there is no clutch for the doll's head. In the current design, the doll's head is always allowed to move. Freezing the polygonal brain in place seems like a useful thing to do, but again the most important design principle is to maintain the nonpreferred hand as a dynamic reference for the action of the preferred hand. If the doll's head is "clutched" so that it cannot move, it is no longer useful as a reference and the preferred hand again must move relative to the environment. I have watched many users clutch the head and then become confused as they subconsciously begin to move their nonpreferred hand to aid the action of the preferred hand, only to have no effect. After gaining some experience with the interface, users generally tend to hold down the foot pedal for the doll's head constantly anyway. This calls into question whether the head clutch serves any real purpose other than to initially confuse users.

The interface does provide the ability to generate a detailed volume rendering of the head; this acts very much like a clutch in the sense that it generates a still image "snapshot," but it does not have the side-effect of interfering with further manipulation. With this capability, there is no apparent need for the head clutch except in unusual circumstances, such as when trying to photograph the interface, or when a user without the ability to use both hands wishes to operate the interface. For such situations, it is still possible to use a foot pedal for clutching the doll's head, but this behavior is not enabled by default.
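
The default clutching behavior described in this section can be summarized in a short sketch: the head prop is always tracked, the cutting-plane prop is tracked only while its thumb button is held, and releasing the button selects the current plane. The class and attribute names are hypothetical, and poses are represented by arbitrary placeholder values.

```python
class PropState:
    """Tracks both props under the default (no head clutch) behavior."""

    def __init__(self):
        self.head_pose = None
        self.plane_pose = None
        self.selected_planes = []       # planes saved when the button is released
        self._button_was_down = False

    def update(self, head_sample, plane_sample, button_down):
        # The head is never clutched by default: it always follows the prop.
        self.head_pose = head_sample
        if button_down:
            # The plane follows its prop only while the thumb button is held.
            self.plane_pose = plane_sample
        elif self._button_was_down and self.plane_pose is not None:
            # Releasing the button selects the plane that was last tracked.
            self.selected_planes.append(self.plane_pose)
        self._button_was_down = button_down


state = PropState()
state.update("head0", "plane0", False)   # button up: plane sample ignored
state.update("head1", "plane1", True)    # button down: plane tracks the prop
state.update("head2", "plane2", False)   # release: "plane1" becomes a selection
```

Because the head pose is refreshed on every update, the nonpreferred hand keeps acting as the frame of reference even while the plane is frozen.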

3.9 Touchscreens for hybrid 2D and 3D input

The 3D interface props excel for 3D manipulation, but when 2D tasks such as loading a new patient from an image database arise, there is an awkward pause in the human-computer dialog while the user must put down the props to move the mouse or to use the keyboard. The problem lies in a 3D input versus 2D input dichotomy: some tasks are best done in 3D, others are better suited to 2D, and users need an intuitive and consistent mechanism for switching between the different styles of input. Users are distracted from the focus of their work because they must decide which device to acquire for a given input task.

To address this shortcoming, we added a touchscreen sensor to the monitor used with the interface props. This hybrid interface combines 3D input with more traditional 2D input in the same user interface. Note the ergonomic facility with which a touchscreen can be used: the surgeon can move in 3D using the props; then, without having to put the props down, the surgeon can reach out and touch the screen to perform 2D tasks, since the hand is sufficiently free to extend a finger or knuckle (fig. 3.12). This provides a consistent input medium for both 2D and 3D tasks, since the user always interacts gesturally with objects in the real environment: one interacts gesturally with the props to perform 3D operations; one interacts gesturally with the touchscreen to perform 2D operations.

Figure 3.12 User employing the touchscreen in combination with the props.

3.9.1 Previous techniques for hybrid input

Most previous work in 3D interaction has either ignored the problem of adding 2D input or has integrated it in an ad hoc manner, but there are a few notable exceptions. Feiner [55] has constructed a system which combines a 3D augmented reality head-mounted display with a standard 2D desktop display. This allows the user to see 3D virtual objects arranged around his or her head, but does not block out the user's view of the real world, allowing standard mice, keyboards, and CRT displays to be used.

Shaw's approach [149] is to put as many 2D capabilities as possible into the 3D environment. Shaw's system for polygonal surface design and CAD tasks uses two small hand-held "bats" for input devices. Each bat is a six-degree-of-freedom magnetic tracker augmented with three small push-buttons, which can be pushed to indicate various discrete commands. Shaw's interface also incorporates a ring menu [111], which is a pop-up menu which can be accessed in 3D by rotating the bat. The menu appears as a ring surrounding the 3D cursor on the screen; by rotating the bat, the user can select different items which are highlighted in the ring. This works reasonably well when only a few items (no more than about 15-20) are displayed in the ring.

The Polyshop system [1] (fig. 2.5 on page 18) uses a drafting board as a 2D constraint surface. Since the user is wearing a head-mounted display, graphics can appear to lie on a virtual representation of the drafting board, allowing it to be used in a manner analogous to a touchscreen. The drafting board does not actually detect finger touches; it must rely on hand position sensing to estimate when the user is trying to work with the 2D surface. Thus, while promising, this approach lacks the reliable feedback of knowing that an action occurs if and only if one's finger physically touches the display surface.

Work at Brown University [43][77][78] has looked at ways of using mouse-controlled "3D Widgets" for 3D interaction (fig. 2.1 on page 15). Here the problem of combining the 3D and 2D interfaces is obviated, as the mouse is used for all input tasks. Similarly, Brown's Sketch system [186] (fig. 2.3 on page 17) has demonstrated that a 2D interface based on some simple gestures and well-chosen heuristics can be very powerful for sketching 3D objects or scenes.

3.9.2 Description of the touchscreen interface

The surgeon uses a combination of 3D and 2D manipulation facilities when planning a surgery. There is not a distinct 3D interaction phase followed by a distinct 2D interaction phase. Rather there is a continuous dialog which combines 2D and 3D visualization tools to accomplish the surgical plan. Using a mouse for 2D input does not facilitate this style of dialog. In my experience surgeons are hesitant to use mice in the first place; when manipulating the props in addition to the mouse, the surgeon would typically rely on someone else to move the mouse.

The touchscreen graphical user interface (GUI) divides the screen into a set of tiles (fig. 3.13) which contain different views of the same volumetric data set. These tiles are interchangeable; for example, to increase the screen real estate for the sagittal view, the user can drag it with a finger into the large area on the left side of the screen. The region in the lower right hand corner of the screen acts as a general purpose control panel for all tiles. When the user touches a tile, it becomes selected and a miniature copy of the tile appears in the control panel. The control panel widgets can then be used to interactively manipulate the miniature copy, and after a brief pause, the changes are propagated to the original tile. The control panel includes controls for image contrast and brightness, zooming and panning, browsing or precisely stepping through parallel slices, saving the current image, and resetting the default viewing parameters.
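
The deferred propagation from the miniature copy to the original tile can be sketched as follows. Tiles are modeled as plain dictionaries of viewing parameters, time is passed in explicitly, and the half-second pause is an assumed value; none of these details come from the actual implementation.

```python
class ControlPanel:
    """Edits a miniature copy and copies changes back after a pause."""

    def __init__(self, pause=0.5):
        self.pause = pause          # assumed length of the "brief pause", in seconds
        self.tile = None
        self.preview = {}
        self._last_edit = None

    def select(self, tile):
        self.tile = tile
        self.preview = dict(tile)   # the miniature copy shown in the panel
        self._last_edit = None

    def adjust(self, name, value, now):
        self.preview[name] = value  # immediate feedback on the miniature copy
        self._last_edit = now

    def tick(self, now):
        """Propagate pending edits once the user has paused long enough."""
        if self._last_edit is not None and now - self._last_edit >= self.pause:
            self.tile.update(self.preview)
            self._last_edit = None
            return True
        return False


sagittal = {"brightness": 0.5, "contrast": 1.0}
panel = ControlPanel()
panel.select(sagittal)
panel.adjust("brightness", 0.8, now=0.0)
panel.adjust("brightness", 0.9, now=0.3)
early = panel.tick(now=0.6)    # only 0.3 s since the last edit: nothing propagates
late = panel.tick(now=0.9)     # pause has elapsed: the original tile is updated
```

Deferring the update keeps the interaction with the small copy responsive while the larger tile is redrawn only when the user pauses.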

Figure 3.13 Touchscreen graphical user interface for use with the 3D props.

The brightness, contrast, zoom, and slice navigation touchscreen controls in the control panel (fig. 3.14) were suggested by the physical thumb-wheels which are found on many 2D medical image viewing consoles. When the user touches and drags the touchscreen thumb-wheels, the background textures slide up and down, giving immediate visual feedback. The touchscreen thumb-wheels were designed to be used without looking directly at them, because the user is typically focusing on the image being modified, and not the widget itself. The moving background textures can be seen even with peripheral vision, and so meet this "eyes-free" requirement effectively. Using a standard scrollbar on a touchscreen completely fails in this regard, since the user's finger occludes the thumb of the scrollbar. An experimental implementation of the touchscreen thumb-wheels with nonspeech audio feedback has suggested that the technique would be even more effective with appropriate audio feedback.

Figure 3.14 Close-up of touchscreen control panel.

Rather than having constraint modes for the 3D devices, I found that users were more comfortable with expressing constraints using the naturally constrained dialog afforded by the touchscreen. Thus, the tiles which show the standard sagittal, axial, and coronal slice orientations act as subtle constraint modes; all operations on these tiles are naturally constrained to the appropriate axis of the volume. Similarly, once an oblique slice has been selected with the props, this becomes a tile (seen in the upper right of figure 3.13) which expresses constraint along the normal of the currently selected oblique cutting plane.

The touchscreen provides access to a couple of other important facilities. First, at any time, the surgeon can generate a high-quality volume rendering (fig. 3.15), which takes approximately five seconds to generate, by touching a button labelled Volume Render in the interface. The volume rendering can show the cortical surface at the full resolution of the MR scan (rather than at the approximate resolution of the polygonal model), and as such is essential to the surgeon's end goals. Second, to load a new data set, the surgeon presses a button labelled Load which brings up a database browsing dialog (fig. 3.16). This provides facilities for browsing the database, picking patients from an index, and selection of a specific study for a patient. The interface maintains a cache of recently accessed patients, as the surgeon may need to frequently switch between a small number of cases.
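
A cache of recently accessed patients could be organized along the following lines. The least-recently-used eviction policy, the capacity, and the loader callback are all assumptions made for this sketch; the source states only that such a cache exists.

```python
from collections import OrderedDict


class RecentPatientCache:
    """Keeps the most recently used studies in memory (assumed LRU policy)."""

    def __init__(self, load_study, capacity=4):
        self._load = load_study          # hypothetical callback that reads a study
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, patient_id):
        if patient_id in self._entries:
            self._entries.move_to_end(patient_id)   # mark as most recently used
            return self._entries[patient_id]
        data = self._load(patient_id)
        self._entries[patient_id] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)       # evict the least recently used
        return data


loads = []


def fake_loader(patient_id):
    """Stand-in for reading a study from the image database."""
    loads.append(patient_id)
    return "volume for " + patient_id


cache = RecentPatientCache(fake_loader, capacity=2)
for patient in ["A", "B", "A", "C", "A", "B"]:
    cache.get(patient)
```

In this run, switching back to patient A is served from memory both times, while patient B has to be reloaded after patient C displaced it.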

Figure 3.15 Volume rendering showing the brain and skin surface.

Figure 3.16 Touchscreen interface for selecting patient images from a database (see footnote 7).

Informal observation of users of the touchscreen and props hybrid interface suggests that touchscreens not only offer intuitive 2D input which is well accepted by physicians, but also offer fast and accurate input which blends well with 3D interaction techniques.

3.9.3 Limitations and proposed enhancements

While promising, the implementation of the hybrid interface concept has several shortcomings. The major usability problems which I encountered were related to the surface acoustic wave (SAW) touchscreen technology which our laboratory chose to use. Disadvantages (for this application) include parallax errors and a limitation on the type of materials which can be used to touch the screen. Parallax errors result from a gap between the touch screen and the actual monitor; these errors make the touchscreen difficult to use if the user is standing or not directly in front of it. Further, since it works by detecting sound waves, the SAW touchscreen can only detect the presence of soft materials such as a finger or an eraser tip. This means that the interface props cannot be used to directly touch the screen, something which users would naturally and repeatedly try to do. I attempted to implement "soft" versions of the props, but it was difficult to find durable materials that were soft enough to do the job, and none of them worked as effectively as a finger.

A resistive membrane touchscreen may provide a more suitable technology for this application. Any type of material can be used to touch resistive membrane touchscreens, and since they consist of a film directly on the monitor glass, parallax errors are reduced. They also respond to a lighter touch than SAW touchscreens, making "drag and drop" style interactions easier to perform. Potential disadvantages include lower transmissivity (they attenuate more of the light coming from the screen) and the inability of currently available commercial models to detect pressure.

Liang [112] has suggested using a graphics tablet, such as those manufactured by Wacom [175], in a similar capacity, though the idea has not been tested with an implementation. Although a tablet would not have the ability to act as a display, this may provide an alternative approach for some applications.

Another promising avenue for exploration is a multimodal combination of voice, touchscreen, and two-handed 3D manipulation. Voice recognition would allow the user to perform discrete commands when both hands are busy manipulating the props, or it could be combined with touch to disambiguate a gesture, as demonstrated by the "Put-That-There" system [16].

3.10 Informal evaluation: notes on user acceptance

Without the cooperation and close collaboration of real neurosurgeons, this work would not have been possible. The design of the interface has been an iterative process where I slowly learned what interaction techniques neurosurgeons could use to accomplish the goals they had in mind, and the surgeons learned what possibilities and limitations the computer offers as a surgical planning and visualization tool. Watching a surgeon become frustrated with an inadequate interface design that seems "intuitive" to computer scientists is a very strong motivation to discover how to improve the interface.

3.10.1 User observations

The most valuable tool for evaluating and improving the interface design has been informal observation of test users. This section focuses on my user observations of neurosurgeons, but I should point out that I have tested the interface with many other types of physicians who work with volumetric data, such as neurologists, cardiologists, and radiologists. I have also performed many informal user observations during demonstrations to the general public and during debriefing sessions with experimental subjects.

According to my collaborators in neurosurgery, there are currently about 4,000 neurosurgeons practicing in the United States. Over the history of the project about 50 neurosurgeons have tried the props-based interface during informal observation sessions (see footnote 8). Surgeons who tried the system typically included attending neurosurgeons or neurosurgical residents from UVA, prestigious visiting neurosurgeons from other institutions, and interviewees for neurosurgical residencies. Some additional neurosurgeons tried the system during demonstrations at academic neurosurgery conferences.

The methodology for testing was simple but effective. On some occasions I set up the system in the hospital or another location convenient for the surgeons, but more often surgeons would visit our laboratory for an informal demonstration. I almost always began by briefly showing the system to the visiting neurosurgeon. Many surgeons were eager to try it and would jump in themselves without needing an invitation. Then I would stand back and just watch the neurosurgeon operating the interface. If the surgeon seemed to be having some troubles, rather than immediately intervening with advice or suggestions, I would wait for the surgeon to ask a question or make a comment. In this way, I could understand a problem in the terms the surgeon was using to think about it. I found that a good question for stimulating discussion of the interface was: "Can you show me what you did during your most recent surgical procedure?"

Surgeons usually would offer opinions, advice or suggestions for the interface on their own without any need for prompting. This would start a discussion where I would also ask the surgeon about particular problems I had observed, what they saw as capabilities or limitations of the interface, and how the prototype interface might be augmented to become a clinical tool. Another question which helped me to see what the interface looked like from the surgeon's perspective was: "How would you explain the interface you just saw to a colleague?"

Neurosurgeons have been very enthusiastic about the props-based interface. All of the neurosurgeons who have tried to use the interface were able to "get the hang of it" within about one minute of touching the props; many users required considerably less time than this. This clearly demonstrates that with a cursory introduction, neurosurgeons who have never before seen the interface can rapidly apply their existing skills for manipulating physical objects with two hands, and can understand and use the interface without training.

In addition to the neurosurgeons, approximately 50 physicians from other specialities have tried the interface. The overall response has been similar to that of neurosurgeons, particularly from specialists who commonly deal with volumetric MRI data, such as data representing the knees, shoulders, or heart. From talking with these physicians, it is clear that oblique plane visualization problems commonly occur in these fields as well; for example, when visualizing the knee, oblique cuts along the ligaments are clinically valuable to assess the severity of an injury.

One exception has been with radiologists, where the response has been more variable. Radiology deals primarily with diagnosis, not planning surgical interventions, and furthermore radiological training emphasizes the mental 3D visualization of structures from 2D information. Thus, the interface does not support tasks which are of interest to most radiologists, and therefore it does not represent a useful tool to most of these individuals. This is not to say that all radiologists found the interface useless; on the contrary, several radiologists with an interest in the issues posed by digital medical imaging have been quite enthusiastic.

Finally, I estimate that I have given or observed hands-on demonstrations to over 1,000 users from a broad sample of the general public, ranging from small children to elderly university benefactors. These informal demonstrations have been useful for testing some of my ideas on a larger sample of test users than is possible with physicians. Furthermore, these observations strongly suggest that people in general, and not just skilled and dexterous surgeons, can use both hands to perform 3D manipulation tasks.



1 It is important to minimize the time window between imaging and actual surgery so that the digital images do not become a stale representation of the brain (the brain can change over time, especially in a sick or injured patient), and of course because the patient (who must be conscious for some procedures) may experience considerable pain while wearing the stereotactic frame.

2 The estimate of 4,000 practicing neurosurgeons in the United States was provided by Dr. Neal Kassell of the Neurosurgery department.

3 The software runs on a Hewlett Packard J210 workstation with the "Visualize" hardware polygonal and texture mapping acceleration.

4 The system actually performs a log transformation on the zoom factor. Thus HVz is mapped to a function of log(HRz). As long as a corresponding log transform is also applied to the Z coordinate of the cutting plane, PRz, this log transform does not affect the mapping described in this section.

5 This is changing on high-end graphics workstations. The new SGI architecture is designed to provide high bandwidth to texture memory. Hewlett Packard is planning to make several improvements to their texture-mapping facilities based on feedback from my project.

6 Some high-end SGI graphics supercomputers do allow definition of "3D texture maps" or "voxel maps" which can perform this function in hardware.

7 To maintain patient confidentiality, the actual patient names in this figure have been blurred out.

8 Only about 5 neurosurgeons have tried the touchscreen interface, since it is a fairly recent enhancement, and my research has since shifted away from this avenue.




Copyright © 1996, Ken Hinckley. All rights reserved.