Jakob Nielsen's Alertbox for December 1995:
Guidelines for Multimedia on the Web
Multimedia is gaining popularity on the Web with several technologies to
support use of animation, video, and audio to supplement the traditional
media of text and images. These new media provide more design options but
also require design discipline. Unconstrained use of multimedia results in
user interfaces that confuse users and make it harder for them to
understand the information. Not every webpage needs to bombard the user
with the equivalent of Times Square in impressions and movement.
Notes about this month's column:
This column is longer than usual and much longer than recommended for a web
page.
I am doing this on request because many people have asked for advice on how
to design for the new dynamic web media.
Some of the links in this column point to Javatized pages and will not show
anything interesting if your browser does not support the version of Java
used on the pages.
Animation
Moving images have an overpowering effect on the human peripheral vision.
This is a survival instinct from the time when it was of supreme importance
to be aware of any saber-toothed tigers before they could sneak up on you.
These days, tiger-avoidance is less of an issue, but anything that moves in
your peripheral vision still dominates your awareness: it is very hard to,
say, concentrate on reading text in the middle of a page if there is a
spinning logo up in the corner.
Never include a permanently moving animation on a web page since it will
make it very hard for your users to concentrate on reading the text.
Animation is good for:
- Showing continuity in transitions.
When something has two or more states, then changes between states will be
much easier for users to understand if the transitions are animated instead
of being instantaneous. An animated transition allows the user to track the
mapping between different subparts through the perceptual system instead of
having to involve the cognitive system to deduce the mappings. A great
example is the winner of the first Java programming contest: proving the Pythagorean
theorem by animating the movement of various squares and triangles as
they move around to demonstrate that two areas are the same size
(unfortunately, this otherwise good page uses animated text
inappropriately: the text moves constantly and is hard to relate to the
events in the main animation).
- Indicating dimensionality in transitions.
Sometimes opposite animated transitions can be used to indicate movement
back and forth along some navigational dimension. For example, paging
through a series of objects can be shown by an animated sweep from the
right to the left for turning the page forward (if using a language where
readers start on the left). Turning back to a previous page can then be
shown by the opposite animation (sweeping from the left to the right). If
users move orthogonally to the sequence of pages then other animated
effects can be used to visualize the transition. For example, following a
hypertext link to a footnote might be shown by a "down" animation and
tunneling through hyperspace to a different set of objects might be shown
by an "iris open" animation.
One example used in several user interfaces is the use of zooming to
indicate that a new object is "grown" from a previous one (e.g., a detailed
view or property list opened by clicking on an icon) or that an object is
closed or minimized to a smaller representation. Zooming out from the small
object to the enlargement is a navigational dimension and zooming in again
as the enlargement is closed down is the opposite direction along that
dimension.
- Illustrating change over time.
Since an animation is a time-varying display, it provides a one-to-one
mapping to phenomena that change over time. For example, deforestation of the rain forest can be illustrated by
showing a map with an animation of the covered area changing over time.
- Multiplexing the display.
Animation can be used to show multiple information objects in the same
space. A typical example is client-side
imagemaps with explanations that pop up as the user moves the cursor
over the various hypertext anchors. It is also possible to indicate the
active areas by having them shimmer or by surrounding them with a marquee
of "marching ants". As always, objects should only move when appropriate
(e.g., when the cursor is over the image).
- Enriching graphical representations.
Some types of information are easier to visualize with movement than with
still pictures. Consider, for example, how to visualize the tool used to
remove pixels in a graphics application. The canonical icon is an eraser as
shown on the left in the following figure, but in user testing I have
sometimes found that people think that the icon is a tool for drawing
three-dimensional boxes. Instead, one can use an animated icon as shown on
the right in the figure: when the icon animates, the eraser is moved over
the background and pixels are removed, clearly showing the functionality of
the tool.
In icon design, it is always easier to illustrate objects (a box) than
operations (removing pixels), but animation provides the perfect support for
illustrating any kind of change operation. In an experiment reported at the
CHI'91 conference, Baecker, Small, and
Mander increased the comprehension of a set of icons from 62% to 100% by
animating them. Of course, an icon should only animate when the user
indicates a special interest in it (for example, by placing the mouse
cursor over it or by looking at it for more than a second if eye-tracking
is available). Especially considering the preponderance of toolbars in
current applications it would be highly distracting if all icons were to
animate at all times.
- Visualizing three-dimensional structures.
Since the computer screen is two-dimensional, users can never get a full
understanding of a three-dimensional structure by a single illustration, no
matter how well designed. Animation can be used to emphasize the
three-dimensional nature of objects and make it easier for users to
visualize their spatial structure. The animation need not necessarily spin
the object in a full circle: just slowly turning it back and forth a little
will often be sufficient. The movement should be slow to allow the user to
focus on the structure of the object. Three-dimensional objects may be
moved under user control, but often it is better if the designer determines
in advance how to best animate a movement that provides optimal
understanding of the object: this pre-determined animation can then be
activated by the user by simply placing the cursor over the object, whereas
user-controlled movements require the user to understand how to manipulate
the object (which is inherently difficult with a two-dimensional control
device like the mouse used with most computers - to be honest, 3D is never
going to make it big time in user interfaces until we get a true 3D control
device).
- Attracting attention.
Finally, there are a few cases where the ability of animation to dominate
the user's visual awareness can be turned to an advantage in the interface.
If the goal is to draw the user's attention to a single element out of
several or to alert the user to updated information then an animated
headline will do the trick. Animated text should be drawn by a one-time
animation (e.g., text sliding in from the right, growing from the first
character, or smoothly becoming larger) and never by a continuous animation
since moving text is much harder to read than static text. The user should
be drawn to the new text by the initial animation and then left in peace to
read the text without further distraction.
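The animated transitions described above (moving an element between two states once, then leaving it still) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `ease_in_out` and `transition_frames` are hypothetical, the smoothstep easing curve is one arbitrary choice among many, and the actual drawing of each frame is left out.

```python
def ease_in_out(t: float) -> float:
    """Smoothstep easing: 0 at t=0, 1 at t=1, slow at both ends."""
    return t * t * (3.0 - 2.0 * t)


def transition_frames(start, end, frame_count):
    """Return the (x, y) positions for a one-time move from start to end.

    The list is finite, so the animation plays once and then stops,
    leaving the user in peace. frame_count must be at least 2.
    """
    if frame_count < 2:
        raise ValueError("need at least two frames")
    frames = []
    for i in range(frame_count):
        t = ease_in_out(i / (frame_count - 1))
        x = start[0] + (end[0] - start[0]) * t
        y = start[1] + (end[1] - start[1]) * t
        frames.append((x, y))
    return frames


# Hypothetical usage: slide a headline in from the right edge, once.
headline_path = transition_frames((100, 50), (0, 50), 5)
```

Calling the same function with `start` and `end` swapped yields the opposite animation, which is one way to express movement back and forth along a navigational dimension.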
Video
Due to bandwidth constraints, use of video should currently be minimized on
the web.
Eventually, video will be used more widely, but for the next few years most
videos will be short and will use very small viewing areas.
Under these constraints, video has to serve as a supplement to text and
images more often than it will provide the main content of a website.
Currently, video is good for:
- Promoting television shows, films, or other non-computer media that
traditionally have used trailers in their advertising.
- Giving users an impression of a speaker's personality. Unfortunately,
most corporate executives project a lot less personality than, say, Captain
Janeway from Star Trek, so it is not necessarily a good idea to show a
talking head unless the video clip truly adds to the user's experience.
- Showing things that move, for example a clip from a ballet. Product
demos of physical products (e.g., a coin counter) are also well suited for
video, whereas software demos are often better presented as a series of
full-sized screendumps where the potential customer can study the features
at length.
A major problem with most videos on the web right now is that their
production values are much too low. User studies of CD-ROM productions have
found that users
expect broadcast-quality production values and that users get very
impatient with low-quality video.
A special consideration for video (and spoken audio) is that any narration
may lead to difficulty for international users as well as for users with a
hearing disability. People may be able to understand written text in a
foreign language because they have time to read it at their own speed and
because they can look up any unknown words in a dictionary.
Spoken words are sometimes harder to understand, especially if the speaker
is sloppy, has a dialect, speaks over a distracting soundtrack, or simply
speaks very fast. Poor audio quality may contribute to the difficulty of
understanding spoken text: it is recommended to use professional quality
audio equipment and/or lavaliere microphones when recording a narrator. The
classic solution to these problems is to use subtitles but as shown in the
following figure, subtitles require special attention on the web.
The figure shows a subtitled frame from Sun's Starfire video.
The small subtitles (left image) look good on the original
video tape (JPEG, 197 K) but are virtually unreadable on the smaller
image size currently used for computerized videos. Using bigger subtitles
that have been anti-aliased for computer viewing (middle image) improves
readability significantly, but the best results are achieved by the
letterbox format (right image). In this example, the subtitles in the
letterbox are constructed by enlarging the video area for the movie file
with a 24-pixel-high black area. Doing so does not increase the file size
proportionally since the black area compresses very nicely. Even so, it
would be better to transmit the subtitles as ASCII (or Unicode) and have
them rendered in the letterbox on the client machine: a perfect job for an
applet. It would even be possible to have the user select the language for
the subtitles through a preference setting or a pop-up menu
(JPEG, 206 K).
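As a rough sketch of the client-side approach suggested above, subtitles could travel as plain text cues keyed by language, with the client picking the cue for the current playback time and drawing it in the letterbox. Everything here is an assumption made for illustration: the cue format, the sample texts (they are not taken from the Starfire video), and the names `SUBTITLES`, `cues_for`, `subtitle_at`, and `letterboxed_size` are all hypothetical. Only the 24-pixel band height comes from the example in the text.

```python
LETTERBOX_HEIGHT = 24  # pixels; the black band height from the example above

# Hypothetical cue data: language code -> list of (start_s, end_s, text).
SUBTITLES = {
    "en": [(0.0, 2.5, "Good morning."), (3.0, 6.0, "Shall we begin?")],
    "fr": [(0.0, 2.5, "Bonjour."), (3.0, 6.0, "On commence ?")],
}


def letterboxed_size(width, height, band=LETTERBOX_HEIGHT):
    """Size of the video area once the black subtitle band is added below."""
    return (width, height + band)


def cues_for(preferred, subtitles=SUBTITLES, fallback="en"):
    """Pick the cue list for the user's preferred language, else the fallback."""
    return subtitles.get(preferred, subtitles[fallback])


def subtitle_at(cues, seconds):
    """Return the text to render at the given playback time, or '' if none."""
    for start, end, text in cues:
        if start <= seconds < end:
            return text
    return ""
```

The language passed to `cues_for` would come from the preference setting or pop-up menu mentioned above; a language with no cues falls back to English in this sketch.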
Audio
The main benefit of audio is that it provides a channel that is separate
from that of the display. Speech can be used to offer commentary or help
without obscuring information on the screen. Audio can also be used to
provide a sense of place or mood as done to perfection in the game
Myst.
Mood-setting audio should employ very quiet background sounds in order not
to compete with the main information for the user's attention.
Music is probably the most obvious use of sound. Whenever you need to
inform the user about a certain work of music, it makes much more sense to
simply play it than to show the notes or to try to describe it in words.
For example, if you are out to sell seats to the La Scala opera in Milan, Italy, it is an
obvious ploy to allow users to hear a snippet of the opera: yes, Verdi
really could write a good tune (AU file, 1.4 MB), so maybe I will
go and hear the opera next time I am over there.
In fact, the audio clip is superior to the video
clip from the same opera, which is too fidgety to impress the user and
yet takes much too long to download (QuickTime, 3.6 MB).
Voice recordings can be used instead of video to provide a sense of the
speaker's personality
(AU file, 1.4 MB): the benefits are smaller files, easier production, and
the fact that people often sound good even if they would look dull on
television. Speech is also perfect for teaching users the pronunciation of
words as done by the French
wine site: it used to be the case that you could buy good wine cheaply
by going for chateaus that were hard to pronounce (because nobody dared ask
for them in shops or restaurants) -- no more in the webbed world.
Non-speech sound effects can be used as an extra dimension in the user
interface to inform users about background events: for example, the arrival
of new information could be signaled by the sound of a newspaper dropping
on the floor and the progress of a file download could be indicated by the
sound of water pouring into a glass that gradually fills up. These kinds of
background sounds have to be very quiet and nonintrusive. Also, there
always needs to be a user preference setting to turn them off.
Good quality sound is known to enhance the user experience substantially so
it is well worth investing in professional quality sound production. The
classic example is the video game study where users claimed that the
graphics were better when the sound was improved, even though the exact
same graphics were used for the poor-quality sound and the good-quality
sound experiments. Simple examples from web user interfaces are the use of
a low-key clicking sound to emphasize when users click a button and the use
of opposing sounds (cheeeek chooook) when moving in different directions
through a navigation space.
Response Time
Many multimedia elements are big and take a long time to download with the
horribly low bandwidth available to most users. It is recommended that the
file format and size are indicated in parentheses after the link whenever
you point to a file that would take more than 15 seconds to download with
the bandwidth available to most of your users. If you don't know what
bandwidth your users are using you should do a survey to find out since
this information is important for many other page design issues. At this
time, most home users have at most 28.8 kbps, meaning that files larger than
about 50 KB need a size warning. Business users often have higher bandwidth, but
you should probably still mark files larger than about 200 KB.
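The arithmetic behind the size warning can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes a 28.8 kbit/s modem delivers exactly 28,800 bits per second with no protocol overhead, it uses decimal units (1 KB = 1,000 bytes), and the function names are hypothetical.

```python
WARNING_THRESHOLD_SECONDS = 15  # the guideline discussed above


def download_seconds(size_bytes, bits_per_second):
    """Idealized download time; ignores latency and protocol overhead."""
    return size_bytes * 8 / bits_per_second


def max_unwarned_bytes(bits_per_second, threshold=WARNING_THRESHOLD_SECONDS):
    """Largest file that still downloads within the threshold."""
    return bits_per_second * threshold / 8


def format_size(size_bytes):
    """Format a size the way the links in this column do, e.g. '1.4 MB'."""
    if size_bytes >= 1_000_000:
        return f"{size_bytes / 1_000_000:.1f} MB"
    return f"{round(size_bytes / 1000)} KB"


def link_label(title, file_format, size_bytes, bits_per_second):
    """Append '(format, size)' only when the download exceeds the threshold."""
    if download_seconds(size_bytes, bits_per_second) > WARNING_THRESHOLD_SECONDS:
        return f"{title} ({file_format}, {format_size(size_bytes)})"
    return title
```

Under these assumptions, `max_unwarned_bytes(28_800)` gives 54,000 bytes, which is where the figure of roughly 50 KB comes from, and a 1.4 MB audio file would take more than six minutes over the same modem.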
The 15-second guideline in the previous paragraph was derived from the
basic set of response time values
that have been known since around 1968. System response needs to happen
within about 10 seconds to keep the user's attention, so users should be
warned before slower operations. On the web, current users have been
trained to endure so much suffering that it may be acceptable to increase
the limit value to 15 seconds. If we ever want the general population to
start treating the web as more than a novelty, we will have to provide
response times within the acceptable ranges, though.
Design of client-side multimedia effects has to consider the other two
response time limits also:
- The feeling of directly manipulating objects on the screen requires
0.1 second response times. Thus, the time from when the user
types a key on the keyboard or moves the mouse until the desired effect
happens has to be shorter than 0.1 seconds if the goal is to let the user
control a screen object (e.g., rotate a 3D figure or get pop-ups while
moving over an imagemap).
- If users do not need to feel a direct physical connection between their
actions and the changes on the screen, then response times of about
1.0 second become acceptable. Any slower response and the
user will start feeling that he or she is waiting for the computer instead
of operating freely on the data. So, for example, jumping to a new page or recalculating a spreadsheet
should happen within a second. When response times surpass a second, users
start changing their behavior to a more restricted use of the system (for
example, they won't try out as many options or go to as many pages).
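The three response time limits discussed above (0.1, 1.0, and 10 seconds) can be summarized as a small classifier, for instance to flag measured timings during testing. This is a sketch under assumptions: the function name and the category labels are hypothetical, and the limits are treated as sharp cutoffs even though the text describes them as approximate.

```python
def classify_response(seconds):
    """Map a measured response time to the limit it satisfies."""
    if seconds < 0.1:
        # Fast enough for the user to feel direct control of an object.
        return "direct manipulation"
    if seconds <= 1.0:
        # The user still feels free to move through the data.
        return "uninterrupted flow"
    if seconds <= 10.0:
        # The user waits but keeps paying attention.
        return "keeps attention"
    # Slower operations should be announced before they start.
    return "warn the user first"
```

A pop-up over an imagemap measured at 0.05 seconds would fall in the first category, while a page jump taking 5 seconds keeps attention but no longer feels free.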
Next month: Relationships on the Web (no, not about
dating.)