
Next Generation User Interfaces: Emerging Technologies and Directions

Author: Mike Bennett (smoog at techie dot com)
Last Update: 18/09/2000


Abstract

The user interface on desktop computers has remained rather static over the years, with small evolutionary advances that tend more towards eye candy than actual usefulness. Most advancements that do occur are merely implementations of old ideas, and don't adequately take into consideration modern user requirements or the fact that people now have a greater familiarity with technology and so are more comfortable using it. Nor do the interfaces fully utilise the power of modern computers, which have memory and CPU cycles to waste.


To fully grasp the possibilities for user interfaces we look at ongoing research, examining how it could be deployed on real world desktops and what advantages and disadvantages this could lead to. Some of the concepts require further research, such as 3 dimensional interfaces, while others are ready now, such as multi-scalar interfaces.


Once the research is outlined, speculation is made on possible ways of fitting the most useful aspects together. Various walkthroughs of potential interfaces are presented, which helps highlight some of the stronger and weaker interface methods.


The Research

This section outlines the various research fields that I believe have a lot of potential to make important contributions to user interfaces. Some of the projects are not directly usable, but they provide a good framework upon which to build; others are already seeing some early stage deployment in commercial products.



Calm Technology (Weiser and Brown, 1996 [1])


Also known as "Ubiquitous Computing" (Weiser [2]), this is the concept of non-intrusive computing, "where the technology recedes into the background". It envisions a world where certain types of technology can sit on the "periphery" and do not require our full focus to be of use. This is achieved by utilising the human ability to notice things while we are paying attention to other tasks, e.g. when driving we notice that something is wrong with the car if the sound of the engine changes, even though we're focused on the road ahead and the surrounding traffic.


Another interesting example is the "Dangling String" [1], created by the artist Natalie Jeremijenko, where a piece of plastic spaghetti is hung from a motor on the ceiling. The speed of the motor is proportional to the amount of traffic passing through the local ethernet network. The more traffic on the network the faster the plastic spaghetti spins, transforming the spaghetti into a certain shape and creating a distinctive sound; if the motor spins more slowly then the shape of the spaghetti is different, as is the sound. This can be hung within view of many people without being intrusive, becoming part of each individual's environment. If it stopped, individuals in its presence would feel something was wrong with their environment, notice what had changed and seek to rectify the problem, which in this case would be whatever caused the network to stop sending traffic.
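

As a rough illustration of how little logic the Dangling String needs, the sketch below samples a packet counter once a second and maps the rate linearly onto a motor speed. It is hypothetical Python: read_packet_count() and set_motor_rpm() are invented stand-ins for the real network and motor interfaces.

    import time

    MAX_PACKETS_PER_SEC = 1000.0   # traffic level treated as "network saturated"
    MAX_RPM = 120.0                # fastest the motor is allowed to spin

    def read_packet_count():
        # Invented stand-in: return the cumulative packet count for the local
        # ethernet segment, e.g. parsed from /proc/net/dev on Linux.
        raise NotImplementedError

    def set_motor_rpm(rpm):
        # Invented stand-in: drive the ceiling motor at the given speed.
        raise NotImplementedError

    def dangling_string_loop():
        previous = read_packet_count()
        while True:
            time.sleep(1.0)
            current = read_packet_count()
            rate = current - previous          # packets seen in the last second
            previous = current
            # Linear mapping from traffic level to spin speed, capped at MAX_RPM.
            fraction = min(rate / MAX_PACKETS_PER_SEC, 1.0)
            set_motor_rpm(fraction * MAX_RPM)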



Multi-Scalar Interfaces


Multi-scalar interfaces don't force applications to assume certain predefined scales or points of reference. They free the user to completely define the scale and position of objects and data on the graphical user interface, in most cases the desktop, thereby allowing users to better utilise screen space, application relationships and the visualisation of data.


In "Pad" (3, Perlin et al) this is build upon with the concept of portals which can bring differently scaled applications and data together in various views, e.g. have a graphing package and a spreadsheet sharing the one view, while the same spreadsheet is sharing another view with a mathematical package and data import package. A view can be quite intelligence and could be used not to merely group applications but also to act in a transformative manner, e.g. a graphing view could act on a pre-generated image, which is displayed in a web-browser, and convert the information into different formats for people with other preferred ways of visualising information, i.e. pie charts instead of bar charts, etc.


Xerox PARC researchers have also worked on the concept of views, without the multi-scalar aspects, which they have called "Toolglass and Magic Lenses" (Bier et al, 1993 [4]). Apple's research group has also worked on a form of "Magic Lens", but they've approached it from a different perspective and call them "Data Detectors" [5].


Realistically, Apple's concept isn't as fully developed and is more of a solution to a particular problem, i.e. how to make the cutting and pasting of text more intelligent. With their solution, when you highlight text it is automatically parsed to find URLs, email addresses, etc. When matches are found, context sensitive menus are displayed, e.g. if the highlighted text contains a URL then there's an option to display it in Netscape.
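

The parsing step behind this kind of detector can be approximated with ordinary pattern matching. The sketch below is a simplified illustration, not Apple's implementation; the regular expressions catch only common cases and the menu actions are invented.

    import re

    DETECTORS = [
        # (pattern, menu action offered when the pattern matches)
        (re.compile(r"https?://\S+"), "Open in Netscape"),
        (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "Compose email to"),
    ]

    def detect(selection):
        """Scan highlighted text, returning (matched text, action) pairs."""
        found = []
        for pattern, action in DETECTORS:
            for match in pattern.finditer(selection):
                found.append((match.group(), action))
        return found

    sample = "See http://www.apple.com or mail smoog@techie.com"
    for text, action in detect(sample):
        print("%s: %s" % (action, text))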



3 Dimensional Interfaces


3 dimensional interfaces have received a lot of attention from diverse groups, ranging from sci-fi authors to groups researching and developing potential implementations such as 3Dsia [6], Verse [7], 3Dwm [8] and Microsoft's Task Gallery [9].


The definition of a 3D interface varies from research group to research group. The types under development include, but are not limited to:

  • fully immersive worlds and operating environments, 3Dsia [6] and Verse [7]

  • the standard desktop moved from 2D to 3D, 3Dwm [8] and Microsoft's Task Gallery [9]


Other variations on the theme include augmented reality, which allows the real time projection of virtual environments onto the physical world. An example of its possible use: a plumber, while wearing special glasses, could look at a room and see where all the pipes (virtually overlaid) are running, even when they're hidden behind walls, etc.


3D environments also require a rethinking of how we interact with the desktop, i.e. what type of replacement is used for the mouse, since the mouse is very much a tool for a 2D environment. A discussion of this can be found in "Elements of a Three-dimensional Graphical User Interface" (Leach et al, 1997 [10]).



Visual Relationships


User interfaces are used to convey information, so how the information is displayed is important. The key problem is how to visually represent a lot of often vaguely related data so that it can be navigated and understood reasonably easily.


Clearly, with the advent of the Web, hypertext has come into vogue as a possible answer, but it has serious limitations, such as keeping track of where you were, allowing the inclusion of notes, etc. An interesting solution to both those problems is the concept of "Fluid Documents" [11], where the content of related links can be inserted into the middle of the currently viewed web page. This wouldn't really work with the Web as a whole, but it certainly has its applications within a well structured and designed web site.
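

To make the Fluid Documents idea concrete: expanding a link in place amounts to splicing a summary of the link's target into the page at the anchor, rather than navigating away. The toy sketch below assumes the site keeps a table of per-page summaries; the page names and summaries are invented.

    # Toy model of a fluid link: expanding it splices the target's summary
    # into the page text instead of navigating away from the page.
    SUMMARIES = {
        "calm.html": "calm technology: computing that sits on the periphery",
        "pad.html": "Pad: a multi-scalar desktop built from zoomable portals",
    }

    def expand_fluid_link(page_text, anchor_text, target):
        summary = SUMMARIES.get(target, "(no summary available)")
        return page_text.replace(anchor_text, "%s [%s]" % (anchor_text, summary), 1)

    page = "Weiser's vision of calm technology changed the field."
    print(expand_fluid_link(page, "calm technology", "calm.html"))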


Another important aspect to consider when designing good visual systems is the fact that people have different ways of modelling data relationships, so what could work very well for one person may be an utter mess for another.


Some interesting developments in this area include the "Visual Thesaurus" [12], a very graphical way of demonstrating the relationships between words while also allowing easy navigation: you enter a start word, which then appears in a 3 dimensional space surrounded by related words linked by thin lines to the parent. The words can be made to spin around the parent, which helps when there are a lot of relationships, and the word types (verb, noun, etc) can be emphasised through the strength of the lines. Another implementation of the same concept can be found at "The Brain" [13], though this is a commercial 2 dimensional version that isn't restricted to representing word relationships.
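

Underneath both tools the data is just a labelled graph: words (or documents) as nodes and typed relationships as edges, with navigation re-centring the display on whichever node the user picks. A small sketch with an invented word list:

    # A labelled graph of word relationships; navigating just re-centres
    # the display on whichever neighbour the user picks.
    RELATIONS = {
        "bright": [("shining", "adjective"), ("brilliant", "adjective"),
                   ("brighten", "verb")],
        "brilliant": [("bright", "adjective"), ("genius", "noun")],
    }

    def show(centre):
        print("centre: %s" % centre)
        for word, word_type in RELATIONS.get(centre, []):
            # In a real display the word type could set the line strength.
            print("  --[%s]--> %s" % (word_type, word))

    show("bright")
    show("brilliant")   # the user clicked "brilliant" to re-centre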



Gesture Based Interfaces


Gesture based computing is still in its infancy, for a variety of reasons including the complexity of training a computer to recognise 2 and 3 dimensional gestures.


At the moment most PDAs (personal digital assistants) provide pen based gesture input, i.e. you draw a certain type of 'g' to input a 'g'. How the gesture is drawn, the directions taken, etc are all very relevant to enabling character recognition. An evolutionary development is to associate shapes/gestures with actions as part of the standard desktop, which is under development as part of wayV [14], e.g. draw an N and Netscape starts, draw a C and a calculator starts, etc.
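

Once a recogniser has turned the strokes into a symbol, the desktop side of the idea is a simple lookup from symbol to command. The sketch below assumes a recognise() function (the hard part, which wayV itself implements); the bindings are illustrative.

    import subprocess

    # Gesture symbol -> command to launch; the bindings are illustrative.
    BINDINGS = {
        "N": ["netscape"],
        "C": ["xcalc"],
    }

    def recognise(stroke_points):
        # Invented stand-in for the recogniser, which turns a list of
        # (x, y) points into a symbol such as "N" or "C".
        raise NotImplementedError

    def on_gesture(stroke_points):
        symbol = recognise(stroke_points)
        command = BINDINGS.get(symbol)
        if command:
            subprocess.Popen(command)   # fire and forget, like a launcher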


An interesting area is the inputting of gestures via other means, especially cameras. This is becoming more feasible with the advent of cheap cameras that have strong market penetration, i.e. webcams. Matthew Turk of Microsoft's research group has some interesting work going on in this area [15]. Of course, one of the questions that arises is: what exactly is the usefulness of such developments?



Voice Interfaces


One of the fields that has presented a lot of challenges over the years is voice recognition. While it's not completely perfect yet, and won't be for a considerable time to come, it has more than reached the stage where it's useful and used in real world situations, ranging from mobile phones to voice dictation in word processors.


Since the technology is reaching a mature stage, a lot of consideration needs to be paid to how it should be integrated, used and deployed. One group considering this is the Speech Interface Group at MIT [16].



Visual Programming


Visual programming is a newly emerging field, so much so that there isn't even a consistent name for it. It also goes by the names of "Programming by Demonstration" [17], "Programming by Example" [18] and "Demonstrational Interfaces" [19].


The concept is simple but the implementation aspects are very complex. The idea breaks down into two main parts:

  • computers should be able to learn by looking at repeat actions

  • repeat actions may be represented via graphics and icons which can be used to build up complex macros and programs


An example of the first part in action: if you delete a .pdf file in a folder and follow that by deleting another .pdf in the same folder, the computer should be intelligent enough to ask whether you would like to delete all the .pdf files in that folder.
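

A crude version of this can be expressed as watching the action history for two consecutive actions that differ only in their argument, then proposing the generalisation. The action format below is invented for illustration:

    import os.path

    def generalise(first, second):
        """If two actions share verb, folder and file extension, propose the
        bulk version of the action, otherwise return None."""
        (verb_a, path_a), (verb_b, path_b) = first, second
        folder, ext = os.path.dirname(path_a), os.path.splitext(path_a)[1]
        if (verb_a == verb_b and ext == os.path.splitext(path_b)[1]
                and folder == os.path.dirname(path_b)):
            return "%s all *%s files in %s?" % (verb_a, ext, folder)
        return None

    history = [("delete", "/home/mike/docs/a.pdf"),
               ("delete", "/home/mike/docs/b.pdf")]
    suggestion = generalise(history[-2], history[-1])
    print(suggestion)   # -> delete all *.pdf files in /home/mike/docs?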


The second part is where the idea is really revolutionary: basically, you can develop computer programs by demonstrating how actions should occur and, if need be, build animated graphical representations that are computer programs themselves. "Pictorial Janus" [20] and "ToonTalk" [21] are implementations of these concepts and are worth looking at.



Miscellaneous


There are many other fields in user interface research at the moment but the ones above are the areas I feel have the most potential to change the way we understand computers and how we perceive and interact with data.


Other obvious areas include eye aware interfaces (Edwards, 1998 [22]), which are more of a sub-set of gesture based computing and would have limited applications; hardware devices that provide other forms of input while simultaneously giving tactile feedback; and 3 dimensional display devices [23].



Advantages and Disadvantages

In this section we deal with each of the above mentioned areas, trying to apply some critical evaluation and envisioning what impact they could have on the desktop experience.



Calm Technology


This is more of a concept than a particular application; it's a philosophy on how we should design interfaces. That said, there have been some attempts to develop "Calm" applications; in particular I draw attention to "LavaPS" [24], written by John Heidemann.


"LavaPS" displays running processes on a Unix computer via a graphical interface. The processes are represented with blobs of colours, the size of the blobs, the speed at which they move and the colour all tell you information about the state of the process each blob is associated with. From the personal tests I've carried out I do find that "LavaPS" runs on your desktop in a non-intrusive manner, but when you pay attention to it understanding what is going on turns out to be hard at anything but the most basic level.


Advantages:

  • non-intrusive, which could use our much neglected peripheral senses

  • allows the simultaneous monitoring of a greater number of data sources


Disadvantages:

  • requires a lot more thought to be applicable

  • it would seem to be easier to create a badly designed calm application than a badly designed normal one



Multi-Scalar Interfaces


This is a technology which we should really see as part of the desktop now. People often run many different applications simultaneously but don't require all of them to be full sized the whole time; they do, however, require the ability to keep an eye on what's going on with them.


Advantages:

  • screen real estate could be managed much better

  • the ability to watch what's happening in a few scaled applications rather than the current situation of having to flick between many overlaid applications

  • multi-scalar interfaces are a form of zoomable interface and are therefore advantageous to vision impaired people

  • transformative views would allow individuals to shape data the way they wish, rather than the way it was given to them

  • views would also allow people to group applications in a more natural way, so a single running instance of an application could be in two or more groups if required


Disadvantages:

  • a considerable change to the desktop which would present a whole new learning curve to people

  • views could give the user too many ways of viewing information resulting in a lack of consistency over time and data

  • powerful computers would be required to handle all the real-time scaling issues along with more finely grained scaling algorithms



3 Dimensional Interfaces


At the moment, and for the next while, 3 dimensional user interfaces as part of the desktop are going to remain beyond the reach of a lot of users. There are a variety of reasons for this, including, but not limited to:

  • too many possible directions they can be developed in

  • a lack of clear methods and best approach


Advantages:

  • more like the real world so theoretically less of a learning curve if designed with that in mind

  • the ability to display a greater amount of information without having to increase screen size

  • possibly more immersive therefore allowing easier collaboration between geographically disparate individuals


Disadvantages:

  • 3 dimensional graphics require a lot of processing power which is only now becoming available as part of standard desktop machines

  • a completely different form of computer interaction requiring people to relearn their basic computing skills

  • if a 3 dimensional environment is completely immersive and non-collaborative it could possibly lead to a sense of isolation for the user



Visual Relationships


Data visualisation has clearly come of age; the rapid growth of the Web demonstrates this. Using the techniques mentioned above, we should move on to more graphical ways of representing data, for navigating our file systems, our collections of documents, etc.


How we perceive data shapes how we think, and if we are all forced to perceive data the same way we will simply miss some obvious solutions to problems.


Advantages:

  • could encourage a greater variety of approaches to problem solving

  • allows at a glance understanding of complex relationships between data


Disadvantages:

  • as mentioned above different people have different ways of visualising data so what works for one won't work as well for another

  • finding which visual representations work best will require a lot more research and the answers will be domain/data specific



Gesture Based Interfaces


There is a market for various forms of gesture interfaces, primarily PDAs, but the desktop offers a lot of potential for development in this area. It's much neglected, even from the point of view of just using the mouse as the gesture input device.


As a research area it offers a lot of interesting problems, with challenges ranging from finding the optimal gestures for the symbolic representation of actions to trying to recognise 3 dimensional gestures.


Advantages:

  • as part of the desktop it allows fast command execution

  • allows people to use "writing" as a form of computer input, rather than typing

  • could potentially do away with the mouse and keyboard and allow free forms of human computer interaction


Disadvantages:

  • it's not very obvious how it should be integrated with existing technology

  • recognising the transforms a hand goes through while making 3 dimensional gestures is an extremely hard problem



Voice Interfaces


Humans are obviously used to speaking as one of our primary communication tools, so there should be widespread adoption of voice interfaces in all types of technology and business situations. Growth should be especially obvious in devices where keyboard and pen input would be clumsy or problematic, including PDAs, ATMs, video recorders, etc.


Advantages:

  • voice input doesn't require typing

  • it's a more natural form of interaction

  • the technology is reasonably mature


Disadvantages:

  • having an open plan office with lots of people simultaneously attempting voice input would be rather loud and extremely distracting

  • it's open to a lot of potential abuses, e.g. monitoring vast numbers of telephone calls for keywords, using voice recognition to track an individual's conversations as they use different phones, etc.



Visual Programming


As it currently stands, visual programming could be applied to the standard desktop, but the capabilities and possibilities have still to be discovered and defined, so doing so would be rather premature.


Advantages:

  • reduce the amount of tedious repeat actions

  • allow people with less of a development background to create computer programs

  • may suit people who are more visually rather than mathematically inclined


Disadvantages:

  • it could try to be too clever and end up annoying rather than useful, e.g. like the "Paper Clip" in Microsoft Office

  • requires a lot of work to develop consistent visual syntax out of which computer programs could be built



Miscellaneous


Eye aware applications are currently used as part of user interface usability testing. Whether they could have widespread applications is questionable; certainly they would add an extra element to computer games.


Tactile feedback devices will have their time, but at the moment further consideration is required to work out where they'd be useful, apart from virtual reality.


Advantages:

  • figuring out exactly where a user is looking, and for how long, can lead to better user interfaces if the information is used correctly

  • possible use of some of our neglected senses to give feedback/input


Disadvantages:

  • shining lasers directly into people's eyes for long periods of time may have unforeseen consequences

  • tactile devices could require the wearing of special equipment, which may restrict movement somewhat



The Future

So where can the desktop go from here? Well, if some of the above concepts were deployed together, things could get very interesting and useful.


Imagine a few possible desktop environments that are like the standard Windows 98 graphical user interface, except that each version has the extra attributes listed below, with an example situation of each in use.


Multi-scalar [3] graphical user interface with voice recognition


  • the user sits down at his computer, which he has called "Babble"

  • he says "Babble start-up"

  • the computer is constantly monitoring for voice activation while in a low power mode and when it receives the right voice signal it starts up

  • it logs the user onto the network without forcing him to type in his password; instead it generates a unique, reproducible key based on the user's voice

  • the user says "Web profile"

  • the computer brings up a few sessions of a web browser, opening them on various websites including a stock ticker. The stock ticker is actually quite big and would take up a lot of screen space, but it is rescaled to the user's previously set preferences. The rescaling wasn't originally built into the stock ticker; it is possible because the UI is multi-scalar

  • the user creates 2 views [3][4] of a single web browser side by side on his desktop, meaning the same web page is displayed twice, side by side

  • he places a second view on one of the web browser views. This is a transformative view which (as best it can) translates all the English that appears below it into Japanese. The user's first language is English but he's trying to learn Japanese. Using the transformative view he can browse the web in English in one of the web browser views while seeing a Japanese version of the same web page in the other.


Visual relationships [12] and gesture recognition via a camera [15]


  • the user sits down at his computer and looks at the monitor for 10 seconds

  • the computer is in low power mode but it constantly monitors its camera input and notices the user is looking at it for more than 10 seconds so it starts up

  • it logs the user onto the network with the password either pulled from a file, looked up by recognising the user's face, or generated as a unique, reproducible key based on the user's face and body

  • the user makes a 'w' gesture with one of his hands

  • the computer recognises it and starts up a web browser

  • the user makes a pulling apart motion with his hand

  • the computer finds which application is active, in this case the web browser, and creates a second version of it on screen

  • the user starts up an email client by making an 'e' gesture

  • the computer has a small application running in the lower right hand corner which shows all the running applications. It displays them using 3 dimensional visual relationships, as in the "Visual Thesaurus". The applications are grouped in hierarchies which are navigated via gestures.

  • after using the computer for a while, he decides to switch back to the web browser, so he activates the visual application browser via a gesture

  • he then navigates around it with gestures, quickly finds the web browser he wants and switches to it

  • he pulls up the history of web sites he has visited (the history is also displayed via a visual relationships tool like the "Visual Thesaurus"), goes back to a previous web site and begins viewing a side branch (a related link)


Calm computing [1] with demonstrational interface [19] form of visual programming


  • the user sits down at his computer and logs in

  • the computer automatically brings up his preferred applications

  • it checks what time of day it is, and if it's morning it starts up a few web browsers with his favourite web sites

  • note that he didn't set up the computer with his preferred applications or web sites; it has "learnt" what he likes via the demonstrational interface

  • while he's using the computer, the bar at the top of every application changes colour in a non-drastic manner (calm computing); the various colours indicate whether he has new mail, whether a friend has logged onto ICQ, what the load is on some servers, etc

  • he begins writing some C code; the demonstrational interface watches keyboard input for certain patterns that it has learnt how to complete, e.g. when it sees "if(" it automatically suggests ")\n{\n }\n". By watching the keyboard it is application independent (see the sketch below).
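

A toy version of that keyboard watcher: buffer recent keystrokes and, when the tail of the buffer matches a learnt trigger, offer the completion. The trigger table here is invented; in a real demonstrational system it would be built up by observing the user.

    # Learnt trigger -> suggested completion; a real demonstrational
    # interface would grow this table by watching the user's habits.
    COMPLETIONS = {"if(": ")\n{\n }\n"}
    MAX_TRIGGER = max(len(t) for t in COMPLETIONS)

    buffer = ""

    def on_keystroke(char):
        global buffer
        buffer = (buffer + char)[-MAX_TRIGGER:]   # keep only the recent tail
        for trigger, completion in COMPLETIONS.items():
            if buffer.endswith(trigger):
                return completion   # the UI would offer this, not force it
        return None

    suggestion = None
    for ch in "x = 1; if(":
        suggestion = on_keystroke(ch) or suggestion
    print(repr(suggestion))   # -> ')\n{\n }\n'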


3 dimensional graphical user interface, a cross between 3Dwm [8] and Microsoft's Task Gallery [9], with an eye tracking abilities [22]


  • the user puts on some virtual reality glasses

  • the user starts up the computer and logs onto the network

  • he pulls up a menu with a list of available applications

  • his eyes focus on various applications for a certain length of time, and these applications are started up in response

  • he wants to change the preferences of his web browser to use a different web proxy

  • to change the proxy he twists the web browser so he's looking at the back of it, i.e. like looking inside something

  • he flicks a few virtual switches and enters text for the new proxy

  • he twists the web browser around so he's looking at its front and begins to use it

  • there's a small transparent 3 dimensional cube hanging close by which he grabs

  • the cube, in miniature, shows the user his world and where all the applications are in relation to his current position, the user is represented by a small green dot in the cube

  • by moving the green dot around the user changes what applications he is interacting with or viewing

  • the user creates a "wormhole" (like a view [3][4]) which are shortcut connections between different parts of the 3 dimensional space, or even connections between different 3 dimensional spaces which could be on other computers

  • the wormhole allows him to work with a friend on preparing a document even though both of them are in separate parts of the world


The outlines above should help you see how many positive differences and improvements could be made to the current desktop experience. Not all the possibilities will work straight away; they certainly require a lot more research, and a strong emphasis should be placed on the development of prototypes.



Conclusion

As I've shown, there is a great deal of ongoing research into user interfaces, of which I've only covered a limited amount, but the research is not just re-inventing the wheel. Significant advances are being made; it's anyone's guess what the future holds, but it's definitely going to be interesting.


A big question is why the user interface has remained static for so long. Why must we continue waiting for Microsoft and co to introduce the above research into the real world? Yes, some of it would involve the end user learning new things, but the transition would be nothing compared to learning the user interface differences between Windows 3.1 and Windows 95, which many non-technical users managed very successfully.


One very important point is worth making: the amount of serious user interface research in Ireland is disappointingly low. There is the "Human Factors Research Group" [25] in University College Cork dedicated to it, but their focus is more on usability testing and metrics than new interface development.


Ultimately the user interface is about how we relate to data, how we perceive it, how we manipulate it and how we transform it.


References

[1] (Weiser and Brown, 1996) "The Coming Age of Calm Technology", Mark Weiser and John Seely Brown, Xerox PARC, 1996, http://nano.xerox.com/hypertext/weiser/acmfuture2endnote.htm


[2] (Weiser) Mark Weiser, http://www.ubiq.com/hypertext/weiser/UbiHome.html


[3] (Perlin et al) "Pad", Ken Perlin, Prof. Jack Schwartz and Jonathan Meyer, New York University Media Research Lab, http://www.cat.nyu.edu/projects/pad.html


[4] (Bier et al, 1993) "Toolglass and Magic Lenses: The See-Through Interface", Eric A. Bier, Maureen C. Stone, Ken Pier, William Buxton, Tony D. DeRose, Xerox PARC, 1993, SigGraph 93, http://www.parc.xerox.com/istl/projects/MagicLenses/93Siggraph.html and http://www.parc.xerox.com/istl/projects/MagicLenses/


[5] Data Detectors, Apple, http://www.apple.com/applescript/data_detectors/


[6] 3Dsia, http://threedsia.sourceforge.net


[7] Verse, http://www.obsession.se/verse/


[8] 3Dwm, Chalmers Medialab, http://www.3dwm.org


[9] Task Gallery, Microsoft's Research Group, http://research.microsoft.com/ui/TaskGallery/


[10] (Leach et al, 1997) "Elements of a Three-dimensional Graphical User Interface", Geoff Leach, Ghassan Al-Quimari, Mark Grieve, Noel Jinks, Cameron McKay, Interact 97, http://goanna.cs.rmit.edu.au/~gl/research/HCC/interact97.html


[11] Fluid Documents, Xerox PARC, http://www.parc.xerox.com/istl/projects/fluid/


[12] Visual Thesaurus, http://www.plumbdesign.com/thesaurus/


[13] The Brain, http://www.thebrain.com


[14] wayV, http://wayv.sourceforge.net


[15] Gesture Recognition, Microsoft's Research Group, http://www.research.microsoft.com/users/mturk/gesture_recognition.htm


[16] Speech Interface Group, MIT, http://www.media.mit.edu/speech/


[17][18] Programming by Example and Programming by Demonstration, MIT, http://lieber.www.media.mit.edu/people/lieber/PBE/


[19] Demonstrational Interfaces, Carnegie Mellon, http://www.cs.cmu.edu/~bydemo/


[20] Pictorial Janus, C-lab and Heinz Nixdorf Institut, http://jerry.c-lab.de/~wolfgang/PJ/


[21] ToonTalk, http://www.toontalk.com


[22] (Edwards, 1998) "A Tool for Creating Eye-aware Applications that Adapt to Changes in User Behavior", Gregory Edwards, Advanced Eye Interpretation Project, Stanford University, ASSETS 98, http://eyetracking.stanford.edu/assets/assets.html


[23] Volumetric 3-D Display Technology, Actuality Systems, http://www.actuality-systems.com


[24] LavaPS, http://www.isi.edu/~johnh/SOFTWARE/LAVAPS/index.html


[25] Human Factors Research Group, University College Cork, http://www.ucc.ie/hfrg/

