‘Talk to Me’ (paper 2012)

Talk to Me

Frode Hegland

Liquid Information @ UCLIC
4th Floor, Remax House
31/32 Alfred Place
LONDON, WC1E 7DP, UK

frode@liquidinformation.org

The ultimate in human-computer interaction is often seen as ‘talking directly with the computer’ and expecting the computer to reply intelligently. This is more a problem of language than of speech.

Interaction, speech recognition, AI, language.

1. Introduction

When we say “she said to me…” we don’t necessarily mean that someone spoke to us; we could mean that the person wrote a letter, emailed, instant messaged or texted.

We don’t tend to question the ‘figure of speech’ indicating ‘speech’ – it could mean almost any ‘means’ of transferring information.

In HCI, this is important as so much emphasis is placed on the ‘means’ or medium, like voice recognition. I will argue that what is crucial is not the means but the flexibility and richness of the interaction – how we can give the computer commands.

When we perform an action using a computer, we tend to say that we did the action, not the computer, for example “I opened the file” or “I wrote the email”.

Unless we are talking to someone very new, or tech-support, we do not state the interaction mechanism we used, like “I double clicked on the file icon using a mouse” or “I put my fingers on the keys and typed in an email”.

However, HCI in the entertainment industry has showered a huge amount of glamour on voice interaction, as if, as long as we can speak to computers using our voice, everything will magically change and the computer will really understand us.

To the computer, of course, it doesn’t much matter whether commands are entered via keyboard, mouse, voice, eye tracker, joystick or any other means. The computer gets a string of data representing commands, much as we get similar information from a text, an email or a voice. (Sure, voice may carry a lot more information than the words being spoken, depending on the context, but we tend to simplify that later, reducing it to the primary word information and maybe adding a thought about the tone: “she said hello to me in a very friendly tone”. This, however, is a whole different article.)

The main interactions with computers today are using the mouse to point and click, giving single commands, and scripting. I won’t add programming, as it is used primarily to make later interaction possible. And yes, scripting is a border case between live commands and commands for the future.

If point and click has the real world analog of using a translation book to speak single phrases to someone, then you could say that scripting is using the guidebook to make sentences, by getting one word, then the next – perhaps writing them down as you go. It’s definitely possible to have a conversation this way, especially if it’s via email or letters. It just takes longer.

It just takes longer.

That is key. It is not possible to have a flowing discussion with someone – on anything but the most mundane topics – using a translation book. You have to translate what was said to you and then you have to translate your response.

The flow goes. The rhythm of the discussion is lost.

Richard Young at UCLIC once described using Doug Engelbart’s NLS to me. I have used it myself, of course, and been impressed. It’s quite clear that the system is, for many purposes, more powerful for the trained user than our modern GUIs.

But what Richard told me still managed to surprise me and open my eyes (paraphrased from memory): “Since you enter commands with the keyset with your left hand and tell the computer what you want to enter the commands on with the mouse in your right hand, when you are editing text for example, you don’t need to hit the ‘enter’ key. This gives the distinct feeling of flying.” Wow.

2. Learning a Language

Picture yourself an alien. You learn to communicate with humans, first using single phrases from the galactic translator book. You graduate to phrases, carefully looked up beforehand, and you spend time translating the replies. One day you feel fluent and you can finally have a full, flowing discussion with earthlings.

Then someone shows you a computer, a brand spanking new Apple Mac OS X 10.4 ‘Tiger’ PowerBook with all the bells and whistles. Certainly not as advanced as your alien planet’s hybrid organic/quantum systems, but impressive for earthlings who just a hundred rotations around the sun ago had no electric computation devices at all.

So you sit down. You first get to use the mouse, which you find quite novel – it’s a bit like when you were communicating with humans in the beginning though – one word at a time. Slow. But dead easy. You graduate to mid-level UNIX admin geek status (you are, after all, a smart alien) and you script away, set cron jobs and generally feel like you did when you learnt how to speak with humans using sentences strung together from that translation book.

communicating with a computer

Learning to communicate with the computer to get it to carry out your instructions turns out to be a lot like learning to communicate with people.

But when you feel that you are ready to communicate with computers, entering commands as fluidly as in conversation with people, you find that it cannot be done.

Earth computers do not have this ability.

You wonder if you are a bit stupid and you’ve missed something. Asking people if they have really been so insane as not to allow for this natural evolution would just be rude. Surely, there must be a way. Finally you rally your courage and take a friendly geek to the nearest pub. A few hours later you emerge with a headache and words like “AI” and “fuzzy logic” and “neural nets” as well as “voice recognition” and “wasn’t Tom Cruise neat in ‘Minority Report’?”

What has all this got to do with interacting with computers as richly as we interact with each other?

You wrack your brains and try to remember the early history of computing back in your home system in Omega Six. Yes, you are definitely sure, this had not taken all that long to implement. What’s wrong with these humans?

At Liquid Information @ UCLIC we feel that this is amazing. And that there is something we can do to fix it.

It’s time to give users the power of programmers. Live.

Our second-year plan at Liquid Information is therefore to facilitate three levels of interaction: 1) Hierarchical menus. 2) Menus with added options. 3) Logical connections between sentences.

Added options will be provided by the user typing a ‘;’ command, which halts the execution of the command and allows the user to add options.

Logical connections between sentences will be provided in a similar manner, with the user typing ‘>’ to indicate that the result of the last sentence should be what the next sentence should act on.
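As a rough illustration of how these two separators could be read, here is a minimal Python sketch. It assumes commands arrive as a string of space-separated keys; the `Sentence` class and `parse` function are hypothetical names invented for this example, not part of any Liquid Information code.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    """One command plus any option groups added after ';' (hypothetical type)."""
    command: list
    options: list = field(default_factory=list)


def parse(keys: str) -> list:
    """Split a key sequence into sentences ('>') and option groups (';')."""
    sentences = []
    for chunk in keys.split(">"):
        # The first part is the command itself; each later part is one
        # group of options that the user added after typing ';'.
        parts = [part.split() for part in chunk.split(";")]
        sentences.append(Sentence(command=parts[0], options=parts[1:]))
    return sentences
```

Each sentence after a ‘>’ would then act on the result of the sentence before it; how that result is passed along is left open here.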

3. Concluding

To go back to the examples of using voice to interact with computers: When computers can deal with sentences, carrying out the instructions or commands given in those sentences is about the same as carrying out programs. They still won’t be ‘clever’ or ‘self-aware’ or ‘conscious’. You cannot ask for answers to questions like “which book is better”, but you can ask for straightforward, explicitly defined tasks to be carried out, if you are very clear: “compare this book and that book and tell me which has more quotes”. Or “find the oldest document which contains this text”.*

Then we will have fulfilled what we feel we see in science fiction movies of people talking to computers.

Without waiting for AI to build computers that do the thinking for you, computers will get to the next level of helping you do your own thinking.

__

* Having been asked what the menu would look like for this item, it’s included here:

The menu itself: [refer to external information] [search] [Google] ; [show only] [oldest document]

The corresponding keys: r s g ; s o
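To make the mapping from keys to menu items concrete, here is a small Python sketch that walks a nested menu one key at a time. The `MENU` and `OPTIONS` trees below are assumptions that contain only the items from this example, and `resolve` and `expand` are hypothetical helper names.

```python
# Hypothetical command tree: each key maps to (label, submenu).
MENU = {
    "r": ("refer to external information", {
        "s": ("search", {
            "g": ("Google", {}),
        }),
    }),
}

# Hypothetical option tree, offered once the user has pressed ';'.
OPTIONS = {
    "s": ("show only", {
        "o": ("oldest document", {}),
    }),
}


def resolve(keys, tree):
    """Walk the tree one key at a time and collect the labels passed."""
    labels = []
    for key in keys:
        label, tree = tree[key]  # descend one level per key press
        labels.append(label)
    return labels


def expand(sequence):
    """Turn a key sequence such as 'r s g ; s o' into bracketed labels."""
    command, _, options = sequence.partition(";")
    out = [f"[{label}]" for label in resolve(command.split(), MENU)]
    if options.strip():
        out.append(";")
        out += [f"[{label}]" for label in resolve(options.split(), OPTIONS)]
    return " ".join(out)


print(expand("r s g ; s o"))
# [refer to external information] [search] [Google] ; [show only] [oldest document]
```

Keeping the options in a separate tree is only one possible design; it reflects the idea that ‘;’ opens a new level of hierarchy rather than continuing the command menu.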

Now, one issue here is how to deal with the ‘;’ character. It is designed to tell the browser code to wait for more options. Can we make a new level of hierarchy appear if the user presses ‘;’?

‘>’ is even harder to present visually. I think that if we do this, we should spawn something like a Mac OS X Tiger Automator workflow. Or simply something which looks like an advanced ‘Find dialog’.
