Losing Patience

My wife likes to do crosswords. On occasion, she’ll ask for my input, mostly about sports or music trivia. As my memory whirrs away, she might read me the letters she already has in place, or ask about an interlinking clue.

Our communication is enabled by what Daniel Kahneman described as System 1 and System 2 thinking. System 1 is fast and instinctive, while System 2 is slower, deliberate thought. When searching for the name of an obscure number one from the 1980s, I might say “I should know this” and talk about an occasion I remember hearing the song. System 1 is stalling while System 2 works on retrieving the answer.

This is how we want AI interaction to work. Right now, a chatbot can be fast and dumb, or slow and intelligent. In the first case I get frustrated that my specific questions cannot be answered. In the second, I lose patience waiting for the answer. Effective interaction would mean maintaining conversation while retrieving the information I need. To date, single models have been unable to do that effectively.

The holy grail of voice-based interaction would be intelligence without the wait. It would open up the benefits of AI to a wider audience, including the elderly or those who prefer speaking to typing.

Speak, Search, Answer

The old chatbot model worked on the basis of me speaking, then the chatbot responding. This is not how humans talk. We interrupt each other to add an opinion, offer encouragement, or provide a fact. This is what makes conversation feel natural.

Yet conversation is not enough. The experience also needs to be multimodal. That’s a fancy way of saying it works across several kinds of input and output, such as speech, video and text. AI may be watching what we are doing and interrupting with advice. At the same time, it is planning a training curriculum based on its assessment of our competence. This would work when learning a new skill, such as cooking or woodwork.

A more business-based example is a chatbot that talks to clients in real time, while searching files and finding missing data. The faster System 1 layer would use filler phrases to buy the slower System 2 layer time to retrieve information. This is what I am doing when wittering about my memories, while racking my brain for the name of a song.
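For readers who like to see the idea in code, here is a minimal Python sketch of that two-layer pattern. It is an illustration under stated assumptions, not how any particular product works: `slow_lookup`, `converse`, the filler phrases and the timings are all hypothetical, and the "slow" layer is simulated with a short sleep rather than a real model or file search.

```python
import asyncio

# Hypothetical filler phrases the fast "System 1" layer can say while waiting.
FILLERS = [
    "I should know this...",
    "Give me a moment...",
    "It's on the tip of my tongue...",
]


async def slow_lookup(question: str, delay: float) -> str:
    """Stand-in for the slow "System 2" layer (a large model or a file search).

    The delay is simulated; a real system would call a model or a database here.
    """
    await asyncio.sleep(delay)
    return f"Answer to: {question}"


async def converse(question: str, filler_interval: float = 0.1,
                   lookup_delay: float = 0.25) -> list[str]:
    """Keep talking with fillers until the slow lookup finishes."""
    transcript: list[str] = []
    # Start the slow work in the background so the conversation can continue.
    task = asyncio.create_task(slow_lookup(question, lookup_delay))
    spoken = 0
    while True:
        # Wait briefly for the answer; on timeout the task keeps running.
        done, _pending = await asyncio.wait({task}, timeout=filler_interval)
        if done:
            break
        transcript.append(FILLERS[spoken % len(FILLERS)])
        spoken += 1
    transcript.append(task.result())
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(converse("Which song was number one?")):
        print(line)
```

The design choice worth noticing is that the fast layer never blocks on the slow one. It checks in at short intervals and only hands over once the answer is ready, which is roughly what I do when I stall over a crossword clue.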

Another practical example is live translation in meetings, which would enable far richer conversations with international business partners. Or a live minute-taking app that also draws graphs from the data being discussed. Here is an example of Thinking Machines AI doing both of these things.

The experience is not yet seamless, but it is close. The nearer it gets, the more questions of cost and security matter. Real-time conversational AI requires extremely fast token generation, which limits the hardware capable of running it smoothly. But we also want the interaction to be local to preserve data security and limit cost.

Olena Zhu is head of AI at Intel’s client computing group. She explains how the company is working on cloud solutions that break up complex tasks. These are then sent to local agents for completion. This hybrid model is a stepping stone to the advent of standalone AI services for businesses. This is when you will buy or sell such things as an AI tax advisor, which runs on a local network without sharing sensitive information with Google or AWS.
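As a rough illustration of that hybrid split, the Python sketch below separates a planner that only ever sees the task description from a local agent that is the only part to touch the data. Everything in it is an assumption made for the example: `cloud_plan`, `run_locally`, the `Step` type and the three step names are hypothetical, and the planner is a fixed stub rather than a real cloud service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    instruction: str


def cloud_plan(request: str) -> list[Step]:
    """Hypothetical cloud planner: it receives the request text, never the records."""
    return [
        Step("gather", f"Collect the records needed to: {request}"),
        Step("compute", "Total the collected amounts"),
        Step("report", "Summarise the result for the user"),
    ]


def run_locally(steps: list[Step], records: dict[str, float]) -> str:
    """Hypothetical local agent: carries out each step against on-premises data."""
    selected: dict[str, float] = {}
    total = 0.0
    summary = ""
    for step in steps:
        if step.name == "gather":
            selected = dict(records)  # the data never leaves this function
        elif step.name == "compute":
            total = sum(selected.values())
        elif step.name == "report":
            summary = f"{len(selected)} records, total {total:.2f}"
    return summary


if __name__ == "__main__":
    plan = cloud_plan("estimate this quarter's tax")
    print(run_locally(plan, {"invoice_a": 100.0, "invoice_b": 50.5}))
```

In this sketch only the plan crosses the network. The figures stay on the local machine, which is the property that makes the standalone tax advisor plausible.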

Developing Social Skills

An LLM is already better than I am at retrieving minor details of past sporting glories and musical memories. But my wife still likes to do the crossword and I like to look clever when helping her if I can. AI is not changing the nature of human relationships.

AI is already an effective multilingual meeting scribe, which leaves participants free to focus on the quality of the conversation. It is not yet an effective active participant, in either chatrooms or meetings.

We are testing our internal agents at the moment and at times have to turn them off when they become over-eager. Knowing when not to speak is as important a social skill as knowing when your input is timely.

AI chatbots have, by and large, been a disappointment to date. Part of the reason has been the use of a single model to do two different jobs. One is to communicate and the other is to think. AI companies are solving this problem, and fluid, genuinely useful AI interaction is now within touching distance.

Questions to Ask and Answer

  1. Where would more natural AI interaction remove friction for customers or staff?

  2. Which conversations rely on people searching for information while they speak?

  3. What sensitive processes can only be run securely on our own systems?
