Conversational Interfaces: The Good, the Ugly & the Billion-Dollar Opportunity

Metadata

Author: Julie Zhuo
Full Title: Conversational Interfaces: The Good, the Ugly & the Billion-Dollar Opportunity
URL: https://lg.substack.com/p/conversational-interfaces-the-good

Highlights

But what was really so magical about ChatGPT? Certainly much has been said about its intelligence, aka the quality of its answers. It performs at the level of a high schooler, then a college student, now a junior employee! It generates an endless stream of bedtime stories and marketing copy! It offers a sympathetic ear. But the true magic of ChatGPT as a consumer product goes beyond its encyclopedic knowledge. Its interface, a conversational text box that’s instantly usable by the entire world, quietly ushered in the next era of user experience design. (View Highlight)
Alas, the conversational interface is also where AI design innovation is stuck today. Let us deconstruct and examine closely its strengths and weaknesses. (View Highlight)
The next best thing? Effective translation. Effective translation is the designer’s Holy Grail. The entire design discipline can be distilled into the craft of translating a creator’s intent into a user experience that fulfills the desired intent. This is precisely where AI technology shines. (View Highlight)
And the AI chat box interface, popularized by ChatGPT, feels obvious to use because it banks on two interactions every digital citizen already knows:
1. Conversing in natural language: something we’ve practiced since we were two years old.
2. Using an SMS / messaging interface: 25 billion texts are sent a day. A DAY. So yeah, this pattern is already well ingrained in our minds. (View Highlight)
The technology powering ChatGPT had already launched prior to November 30th, 2022. It was just wrapped in a different interface. Very few people paid attention. Why did ChatGPT explode in popularity? Because of its obvious chat interface, which everyone already intuitively knew how to use. (View Highlight)
1. The blank page problem. Gartner famously noted that, when it comes to data, “80% of the value is in asking the right questions.” Even before the era of ChatGPT, the Internet was chock full of ways to get answers. (View Highlight)
The problem with a blank chat box on a blank page is that it violates the first rule of a high-quality user experience: it isn’t obvious what I can do. A blank page box puts the burden on the user to learn what to use it for. This is not a problem for early adopters and high-agency people (aka everyone reading this), who love the game of exploration and the thrill of discovery. But for the broader world, a blank page box is intimidating. A blank page box is lazy. (View Highlight)
A blank page box reminds the user of Google Search, which famously designed itself to be an efficient router to other destinations. The founders felt people should spend as little time on Google Search as possible, which I doubt is the goal of today’s AI companies. The user experience of a conversational chat interface should help people understand how to get the most out of it. Where are the templates for key use cases? Where are “trending prompts” and highlighted examples so users can learn from the community? Where are suggestions to continue past conversations? (View Highlight)
Today, Twitter operates as the user manual for how to use AI services; there is enormous room for the services themselves to take over that role and grow engagement and discovery. If the success of social media platforms has taught us anything, it’s that starting with something to react to works far better than showing a blank page. (View Highlight)
2. The iteration problem. You know what’s awesome? Asking an AI agent to create a raccoon battle game and getting something that works in under 10 minutes. You know what sucks? Trying to refine that raccoon game to match the vision in your head. (View Highlight)
Nothing wonderful comes out fully formed. A good creator’s journey is one through the dark ravines and jungles of refinement. Conversational UIs are great at quickly getting to the first 70%, but suck at offering easy controls over narrower areas for iteration. (View Highlight)
Typing instructions to my agent, like “Can you change the title from X to Y?” or “Can you change the raccoon to be cuter?”, and then waiting while it executes the change is a frustratingly slow experience. (Not to mention, sometimes it changes other elements of the game I didn’t even intend!) (View Highlight)
There’s a reason we’ve invented WYSIWYG buttons and selectors and inputs. Let’s not throw that away. Sometimes, it’s faster to click on a button and rapidly change the button radius from 10 to 12 to 16 just to see how it’ll look. I’m heartened to see editable canvases with documents and code emerging as a pattern (although I do hate “modes”). I expect we’ll see more adoption of AI-gives-multiple-variations for creative refinement. (View Highlight)
3. The input-output problem. Text is an awesome medium because using the written language is obvious to many, and because there are thousands of years of prior art invested into making language rich and expressive and clear. Text is a limited medium because typing and tapping suck. And a picture is worth a thousand words. And “I’ll know it when I see it” was the best a Supreme Court justice could come up with to define obscenity. It’s faster for humans to speak instructions than to type them. It’s faster for eyes to scan a response than to listen to a voice read the same content. (View Highlight)
And yet, the status quo is to assume that input and output modes should be the same. This assumption holds if I’m driving, or there are multiple people in a room, and I want voice in and out. But if I’m doing solo productive work (which is most of the time), why not default me to the more efficient input / output mode? (View Highlight)
If AI services can understand the user’s intent, it becomes even more obvious what the ideal input and output modes should be. Want to get a team aligned on what we should build? Skip the PRDs and essays and go straight to prototype. Want to help a user redecorate their room? Let the user express what they want via moodboards rather than sentences. Want to cheer a user up? A warm voice does far more to show support and connection than words on a screen. Before we slap a conversational interface on everything, let’s ask ourselves: what are we trying to accomplish? And what is the most obvious method of input and output to achieve that goal? (View Highlight)
4. The scoping problem. Ever had an oblivious friend? Someone who is well-meaning, but has an opinion on everything and has no real clue what they’re actually good or bad at? Ask them about politics and they’ll confidently rattle off what really needs to be done. Complain about a problem and they’ll tell you how to fix it. Wise people know their scope. They can predict with great accuracy what they know better than you, and what they’re ignorant about. Conversational AI today feels more oblivious than wise; it doesn’t know what it doesn’t know. (View Highlight)
When you ask an AI agent to produce something outside its zone of competence, it doesn’t inform the user: “This is beyond my current capabilities.” After repeated attempts and corrections, the agent does not suggest: “Let’s take a step back and try a different approach; this one doesn’t seem to be working.” The AI doesn’t know whether it’s 90% or 60% or 20% confident. It doesn’t yet analyze the trustworthiness of its sources. It doesn’t fess up: “I’m not sure. My opinion leans X because… but I’m uncertain about…” (View Highlight)
In the human world, we know that good feedback mechanisms not only improve an individual’s work, they also reflect back to that person what their strengths and weaknesses are. With that knowledge, the person can do a better job of selecting the appropriate scope for their work and asking for help when a task is beyond their level. (View Highlight)
5. The personalization problem, or rather, golden opportunity. This is the one I get giddy over because it’s the next step-function change for user experience. (View Highlight)
Human interactions are slippery, dynamic things, the alchemy of not just each participant’s unique brain but also the group’s shared history and the moment’s specific context. (View Highlight)
The tech world has already embraced personalization, having built an era on recommendation engines from TikTok’s “For You” to Instagram’s (eerily) relevant ads to Netflix’s “Up Next.” (View Highlight)
The next era of personalization moves beyond what content we show to how we shape its presentation. (View Highlight)
If a service knows that I am a visual learner who likes metaphors and prefers an honest, no-bullshit style, then it should translate the message to best connect with my brain. (View Highlight)
If I’m asking for an explanation of fluid dynamics, give me an interactive diagram. If I’m struggling through a decision, relate it to a plot point from my favorite movie. If I’m asking for an essay critique, for the love of god do not give me a compliment sandwich. (View Highlight)
Get to know me! Check in with me on how I like things! Ask me questions not because the algorithm is optimizing for more of my time spent, but because it’s trying to get smarter about delivering me a superior experience. (View Highlight)
One particular area of low-hanging fruit that I’m surprised hasn’t gotten more attention is AI-assisted onboarding. I’m not talking about heavy-handed wizards that everyone races to skip past; I’m imagining the feeling of a great first meeting with a new team member. (View Highlight)
Sure, an AI service can quietly observe a user’s reactions and behavior over time to learn how to personalize—we humans do that too, but our most obvious method is asking relevant situational questions. If our boss says, “Hey, I need you to do X,” we might follow up with, “Why is X important?” or “What does success for X look like?” Knowing this makes us more likely to do work that matches or exceeds our boss’s expectations. (View Highlight)
Today, many AI chatbots seem nearly indistinguishable from one another. Without active direction, the answers I get are nearly indistinguishable from the answers the next person gets. The opportunity here is vast. (View Highlight)
Through better questions and better listening, through a deeper understanding of our quirky, distinct brains, the next generation of products and services can 10x their translation efficacy, whether for learning, productivity, entertainment or support. (View Highlight)
Conversational interfaces are magical, yes, but let’s not get stuck at the hype station and forget that the train of quality continues on in its pursuit of ever more obvious interactions enabled by AI’s technological breakthroughs. (View Highlight)
