I was in the middle of my usual morning routine—40 reps of How Will I Pay These Damn Bills and 30 minutes on the treadmill of Where Did I Go Wrong—when my laptop chimed and an email notification burst onto my screen.
Hi John, Stefanie let me know this meeting will be Tuesday. I’ll go ahead and send out an invite. —Andrew
Then Andrew sent me eight more emails. Apparently I would be meeting with several people, and he was sending me a flurry of invitations.
Then he wrote again, this time to confirm my attendance.
“I can attend at 4 as scheduled,” I responded.
No problem. I’ll send out an invite for Wednesday at 12:30 pm EDT.
“I think you misunderstood,” I responded. “I said I can attend at 4 pm. I don’t want you to reschedule.”
I’m sorry about that, thanks for letting me know. What would you like me to change about this meeting?
“I would like you to change it back to 4 pm.” Now I was anticipating eight more emails to undo the needless swap.
At that point my inner Luddite stirred, because Andrew Ingram—his full name, I soon learned—isn’t an overworked personal assistant whom I should cut a little slack; he’s a scheduling bot powered by artificial intelligence, just one of the many “conversational interfaces” tech companies are throwing at us in their endless quest to maximize efficiency. We’re learning to tell Alexa which songs to play, to ask Nerdify to suggest research materials, to distract our kids with Hello Barbie, and to order pizza by talking to the dashboards of our cars. Last year 8 million people talked to a conversational user interface called Cleverbot for no other reason than they wanted someone, or something, to chat with.
Some market researchers predict that by 2025 more than a billion people will have had an encounter with an AI assistant. And when humans finally rise up against our computer overlords in the decades to come—even if I’m hobbling on a cane with a tireless carebot at my side—I’ll head for the barricades to shout my battle cry: “Remember Andrew Ingram!”
Man, this dude is annoying.
OK, that’s peevish. However trivial it may sound, creating an AI program to successfully schedule meetings is a monstrously difficult challenge, and the people who are trying to perfect Andrew Ingram—the 53 full-time employees of X.ai—are some of the most dedicated nerds you’ll ever meet. Dressed in T-shirts and jeans, they bustle about their Manhattan offices with the intensity of NASA engineers preparing to launch a moon shot.
If they can perfect Andrew Ingram, they’ll put X.ai at the forefront of workplace innovation. Americans schedule approximately 25 million meetings per day. Multiply that by the hourly wages all that scheduling sucks up, and you see how much time, money, and mental energy X.ai could save. As it happens, there’s been fierce competition in the online scheduling niche for more than a decade. First came companies with names like MeetOMatic and MeetMax, where users could enter a few possible times into an online calendar and the other participants would click on the slots that worked for their schedules. But these services all faced the same problem: There’s no time in the lives of busy professionals for yet another finicky computer program. What people really needed was a machine that worked just like a human assistant, something they could tell: “Set up a meeting with Dave Jones next week.”
But until the past few years, AI was still incapable of processing human language accurately enough to do that, so companies emerged with a new hybrid approach, a mixture of machines and humans, where algorithms crunch calendars and meeting locations while human assistants reply to customers. Still, the salaries of the assistants mean that monthly fees for these services can reach hundreds of dollars.
The best way to bring those prices down is to cut out humans altogether and create a fully autonomous AI scheduler, a goal that the AI experts I consulted characterized as ranging from “very, very hard” to “impossible.” Even the most advanced conversational interfaces struggle with “natural language understanding.” (AI code for “So that’s what this moronic human means with all the pop culture references and inside jokes!”) That’s the challenge Dennis Mortensen took up when he started X.ai. An energetic entrepreneur with a craggy action-hero face and a background in computer analytics, Mortensen started carrying around a notebook he calls the List of Hate when he was a teenager in 1980s Denmark—whenever something annoyed him, he’d pull out the notebook and jot down the offense. Why do we have to wait so long for pizza deliveries? Why do I have to stand in line at the bank? When he was ready to start his first company, he sorted the candidates into two piles: solvable and unsolvable. Over the next 20 years, his List of Hate generated two successful analytics startups, Visual Revenue and Canvas Interactive, that gave clients insight into their companies’ web traffic.
In 2013, Mortensen was ready for another session of monetizing his competitive annoyances. This time the hands-down winner was scheduling meetings. For more than half a century, scientists have tried to develop computer programs that can interact like humans with humans—the first chatbot, Eliza, was coded in the ’60s by the big brains at MIT, and it was pretty good at recognizing conversational keywords and responding from a script. (Change the topic of conversation, though, and Eliza was lost.) In 2016, Amazon launched the Alexa Prize, an annual competition to build a bot that can “converse coherently and engagingly with humans on popular topics for 20 minutes”; the prize has now reached $3.5 million. (See “Fighting Words” in issue 26.03.) And since 1991 developers have competed annually for the Loebner Prize, a Turing test competition in which bots try to convince human judges they are human. It wasn’t until the early 2010s, when Siri and other recently launched conversational interfaces began showing varying degrees of promise, that the technology arrived with the potential to make Mortensen’s dream a reality.
Mortensen pitched the idea to VC firms eager to get in on the AI boom, and within a year he hired a team of data scientists and software engineers and started tackling hundreds of early decisions: Should the tone of the assistant’s responses be formal or friendly? (A mixture of both, they decided.) Should it have a gender? (Yes, and users can choose Andrew Ingram or his “sister,” Amy.) Should Andrew and Amy appear in the form of an avatar? (No talking paper clips!) To make sure Amy’s and Andrew’s voices stayed consistent, Mortensen even hired an “AI interaction designer” to study the chatter between the Ingrams and their human correspondents. It seems even machines need speechwriters.
Refining his algorithms’ ability to respond in ordinary human language took a year and a half. Crunching data such as times, places, and cancellations took a little longer. But teaching the AI to process and interpret human speech turned out to be harder than Mortensen thought. His engineers kept running into what they considered “edge cases,” or unexpected quirks in the way people communicate. What if, say, a human asking for a meeting throws out something irrelevant, like “How great was that wedding in Acapulco?” A human would recognize that as small talk, but a machine might end up scheduling the meeting in Acapulco. If someone says they’re too busy to meet now but “we really should have coffee sometime,” a human would know they’re being brushed off. And what is a machine supposed to make of “Let’s meet in John’s office”? There are so many Johns! Which John does the stupid human want?
As Mortensen puts it, “You think humans are reasonable, but soon you figure out they’re crazy. They say things so ambiguous that even you and I would have a hard time figuring it out. Or they’ll say things that they believe are true but are wrong.”
Mortensen and his programmers saw two ways to solve the natural language understanding problem. They could feed every possible variation of syntax and grammar into a database, which still might not work. Or they could rely on machine learning, which is the agent and engine of advanced artificial intelligence. When you, a human, see a hairless sphynx cat for the first time, your brain summons the platonic cat composite created through observation and experience, and produces an instant response: “Yeah, that weird naked thing that looks like a big rat is actually a cat.” To get AI to make that leap, though, scientists have to start by feeding cat and noncat photos into the AI so the algorithm can compare all the examples and identify all the similarities and differences between the images.
Eventually, with enough cat data and enough corrections on its edge-case mistakes, the AI will create that platonic cat composite and work through the unusual-cat problem on its own. But words like learn and think imply human qualities the computer doesn’t really have. It’s just doing math, running a probability test against the data in its system. That’s why they call it “artificial” intelligence.
Mortensen went the machine-learning route, and after spending $30 million on three years of what he calls “raw R&D,” he reached the point where it was time to put the Ingrams to work with actual customers. He launched the first edition in October 2016, with an entry-level price of $39 a month. It’s now $17 a month. He won’t reveal any sales numbers or customer retention rates, because they’re still in the early stages of the ramp-out, but the figures were healthy enough to draw an additional $10 million in VC funding in August 2017. (Total investment in X.ai is now $44 million.) Mortensen says the Ingrams have handled 10 million emails and signed up employees from companies such as Microsoft, Uber, and Slack. Eventually, he envisions the Ingrams will simply reach into everyone’s calendars and set up meetings effortlessly. “Scheduling nirvana,” he calls it.
In my experience so far, though, enlightenment is still a long way off. That’s because Mortensen faces an even bigger challenge than natural language—human psychology. We get irritated after three scheduling emails, for example, but machines are tireless. “We’ve seen some AI go into thousands of messages,” Mortensen says.
“Speaking of thousands of messages,” I tell him, “Andrew sent me nine emails just to set up this visit.”
“It would be much nicer to do you in one block,” he says. “But we don’t support that yet.”
In the meantime, he has 105 human “trainers” in the Philippines working around the clock to gorge his algorithms with data to improve the efficiency and accuracy of the AI. These employees are not, repeat not, the secret human assistants that some tech journalists accuse him of using to prevent scheduling mistakes. His creation does everything without human assistance, he says. The trainers are just there to teach it how to do everything better.
In a highly secured building on the outskirts of Manila—I had to give the security guard the serial numbers of my phone and laptop and couldn’t even use a pen and paper on the production floor—40 young Filipinos are sitting at tables like travelers checking their Facebook pages in an internet café. They’re mostly in their twenties and early thirties, college graduates or defectors from offshore call centers. Like many Filipinos, they speak perfect English. But my escort only lets me talk to one of them for 10 minutes—the X.ai computers monitor employees for “time spent per task,” she says, and my presence would distract them. She also tells me not to ask for any names, because it would make them uncomfortable.
I sit down next to a young woman and watch her sliding words and numbers into boxes on a template. She tells me she’s studying for a business degree and working here full time, and at the moment she’s working on emails with difficult time zones. Sometimes people just mention the city they’re in, she says, which is a problem because there are so many cities with similar names. Or they’ll spell the name of their location wrong. Or they’ll confuse Eastern Standard Time and Eastern Daylight Time. The X.ai algorithms have to learn how to recognize and account for all of those problems, so the engineers have to break down sentences into carefully crafted data sets and subsets. She spends her workday feeding data to the machine-learning algorithms by highlighting every word that seems to pertain to a time zone and dragging it into the appropriate box on the time zone template. This is called “named entity recognition.”
When my time is up, the supervisor hustles me out of the room.
In a nearby conference room, I meet the training team leader, a cheerful woman who looks like a middle school teacher. My escort introduces her as Zoila—apparently giving me a last name would be another comfort-zone intrusion. Which seems weirdly secretive after they invited me all the way from New York to see how the magic works, and even weirder when I realize that I’m here to watch a video call with X.ai’s chief data scientist, Marcos Jimenez Belenguer, who’s calling in from New York.
For the next hour, as he talks to Zoila and X.ai’s VP of AI training, Liying Wang—I know her full name because I met her in New York—I get a glimpse of the crazy-human problem. For example, this email:
“I can do Monday after 3 pm Hong Kong time, but Tuesday I’m leaving so I can only have a meeting starting Wednesday afterwards, anytime after 3 pm Hong Kong time.”
Zoila says her trainers are stumped. If the human is saying that, after Wednesday, 3 pm is always good, they’re supposed to put it in the “recurring availability” slot. But then what do they do with Tuesday?
Jimenez Belenguer ponders this for a moment. His engineering and data science teams designed the templates to feed the right data to the machine-learning models. They’re constantly adjusting those models and templates to home in on particular language issues or add new features. So the question is whether this email can fit into the model or whether they have to do another redesign.
Yes, he decides, “after 3 pm” is in fact a recurring availability. The problem is that Tuesday is a “hole” in that recurring availability, and they don’t have a way to represent a “recurring time with a hole” in their latest temporal model. “It’s tricky,” Wang says.
Here’s another one: “I’m free most of the week of August 7. Feel free to schedule anytime from 7, 8, 9, or 10, preferably in the afternoon.” The trainers think the last four numbers in the message are dates, but the date template doesn’t have enough boxes for all of them.
It’s another edge case, Jimenez Belenguer says, and if the engineers or trainers make too many mistakes, as humans are prone to do, the machine will learn to make the same mistakes. Sure, they can build a template with more boxes. But at some point they’ll have to stop rewriting the models and instruct the algorithm to ask the customer for clarification. That’s their default fail-safe option, but they try to avoid it as much as possible because customers will get annoyed if Amy or Andrew asks too often. I know the feeling.
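For readers who want to see what a “recurring time with a hole” might look like as data, here is a minimal sketch in Python. It is not X.ai’s temporal model, which the company did not share; the class name, fields, and dates are all hypothetical, and times are treated as naive Hong Kong local time for simplicity.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time


@dataclass
class RecurringAvailability:
    """Hypothetical model: free every day at or after `earliest`,
    starting on `start`, except on any date listed in `excluded`."""

    start: date
    earliest: time
    excluded: set = field(default_factory=set)  # the "holes"

    def is_free(self, moment: datetime) -> bool:
        day = moment.date()
        # Before the window opens, or on an excluded day: not available.
        if day < self.start or day in self.excluded:
            return False
        # Otherwise available only from the daily earliest time onward.
        return moment.time() >= self.earliest


# The Hong Kong email, with made-up dates: Monday June 4, 2018 onward,
# after 3 pm, with Tuesday June 5 carved out as the hole.
hong_kong = RecurringAvailability(
    start=date(2018, 6, 4),
    earliest=time(15, 0),
    excluded={date(2018, 6, 5)},
)
```

Asking `hong_kong.is_free(datetime(2018, 6, 5, 16, 0))` returns `False` for the Tuesday, while the same hour on Monday or Wednesday returns `True`. The point of the sketch is only that the hole has to live somewhere in the structure; a template with a single “recurring availability” slot has nowhere to put it.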
Until this point in my reporting, I’d been on the receiving end of the Ingrams’ overtures, but not a user myself. The time had come for me to sign up for an Amy or Andrew of my own. To give myself a basis for comparison, I decided to attempt to schedule meetings using both X.ai and one of its competitors, Clara Labs. Launched nearly three years ago, in the same month as X.ai, it is one of the human-machine hybrid services that Mortensen was trying to undersell and out-innovate. Clara’s approach is known as “human in the loop”—the idea being that humans add value that no machine could ever reproduce. In fact, its founders reject Mortensen’s “fully automated” dream so completely that they put the difference into their scheduling assistant’s first hello: I’m Clara, your human-in-the-loop assistant.
I join X.ai first. The response comes a few minutes later:
I’m Amy and, starting today, I’m your personal scheduling assistant.
All you need to do is CC me (email@example.com) when you’d like to schedule a meeting, and I’ll take over the tedious email ping pong from there.
To get started, she suggests I connect her to my calendar and enter my address and my meeting preferences—time of day, favorite coffee shop, etc. She ends the lesson with a cheery sign-off: Always at your service, Amy Ingram 🙂.
Time to set up my first meeting! I send an invitation to an editor, cc’ing Amy as instructed, testing her with a vague proposition about getting together. “I’m going down to Union Square on Friday for a 2 pm meeting, thought we could do coffee or lunch before—maybe 12 or something?”
Things get complicated quickly, and somehow Amy ends up proposing to my editor that I meet him at his home. Because I’m bcc’d on her emails with him, I see the mistake right away and jump in to correct her.
I sign up for Clara and try a similarly vague message. But instead of engaging in needless back and forth, she responds to me directly right away:
Kindly let me know the exact address of where you’d like to meet.
Tech Giants Want to Chat
How the Big Five are faring in the race to build the best conversational interface.
By Saraswati Rathod
Amazon has partnered with manufacturers like Toyota and Sony to incorporate its AI interface in their devices. Meanwhile, the company is hosting a $3.5 million competition to build a bot that can successfully engage in idle chitchat.
Siri, the most widely recognized virtual assistant, is now also running Apple’s HomePod. Over the past few years, Apple has been reworking Siri so that it not only responds to your needs, it anticipates them.
Facebook users can order flowers or schedule a ride-share by pinging a bot on Messenger. But not every launch has gone so smoothly. Earlier this year, the company shuttered M, its part-AI personal assistant, because it required too much human intervention.
Armed with its database of search results, Google Home is six times more likely than Amazon Alexa to provide an answer to random questions, according to digital marketing agency 360i. But while Google may slay at trivia, the two companies are still racing to integrate into TVs and cars.
Microsoft’s chatbot, Zo, can hold lengthy conversations and play games with users. That tech helped Microsoft develop and improve Cortana, its personal AI assistant, which sends users friendly reminders about promises made in previous emails.
To learn more about Clara—which charges customers anywhere from $99 a month for the Essential package, which includes scheduling 35 meetings, to $399 for the Executive package, with 110 meetings—I call the company’s founders, Maran Nelson and Michael Akilian. In 2014, Nelson was sitting in a San Francisco coffee shop with Akilian, her best friend from high school, telling him about her plan to gather people who were interested in technology and social problems into some kind of think tank. She’d been putting out hundreds of calls and emails to invite people to interview, and as Akilian remembers it, “Her email inbox was totally overwhelmed and overflowing. She was trying to schedule all these people and she said, ‘I wish there was something where I could just say, “Hey, I want to talk to these 50 people in the next three weeks for 30 minutes each,” and that’s it, it’s on the calendar.’ ”
Like Mortensen, Nelson and Akilian set out to program response templates and keyword recognition. But they didn’t try to raise $30 million and spend three years on natural language R&D. “Intelligent interfaces have been the fetish of the entire Silicon Valley community since its inception,” Nelson says. “But natural language processing is really far off, so we conceived of ‘human in the loop.’ ”
That’s where the Clara remote assistants come in. When the Clara AI has a high degree of confidence in its proposed response, it can send the email without bothering a human. But in all other cases, the AI sends the text in question to a CRA like Cat Moore, a 28-year-old neuroscience student from Georgia who works from home. “The first thing we do is read the whole email for context to have an idea about what’s been going on,” she explains. Complications arise with requests like big meetings for 10 people. Those emails can take her 10 minutes to figure out.
Sometimes she customizes the response template a little to add a human touch. It doesn’t seem right to respond to “I can’t make it to the meeting because I was just in a car accident” with “No problem! When do you want to reschedule?” Sometimes the emails say “Sorry, can’t do it, my father died.” That gave Clara’s engineers the idea for an “empathy cues” project. Soon the CRAs had new templates with human touches like “I’m so sorry for your loss.”
“Some things are easier to automate, and some are much harder,” says Jason Laska, who runs Clara’s machine-learning program. “And sometimes you really need a person to do it.”
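The hand-off described above, where the AI sends confident replies on its own and routes everything else to a person, can be sketched in a few lines of Python. This is an illustration of the general “human in the loop” pattern, not Clara’s code: the `Draft` type, the `route` function, and the 0.9 threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Assumed cutoff; the article does not say what threshold Clara uses.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Draft:
    """A reply proposed by the model, with its self-reported confidence (0 to 1)."""

    reply: str
    confidence: float


def route(draft: Draft, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return "send" when the model is confident enough to reply on its own,
    otherwise "human_review" so a remote assistant sees the email first."""
    return "send" if draft.confidence >= threshold else "human_review"


# A routine reschedule goes straight out; an ambiguous message does not.
routine = Draft(reply="No problem! When do you want to reschedule?", confidence=0.97)
unclear = Draft(reply="No problem! When do you want to reschedule?", confidence=0.41)
decisions = [route(routine), route(unclear)]
```

Where the threshold sits is the whole business trade-off: set it high and more emails cost human minutes, set it low and more mistakes reach customers.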
When I was responding to a message from Clara, I knew there was a human on the other end, so I always started out with “Hi Clara” and thanked her when I was done. But after my first few turns with fully automated Amy, I felt stupid for exchanging pleasantries with a machine and sent back cold and mechanical responses. I couldn’t help wondering: Does talking to a machine make you act like a machine?
I decided to run another test. I asked four people to sign up for Clara and X.ai and send me invitations for a meeting. When their emails came in, I responded with “Sorry, my dad died.”
Clara offered her “deepest condolences” before offering to reschedule the meeting.
Amy took a different approach: I’m so sorry, but I am unable to respond to your last message. It’s possible that it isn’t related to scheduling a meeting or that I was unable to understand it. If this is a message I should take action on, please try rephrasing your request and emailing me again.
I guess I’d discovered another edge case.
As one of X.ai’s senior engineers admitted, in a rare unguarded moment, “In any logical system that you build to automate anything, there’s always at least one case it should be able to handle but it can’t. Like everything related to human logic, it’s a bottomless pit.”
Joshua Levy, one of the AI engineers behind Siri, is cautiously optimistic that we’ll have a consistently reliable, fully autonomous conversational interface in the not-too-distant future: “I’m not saying we’ll never solve the language problem—probably we will—but right now it’s really not solved.” It’s likely one of the reasons why Facebook recently killed off M, a high-profile virtual assistant beta launched in 2015: Too many of the chatbot’s tasks required expensive human intervention. Chatbots have come a long way since Eliza, but not far enough. At least not yet.
For Mortensen and the globe-spanning X.ai staff, the question is whether Andrew and Amy will frustrate or disappoint too many customers on the way to natural language understanding. Mortensen says the Ingrams are now correctly executing 99 percent of tasks, but a message can’t get much simpler or clearer than “I can attend at 4 as scheduled,” and Andrew screwed that up the first time I used him. Mortensen’s quarantine on his customer retention rates and revenue is reasonable given that X.ai is both a startup and an active R&D enterprise, but the more important question is whether the company will have enough money to continue iterating, innovating, and keeping customers happy until its technology matures and goes mainstream in however many years.
In the bubbling VC market for AI, a good way to raise money is to call yourself an AI company and hire humans to do much of the work until you don’t need them anymore. But Clara’s founders believe we’ll always need them. “Our highest value is reliability,” Nelson says, and so even as the company’s developers work to improve their natural language AI—roughly a quarter of Clara’s tasks are fully automated—they aren’t planning to sideline the humans who maintain quality control and come up with ideas like the “empathy cues” project.
Which vision will win? Will it be “Let us rise to the stars hand in hand with our loyal AI assistants”? Or that ruthless maxim of modern life, “The company that eliminates the most humans wins”? Mere humans, we’re left to wait patiently while our unlikely champions—two scheduling bots, of all things—march into battle to fight out the architecture of our future.
John H. Richardson wrote about brain-computer interfaces in issue 25.12.
This article appears in the June issue.