Sunday, July 1, 2012

Siri, Intelligent Assistants, Missing Context and Now Google Now?

Let me ask you... does Siri get what you say? I see! And does she respond to you the way a true assistant would?...

The first time I tried voice-to-computer interaction was in 1998, when IBM ViaVoice came bundled with IBM PCs. It was a commercial product and needed voice training with the user to improve its accuracy. After the longest dictation session of my life with ViaVoice, it learnt to show me the Start Menu when I said "Start Menu", with some luck on the first attempt. But it never learnt to type what I spoke with my typical Sri Lankan accent and broken pronunciation.

Fourteen years later I tried out Siri. She didn't want any voice training, quite nice of her. She caters to a number of accents through a pretty sophisticated backend server, where she can learn from thousands if not millions of voices from different regions. Well, when she understood what I was asking, roughly 5 times out of 10, she gave me an acceptable answer within the limits of her capability.

"Why can't she understand?!!" (pardon the pun at common quote!)

Well, let's consider how I understand what you say. You say something to me and I usually get it even when you are not 100% audible. How? I see your lips moving, but is that it? No. I see the context!

We, as living beings, keep updating our context of current existence based on various inputs. Let's figure out what kinds of inputs these are.

1. Knowledge of where we are
2. What we have seen over the past few moments and what we are seeing now
3. What the people involved have been talking about for the past few minutes
4. Time of the day
5. General details of the people involved - their general interests, work, relationships, etc.
6. Current affairs - local and international, weather, news
7. Understanding of concepts rather than terms and keywords (think of the meaning you get from "being screwed" as opposed to a reference to the keywords "screws, hardware, tools, fixing, department stores")
8. .... etc.

The context we have built from these inputs enables us to recognize the words the other person says and predict what he or she means. Therefore, if you want to make an app that really understands what you just said, make it context aware... that is the key to a better-responding assistant.

But how can a phone app do this? Except for points 7 and 8, your smartphone more or less already knows these. What it needs is to aggregate them all in a logical way and use that knowledge as the base of the assistant's intelligence. Too much to send to and handle on a backend server? Maybe, but processing it in the cloud would be the best way to capture and manage more context information about a user.
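To make the aggregation idea concrete, here is a minimal sketch in Python of what a "context snapshot" could look like on a phone. None of the names here correspond to any real Siri, Android or Google API; the signal sources (location, time of day, recent topics, people, current affairs) are just illustrative placeholders for the kinds of inputs listed above, flattened into keywords that could bias what the assistant thinks you said.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class ContextSnapshot:
    """A bag of context signals an assistant could aggregate on the device."""
    location: str = "unknown"                                  # e.g. "office", "home" (from GPS/Wi-Fi)
    time_of_day: str = ""                                      # coarse bucket: morning/afternoon/evening
    recent_topics: List[str] = field(default_factory=list)     # from recent messages, calls, searches
    people_involved: List[str] = field(default_factory=list)   # contacts in the current conversation
    current_affairs: List[str] = field(default_factory=list)   # headlines, weather keywords

def time_bucket(now: datetime) -> str:
    """Map clock time to a coarse, human-style 'time of day' signal."""
    if now.hour < 12:
        return "morning"
    if now.hour < 17:
        return "afternoon"
    return "evening"

def context_keywords(snapshot: ContextSnapshot) -> List[str]:
    """Flatten the snapshot into keywords that could bias speech recognition
    or disambiguate a query (e.g. prefer 'meeting' over 'meat in' at the office)."""
    keywords = [snapshot.location, snapshot.time_of_day]
    keywords += snapshot.recent_topics
    keywords += snapshot.people_involved
    keywords += snapshot.current_affairs
    return [k for k in keywords if k]

if __name__ == "__main__":
    snap = ContextSnapshot(
        location="office",
        time_of_day=time_bucket(datetime.now()),
        recent_topics=["quarterly report", "lunch plans"],
        people_involved=["Nadeesha"],
        current_affairs=["heavy rain expected"],
    )
    print(context_keywords(snap))
```

In a real system the heavy lifting (learning from millions of voices, matching concepts rather than keywords) would happen on the backend, as the paragraph above suggests; the point of the sketch is only that most of these signals are already sitting on the phone, waiting to be combined.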

I've seen a few attempts at reaching this. A few months back, the folks who made Iris (the Siri clone for Android) announced a concept app called Friday App. The idea is to keep track of events in your life through your interactions with your mobile and be a kind of journal of your life. And Google just announced "Google Now" at Google I/O 2012, along with offline voice support, which is one step closer to making the app aware of the user's context.

The good thing about this is that if anyone can do it well, it would be Google! The reason is that Google knows a lot more about you than you think, especially if you sport an Android phone. The catch is that technology would influence your life more than it ever did. The decision of which route to take or which restaurant to go to would no longer be solely yours, but highly influenced by your smart(er) phone. Whether that is a good thing or a bad thing is a whole other discussion. But isn't this what you have been asking for all this time?
