Posts Tagged ‘Arrogance of the Present’

Voice Recognition: NUI’s Unsung Hero

Wednesday, January 11th, 2012

I recently got asked to provide an opinion on “voice recognition”, in particular around our philosophy towards it and how we’ve implemented it across the stack.  If you can stomach it, you can see how it turned out (let’s put it this way: it opens with a comparison to the Hoff’s “Knight Rider”, and it kind of goes downhill from there). Regardless, in doing the research I learnt some really interesting things along the way that I thought I’d share here.

First off, let’s start by asking how many of you know how speech recognition works these days?  Well, I thought I did, but it turns out I didn’t.  Unlike the early approach, where you had to “train” the computer to understand you by spending hours and hours reading to it (which always rather defeated the object to me), today speech recognition works pretty much the same way we teach kids to speak and read: using phonemes, digraphs and trigraphs. The computer simply tries to recognise the shapes and patterns of the words being spoken, then, using some clever logic and obviously an algorithm or two, performs some contextual analysis (makes a guess) about the most probable sentence or command you might be saying.
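To make that “makes a guess” step concrete, here’s a toy sketch in Python. The bigram counts below are entirely invented for illustration (a real recogniser uses a vastly larger statistical model), but the principle is the same: given several candidate word sequences that sound nearly identical, pick the one that context says is most probable.

```python
import math

# Invented word-pair counts standing in for a real language model.
# "<s>" marks the start of a sentence.
BIGRAMS = {
    ("<s>", "recognise"): 120,
    ("recognise", "speech"): 900,
    ("<s>", "wreck"): 10,
    ("wreck", "a"): 50,
    ("a", "nice"): 400,
    ("nice", "beach"): 80,
}

def score(sentence):
    """Average log count of each adjacent word pair (plus-one smoothed),
    so longer candidates aren't rewarded just for having more pairs."""
    words = ["<s>"] + sentence.split()
    pairs = list(zip(words, words[1:]))
    return sum(math.log(BIGRAMS.get(p, 0) + 1) for p in pairs) / len(pairs)

def most_probable(candidates):
    """Pick the candidate sentence the model thinks you most likely said."""
    return max(candidates, key=score)

# Two near-identical sound sequences; context decides between them:
print(most_probable(["recognise speech", "wreck a nice beach"]))
# → recognise speech
```

The classic “recognise speech” vs “wreck a nice beach” pair shows why the guessing matters: the sounds are almost the same, and only the statistics of which words tend to follow which break the tie.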

In the early days of speech recognition, the heavy lifting was all in the listening and the conversion from analogue to digital; today it’s in the algorithmic analysis of what you are most likely saying.  This subtle shift has opened up probably the most significant advance in voice recognition in the last twenty years: the concept of voice recognition as a “cloud” service.

A year or so ago, I opened a CIO event for Steve Ballmer. Given I was on stage first, I got a front row seat at the event and watched Ballmer up close and personal as he proceeded to tell me, and the amassed CIOs from our 200 largest customers, that the Kinect was in fact a “cloud device”.  At the time I remember thinking, “bloody hell Steve, even for you that’s a bit of a stretch isn’t it?”.  I filed it away under “Things CEOs say when there’s no real news” and forgot about it, until now, when I finally realised what he meant.

Basically, with a connected device (like Kinect), the analysis of your movements and the processing for voice recognition can now also be done in the cloud. We now have the option (with the consumer’s appropriate permission) to use those events to provide a service that continuously learns and improves.  This ultimately means that the voice recognition service you use today is actually different from (and minutely inferior to) the same service you’ll use tomorrow.   This is incredibly powerful, and it also shows you that the “final mile” of getting voice recognition right now lies more with the algorithm that figures out what you’re most likely to be saying than with the actual recognition of the sounds.  MSR have a number of projects underway around this (my current favourite being MSR’s Sentence Completion Challenge), not to mention our own development around how this might apply within search.

Those of you who have been following these ramblings in the past will know I’m slightly sceptical of voice recognition, thinking that it is technology’s consistent wayward child: full of potential, yet unruly, unpredictable and a relentless under-achiever.  I’m not saying my view has changed overnight on this, but I am certainly more inclined to think it will happen, based on this single, crucial point.

Kinect, too, provides its own clue that we’re a lot closer than we previously thought to making voice recognition a reality, not just in the fact that it uses voice recognition as a primary mode of (natural) interaction, but more in how it tries to deal with the other end of the voice recognition problem: just how do you hear _anything_ when you are sat on top of the loudest source of noise in the room (the TV) while someone 10 feet away is trying to talk to you in the middle of a movie (or the final level of Sonic Generations, sat next to a screaming six-year-old whose entire opinion of your success as a father rests on your ability to defeat the final “boss”)?  If you have a few minutes and are interested, this is a wonderful article that talks specifically about that challenge and how we employ an array of four microphones to try and solve the problem.  There’s still more work to be done here, but it’s a great start on what is actually an incredibly complex problem.  Think about it: if I can’t even hear my wife in the middle of a game of Halo or an episode of Star Trek (original series, of course), how the hell is Kinect going to hear? (Oh, I’ve just been informed by her that that particular issue is actually not a technical problem… #awkward).
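For a feel of what a microphone array buys you, here’s a minimal delay-and-sum beamforming sketch, the textbook technique behind this kind of array processing. Everything here is invented for illustration (the signals, the noise level, the per-microphone delays, which a real system estimates from the array geometry), and the actual Kinect pipeline is far more sophisticated, but the core idea is the same: align the four copies of the voice and average, so the voice reinforces while the independent noise cancels down.

```python
import math
import random

random.seed(0)

# The "voice": a clean tone. Each mic hears it delayed by a known number
# of samples (in practice estimated from mic spacing and sound direction).
N = 200
voice = [math.sin(0.3 * n) for n in range(N)]
delays = [0, 2, 4, 6]  # per-mic arrival delays, in samples

def mic_signal(delay):
    """Delayed voice plus independent noise, as one microphone hears it."""
    return [(voice[n - delay] if n >= delay else 0.0)
            + random.gauss(0, 0.5) for n in range(N)]

mics = [mic_signal(d) for d in delays]

def delay_and_sum(mics, delays):
    """Shift each mic back by its delay, then average across the mics."""
    n_keep = N - max(delays)
    return [sum(m[n + d] for m, d in zip(mics, delays)) / len(mics)
            for n in range(n_keep)]

beam = delay_and_sum(mics, delays)

def noise_power(signal):
    """Mean squared error versus the clean voice."""
    return sum((s - v) ** 2 for s, v in zip(signal, voice)) / len(signal)

# Averaging four independent noises cuts the noise power roughly fourfold.
print(noise_power(mics[0]), noise_power(beam))
```

With four microphones you’d expect roughly a 4x reduction in noise power from the averaging alone; steering the delays towards the talker (and away from the TV) is where the real cleverness starts.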

So these two subtle technological differences in our approach are going to make all the difference in voice recognition becoming a reality as part of a much more natural way of interacting with technology.  Once that happens, we move into the really interesting part of the problem – our expectations of what we can do with it.

Our kids are a great way of understanding just how much of a Pandora’s box getting into voice recognition (and other more natural forms of interaction) will be, and I suspect that ultimately our greatest challenge will be living up to the expectation of what is possible across all the forms of technical interaction we have: NUI parity across devices, if you like.  My son’s expectation (quite reasonably) is that if he can talk to his Xbox, then he should be able to talk to any other device, and furthermore, if he can ask it to play movies and navigate to games, why can’t it do other things?  I was sitting doing my research with him the night before my interview on all of this, and we were playing together at getting the voice recognition to work.  He asked the Xbox to play his movie, he told Sonic which level to play on Kinect FreeRiders, then he paused, looked at me and then back at the TV, cracked a cheeky smile and said, “Xbox, do my homework…”.

Local is the new Global

Monday, June 13th, 2011

“Think Global, Act Local”, or so the cliché goes. The thing is, this is about to become more possible, and more accurate, than ever before.  What this means for us as individuals in a modern society is a classic case of the "arrogance of the present": _we just don’t know_, because most of us find it hard to imagine a world of such hyper-connectivity, one where most things become available within local reach while being globally supplied.

For most of us, things like Foursquare and Gowalla are amusing distractions used primarily by the authorities to help identify and track the location and movement of geeks, but in fact they (and the infrastructural elements they rely on) will ultimately become the very fabric of how we access, consume and pay for services in the very near future.

I’m OK with the fact that most people think the above is a bit of a stretch. What worries me more is that, because we can’t really imagine how all this stuff will come together, I don’t think we’ve properly figured out the true potential of what a powerful, connected, _local_ view of our world means yet. As such, I think we risk being derailed (at worst) or delayed (at best) in our ability to deliver an incredibly transformative change to the way technology enhances our lives.

Assuming we buy the current trajectory of smartphone sales (n.b. when do we stop calling them smart? When everybody is smart, is _anybody_ smart anymore?), we know that pretty soon there will be more smartphones than dumb ones, and new sales of slates and phones will outstrip PCs. The mobile device revolution is finally here, El Presidente, so let’s move on and think about the really important stuff before Apple ships another iPhone and everyone gets distracted again.

There are three key areas we need to figure out and triangulate if we are to achieve the vision, these are:

  1. Location (where am I?, where are you? and what else is near?)
  2. Geographic Meta data (POI at a macro and micro level)
  3. Connectivity (people, networks and devices)

I’m not going to get everything on the table on all 3 of these in one post alone, so for now, let’s just start with a broad definition, ready for deeper exploration in the future.

Location
There are three dimensions of location that we need to supply: where am I? Where is the "target"? What else is nearby? Of these three, it is the first that should concern us most.  We currently rely on a brilliant but ageing and vulnerable service (GPS) to locate ourselves, one that is extremely flawed in its ability to provide accurate and timely location information in our localised existence or, to put it more simply, indoors or in the city…

Controversy aside, we desperately need _a range_ of mechanisms to identify our location, and to be able to do so in a way that is fast, battery-friendly and works indoors.  Funnily enough, it actually doesn’t need to be that precise: within 5m will do, and we can figure out the rest for ourselves.  The good news is (if you read behind the headlines) that we are well on the way to solving this outdoors at least; we still need a much better (standard?) approach for how this might be achieved cost-effectively indoors.
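One way that “range of mechanisms” idea could play out is a simple selection policy: each positioning source reports a fix with an accuracy estimate and a battery cost, and the device takes the cheapest source that gets inside the roughly 5m target. The source names and numbers below are invented for illustration; real platforms fuse sources in far more elaborate ways.

```python
# Hypothetical accuracy target from the discussion above: ~5 m is enough.
TARGET_ACCURACY_M = 5.0

def best_fix(fixes):
    """fixes: list of (name, accuracy_m, battery_cost).
    Prefer the lowest-cost source that meets the accuracy target;
    if none does, fall back to the most accurate source available."""
    good_enough = [f for f in fixes if f[1] <= TARGET_ACCURACY_M]
    if good_enough:
        return min(good_enough, key=lambda f: f[2])
    return min(fixes, key=lambda f: f[1])

# Illustrative trade-offs (all figures invented):
fixes = [
    ("gps",   4.0, 9),   # accurate outdoors, battery-hungry, poor indoors
    ("wifi",  8.0, 2),   # cheap, works indoors, coarser
    ("cell", 150.0, 1),  # nearly free, city-block accuracy
]
print(best_fix(fixes))
# → ('gps', 4.0, 9): here only GPS meets the 5 m target
```

The indoor problem is exactly the case where the fallback branch fires: no source meets the target, so you take the best you can get and, as above, figure out the rest for yourself.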

Geographic Meta-data
We need to think about location meta-data (points of interest etc.) at both a macro and a micro level.  At a macro level this is about a taxonomy of stores, services, opening times and other ancillaries like street furniture (e.g. post boxes, gritting bins etc.); at a micro level this needs to be extended right down to a very near-field view, providing a much more granular picture of the environment around you.  This level of detail is crucial: for example, it’s no longer enough to know that the train station has disabled access facilities, you need to know which _exact_ door is the one that has zero lip for disabled access, or which end of the train you should stand at to be nearest the exit for your particular stop.
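A sketch of how that layering might look as data: a macro record for the place itself, holding micro records for the near-field detail like exact doors. The field names and the Reading station example are my own invention, purely to show the shape, not any real schema.

```python
from dataclasses import dataclass, field

@dataclass
class MicroPOI:
    """Near-field detail: an exact door, platform position, etc."""
    label: str    # e.g. "north entrance, door 3"
    detail: str   # e.g. "zero-lip door, step-free access"
    lat: float
    lon: float

@dataclass
class MacroPOI:
    """The place itself: taxonomy-level facts about a store or station."""
    name: str
    category: str
    opening_hours: str
    micro: list = field(default_factory=list)

# Illustrative example (coordinates and details invented):
station = MacroPOI("Reading station", "train station", "05:00-24:00")
station.micro.append(
    MicroPOI("north entrance, door 3", "zero-lip door, step-free access",
             51.459, -0.972))

# The micro-level question: which exact door has step-free access?
step_free = [m.label for m in station.micro if "step-free" in m.detail]
print(step_free)
# → ['north entrance, door 3']
```

The macro record is what today’s platforms already hold; it’s populating the `micro` list, at scale, that we haven’t figured out how to acquire easily.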

North America seems to do quite well at the macro level, whereas in the UK we don’t, with retailers and service providers (public and private) being rather slow (myopically so) to sign up and advertise on the established platforms.  We all suck at the micro level, however, and it is this information we really need to figure out how to easily acquire and on-board.

Connectivity
This is about remembering we live in an "occasionally disconnected" world.  We may have pervasive mobile broadband, but this doesn’t mean that it’s always available.  As application designers, however, we seem to have forgotten that.  Most mobile apps these days will only function if a connection is present, and this is a bad approach.  I live in one of the most densely populated countries in Europe and work in one of the largest cities in the world, yet I still experience several occasions _every day_ when I am without signal.  This probably adds up to about 2-3 hours _a day_ when I can’t use my smart device because the app designer has not thought about local caching (and before you start dusting off that fanboy attitude you’ve been saving, I’m packing multiple devices and they’re all the same).  This is not going to change anytime soon, because we lack both the funds and the science (we’re dealing with the laws of physics here too, ya know), so we need to get over it. Design apps and mobile platforms for the "occasionally disconnected" world and we’ll be fine.  (BTW, the historians among you will remember that this is what we used to do before we got fat, dumb and lazy with the promise of mobile broadband. When patchy mobile data was the best that was available, you were grateful for it and respectful of its use. 4G connection, you say? All we had was the thin end of a damp bit of string. Luxury…)
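The local-caching discipline being asked for is simple enough to sketch: try the network, fall back to the last cached copy when there’s no signal, and refresh the cache whenever a fetch succeeds. The `fetch` functions below are stand-ins for a real network call; the names are mine, for illustration only.

```python
# In-memory cache; a real app would persist this to local storage.
cache = {}

class NoSignal(Exception):
    """Raised by a fetch function when there is no connection."""

def get(key, fetch):
    """Return fresh data when connected, the cached copy otherwise."""
    try:
        value = fetch(key)     # may raise NoSignal on the train
        cache[key] = value     # refresh the local copy while we can
        return value
    except NoSignal:
        if key in cache:
            return cache[key]  # stale but usable
        raise                  # genuinely nothing to show

# Stand-ins for connected and disconnected states:
def online(key):
    return f"live:{key}"

def offline(key):
    raise NoSignal

print(get("timetable", online))   # → live:timetable (and now cached)
print(get("timetable", offline))  # → live:timetable (served from cache)
```

The design choice is that losing signal degrades the experience (stale data) rather than breaking it (blank screen), which is exactly how apps behaved back when patchy data was all we had.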

Connectivity is also about connecting individuals (When you’re walking down the street and pass a café that your best mate is sitting in, you want to know right? Or do you?)  and it’s also about connecting devices, the whole peer to peer network thing, but played out on mobile. (Man if I was smart I’d be buying shares in Groove and Ray Ozzie now, no wait, been there, missed out on that.)  Both of these we’ll cover in detail some other time.

So, as you will have figured out by now, there are no answers this week, just big questions.  Great for me, as it gives me more room for what I think is the most important of the 127 "big bets" we’re undoubtedly going to have for the months ahead, and great for you because maybe you’re sitting out there with some of the answers. Come on now, don’t be shy.

Back to the Arrogance of the Present

Friday, March 18th, 2011

One of my favourite books is Jonathan Margolis’ “A Brief History of Tomorrow” (if you’re into thinking about the future like we are here, then you should really give it a read).  One of my favourite concepts from the book is something Jonathan refers to as “the Arrogance of the Present”: essentially, the idea that it is hard to measure the future potential of new technology when all you have is a mind-set from the “present” from which to make the judgement.

In many ways it’s like the situation Henry Ford found himself in way back in 1903, asking for funding for his new project only to be told by the president of the Michigan Savings Bank that “The horse is here to stay but the automobile is only a novelty – a fad”. With hindsight, it’s easy to sit here and make fun of that poor bank president and how stupid he must have been, but in reality, at the time he made that statement, he was probably _right_.  His judgement on the future potential of Mr Ford’s ideas was coloured by his own understanding of the society of the time and his ability to imagine how it might change.

Obviously we do not possess the ability to predict the future, but more importantly, we simply cannot comprehend the complex series of changes that society makes as it continues to evolve and therein lies our challenge.

We see examples of this kind of problem every day – many new technologies are misunderstood, dismissed and downright despised because we struggle to comprehend their role in a society that is significantly evolved from the one we experience today.

Camera phones are a great example of this: when they were introduced, I don’t know anyone who was excited by the prospect of carrying around a poor-quality, low-resolution camera on their phone, of all things.  Fast forward to today, when that functionality is poised to change the way society works, whether it’s through citizens reporting anti-social behaviour to their local council or augmented-reality solutions that make a tangible difference to the way people are able to live their lives.

There are many more examples to illustrate the point, but I’ll pick just two more: social networking and street-level imagery, both of which are much maligned and misunderstood. That’s not to say they are without their problems, but when we think about their potential it’s crucial that we do so not in the context of our understanding of today’s society, but instead by thinking about how they might work with the society of tomorrow.

Of course, that’s not to say we should blindly accept any new technological principle. But instead of constraining our perception of value and relevance, we must use our experience from the past to inform the right way of getting the most from future innovation, implementing it in a way that is respectful and cognisant of all we have learned along the way.

The Arrogance of the Present

Monday, June 15th, 2009

In doing some research for some video debates we’ll be doing with the FT (more on those later), I’ve been reading Jonathan Margolis’ intriguing book on futurology, “A Brief History of Tomorrow”.

Early on, Jonathan posits a condition known as “Arrogance of the Present” – a condition many people suffer from (and have suffered from throughout history) which is the “belief of every successive generation that at last, sophisticated, modern folk that we are, we’ve got it and indeed, we _are_ it.”

This condition means different things to different people, but whatever it means, it spells trouble for those of us who believe we can do better and/or different things with technology and that, believe it or not, as clever as we think we are, there’s still a whole lot more for us to do/invent/evolve.

This arrogance is not restricted to technology either; it applies to many of the changes we face in both our professional and personal lives. Think about what people must have thought when mobile phones were first invented – “why would anyone want to carry such a device?” – and then think about your lifestyle today. As much as we’d like the option of going without one for a week or two, I’ll bet there are not many of us who would want to be permanently without one.

The problem for me is that this kind of attitude is infectious among those who don’t have an understanding of what the changes may bring and, worse still, it feeds on itself, blowing even the inane into matters of critical importance to society.

I’m asking you then, as readers of this blog, to seek out examples of this “Arrogance of the Present” and highlight them for what they are (hell, send ‘em in and I’ll post them here in a special category of their own if they’re good enough). But whatever you do, help people understand that we’re just not _there_ yet and, if it’s OK with them, we’d like to keep on trying to push the envelope a little…