BW Businessworld

‘We Have Built A New Ecosystem’


Microsoft Research has a footprint that extends from Redmond to Cairo, New England to India. In 20 years, the organisation has launched projects on at least 60 areas of technological advancement. Microsoft's chief research and strategy officer Craig Mundie met Businessworld to talk about its current research focus involving the use of natural user interface, or NUI.

Edited excerpts of the interaction:

You have been with Microsoft for almost 20 years and have worked closely with Bill Gates. How has your role evolved and in what aspects of the company are you most involved now?
When Bill retired, I essentially took over most of his role. For the past five years, I have been overseeing our research organisation. At Microsoft, research and development are two separate things. Many people talk about R and D as if they were one thing; for us, R is separate from D. We have a centralised corporate research function, which just had its 20th anniversary this year. Bill had made a very early commitment to the idea that we should advance the state of computer science in a largely academic model of research -- we publish our work, and it has become the world's largest single facility for computer science research.

Another part of my job in the 19 years that I've been there has been to think about starting new businesses in the company. So many of the things that are very prevalent today in the world are things that I worked on in the early to mid-'90s at Microsoft. These laid the foundation for our efforts in this space today and for the years ahead. We continue to look at new capabilities. And I continue to have a role in the oversight of things in the company from a governance point of view as well, in particular along with Brad Smith, who's our General Counsel. Together, we own the oversight of Microsoft strategy with respect to intellectual property.

You are a scientific advisor to President Obama. What does that role involve and what are the issues you address?
Outside of Microsoft, the company's been good enough to allow me to consult with the governments of the United States and other countries around the world on issues that are important to us and to the US administration. That's how I've come to be the sponsor of Microsoft for a lot of our dialogue with governments and policy people around the world, particularly China, India, Russia, and a number of other places. With Bill's retirement, I got to refocus a lot on the issues in Europe as well. I have served the last three US administrations as an advisor to the President: at first, 10 years ago, in national security areas, and then when I was appointed by President Obama as one of his science advisors. It's a group of 20 people that essentially does studies at the request of the President, and I've participated in that for the last three years. I love the job because it has such a broad range of issues, from the leading edge of technology to many of the pressing problems that societies face around the world. So it's a great place to be, and a place where I think Microsoft will continue to play an important role.

FAST FACTS
Career: Started his career in 1970 and worked on operating system development while still studying at Georgia Tech, USA

Roles: Mundie has spent much of his career building startups in various fields, including supercomputing, consumer electronics, education, healthcare and robotics. He joined Microsoft in 1992

Education: Bachelor's degree in electrical engineering and a master's degree in information theory and computer science from Georgia Tech

What areas of research is Microsoft focusing on, in what is thought to be a post-PC era?
From the technology point of view, many things are happening. But I think there are two that Microsoft has had a big focus on for the past few years, and where I think we have made some tremendous progress. One of those is the concept of NUI, or natural user interfaces; I think I actually named it in 2007.

Up until the launch of Kinect on Xbox, I don't think people fully appreciated what we meant by a transformational change in the way people interact with machines. There had been many attempts by us and others to have the computer emulate different human senses: touch, vision or speech. But, almost invariably, they were just applied as an alternative way to operate the graphical interface. My belief is that in the end, it will be the more radical transformations of how people interact with computers that will really bring lasting change. And, in particular, extend the reach of computing to many, many more people on the planet. It is my belief that some of the most basic needs, particularly in the area of health and education, are going to be better served only when the computer allows a more direct exchange between people and machine.

How did the idea of NUI lead to the making of Kinect?
We've been thinking about this, researching it and working down that path, but it was really about five years ago that the game group in Microsoft, who work on the Xbox, came and sat with the research people and said: you know, we really do believe that if we want to extend the range of gaming to a much broader group than males between the ages of 12 and 30, and allow it to be applied much more for general media consumption, we've got to get away from the complex controller. For those of you who have tried (gaming devices), it's sort of like trying to use one of those remote controls that has anything up to 50 to 100 buttons on it, or requires chording with multiple fingers, to get it to do anything. It's really more like learning to play a musical instrument than a thing that the average person can just pick up and use. And while there have been literally probably hundreds of millions of people who have mastered that kind of complex tactile interface, it's only a tiny percentage of the whole population. And so it became clear that it would require a really radical change to expand the genre of gaming and its audience, and then to allow it to carry over into a strategy that would really give people much broader access to the internet and to media, particularly as the media tended to move online.

And so, we set out on what appeared to be an impossible task - controller-less gaming. Yet we found we had more than 10 years' worth of research in many different areas which, when we integrated them, were able to give rise to what became Kinect. And we did that from start to finish in three years, which is in a sense quite remarkable when you think of not only the amount of technology that was there, but building a whole ecosystem and getting people to build games around it.

How has interaction between people and computers changed since the launch of Kinect?
The NUI is important, not only in how we think about using it, but in what it has engendered globally in alternative ways to think about what happens when you really throw away the graphical interface and approach man-machine interaction in a completely new way.

We created a montage showing Kinect on Xbox, and within a week of our releasing this, people all over the world started to put the Kinect onto personal computers and tried to write their own applications. And then we said we would release an SDK, which we did last June. The commercial version released on the 1st of February this year had applications that were done with the Microsoft toolkit, all made by people who downloaded the kit, took their Kinect off their Xbox, plugged it into their personal computer, and started to write their own applications.
What are some of the more interesting applications that are coming out of this approach of man-machine interaction?
Some people in Germany put a Kinect on a helmet hooked onto a laptop in a backpack, put a Braille belt on, and created a system that guides blind people. They did that in about two weeks. When you think of it, how hard would it have been prior to this for students in a university to produce a system that would give vision to blind people? In a sense, the machine sees in front of them, and basically talks to them, and then it gives them a map in Braille that they can feel on their stomach. Compared to being blind and only having the cane to tap with in an unknown environment, it's quite a radical transformation. The ability to have a toolkit that takes a lot of this technology and makes it really useable by people who are not experts is just a very, very powerful capability.

Are there other medical applications for the natural user interface?
A school that's north of Seattle came to us and said, we've seen such a remarkable application for Kinect games that we wanted you to understand and help us document it, and show it to other people. So we went and we found what happens when people start to think in completely different ways about the power of this direct representation of people as avatars on the screen.
   
They had the remarkable forethought to use Kinect to engage autistic children in gesture-based games. This not only improved their motor coordination but also their social and language skills. This was especially so because it was motivating and fun, drawing children out of the inhibition which is pathognomonic of this condition. And they didn't make any changes to the actual games.

These kinds of things are very powerful. They show that computers are able to provide a level of interaction and feedback that is really hard even for dedicated teachers or caretakers to provide. They show how the motivational power of gaming can find a completely alternative use.

Another area that I think is going to be very important is telepresence. We experimented with a medical kiosk a few years ago. In fact, we did this to show how in a country like India, where you have a large rural population and lack the ability to deliver high-quality healthcare, you can address the problem via telepresence (patients can have their preliminary diagnosis at a machine-operated medical kiosk and the reports can be sent to the doctor, who can later prescribe relevant medication).

We wanted to combine a number of things that are very important: mainly the inference systems which are at the heart of machine-learning capability, and robotics, where we wanted the computer to be able to express itself more directly as an avatar or some type of a humanoid. This represents a new capability. We would build on this idea that a lot of specialised human knowledge can be encapsulated in these expert systems. And then, you have them employed where they can interact more directly.

Was the medical kiosk just a research project or a prototype or has the application gone into regular use by doctors in rural areas?
This was a research project. We don't have a product that came out of it, but the general line of research there is called "Situated Interaction." So the next thing the person who did that work did was to make a computer that sits outside his office and acts as his administrative assistant. So, in this case, you walk up and it says, hi Craig, Eric's not here right now, can I help you? In essence, the same thing that a human assistant would do. And then you say, well, here's what I wanted, and then it tries to problem-solve it. From a pure research point of view, that's the way that it has evolved.

How will telepresence alter business communications? 
Telepresence is going to become more and more important. For instance, using Xbox Live, you can have telepresence meetings where you and up to eight of your friends are all represented as avatars. The system has a built-in set of rules of cinematography. Without you doing anything, it shifts back from a wide-angle view where you see the whole environment to a local view where you see the other people from your own vantage point. It's very clear to me that multi-party interaction via telepresence will become a more and more accepted part of everyday life.

People are talking a lot about Big Data nowadays. What do you mean by that?
We're sourcing data from many, many sources on a continuous basis, whether they're sensors or computers you interact with or the applications, and when you get the data, what can you learn from it? And when you have these very high-scale machines and very large amounts of data, there's often a lot of useful information that can be gleaned that humans can't find. So one example that we've done recently at Microsoft: we had a group that for the last six years was in the business of developing these large-scale database systems for medical information. When you put them in the hospitals, they ingest all of the operational and clinical data, so we had ten years' worth of data from a hospital chain. One of the problems that the medical community has is a problem called readmissions. So you know, you get sick, you go to the hospital, they fix you, and they discharge you. But an unusually high percentage of the time, between three and thirty days later, the person is back in the hospital. And many times, not for anything related to what they went in for the first time, and not a condition that they had before. So there's something about being in the hospital, you know, that produces this bad effect.

After years of knowing that this phenomenon is there, no substantive progress had really been made at reducing it. And so, it was obvious that it's a complex problem and that people really couldn't figure out what the things are that are causing it. So we took the machine learning group in research and we had them take ten years' worth of medical data, and asked the machine to figure out what the patterns were that correlated with these readmissions. And it turned out to be stunningly effective. So they set it off to learn and it got more and more granular in showing that there were correlations, which got down to, for example, finding a room that had a bacterial infection that hadn't been cleaned out. And if you happened to stay in that room, you got an infection. So it's just luck of the draw.

These are things that are dynamic in nature, and no human is going to find them because it happened after you were discharged. And when you got the infection, you came back in a different part of the hospital. So no one would ever know, you know, that the source of that happened to be that room. But the computer can tell because it has this history over time. So we built a computer model based on all of these patterns, and we now have a service where people who use this database software can run the model every day against their current patient population, and it produces a list in priority order of which patients are most likely to be readmitted and why. And so now, while you're there being fixed, you know, for your primary problem, they can anticipate what might be the cause of your readmission, and they can fix that while you're there too.

And of course, this is just a huge win because you eliminate the entire second event. So you know, all the costs associated with that and all the pain for the patient is just eliminated, because many times, these were interactions that people didn't understand. For example, they found that if you were in the hospital for congestive heart failure, and you had ever had an episode of clinical depression, your odds of readmission are much, much higher. And the reason is that the depression emerges after you are released, and then you don't adhere to your drug regimen completely and you end up back in the hospital. So it wasn't even something that was an active condition at the time, it's just historical fact. And so, it's only the ability to cross-correlate all these various solid pieces of information, and then use their predictive capability, that creates this breakthrough. And I think that there are going to be hundreds, thousands of examples where people are going to be able to take these large data sets that are emerging and apply machine learning to optimise business results or learning or game development or whatever the facet might be. So natural user interface and big data (machine learning is sort of a special technology applied to big data) are some of the areas that I'm personally the most interested in now, and ones where Microsoft has obviously invested a lot.
What's the future, Mr Mundie, of big data considering that you know, you're already in a situation where for large corporations, the data requirement or at least what they store is almost doubling every six odd months or eight odd months, where are we headed and at what stage do you start destroying the previous data, because nobody wants to destroy 20 years of data, 20 years of analytics that you can always dip into five years down the line. So where is this headed in terms of complexity and in terms of data requirement and technology?
Well, there are three different issues there, you know, so let me start by saying, I think the era where you throw things away is gone, right? We had a time where spaces were small; floppy disks and hard disks and CDs and such things were small relative to the amount of data you might want to have. But we've seen and continue to see an exponential increase in storage per dollar, and that does not appear to be slowing down. So I think we've crossed the threshold now already, where there's no real economic requirement to get rid of the data, maybe not even for an individual, and certainly not for businesses that can get any value from it. Similarly, machine learning, for me, deals with the complexity problem, you know, just like I indicated it was complex to describe a vision-based gesture, but the machine could learn it very quickly. And I think the same is going to be true in the big data world, that is, as we have more and more data, you would say, well, as that grows exponentially, if you only have classical analytical methods, you could say that they get overrun by the amount of data. But when you start to use these super-scale machine learning facilities to get value from the data, then I think it essentially overwhelms the complexity problem, and so in the future, I expect that whole new tools will be available. So I think those things will remain at least in balance for the foreseeable future. And that's why it will also be imperative that people try to retain their data, whether it's personal, governmental or business, because the more data you have, the more you can learn across it.

Does the fact that the R is separated from the D create problems in some of your better work making it to the market quickly? For instance, the avatar-based telepresence as a piece of research is quite advanced, but Microsoft has actually not done anything to capture the video conference market.
There are always challenges between research, which is curiosity-driven and not product-specific, and the product side. And I think that's true everywhere and it's also true in Microsoft. My job, especially in the last five years, has been to optimise the path from the research activity to the product. And we feel actually quite good about that now, with 15-20 years' worth of research results in the repository and our ability to start to look for these problems. Kinect required, if I remember right, seven different research groups from four labs on three continents to come together with the product group to produce that system. And so, if we didn't have all of them, there really would not have been a solution, or we'd have had to try to import it from outside; even if it existed, it wouldn't have been a solution unique to us, and therefore its economic value would be diminished. Despite everybody realising how powerful this is, and despite many people trying in some ways, no one has come even close, 18 months later, to producing something like Kinect. And so, the combination of the algorithmic advances and the sensor advances is yet to be duplicated. It's a bit like being a venture capitalist. You invest in a lot of young companies, and some of them become Microsoft, Apple and Google, and some become good staple businesses, and many don't make it. You have to think of the research as a bit like that, where you have some superstar results, many good results, and many small results, but you don't get a uniform transfer. In the case of telepresence and avatars, we actually made a conscious choice that in the first place we would put the telepresence avatar out for kids on Xbox. The reason was, there are a hundred million avatars already out there in Xbox, because you use that as part of the interface. So the sociology of you and your avatar was already established, but only for that community. We wanted to prove that it was possible with an audience that we thought would be intrinsically receptive to it.

How do you see the concept of smart TV going forward and what will Microsoft's role be in this area?
A lot of our focus now is to sort of blend together natural user interaction with an internet-based delivery of media. If you look at the Xbox that was just launched in December, there were a couple of notable things. One, I mentioned that it now has the same natural interface, so if you want to use your gestures to control the big buttons on the screen, the same as you would on a tablet or your phone, you can do that. But we also introduced much more focus on voice-based communication. And it also integrates domain-specific search engines with the medium, so in the new Xbox, Bing has been trained on a set of data that just relates to media like music, movies, etc. So now on Xbox, you can just say "Xbox…" and give it a verbal command related to its own functions. But now, one of the new commands is Bing! So you can say "Xbox Bing -- Bollywood movies," and it'll basically go out and bring back all the Bollywood movies and put them into this array of tiles which you can then navigate by voice or refine by search or just point at with a gesture, and it will play them. The remote control just evaporates completely. So between speech and gesture, you can access all the content that the system has available. And because this content is all being delivered over the internet data side as opposed to the conventional TV broadcast or cable distribution systems, it appears that a lot of media delivery is moving in that direction. It remains to be seen how you can integrate that with the classical broadcast fiber capability. That's really not a sort of a model problem, it's a system interface problem, because most of those systems have had set-top boxes of one form or another that have never really progressed very far towards being good computers.

Right now, a lot of our focus has been to try to get to the IP side of the network and to integrate that into this voice and gesture-based model of access. One thing that's really interesting: I think there are about 65 million Xboxes now, and a substantial percentage of them, I don't know the exact number, have this Live subscription service both for the gaming and media. And in the last year, with the arrival of Kinect and the arrival of the content, people now spend more hours per week consuming content through their Xbox than they do playing games on the Xbox.

Could this kind of intelligence and analytics also be replicated for other, more general requirements, like in the retail industry?
Absolutely. Every industry is going to need this capability. I don't know, in the final analysis, where the tool stops and the domain-specific knowledge begins in order to really create value. Obviously, we did the readmission manager in part because the customers who bought the data capability realised that the classical methods really didn't allow them to get the insight that they wanted from the data; they wanted more, and in a sense we came back and said, well, how about if we study the data and the domain experts will produce the model and we'll sell you the model. They loved that. So I think all of these things will probably be there in some form or another, but I don't quite know what the generalisation of machine learning as a tool and its long-term role in the capital market is yet. I know it'll be important, and I know that there are certain domains where we know how to apply it. I don't know how to completely generalise it yet.

Do you think there will come a time when soldiers of the future can actually send out a robot and control it through their gestures as part of warfare?
I'm sure there are military guys who think that's an interesting thing to do, and they can buy the kit too. I'm sure there are probably guys in a bunch of the defence agencies around the world who're trying to think about how to do this. If you look today, for example, at the drones, my understanding is that in many cases the people who make the best drone pilots were kids who grew up playing video games. Then when they grow up, they essentially go and fly these things for real war games. And in that case, they really were using more classical joysticks and other things. Whether or not, for other kinds of applications, real mapping of human movement onto the thing is really the best thing, I don't know. But it's very clear that militaries and commercial airlines have invested heavily to integrate those kinds of quite realistic experiences for training purposes. There's no reason to think that that won't be extended into these other areas as the cost of the technology falls.

What research work being done by your competitors interests you? 
It turns out that not many companies are doing very much basic research. A lot is being left to the universities to do. Microsoft's probably doing more basic science than most of the people we compete with. Clearly, Google and Apple and others are all making advances, in some areas, in the same places we are. You can see for example, last year, a big focus for us has been on speech-based things, and of course, Apple put out a speech-based capability receiver in their phone recently.

I think there's certainly some obvious commonality of interests even though I think the approaches are not all identical -- although we do expect it to be. I think that this movement to natural interaction, especially speech-based input, will probably see a lot more emphasis by all of the major players in the next 12 to 24 months. There's a lot of focus by us and others on language translation, where you can take text and convert it. We've done demos from our research of real-time spoken word translation that's all done by machines. So it's essentially simultaneous translation, all by computer. That's not ready for prime time yet, but it's getting quite good, and it will be a breakthrough when that comes. So any person in essence could take up a phone call and talk, and a different language comes out on the other end. And if you think of the Avatar Kinect meetings, what's kind of interesting in that environment is all eight people could speak a different language, and they could all hear each other in their own language. So you know, in a globalising world, I think it's going to be fascinating to think about some of the advantages of not all being in the same room physically at the same time. Today if you wanted to go to a conference and hear that kind of real-time translation, you have to put all the interpreters in a soundproof booth at the back of the room and everybody has to wear earphones. But if you're all sitting in your own offices talking, and the computer's doing the translation and you're only visually in the same 3D space, the computer can essentially substitute a different language in real time for every person.

(This story was published in Businessworld Issue Dated 23-04-2012)