Interviewee: Britten Martin
In this interview with Britten Martin we talked to him about:
- The role of support inside Microsoft.
- A walkthrough of how a support request is handled.
- The ways in which the outside community can interact with the internal support processes.
- How a product is supported differently in beta than when it’s finally released.
- How a bug eventually becomes a fix in the product.
Scott: Britten, if you could just introduce yourself for the record. That is usually a good place to start, and then I’ve got an initial question after that.
Britten: Cool. So, my name is Britten Martin, I am a group manager within CSS which is Customer Service and Support. I’ve been with Microsoft since 1999 where I came in as a support engineer supporting Windows 2000. So I have been a support engineer, I have done some work with our outsource partners in a partner technical lead role, then I moved into frontline management and did that for about three years. In my current role I am a manager of managers, I have an organization of about 75 people and I am the global sponsor for System Center Support, which is your SMS and MOM products.
Scott: So talk a little bit about what falls under the scope of support inside of Microsoft. I mean, engineering has its domain, marketing has its domain: what is support defined as, and what kinds of responsibilities roll up under that inside Microsoft?
Britten: That is a good question. It really depends on which segment of support. It is a great clarifying question, Scott, because I am within Commercial Support, which is the enterprise-based support: your back-end, your servers, many of your big customers that buy dedicated support contracts, the government, many of the major companies around the world. We also certainly have a consumer end, which is your home user running Vista, Office, also Xbox, Zune. Any product that you can think of that is released by Microsoft has support. So really, the thing that separates Microsoft from other companies, I think, is that no matter what’s out there, you can pick up the phone, call 1-800-MICROSOFT, tell them what your product is, and you’ll get through to someone to support your issue. Who you are routed to, where you go, and how much it costs certainly depend on the customer segment that you are in.
Britten: In Commercial Support, we support customers who have technical how-to questions about the products and certainly those who are getting error messages. Also, if you need help setting it up, if you need more consultative type services, we do that as well.
Scott: OK. One thing I have always been curious about is how a problem works its way through the cycle.
So, you start out when the phone’s ringing and you have one customer, two customers, and three customers: they seem to be having a similar issue. How does it work its way through the process from the technician helping them on the phone, to maybe later on deciding it needs a KB on it, to maybe a hotfix, to maybe being rolled up in a Service Pack?
Britten: That is a good question too. OK. I will speak in today’s terms because major things have shifted over the last couple of years. There used to be a lot of process behind just what you were saying, which is, you know: "When do we do a KB? How many issues before we make a hotfix?" Nowadays, with the advent of blogs, which is kind of what you are doing, you know, what this interview is about, we in support have become a lot more proactive about getting immediate information out to our customers. I will give you an example from my group, within System Center: last week we had an issue with SCCM, the new version of SMS. Just like you described: one call, two calls, three calls. All of a sudden we are like, "Wait a minute, there’s something going on here!"
So we got some information together and dumped it out into our blog so it was out there. On the back-end, our support engineers, who are your front-line resources, are working with the escalation team, our escalation engineers who do the debugging, who look at source code, who work with the product group. They are debugging the issue, and they get it down to, "OK, this is clearly a bug." At that point, that escalation engineer has a hotline, per se, to the product group. They pass along all of their research: they write up the reproduction steps, reproduce the issue, and send it on to the product group with all the documentation.
Then it is kind of in the product group’s hands to decide, "Is this something that we need to fix now? Do we need a hotfix, or is it just something that we can put documentation out for?" You know, that kind of decision-making process. In this case it was a major issue, it was a deployment blocker, huge customer pain, so they released a hotfix. Then once again, depending on the urgency, is it a publicly available hotfix or is it something that you have to call support to get a hold of? In this case, we knew this was a major issue so we made it available out on Microsoft.com, put a KB article out there, and linked the hotfix to it. That way, customers could immediately just go and download that hotfix.
Down the road, that is certainly going to be in Service Pack 1 or similar. You can pretty much figure that, for all of our major products, any bugs that we fix or any hotfixes we release up until a cut-off date will make it into the Service Pack.
Scott: And then talk to me a little bit about the timeframe, right, between the time where maybe the first few calls came in, until the time that hotfix and that KB article are out there.
Britten: Well, once again it is very, very dependent upon the criticality and the widespread nature of it.
In this case, from first call to KB/hotfix was about five days. And some of that was delay: they had identified the code and written a hotfix within about two days, but there were some delays, you know, going through legal on the official KB review, and getting everything up there and replicated. So from beginning to end, about five days.
Scott: Yeah, that is fast. I do not think most people would ever think of Microsoft turning something around that quickly.
Britten: Sometimes, I think, Microsoft is seen as very slow to respond, but in reality, it’s often not the case. If it is truly a deployment-blocking issue or something that causes widespread customer pain, you will see a quick turnaround on it and you will definitely see the documentation out there in a timely manner. I have been here eight years, and I have seen issues where it was tough to figure out what the fix was, what the resolution was. Where it was something that would hide itself in multiple components, so it was a complicated fix to write, it takes a little longer. However, if we can identify it, if we do a good job on the support side with identifying it and providing reproduction steps to the product group, you know, they can turn it around quickly.
Scott: So one question then too, would be, what does the triage process look like if after reproducing the bug you quickly realize that it has security implications versus something that’s a deployment blocker, which might be the next layer and then further down which are just kind of usability or general problems in using the software?
Britten: Yeah, I think, once again, when you are running into security-type issues, what you get with that is even more resources thrown at the problem. We have security aliases internally; they have product group members, and they certainly have a high-level support presence as well, so that if something comes out we can fire it up quickly. There is a whole process that is enabled. Let us say there is a new Sasser or Blaster or any of those. We have processes in place that will quickly call together members of support and the product group. We have open lines 24 hours a day. It is kind of the war room type mentality.
It has product group members on it. It has support members. So people in the field can call in and say, "Look, I have got an issue with this bug or with this security hole and just want to make sure I have the up-to-date information." So they can call into this bridge, and they can get the information.
On the back‑end, it is hard to say how long it is going to take, depending on the fix but you are going to have even more resources available. You will see a fix as quickly as we can get it out there. After the Sasser/Blaster/Nimda pain, we really turned the corner not only on product development but also on the support side to ensure we could have a quick turnaround.
: What do you think the level of community involvement is in the support process? Obviously that is a huge piece of the open source model, right down from many eyes looking for security bugs to a very well populated mailing list with a high level of discussion, traffic, and things like that. But at the same time, with the closed source support process, while the team is potentially staffed adequately with full-time employees, there are general concerns out there about how much the community can get involved.
So what are the avenues through which the community can actually affect the support process at Microsoft and/or see what is happening under the covers of the support process?
Britten: Yeah, I will tell you that it has really been something we have been working on for the last few years. The product groups and the support teams have been trying to find ways to tap into that community model because there are many folks out there that do our support day in and day out with customers that have a lot of knowledge that could be helping the broader community.
So, one of the things that we have is the MVP program, the Microsoft Most Valuable Professional. One of the things that these folks do is to actively take part in the public newsgroups for our products. I can point to a couple of products ‑ Small Business Server is the poster child for that. It has a very strong community. Customers can go up to the Small Business Server forums and post problems and the MVPs will quickly get involved to help resolve the issue.
There is a lot of energy around building a stronger MVP community and empowering the MVPs to help solve customer issues. Many of the product groups have CPE, Customer Partner Experience, teams, that are really focused on tapping into that partner community.
Another example from my world is System Center Essentials, or SCE. We actually launched support for that on a completely forum-based model, where we had engineers from CSS monitoring the forum, and we worked to build strong partner participation. If the partner could not answer the question, or did not answer the question within 24 hours, someone from support would jump in. We are trying to grow more community presence around support.
Globally, there is more partner presence out there as well. A good example is SoftGrid, or Softricity, a recent acquisition. In Europe, before Microsoft took over, there was not really a ton of dedicated support. It was all done through the community. So we have been looking and trying to learn from them on how they involve the partners, and how we can keep that type of interaction going. I know there is a lot of energy around this and it is a…
Sean: Well, one question, which is really back to the second part of the original question, is that a lot of the product groups in Microsoft are being affected by the open source model. For example, recently we talked to Shawn Burke, who has been instrumental in getting the source code for the framework out the door. How much is a push to learn from the open source community affecting the support team in general? For example, could you see a day where the status of bug fixes is more open and less of a black box until the actual fix arrives out on the web?
Britten: So thinking more about -- I am a partner, I am out there. I find a bug or a hole, I put it out there, and really being more transparent about whether that is going to be fixed. Is that kind of…?
Sean: Yeah, exactly. I realize it’s not going to be a plate glass window - it never is. But at the same time I think a lot of people will look at support right now and say, once I send in my support request I don’t really have a lot of visibility into how they are stacking it, ranking it, prioritizing it, and where it goes, until maybe I get an email that says you have been deferred. Right?
So I am just curious, I know a lot of the product groups specifically are trying to expose a fair amount of their development processes, so I was curious if maybe that was making its way over to the support side.
Britten: You know, it is a fair point. Let’s say you are running a beta version of the product and you are involved in one of the official beta programs, such as TAP, our Technology Adoption Program. The beta version is fully supported and you can actually go and directly file a bug. In addition, I know that they try to make a lot of that process transparent, so you know if the bug is a duplicate. Is it being fixed? Is it being deferred to the Service Pack 1… whatever. So actually, I know on some levels we do that.
On the support side, I think we make good use of our blog. There is a huge blogging effort, because there is a lot of knowledge within our support engineers and within our internal databases that, for one reason or another, has never been made public. So I think we try to do some of that as well. But I can see the point. I mean, maybe someday, and maybe that someday is not too far away, you can log in and say, "Here is a bug that I submitted." Or, even better for Windows, "Here is the bin of things that are going to be fixed in the next two weeks, months, whatever."
Scott: Well, and so, you know, not every support request comes in through a phone call anymore, right? There are things like the Connect site. There are things like Windows Error Reporting services. I guess, if you would not mind, talk for a minute about some of the different channels that a problem can kind of come in through and… Go ahead.
Britten: No, no, it is fine. So today, and if I use Americas Commercial Support as kind of a measuring stick, somewhere around 85% of our support issues come in via the phone - a very traditional means of operation. We are really trying to push more of our online type offerings. One is online submission, where you can go up and create your issue; you can attach data and submit it to a support engineer. That support engineer reviews that data and calls you back, hopefully with an answer.
The Connect site is a lot more beta focused at this point. I am not an expert there, but that’s where you get into many of the cool things where you can actually submit bugs and all that. We also have the managed forums, which we talked a little bit about. The managed forums are a completely different avenue. Most of those are staffed by Microsoft support engineers as well as product group representation and the partner community.
So if you have an issue with Windows, you can go up to a specific Windows forum and post a question. You might get an answer from someone in the product group, a partner, or a support engineer. So you have those as well.
On the consumer side, they do really cool things with automated responses. So you submit an issue and it checks your data, checks your error log to see if it is a known issue. Before you hear back from the support engineer, you might get an automated response with, you know, "Try these two things. If this does not work, respond back and the engineer will contact you." Then, as you said, you have the whole Watson or error reporting. The error reporting stuff is very cool, because regardless of bugs filed or anything else that goes on, all the product groups take all of their Watson or their error reporting data and, as they look to roll out service packs, and new versions of products especially, they want to handle those top 15, top 20, top 50 Watson issues. They want to make sure that they are knocking those off.
So it is an interesting mix of many different avenues for customers to submit data, and in some cases, like a Watson log or the error reporting, customers might not always think they are really contributing to the next version of the product, when in essence, they truly are.
: I realize, too, that you are right, that is largely a mechanism for when a product is in beta. I guess I have two questions, but one is a follow-up on this thread initially. Talk a little bit, I guess, about how a product is supported differently while it is in beta versus once it has been released.
Britten: That is a good question. We have a few different types of beta opportunities for customers. There are more formal relationships, such as TAP (Technology Adoption Program) and RDP (Rapid Deployment Program). With these formal programs, customers agree to deploy this product into some type of production environment. In return, they get access to submit bugs, they get access to product group members, and they get access to dedicated support engineers.
From within support, let us say we have a new version of Windows, we might have three or four support engineers in the US dedicated to this beta support. These engineers get a hold of the product early, they get access to product specs, they visit the product group, and they may even get to go out on site to customers who are deploying this product so they get some real world hands on experience.
Now, that is for the structured beta program. If you’re not a customer in one of these programs, you likely can still download beta copies of the software. Even in these situations, there is support out there, through the forums that we talked about before. Once again, you probably get answers from those same beta engineers and some of the product group folks. You just do not have as much of that one-to-one interaction, you know.
Does that make sense?
Scott: Yeah. Yes, very much so. One other thing, and I will kind of circle back to something you said earlier: if it needs code to solve the problem, you know, if it is not just a configuration file that was messed up, or some registry key that got corrupted, or any one of the myriad things that could cause a problem, but it actually needs a coded hotfix, you hand it over to engineering, send it over to the product team, and they triage it against other stuff.
I mean, have you ever run into situations, and it seems like this would inevitably happen, where they’ve got their engineers dedicated to building the next version of the product, and meanwhile, there’s an endless list of bugs in any complex software that they could be tasking engineering resources with? And it seems that they kind of have to make that call about pulling people off engineering new features to work on a hotfix or something like that.
Scott: And how does that process of triage work? Because support is kind of on the front line with the customers, it seems like sometimes you guys might have to go back to them and say, "Um, you know the time frame that you’re looking at fixing this in probably isn’t really going to work, so think about it again."
Britten: Right. And I will tell you, it is really dependent upon the product group. You look at something like Windows. Windows has a whole team, and you hear it from time to time called Win SE, Windows Sustained Engineering. They have a whole team dedicated to the current version, and whatever supported versions are out there, on the hotfix front.
With some of the smaller products, you may run into situations where they have to pull developer resources off the next version to fix current version or legacy version problems. But I will tell you, in my time here, I have never seen that as being a blocker in getting something fixed. If there is truly an issue that needs to be resolved in the current version of the product, especially if it’s security related, you can be assured that they will find someone to handle it.
However, as a general rule of thumb, all big product groups have sustained engineering, folks that are dedicated to current version problems, or legacy version, so it is not necessarily that huge resource problem that you would think.
Scott: OK. So, many times, they are not pulling people off features. They have people who are dedicated just to maintenance and hotfixes.
Britten: Right. That is your sustained engineering folks, for the most part. As I said, I think you would run into some smaller products that might not be in that same boat, but the major ones have sustained engineering.
Scott: It seems, too, like support is in a bit of a difficult situation, because, I guarantee, if I call with any bug, it is a critical show-stopper. The world will end if it does not get fixed. So how do you kind of manage customers who are generally going to feel the importance of their bug is high, when Microsoft does not necessarily see it that way in the grand scheme of things?
Britten: Yeah. I tell you, our people are experts at conflict management and really setting customer expectations, and certainly the empathy piece as well. I mean, you have to be empathetic with the situation, but you also need to be very transparent and upfront. It is saying, "Look, here’s where the bug is at this point. It is something that may or may not get fixed. Here’s the timeline."
With many of our customers in the Commercial space, these customers also have account managers that are Microsoft employees that really serve as the liaison between support and the customer. Therefore, we do a lot of work with the account managers to help us on the customer maintenance side of things, when maybe the messaging isn’t as good as we’d like it to be.
However, frankly, all major problems end up getting fixed. They might not get fixed in the time frame the customer would like to see, but, between a hotfix, the next service pack, and the next version of the product, all issues do end up getting resolved.
Once again, it is really just dependent upon how many customers are having the same problem. Is it something that other customers could run into? And it is really about prioritizing, like you said before. I mean, the product group, while they have a sustained engineering team, has a limited number of resources, and they need to prioritize the big-ticket items that need to be worked on.
However, that skill of conflict management is one of the key pieces for successful support engineers, because, every day, when we answer the phone, when we respond to an email, that is generally that networking administrator’s worst day of the year. The servers are down. They are catching heat from their management. Something is going on, they are being yelled at. Our engineers deal with this on a day in and day out basis and as a result, they get very good at handling the heat.
: So one question I am curious about is, how do you deal with it when the support engineer knows that it is really a usability issue and not a bug? Sure, there is a sequence of phone calls and bug reports, but it is pretty clear that this isn’t a bug, it is more of an issue of usability. And it might be in the land of the undiscovered feature, right? If I remember, some person on the Office team told me that something like 90 percent of the feature requests for Office are already in the product, but nobody knows the feature is in there. So how do you people educate the product teams, and what does that flow look like? How do you finally classify that, "OK, this isn’t really a bug, it is an issue of usability"?
Britten: That is a good question.
Sean: Because obviously the product team, like everybody else, is blind to what they build, right? "Of course it is usable, because I built it." I mean, every developer has fallen into that problem, and we all have seen some pretty horrible UIs over the years. So how does the support team educate the product teams on this point?
Britten: Yeah. In the past, if you back up five years, 10 years, support was, as you said, an island. We were out here, taking all the heat, and there was limited interaction with the product group. About five years ago, things drastically changed. We started doing what is called supportability in our world. We review what the top issues are, how long it takes us to solve those issues, and how much it costs the company for us to support these issues.
So we came up with a list of the top 10 issues. We then present that back to the product group. Now, the interesting thing about that is the product group actually funds us to do support for their product. So, when they see that a problem in their coding is costing them $800,000, they take some notice and say, "Wow, OK, what does right look like? What are we missing here?"
Sean: Well, just a quick follow-up to that. What are some examples of things where you feel like you touched the product, and you made a real delta that the customers ended up benefiting from, but at the same time you folks probably never personally got credit for it?
Britten: That is right. I don’t have a list in front of me, but DNS is one that consistently scored high for Windows 2000 and even some into Windows Server 2003. And there was so much push and so much interaction that there were some major changes that happened later in 2003 and into 2008 that have drastically changed the way DNS works.
Another one I can give you that clearly had all the supportability people driving it was terminal server licensing. Terminal server licensing, when you go back to 2000 and, once again, kind of into 2003, was a complete and utter nightmare for something that should be very simple. Consistently, that silly process ended up top two, top three on our supportability list for Windows Server. And some very small tweaks based on advice provided by support completely changed this: it dropped out of the top 10, out of the top 20 in support issues, and saved the company a lot of money.
Those are two examples of technologies I know we beat on. However, on all the products out there, SQL, Exchange, Windows, SMS, MOM, they have supportability groups that review the top 10 issues. In fact, I was just in Redmond last week, looking at the SMS side, and they do cool things with it. I mean, they look at the top 10 issues for the previous version of the product, look at the top 10 issues for the current version, and review whether or not they fixed the top 10 call generators. If not, what do we need to do to make sure it is in the next version that we do? Support is a huge data mine, and we are just starting to realize the power of all the data that we have. We are a great voice of the customer.
Sean: One quick follow-up to that and then I’ll turn it back to Scott. I am curious because you mentioned that five years ago there was some type of a change, right? And all of a sudden, you folks were a lot more integrated. What drove that? I realize with all changes ongoing, there is probably stuff you can say and stuff you cannot, but I am sure somebody reading the interview is going to be like, "Wait, wait, ask him that." So what drove it at a high level?
Britten: I think that is fair. I wish I had some great and gory backroom drama to share, but if it happened, I just don’t know the details. In my opinion, you know, part of it was just the fact that, wow, especially on the Enterprise side, Windows Server 2000 started to really turn the corner and become a huge enterprise-related product. People were out there buying it and implementing it, and someone on the CSS side as well as, I can guarantee you, the Windows product group was like, "We need to tap into this."
Around this time, I was a support engineer and I suddenly realized that the product group was really listening. In all businesses you started hearing about some monthly conference calls where we were there to provide feedback on top issues.
It was a time when we were moving from being just a consumer, home user company to being a legitimate player in the enterprise. And there was a lot of pressure at that point to make sure these products were just better, and I think someone realized, "Wow, CSS has a lot of knowledge out there; we need to tap into it."
Scott: Well, how much now do you guys get involved in product development from the standpoint of, you guys are looking at the product while it is in development and saying, "You know what, if that error happens, there is no way we will be able to trace it back and figure that out. We need instrumentation here, we are going to need these things spit out to a log file, we are going to need these kinds of support features built into the product to be able to kind of quickly and efficiently troubleshoot things"?
Britten: That is a huge piece of our beta involvement. I know we talked about it before. For each beta, we dedicate engineers from our support teams to the pre-release product. Well, one of the ways they make an impact is by providing that ongoing feedback to the product group. They are not only working with beta customers, they are not only answering newsgroups, but they are also messing with the product itself.
So they file their own bugs. They have supportability war room meetings where they say, "Look, there needs to be some logging here," or "This error message does not make sense," or "If this component is blowing up, I need something to trace it back." That is generally our involvement.
We do spec review. So many of our engineers have been on betas before, they are hooked in with the product group contacts. As the next version of the product is being planned, our engineers get involved to review specs, and just generally throughout the beta process from the beginning of our involvement through RTM, we are providing that ongoing supportability feedback.
So not only are we always supporting our customers, but we also look to help the product group on the supportability side. It is like you said, I mean, they are focused on "Here is a new component, here is a new thing that customers can use," maybe not always focused on "How would I support that if it breaks?" Let’s be honest… if we don’t provide our feedback to the product group on what we need, it just causes more pain for us once the product is released.
Scott: One other question is, so a new product is coming out and how do you get support folks up to speed on it? In the open source model the development team in some cases is the support team, and there is a tight integration between that team and the open source community, so there isn’t really a need to educate the "support team" in this case. So how do the support engineers come up to speed on a given product?
Britten: This is another benefit of our involvement on the beta side. We get access to the source code. We get engineers playing with the product. We get that constant interaction with new builds of the product as they come out. As a result, months, in some cases years, before the product actually ships, we have experts on this product.
Now, our experts certainly might not be great technical writers, but we have a group called Global Technical Readiness. These folks are really program project managers who will say, "All right, we need Vista training for our support engineers." So they get their list of top issues. They get a list of new components, features, all of that. And they start mining for data using these beta resources to come up with, "Here’s what engineers need to know. Here are the troubleshooting pieces; here are the top issues we have seen with our beta customers. Make sure engineers know this and that."
Therefore, over the course of this beta cycle, we are also in the background writing our own training. So these engineers write the training and as RTM approaches, depending on the training strategy, they might train all their engineers or a subset of the engineers. However, the knowledge is based off spec; it is based off beta engineer experiences, and beta engineer customer interaction.
In addition, the product group, in recent years, has really stepped up to help with training. They will review our training. They will provide insights. They will attend beta deliveries of the training and help the trainer get comfortable with the material and the product.
In some cases, the product group will also do a session of live meetings on, "Hey, here’s how we intended you to support the product. Here are the supportability features that we thought would be helpful." Therefore, you get a lot of interaction with the product group as well. From the beginning of the product to the end, not only are we helping customers on the beta side, but also our selfish win is to get the training that we need.
Scott: So, let me ask the flip side of that, right? So, you mentioned Configuration Manager. You’ve got System Center Configuration Manager 2007 coming out. Everybody wants to learn it…
Scott: …because everybody is motivated. How do you incent people to keep wanting to support the previous version when there is something new, shiny, and interesting?
Britten: At least on the Commercial stage, I do not know that I have ever really run into that problem.
Britten: Just simply because, in the enterprise customer segment, what you generally see is not the consumer model where a product releases and everybody goes and gets the new version. Instead, it is usually a gradual uptake. So, say Configuration Manager. Right now, and I can give you the specifics on this one, it is 10% of our total SMS volume. So engineers get a couple of SCCM cases, but the majority of what they get is SMS 2003.
In our minds, that is beautiful. Because what happens is, slowly over time they can build their SCCM knowledge while still providing that long held knowledge on SMS. They can help those customers resolve their issues. Therefore, it is a slow mix that generally happens. I do not know, it is ‑‑ they are tech junkies. They love the new stuff, but I guess maybe we all know that we have to pay our dues with the old stuff as well. [laughing] I do not know.
Scott: OK. That makes sense.
Britten: That’s a good question but it never really happens quite that way on the Enterprise side of support.
Sean: What it sounds like for you folks is that, from an enterprise standpoint, you can point to pretty clear statistics on customers who are significant accounts. Or you could point to customers who, for lack of a better phrase, probably spend a fair amount of revenue, right? And it is probably easy to also look at the segment of the market and say, "We’re not going to give up 15% of SMS just because SMS, the latest version, is out" -- or in this case, obviously, the rebranded version.
Sean: Because you folks can point to real metrics in terms of your percentage of the number of people who are putting in support requests and things like that.
Sean: Thanks for taking the time to talk with us, Britten.
Britten: No problem.