Interviewee: Mark Osborne
- Challenges unique to building developer tools
- Changes to how Microsoft is now building its developer tools
- Using "Feature Crews" to build product
- Predictability, efficiency, and agility in software development
- Preventing "corner cutting" in development
- Defining and enforcing quality gates
- 600 features, and no single development methodology
- Managing dependencies
In this interview, Scott Swigart and Sean Campbell talk with Mark Osborne. Mark is an architect in Microsoft’s developer division, and focuses on the productivity of developers building Microsoft’s products.
Scott: Mark, thanks for taking the time to chat. Tell us a little about your role at Microsoft.
Mark Osborne: Sure. As an architect in Microsoft’s developer division, I’m responsible for the productivity of our developers. Basically, I think about what we can do at a systems and process level to make sure that the division runs smoothly and can get quality releases out on time.
Scott: Let me drill into that a little bit. When I talk to people at Microsoft, I get the impression that there’s an incredible amount of effort that goes into pretty much every line of code. Code gets written by a developer, it gets reviewed, and then there are all sorts of static analysis tools that get run over it, you have to make sure it doesn’t break the build, bugs are posted against it, and the developers have to be thinking about all kinds of things simultaneously.
They have to consider performance. They have to make sure there aren’t going to be localization issues with code. They have to follow security best practices, coding standards, and all sorts of things. Being a developer, and knowing the kind of code I write, I think, “Man, it must be really daunting to come on board and write your first line of code at Microsoft.”
Mark: I think there is a lot of stuff a developer has to think about. You mentioned pretty much everything that you have to think about when you’re actually writing code. We have lots of training courses and lots of best practices available within the company. The developers use those as resources so they know how to write great code. And of course we have lots of people with years of experience that know what to do and what not to do.
But at the end of the day, there’s this kind of tension between needing to get product out the door, and all the stuff that you have to do in order to produce a quality product. So at a divisional level, we’re really looking at what can we do to put a process in place that basically helps developers do the right thing.
This way when schedule pressure is applied to them, for example, they have a mechanism to push back and say, “No, actually I need to get the performance right. I need to get the security right. We’re going to have to cut features or we’re going to have to slip the release, or whatever, in order to be able to release with quality.”
Scott: One thing that seems unique to Microsoft, and unique to the developer division within Microsoft, is that you’ve got people building the technology that they then use to build the technologies, right?
Scott: People are working on the .NET Framework and using Visual Studio, but they’re building the next version of the .NET Framework and the next Visual Studio, which they can’t necessarily run as their development tool early in the process, but probably want to switch over to later in the process. How does that work if you’re building the foundation you’re standing on, to some degree?
Mark: [laughs] Yeah, we have to build a compiler to build the compiler. We have that kind of problem. We have a complicated build process because of that, but as much as possible we try to dogfood the environment that we’re building. So we try to build the tools with the most current version of the tools.
We have a process called LKG, which basically stands for Last Known Good: the stuff that we use to compile everything with, and all of the tools that we use, we recompile about once a month. This keeps us reasonably up to date, but also keeps the division from being shut down by a problem in the absolute latest build.
But generally, we’re pretty proactive about dogfooding our own stuff.
It depends which part of the product you’re working on. Some teams may need a second version of something. For example, the debugger team might have a second version of the debugger. If you break the debugger because you’re working on it, you need another version you can actually debug with. So there are a few teams that fall into that situation and need a safe version. But generally, people just use close to the latest stuff.
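(A minimal sketch of the LKG idea Mark describes, assuming a hypothetical toolset-build record; the names and structure here are invented for illustration and are not Microsoft’s actual build tooling.)

```python
# Hypothetical "Last Known Good" promotion rule: a new toolset build only
# replaces the previous LKG if it is fully healthy, so one bad build can't
# shut the whole division down. All names here are illustrative.

from dataclasses import dataclass

@dataclass
class ToolsetBuild:
    label: str
    compiles_clean: bool   # no build breaks
    tests_pass: bool       # verification tests all green

def promote_lkg(candidate: ToolsetBuild, current: ToolsetBuild) -> ToolsetBuild:
    """Return the toolset everyone in the division should compile against."""
    if candidate.compiles_clean and candidate.tests_pass:
        return candidate   # roll the division forward
    return current         # stay on the last known good toolset

current_lkg = ToolsetBuild("2007-12", compiles_clean=True, tests_pass=True)
candidate = ToolsetBuild("2008-01", compiles_clean=True, tests_pass=False)
print(promote_lkg(candidate, current_lkg).label)   # -> 2007-12
```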
Scott: How do you guys know that you have a good version? My understanding is there’s a build verification test that runs nightly on the stuff that got checked in, but then there are much more in-depth tests that the product goes through, and as a result of those tests it either gets labeled a broken build or considered a good one.
Talk a little bit about how long it is between the time that a piece of code gets checked in, and knowing whether you’ve got a good build that you can roll people forward to.
Mark: Yeah. We do have this concept of being able to get through the build clean without any build breaks, and then on top of that being able to get through a set of build verification tests. And those are intended to be a representative set of tests that cover most of the core functionality on the product, so that that build should be usable by anyone in any one of the product units in order to move forward.
Mark: So stuff at the periphery could be broken for a certain team, but you’re not going to take down a large amount of the division. We do a lot of work at the beginning of the product cycle to make sure the build verification tests are solid.
One of the big changes that we made at the beginning of the 2008 cycle was that we changed the way that we check in code. We used to have everyone check into a set of branches, and then those branches were regularly integrated into the main branch. What that essentially meant is you have in-flight code from a couple of thousand developers across the division all being squished together in main.
Even with build verification tests, trying to keep that stable was very, very difficult. It was just too much stuff moving at once. For Visual Studio 2008, we adopted a concept of feature complete and feature crews. Features are now developed in isolated feature branches. A team works to develop that feature, to test that feature, and there’s a set of about 15 quality gates that they have to go through before they can say, “This feature is complete.”
And only when that feature is complete can they actually check it into the product for integration into main. And in that way the mainline branches that we have for the product actually stay very stable, because what we have going in is only completed, well-tested pieces of functionality.
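(As a rough sketch of that check-in rule: the gate names and data shapes below are invented, not the division’s real tooling, and Mark mentions roughly 15 gates where only a few illustrative ones appear here.)

```python
# Illustrative only: "only completed, well-tested features reach main."
# The gate names below are a small, invented subset of the real set.

REQUIRED_GATES = {
    "functional_spec", "design", "test_plan", "threat_model",
    "tests_pass", "code_coverage_70", "static_analysis",
}

def may_integrate(feature: str, signed_off: set[str]) -> bool:
    """A feature branch may merge into main only when every gate is signed off."""
    missing = REQUIRED_GATES - signed_off
    if missing:
        print(f"{feature}: blocked, missing gates: {sorted(missing)}")
        return False
    print(f"{feature}: ready to integrate into main")
    return True

may_integrate("feature-A", {"functional_spec", "design"})   # blocked
may_integrate("feature-B", REQUIRED_GATES)                  # ready
```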
Scott: As a user on the outside, how would I perceive that change? When Microsoft was building Visual Studio 2005, you guys might have popped out a community tech preview, a CTP, every six weeks or so. And sometimes the CTP felt like it was very broken.
Scott: I think that’s because of what you said, right? It was a snapshot of what had been checked in. I remember people in Microsoft’s developer division messaging to the rest of the world, “OK. You’re going to notice a change with the 2008 product where the CTPs are going to be a lot more stable, but a feature that you maybe heard about, or are interested in, is just simply not in the CTP because it just hasn’t gone through all those quality gates yet.”
Scott: So do I have that right? Is that an accurate way to describe how it works?
Mark: Yeah. If you work back from the processes that we put in place, and the reasons we put those processes in place, one of our goals was to keep the main branch customer-ready at all times. That way we could just snap a CTP, and release it to customers, and the customers could have a good experience.
Mark: What was in a previous CTP should now not be broken, and the new features should come together in pieces.
Mark: You may not have an entire end-to-end scenario early on, but the functionality for a scenario would slowly come together in stable chunks over time.
Scott: I know that after 2005 shipped, Soma blogged about something called MQ, or Milestone Quality. It seemed like that’s where a lot of the reworking of the process happened.
Talk about how the process for building code in the developer division gets changed. Because obviously you can’t just change the process on 2,000 developers willy-nilly, or nobody could keep track of the current, correct way to do their job. So how do you go about changing the process itself, in a way that everybody can keep up with?
Mark: Yeah, you’re right. We went into MQ and we looked at what we’d done before, what had gone well, what had not gone so well, and we came out with three main goals. One is we wanted to be more predictable. Basically, being able to say what we’re going to do, and then actually go do it.
We wanted to be more efficient. We wanted to be able to push more work upstream and not rework stuff over and over again because we worked on quality too late.
Finally, we wanted to be more agile. We wanted to be able to be ready to ship when the time was right and not be held hostage by having hundreds of thousands of lines of code that needed stabilizing before we could ship.
So those were the main things that we were trying to achieve by the end of MQ. We went to the division and worked with the division to say, “Here’s what we’re trying to achieve. What changes would we have to put in place in order to be able to achieve that?”
Three things came out of that. The first is, don’t defer work. A lot of our problems with agility and predictability were because work was being deferred. We had an incredibly long bug tail. A lot of work was happening after we were supposedly code complete.
Mark: And then there might be a year of bug fixing and DCRs and that kind of thing. And another thing that we wanted to do, as I said already, is keep the main branch customer-ready. And we wanted to increase our investment in automation.
What you don’t do is dream all this up in an ivory tower and then try and dump it on everyone.
Mark: So we had a very inclusive process where we worked with the managers throughout the entire division. We came with some ideas and we worked with the management to refine those ideas. We also did benchmarking around Microsoft; we looked at what was happening elsewhere in Microsoft and brought some of those ideas in.
And we came up with three things. One was the way that we planned the products. We wanted less of a peanut butter approach, which is basically adding a little bit of functionality across the whole surface area. We wanted an intentional planning process where we decided what the value propositions for the customers would be, what those experiences would be, and we then broke those down into features.
I’ve talked about being feature complete before you check in, so we defined a set of quality gates, which is really a way of describing what it means to be complete. The feature crew concept we actually took from the Office product group and adapted it. The quality gate concept was something that Windows had started with and we adapted that. We merged those two together, added in some ideas of our own, and then worked with the division to roll that out. We did a lot of pilots during MQ to refine that as well.
Scott: So there are two pieces of terminology to drill into. Define for me what a feature crew is, and also define or give some examples of actual quality gates that stuff has to go through.
Mark: Yeah, sure. A feature crew is essentially a small interdisciplinary team that is assembled to write a feature and to push that feature to a completed stage. It would normally be a program manager, a set of testers—maybe five testers—and a set of developers—maybe five developers. They work together for somewhere between three and ten weeks to design, implement, test, and move the feature through all the quality gates.
The next question is, what is a feature? And for us a feature is…well, ideally it’s something that you can present to the customer; it’s a customer experience. But in reality, it’s an independently testable piece of work, because in some cases, when you start breaking things down, you may have a piece of infrastructure that is required by customer-facing features. That piece of infrastructure work itself is large enough that you might develop that as an independent feature.
A lot of work goes into the planning stage: breaking down what you want to present to the customer, how to arrange that as a set of features, and what the dependencies between those features are.
And then in terms of quality gates, we’ve basically got simple things like having the functional specification complete, which is basically defining what the feature is going to do; having a design so developers know what the architecture is going to look like—how they are going to design that feature, what the components are going to be; having a test plan and defining how that feature is going to be tested; and having a threat model—basically understanding what the security implications of that feature are and ensuring that it is secure.
And then we have things like being test-complete, which is basically that all your tests—all the ones that you have designed—have actually passed against the feature. Test-complete also says that your automated tests are hitting at least 70 percent of the code. And then we have ones around static analysis, PREfast, FxCop, and a number of other ones as well.
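(The test-complete gate Mark just described lends itself to a mechanical check along these lines; this is a hedged sketch with invented data shapes, not the division’s real gate tooling.)

```python
# Sketch of the "test-complete" gate as described: every designed test has
# passed against the feature, and automated tests cover at least 70 percent
# of its code. The inputs here are assumptions made for illustration.

def test_complete(test_results: dict[str, bool],
                  lines_covered: int,
                  lines_total: int,
                  coverage_goal: float = 0.70) -> bool:
    all_pass = all(test_results.values())
    coverage = lines_covered / lines_total if lines_total else 0.0
    return all_pass and coverage >= coverage_goal

results = {"open_designer": True, "edit_item": True, "undo_redo": True}
print(test_complete(results, lines_covered=820, lines_total=1000))  # True, 82%
print(test_complete(results, lines_covered=600, lines_total=1000))  # False, 60%
```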
Scott: So now you’ve gone through the new process from end to end. When 2005 shipped, you reflected on the process you had up to that point, kept the things that worked, changed the parts of the process where you felt things could be improved, and have now spent years building a new product using this new process.
Where are you at now? Are you back to that period of reflection and figuring out how to evolve the process again or…?
Mark: Actually, this time we’re very happy with the results. With 2008, we achieved the goals of predictability, efficiency, and agility that we wanted. There’s still room for improvement, but as far as the general process is concerned, we’re happy with the results. Our delta between when we wanted to ship and when we actually shipped is probably an order of magnitude better than it was for Visual Studio 2005.
This time we’re looking at keeping the same processes in place and just tweaking them and refining them and basically baking that into the culture of the division so they become the normal way of doing things.
Scott: It sounds to me like it’s a lot easier to—not to oversimplify it—to basically run a report and figure out how far through your development process you actually are. If you’re saying you know how many features you’re slated to build—and features can get added or dropped as a product moves along—and each feature has these quality gates, and you’re not deferring work, you can look at it and know where you’re at against the plan.
It isn’t a case where you’re saying, “Well, our feature’s done, but we’ve got all this testing to do, and all our unit tests are actually broken, so we don’t have the code coverage. But we think we’re done.” Right?
Scott: You’re not in that situation. It sounds like you really know better exactly how “done” everything really is, across the board.
Mark: Yeah, definitely. We had 600-plus features. We had, almost to a tenth of a percent, a very good indicator of how far through the project we were. We would have these planning milestones, where we would look at the features we intended to implement during that period, and then what we actually managed to implement, and then we’d look at what our forecast was, going forward.
We could use that data in order to re-plan, to reset expectations, to say, “Actually, we were being a little overly ambitious, and based on what we’ve learned, we need to cut some stuff that we were planning to put in the release if we still want to ship by this date. Or maybe we need to add a little bit more focus in these areas, because things aren’t going as well as we might want.”
So yeah, breaking it down into features, and actually knowing that a feature was done when it was checked in, gave us high-fidelity data around which to do re-planning, which definitely helped us to be more predictable and actually hit our RTM goal on time.
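(The arithmetic behind that kind of re-planning is simple once features are tracked as either checked in or not. The sketch below uses invented numbers purely to illustrate the idea.)

```python
# Illustrative re-planning math: a feature only counts as done once it has
# passed its quality gates and been checked into main, so percent-complete
# is unambiguous. All figures below are invented.

planned_this_milestone = 120
completed_this_milestone = 96

rate = completed_this_milestone / planned_this_milestone
print(f"Milestone completion: {rate:.0%}")            # 80%

# Naive forecast for the remaining plan at the observed rate; if the
# forecast falls short of the ship date, cut scope or shift focus.
remaining_planned = 180
print(f"Forecast: ~{round(remaining_planned * rate)} of {remaining_planned} remaining features")
```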
Sean: Based on the primary research we’ve done over the last nine months, if you talk to somebody in the open-source community, they would say, “Well, we have rigorous processes for checking in code too.”
But they differ, right? If you’re talking about security, people will talk about “many eyeballs” versus the Security Development Lifecycle. Around usability, if you talk to someone from OpenOffice, they’ll say, “It’s the voice of our users on the mailing lists and the forums.” And there’s a certain legitimacy to everybody’s argument.
But I’m curious, having been through the whole process, end to end, how much of your ability to push through a level of engineering excellence do you think was predicated on having a closed-source organization, where you can really kind of set down some pretty clear guidelines, and have people follow them? You want them to play ball, and you want them to feel like their voice is heard. But do you think the same kind of rigor could have been applied in an open source model, where you can scrutinize the code, but you can’t control the process that was used to create the code?
Mark: It’s difficult to say, because when you’re running a large division, command and control is not the best way of getting stuff done. You’re really depending on the energy and excellence of the engineers. What we’re doing is trying to put into place a system that encourages people to do the right thing. They already knew what the right thing was, but like I said, they’re under conflicting pressures.
The business pressures to get a product out at a particular time can offset an engineer’s natural, innate desire to do excellent work. A lot of what we did with the feature complete concepts, and the quality gates, was really around putting a system in place, at a divisional level, that allows people to live up to that expectation. I’m not sure how much being a corporate entity versus an open-source project plays into that.
Sean: Can you give me an example of a time you felt the processes you put in place were able to assist a developer, or a team, to say, “I know I’d love to have this feature in, but we’re going to have to put that off, or we’re going to have to delay it to a later CTP, because I just don’t really feel we’re hitting the quality mark.”
Mark: Code coverage is a good example. It’s very easy to skimp on the testing side of things. Testing tends to get squeezed during the product cycle. Being able to write all the automated tests and get the code coverage that you need can be very difficult. You’ve got all of these other things that you’re trying to balance. The normal thing that happens is development rushes ahead, and test gets left behind, right?
Mark: This happens all of the time. What happens in the structure that we have now is that there’s this feature crew, with some developers and some testers in it; if test gets left behind, the feature crew can’t complete. You essentially have five devs there who are stuck, and they can’t get out of this feature crew, because they haven’t reached their code coverage goals.
So what happens in that case is that rather than leaning on the test guys to do more automation at the end of the cycle, you start having the discussions at the beginning of the cycle about how everything’s going to get done, and how the feature’s going to be built to support automated testing.
You get load leveling. You actually get developers helping out to produce automated tests, and helping to get code coverage up. It’s a much smaller set of people. It’s not the entire division saying “We need to hit 70 percent”; it’s a group of ten people who are working closely together, saying “What could we do to get to 70 percent?”
When you’ve got a small number of people trying to solve a problem like that, it’s much easier to turn that around, find solutions, and be creative. You get this accountability for the quality of the feature that runs across the whole feature crew. Everyone pulls together in order to achieve that result.
Sean: Where did the Engineering Excellence team get its inspiration from? Was this driven by an internal feeling that things were not going right? When you had those initial meetings and you said, “We really need to fix this,” was there anything you were trying to model from?
Mark: Yeah. I think the real start of it was going through the Visual Studio 2005 development process and not feeling particularly excited about how it went at the end. The product that we produced was great, but we didn’t enjoy the process.
Sean: Right. The testers have knee bruises from being dragged across the finish line, and being told to eat pizza at midnight. You know what I mean? To get stuff done.
Mark: Yeah. So it’s really coming out of that and saying, “How can we do better? What can we do to really improve the game here?”
And I’ve got to give a lot of credit to Soma for sponsoring this effort, for actually stopping and saying, “We’re going to take three months out and think about this problem. We’re going to do an MQ. We’re going to invest in our engineering. We’re going to try and do better next time.” And it really is difficult, when you’re running a business, to say, “Let’s take three months out and work on improving our engineering processes.”
Scott: And you probably only know the answer to this anecdotally, but having been through the 2005 development cycle and having been through the 2008 cycle, do you feel like you guys made up that three months, just in having a smoother, more predictable process?
Mark: Yeah, we were always aiming for kind of a zero-sum game, which was to improve the process enough to at least get that time back. We built a schedule for VS 2008 that had a far more aggressive endgame. We closed down much more quickly than we did for 2005, and we were able to do that because of the investments that we’d put in up front.
We got a lot of stuff into VS 2008, and we managed to get that stuff in on time and not have this sort of unpredictable bug tail at the end.
Scott: One of the quality gates you mentioned was a threat model. And I think, OK, well, if you’re right-clicking in the IDE and that’s supposed to make a pop-up menu appear, is that the kind of thing that somebody actually has to do a threat model for? Or are there certain things that don’t really warrant certain steps?
Mark: The reality is, there are a thousand things you have to think about. And when we created the quality gates, it was really about questions like “Where do we really want to focus people’s attention? What are the mandatory things that we need to be successful, that are true for everyone in the division?”
So, really, the quality gates are mandatory for every feature, but the amount of effort that may go into satisfying a quality gate for each feature may be different, right? If you do the threat analysis on something that is deep in the user interface and is building on a bunch of framework code, then it may be quite easy to say, “Look, we’re not doing anything new or risky.”
Mark: We’re not producing any new threats. Threat model done, right? Whereas if you’re introducing a new wire protocol, then you’ve got a lot of work to do. But the important thing is, by having it on a list, by actually having a person have to go and check the box and say “Yes, I’ve done due diligence on this line item,” it doesn’t get missed for the features where it’s critically important.
Scott: Do you think there are cases where people were working on a feature and they thought, “There’s zero risk on this. This thing doesn’t listen on the network, it doesn’t parse a file. It doesn’t take user input, no threats.” But then when they actually got to that stage in the process, somebody said “Well, but what about this?” And people went “Oh, yeah, yeah, I guess there is an attack vector here that we maybe wouldn’t have caught if we hadn’t stopped to think about it.”
Mark: Yeah, and I think that’s really the point—to make sure that people at least sit down and think about it once. They don’t just sort of push it to the bottom of their priority list and never quite get to it. And also, especially for the threat models, we had a separate process to go and identify all of the features that had an elevated risk.
The ones that had bytes on the wire, and that kind of thing. We did an extra deep dive into the threat models for those features, to make sure that the due diligence had been done, and things were going to be secure. So we had that second level. Again, because we had this inventory of features, it was possible to do that process.
Scott: Like you said, there are a thousand things that people need to think of. When a developer comes in who’s never worked for Microsoft before, how do they get up to speed? I know that you guys have formalized training, but there’s a lot of on-the-job education that happens too. How do you get new people up to speed on the Microsoft way of building products?
Mark: I think it’s a combination of training, and the groups that they’re working with providing that peer feedback, code review, design review, and helping to provide them with the experience to do a good job. At a divisional level, we try and address the problems that are going to help the division to be more effective. But what happens at the team level we leave very much to the product units and the individual teams.
So we have a bar, we have a set of expectations that we expect them to meet, but how they achieve those expectations, how they run the teams, is really up to the teams. Just to give an example of what I’m saying here, a lot of teams use agile methods—Scrum and Test Driven Development—in order to run their feature crews, but at a divisional level we don’t mandate that they follow any particular approach. It’s up to that team to decide what the appropriate approach for the feature that they’re working on is.
Scott: Which to me is kind of fascinating, right? I mean, on one hand, it’s cool that you’ve got a lot of autonomy, and teams can figure out what they want. On the other hand, people get so religious about agile methodology or think test-driven development is the only way possible to build code. You have this product with 600 new features, thousands of developers, thousands of people working on it, and one team might have said, “I don’t know, waterfall’s always worked fine for us.” And another team says, “Test-driven development is the only way to go.”
At the end of the day, regardless of which methodology they’re using, they’ve all gone through the same gates to get those particular lines of code checked into the source tree and building on the mainline build.
Mark: We need a process that enables the division to be effective and meet its goals, but we also need to allow the individual developers and the individual teams, and the business leaders in those areas, to be passionate and creative, and not be held down by process. So at a division level, we don’t want to micromanage what’s happening.
We want to provide just enough structure that it all comes together nicely at the end, while allowing the teams to be creative. This is the developer division, and development methodology is something that people are excited about in this division, so we want to allow teams to experiment, to try out different techniques, different development methodologies, different tools. We don’t want to mandate a one-size-fits-all approach.
But at the end of the day, we need all of those features to come together and fit together and be able to ship at the same time. So that’s really what our goal was in building this process.
Scott: I know that Microsoft takes pride in eating its own dog food, right? For the 2005 time frame, you had some brand new types of products. You guys were rolling out a new source control system, for example.
You were rolling out kind of a new command-line build technology, MSBuild. You guys were rolling out work items and defect tracking.
Out of that stuff, work items, source control, and build system, what are you guys using? I realize that you have to have working tools to have a working build process, but at the same time, you want to be using your own stuff that’s in development, but you can’t really expect a source repository that’s in development to be bug free… So talk about that a little bit.
Mark: We try to dogfood that stuff as aggressively as possible. It’s my job to make sure that we strike an appropriate balance here. The division has to continue to be productive, and yet we want to be as aggressive as possible.
So in 2005 we had a good chunk of the division, maybe a third to a half, using Team Foundation Server source control, and the rest were using Source Depot. In the next product cycle, we’re going to have the whole division using Team Foundation Server source control. The way we used both was through mirroring, where we have certain branches mirrored between the two source control systems, which allowed us to exist in both worlds at the same time.
That allowed us to be aggressive about our dogfooding, but also hedge our bets a bit. As far as the work item tracking was concerned, we used a similar approach: we mirrored everything between TFS work item tracking and Product Studio, which is our internal bug-tracking tool. And that enabled people to exist in both worlds.
For the feature planning, we entered all of the features into TFS. We could record which quality gates each feature had completed; all of that was done inside TFS work item tracking. That system provided us with the ability to define new work item types, which we didn’t have with Product Studio. So we try to use as much of our own stuff as possible.
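(The mirroring Mark describes can be pictured as a two-way sync keyed on a shared identifier, with the newer revision winning. The field names and conflict policy below are assumptions for the sketch, not the actual Product Studio/TFS mirror.)

```python
# Loose illustration of mirroring work items between two tracking systems
# so teams can live in either one. Not the real mirroring infrastructure.

def mirror(system_a: dict[int, dict], system_b: dict[int, dict]) -> None:
    """Propagate the newer copy of each record (keyed by a shared id) to both sides."""
    for record_id in set(system_a) | set(system_b):
        a, b = system_a.get(record_id), system_b.get(record_id)
        if a is None:
            system_a[record_id] = dict(b)
        elif b is None:
            system_b[record_id] = dict(a)
        elif a["revision"] >= b["revision"]:
            system_b[record_id] = dict(a)
        else:
            system_a[record_id] = dict(b)

tfs = {1: {"title": "Feature X", "state": "Active", "revision": 3}}
product_studio = {1: {"title": "Feature X", "state": "Resolved", "revision": 4},
                  2: {"title": "Bug Y", "state": "Active", "revision": 1}}
mirror(tfs, product_studio)
print(tfs[1]["state"], sorted(tfs))   # Resolved [1, 2]
```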
Scott: It seems like one challenge with feature crews is dependencies. If my feature is dependent on your feature, but you’re not 100 percent done, then your feature doesn’t show up in the mainline build for me to build on top of. It seems like that would be a big challenge.
Mark: Yeah, the best approach to that is to identify those problems during the planning stage so that you can architect your features in such a way that it takes those dependencies into account. So either the thing that everyone’s dependent on is produced first, or the codependent components are being developed at the same time in the same feature crew.
If you can architect the order and the dependencies between the features correctly, then you can deal with the fact that the code has a lot of cross-dependencies between different components. If you fail to do that, because you didn’t anticipate it up front, then you end up using sort of backdoor mechanisms to share code, and then you have to synchronize when you check in the features.
We had a couple of instances where things were very complicated, and we had teams mixing code together outside of the checked-in product so that they could use each other’s interfaces, but then when it came to check-in, they had to stage the order so that one team went first and then the other team went afterwards.
I refer to those as coping mechanisms. Ideally, everyone plans this perfect feature crew model, where everything has nice dependencies and is done in serial and there’s no overlapping. And then there are these coping mechanisms which allow things to happen in parallel in real life when things don’t quite go perfectly, so there’s some built-in slack there that allows people to still get the job done even if they didn’t do a perfect job of planning.
Actually, this was a learning experience for us, so my hope is next time, when we’re doing the planning, we’ll think about these things more carefully and we’ll have fewer instances where we have to rely on coping mechanisms in order to get things done.
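(Ordering feature crews so that shared infrastructure lands before the features that depend on it is essentially a topological sort over the dependency graph sketched during planning. The feature names below are invented; a dependency cycle corresponds to the case where the codependent pieces end up developed together.)

```python
# Minimal sketch: order features so every dependency is checked into main
# before the features that build on it. Names are invented for illustration.

from graphlib import TopologicalSorter, CycleError

dependencies = {                       # feature -> features it depends on
    "shared infrastructure": set(),
    "designer surface":      {"shared infrastructure"},
    "code generation":       {"shared infrastructure"},
    "end-to-end wizard":     {"designer surface", "code generation"},
}

try:
    print(list(TopologicalSorter(dependencies).static_order()))
except CycleError:
    # Codependent features: develop them in the same feature crew, or fall
    # back to the "coping mechanisms" (staged check-ins) described above.
    print("cycle detected")
```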
Scott: It seems like one of the natural outputs of this is that you guys would have to do more thought up front about the dependencies, because you wouldn’t be able to depend on partially complete code checked in.
Mark: Yeah, and that’s where it’s important to understand that a feature doesn’t have to be a complete chunk of user-facing functionality, right? You may want to slice it into little pieces so that you can start to satisfy some of those dependencies earlier.
Scott: Oh, sure. I imagine a lot of stuff in the .NET Framework itself is a feature, but it’s only code; it’s only an API that other stuff is going to use.
Mark: Yeah. If it’s necessary, you might even create a shim or a stubbed-out API so the other people can start coding against it, even though it’s not hooked up at the back end, and maybe another feature will then hook that up and complete the scenario.
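(A small, hypothetical illustration of that stubbed-out API idea, not actual product code: the interface is published with a throwaway stub so dependent crews can build and test against it, and a later feature crew supplies the real back end.)

```python
# Hypothetical stubbed-out API: the agreed interface ships first so other
# feature crews can code against it; a later crew replaces the stub with
# the real implementation to complete the scenario.

from abc import ABC, abstractmethod

class SymbolLookupService(ABC):          # the agreed-upon API surface
    @abstractmethod
    def find_definition(self, symbol: str) -> str: ...

class StubSymbolLookup(SymbolLookupService):
    """Placeholder so dependent features can build and run their tests."""
    def find_definition(self, symbol: str) -> str:
        return f"<definition of {symbol!r} not available yet>"

def go_to_definition(service: SymbolLookupService, symbol: str) -> str:
    # A dependent feature codes against the interface, not the stub.
    return service.find_definition(symbol)

print(go_to_definition(StubSymbolLookup(), "Main"))
```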
Scott: This has been a great conversation. Thanks for taking the time to chat.
Mark: Thank you.