In this interview, Scott Swigart, Sean Campbell, and Richard Bowler interview Michael Howard, a senior security program manager on the Security Engineering team at Microsoft and an architect of the security-related process improvements at the company. He is the co-author of several security books, including the award-winning Writing Secure Code, 19 Deadly Sins of Software Security, The Security Development Lifecycle, and Writing Secure Code for Windows Vista.
In this interview, Michael explains how Microsoft developed the Security Development Lifecycle, which has decreased the number and severity of vulnerabilities in the company’s products. Michael also directly challenges the notion of “many eyeballs” leading to secure code. Highlights include:
- With proprietary software, you can mandate the process.
- Security Development Lifecycle (SDL) was needed because of the pandemic of security vulnerabilities.
- Goals of the SDL.
- Quantifiable results of the SDL.
- There’s denial about the vulnerabilities in non-Microsoft products.
- As Microsoft products are hardened, hackers are going after softer targets.
- If you think “many eyeballs” equates to secure code, you’re out of touch with reality.
- Training is key for building secure products.
- The SDL can’t just ban things, it has to provide better alternatives.
- Microsoft’s central security team has stopped products from shipping because they weren’t secure enough.
- The Bill Gates memo that started it all.
- The next security frontier: social engineering.
Richard: Michael, thanks for taking the time to chat with us. I’d like to start by getting your take, if it’s applicable, on the advantages of open source or closed source in the context of security.
Michael Howard: I think to actually describe it as closed source or open source is really a bit of a misnomer. Really, it’s who controls the process and what the process is. And so, the implication is that closed source tends to have more of a documented process. Whereas the open source guys, perhaps, don’t. I’m not saying that’s true either way.
But if we’re looking at it from a Microsoft perspective, I think one of the advantages that we have here — a big, big advantage — is that we can mandate policies. “You will do this, and here are the criteria that you must meet before you can ship some software.”
The joke is, if you want to be involved in the Microsoft Salary Participation Program, you have to do this stuff. When people say to me, “How many security people have you got working on Vista?”, the answer is that it’s 6,000 people, because everyone has got at least a baseline level of security expertise. We enforce through education, and through the use of centralized tools and that kind of stuff.
With the Security Development Lifecycle (SDL), there are things that you must do. You can’t write code that uses banned functionality. And so on, and so on, and so on. We can enforce that.
I think that’s a big advantage that we have.
Richard: I think that’s the general perception, that process is more enforceable in a proprietary source environment than when you’re dealing with individual contributors in different locations.
Michael: I’ve talked to a few open source guys about what we do here and their comment was, “Well, that’s just draconian and stifling creativity.”
My response to that is, “No, that’s just discipline.” And fortunately, I think the days of software being written by cowboys are behind us. When I say “us,” I mean the industry, I don’t mean Microsoft, I mean the industry as a whole. I mean, you need discipline. And that’s what the process gives us.
Richard: So, about the SDL: can you give us some idea of what gave birth to a set of security-specific processes blended with the software development lifecycle that was already in place?
Michael: Five or six years ago, it was obvious to us that software development processes, regardless of who the vendor is, just didn’t give you secure software. We saw that by the sheer pandemic of security bugs across the industry.
Therefore, we needed to change the process, and we had three possibilities. One was to hope that the problem would go away, which was obviously not going to happen. Another one was to completely rip out the existing process and replace it with something brand new. Well, that’s not going to happen either, because, at the end of the day, Microsoft is actually good at shipping software. You can argue about whether it’s on time, that’s another matter, but the company as a whole is good at shipping software. That’s what we do. So, let’s not change that. The third option was to take some existing process that does ship software and augment it with security requirements.
That’s what really led to the SDL, which is ultimately just taking whatever process you currently have, and layering security requirements on top, with security-related deliverables along the way.
If you have a waterfall model or a spiral model or an agile model, it really doesn’t matter. SDL layers on top of your existing process.
Richard: It seems that Microsoft is being a little bit evangelistic about the SDL, trying to convince others to put it in place in their own development process.
Michael: Well, actually, the funny thing is, to answer your question directly, yes. But to give you the more long-winded version, we’re actually being asked by a lot of companies as well. “What have you done?”
And I think that’s really interesting to see. It means that 1) people are actually seeing that we’ve made some progress and 2) they also recognize that they can possibly make some progress too.
So, it’s actually both. We would like to just go out there and evangelize the stuff, but frankly, we’re also being asked the questions as well, which is absolutely wonderful.
Richard: The SDL seems focused on process to limit the number of security bugs that get written in the first place, instead of entering a test-and-fix mode at the end of the process. Can you talk about the relative success of that goal?
Michael: Actually, there are really two goals of SDL. Goal number one is to reduce the number of vulnerabilities in the code. And goal number two is to reduce the severity of what you miss. You’re never going to get all the vulnerabilities, ever.
The thing that sets security apart, more than other disciplines, is that the science is constantly evolving. The person on the other side of the table wants to take you down and will do whatever possible to take you down. It’s a very, very interesting field of expertise.
Getting back to the first goal, which is reducing the number of vulnerabilities in the code, there are really two parts to that. One is essentially going through all the code and just removing vulnerabilities.
For example, we went through the entire Windows Vista code base, looked for banned function calls, and removed those banned function calls from Vista.
It doesn’t mean there was a vulnerability there, but one thing I can tell you is that by removing them, if there was an undiscovered vulnerability there, there’s a good chance that it’s gone now.
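To make the banned-function sweep concrete, here is a minimal sketch of the classic case: the unbounded C string copy. The names strcpy and strcpy_s are the real CRT functions (strcpy_s is the Safe CRT replacement, standardized in C11 Annex K); the surrounding code is purely illustrative, not code from any Microsoft product.

```c
#include <stdio.h>
#include <string.h>

/* Banned: strcpy() happily writes past the end of dest if src is
   too long, exactly the kind of latent buffer overrun the sweep
   described above was hunting for. */
void copy_unsafe(char *dest, const char *src) {
    strcpy(dest, src);
}

/* Replacement: strcpy_s() takes the destination size and fails
   cleanly instead of corrupting memory. */
void copy_safe(char *dest, size_t dest_size, const char *src) {
    if (strcpy_s(dest, dest_size, src) != 0)
        fprintf(stderr, "copy failed: source too long for buffer\n");
}
```

Swapping every such call removes a whole class of potential overruns whether or not any individual call site was actually exploitable, which is the point Michael makes here.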
That’s the first aspect, just going in and cleaning up and re-reviewing code that may have been written 10 years ago, when the threat landscape was very different. The other aspect is to not add new vulnerabilities to the code. That’s why we have education; that’s why we have tools; that’s why we have bug testing. The whole point is to catch problems before they are added to the code and put into the software.
And, frankly, one of the biggest parts is just education, making sure people are aware: “Don’t do this, instead do that, if you’re writing this type of code. These are some of the assumptions that you need to be aware of about the attackability of your system. Here are the things that you should be aware of.”
Now, that by itself, just raising awareness and holding people accountable for it, exponentially increased the quality of the code. That includes producing fewer bugs. I’m not going to say that we’ll ever get to zero, I’m not going to go there at all. But we have seen a huge improvement.
Scott: I guess, talk a little about that. What, quantifiably, have you seen being the results of SDL?
Michael: The cynic in me says I can always point to a pathological case. I can always show pathological data that will prove that anything doesn’t work. It’s really about a bell curve of products. Some products have done exceptionally well; some products still need to make improvements.
One of the products that has done exceptionally well is our web server, IIS. When you look at IIS6, it has had two security bulletins, and both of them are in components that are off by default.
Now what is fascinating about those two bugs is both of those bugs also affect IIS5. In IIS5, the vulnerable features are on by default. In IIS6, they’re off by default. One of the goals of SDL is to reduce the severity of what you miss. Well, both of those bugs are less severe in IIS6.
That is one example. Another good example is SQL Server. In Service Pack 3, they really went to town on that product, security-wise. And we have not issued a security bulletin in the SQL Server engine in over two years.
Last year, in the “Month of Browser Bugs,” IE7 came through completely unscathed. I can’t say the same for IE6 and IE5, but IE7 came through completely unscathed. And that is because of the security work we did. And I can attribute that completely to SDL. You look at Office 2007, and the current bugs that are affecting Office 2003 are just not affecting Office 2007. So there is a lot of progress being made.
Richard: I was going to ask you about whether there were bad practices in the past or whether it was just an absence of good practices. I think you pretty well made the case that it was just the absence of good practices, right?
Michael: I think an absence of best practices and a lack of awareness of the real issues. A lot of people are just not aware of the issues that are out there. And I think a lot of people believe that because they are running on a Mac or Linux they don’t have security bugs and everything is fine, and that it is just Microsoft with security bugs. Which is clearly not true.
Scott: People who read this will be sure that I’m saying this out of bias, but it does really seem that the latest Microsoft products are significantly more secure than the latest major open-source products.
Michael: Well, it’s funny, I listened to a podcast yesterday at the gym, and they got going on about how insecure Windows was and how they run Macs and they are fine. It’s like “Oh my God guys, have you been ignoring the security updates for your machine?” The sheer lack of recognition of the pervasiveness of security vulnerabilities across the industry just blows my mind.
Scott: Are you seeing any trends where as the Microsoft products become hardened that hackers are going after softer targets?
Michael: Yeah, you bet. Absolutely. We have been seeing that for about two years. The big problem is you don’t read about it, you don’t hear about it. We do, we watch what is going on in the industry, and we are seeing other systems being compromised regularly. Just because they are not getting as much press doesn’t mean they are not getting compromised.
But, you know, to be frank, if we get bad press, I don’t mind so much, because it means that people will know they have to apply the patches. I think that is critically important.
Scott: Playing devil’s advocate here, how would you respond if I came to you and said “If Microsoft really wanted secure products they would just open-source them and let many eyeballs look at the code.”
Michael: Right. I would say you’re not in touch with reality. Here is why: it assumes that, number one, you know what you are looking for. Number two, it assumes you want to do it. OK, let’s take both of them. So security expertise in general is pretty rare. There are not many good security experts out there.
At Microsoft we educate you; it is part of your job. Not everyone here is an alpha security geek, but one thing we do is raise the collective awareness of security issues across the board, and we teach people the core issues that they should be aware of. That is number one.
Number two, it assumes you enjoy reviewing code. I don’t know anyone who actually really truly enjoys reviewing code day in and day out. It is slow, it is tedious, and it is mind-numbing work. But here we can make it part of the job. We can say, “You have really got to review those 50,000 lines of code, it is part of your job.” We can make you do it.
Let’s say I want to play devil’s advocate back. If you say, “Open-source it to make it more secure,” then look at Apache 2.0. In the same time period in which we have issued two bulletins for IIS6, both in components that are off by default, Apache 2.0 has issued many, many more security updates: over thirty. Where is the reduction in security bugs?
By the way, for what it is worth, in the same time period in Apache 1.3 they issued half as many security bulletins, which means that Apache 2.0 is actually buggier than Apache 1.3. Where is the many eyeballs effect? I don’t see it, I don’t see it anywhere. I see people who believe in it but I see no evidence that it actually works, none whatsoever. And I challenge people with this all the time: show me the evidence that it works.
Richard: So you keep going back to engineering training. You have brought it up four or five times in passing. How important is that? How much of the gain from the SDL do you think is due to engineering training?
Michael: It’s huge. It substantially raises people’s awareness about what is required of them, and it also makes them more aware of some of the fundamental issues that are out there. The other thing is that it raises their expectations of how secure the software can be.
Scott: And with SDL too, you are not just talking about the coders. I mean it seems to me that security is now thought of all the way back to the very first sort of feature request or feature idea.
Michael: Absolutely, this applies to everyone who is in the engineering field. So we teach developers, testers, program managers, architects, the whole lot of them. You can’t just fix code and hope that the problems go away. I mean you could have a bad design. That is why we have threat modeling to help uncover those design issues.
So, the SDL isn’t just folks who are trying to get code right, it is the design as well, and testing. It is a whole cradle to grave commitment.
Richard: So I’m not a security expert, but I imagine the training boils down to best practices — things you don’t do in code that you may have done in the past, things that you must do in code… Is that accurate?
Michael: That is pretty accurate. There’s one typical failing of security people: they are very good at telling you what is broken, but not at telling you how to fix it. One part of the SDL is that if you want to put something into it, you have to show that it is broken, but you also have to provide a correct and appropriate way of doing things.
Imagine if we said “OK, these 10 functions you can’t use anymore.” Well that means that we have to come up with replacements for those functions that are actually better. You can’t just say, “You can’t use those functions,” you have got to come up with very, very prescriptive guidance about what you should replace them with.
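As an aside on how such a ban can be enforced mechanically: Microsoft published a banned.h header that marks the outlawed functions as deprecated, so the compiler itself flags any use. A minimal sketch in that spirit follows; the header name and the short list here are illustrative, though the #pragma deprecated directive is a real MSVC feature.

```c
/* banned_sketch.h -- illustrative; the real banned.h lists far more. */
#pragma once

/* Any call to these functions now produces compiler warning C4995.
   Building with warnings-as-errors (/WX) turns each use into a
   build break, so banned calls cannot slip through unnoticed. */
#pragma deprecated(strcpy, strcat, sprintf, gets)
```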
So really the SDL stuff has two parts to it. One is best practice; the other is requirements. If we detect you using something banned, we can stop you from shipping.
Richard: So when you are doing the code reviews, particularly on code that was written before the SDL was in place, I would imagine that what you’re looking for is the existence of those vulnerabilities you have identified since, based on the way the code is written?
Michael: Yeah, there is more to it than that but that is really a good first-order take.
Scott: Are there examples of products where the security people at Microsoft have actually said “This won’t ship”?
Michael: Oh, yeah. The good news is that the customers don’t see that. They may just see a product slipping past a date.
Windows Vista is a prime example. I mean, it took five years to get that puppy out of the door. I don’t know exactly how much time I can attribute being related to security, but it is a lot. It is a heck of a lot, because we wanted to do all the right things. There are some other products I don’t want to name. I don’t want to hide anything, but there are products that, yes, we have said, “You can’t ship. You have got to make these changes before you can ship.”
Scott: So it is way more than just advisory, there’s really teeth to it. It is part of the sign-off process.
Michael: If there are no teeth to it, then what is the point? You have got just bark and a lousy bite, so what is the point? Now don’t get me wrong, we don’t enjoy stopping products from shipping. Microsoft is in the business of shipping software, so we don’t go, “Whoo-hoo-hoo! Hey, we slipped another product! Whoo-hoo! Didn’t we do great!” and high-five each other down the hallway. That is not the case at all. It is like, “Man, you know, these guys can’t ship. They have got to go and change these things. It is going to add another month to the schedule.” It sucks. But, you know, I would much rather have that than have critical bugs that could affect customers.
Richard: I noticed that part of the idea is that you have a separate group in the company that drives the security processes rather than just maybe adding a person per development team to be the security expert. How important is it that that group is separate?
Michael: We have a central security team, and that is the team that I am in. It is a sort of consulting organization for the rest of Microsoft. We set the policy and we actually do the enforcement of that policy as well.
The implementation of that policy is done by the product groups themselves. If they were self-policing then they may cut corners. I am not saying we don’t trust them, it is just good to have a separate organization setting the policy and enforcing the policy and verifying that they are actually complying with that policy.
Richard: It kind of avoids a conflict of interest where you are getting pressure to make it secure at the same time that you are getting pressure to ship it. Do you find it that way?
Michael: Exactly, and that tension is actually healthy. I mean, yeah, there are fights every once in a while, but it is healthy.
Richard: I understand that in the SDL you have that final security review (FSR) step at the end of the process. The idea is that if you find a lot of vulnerabilities at that stage, what you want to do is feed back into the previous processes and make them better for the next iteration.
Michael: Well really there should be no surprise in the final security review. The job of the FSR is really to make sure you complied with the SDL. For example, there are no banned APIs, you have run all the tools, used the right compiler versions, your traps are in place, and the team is educated.
And on some of the big products, like Windows, Office, SQL Server, Exchange, and so on, we will also do penetration work, and we will do code review as well. We will go back to the threat models, look at the highest-risk components, and then do some penetration work on the code in those areas, just to have a look-see.
Once in a while we find something that you really should fix, but in general, today anyway, there are relatively few surprises in the FSR. But post-ship, if we find something, we feed that back in to the process, absolutely.
Scott: It seems like with security and with testing it is kind of a statistics game in terms of the number of bugs you find. There is sort of a “tip of the iceberg” phenomenon, right? If you found this many bugs in this much code, you can extrapolate a theoretical total number that probably exists in the product. One of the things I just saw was the count of vulnerabilities in Vista for the first 90 days, and how that count is thought to be indicative of the total number of vulnerabilities that will eventually be found.
Michael: I don’t flat out disagree with it, but I don’t whole-heartedly agree either. I think, because this is an evolving industry, it’s hard to really quantify Vista at this point. With that being said, if we came out the gate with a dozen security bugs, that wouldn’t bode well. Right now, I feel pretty good. We’re seeing bugs coming in affecting down-level platforms that are just bouncing off Vista.
Richard: It sounds like it’s more than just a policy, it sounds like Microsoft has gone through a cultural shift. What kinds of things do you need in the culture for SDL to be effective?
Michael: People will ask me, “How do you manage to be remotely successful with this stuff?” And I can point to one big thing — and that is Bill Gates.
If you’ve never done this stuff before, ever, then you’ve got lots of security bugs. That means that you’ve got to start fixing those bugs for the next version, and that means you’re going to slip your next version. Slipping means money; let’s be frank. If you can’t get the senior execs on board for this, all they’re going to see is their product slipping, and customers not paying for new features.
But Bill said, “You know what? We’re going to ship more secure stuff, and if that means slipping, so be it. It’s going to be painful, for the first five years or so, while all the products catch up. But, after that it’s going to be easier. It just becomes part of the ecosystem.” It’s just part of getting the job done.
About five years ago, after Bill Gates’ memo came out, we put everything on hold and put everyone in the company through security training. We also had VPs kicking off the training sessions to explain how important this was.
I’ll never forget one of the sessions that I gave to a bunch of developers with Rob Short, who was the vice president responsible for the Windows kernel. He said something that’s so true: “Why is security something that only the high priesthood can understand? This should be part of getting the job done. There is nothing special about security. We all know about performance. We all know about customer satisfaction; we all know about having usable features. Secure features are just part of getting the job done. There is nothing special about security.”
I think we’ve got to that point now at Microsoft, where security is just in the water. It’s just part of getting the job done. There’s nothing special about it anymore. I think across the industry, it’s still something special, and I think that is dead wrong.
You walk down the hallway, and you hear people saying, “So what’s the most secure way of doing blah?” And then someone will say, “Well, what are the threats that concern you?” Fantastic! That is exactly the right way to think about it. You can’t answer that question without knowing what the threats are. Once I know the threat, then that’s a question that can actually be answered. You get to a point where people don’t just have a knee-jerk reaction about the most secure way of doing things; they look at the actual threat they want to mitigate.
Richard: Has the application of the SDL in Microsoft brought in more of a culture of quality overall?
Michael: Yeah. You often find that if you fix a security bug, you probably made the code more reliable too.
People sometimes worry about making security changes because they think there may be performance degradation. One thing that we’ve learned is: don’t assume there’s performance degradation. Measure it before you jump all over it. The Application Compatibility guys often think there are going to be compatibility issues when they make security changes. Let’s measure it! There are a lot of assumptions about security that turn out to be wrong when you actually measure them.
That’s another thing that we’ve learned. We’ve learned to measure more before we rush into judgment. There were people in the past that said, “Well, you can’t replace these banned functions with these safe functions, because the product will just run dead slow.” OK. We’ll do it, and measure it.
Afterwards, they say, “OK. No perceptible difference.”
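In that spirit, the “measure it” step can be as simple as a micro-benchmark. Here is a minimal sketch comparing a banned function against its safe replacement, assuming MSVC’s strcpy_s; the buffer size, test string, and iteration count are arbitrary choices for illustration, and real numbers will vary by compiler and hardware.

```c
#define _CRT_SECURE_NO_WARNINGS  /* allow the strcpy baseline under MSVC */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define ITERATIONS 10000000L
#define BUF_SIZE   64

static void with_strcpy(char *d, const char *s)   { strcpy(d, s); }
static void with_strcpy_s(char *d, const char *s) { strcpy_s(d, BUF_SIZE, s); }

/* Time one copy function over many iterations; crude, but enough to
   see whether the safe replacement is perceptibly slower. */
static double time_copies(void (*copy)(char *, const char *)) {
    char dest[BUF_SIZE];
    clock_t start = clock();
    for (long i = 0; i < ITERATIONS; i++)
        copy(dest, "a typical short string");
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void) {
    printf("strcpy:   %.3f s\n", time_copies(with_strcpy));
    printf("strcpy_s: %.3f s\n", time_copies(with_strcpy_s));
    return 0;
}
```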
Scott: One last question, what do you think is next on the horizon for security?
Michael: Social engineering — absolutely.
Richard: Social engineering? Define that a little bit. How do you see social engineering and security intersecting?
Michael: Well, I mean disclosure of data. Look at a phishing attack. You’re giving your data to a bad guy. Now the bad guy has your real password. So, just classic social engineering. The funny thing is that it’s really, really hard to solve it. You can do all the engineering that you want, but that won’t necessarily make any headway on social engineering attacks.
Richard: In other words you can harden the product all day long, but if you get somebody with knowledge to just hand it over, there’s only so much the product can really do about that?
Michael: Yeah. You can warn users, but users don’t like a bunch of dialog boxes. It’s a really interesting next area.
Richard: Thanks for taking the time to chat.
Michael: Thank you.