Interviewee: Justin Erenkrantz
In this interview with Justin Erenkrantz we talked to him about:
- The Apache Foundation.
- Project Management Committees and the Apache Foundation.
- Some of the reasons Justin feels Apache has been so successful.
- What other open source projects might look to Apache for in terms of inspiration due to Apache’s longevity.
- The Apache Incubator and its role as part of the Apache Software Foundation.
- What is on the horizon for the Apache Web Server.
Justin: So my name is Justin Erenkrantz. I’m currently president of the Apache Software Foundation (ASF), also on the board of directors. And I’ve been a contributor to the Apache HTTP Server, the Apache Portable Runtime, and Subversion, and some other projects for quite a while now.
Scott: Talk a little bit about Apache and talk about how it’s built.
Justin: The Foundation as a whole has over 50 different projects. There’s the Web server, Tomcat, SpamAssassin, Geronimo. There’s a whole variety of projects. So there’s an overall Foundation, with committers for each project. They’re relatively isolated. The fact that I have access and I work on HTTP Server doesn’t mean that I have access to, say, Maven.
Each project kind of gets its own merit. Our culture is a meritocracy, and so we expect people to show up on the public mailing lists and start contributing, and eventually they’ll be recognized. Eventually they’ll get a vote, and this vote will allow them to be able to commit (code). Across the Foundation, we have almost 1,600 committers that can commit to some of the 50‑something projects. And with that, you get to make changes.
You also get something called a veto, which is something that we can probably talk about a little bit later. That’s one of the core governance structures that we have. There are also the Project Management Committees, and those are the groups that are responsible for each one of those projects.
Above that is the board of directors. And in comparison with some of the other open source organizations, our board at Apache doesn’t get involved with technical details. We’re not going to get in and say, "Oh, you need to change this variable name." That’s not at all what we do. We’re just there to make sure the organization is running, make sure that everybody’s happy and getting along. You won’t see a director really getting involved in a technical discussion, unless they’re a part of that project to begin with.
Scott: Does the board provide more of a steering functionality, then?
Justin: No, not even that. We’re almost completely hands‑off. There are times when personality clashes happen in a project, and we try to mediate those situations. Again, we’re not going to make technical decisions.
There are some things that are centralized. So one of my responsibilities, as president, is I’m responsible for the day‑to‑day operations of the Foundation. We have a Subversion server. We have issue tracking. We have websites. All of that is centrally managed by our infrastructure, too.
Scott: One of the things you mentioned was votes and vetoes as the way the project is governed. So expand on that a little bit. How does that come into play?
Justin: For example, there’s the HTTP Server project. Within that, it’s governed by a Project Management Committee, a PMC. Looking at your past interviews, other people have mentioned this. This kind of structure has replicated itself on how we have it at Apache.
These are groups of committers who are responsible for the project. On HTTP Server, there are maybe about 60 to 70 people on the PMC. Every single one of those people has what we call a binding vote. The votes are used in two main ways.
The first way is for release. Within Apache, across any of the projects — this is one of the hard and fast rules — there must be three binding votes before it can be released. That means we must find three people out of 60 people on HTTP Server to say, "Yep, this is a good release. We’re going to put the Apache brand on it. It’s going to be the ‘Apache HTTP Server’." That’s the release part of the vote.
The other part is a veto. If even one of those 60 people says, "This change is bad, I’m going to veto it," that means the change doesn’t make it in.
One of the ASF’s founders, Roy Fielding, refers to it as a kind of shotgun. It’s kind of like, "OK, I’m done discussing this with you. You’re not listening to reason. Veto. Stop. We’re not going any further."
And it’s usually a last resort. Vetoes are uncommon. They’re not something that happens every week or every month. Generally, if there do happen to be a lot of vetoes, it means that people aren’t willing to compromise. So, that may be something where maybe the Board of Directors might say, "Hey, you know, we might keep a close eye on it. Is there anything you need to talk about, do you need any help to resolve this…"
But, generally vetoes are relatively rare but they do give a power to the 60 people to say, "You know what, there’s not going to be any change that I disagree with, there’s not going to be anything where I say ‘Oh my God, I can’t live with this change being made’." So that’s an enormous power and it’s given to the members of the PMC.
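The two voting rules Justin describes here can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the interview (at least three binding votes in favor of a release, and a single binding veto blocking a change). The names `Vote`, `release_approved`, and `change_accepted` are hypothetical, and so is the +1/0/-1 encoding of a vote. It is not a description of any tooling the ASF uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # assumed encoding: +1 in favor, 0 abstain, -1 against
    binding: bool  # True when the voter sits on the project's PMC


def release_approved(votes, required=3):
    """A release needs at least `required` binding votes in favor."""
    in_favor = sum(1 for v in votes if v.binding and v.value == 1)
    return in_favor >= required


def change_accepted(votes):
    """A code change is blocked by any single binding vote against (a veto)."""
    return not any(v.binding and v.value == -1 for v in votes)


if __name__ == "__main__":
    ballots = [
        Vote("alice", 1, True),
        Vote("bob", 1, True),
        Vote("carol", 1, True),
        Vote("dave", -1, False),  # not on the PMC, so not binding
    ]
    print("release approved:", release_approved(ballots))
    print("change accepted:", change_accepted(ballots))
```

The sketch models only what is said in the conversation. Anything else a real project might require before a release or a commit is left out.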
Sean: We found these nuclear options in place across many projects, but they don’t get used very often. Involvement in open source takes considerable time, and you do it because you believe in moving the project forward. It seems like going into this, people know that the only way to make progress is by consensus. No one person is just going to get their way. And if they really want to, they have another option, which is to go fork the source. But that has extremely high barriers also.
Justin: Yeah, absolutely. But as I said, generally the vetoes tend to be very rare. It’s almost like mutually assured destruction. That’s the point. One thing that has happened a couple of times over the 10 or 11 years of the HTTP Server Project, is where there were vetoes on both sides. "We’ve got to do it this way. No, we’ve got to do it that way". There was this huge flame war for a couple of weeks and then they finally said, "You know what, we’re not going to agree". They even had a telephone call. They were doing everything, it was just this big mess. And eventually they said, "You know what, we’re just going to leave it to a vote. We can’t agree, we need to move forwards, this is blocking us. OK, we’ll go ahead and resolve this, whatever way the vote may turn out".
Scott: Talk a little about the history of Apache. How did it get its start and what are some of the major evolutions it’s had getting to the present point?
Justin: Apache has its roots back in the early days of the World Wide Web. The story begins with the NCSA Web server from the University of Illinois at Urbana‑Champaign. They were running the NCSA Web server. Eventually, a lot of people left to go to Netscape. The NCSA code eventually got abandoned, more or less. I think there were nine people who found each other on Usenet and said, "Hey, I have a patch for NCSA. OK. Why don’t we start trading patches?"
They got together and they started to exchange patches and started coming up with a new version of this NCSA server. They started taking it in this new direction. They started saying "OK. Maybe we should get another group going." So they founded something called the Apache Group. It was an informal thing. They did that for about four or five years, starting in ’93 or ’92 (started in Feb ’95). Eventually they got to the point where other people said, "Hey, we like what you are doing."
By this point Apache had already gone through, [inaudible] a Web server, and made it up to version 1.2. They created The Apache Software Foundation in 1999, and started doing things besides just a Web server.
That was the start of the Apache Software Foundation. The early initial project was the Web server, and that is still what a lot of people think Apache is. Now you have close to 60 projects.
Justin: For the web server, more than anything, it’s been the way we designed and supported all the standards. And it’s free. That was the tagline that Roy Fielding had on his website for a long time, "Apache, the best web server money can’t buy."
One thing that really speaks well of our community has been the lack of forks. The community embraces anybody who shows up. The project has evolved and widened, from just the Web server itself, where people have wanted to do new things. People wanted to do an FTP server, a mail server. It can do all those things today. The community has been characterized by being willing to be open to just about anything.
Scott: It seems like open source projects that are modular do better because working on the core of an open source project might have a high bar. If it’s modular, you can write modules without going through the scrutiny of submitting code to the core. Modules give you a way to get your feet wet and participate. At the same time you need a really healthy community too. It is the personalities and way the governance structure is set up around it. It has to be really healthy as well. Just to say it back in my own words, those two things seem like they came together with Apache…
Justin: Yeah, and if you look at one of the key evolution points between the original NCSA server and Apache, it’s when (early Apache developer) Robert Thau modularized the whole thing one weekend. By and large, most of what he did 10, 12, 13 years ago is still present in the code base and technical architecture. By modularizing, he did a really good job of cleaning up the earlier NCSA code base.
Scott: So what is version 2 all about?
Justin: Version 2 was all about threading and portability. With Apache 1.3 they added Win32, Netware, and OS/2 support.
Version 2 started out with a number of internal forks. One of them looked at Netscape’s portability runtime. Other developers did their own portability library, implementing the same function on three platforms and hiding the implementation details.
There ended up being a licensing dispute with the Netscape/Mozilla guys that prohibited the Apache guys from using the NSPR runtime. That spawned the Apache Portable Runtime project. That’s a lot of different projects now, but if you look at why it happened, it had to do with the licensing issue. If you look at the Foundation now, I think one of the things we are well known for is the terms of our licensing. It’s a key differentiator from, say, the Free Software Foundation.
Scott: Right, right.
Sean: Apache’s been around for a really long time, and it’s obviously seen as one of the more successful open source projects, to say the least. What do you think other open source projects look to Apache for in terms of inspiration when they’re starting up?
Justin: I think by and large, what you see most people copying are the governance structures and the licensing. Those are two things projects have been copying. I think you can see that in Eclipse: they almost use some of the exact same terminology.
Scott: You still there?
Sean: One of the things you mentioned earlier that has always been intriguing to me, was portability. I can see that it’s really important for Apache to be portable between different Linux flavors, and maybe even be portable to embedded devices and things like that. How important is it, from a practical standpoint, that Apache runs on more than just Linux?
Justin: Extremely important. We have contributors who are only interested in supporting NetWare or Windows or OS/2, even BeOS in the past. It’s where we’ve gotten some of the diversity of the community. It’s a hook to get people into the community. "Here’s a little something I know about, I know my operating system, and I’ll contribute this patch. Hey, there’s something else that may not be platform‑specific."
Scott: Does Apache take the standpoint that it should run equally well across operating systems? OpenOffice, for example, wants to be pretty much the same OpenOffice regardless of where it’s running. Does Apache run differently depending on…
Justin: Absolutely differently. Basically, our approach is that whatever platforms people want to maintain, that’s what gets supported. By and large, on the Apache HTTP Server, we have one guy who does the Win32 port, and it’s been his baby for many, many years. There are other people who contributed a little bit to the Win32 port, but he’s the one person who has been responsible for it.
It’s not a dictate that, "Oh we have to support that." If someone is interested in supporting an OS, great! We’re not going to stop them, but it’s not going to be a mission statement, that we have to support all these platforms equally.
Actually, if you do look at our HTTP server mission statement, it says, "Apache HTTP Server Project is an effort to develop and maintain an open source HTTP server for modern operating systems including Unix and Windows NT." So, it’s in our mission statement, but the only reason it’s there is because we have the contributors to provide that support.
Scott: One other thing that varies from project to project is where the code comes from. If you take a look at MySQL, pretty much everybody working on it works for the MySQL company. If you look at other things like the Linux kernel, a lot of that comes from corporate developers: IBM, Red Hat and a lot of people. Do you have a sense for where the Apache code comes from? How much of it is from corporate‑sponsored developers versus the proverbial guy‑in‑his‑garage?
Justin: I think it comes from a wide number of sources. What you will see is that contributors remain the same even when they move from job to job. That’s definitely been the case within the HTTP Server, that’s been the case for some of these older projects as well. One day they may be working for IBM, the next day they may be working for Red Hat, and then they may be working for some other company. They may be working for Google, maybe doing it on the side, that’s what you tend to see. Some of these contributors may have started out working at Sun or HP, then they move but they’re still working on it. They still contribute to the project.
Scott: There are certain people who look at open source and they think it’s all written by people in their garages, contributing. Other people look at it and say, "It’s all written by people working for corporations." How important do you think big corporate sponsorship is to a project like Apache, and does that also create certain challenges for the project?
Justin: It’s a balance. You see some people who are getting paid to work on it. They work on it all day during normal business hours.
Then you see people who are the exact opposite, who may be working as a system administrator or something else, and the only time that they can work on it is on the weekends. So you see the overlap.
One of the key things in Apache, another quote from Roy Fielding is, "If it didn’t happen on the mailing lists, it didn’t happen." All of the discussions, all of the decisions, have to be made on our published mailing list. That allows people who may be in different time zones, or different work schedules, to coordinate through this mailing list.
They can read it during the day when they’re at work, during the night when they’re at home, whatever works for them. That way, decisions aren’t made in a face‑to‑face meeting, or a call, or on IRC. All the decisions have to happen on a mailing list.
Scott: So IBM, just to take a big company name, can’t get something into Apache just because they’re IBM and they want it. If one person out of sixty people vetoes it, it doesn’t really matter how badly a big company wanted certain code in, it’s not going in.
Justin: That’s right. The other aspect of it, the thing Apache has been addressing the last couple of years, is how new projects come into Apache through something called the ASF Incubator. This is about how they operate as an Apache project. They have to get all of the legal paperwork in place, so we can say, "Yes, we can release this under the common Apache license." That’s how we are trying to get new projects, and that’s why you’re seeing growth in the number of our projects, because incubator keeps spinning out new projects.
It’s always a concern that in order to graduate from incubator and become a full‑fledged project, you have to have diversity. Basically, you can’t have any one company dominate the project.
The rule we use, that you see pop‑up again and again in Apache, is the rule of three. There must be at least three committers that are diverse. The discussion that is going on right now is, "What is the definition of diverse?" An example: "Well, I work for IBM, and I work on this project full‑time, but there’s another guy from a completely different division who isn’t getting paid to do this who’s also working on it." Should that be counted as a separate individual? That is a discussion now. Some of these companies are so big, it’s like the old joke of, "Oh, you’re from London! You must know so-and-so."
Scott: In open source projects what gets checked in is the source code. With Apache, it looks like there’s this thing called the Apache HTTP Test Project. Is that essentially like a test suite for Apache?
Justin: Yes. Yes it is.
Scott: OK. And what’s that focused on? Is that mainly functional testing?
Justin: Yeah, it’s basically a Perl‑driven test suite, originally from the Mod Perl guys. They had this whole Apache test tool kit that they used as a kind of smoke test. And we said, "Hey, we’d like to take that." And we extended it from there.
Generally, what you’ll see is that people will use that as a kind of smoke test before they do a release. We talked earlier about how you need to have three +1 votes in order to do a release. But we haven’t said anything about how people make up their minds, and say, "Yes, release this." And so generally what people do is they run tests on their favorite platform.
One of the things that we did with 2.0, and still do to some extent, is "eat our own dogfood." In the early days of the 2.0 series for the Apache HTTP Server, we would say, "OK, we have a release candidate. We’re going to put it up on Apache.org. We’re going to go run it for 72 hours and it can’t crash."
Basically that was kind of another way of doing the acceptance testing. Saying, "OK, we can run it on a site that gets this much traffic. It didn’t crash so it’s probably going to be OK for you."
Scott: Are there tests specifically looking for vulnerabilities like buffer overruns, or is that really outside of the scope…
Justin: In the past when there’s been some type of buffer overflow, or some type of CVE vulnerability, generally you write and check in a test to make sure it won’t show back up in regression. I think that basically depends on if we can come up with an easily reproducible test case.
But I think you won’t see some test cases there that are typically for the vulnerabilities.
Scott: A while back we talked to Michael Howard, who’s a security guru at Microsoft. There’s a lot of things they do, but one in particular was banning certain APIs like strcpy because they were just inherently vulnerable.
Justin: Yeah. Basically we do some things and put them in Apache 2.0 with the APR path (code base).
If somebody actually tries to call these functions, it’s going to expand into, "Why are you trying to do this?" There have been some cases where we put it in the file to say, "Don’t do this. Don’t call this." Or say, "Oh you’re going to call this? Well then we’re going to redirect you to a safer version."
But, we don’t have the flexibility of say a Microsoft, and say, "Oh we want to have this new security API in the operating system." That’s not something that we have influence on.
We generally have to look into the constraints of the operating system and work with that.
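Justin describes doing this inside the C headers themselves, so that a banned call expands into a complaint or a safer routine. The Python sketch below shows a different and much simpler technique with the same goal: scanning C source text for calls to functions on a ban list. The function name `find_banned_calls` and the contents of `BANNED` are hypothetical and chosen only for illustration. The interview mentions strcpy as a function Microsoft banned, but does not list what Apache flags.

```python
import re

# Illustrative ban list only; not a list taken from the Apache code base.
BANNED = ("strcpy", "strcat", "sprintf")


def find_banned_calls(source, banned=BANNED):
    """Return (line_number, function_name) for each call to a banned function.

    This is a plain text scan: it does not parse C, so calls that appear
    inside comments or string literals are reported as well.
    """
    if not banned:
        return []
    # \b keeps names such as my_strcpy from matching; \s* allows "strcpy (".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, banned)) + r")\s*\(")
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = "char buf[8];\nstrcpy(buf, input);\n"
    for lineno, name in find_banned_calls(sample):
        print(f"line {lineno}: call to banned function {name}")
```

A scan like this runs outside the compiler, which is the main difference from the header-level approach described in the interview.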
Justin: Over the past few years, I think there have only been one or two cases where there were remote root exploits, and that speaks well for us.
Scott: When people are posting patches, are the security implications discussed on the mailing list?
Justin: Oh, absolutely. You’ll get people saying, "Hey there’s something with this vulnerability, or this will break this or that." So, yeah there’s this constant vigilance for the security.
Scott: Are there a lot of security-related tests that are put in proactively? I hear about things like "fuzz testing" and other proactive ways to probe the surface area for vulnerabilities. Does that kind of thing happen….
Justin: There are security product providers, Coverity is the one that pops to my mind. They’ll say, "Hey, we ran our tool on your code and here’s a report of vulnerabilities." We take a look at the reports and say, "Thank you very much." And then analyze them ourselves.
But it’s really triggered by what the committers are interested in. We’ll see committers who are very interested in conformance to the protocol specs. We’ll see people who are interested in security, people who are interested in performance, etc.
But we don’t tell the committers what to be interested in from the top‑down. It’s more like, "John is interested in security so he’s really focused on tying up all the security issues."
Scott: I’ve bumped into companies like Coverity that use open source to market their tools because open-source provides a large, free code base they can throw at their tool.
Justin: Absolutely. Yeah, I remember when we first looked at Coverity, the number of false positives was generally high. We’d look at the code and determine, "No, there isn’t a vulnerability there. What the tool is reporting can’t actually happen." There were maybe a handful of actual things where we said, "Yes this is an issue".
They may not have been as severe as what the tool was claiming, but we said, "OK, we’ll clean this up."
Justin: The IETF is forming a new working group to do an editorial revision of the HTTP stack. Their work might lead to the next generation of the HTTP protocol. And I think that is something that we will be very much involved with.
Scott: What about in the incubator?
Justin: Every Board meeting we’re graduating things like ServiceMix which is an ESB and component suite based on Java Business Integration. One that is going to be new, probably the next board meeting, is a standard C++ library, which we’re getting from RogueWave. There’s things like Abdera, which is an Atom feed. We’re seeing a lot of things like ActiveMQ which is event‑based messaging.
What you probably see a lot in Apache are low‑level infrastructure type things. You’re not going to see things like, say, OpenOffice. You’re going to see things that people can pick and choose to build larger applications. I think that’s what our niche really is.
Scott: What’s your sense for what happens to an Apache Web server release between the time you’re done with it and it gets distributed by a Red Hat, Oracle, Solaris, etc.?
Justin: We have contributors from Red Hat. A lot of the people who are doing distros and ensuring that it gets in front of the users are involved in our community. Our philosophy has been why are you making this huge patch set for this particular product? Get it upstream, get it back to us, we want to take it. I think generally, for the most part, you will see that there isn’t a lot of variation when it gets into the distribution because these people have been working with us.
Sean: Let me do a follow‑up on that. Some say the strength of a closed source project is that the company may be able to provide more of an integrated stack. You take a look at something Microsoft ships and they might say, "You should use this because it’s an integrated stack, and there’s a single vendor you go to for support." Obviously there are some projects that are tightly coupled together, such as Subversion and Apache. What would you say to that from the open source side?
Sean: It seems like in open source, the communities are not isolated. There’s a fair amount of core maintainer communication that’s going on.
Justin: Yeah, I think you see that. I think that’s why you see a number of committers in multiple communities. As you get used to it, you start to follow the dependency chain, and you get into those communities and say, "Hey, I just broke this for you over here, but here’s the patch to fix it." You tend to see a lot of that happening.
Sean: Are there things you see in other open source projects that look interesting and might influence Apache?
Justin: I like what Ubuntu has been doing, where they say, "We’re doing a release every six months. No matter what, we’re going to have a release." That’s a very hard thing to do. That requires some of the dynamics that Canonical has with their contributors. That’s something that I think they do really well.
Generally our philosophy has been, "we’re releasing when it’s ready", and some think that’s a good philosophy. You don’t want to promise something, but then you think, "Well it’s been so long since the last version." There are all these changes that sit there and keep getting improved upon. But if you have the regular release cycles, I think that’s a good thing.
Scott: Justin, we’re out of time, but this has been a great conversation. Thanks for taking the time to chat with us.