<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss/"
>

<channel>
	<title>How Software is Built &#187; databases</title>
	<atom:link href="http://howsoftwareisbuilt.com/tag/databases/feed/" rel="self" type="application/rss+xml" />
	<link>http://howsoftwareisbuilt.com</link>
	<description></description>
	<lastBuildDate>Fri, 25 Jun 2010 19:53:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<copyright>2006-2007 </copyright>
	<managingEditor>scottswigart@technologyevangelism.com (How Software is Built)</managingEditor>
	<webMaster>scottswigart@technologyevangelism.com (How Software is Built)</webMaster>
	<ttl>1440</ttl>
	<image>
		<url>http://howsoftwareisbuilt.com/wp-content/plugins/podpress/images/powered_by_podpress.jpg</url>
		<title>How Software is Built</title>
		<link>http://howsoftwareisbuilt.com</link>
		<width>144</width>
		<height>144</height>
	</image>
	<itunes:subtitle></itunes:subtitle>
	<itunes:summary></itunes:summary>
	<itunes:keywords></itunes:keywords>
	<itunes:category text="Society &#38; Culture" />
	<itunes:author>How Software is Built</itunes:author>
	<itunes:owner>
		<itunes:name>How Software is Built</itunes:name>
		<itunes:email>scottswigart@technologyevangelism.com</itunes:email>
	</itunes:owner>
	<itunes:block>no</itunes:block>
	<itunes:explicit>no</itunes:explicit>
	<itunes:image href="http://howsoftwareisbuilt.com/wp-content/plugins/podpress/images/powered_by_podpress_large.jpg" />
		<item>
		<title>Interview with David Campbell &#8211; Technical Fellow &#8211; Microsoft</title>
		<link>http://howsoftwareisbuilt.com/2008/01/04/interview-with-dave-campbell-technical-fellow-microsoft/</link>
		<comments>http://howsoftwareisbuilt.com/2008/01/04/interview-with-dave-campbell-technical-fellow-microsoft/#comments</comments>
		<pubDate>Fri, 04 Jan 2008 01:03:26 +0000</pubDate>
		<dc:creator>campsean</dc:creator>
				<category><![CDATA[Sean Campbell]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[David Campbell]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[security development lifecycle]]></category>
		<category><![CDATA[sql server]]></category>
		<category><![CDATA[strategy]]></category>

		<guid isPermaLink="false">http://howsoftwareisbuilt.com/2007/12/19/interview-with-dave-campbell-technical-fellow-microsoft/</guid>
		<description><![CDATA[Interviewers: Scott Swigart and Sean Campbell Interviewee: David Campbell In this interview with David Campbell we talked to him about: His background with the SQL Server team and Microsoft What about planning, implementation, testing, delivery, and servicing do you think are unique to a product as high profile and critical as SQL Server. The Security [...]]]></description>
			<content:encoded><![CDATA[<p><b>Interviewers:</b> <a href="http://howsoftwareisbuilt.com/about-scott-swigart/">Scott Swigart</a> and <a href="http://howsoftwareisbuilt.com/about-sean-campbell/">Sean Campbell</a></p>
<p><b>Interviewee:</b> <a href="http://howsoftwareisbuilt.com/david-campbell-technical-fellow-microsoft/">David Campbell</a></p>
<p>In this interview with David Campbell we talked to him about:</p>
<ul>
<li><a href="http://howsoftwareisbuilt.com/2007/12/19/interview-with-dave-campbell-technical-fellow-microsoft/#DavidBio">His background with the SQL Server team and Microsoft</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/12/19/interview-with-dave-campbell-technical-fellow-microsoft/#SQLUnique">What about planning, implementation, testing, delivery, and servicing do you think are unique to a product as high profile and critical as SQL Server.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/12/19/interview-with-dave-campbell-technical-fellow-microsoft/#SDL">The Security Development Lifecycle and SQL Server.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/12/19/interview-with-dave-campbell-technical-fellow-microsoft/#Usability">Usability and SQL Server&#8217;s Development.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/12/19/interview-with-dave-campbell-technical-fellow-microsoft/#OpenSource">His experience with OpenSource databases.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/12/19/interview-with-dave-campbell-technical-fellow-microsoft/#OpenSourceDB">How much the experiences of MySQL and other open source databases have affected SQL Server&#8217;s development.</a></li>
</ul>
<p><span id="more-120"></span></p>
<p><b>Sean:  </b>Tell us a bit about your role at Microsoft    with the SQL Server Team and as a Microsoft Fellow.</p>
<p><b><a name="DavidBio"></a>David: </b>I have been working on SQL Server for roughly    13 years in a variety of roles but my passions are in systems and product    development. Currently, I am heading up a team we call “Strategy,    Infrastructure and Architecture” – SIA for short but I’ve worked on SQL Server    as an SDE (Microsoft lingo: SDE = Software Development Engineer), Development    Manager, Product Unit Manager and General Manager overseeing a number of    component groups within the product.</p>
<p><b><a name="SQLUnique"></a>Sean: </b>SQL is a product upon which enterprises bet    their business.  So much of what’s mission critical depends on SQL Server.     What about planning, implementation, testing, delivery, and servicing do you    think are unique to a product as high profile and critical as SQL Server?</p>
<p><b>David: </b>In the business world, enterprise database    products define “mission critical”. A crash in many line of business    applications may result in a disruption of service until the server or    application is restarted but a severe crash in a database server resulting in    data loss can cripple a business. I learned this lesson the hard way before    coming to Microsoft when I worked on database systems at Digital Equipment    Corporation (DEC). One day I received a page from our support staff and found    that a product defect had resulted in a production line shutdown for a major    semiconductor manufacturer. They had to send folks home; some of the physical    equipment suffered damage as a result of stopping the production with partially    completed batches of material and they reminded me that they were losing    roughly a million dollars an hour in business while things were stopped. </p>
<p>Getting to your question might require a little background    on SQL Server. Some folks might know that Microsoft hired a bunch of people    from the database industry in the early to mid 1990’s to work on SQL Server as    Microsoft tried to become a player in enterprise database systems. We acquired    a source license for Sybase 4.2 and shipped two versions, SQL Server 6.0 and    SQL Server 6.5, on that architecture. We hired a number of query processing experts    and they started building a completely new query processor from a clean sheet    of paper. I worked on the storage engine team and SQL Server was getting beaten    up in the market around this time since we didn’t support row level locking  as    the Sybase architecture we inherited only supported page locking. I was    responsible for the design of the row level locking feature for SQL Server 7.0    and the more I dug into the Sybase architecture the more challenging the design    became since the entire Sybase transaction and recovery system was predicated    on page level locking and it was very difficult to do a clean row locking    design without a number of major compromises. After a number of sleepless    nights and arduous design meetings we ultimately came to the conclusion that we    would have to rewrite much of the Sybase storage engine to do this feature    correctly. As part of this major architecture shift we made a decision to    change the on disk format for SQL Server 7.0. This meant customers would have    to unload and reload all their data as part of the upgrade to the new release.</p>
<p>So, now we have the context to start answering your original question. We started what would become SQL Server 7.0 with 2 strikes against us: It was really going to be a V1 product with a V7.0 name and we would require every customer to completely unload and reload their data to migrate to the new product. We knew that poor quality would mean strike three. Since we had a number of people that had experience building enterprise database systems we weren’t lacking in design knowledge so the real success factors for the release came down to doing a great job of architecture, implementation and validation. On the validation front we did a number of interesting things that could probably fill a book but I’ll highlight a couple of them here. </p>
<p>Given that we were going to make everyone migrate their data we knew that we needed to make the data migration process both high performance and rock solid. We engaged our sales force to ask our customers to help us by sharing their databases so we could convert them to the new format in the lab. We called this the 1,000 DB challenge and we wanted to get 1,000 real customer databases of all different sizes and complexity so we could run them through a database conversion lab that we created. The other interesting thing we did in this space was to write a playback system. This consisted of a capture utility that would log the customer activity on a production database and then a playback utility that would allow us to play back the actual customer workload against the customer database in our lab. We could play the workload back in “real time” which included the dwell and think time between queries, or “compressed” where we just jammed the queries into the server as fast as we could. In this mode we could stuff a day’s worth of work into the system in an hour or so and really stress the server. We’d ask customers to take a backup of the database that corresponded to the workload and to capture the actual work using the capture utility and then send us the database and capture log. This data allowed us to test the conversion of the database from the old version of SQL Server to the new version and also let us test the new version of SQL Server by replaying the actual customer workload on the new system. Later on we started to capture query performance of replay on the old vs. new version of the product to find performance bugs. </p>
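<p>The difference between the “real time” and “compressed” playback modes David describes can be illustrated with a minimal Python sketch. This is not the actual capture or playback utility; the log format, the <code>replay</code> function, and its parameters are assumptions made up for illustration.</p>
<pre><code class="language-python">import time

# Hypothetical capture log: (seconds since capture start, SQL text).
CAPTURE_LOG = [
    (0.0, "SELECT * FROM orders WHERE id = 1"),
    (2.5, "UPDATE orders SET status = 'shipped' WHERE id = 1"),
    (3.0, "SELECT COUNT(*) FROM orders"),
]


def replay(log, execute, compressed=False, sleep=time.sleep):
    """Replay captured statements in their recorded order.

    Real-time mode waits out the recorded gap between statements (the
    dwell and think time); compressed mode issues them back to back.
    Returns the total time spent waiting, in seconds.
    """
    waited = 0.0
    previous = None
    for offset, sql in log:
        if not compressed and previous is not None:
            gap = max(0.0, offset - previous)
            sleep(gap)
            waited += gap
        execute(sql)  # stand-in for sending the statement to the server
        previous = offset
    return waited
</code></pre>
<p>Passing a database client's execute function as <code>execute</code> would drive a real server. With <code>compressed=True</code> the recorded gaps are skipped, so a long capture collapses into however long the statements themselves take to run.</p>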
<p>The next interesting thing we did on the validation front came from Don Slutz, a long time database veteran that was working in Microsoft Research at the time. He wrote a program he called RAGS that was really a model based testing system that used the SQL language grammar and the schema of an existing database to generate bizarre, but syntactically legal, SQL statements and feed them into the query processor. Basically, he married the state domain from an existing database schema with that specified by the SQL grammar and was able to probe all the dark corners of the search space programmatically. He wrote an MSR technical report that is available on the Microsoft Research web site. The way this played out was pretty interesting. At first it was pretty easy for Don to crash the query processor. So he filed a bunch of bugs, the developers fixed a bunch of bugs then Don ran it again, etc. After a couple of iterations of this cycle RAGS needed to generate some pretty ugly queries to crash things. You’d wind up with these 5 page SQL queries that looked like random gibberish but were legal SQL syntax and the query processing team would spend a bunch of time figuring out what the query was supposed to do and then figure out why it crashed the system. Later on Don used RAGS against different versions of SQL Server and other database products generating queries over equivalent schemas and comparing both the results and the performance of the different products across a wide range of complex queries.</p>
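<p>The idea of combining a SQL grammar with an existing schema to generate odd but legal statements can be sketched roughly as follows. This is a toy, not Don Slutz's RAGS: the tiny grammar, the <code>SCHEMA</code> dictionary, and all function names are hypothetical, and SQLite (from the Python standard library) stands in for the query processor under test.</p>
<pre><code class="language-python">import random
import sqlite3

# Hypothetical schema: table name mapped to its integer columns.
SCHEMA = {
    "orders": ["id", "customer_id", "total"],
    "customers": ["id", "region_id"],
}


def expr(rng, columns, depth):
    """Expand the expression rule: a column, a literal, or a nested operation."""
    if depth == 0 or rng.choice(["leaf", "leaf", "op"]) == "leaf":
        return rng.choice(columns + [str(rng.randint(0, 9))])
    op = rng.choice(["+", "-", "*"])
    return "(" + expr(rng, columns, depth - 1) + " " + op + " " + expr(rng, columns, depth - 1) + ")"


def predicate(rng, columns, depth):
    """Expand the predicate rule: a comparison, or AND/OR/NOT over predicates."""
    if depth == 0 or rng.choice(["leaf", "bool"]) == "leaf":
        return expr(rng, columns, depth) + " = " + expr(rng, columns, depth)
    kind = rng.choice(["AND", "OR", "NOT"])
    if kind == "NOT":
        return "NOT (" + predicate(rng, columns, depth - 1) + ")"
    return "(" + predicate(rng, columns, depth - 1) + " " + kind + " " + predicate(rng, columns, depth - 1) + ")"


def random_select(rng, schema, depth=3):
    """Build one random SELECT over a table and columns taken from the schema."""
    table = rng.choice(sorted(schema))
    columns = schema[table]
    picked = rng.sample(columns, rng.randint(1, len(columns)))
    return "SELECT " + ", ".join(picked) + " FROM " + table + " WHERE " + predicate(rng, columns, depth)


def run_batch(count, seed):
    """Feed generated statements to an in-memory SQLite database.

    Returns how many statements executed without raising an error.
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    for table, columns in SCHEMA.items():
        conn.execute("CREATE TABLE " + table + " (" + ", ".join(c + " INTEGER" for c in columns) + ")")
    executed = 0
    for _ in range(count):
        try:
            conn.execute(random_select(rng, SCHEMA)).fetchall()
            executed += 1
        except sqlite3.Error:
            pass  # a real harness would record the failing statement as a bug
    conn.close()
    return executed
</code></pre>
<p>Because every statement is derived from the grammar rules and the schema, each one should parse and run; any statement that fails or crashes the engine is a candidate bug. Raising <code>depth</code> produces the longer, uglier queries that the later iterations David mentions would have required.</p>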
<p>I could list more if you guys want to write a book…</p>
<p><b>Sean: </b>It’s hard to imagine a product that has higher    security requirements than a database server.  It has to talk on the network,    and it has to be able to store sensitive information.  Since the introduction    of the SDL, SQL has seen a dramatic reduction in vulnerabilities.  How does the    SDL play out on a day to day basis?  How does it affect your architecture?</p>
<p><b><a name="SDL"></a>David: </b>Great question! To fully appreciate the change you have to understand that 10-20 years ago there was very little widespread security knowledge in the software development world. In some sense the environment didn’t require it; most systems weren’t interconnected and remotely accessible. This meant that security breaches typically required physical breaches and people had a model in their head for physical security. It was easier to understand locks on doors than buffer overruns. What is interesting about the SDL and SQL’s evolution is that we’ve gone from a period in which security review and validation was done mostly after the coding was done to today’s world where we formally design security threat models as part of the design process before writing code. We also have a great training program in place that everyone touching the code needs to take. We also have refreshers that developers must take for emerging threats. Additionally, we’ve developed a great set of static code analysis and run-time tools to avoid and detect potential security issues. In 2002 we had a “security push” where the entire development team stood down and reviewed every line of code in the product. We did a smaller push in 2004 for SQL Server 2005 and with our current release we have truly integrated the security best practices into the development process and don’t need a separate security push as security is simply part of our day to day process. One challenge in the security space is that the threats are constantly evolving so we can’t rest and, as long as the bad guys are learning new tricks, we need to up our game and have a process in place for rapid mitigation in the event of a new threat or vulnerability. </p>
<p>In terms of how security has affected our architecture we    are much more mindful of the threat environment each line of code is executing    in. For example, there is a small amount of code that performs the initial    client authorization before allowing a connection into the server. Since a    remote client executes this code pre-authorization, any remote code that can    access the port over the network can execute this portion of code.    Architecturally, we strive to keep this code to a minimum and it is very    thoroughly reviewed. Similarly, the security base is designed in a layered    fashion so you have smallish amounts of thoroughly reviewed code providing    services that other aspects of the security system are built upon.</p>
<p><b>Sean: </b>How would you respond to a statement like, “If    you really wanted to make SQL secure, you’d forget about the SDL and just    open-source it so that many eyeballs could look at the code.”</p>
<p><b>David: </b>I think there is some benefit to having more eyes on a piece of code. There is benefit from each person’s fresh perspective, benefit from varied knowledge, etc. However, eyeballs alone are not nearly enough. The SDL process evolution has been really interesting in that we have changed the culture, habits, and processes of every Microsoft developer in a way that is much more effective than having many otherwise competent programmers looking at the code. With SDL we have many eyes looking at threat models, many looking at the design of a security feature and ultimately, many looking at the code following a pretty rigorous and proven process. I think there’s sufficient independent objective evidence to say that SDL is working for us. If you have a few minutes go to the National Vulnerability Database at nist.gov and check out the vulnerability reports for SQL Server vs. other databases. I’ll warn you, if you try to search for Oracle flaws they are hard to find since every major and minor release of the Oracle database server is listed separately so you need to do a little aggregation to get the real picture. </p>
<p><b>Sean: </b>It’s one thing to design powerful functionality, it’s another to make it easy to use. Talk about how Microsoft ensures not just functionality, but usability.</p>
<p><b><a name="Usability"></a>David: </b>OK. Now you’ve hit on one of my personal    passions. I believe that many technologies go through stages of evolution as    they mature. I define three major phases – nascent, developing, and refined.    There are many examples that can serve as a lesson for software developers. </p>
<p>Consider televisions; thirty to forty years ago when TVs    were a nascent technology you almost needed to be a technician to own one.    Certainly, you needed to know how to take the back off and pull the tubes out    so you could go to Radio Shack and test them when one of them went bad.    Furthermore, there were knobs on televisions that were there solely because the    technology hadn’t matured to the point where they weren’t necessary. I often    ask audiences how many people miss the horizontal and vertical control knobs on    their TV and we’re getting to the point where many in the audience don’t know    that these knobs even existed to keep the picture from rolling and waving in    early TVs.</p>
<p>Twenty years ago TVs entered the developing age as they    became fully solid state. They were much more reliable and the technology    developed to the point where many of the knobs disappeared. In this phase they    were good enough for the masses. You didn’t need to be a technician to own one    but it was nice to have one in the neighborhood. Frankly, I sort of think this    is where PCs are today.</p>
<p>Today’s TVs are refined in that the user’s control surface    captures the user’s intent rather than exposing the control surface of the    underlying technology. For example, instead of fiddling directly with color    temperature, saturation and hue to adjust the picture, my TV has a control that    asks me if I want to watch sports, movies, or regular programming and adjusts    the color parameters accordingly. Furthermore, some TVs are aware of the operating    environment such as ambient light and compensate for that. You can do this same    thought experiment on automobiles, microwave ovens, etc. When viewed through    this perspective, most system software still has a long way to go to become a    refined technology. Of course, there are cases where these maturity phases    ripple and repeat through a single technology where advances happen in waves. I    think my new Smartphone is an example of that. It’s way more capable than my    previous cell phone but I never had to reboot my earlier one.</p>
<p>So, how are we doing on SQL Server? One simple example is the work that we did in the database engine in SQL Server 7.0 when we got rid of many of the knobs and made many of them self-tuning. SQL Server 6.5 had roughly 100 knobs whereas SQL Server 7.0, which was much more complex in many ways, had roughly 20. Many people thought that more knobs meant more control but, in reality, we found many systems were performing poorly in the field due to mis-configuration. We classified the knobs into those that should simply take care of themselves – things like the number of locks the server could allocate when it booted, the number of hash buckets in the cache manager, etc. These were our “horizontal and vertical control knobs”. For other knobs we set them up where the database server managed them by default but if an administrator wanted to impose constraints he could. The amount of memory allocated to the server was in this category; by default, SQL Server manages memory in cooperation with the demands in the operating system but you can set low and high watermarks if you want to. In other areas we used control theory and feedback loops to have the system adapt dynamically to the environment and control things based upon instantaneous system response. The adaptive systems work we did was really interesting in that existing control knobs were often static; a good example is “sort memory”. In many database systems before SQL Server 7.0 you set aside a portion of memory for sorting to perform queries and build indexes, etc. During times where you weren’t building indexes or didn’t have queries that required a sort that reserved memory was just lying fallow and wasted. Further, if you had a sort that needed a little more space than what you had reserved it would spill to disk because the sort wouldn’t fit in the reserved space – even if there was plenty of memory unused elsewhere in the system. 
In SQL Server 7.0 we created an internal memory broker that could use    server  memory for whatever purpose made the most sense over time so things    like the procedure cache, workspace, sort, and the buffer pool all cooperated    to use memory in the most efficient manner over time. The result was fewer    knobs and a more efficient system.</p>
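<p>The contrast between a static “sort memory” knob and a shared memory broker can be shown with a small Python sketch. It is a deliberately simplified model, not SQL Server's internal design; the class, the method names, and the megabyte figures are hypothetical.</p>
<pre><code class="language-python">def static_sort_spills(sort_need_mb, sort_reserved_mb):
    """Static knob: whatever exceeds the fixed reservation spills to disk."""
    return max(0, sort_need_mb - sort_reserved_mb)


class MemoryBroker:
    """Toy broker: all consumers draw on one shared pool instead of fixed slices."""

    def __init__(self, total_mb):
        self.total_mb = total_mb
        self.granted = {}  # consumer name mapped to megabytes currently held

    def free_mb(self):
        """Memory not currently granted to any consumer."""
        return self.total_mb - sum(self.granted.values())

    def request(self, consumer, want_mb):
        """Grant as much of the request as the pool can currently cover."""
        grant = min(want_mb, self.free_mb())
        self.granted[consumer] = self.granted.get(consumer, 0) + grant
        return grant

    def release(self, consumer):
        """Return a consumer's memory to the pool for other uses."""
        return self.granted.pop(consumer, 0)
</code></pre>
<p>With a 64 MB static reservation, a sort needing 96 MB spills 32 MB even when the rest of the server is idle (<code>static_sort_spills(96, 64)</code>). A broker with free memory can grant the full 96 MB and hand it to the buffer pool or procedure cache once the sort releases it.</p>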
<p>This was great work but we ran into a situation where we    were ahead of the market in many respects. DBAs were worried we were going to    put them out of a job. They thought they were getting paid, in some part, to    respond to their pager at 2:00 AM because a big batch job failed because they    hadn’t allocated enough lock blocks for the server. Our competitors also used    this maturation against us – things like, “How can SQL Server be a real    enterprise database system – it only has 20 knobs and our system has 500!”.    What’s interesting is that if you look at the major database systems they have    all made major investments in self-tuning and ease of use.</p>
<p>Market dynamics also demand that we make these systems    easier to use and self managing. As an industry we’ve expanded database    deployment from an installed base that was likely measured in the small    100,000s of units 20 years ago to one that is likely measured in 100,000,000s    of units today. They had better be easier to use than 20 years ago!</p>
<p><b>Scott: </b>How do customer needs and requirements make it    into the planning process?  How do you handle situations where the customer is    asking for the wrong feature?  (The customer asks for a setting so they can    tune X, and you realize that if a subsystem was redesigned, they wouldn’t need    to tune X)</p>
<p><b>David: </b>We have had many situations where customers have asked for features that they have seen in other systems that didn’t make sense for SQL Server. Often customers ask for performance features that other products have that may provide a large advantage in those products but, given the way that SQL Server is architected, these same features may provide little to no benefit on our architecture. One example is raw device and partitioning support. 20 years ago many UNIX systems didn’t have advanced I/O features such as the ability to avoid the file cache, scatter/gather I/O or great asynchronous I/O. In fact, many early UNIX file systems had 32 bit file offsets so the maximum size of an individual database file could only be 2 or 4 GB in size. NTFS, in contrast, supported 64 bit file addressing, great asynchronous I/O, and the ability to do unbuffered I/O from the start. So, whereas other products needed separate partitions over multiple files with an I/O thread per file to simulate asynchronous I/O – SQL Server didn’t need any of this. This didn’t prevent customers from asking though. Of course, once you get up to very large databases, partitioning makes sense for a number of reasons such as the ability to physically manage large tables and indexes in smaller pieces but the point is that SQL Server didn’t need partitioning to get I/O parallelism in the same way some other systems did. We had to educate many customers on these points so they understood how our architecture achieved the performance that other systems did but through a different architecture.</p>
<p>I’m also mindful that control doesn’t always represent    progress and often simplicity is the best approach – especially when it affords    an opportunity for the software to do a better job. I think one good example    involves a feature known as “tempdb in RAM”. Prior to SQL Server 7.0 you could    allocate a region of RAM to hold the temporary database which is used for    scratch tables and intermediate query results. In certain environments, placing    “tempdb in RAM” could provide a significant benefit given the way that SQL    Server 6.x was architected. Unfortunately, it was easy to under or over    allocate the amount of memory and wind up spilling to disk or paging    excessively. Another point is that the amount of RAM allocated to tempdb was statically    determined; once set you needed to reboot the server to change it. Since the    amount of space needed for tempdb varies depending on the workload, the optimal    caching strategy for tempdb is dynamic. We removed “tempdb in RAM” in SQL    Server 7.0 and did a number of other optimizations under the cover to better    manage tempdb pages in memory so the actual customer result was much, much    better across a wide range of scenarios for SQL Server 7.0 but customers who    had improved their performance 2-3x by using the tempdb in RAM feature in SQL    Server 6.5 screamed loudly. I finally wrote a long mail and included some    experimental results that proved that the new approach was better but I still    received hate mail for several years after that decision.</p>
<p> <b>Scott: </b>Areas like the developer division have    strived for greater transparency.  In open source, all development and    decisions are transparent.  Talk about how SQL Server views transparency during    development, and how you have to balance expectations from customers who want    full transparency, vs. not shooting yourself in the foot by disclosing early    and giving a closed source competitor like Oracle an advantage?</p>
<p><b>David: </b>You touch upon a very real challenge. SQL    Server has matured to the point where we do a very good job on the fundamentals    and, as a result, it’s more important that we listen to and work with our    customers to continue to produce a product that helps them better run their    business. We’ve talked mostly about the core relational database engine thus    far but today’s SQL Server includes data analysis, data mining, reporting,    enterprise class ETL, etc. The solutions we deliver in this space touch a    broader range of customers and the pace of innovation is much faster than that    seen in the core relational database engine. As a result, we need to be much    more in tune with our customers to produce the right product. We’re making some    real progress on including key customers and experts in our design process. For    some of our more complex SQL Server 2008 improvements we’ve done joint design    with key customers and MVPs and they’ve helped us make many of the tough    scenario and design tradeoffs required to deliver the right feature. Done well,    this reduces the need for iterative field testing to get solid feedback on new    features.</p>
<p>Disclosing early is a real risk and, yes, our competitors do    listen and respond. It’s funny, they pay much more attention to what we say now    than they did 10 years ago.</p>
<p><b><a name="OpenSource"></a>Sean: </b>What experience do you have with Open Source    Databases?</p>
<p><b>David: </b>Yes, I have experience in open source and    certainly follow the key open source databases from the perspective of    technology evolution and adoption. I don’t look at any source code from the    open source databases to avoid any potential IP issues. </p>
<p><b>Sean: </b>What do you think are some things that are easy    to accomplish in a closed source model that would be challenging in open    source?</p>
<p><b>David: </b>I think one thing is something I would call “consistency in the large”. This isn’t necessarily easy in the closed source world but in a coordinated engineering environment you can align things in a way that leads to a degree of consistency which creates customer value. For example, the products within Microsoft’s Server and Tools business have a set of “Common Engineering Criteria” (CEC) that we all follow. The criteria include things such as common processes and UE content that lead to a more uniform customer experience and aligned features, such as a Best Practices Analyzer or having a management pack for our management tools when we RTM. This sort of broad consistency doesn’t happen organically and is one of the challenges that large scale open source efforts have to wrestle with.</p>
<p><b><a name="OpenSourceDB"></a>: </b>How much is SQL development and features informed    by things like MySQL and DB2?  Are there features in SQL that were inspired by    these products?</p>
<p><b>David: </b>Certainly; and it goes both ways. Things like    persisted views have been done by all major database players. I can’t recall    whether Oracle or IBM did it first but now SQL Server, Oracle, and DB2 all have    a form of persisted views with varying degrees of updatability and query    matching sophistication. Oracle was the last major database player to have a    credible fully cost based optimizer. I would say that SQL Server led in the    advancement of real self-tuning and ease of use. Unfortunately we did it in    1998 and tried to sell it to a market that didn’t understand its value and IBM    did a great thing (for IBM) later on by coining the phrase “autonomous    computing” and selling the value. I don’t think MySQL has contributed too much    yet to the big 3 but I do like MySQL’s notion of installable storage engines    with implementations optimized for various scenarios. What’s interesting is    that SQL Server 7.0 was architected to support this concept with the OLE-DB    interface between the relational and storage engine but MySQL has really gotten    value from the concept.</p>
<p><b>Sean: </b>If you had the opportunity to borrow more from    the development model of the open source community when it comes to SQL Server    what are the top couple of things you would like to bring over to SQL Server    development that you see out there today in the Open Source community?</p>
<p><b>David: </b>In general I like the notion of tapping into a large development community’s collective energy. Different people have different passions and, if you can harness the energy effectively, it can lead to some interesting results. For example, if someone gets excited about adding a specific feature or fixing a particular bug they can often just make it happen. Of course, one challenge in the open source world is maintaining architectural and feature consistency. Smart people don’t always agree on what’s important for the customer or know how a particular feature should be implemented in a way that maintains the system’s design tenets. I think a key aspect of great design is in providing the most value with the simplest and most intuitive implementation and user model. The open source project maintainers have a very tough job keeping this in check.</p>
<p>Another aspect of the open source world which is powerful is    the ability to generate an ecosystem of tools and add-ons around a core    technology. This is an ecosystem effect that may or may not require open source    access to the core technology. For example, imagine if MySQL were a closed    source project but that it had a vibrant open source community building    installable storage engines for various scenarios. We are starting to do some    of this with SQL Server through Codeplex. </p>
<p><b>Sean: </b>How much do you think SQL Server’s development    has been impacted by Open Source development efforts in terms of how they build    community around various database products?</p>
<p><b>David: </b>Enabling the community to help the community    and giving the user base a direct voice to the development team is one of the    most exciting and powerful concepts to come out of the open source development    model. We have learned a lot from this and in our product review meetings we    regularly review input from our community including how they vote on design    change requests. Rather than us guessing or asking a small sample we can now    reach out to our community quickly and efficiently. Frankly the Internet and    access to our community have enabled us to produce a much better product.    Things such as Watson, SQM, and our community product provide us direct    feedback that allows us to respond in ways we couldn’t even imagine 10 years    ago.</p>
<p><b>Scott: </b>Give us some understanding of the structure of the SQL Server development team, with some high-level estimates of the number of testers, developers, and machines in the test lab, the length of time it takes test suites to run, etc. Just so folks can put the SQL Server development effort in perspective relative to other efforts they are familiar with.</p>
<p><b>David: </b>I do not have exact numbers in my head, but let’s just say we have hundreds of developers, hundreds of testers, and probably 10,000 machines in various test labs. We literally have millions of individual test cases, and the test system is highly automated. Many of our test machines are in offsite data centers, and we’re able to connect to and control them via IP KVM. We have evolved our test methodology to include much more model-based testing rather than individual test case generation&#8230; One interesting thing is that we completely revamped our development methodology between SQL Server 2005 and the current release. This was a huge cultural and process change, and we have learned a lot from this experience. Perhaps we can chat a bit about this next time.</p>
]]></content:encoded>
			<wfw:commentRss>http://howsoftwareisbuilt.com/2008/01/04/interview-with-dave-campbell-technical-fellow-microsoft/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Interview with Josh Berkus &#8211; PostgreSQL Core Team Lead &#8211; Sun Microsystems</title>
		<link>http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/</link>
		<comments>http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#comments</comments>
		<pubDate>Wed, 22 Aug 2007 16:35:17 +0000</pubDate>
		<dc:creator>campsean</dc:creator>
				<category><![CDATA[Sean Campbell]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[Josh Berkus]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[sun microsystems]]></category>

		<guid isPermaLink="false">http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/</guid>
		<description><![CDATA[Interviewers: Scott Swigart and Sean Campbell Interviewee: Josh Berkus In this interview with Josh, PostgreSQL Core Team Lead at Sun Microsystems Inc., we asked him about: How PostgreSQL fits into the landscape of open source databases New uses of PostgreSQL How new code and features are selected for PostgreSQL Quality control and maintenance of code [...]]]></description>
			<content:encoded><![CDATA[<p>
<p><strong>Interviewers:</strong><br />
<a href="http://howsoftwareisbuilt.com/about-scott-swigart/">Scott Swigart</a> and <a href="http://howsoftwareisbuilt.com/about-sean-campbell/">Sean Campbell</a> </p>
<p>
<p><strong>Interviewee:</strong> <a href="http://howsoftwareisbuilt.com/about-josh-berkus-postgressql-core-team-lead/">Josh Berkus</a></p>
<p>In this interview with Josh, PostgreSQL Core Team Lead at Sun Microsystems Inc., we asked him about:</p>
<ul>
<li><a href="http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#whatispostgres">How PostgreSQL fits into the landscape of open source databases</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#usesofpostgres">New uses of PostgreSQL</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#includedfeatures">How new code and features are selected for PostgreSQL</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#codequality">Quality control and maintenance of code</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#maintainers">Core maintainers and code contributors</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#coders">How contributors participate in PostgreSQL development</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#opensource">Differences between open and closed source database</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#variations">Variations between PostgreSQL products</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/#microsoft">PostgreSQL Windows users and Microsoft support</a></li>
</ul>
<p><span id="more-90"></span></p>
<p><strong>Sean Campbell:</strong> Josh, tell us a bit about your role with PostgreSQL.</p>
<p><strong>Josh Berkus: </strong>OK. Well, what I&#8217;m best known for is that I&#8217;m a member of the seven-member core steering committee for the PostgreSQL open source database project. I also work at Sun on PostgreSQL, and I&#8217;m the PostgreSQL lead, which is sort of a strategic and evangelism position in the database technology group at Sun Microsystems.</p>
<p>I&#8217;ve been a database applications developer since about 1994. I started out actually with desktop databases and moved to Microsoft SQL Server, and from there to PostgreSQL in 1998. For the last couple of years, I worked at Greenplum, a data warehousing company. I did a lot of consulting on database performance. Now I seem to spend most of my time going to conferences.</p>
<p><strong>Sean: </strong>Talk a little bit, if you would, about PostgreSQL and where it fits in the ecosystem, with open source databases like MySQL on one side and the proprietary Microsoft SQL Server and Oracle Database on the other.</p>
<p><strong><a name="whatispostgres">Josh:</a> </strong>Well, the way I like to describe PostgreSQL is we&#8217;re sort of the high end of open source databases: very SMP scalable; capable of running large, complex queries involving multiple subselects; and all kinds of SQL tricks. A lot of functionality is built into the database, including triggers and views and schemas, and the ability to write procedures in 11 or 12 procedural languages.</p>
<p>Mostly, people who look at PostgreSQL are also considering databases like Oracle or DB2 or SQL Server 2005 for their needs &#8211; large database installations involving possible terabytes of data and usually a dedicated database administrator.</p>
<p>That&#8217;s probably the majority of our usage. We also get a fairly substantial amount of sort of embedded usage. Not because PostgreSQL is really an embedded database, but because of our extremely liberal BSD open source licensing terms.</p>
<p>Something I would like us to be better known for is the extensibility of the database model. The origin of PostgreSQL goes back to 1986, which was the POSTGRES Project at the University of California &#8211; Berkeley. It was the second UCB database project. The reason it was called Postgres was it actually stands for post Ingres, Ingres having been the first UCB database project.</p>
<p>One of the principles that PostgreSQL was founded on was the idea of an object-relational database. That is, the database administrator or designer should be able to modify the behavior of database objects with code, or add their own new types of database objects. On a useful basis, that&#8217;s given us a whole bunch of exotic data types that are extremely useful for keeping certain specific and unusual types of data &#8211; genetic sequences, geographic data, or cryptographic data &#8211; that might be hard to store in the standard SQL data types, let alone to index or do anything useful with.</p>
<p><strong>Sean: </strong>Well, tell me a little bit about that, just to dive bomb for a second. Coming from a database background, there&#8217;s been a lot of discussion about this on the SQL Server side, progressively integrating additional data types, working in an XML data type, and so on.</p>
<p><strong>Josh: </strong>Right.</p>
<p><strong>Sean: </strong>But obviously, one of the challenges with something like this is making sure that it&#8217;s valid and useful for the community at large. I think the deeper you get into a particular data type, you really have to have it vetted by the community, to make sure it&#8217;s actually useful, and people don&#8217;t just go back and throw it into a more generic field type.</p>
<p>So, tell me a little bit about the process of getting something like that included in PostgreSQL, and maybe take the most interesting example you can think of. From soup to nuts, how did it get originally proposed, and then how did it get eventually integrated into the product?</p>
<p><strong>Josh: </strong>So an example of one, actually, that&#8217;s going to be fully integrated into the core code is the UUID data type. It&#8217;s coming out in version 8.3, which is the next release version, and will come out somewhere around October. </p>
<p>Now, I know that SQL Server has had UUIDs for a while. We&#8217;ve been actually putting off incorporating UUIDs and GUIDs, simply because there&#8217;s five or six competing ways to make up such a data type. </p>
<p>So, when UUIDs and GUIDs were proposed for inclusion in Postgres, one of the goals with that was to actually create a data type that would support the various different ways of forming a UUID, and yet have a compact representation as well as its own index types and operators.</p>
<p>That is, for example, equality for a UUID is the same sort of concept as it would be for an integer. Differential operators, like greater than, less than, and that sort of thing, actually don&#8217;t function the same.</p>
<p><strong>Sean: </strong>Sure.</p>
<p><strong>Josh: </strong>Because, among other things, UUIDs will contain information on which machine created that particular ID.</p>
<p><strong>Sean: </strong>Right.</p>
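<p>As a rough illustration of the UUID points above, here is a minimal Python sketch using the standard library <code>uuid</code> module. It is not PostgreSQL code, and the sample identifier is arbitrary. It shows two textual spellings parsing to one value, the fixed 16-byte representation, and the 48-bit node field, which in time-based (version 1) UUIDs identifies the generating machine.</p>

```python
import uuid

# Two spellings of the same identifier: braced with hyphens, and bare hex.
# The value itself is an arbitrary sample, not taken from any real system.
braced = uuid.UUID("{12345678-1234-5678-1234-567812345678}")
bare = uuid.UUID("12345678123456781234567812345678")

# Equality behaves as it would for an integer: both parse to the same value.
same_value = braced == bare

# Whatever the input spelling, the value packs into a fixed 16 bytes.
compact_size = len(braced.bytes)

# The last 48 bits form the "node" field. In version 1 UUIDs this field
# identifies the machine that generated the ID; here it is just sample data.
node_hex = format(braced.node, "012x")

print(same_value, compact_size, node_hex)
```

<p>Sorting such values by their raw bits is possible, but the resulting order says nothing about which ID was created first, which is one way to read the remark that greater-than and less-than do not carry their usual meaning.</p>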
<p><a name="usesofpostgres"></a><strong>Josh: </strong>Yeah. So, an example of a more exotic one that&#8217;s not yet in the core code, but is out there in our add-ins and that I find much more interesting, is genetics data types. There&#8217;s a couple of projects; one of them is called Unison DB, which was sponsored and open sourced by Genentech, and another one is a community project called BLASTgres. </p>
<p>These are all projects of genomics scientists.</p>
<p>One of the things that they needed to be able to do is they needed to be able to store sequences of base pairs in the database. Now, that&#8217;s what we call a multi value data type, as in it has meaning as a whole, in sequential order, but also, each component has meaning in small pairs. Initially they tried to store this, actually, sort of vertically &#8211; you have each pair as a row. But when you&#8217;re talking about millions of protein sequences, each of which can have a couple thousand base pairs, that&#8217;s really not a realistic method of storage.</p>
<p>So instead, they actually took our already existing array data type &#8211; which is quite fully featured, in terms of having its own special comparison operators and special ways of indexing it &#8211; and modified it to hold base pairs. So, that allowed them to do meaningful comparisons of things like equality, and particularly to compare individual base pairs. That is, is the base pair in the third place the same in both sequences, and what percent of the base pairs are equal across those sequences?</p>
<p>Because in protein analysis, you&#8217;re not expecting an exact match; you&#8217;re expecting a match, basically, according to percentages. You want to be able to look up and say, &#8220;OK, I want to find all the genetic sequences that have this particular sequence of six base pairs anywhere in the protein,&#8221; and that requires a special index type which is based on our generalized search tree (GiST) index. GiST allows you to create your own index types to support some of these exotic data types.</p>
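<p>The two kinds of comparison described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and sample sequences are hypothetical, sequences are plain strings rather than a custom array type, and a linear scan stands in for the specialized index. It does not reflect how Unison DB or BLASTgres are implemented.</p>

```python
def percent_match(seq_a, seq_b):
    """Percentage (0-100) of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        return 0.0
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def containing(sequences, motif):
    """Names of the sequences that contain the motif anywhere (plain linear scan)."""
    return [name for name, seq in sequences.items() if motif in seq]


# Hypothetical sample data, made up for this sketch.
samples = {"s1": "GATTACAGGT", "s2": "GATTTCAGGA", "s3": "CCGGTTAACC"}

print(percent_match(samples["s1"], samples["s2"]))  # prints 80.0
print(containing(samples, "TTACAG"))  # prints ['s1']
```

<p>With millions of stored sequences, scanning every row like this would be too slow, which is the gap a custom index type is meant to fill.</p>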
<p><a name="includedfeatures"></a><strong>Sean: </strong>One of the questions out of that, I guess, because that lays out pretty clearly what the feature set is, is about how you go about making the decision to incorporate something like that into the mainline product and/or to just kind of leave it as a community add-on, for lack of a better phrase?</p>
<p><a name="codequality"></a><strong>Josh: </strong>There&#8217;s actually a variety of factors, and we&#8217;re going through this right now with our full text search type. So there&#8217;s a whole variety of decisions that goes into that. One is how broad the usage is. That is, is this something which is used by a large percentage of our users, or is it relatively obscure?</p>
<p>A second question is how good the code quality is. That&#8217;s a big deal, because we&#8217;re an open source project that&#8217;s been open source for 11 years and part of a project that&#8217;s 21 years old. Maintainability in our code base is actually, possibly, our number one priority because every one of our major contributors now was not here at the beginning of the project.</p>
<p>Every single one of them inherited the code from someone else, and they know how important it is to be able to pass it on. So that&#8217;s actually been a big holdup in incorporating our full text search type into the core, because it was written by some Russian developers and they had a lot of issues with doing internal documentation of the code in English.</p>
<p>So, in that case, for other data types and the like, there&#8217;s going to be the same sort of standards in terms of internal documentation of the code; good quality public user documentation; the code being formatted correctly and easy to read; and having consistent sub-functions and references that are easy to follow and match our other coding standards that are part of our documentation. </p>
<p>Another part of it is going to be, again, for the maintenance issue, whether or not we feel that the contributors are going to be with the project for the medium to long term, because if it&#8217;s something that somebody just dumps on us &#8211; however good it is right now &#8211; and then leaves, then what we&#8217;ve done is we&#8217;ve added to the burden of the core maintainers to keep up that extra 10,000 or 15,000 lines of code.</p>
<p>For an individual data type, that&#8217;s not very much. If you look at it like we add 10 new data types, then that requires us to have two more code maintainers, just to maintain that extra code base. So, if somebody is not going to be making a long term commitment to maintain the code, then the bar to add it to the core code becomes much higher.</p>
<p>Then, a final consideration would be external dependencies. That is, one of the things that we do to make PostgreSQL easy to install is that the dependencies to install the very core code of PostgreSQL with no additional options are extremely light. You basically need a handful of GNU utilities, certain C code building utilities, and that&#8217;s it. That&#8217;s made it very easy to install PostgreSQL on 30 different platforms; so has building it from source, so there&#8217;s a variety of means. For the people out on Windows, you can build it with either MinGW or with Visual Studio. That wouldn&#8217;t be possible if we had a whole slew of external dependencies.</p>
<p>So, there&#8217;s add-ons that require heavy external dependencies. For example, there&#8217;s a procedural plug-in to allow you to use PHP inside the database. The reason it hasn&#8217;t been included in the PostgreSQL core code, even though it&#8217;s fairly feature complete and is reasonably popular, is specifically because in order to build it you have to have PHP and Apache installed and even configured in certain ways. So that makes it a real dependency issue in order to build that component.</p>
<p>That&#8217;s very tricky, and we don&#8217;t necessarily want things that are that hard to build in the core code. We&#8217;ll put them in add ons, where people realize that they have to take extra steps.</p>
<p><a name="maintainers"></a><strong>Sean</strong>:  One follow-on to that, too, back on the core maintainers. I&#8217;m just curious, how many core maintainers do you have, considering that came up a couple times in the vetting process? And, how does someone kind of move from just a general community member to a core maintainer? </p>
<p>Because different projects handle that promotion process a little bit differently.</p>
<p><strong>Josh</strong>:  Yeah. We don&#8217;t have a formal process. We&#8217;re actually at the sort of extreme end of open source projects in that all of our policies are negotiated and unwritten, pretty much. It&#8217;s because of the age of the project and it&#8217;s because we&#8217;ve always had a consensus process for making decisions, which has yet to break down. So there hasn&#8217;t been a need for some of the more elaborate, formal structures you would find in, for example, Apache.</p>
<p><strong>Sean</strong>:  Right, so you guys haven&#8217;t had to have like a voting process, per se, on things. It&#8217;s kind of been a communal decision process.</p>
<p><strong>Josh</strong>:  Yeah. There are some things that are conventions. You&#8217;re not going to be considered, for example, for getting commit ability on the CVS tree if you haven&#8217;t been around the project for a couple of years, contributing. Again, it&#8217;s a 10 year old project. We feel that we can wait two or three years for somebody&#8230;</p>
<p><strong>Sean</strong>:  [laughs] Right, right.</p>
<p><strong>Josh</strong>:  If they go away in that amount of time, then we didn&#8217;t want them as a committer anyway.</p>
<p><strong>Sean</strong>:  Right, right. Somebody that flashes onto the list and is like, &#8220;I want to be helpful! I want to be helpful!&#8221; then six months later, you&#8217;ve never heard from them. You have kind of a base vetting process just from that alone.</p>
<p><strong>Josh</strong>:  Yeah, yeah. So we&#8217;ve got people who&#8217;ve been around and contributing code for a couple of years. Volume is also a consideration, because somebody who&#8217;s only contributing one or two small patches per version, then there&#8217;s no particular need for them to have any greater level of access.</p>
<p>The big issue is having lots of free time or having time paid by your employer to work on these things, because the main thing that we need from major contributors now is actually time to review other people&#8217;s code.</p>
<p><a name="coders"></a><strong>Sean</strong>:  Well, one question on that, because we&#8217;ve found this to be interesting as we&#8217;ve talked to different projects, is how much of the project is funded by proxy, in the sense that somebody&#8217;s got a day job? </p>
<p>Is there a scenario where someone is funded predominantly by their employer to write code for PostgreSQL?</p>
<p><strong>Josh</strong>:  Yes.</p>
<p><strong>Sean</strong>:  And what percentage of the base of people writing the mainline code probably falls into that type of category?</p>
<p><strong>Josh</strong>:  We haven&#8217;t tried to do a count for about three years, but I would estimate, now, 80% to 90% of the code changes that go into any given version are written by people who were either paid directly to work on PostgreSQL development, or for whom working on PostgreSQL development is an approved use of their work time.</p>
<p>The second class would be large PostgreSQL users, like the staff of Afilias, for example. They&#8217;re not required by Afilias to contribute to Postgres, but if they want to spend Tuesday afternoon working on a Postgres patch, it&#8217;s completely acceptable to Afilias if they do so.</p>
<p>There&#8217;s a number of those. So, yeah. That&#8217;s actually one of the big myths of open source is people imagine a bunch of hobbyists. And I&#8217;ll say, in the early days of Postgres, we were hobbyists, because you couldn&#8217;t use it for much. I was actually earning my living as a SQL Server performance consultant, and working on Postgres for sort of my own stuff. But once a project gets big and commercially adopted, you&#8217;re going to find that at least three quarters of the code contribution comes from people who are paid to work on the project.</p>
<p><strong>Sean</strong>:  I want to give Scott a chance to chime in here, too. But one of the things that came out of one of the conversations we had &#8211; I think it was with Michael Tiemann &#8211; was the concept of: alright, so you&#8217;ve got a set of developers, they work for a large company, and that company would like to get something into the product. So, they go off and squirrel away and work on some feature.</p>
<p>Let&#8217;s say it&#8217;s the genetic discussion we were having earlier, right? That&#8217;s probably not the best example, but let&#8217;s imagine that was the case. How do you deal with the challenge of someone going and building &#8220;x&#8221;, and they feel they&#8217;ve invested real company time and stake and equity in it, but yet, maybe it doesn&#8217;t make it in because the community writ large just doesn&#8217;t feel it&#8217;s maintainable or it doesn&#8217;t meet the standards you guys are looking for?</p>
<p><strong>Josh</strong>:  Yeah. Well, that&#8217;s something that needs to be handled with a fair amount of diplomacy. And there have certainly been failures on that in the past, but I think it&#8217;s because all of our interactions with our contributors tend to be highly personal. That is, if something gets rejected, then it&#8217;s going to be after Bruce or one of the other reviewers had 16 or 17 different email interactions with a contributor. But no, that&#8217;s not until after we&#8217;ve given the contributor multiple chances, made it clear to them what needed to be changed, and given them multiple chances to modify their stuff, and hopefully made them understand why it was being postponed.</p>
<p>We&#8217;re actually struggling with that right now, because we&#8217;re having a bit of a fire hose problem with version 8.3; as in, when we hit feature freeze at the beginning of April, we had something over 100 different patches pending, some of which involved up to 50,000 to 60,000 lines of code. So, the result is we&#8217;ve actually set a very high bar for things making it into 8.3. Patches that we might have accepted in an earlier version and spent more time getting up to the acceptable standards of code are instead being held back for 8.4.</p>
<p><a name="opensource"></a><strong>Sean</strong>:  Well, one last question, and then I want to give Scott a turn. </p>
<p>So, from a database side in the open source world, what do you think the open source development methodology brings to a database product that either gives it more credibility, a better feature set, etc. when compared to a closed source model for developing a database product? </p>
<p>Because we&#8217;ve asked everybody this, in some form or other, and the responses have been really interesting. I don&#8217;t mean that in a pejorative way, I mean they&#8217;ve been very interesting to listen to.</p>
<p>But, we haven&#8217;t talked to anybody from the database side.</p>
<p><strong>Josh</strong>:  Well, actually, one of the biggest benefits is for security and reliability. We actually had Coverity run a code check on PostgreSQL a couple of years ago, something that they&#8217;re apparently going to do again for us, and one of the first things that I noticed is that the PostgreSQL core actually has possibly the lowest code count, in terms of lines of code, for any major SQL database.</p>
<p>What that&#8217;s indicative of is that we&#8217;ve spent a lot of time; that every time we release a new version, there is significant refactoring involved in it, and a real effort to keep the code clean and eliminate anything that&#8217;s Byzantine or hard to read.</p>
<p>The payoff for that is that it makes it very easy to keep the code reliable and secure. That is, if somebody reports a security issue, we can generally come up with a fix in 24 to 48 hours, because it&#8217;s very easy to zero in on exactly where the problem is happening.</p>
<p>It also prevents such issues from occurring in the first place, because there aren&#8217;t mystery functions that nobody understands and can&#8217;t touch. Having worked on some proprietary software, I know how that kind of stuff creeps into your proprietary software, because you&#8217;re more concerned with meeting the ship date, and the idea is that you&#8217;ll clean it up in the first update version. Only after you meet the ship date, cleaning it up becomes a low priority. </p>
<p>So for us, because all of the code is out there and it&#8217;s all visible, maintainability is a primary goal. There is no postponing cleaning it up. The cleaning up has to happen before we release. The result has been very highly secure and very highly reliable code.</p>
<p><strong>Scott</strong>:  So, you mention that there&#8217;s a lot of patches queued up, and a lot of them aren&#8217;t necessarily going to make it into this upcoming version. How is that a decision that&#8217;s made? Open source projects kind of have a different hierarchy and a different culture, so I&#8217;m trying to understand with PostgreSQL, is it kind of a representative democracy where the steering committee sort of votes on those or&#8230;?</p>
<p><strong>Josh</strong>:  Think of it as almost a pure democracy. Yeah, because most of those decisions are made by rough consensus on what we call the &#8220;hackers mailing list,&#8221; which has something on the order of 7,000 subscribers, although probably only 75 to 100 of those people are really active.</p>
<p><strong>Scott</strong>:  Sure.</p>
<p><strong>Josh</strong>:  The rest of them are just monitoring what goes on. Basically what happens is, if there&#8217;s a patch, there&#8217;s a couple of other lists attached to that &#8211; the actual patches mailing list or the actual committer&#8217;s mailing list. But most of the discussion happens on &#8220;hackers.&#8221;</p>
<p>If somebody submits a patch, or preferably a specification before they submit the patch, then there&#8217;s going to be lots of discussion and we&#8217;ll form a rough consensus on whether it&#8217;s a good idea or not, whether it belongs in the core code or in an add-in, and other issues like that. Then, when the patch actually gets submitted, it&#8217;s up to the code reviewer &#8211; who will generally be one of our handful of committers, people who actually have direct access to the CVS tree &#8211; to decide whether the code is of sufficient quality to make it in or whether it needs work.</p>
<p>If that&#8217;s an extended process, they will generally take that back onto the hackers mailing list and say, &#8220;This is a really cool feature, but the code is a mess and it needs X, Y, and Z. If the original contributor didn&#8217;t clean it up, is there someone else who cares enough about it to clean it up, or are we going to hold it back?&#8221; That will get sort of worked out there. And it&#8217;s sort of pure democracy. It&#8217;s not so much pure democracy as actually what I call &#8220;volunteerocracy.&#8221;</p>
<p>[laughter]</p>
<p><strong>Josh</strong>:  It&#8217;s that somebody can force a decision by saying, &#8220;This feature is really, really important to me and I&#8217;m going to do whatever it takes to clean it up so it can go in.&#8221;</p>
<p><strong>Scott</strong>:  Right.</p>
<p><strong>Josh</strong>:  When somebody doesn&#8217;t step forward and do that, often stuff gets held back.</p>
<p><strong>Scott</strong>:  OK. So there isn&#8217;t like a formalized vote, but it is pretty obvious kind of what the consensus is.</p>
<p><strong>Josh</strong>:  Yeah. I mean, the core team is a steering committee, but our goal is to actually do as little as possible. The main thing that we do is we set the date for feature freeze, beta and release. We handle security issues, because those need to be dealt with in a confidential forum, and that&#8217;s it. </p>
<p>There&#8217;s been months where we&#8217;ve gotten maybe a dozen messages total on the closed core list. The vast majority of any decision making, any reviewing, any discussion, happens on the public mailing list, particularly hackers, but also to a lesser degree, patches and committers.</p>
<p>There&#8217;s a few segments of PostgreSQL, things like the JDBC driver, which have their own development mailing list, so they make their own decisions within their development mailing list to submit their decisions to hackers. We take their word for it because the rest of us don&#8217;t know anything about Java.</p>
<p><a name="variations"><strong>Scott</strong>:</a>  Now, one of the things that happens with something like, let&#8217;s say, the Linux kernel&#8230; You&#8217;ve got Linus Torvalds, who essentially says, &#8220;Here&#8217;s the new kernel&#8221; and it goes out to the different distros. </p>
<p>And the different distros are all basically kind of a fork of that kernel.</p>
<p>The kernel that ships in Ubuntu isn&#8217;t exactly the same as the one that&#8217;ll ship in RedHat or something like that. And part of the reason for that is there may be certain features which are very important to RedHat customers, but it&#8217;s not something that they&#8217;re able to get necessarily into the core kernel, right? I&#8217;m curious if the same kind of thing happens with PostgreSQL; if Sun, for example, had customers who really needed certain features but the consensus was that those features shouldn&#8217;t end up in the core products &#8211; at least at this time &#8211; do you end up with companies kind of making their own weak forks for their specific customer or does that not really happen on this project?</p>
<p><strong>Josh</strong>:  Yeah, I&#8217;ll speak for PostgreSQL in general and then I&#8217;ll tell you specifically what we&#8217;re doing at Sun. In the case of PostgreSQL in general, it wasn&#8217;t something that used to happen, even though we supported the idea. We always have had sort of a kernel model, like Linux.</p>
<p>You have the PostgreSQL core code, that 13 megabytes of code, and then you have probably 100 megabytes of add-ins on places like pgFoundry and SourceForge and our contrib modules and a whole bunch of other places. In the past, it&#8217;s sort of been up to the user or the developer to add these things together themselves. What&#8217;s been happening more recently is that the packager for the Linux distribution or the BSD distribution has made certain decisions about what packages they want to include.</p>
<p>But those decisions have been fairly lightweight. People have not been taking the strategy of putting together an actual distribution until recently. And what&#8217;s changed recently is that we&#8217;ve gotten a lot of startups like Greenplum and EnterpriseDB that have sort of their own special version of Postgres that are deliberately maintained forms. In the case of Greenplum, it&#8217;s a data-warehouse-enhanced form of Postgres, and in the case of EnterpriseDB, it&#8217;s an Oracle-compatible version of Postgres. There&#8217;s been a couple of others that are older, like the old Windows version, and the multi-threaded Windows version called PowerGres in Japan, which is actually about four years old. Fujitsu actually has their own version, which is called Fujitsu Supported Postgres.</p>
<p>So, what&#8217;s been happening with this is that a lot of those companies do develop their own features for Postgres, which they submit, but don&#8217;t necessarily get accepted immediately, and possibly don&#8217;t get accepted in their original form.</p>
<p>That&#8217;s going to cause problems for those companies down the road, because&#8230; Well, I&#8217;ll give you an example. Greenplum actually developed bitmap on disk indexes. We currently have bitmap in memory indexes released with Postgres. But Greenplum developed bitmap on disk indexes a couple of years ago and put them into the Bizgres open source project to be submitted into the PostgreSQL core code.</p>
<p>The thing is that there have been some issues with index maintenance and with code style, and particularly with conflicts with other patches that we&#8217;ve incorporated to improve the performances of indexes, and that bitmap index patch.</p>
<p>Because those have not been resolved, the bitmap index patch is still not in the core code of Postgres. The problem that that is going to lead to is that when that patch does make it in &#8211; in 8.4 or whatever &#8211; it&#8217;s quite possibly going to have a slightly different API from the version that Greenplum has been distributing with their own proprietary product.</p>
<p>That will put Greenplum in the position where they actually need to have support for both versions: their original version and the version that made it into the core code. What that results in for these companies that are actually making core changes and distributing them in advance of getting at least vocal approval from the community, is that they develop an increasing maintenance burden. Now, companies like Greenplum and EnterpriseDB and Fujitsu in general recognize this and try to avoid that situation. They try to wait until their patch is queued and accepted before they start distributing it. But, like with the bitmap indexes, it doesn&#8217;t always work. Since the process of actually distributing modified versions of Postgres and marketing them heavily is relatively new, except for PowerGres, then it&#8217;s a little hard to see what could happen.</p>
<p>I mean, what could happen is what happened with PowerGres. SRA in Japan developed a multi-threaded, high-performance version of PostgreSQL 7.3 for Windows. But they modified PostgreSQL heavily to make it work in that context, to the point where it no longer worked on Linux. When the PostgreSQL project decided that we were going to adopt Windows as a platform, which we finally released in version 8.0, one of the decisions was that we wanted to have the same core code with no substantial differences regardless of operating system. That is, we weren&#8217;t going to have a separate code tree for Windows because that was going to be impossible to maintain.</p>
<p>So as a result, when we released the official Windows version, it was substantially different from PowerGres. So now a lot of the users in Japan are in the sort of weird position where they&#8217;re stuck with PowerGres, which is no longer advancing; it&#8217;s stuck at the version 7.3 feature set.</p>
<p>Or, they adopt the new official version, which is already five versions and four years later, and has a different performance profile and a lot of changes that will require them to change their applications. So, that&#8217;s the sort of thing you want to avoid. That&#8217;s why at Sun, with our distribution of PostgreSQL, PostgreSQL for Solaris, one of our policies is that we actually don&#8217;t distribute anything until it is accepted into the patch queue with a very strong assurance of acceptance versus revision by the PostgreSQL community.</p>
<p>If it is completely separable as an add-in, if it&#8217;s something that can be added at build time and no later and doesn&#8217;t modify other APIs, then we might accept something. The particular example I&#8217;m thinking of there is probes, which is that we can add in additional probes non-invasively. It doesn&#8217;t matter if we add in a few extra probes before those probes appear in a community version, because they don&#8217;t affect other functionality.</p>
<p><strong>Scott</strong>:  If you had to kind of ballpark it, how much of the effort do you feel like goes into adding new features to the product versus how much effort it is to support such a variety of platforms?</p>
<p>There&#8217;s different Linux distros, obviously, and supporting Windows on the same core code base obviously presents challenges also. How much time goes into bugs and testing and things like that just to ensure really good compatibility on all these different platforms versus building out new functionality?</p>
<p><strong>Josh</strong>:  Well, see there&#8217;s another way&#8230; You asked earlier about what the benefits were of the open source development process, and that&#8217;s another area where having really clean code is the big benefit, in that it&#8217;s allowed us to actually minimize the amount of platform-specific engineering that we do.</p>
<p>We&#8217;ve also made some sacrifices in order to minimize that maintenance burden, that maintenance issue. For example, on Linux and Unix platforms, we only use POSIX standard interfaces. This means that we&#8217;re not making use of some other interfaces specific to particular operating systems that might give us additional operating system features and performance. For example, one of the big discussions I have here at Sun is that Sun&#8217;s new file system is ZFS and has some of its own APIs that are non-POSIX. The ZFS people keep telling me it&#8217;s going to give us some tremendous benefit using databases on ZFS.</p>
<p>But we really don&#8217;t want to do that as an open source community, because the moment that you do that, you&#8217;re dedicating some amount of hours of somebody&#8217;s time just to maintain that interface for the code. So, we&#8217;ve completely avoided doing that, and that&#8217;s allowed us to minimize the maintenance burden. That sort of platform-specific bug and compatibility issue then becomes less than 20% of our overall development effort.</p>
<p>Now, Windows in particular has more maintenance issues associated with it, because Windows is very different from the POSIX platforms. We do have a couple of people for that. For example, one of our major contributors is Magnus&#8230; I&#8217;m going to mispronounce his last name, so I won&#8217;t say it. Magnus H. from Sweden. He spends most of his contribution time on Postgres, and he probably spends somewhere around half of his work time contributing to Postgres. He spends the majority of that, actually, maintaining Windows compatibility. So think about that as sort of one quarter of a developer for a year. Plus, a bunch of our other contributors and maintainers, like Bruce and Tom and Dave Page particularly, spend a minority of their time dealing with Windows-build-specific issues.</p>
<p>Again though, we try to stick to standard interfaces and to minimizing any particular code paths for Windows. Now, unfortunately that does limit our level of performance on Windows and our ability to integrate with some of the Windows utilities. But in terms of preventing us from having to have a completely separate version of Windows, it works.</p>
<p><strong>Scott</strong>:  If you talk to somebody who&#8217;s shipping a compiler, they would expect that&#8230; Well, Intel would sort of put people on the project to make compiler optimizations for Intel architectures, because Intel would really know how to do that.</p>
<p>AMD would put people on the project who make the compiler optimization for AMD architecture, because they would know how to do that. So you end up with kind of this collaborative effort where companies are kind of coming together and they&#8217;re putting their expertise in on the stuff they&#8217;ve developed. Microsoft isn&#8217;t staffing anybody on making sure that Postgres runs as well as it can on Microsoft&#8217;s operating system, is that correct? I mean, it sounds to me like you&#8217;re saying that work is being done by other people who&#8230;</p>
<p><strong>Josh</strong>:</a>  Yeah. The Microsoft folks have been friendly to us, particularly Microsoft Labs have been consistently friendly to us, but Microsoft doesn&#8217;t contribute any efforts or help with the project. And I&#8217;ll actually say, except for Sun, who is directly involved.</p>
<p>For some of the other things, for example, Intel did actually have some ICC optimization efforts, but that was through EnterpriseDB, and I don&#8217;t think it would have happened without EnterpriseDB&#8217;s involvement. So, a lot of this has been by proxy, which would be the case with Microsoft as well, if there was any huge interest in it. Microsoft has its own database product though, one that they&#8217;re pretty dedicated to promoting. Well, a lot of individual Microsoft engineers have been extremely friendly to us, but nobody actually contributing to the Postgres project works at Microsoft that I know of.</p>
<p>That work is done entirely by community people. And even those that are primarily Windows maintainers, like Magnus, spend as much or more of their time using, say, Linux than they do Windows. So, there hasn&#8217;t been a big push to Windows specific optimization through them.</p>
<p><strong>Sean</strong>:  This is good. I guess the same thing we&#8217;ve offered to everyone, Josh, are there any closing comments you would like to add to the discussion as it&#8217;s extrapolated so far? Is there something you feel we haven&#8217;t touched on that you&#8217;d like to get on the record, I guess?</p>
<p><a name="microsoft"><strong>Josh</strong></a>:  I&#8217;d like to throw in a fun little factoid. This is aimed at Windows developers, yes?</p>
<p><strong>Sean</strong>:  Well, both. We&#8217;re getting heavy traffic from both sides. But still, there&#8217;s definitely some Windows folks involved, so go ahead.</p>
<p><strong>Josh</strong>:  Well, actually one of the other things that made it possible for us to support Windows has been that in general &#8211; not for our product specifically &#8211; Microsoft has actually made it a lot easier to build and run programs made for Linux and Unix on the Windows platform than it used to be.</p>
<p>One thing in particular is that the PostgreSQL project was actually the first user of the open source WiX installer, something we&#8217;re actually extremely happy with. It has allowed us to make PostgreSQL vastly more accessible to users on Windows because it provides them with a really nice installer. So nice, in fact, that after we released the first Windows version, a bunch of the Linux folks began saying, &#8220;Hey, why can&#8217;t we have an installer like that in Linux?&#8221;</p>
<p>[laughter]</p>
<p><strong>Scott</strong>:  Well, that&#8217;s one of the other things too that you run into, is that in the Linux world, and even in the Unix world, people are more comfortable kind of running, making and building it for their particular configuration.</p>
<p>But in the Windows world, it&#8217;s really sort of mandatory to ship a binary and an installer, not so much the source. But that&#8217;s interesting that you found the WiX project to be really useful for your needs.</p>
<p><strong>Josh</strong>:  Yeah. It showed up at exactly the right time, because it got released like, I don&#8217;t know, three or four months before our targeted first Windows release. We were able to go out of the gate with an installer, as I recall. I wasn&#8217;t actually involved in the Windows build at the time, so I&#8217;m not sure that&#8217;s 100% accurate. But certainly very close after having the Windows release, we had a nice graphical installer that was not only a nice graphical installer, but it also has like little check boxes for all the most popular add ins.</p>
<p><strong>Scott</strong>:  Oh, nice.</p>
<p><strong>Josh</strong>:  And again, if you&#8217;re a Linux user or whatever, somebody says, &#8220;Oh, that&#8217;s in the contrib module, you just need to type in these three commands and it will be installed.&#8221; That&#8217;s not really available to Windows users. They have to have a whole tool chain that doesn&#8217;t ship with Windows. So, having that nice binary installer has made it vastly more accessible for Windows users.</p>
<p><strong>Scott</strong>:  Do you have any sense for how much Postgres runs on Unix versus Linux versus Windows?</p>
<p><strong>Josh</strong>:  It&#8217;s a little hard to tell, because where we really get a full sense is the people that are active on the mailing lists. But we&#8217;re keenly aware that that doesn&#8217;t actually represent what&#8217;s actually out in the field.</p>
<p><strong>Scott</strong>:  Sure.</p>
<p><strong>Josh</strong>:  Because we only have, like, 35,000 &#8211; 40,000 people that are active on the various mailing lists and forums. Whereas even just judging by a certain manufacturer&#8217;s distributions, who are bundling PostgreSQL in their products, there&#8217;s several tens of millions of copies out there in the field.</p>
<p>Just given that Windows users tend to be more used to web forums and IM than they are to mailing lists and IRC, we&#8217;re guessing that we probably have a lower level of participation from Windows users. So, on the one hand, we have this vast majority of downloads from the Windows users, but on the other hand, in terms of people who actually come back to the project and ask questions, the people that we know are on Windows are a minority.</p>
<p>But that doesn&#8217;t tell us who&#8217;s using it. You follow me?</p>
<p><strong>Scott</strong>:  Yeah, sure.</p>
<p><strong>Josh</strong>:  Because it may be that a lot of people are using it on Windows and they&#8217;re just not joining the mailing lists. They&#8217;re getting their help in other ways.</p>
<p>The other thing is that having a unified code base where we have the same version for Windows and Linux, etc., means that actually for a lot of the newbies asking questions about how to do this and how to do that, until we have some interaction with them, we don&#8217;t actually know what platform they&#8217;re on.</p>
<p><strong>Scott</strong>:  Right.</p>
<p><strong>Josh</strong>:  I&#8217;ve had a number of chats on IRC where it wasn&#8217;t until I got to like the third or fourth exchange of questions that I realized that somebody was on Windows. Which is a terrific thing in terms of being able to support our user base, because that was one of the biggest things I was worried about when we released the Windows version, that all of a sudden we would have 50,000 new Windows users hitting the mailing list, and those of us who are not Windows users would not be able to help them.</p>
<p><strong>Scott</strong>:  Right.</p>
<p><strong>Josh</strong>:  And it hasn&#8217;t turned out that way. So, I would guess that probably the majority of installations are in Windows, just based on the download numbers. But how much those people are using their installations, I couldn&#8217;t tell you.</p>
<p><strong>Scott</strong>:  OK. Great. Well, thank you very, very much for taking the time to chat with us. This has been great. You were really good to talk to because you were able to really dig into a lot of how the process works. If you look at a lot of our interviews, process tends to be some of the stuff that we&#8217;re the most interested in. And I learned a lot, just from this conversation.</p>
<p><strong>Josh</strong>:  Well, I&#8217;m happy to have a chance to explain this and some stuff about how the project works. We don&#8217;t have a lot of written policy, so it can be very opaque to somebody who&#8217;s just joining. I actually recently did a presentation on developing PostgreSQL at OSCON because we realized a few people had problems with how indecipherable it was how they were supposed to do things&#8230;</p>
<p><strong>Scott</strong>:  Right.</p>
<p><strong>Josh</strong>: I&#8217;m really glad to have a chance to explain that a little.</p>
]]></content:encoded>
			<wfw:commentRss>http://howsoftwareisbuilt.com/2007/08/22/interview-with-josh-berkus-postgresql-core-team-lead-sun-microsystems/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Interview with Jay Pipes, North American Community Relations Manager at MySQL</title>
		<link>http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/</link>
		<comments>http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#comments</comments>
		<pubDate>Wed, 18 Jul 2007 21:32:34 +0000</pubDate>
		<dc:creator>scottswigart</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[community]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[GPL]]></category>
		<category><![CDATA[Jay Pipes]]></category>
		<category><![CDATA[licensing]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[performance tuning]]></category>

		<guid isPermaLink="false">http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/</guid>
		<description><![CDATA[Interviewers: Scott Swigart, Sean Campbell Interviewee: Jay Pipes Jay Pipes In this interview, we speak with Jay Pipes North American Community Relations Manager at MySQL. We talk about: How MySQL Community and Enterprise servers are nothing like RedHat Fedora and RHEL. The nuances of MySQL licensing. It&#8217;s built mostly like traditional software, with 120 developers [...]]]></description>
			<content:encoded><![CDATA[<p><b>Interviewers: </b><a href="http://howsoftwareisbuilt.com/about-scott-swigart/">Scott Swigart</a>, <a href="http://howsoftwareisbuilt.com/about-sean-campbell/">Sean Campbell</a> </p>
<p><b>Interviewee:</b> <a href="http://howsoftwareisbuilt.com/about-jay-pipes-north-american-community-relations-manager-at-mysql/">Jay Pipes</a> </p>
<table border="0" unselectable="on">
<tbody>
<tr>
<td valign="top"><img src='http://howsoftwareisbuilt.com/wp-content/uploads/2007/07/jaypipes.thumbnail.jpg' alt='jaypipes.jpg' /></td>
</tr>
<tr>
<td align="middle">Jay Pipes</td>
</tr>
</tbody>
</table>
<p>In this interview, we speak with Jay Pipes, North American Community Relations Manager at MySQL. </p>
<p>We talk about: </p>
<ul>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#notlikerh">How MySQL Community and Enterprise servers are nothing like RedHat Fedora and RHEL.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#nuanses">The nuances of MySQL licensing.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#traditional">It&#8217;s built mostly like traditional software, with 120 developers on staff at MySQL.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#changing">But that&#8217;s changing, the worklog system has been opened to the community and they&#8217;re starting to take contributions.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#modular">For that to happen, MySQL has to become more modular and support plug-ins.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#portability">MySQL supports so many platforms through a portability layer, which was a lot of work initially, but now it&#8217;s pretty well baked.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#gold">Bugs = Gold.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#tenents">The tenets of MySQL: performance, reliability, and ease of use.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#money">It&#8217;s easier for proprietary software to make money, it&#8217;s easier for open-source to get work done.</a></li>
<li><a href="http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/#getinvolved">And finally, how to get involved.</a></li>
</ul>
<p><span id="more-68"></span></p>
<p><b>Jay Pipes:</b> My name is Jay Pipes. I work for MySQL. I&#8217;m the North American Community Relations Manager. Part of our job as community managers is to monitor, encourage, and grow the external MySQL ecosystem as much as possible. [This involves] responding to concerns of the community, pushing those concerns internally, being an advocate for the community within MySQL—being a liaison between free and open-source projects and companies and MySQL. Also, each of us has different responsibilities. I do a bit of development work for the MySQL Forge website. I work a bit with engineering. I do a lot of conferences and speaking engagements on performance tuning, and blogging, and writing, and that kind of thing. </p>
<p><b>Scott:</b> I get it. You&#8217;re greasing the wheels between MySQL and the community at large, and trying to keep the information flowing friction-free in both directions. </p>
<p><b>Jay:</b> Yes, I want to remove the friction points between contributors to MySQL and users of MySQL, developers and DBAs, and the flow of information from MySQL to make sure it&#8217;s accurate—that they understand. Especially with MySQL, we&#8217;re in a unique position. We provide this open-source software and we have this enormous, ubiquitous user community, but at the same time we&#8217;re selling products. So, because it started out as an open-source project, and the community knows that, a lot of people don&#8217;t even realize that MySQL is a company. </p>
<p>A lot of times we have to explain that MySQL as a company has to do certain things to provide revenue that will in turn benefit the open source community. Sometimes that&#8217;s the challenge. There&#8217;s this push and pull between commercial and the community. And the community team at MySQL, including me—a lot of times we get asked to advocate for the community within the company, but we also have to have an open mind. We are a revenue producing company, and at the same time, we want to be good for the community. That kind of balance is most of what my job is. </p>
<p><b>Scott:</b> It&#8217;s interesting that you bring that up. There&#8217;s still this feeling like it still works the way it did six or seven years ago where open-source was people working on stuff in their basement. But if you look at any of the major projects out there, the model has really shifted. If you look at the Linux kernel itself—that’s a lot of engineers at Red Hat, IBM, Intel, and various different companies that are really doing the bulk of coding. If you look at something like RedHat/Enterprise Linux—again, this is a commercial enterprise, right? This is an ongoing business that has come up with a business model around something open source. And so, with you guys—help me understand it a little bit… </p>
<p><b>Jay:</b> <a name="notlikerh"></a>I will try and clarify something. A lot of people say, “Well, when MySQL just recently (in September or October) split between the community server and the enterprise server in MySQL…” A lot of people equated that to the RedHat/Enterprise Linux split. One of the things significantly different between MySQL and RedHat is that, while RedHat does produce its own pieces of software within the RHEL stack and the Fedora community packaging, it doesn&#8217;t have total control over its software &#8212; their software stack is dependent on upstream committers. Obviously its engineers contribute to the Linux kernel and various other things. </p>
<p>But MySQL has always been 99% completely written by MySQL employees. So, it&#8217;s a different model in that RedHat (until more recently where they&#8217;ve started to create more of their own software) has been more of a packager than it has been a producer of software. And there is a difference there. It&#8217;s what&#8217;s made it difficult for people to say, &#8220;Well, MySQL has done the Fedora versus RHEL split.&#8221; </p>
<p><b>Scott:</b> OK. </p>
<p><b>Jay:</b> It is different because we are producing almost 100% of the code that we&#8217;re selling in the enterprise and then giving away in the community. Does that make sense? </p>
<p><b>Scott:</b> Totally. And so, I&#8217;m sure you can explain this better, but if you&#8217;re using MySQL as part of an open-source project, then MySQL can be freely distributed with that project. But if you&#8217;re building something commercial on top of MySQL, then my understanding is that&#8217;s where you actually have to purchase a license. Or do I have it all wrong? </p>
<p><b>Jay: </b><a name="nuanses"></a>Well, it depends. A lot of people are confused by the licensing. Some of the confusion stems from the fact that it is GPL. Some confusion is that we sell commercially-licensed MySQL for those OEMs and ISVs that embed MySQL within their product and then distribute it. So, GPL is all about distribution—reciprocity and distribution. </p>
<p>If you distribute your product with MySQL, and your product is not GPL or a GPL-compatible license and is not an open-source product, then yes, you are required to pay a licensing fee to MySQL. However, a lot of people will say, &#8220;I&#8217;m distributing my product as a non-open-source piece of software. It can connect to MySQL.” So, if you have a MySQL server running on your network, my application can connect to that server and run against it. </p>
<p>There are different ways of using MySQL, but a lot of where the licensing comes in is when you&#8217;re an original equipment manufacturer and an independent service vendor—or whatever ISV is anymore—and you&#8217;re embedding and distributing MySQL as an integral part of your application. </p>
<p>That&#8217;s where the licensing comes into play. But not for web applications, when you have MySQL installed on the server and you&#8217;re a service-oriented application provider. That&#8217;s not where the licensing comes in. That&#8217;s where we sell the MySQL Enterprise edition, which is the support and services offering. </p>
<p><b>Scott:</b> Got you. So, just to make sure I understand that: If I&#8217;m an ISV and I&#8217;m selling my software, and I package MySQL with it, I have two options. I can distribute my software under the GPL license, or I can pay a licensing fee to MySQL and distribute my source under a proprietary license. Is that it? </p>
<p><b>Jay:</b> That is correct. </p>
<p><b>Scott:</b> OK. OK, good. </p>
<p><b>Jay:</b> And in the past, there&#8217;s been confusion about the protocol, but that&#8217;s not the issue. I think people have blown that up out of proportion. What you just said is exactly what it is. If you are distributing your application packaged with MySQL and you&#8217;re not releasing under GPL or a GPL-compatible license, you have to buy a license for each copy of your software that you distribute, because you are then distributing MySQL. Now, the licensing costs have also been blown completely out of proportion by a lot of people. People say, &#8220;Oh, it costs $595 per distribution.&#8221; That&#8217;s not right either. </p>
<p>The pricing depends on whatever the sales team at MySQL and you agree on. It&#8217;s just like any other company. But that is the case when you need to buy licensing, when you embed MySQL and you do not want to release your source code as GPL or compatible licensing. </p>
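<p>The licensing rule Jay and Scott arrive at above can be written down as a small decision function. This is only an illustrative sketch of the rule as stated in the conversation, not legal guidance and not anything MySQL publishes; the <code>Product</code> type, its two fields, and the function name are hypothetical names chosen for the example.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical description of how a product relates to MySQL."""
    distributes_mysql: bool  # MySQL is embedded in and shipped with the product
    gpl_compatible: bool     # product is released under the GPL or a GPL-compatible license


def needs_commercial_license(product: Product):
    """Apply the rule as described in the interview (an assumption, not legal advice)."""
    # Connecting to a MySQL server, or hosting one behind a web application,
    # is not distribution, so no commercial license is involved.
    if not product.distributes_mysql:
        return False
    # Shipping MySQL under GPL-compatible terms satisfies the GPL;
    # shipping it inside a proprietary product calls for a commercial license.
    return not product.gpl_compatible


if __name__ == "__main__":
    examples = {
        "proprietary ISV bundling MySQL": Product(True, False),
        "GPL project bundling MySQL": Product(True, True),
        "hosted web application": Product(False, False),
    }
    for label, product in examples.items():
        print(label, needs_commercial_license(product))
```

<p>Under these assumptions, only the first example, a proprietary product that ships MySQL inside it, would need a commercial license.</p>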
<p><b>Scott:</b> Cool. And I&#8217;m just not involved enough in the community to really even be aware of what have obviously been contentious conversations, probably over the years. So it&#8217;s just me trying to understand it, coming&#8230; </p>
<p><b>Jay:</b> No, no. I understand where you&#8217;re coming from. I was referring to this kind of myth that MySQL licensing is overly complicated. I work for MySQL, so I&#8217;m biased. But I don&#8217;t particularly think that is a complex process to understand. I think that the complexity really stems from the fact that there are just a ton of open-source licenses out there that all have these weird idiosyncrasies to them. And I think that complexity lends itself to, &#8220;Well, MySQL is open source, so it&#8217;s going to be complicated.&#8221; </p>
<p><b>Scott:</b> Right. So, the main thing we&#8217;re focusing on is how software is built. We&#8217;re looking at things like Apache and the Linux Kernel. From the outside it kind of looks like they&#8217;re built over a mailing list. In other words, there are these core mailing lists, people post code to them, it gets reviewed, there&#8217;s a maintainer who decides whether it gets checked into the main tree or not. </p>
<p><b>Jay:</b> <a name="traditional"></a>That&#8217;s absolutely true, with Linux, Apache and Eclipse. Eclipse is more bureaucratic than that; they have hierarchies and procedures and policies and incubation periods and all that stuff—as does Apache. </p>
<p><b>Scott:</b> With you guys, it looks more like a traditional proprietary shop. I&#8217;m guessing that you sit down and have meetings. You discuss what features are going to be slated for the next release. You come up with project plans; you come up with specs. </p>
<p><b>Jay:</b> If only it were that simple. Yeah, on the outside it does look like a traditional software house, in that we have maybe 120 engineers working on various teams, from people that work on the connectors and the GUI tools to people that work on the server runtimes or the backup and replication folks. </p>
<p><b>Scott:</b> Which is a lot. When you talk about 120 engineers, that&#8217;s a mid-size software company. </p>
<p><b>Jay:</b> <a name="changing"></a>Absolutely. Absolutely. And yes it&#8217;s true that we have scrums internally. We have internal roadmaps. And up until, I would say, December or January of this year, it has been more of a cathedral-type model, meaning more of a closed-source model for development at MySQL. It&#8217;s now starting to open up significantly. </p>
<p>Recently, we opened up our internal work log system, which is as close as you&#8217;re going to get to a list of roadmap tasks that we&#8217;re working on. This is for anything from MySQL 5.2 and up to MySQL 8.1 and beyond, from all sorts of crazy wish-list ideas to stuff that&#8217;s actually going into the code at this point. We&#8217;ve opened that up publicly on our MySQL Forge (http://forge.mysql.com/worklog/). People can comment on these tasks, provide suggestions and vote on things that they&#8217;d like to see. We&#8217;ve also started accepting more contributions from the outside community. So, it&#8217;s starting to be more of a mix of an open-source project and a commercial company model. We&#8217;ll see how it goes. I&#8217;m obviously pushing for more of the open-source development model, having more outside committers and contributors that are providing external tools to the MySQL server, but also fixing bugs and providing patches for small features within the server itself. </p>
<p><b>Scott:</b> That&#8217;s interesting that you&#8217;re transitioning from the cathedral to the bazaar, to some degree. </p>
<p>One of the things that I don&#8217;t understand about things like the Linux Kernel is how really big sub-systems get built or worked on. At the point where IBM decides, &#8220;OK, we just need this in the kernel.&#8221; They don&#8217;t ultimately kind of get to say, right? If Linus doesn&#8217;t like it, it doesn&#8217;t make it in. At the same time, corporations are sort of doing the bulk of the development. </p>
<p>So, with MySQL, how do you see that shaking out? Certainly MySQL, the company, is going to control the direction of the product, but you&#8217;re opening it up to take more community input both in terms of suggestions and ideas and in terms of actual feature code and things like that. But I&#8217;m guessing there will still be a pretty significant section of the product that will be spec-ed out. A team of engineers will be put on it to build out a feature. It&#8217;ll go through&#8230; </p>
<p><b>Jay:</b> I think it will be a mix of both. And we&#8217;re still going through these growing pains of figuring out how this is going to work. The community team is going to be pushing more and more contributions from the community, and MySQL doesn’t dictate those in any way. Someone can hop on there and say, &#8220;You know what, I want to implement check constraints, and here&#8217;s the patch for it.&#8221; What will be the issue is which version of MySQL will that patch make it into. And will it be a module that will be marked as experimental? Will it be something that will be patched into the core kernel? </p>
<p>That&#8217;s the process that we&#8217;re currently going through. We&#8217;re still in these growing pains. We&#8217;re still really just now figuring out how to handle these kinds of contributions. So, over the next year or two, I think we&#8217;ll start to hammer that out, and understand, &#8220;This is going to go into the community server, and this is going to go into the core kernel.&#8221; I think, as we make the core server more modular, that issues like that are going to start to disappear a little more, because someone can provide a module, just like mod_ssl for Apache. </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> It can be a self-contained component that isn&#8217;t necessarily going to kill the main, core kernel of MySQL. And so it&#8217;s not going to be as big of an issue, because we can package and version up that module separately from the core kernel, and the community person can have it out there. Until we get to that modular core piece of MySQL, it&#8217;s going to be a little bit of a difficult road, as we decide how to patch that stuff in. But, on the commercial side, we&#8217;re always going to have companies that will provide us with what we call NRE, non-recurring engineering. </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> Which is basically, someone&#8217;s paying us to put a feature into MySQL that is vital, or mission-critical, for their business. So recently, a lot of that type of work has gone into the NDB Cluster tool, which is our high-availability tool. Telecom companies that extensively use MySQL Cluster would like certain things, and they&#8217;re paying for those things to get included. And we&#8217;ve got those type of projects going on all the time. In the next year or two, we&#8217;ll start to see a bigger balance of community driven activity, in engineering, and commercially driven activity. </p>
<p><b>Scott:</b> <a name="modular"></a>Looking at successful open source software, I think you&#8217;re exactly right. Modularity seems to be completely essential. I might have trouble getting something into the Apache core, but I wouldn&#8217;t have any trouble writing a module and just putting it out there, and if people like it&#8230; </p>
<p><b>Jay:</b> Absolutely, absolutely. And that&#8217;s the key to the community driven coding, is that once we get that architecture completed &#8212; the plugin interface &#8212; where people can write add-ons and extensions to MySQL – that problem of, &#8220;OK, which version of the server? Can we put this in there without destabilizing the core runtime?&#8221; Those kinds of questions will cease to be an issue. And so will packaging issues, because the community person can put it on their website: &#8220;Hey, this is my module for MySQL. Go download and install it.&#8221; </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> Just like you would with any of the weird Apache modules that are floating out there. </p>
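<p>The idea of a self-contained module that is packaged and versioned separately from the core can be sketched roughly as follows. This is a toy registry written for illustration only; it is not MySQL&#8217;s plugin interface or Apache&#8217;s module API, and the <code>Core</code> class, its methods, and the <code>reverse_text</code> plugin are all hypothetical names.</p>

```python
class Core:
    """Hypothetical server core that knows nothing about specific plugins."""

    def __init__(self):
        self._plugins = {}

    def register(self, name, version, handler):
        # Each plugin carries its own version, independent of the core's release.
        self._plugins[name] = (version, handler)

    def installed(self):
        # Report which plugins are present and at which version.
        return {name: version for name, (version, _handler) in self._plugins.items()}

    def call(self, name, *args):
        # The core only dispatches; a missing plugin fails without touching the core.
        if name not in self._plugins:
            raise KeyError(f"plugin not installed: {name}")
        _version, handler = self._plugins[name]
        return handler(*args)


def reverse_text(value):
    """A community-written extension, shipped and versioned on its own."""
    return value[::-1]


core = Core()
core.register("reverse_text", "0.1", reverse_text)
print(core.installed())
print(core.call("reverse_text", "mysql"))
```

<p>The point of the design is that the plugin can be released, upgraded, or removed without a new release of the core, which is the property Jay describes wanting from a modular server.</p>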
<p><b>Scott:</b> Right. Apache, Eclipse. All of these things have a modular architecture. To be an open-source project that&#8217;s taking community contributions, it seems essential to have that modular architecture. </p>
<p><b>Jay:</b> I think it is, yeah. It&#8217;s going to be a long ways to go. From my understanding, from the engineering team, it&#8217;s not something that happens overnight and, certainly, is going to take lower precedence to some of the commercial work that we need to get done on the server. And obviously, our roadmap is years ahead of time. [laughs] The stuff that&#8217;s going into MySQL 6.0 and 7.0 is already on the block. </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> Bringing stuff up like the modularization of the core kernel, we&#8217;re looking at two years down the road. But it&#8217;s still, in my opinion, vital to start thinking about this now if we&#8217;re going to really get to a point where a community is actively contributing to MySQL. </p>
<p><b>Scott:</b> So, from a software development standpoint, one thing that seems like it would be a particular challenge for MySQL is just that MySQL runs on so many different platforms. You guys have, I don&#8217;t know how many kinds of distros for a given version. You run on Windows. You run on Mac. You run on a whole bunch of different Linux flavors. How much of the engineering effort do you feel like goes into features themselves, versus how much of the engineering effort goes into making a distribution that runs on such an enormously wide variety of platforms? </p>
<p><b>Jay:</b> <a name="portability"></a>Yeah. I&#8217;m not privy to the exact numbers. I can take a guess, though. I would say that the portability layer that allows us to run on these various platforms is fairly stable. Not that it runs perfectly on all the platforms, but that we do run on all the major platforms. And the reason we can do that is an underlying subsystem that takes care of the portability between those systems. That&#8217;s been around for a while, so, unless we&#8217;re talking about newer things, like Windows 64-bit running on Falcon &#8212; which is our new storage engine coming out &#8212; I think a lot of that&#8217;s already been done. So, most of the work is really in the features, and a lot less in the portability layers. </p>
<p><b>Scott:</b> OK. </p>
<p><b>Jay:</b> And I would say that is the case with, say, PHP or Apache or Python, or many of the major open source projects. That core portability layer was a key thing early on, and it&#8217;s stabilized pretty dramatically recently, so that&#8217;s not really what people are working on; it&#8217;s more of the feature-wise stuff. </p>
<p><b>Scott:</b> Got you. Got you. So, the only time the portability layer really needs significant engineering, like you said, is if you&#8217;re porting to a whole different architecture, like 64-bit, or something like that. </p>
<p><b>Jay:</b> Or when you&#8217;re specifically profiling bottlenecks on a specific architecture. </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> But that&#8217;s more of a performance thing and less of, &#8220;Will it work on the platform?&#8221; </p>
<p><b>Scott:</b> So, talk to me about some of the other stuff that goes into building a product, things like testing and QA. How does that work? MySQL, I&#8217;m assuming that it&#8217;s got pre-release beta builds, or daily builds, or things like that, that you can pull down. </p>
<p><b>Jay:</b> Yeah. In fact, the release schedule of MySQL, and the way it&#8217;s built, I don&#8217;t think is going to be much different from most other open-source projects. We have an internal build team, which I think there&#8217;s four or five people on it, maybe. They are responsible for the overall release management: making sure that the builds compile on all platforms, that the binaries are stable on the major platforms. And also, building up the release notes, making sure the flags and switches that are relevant for each platform are turned on or off depending on what&#8217;s needed, and that everything runs through our internal push build system, which, essentially, is an automated system that says, &#8220;Will this build on this architecture?&#8221; And then, we also have a QA and testing team. </p>
<p>It used to be a single team. Now, we have one man, Omer Bar Nir, who&#8217;s the QA architect over the whole thing. But we have QA engineers, now, attached to each of the development teams. And so, they are focused specifically on the QA and testing of, say, backup and replication, or the storage engines, and things like that. So, where it used to be that the QA was across the board, now they’ve split up and focused on specific pieces. And I don&#8217;t think that&#8217;s very different from any other open source project. The way we release is, we use a tool called BitKeeper for our source control, and we do nightly or daily snapshots from that, which you can take and build the source code yourself. </p>
<p>And then, once in a while, we&#8217;ll package up the source code into tarballs, or zip files, depending on what platform you need. And then, depending on what version of MySQL, whether it&#8217;s Enterprise or Community, they&#8217;re built into binaries and then distributed. All the distros that I know of don&#8217;t use the binaries at all. All the Linux distributions, they actually take the source, from either BitKeeper, or from the source tarballs for a release, and then modify it to suit their needs, mostly by where the configuration files go on install, what&#8217;s in there by default, all that kind of stuff. And then they package it up into a .DEB or an RPM, or whatever it is. </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> Now, for Windows folks, the vast majority of Windows users don&#8217;t have the ability to compile software locally on Windows. So, it&#8217;s much more important that we provide binaries for MySQL on Windows than it is for the Linux folks. So, that&#8217;s most of why MySQL has been providing binaries for so long. Also, we say, &#8220;If we built the binary, we&#8217;re assuring you that it&#8217;s stable on that platform.&#8221; And to be honest, most of the Linux distros are very stable as well. The same goes for Mac and Windows. We built the binaries so people can download them and install them. </p>
<p><b>Scott:</b> So, what percentage do you feel like of the QA or of the bugs that are found and posted for you guys to fix, how much is found by internal QA versus how critical is the community to wringing those bugs out of the product while you&#8217;re posting the daily builds, moving towards release? </p>
<p><b>Jay:</b> That&#8217;s a good question. I would have to refer to Omer and the MySQL group, they kind of have these stats. But I would say that, internally, probably 10% to 20% of the bugs are found by MySQL engineers or support engineers. And then you&#8217;re going to come across this gray border between who&#8217;s a user and who&#8217;s a customer. </p>
<p>A lot of users are also customers. Sometimes we&#8217;ll get a fairly large installation, say, Yahoo Finance or Google, that submits a bug on a specific version of MySQL. But a lot of times we&#8217;ll get larger installations from users as well. </p>
<p>We also have something called the Quality Contributor Program, which is for users that are really our bug seekers. They&#8217;re actively trying to find edge cases where stuff just blows up. And so we have a program for people like that. But overall I&#8217;d say it&#8217;s fairly spread out between internal folks finding bugs, customers finding bugs, and then the larger communities finding bugs. </p>
<p><b>Scott:</b> Got you. </p>
<p><b>Jay:</b> <a name="gold"></a>But we do get a ton of bugs. The majority are small. In other words, documentation type stuff. Whenever I&#8217;m giving a talk on MySQL, I talk about community. And I always say that bugs are gold to MySQL. We value them just as much as anything else from the community—especially a reproducible bug case. </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> Because it saves so much time for the engineers. Run this code and there, it crashes. That kind of thing is gold to MySQL. So, I&#8217;m always encouraging people, &#8220;If you ever find a bug in MySQL, don&#8217;t ignore it. Send it in.&#8221; </p>
<p><b>Scott:</b> That makes perfect sense. Talk about the work that MySQL—I mean, obviously it&#8217;s used so pervasively now, and lots of mission-critical stuff is built on it—what things do you do around security, reliability, all of the &#8220;itys&#8221; that people talk about with software? </p>
<p><b>Jay:</b> All the &#8220;itys.&#8221; [laughter] </p>
<p><b>Scott:</b> Stability, reliability. </p>
<p><b>Jay:</b> That&#8217;s a good&#8230; Performanceability. [laughter] Usability. </p>
<p><b>Scott:</b> Usability, scalability. </p>
<p><b>Jay:</b> <a name="tenents"></a>The three things that MySQL always strived for are performance, reliability, and ease of use. Those are the three binding principles of how our engineers kind of evaluate how well we&#8217;re doing. Is it easy to use? Does it perform well? Is it reliable? As far as security and stuff, as an open-source project we tend to worry a little bit less about security. There are just so many people looking for security holes in the software, because they can see the source code and look at it. They can see major problem areas and we usually get notified quickly and respond very quickly to those kind of things. Let&#8217;s see&#8230; Scalability. I&#8217;m biased, but I think we scale very, very well. And it&#8217;s always something we&#8217;re thinking about internally, because performance doesn&#8217;t necessarily mean scalability. You can get a hundred concurrent connections for doing web pages or responses at half a millisecond, but if you can get 10,000 concurrent connections at 0.7 seconds, it&#8217;s less performance but better scalability. And there&#8217;s sort of this constant refactoring process going on. How can we make this better? How can we make it scale better? All that kind of stuff. </p>
<p><b>Scott: </b>Cool. </p>
<p><b>Jay:</b> Which I&#8217;m sure is the same with any closed-source software house and any open-source project as well. You&#8217;re always thinking about all those &#8220;itys.&#8221; </p>
<p><b>Scott:</b> Right, right. Well, you&#8217;ve got a mature product, so you&#8217;re not engineering it from scratch to have all of those, but you&#8217;re evolving it, and a lot of the work now is more in terms of&#8230;Either there&#8217;s a well-defined opportunity to rewrite something and increase performance and scalability, or you&#8217;re really just &#8212; as you add new features &#8212; trying to make sure you don&#8217;t negatively impact those areas that are already good. </p>
<p><b>Jay:</b> Right. As you increase the features in the code base, you increase the code complexity, and you always look out for performance regressions because of that. And there&#8217;s a way to combat that, but the general rule of thumb is, the more code you add to something, you&#8217;re going to impact the performance. So, there&#8217;s always this balance. Do we need this feature? Because the last thing I think MySQL wants to become is—no offense to Oracle or PostgreSQL—a database that has a million features that no one uses. </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> And that&#8217;s something that does go through the mind of the software architects at MySQL. Is this a critical functionality that the majority of users are going to use and are going to value, or is it going to be passed over and just slow down the code? </p>
<p><b>Scott:</b> That seems to be a key challenge of closed-source proprietary companies, is that they really have to guess. They have to shoot in the dark in terms of&#8230; First they have to come up with big features, because if they don&#8217;t, they can&#8217;t compel people to buy the upgrade. </p>
<p><b>Jay:</b> Right. Which is the opposite of how MySQL sells our stuff. We&#8217;re not trying to be this enormous feature-rich database piece. We&#8217;re trying to be the best and fastest online database. And so the features that we&#8217;re adding are designed for highly-scalable web applications and online databases. </p>
<p><b>Scott:</b> And I would guess, too, as you open it up to more and more community input, it will become easier to identify features which will be widely used. Because, the worst thing in the world is to write a feature, nobody uses it, but you can&#8217;t ever cut it because it&#8217;s actually not that nobody uses it, it&#8217;s that three people use it. </p>
<p><b>Jay:</b> Yeah. And this goes back to that, what&#8217;s commercial versus what&#8217;s community? And I think that actually closed-source software companies have more of a problem with this, in that a large customer really, really wants this feature in there. And it&#8217;ll be added into the code base and sometimes significantly affect performance of another piece, but it&#8217;s been bought and paid for by a customer and will stay in there. And unless the software is written in a modular fashion, it will impact adversely everyone else who will never use that feature. And I think the open-source model, which tends to lean towards a more modular architecture, can handle that dilemma better. </p>
<p><b>Scott:</b> And also, in things that are very, very open source, where most of the code is coming from community contributors, there are no features that are being written because somebody thinks someone else will want them. The only features that are being written are, &#8220;I&#8217;m writing this feature because I need it.&#8221; So, that&#8217;s at least a little validation that the feature is needed by somebody. </p>
<p><b>Jay:</b> Sure. </p>
<p><b>Scott:</b> But, doesn&#8217;t MySQL have just exactly the problem you talked about? Because you mentioned that you guys do some nonrecurring engineering. </p>
<p><b>Jay:</b> Yeah. And that&#8217;s the dilemma that all commercial software companies are faced with. Which is why I&#8217;ve said as we move more and more into that mix of community input and also making that core kernel much more modular, I think that we can significantly offset the disadvantage of that, or the drawbacks of having nonrecurring engineering work done. </p>
<p><b>Scott:</b> So, initially, I&#8217;m guessing the main driver of having MySQL be open source was just so that it would be used and accepted. In other words, it&#8217;s a much more difficult proposition to sell something that&#8217;s completely closed-source proprietary that targets Linux as a primary platform. And so, it seems like a lot of companies open-source their software and they derive their revenue off other things, support and that kind of stuff. </p>
<p><b>Jay:</b> And packaging. </p>
<p><b>Scott:</b> Yeah, and packaging. Otherwise you&#8217;re just not in the game. </p>
<p><b>Jay:</b> <a name="money"></a>Right. Well, one part of it is the revenue, right? When you look at open source versus commercial, commercial has just an enormous advantage from a revenue perspective because of their control over their product, right? The open-source company doesn&#8217;t necessarily have that. From the exact opposite end of the spectrum, the open-source company doesn&#8217;t have nearly the amount of cost involved in R&amp;D, QA, and testing that a closed-source company does. </p>
<p>So, the big shift that&#8217;s happened is that you&#8217;re going to see closed-source companies start to open source products that they are tired of spending money supporting, and let the open-source community take on the cost of that support. Now, I&#8217;ve read recently that Microsoft is open sourcing Visual FoxPro, which I thought was awesome. And then I started thinking about it. I&#8217;m like, &#8220;Well, they&#8217;re probably just tired of supporting it. And just give it to the open-source community and see what they do with it.&#8221; And I think that&#8217;s where we&#8217;ll start to see the first major shift with closed source companies that open-source products because they realize that the cost benefit of doing that, and letting that source out there to the open-source community to test and QA and support, is just so much more worth it than keeping an older product in-house that&#8217;s really not making any revenue. </p>
<p><b>Scott:</b> Well, and one of the places where you see something similar to that is Adobe open-sourcing the Flex SDK. And to them it makes sense because it wasn&#8217;t something that they ever sold anyways. </p>
<p><b>Jay:</b> Right. </p>
<p><b>Scott:</b> So, it was free to begin with. There doesn&#8217;t seem to be a lot of downsides in open sourcing it. </p>
<p><b>Jay:</b> And there is a difference between free as in no-cost and open source. And MySQL is free and open source, free as in freedom. But it doesn&#8217;t necessarily mean that just because something is GPL or is open source that it&#8217;s free of cost. </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> And the original definition of free and open-source software really had nothing to do with cost. </p>
<p><b>Scott:</b> Right. </p>
<p><b>Jay:</b> Right. So, when I talked about Microsoft or other companies open-sourcing and making free software, I meant it more in the sense of free as in freedom, so that the developers can get their hands on it, and tool with it, and tweak it, and completely change it, and support it themselves. It had less to do with charging for it. </p>
<p><b>Scott:</b> But it seems to me like the places where companies are looking at open-sourcing products, and making them free as in freedom, are places where the product was already free as in cost, or maybe it wasn&#8217;t free but it&#8217;s just not generating a lot of revenue. </p>
<p><b>Jay:</b> Exactly. It&#8217;s costing more for them to support it than it would the open-source community. And that&#8217;s where I think the first round of closed source becoming open source is going to happen. </p>
<p><b>Scott:</b> So, MySQL started out as being open-source, but developed by a for-profit company. And at this point you&#8217;re moving to more of a traditional open-source model. And I&#8217;m guessing it&#8217;s because as the product has grown, and as it&#8217;s gotten more complex, and there are more and more features to it, there&#8217;s a need to kind of force it to be more modular. There&#8217;s a need to get more community involvement in the development&#8230; </p>
<p><b>Jay:</b> Yeah, and input. Right. </p>
<p><b>Scott:</b> And just to really shape the direction because it&#8217;s not just a single little standalone database anymore, right? I mean, this is a pretty massive product that you have. And, you&#8217;re at that point where you need that cost savings that the community can provide. And you need the input, and direction, and expertise that the community can provide to really move the product forward in the best possible way. Am I summing it up correctly? </p>
<p><b>Jay:</b> Yeah. Yeah, although I don&#8217;t think the plan was ever to push it towards the Apache or Linux model. But we want more of a balance, just like you said, so that we can get the benefits of the community. But also, so we can give more back to the community, and have them happier with the product that we&#8217;re providing. </p>
<p><b>Scott:</b> Sure, sure. There&#8217;s more buy-in when it&#8217;s something that you can actually work on. </p>
<p><b>Jay:</b> Right. </p>
<p><b>Scott:</b> And there&#8217;s more buy-in when you touch the product, so to speak, I guess. </p>
<p><b>Jay:</b> Right. </p>
<p><b>Scott:</b> OK. Well, I guess, what else would you like to say? I&#8217;ve gone through the questions that I had. I&#8217;ll go ahead and hand you the microphone. And any message that you think is important to get out that I didn&#8217;t cover, feel free. </p>
<p><b>Jay:</b> <a name="getinvolved"></a>Yeah. Well, I definitely did want to point out that the MySQL community headquarters is becoming the MySQL Forge website. And that is <a href="http://forge.mysql.com/">http://forge.mysql.com</a>. And, right now, what we have in there is a list of open-source and MySQL-related projects in a project directory (http://forge.mysql.com/projects/). There&#8217;s a whole directory of code snippets on how to use MySQL in various languages: C++, PHP, Perl, SQL, .NET, you name it, it&#8217;s in there (http://forge.mysql.com/snippets/). And then we also have our public worklogs, which I mentioned earlier. They essentially represent our big roadmap, broken into specific tasks that are unassigned or assigned to a specific developer. And the developers really want input from the community. So there&#8217;s a way of commenting on those worklogs, and I highly encourage people who are interested in MySQL to go and give their input to the developer of that specific piece, and let them know what they think about it. So, forge.mysql.com. And then there&#8217;s also a huge Wiki that we&#8217;re developing as well. That would be the last thing I&#8217;d say. </p>
<p><b>Scott:</b> So then, if people want to contribute code to MySQL, if they want to actually work on it, how set up are you to take community contributions at this point? Or what do you think the timeline is to where you&#8217;ll really&#8230; </p>
<p><b>Jay:</b> We&#8217;ve accepted, I think, about 80 contributions already this year, contributions meaning anything from one-line patches to semi-large features. So, we&#8217;re already set up to do that. If anyone is ever interested in doing that, I would highly suggest going to <a href="http://forge.mysql.com/wiki/Contributing">http://forge.mysql.com/wiki/Contributing</a>. </p>
<p><b>Scott:</b> Got you. </p>
<p><b>Jay:</b> And that is where you can get all sorts of information on the various ways you can contribute, the different IRC channels where the hackers hang out and where you can get help, how you can build a test case, how you can build locally, all that kind of stuff. That kind of information is all on that page. </p>
<p><b>Scott:</b> Jay, thanks for taking the time to chat. </p>
<p><b>Jay:</b> Thanks.</p>
]]></content:encoded>
			<wfw:commentRss>http://howsoftwareisbuilt.com/2007/07/18/interview-with-jay-pipes-north-american-community-relations-manager-at-mysql/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

