Interviewee: Jamie Thingelstad
In this interview we talk with Jamie Thingelstad – CTO of the Wall Street Journal’s Digital Network. Specifically, we talk about:
- Open and closed-source technology stacks at WSJ
- The technology evaluation process
- Intersections between open source and commercial entities
- Security considerations with open and closed platforms
- The value and liability of owning code
Sean Campbell: Jamie, thanks for taking the time to chat. Could you give us a little background on your role at Dow Jones?
Jamie Thingelstad: Sure. I’m chief technology officer for the Wall Street Journal Digital Network, which comprises WSJ.com, MarketWatch.com, Barrons.com, and a number of our smaller web properties. I’ve been in this role for about a year and a half. Prior to that, I was the CTO for MarketWatch.com inside Dow Jones, after we were acquired by Dow Jones.
Going back further, I was CTO and one of the early founders of BigCharts.com, which is a premium charting website. BigCharts was acquired by MarketWatch in 1999, and I became CTO of MarketWatch after the acquisition. We acquired a few companies, and then Dow Jones acquired MarketWatch.
Scott Swigart: You mentioned a number of digital properties that Dow Jones has. Tell us a bit about the infrastructure behind those and the kind of volume that those properties have to handle.
Jamie: Since they’re all news sites, there’s a certain unpredictability to the traffic volume. For instance, the market can drop at any second, and that can drive huge amounts of volume to our web sites.
The two largest sites that we run, WSJ.com and MarketWatch.com, are fairly different. WSJ.com is built in a Java environment using WebSphere, and we’re slowly migrating it to Tomcat. It’s built on a fairly rigid framework, but it is very robust and very scalable. It’s stable, but not quite as flexible as we’d like it to be.
The MarketWatch platform is a bit different. MarketWatch is built in a Windows environment and uses .NET. It was originally built in ASP and has been migrated to .NET in bits and pieces. It follows a “hundreds and hundreds of servers” approach, where no one server is doing all that much, so it is a bit more agile, and we can typically deploy things a little bit faster. It also has a slightly different performance characteristic. It can have the equivalent of a brownout more often than, say, WSJ.com.
Scott: That’s pretty fascinating that you’ve got several high volume, critical sites and yet their underpinnings are so different from each other. You mentioned moving the Wall Street Journal site to an Apache open source back end, but the MarketWatch site is on a proprietary Microsoft stack.
What can you tell us about the decision-making process as different organizations evaluate technology choices? It looks like these two groups took pretty different approaches to similar goals.
Jamie: The lesson to be learned there is that you can build large-scale sites that are stable, reliable, and highly available on either of those platforms. There is nothing inherent in one versus the other that prevents you from building those kinds of solutions. Good developers working in either one can get the job done, and do it elegantly.
The historical context is that the Journal web site as it exists now is the descendant of an engagement with IBM Consulting many years ago, and they obviously wanted to build in a WebSphere environment.
All this predates my being part of Dow Jones, but my understanding is that the decision was more about who to partner with than how to get to a specific architecture. In this case, IBM came to the table with WebSphere. In the MarketWatch example, that’s the platform that we built over the course of many years starting with BigCharts, and the platform actually started in a very different place.
In the very early days, before it was MarketWatch, when it was the platform that ran BigCharts, it was Sybase. It was Perl and CGI in Apache. And it was an ANSI C++ program that we had written to serve charts, which could be compiled both against the Win32 libraries and on Unix, Solaris at the time. And so, we had this sort of cross-platform environment. We made a decision around 1996 or ’97 that we were spending too much effort building plumbing code and decided that we needed to move to something else.
At the time, we evaluated CORBA and COM, and our conclusion was that COM was better suited to the type of solutions that we were trying to build. We rewrote the entire system into a COM model and then switched our front-end production environment from Solaris to Windows NT. The net benefit of all that was that we stopped writing a lot of code that was about glue and plumbing, and focused all of our code on business requirements and functions and features that our customers wanted.
Scott: Can you walk us through how you think about the technology evaluation process? And, as a follow-up, broadly speaking, what are your observations about the pluses and minuses of building on top of a community-supported open source stack, versus going with more of a partner, like IBM or Microsoft?
Jamie: I think you have to start with your team. If you’re able to build your own team, it’s less of an issue, but if you’re walking into an environment where the team is well-versed in one tool set or one part of the stack, you shouldn’t fool yourself into thinking that you can then choose some alien technology and that the team will just flow with it.
I don’t know many Java shops out there where you could just walk in and say, “Well, you know what? We’re going to do the next project in Windows” and have that be just fine. Independent of culture, independent of bigotry, I think it’s a matter of skills and capabilities, and some of those skills don’t transfer easily.
But if you take the team out of it, then I think it’s a pretty interesting angle. My view is that there really is little value that a proprietary platform like the Windows platform gives you. If I were going to start, green field, building my own shop, I would be heavily biased towards free platforms: some Linux distribution, likely something like Apache.
I would also be very biased towards an interpreted environment. I think there have been great advances in the compiled environments for the web, and .NET has made huge leaps in performance and capability. Similarly, so has Java. But it’s interesting that the sites that I see doing the most “out-there” stuff, that are the examples of innovation, are typically not running a compiled environment on the front end. WordPress, Twitter and Facebook come to mind.
I hate to suggest that there’s an either-or situation where either you are innovative and you’re on an interpreted environment, or you’re not and you run a compiled environment. I don’t think it’s quite that simple. But I do think that there is some real merit to the lightness that you get with something like PHP, with a robust framework, or Ruby on Rails.
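The lightness Jamie describes is easy to illustrate. As a rough stand-in for the PHP and Rails environments he mentions (not the stacks his teams actually run), here is a complete dynamic web endpoint using Python's standard-library WSGI interface: no compile step, no deploy pipeline, just a function that handles requests.

```python
# A minimal sketch of an interpreted-environment web endpoint.
# Python's stdlib WSGI interface stands in here for PHP or Rails.
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """A complete dynamic endpoint in a handful of lines."""
    path = environ.get("PATH_INFO", "/")
    body = ("Hello from " + path).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call(path):
    """Exercise the app in-process, the way a WSGI server would call it."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a realistic request environ
    environ["PATH_INFO"] = path
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks).decode("utf-8")
```

Changing behavior is just editing the file and reloading, which is the agility being described; the trade-offs against a compiled stack are the ones debated in the rest of the conversation.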
In fact, we’ve tried to have our cake and eat it, too. One of the most recent products that we released on the MarketWatch site is a product called MarketWatch Community. And that project was actually built using the MonoRail framework from Castle Project, which is essentially the Rails framework laid on top of C#. It’s an interesting way to get the benefit of having a team that understands tools like Visual Studio, and wants to have that very robust debugging environment, but wants to also take advantage of the flexibility that Rails offers. We have been able to have it both ways, if you will.
Scott: I think there has been a resurgence in the last few years of dynamic languages or interpreted languages. That programming model seems to be making a pretty strong comeback. When I got started with Unix, way back in the day, they were used for a lot of administrative things, and there were things like Tcl, Tk and some of that kind of stuff. Then it went very compiled. Now it seems to be going back towards more of a dynamic language approach.
Jamie: It seems we always have the same problems, but we want to solve them in different ways. We try to find performance and optimization in different parts of the stacks. People have been arguing for decades: could you make an interpreted environment that’s as fast as a compiled environment just by doing JIT-based compilation?
Maybe you can, maybe you can’t; I don’t know. That’s pretty academic. At the end of the day, you can’t get away from the fact that Facebook is written in PHP. There are very large sites like Twitter that handle an extreme number of requests per second, and it was built on Rails. I don’t think that you can state emphatically anymore that those environments do not scale. Clearly, they can, even if it takes a different set of tools.
Scott: There was an article I read a couple of years ago called “The Hundred-Year Language,” or something like that, where somebody was speculating about what we’re going to be programming in 100 years from now. His conjecture was that it will be insanely inefficient, but that it won’t really matter.
After all, if wringing the most performance out of the machine were what mattered, we would all write in assembly. Whether you have to buy 100 servers or 200 servers to run it matters less than whether it’s agile: you can scale out and focus on supporting change rapidly. That’s a lot more important than wringing every drop out of the CPU.
Jamie: One of my professors in college, John Riedl, and this was the early ’90s, so I think he was a little ahead of his time, would say that CPU, memory, bandwidth, and disk all scale to infinity. The only thing that doesn’t is programmer time. At the end of the day, the most important thing is to understand your code and to be agile with your code. Hopefully, you can solve everything else through other innovations.
Sean: One thing I’m curious about is usability, whether that’s from a management perspective, a development environment perspective, or an end user perspective.
When you’re thinking about acquisitions, is it less about scalability and more about usability? Do you feel like the open source community is best positioned to drive things that have a high degree of usability and utility? Or is that something they’re striving toward, with limits inherent in the model?
Jamie: Well, I guess it depends on what level you define that usability. Some of the most opaque projects around are open source projects. The code is all available, but unfortunately, you have to actually look at the code to figure out how to do anything with it.
And the result of any question is, “Well, you’re an idiot. Go look at the code.” That’s the extreme, but I think that’s one of the places that the open source model puts an additional obligation on somebody like a CTO, or a VP of engineering.
With proprietary software, you can somewhat base the viability of the product on its market acceptance, and you get the benefit of the fact that either these other people have bought it or they haven’t. And they have reasons why they bought it or they haven’t bought it. And bad products die naturally, out of lack of adoption. In the open source model, that’s not necessarily the case. You can have a product that is great and just doesn’t get picked up, or vice versa, so it requires a vetting that’s a little bit different. What’s the velocity of the project? How many people are truly committed to the project? How willing are you to commit to the project?
I couldn’t tell you how many corporate CTO types I’ve talked to that would love to use open-source projects, but would never give anything back. And, if that’s your position, I think you should question the ethics of using open source software and just being a parasite. If you’re not willing to give back your additions, which in some cases is the requirement, that’s not good.
The Mac also has an interesting dichotomy. I’m an old-school Mac guy, and I returned to the platform when they switched to Intel. I have commented to many developers that they need to go get a Mac, because I think that, in many ways, the most innovative software that is being built right now is being built on the Mac. In some cases, it’s just wacky programs like WriteRoom.
A great example is Quicksilver. Quicksilver redefines how you interact with your computer. It’s clear to me that the most innovative development right now is happening on the Mac. Yet at the same time, the thing that bothers me about the Mac is that it is positioned as the computer for the rest of us. It’s portrayed as the people’s computer, when in fact it’s the most closed platform out there. You can’t build one.
I have always felt that people’s computer needs to be a computer that the people can build. So there’s this dichotomy. But it’s a great ecosystem. Really great software is being built on Macs. I don’t know if I have ever seen a Rails screencast that was done on a PC. It’s very clear that Apple has successfully gained mindshare in that developer community, and they’ve gotten a lot of market share as a result.
I doubt you’ll ever see any volume of Macs in server rooms. But at the end of the day, it’s Unix underneath, and that’s what gives it the credibility to be able to do things like Rails development.
Sean: It seems like there are a lot of open source acquisitions and even some open source projects that have a single voice at the maintainer level. Sun gets criticized for this a lot. You see a lot of companies put an open-source bumper sticker on their product. The “free” version is open-source, but that’s just to get some market share and up-sell you to the paid version. Do you see a trend in that direction, or do you see it going a different way?
Jamie: I think one of the great examples there is Red Hat. I don’t want to say that they’re not fully imbued with open source in their blood. They are. But Red Hat Enterprise is as expensive as Windows. You come to the table with this idea that it’s Linux, so it must be cheaper, but in fact, no it’s not. That proves out that technology teams still need support. No matter how good the teams are or how deep your team is, from time to time, most teams are going to need support from an expert. So there’s a market to be made there in providing that support.
It also highlights that the traditional software business model is still valid. Microsoft’s and Oracle’s models are valid, and are a proven way to run a business. These companies are successful doing that. One other thing concerns me–Sun buying MySQL makes me wonder about the future direction of MySQL. Is MySQL going to continue to be a viable platform for the kinds of projects that it has been, or are they going to try to move it into a different market space?
And even MySQL, prior to Sun buying it, is an interesting example. I’ve noticed that SQLite has really made a lot of headway. And what’s happening, from my perspective at least, is that SQLite is picking up where MySQL left off. MySQL decided to go more up-market and tackle bigger problems, and build in some features that are going after the Oracles and the Microsoft SQLs of the world. And that left a space for a much more lightweight SQL engine, so SQLite is filling that gap.
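Part of why SQLite could fill that gap is that it is an embedded, zero-configuration engine: there is no server process to install or administer, and the whole database is a single file, or can even live purely in memory. A quick illustration using Python's built-in sqlite3 module (the table and ticker symbols here are invented for the example):

```python
import sqlite3

# SQLite needs no server process: the entire database is one file,
# or, as here, lives purely in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?)",
    [("DJX", 42.50), ("MWX", 17.25)],  # hypothetical symbols
)
conn.commit()

# Query it like any SQL engine, with parameterized statements.
row = conn.execute(
    "SELECT price FROM quotes WHERE symbol = ?", ("DJX",)
).fetchone()
price = row[0]
conn.close()
```

That kind of drop-in simplicity is exactly the lightweight niche the up-market move by MySQL left open.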
So you could argue that, maybe, that’s how it should work, that when a void is created, there’s a new open source project that will fill it. But I do think that this need for an enterprise version of these various packages is really just a reflection on the support needs of organizations.
The big thing that they sell all that stuff on is FUD. When there’s a security issue, you want the verified Red Hat patch, so you’re going to pay a thousand dollars a year per box, or something like that.
Scott: When you look at the spectrum of open source projects, you’ve got some projects that are very, very old school. You’ve got Linux. You’ve got Apache Web Server and some of that stuff. They’ve been around for a long time. Extremely mature communities, high-quality code, and ubiquitous.
And then you’ve got kind of this next-generation of open source, to some degree–PHP, MySQL, Xen–a lot of these kind of companies, where the main driver behind the code base is a single company. And some of them are really what I’ve heard people call an aquarium model–like MySQL–where they don’t really have outside contributors. You can look at the code, but unless you work for MySQL, you’re probably not a contributor. A lot of Sun’s stuff is that way, too, although I think they’re actually working to get more outside contributors.
And so the question is, if you’re going to launch an open source project today, how do you take the world by storm with it? Take a look at Ubuntu building on top of Debian. It seems like the ones that catch fire are the ones where there is a revenue model and there is a business around the code base. What are your thoughts on that?
Jamie: I think you’re right, although it’s hard to know what’s the cause and what’s the effect. Is it successful because there’s this revenue model, or is there this revenue model because of something else? Are there examples of very, very successful open source frameworks that are not built around a commercial endeavor? Could you say Rails? Rails is fundamentally the creation of 37signals, and they’re doing commercial things with it.
Scott: Debian comes to mind; they’re not commercial. But then again, Debian caught fire to some degree when Ubuntu used it as their basis. There seems to be a bit of an uneasy relationship between the two communities. I think Debian is thankful for the attention that Ubuntu brought, but at the same time, the philosophies of the projects are different. Going the other direction, Canonical is very happy to say, “we’re built on this rock solid distribution. It’s a distro you can trust, but we also have got our own ideas.”
One of the things that I look at is that we are in this first wave: you’ve got PHP, whose chief sponsor is Zend. You have Xen, whose chief sponsor is XenSource. Red Hat acquires JBoss. And now as these acquisitions start to happen, I wonder what the landscape will look like in five years. MySQL gets acquired by Sun, which has promised to open source everything, so they are moving toward an open source model.
XenSource gets acquired by Citrix, which doesn’t have an open source history and ethos. JBoss gets acquired by Red Hat, which was open from the beginning. Then maybe some of those companies get acquired by other companies. To me, it will be interesting to see how much of these projects will remain tied to the companies.
And what happens if Citrix ends up not being a good steward of Xen? Will they end up having bought nothing because there will be a fork that emerges?
Jamie: The other challenge that presents itself in those cases is, how many open source contributors want to contribute to Citrix? Probably not a lot. Once you tie projects into these corporations, the culture may resist it and go off in a different direction.
I think it will be a true test over the course of the next two years to watch the legal pushes on the GPL and the various software agreements out there. Will companies be able to weasel themselves out of that, or just identify ways to break the requirements of the GPL or the Apache license?
Scott: It’s interesting too that MySQL had a way of unlocking the GPL, so if you didn’t want to buy into the GPL but you wanted to redistribute MySQL with your proprietary code, you could just pay MySQL for that right.
On the other hand, if you wanted to adopt a GPL license yourself, then you could redistribute without paying MySQL the licensing fee. It’s a very kind of Wild West feeling. Everybody is trying a different model and a different way. I wonder if that’s the way it will stay, or whether things will consolidate over the course of a decade. How much does this disparity of business models and licensing for open source projects weigh on your decision making?
Jamie: It weighs a lot. It concerns me greatly that we would be leveraging an open source framework that could potentially die or be acquired by the wrong company. Of course, the acquisition angle can happen no matter what. Products can be acquired by companies that you’re not really excited about, so I don’t think you’re protected from that in any model.
But the bigger issue, in my book, is the longevity and the velocity of the project. For example, I think Ruby on Rails has an incredible amount of velocity right now. There are a lot of people working on it, there’s a huge developer community out there. But I ask myself questions like, “Will I be able to find a Ruby developer in 10 years?” I don’t know. Will I be able to find a Ruby developer in a year? They’re not out there at the same volume as other skills.
So in that case, the fallback is, how hard is it to teach people? If it’s a fairly transferable thing and it’s something that an average engineer who knows a few languages could easily pick up, then I don’t think there’s a high amount of risk. But if it’s something very esoteric, I wouldn’t touch it. The other question to ask is, at what layer of your architecture does that system live?
There is a certain level of the architecture where I would be concerned about the ability of an open-source project to provide that level of always-on availability. I have to admit that that flies in the face of Linux itself, since in many cases, Linux is the underlying OS. But it’s at a little bit of a different layer in the stack. Knowing the complexity of that layer of development, and knowing that it’s something that companies have a very hard time pulling off, I would be hesitant to use an open-source load balancer.
Sean: Given that a lot of the work you do is with financial data, I can only imagine you’re an interesting target for a variety of folks out there who have malevolent intent. You’ve got two stacks–one that’s predominately a closed-source base and one that’s predominately an open-source base. What do you think about security issues around those platforms? Lately, this is a really, really hot topic. It always is, but it seems to ebb and flow.
Jamie: Maybe I’m too much of a technologist to look for it, but whenever somebody says their platform is more secure than everybody else’s, to me it’s just a bunch of crap. There are really very few platforms that have truly iron-clad security. And by and large, those are the platforms that have been around for so long that they’ve just been exposed to many, many tests. When you think of the Unix kernels themselves, some of them have not seen significant modifications for a decade. And those are probably pretty secure.
But the reality is, if you’re regularly modifying and releasing code, you’re going to have security issues. So we protect all our systems with counter-measures in the front and make sure that we have compensating systems to defend against all of that. So I wouldn’t go one way or the other based on this promise of security. In fact, I think it’s a little slimy for Red Hat to use security as the push toward getting people to buy Red Hat Enterprise. I think you should be selling based on features and not fear, but I think it’s a wash between the various platforms.
Scott: There are organizations that are building things on top of Linux or Apache, and they’re not being hacked right and left. Like you said, it may not have that much to do with the platform itself. It may have to do with the expertise of the organization like yours. You actually do the implementation and know how to build these perimeters of defense and that kind of stuff. Do you have an opinion on the notion of many eyeballs as it relates to security?
Jamie: My general opinion is there is some difference between the number of the eyeballs and the quality of the eyeballs. Just having thousands of developers looking at your code certainly is not going to qualify it as secure unless they really are qualified to say it is secure. I do think there is a benefit to that. At the end of the day it gets to these bigger issues, like what is the right way to develop software?
We know that in medicine, for example, it is not good for one doctor to just do everything themselves. They should consult and review their findings with other doctors and come to some conclusions. We certainly would not let one engineer build a bridge without ever having another engineer review the plans. In technology, we have this belief that it is our intellectual capital. The source code is the family jewels, and that is where the intellectual capital is located.
I don’t tend to really follow that idea. I tend to believe more that the intellectual capital is the engineers who are writing that code. I have actually gone on record many times saying that your source code repository is not intellectual capital; it is a liability. On your balance sheet, it belongs on the liability side. The only thing you have to offset it is the intellectual capital you have in your engineering team. All code is just waiting to break. It is just a matter of time.
Scott: I have heard people say that code is like maintaining inventory. You have a warehouse full of stuff.
Jamie: Every line has a cost. In the future, is the fact that your code is open source really going to matter much to your business model?
You could argue that companies like Red Hat and MySQL are showing that it’s not the case–that you can have a very normal business model and have an open source code base underneath that.
Sean: Right. Someone would have to treat MySQL very, very badly for somebody else to do a meaningful fork, because nobody else is going to staff 200 developers on it.
I think Sun is going through that transition, where they’re realizing the value of Solaris is not necessarily the code itself. It is the fact that they have x-hundred number of Solaris developers that know how to work on it. I think Microsoft could do the same thing, if they open-sourced Windows.
Jamie: You could argue, what would happen if Microsoft just open-sourced Windows tomorrow?
Sean: I don’t think it would change very much at all. Again, nobody is going to fork that in a way that Dow Jones would trust this competitor to enhance, patch, and build the next version of Windows more than they would trust Microsoft to keep going.
The value is in the people’s expertise around that code base. That seems to be fundamentally what a company is acquiring. As long as they are reasonably good stewards of the code, I don’t think companies have to worry about losing control of a project just because it is open source.
Jamie: It is interesting, because I am a believer in this kind of long view of computer science. What I mean by that is it is still an incredibly immature field. It is much more art than it is science. We have not figured out the basics to our craft to the extent that people who build bridges have figured it out. We may find 20 years from now that the best practice is for all the code to just be out there and that is just how you do it.
Part of the benefit of that is the many eyeballs. Or part of the benefit is automated agents that are scanning code on the behalf of third parties, looking for things. It is hard to know exactly what that would be. In general, I do think this belief that your code is a huge asset you have to keep locked in a big safe is terminal. I do not think that is going to be the path forever.