Replay Solutions makes innovative solutions that help developers troubleshoot and
debug applications. Basically, it lets you record interactions that are
happening in production, play them back in the lab, and figure out what's going on.
In this interview, we chat with Jonathan Lindo, CEO of Replay Solutions.
The genesis of ReplayDIRECTOR: recording and playback for software
Managing the architectural complexity of simulating multi-level applications
Implementing the test and debug environment, and the results that follow
Handling the economic pressures of being a startup, especially in a down economy
The value of open source in an entrepreneurial setting
Scott Swigart: To start us off, take a second to introduce yourself and talk a little bit about Replay Solutions.
Jonathan Lindo: I’m one of the co-founders and CEO of Replay Solutions. We’re a software company that has developed a very unique technology that helps to automate a large portion of the software life cycle for anybody who is building or managing software applications.
My background is technical. I’ve been a software engineer for over 15 years, before founding this company in 2004. Throughout my own career, I’ve felt the pains and the need for a technology like ReplayDIRECTOR, which is our core product.
We’ve experienced a lot of the challenges and difficulties that folks are facing even more today, in terms of delivering high quality applications for the new types of environments in which people are now deploying applications.
Scott: Unpack that a little bit. Explain what ReplayDIRECTOR does and a little bit about how it does it.
Jonathan: Let me start off with a bit of a story to tell you how we got to where we are as a company. My co-founder, Jeff Daudel, who is our CTO, and I have worked together for over a decade, most recently at a company called Muse Corporation.
That company was building a very ambitious software project to deliver the next generation of the Internet, including a lot of new technologies like 3D graphics, multi-user browsing, spatialized audio, and video, in a 3D environment.
The idea was that companies like Paramount and Sony that have a lot of content could develop very rich 3D websites, while making them so simple from a user perspective that your mom could very easily access that content in a very compelling way.
Building that required a lot of different components. There was a highly distributed, heterogeneous back end, with a lot of databases talking to application servers.
What we found as we were trying to bring that product to market was that, with our large group of QA testers and beta testers, we were drowning in the sea of bugs and issues that were being reported. We were spending almost all of our time simply trying to reproduce and fix these problems that were being reported, as opposed to actually improving and shipping the product.
This was in the late ’90s, and TiVo had just started to become popular. Both Jeff and I were early adopters and big fans of that technology, and we still are. We both thought it would be great if we could bring a concept similar to TiVo’s recording and replay capability to software development.
When a problem was experienced in the field or during testing, we would have a recording of it as it first occurred, which would allow us to easily trace back to the exact root cause. That was the genesis of the idea, and how it works is actually pretty simple. We basically hook into the software application as it’s running and record all of the sources of input that could potentially affect the software.
By doing so, we provide just enough data to allow the software to replay and actually reproduce, at the source code level, the exact root cause of any error. We reproduce the error in such a way that it’s actually debuggable. You can attach your debugger, like Visual Studio or Eclipse, step through the code, and see the root cause of the problem, right down to the responsible line of code.
Scott: It seems that there would be a lot of technical challenges to building that, in terms of capturing all those events, all those inputs, into the software, and being able to play it back. A number of things spring to mind.
Let’s say that a user logs into their online banking site, tries to pay a bill, and gets some kind of an error. You can’t literally just replay that transaction over and over, taking money out of their account and giving it to the power company each time. How do you address cases like that, where an error can’t be replayed in the production environment?
Jonathan: In your online banking example, a browser talks to a web server, and there’s also an application server, a database, and probably some authentication servers running. All of that interaction will result in a lot of different transactions taking place between the various components of your application.
That set of transactions speaks to the incredible complexity that now exists in application-deployment environments, which becomes even more pronounced with deployment into the very distributed environments associated with cloud infrastructures.
Our technology basically isolates each of those components as they’re running in the system. Let’s take the example of the application server. What we do is actually monitor the activity on that application server in a very lightweight, seamless way, and allow just those actions that occurred on the application server to be recorded and replayed.
The result of that approach is you can take the recording of the problem that we just described, an error in an online banking system, and you can bring that problem out of the actual production environment and put it onto a developer’s or a QA person’s workstation.
Without needing access to the database, the authentication servers, or any of the other components, including the client itself, you can actually replay the problem and see it occur at a source code level. It’s not necessary to actually execute the transaction again.
When you replay, you’re actually running in a virtualized environment, so part of what we actually do is application virtualization. VMware and Microsoft are very popular for doing OS virtualization, and we bring that up the stack and virtualize at the application layer. That allows us to run software in a virtualized mode without impacting any of the surrounding components that existed when the recording took place.
Scott: Do I understand correctly, then, that you’re creating a virtualized test instance of the application server? When input comes in, the application server makes a call out to try to talk to the database and retrieve records. The replay is basically just replaying back the data that it got when it did that in the production environment.
From the app server’s perspective, it thinks it called to the database, and it thinks it got data back, but it’s really just getting a replay of that. It’s been transplanted into your test environment, but it still thinks that it’s in the production system and is behaving as though it were.
Jonathan: That’s a perfect way to describe it. We inject the same inputs back into the application and simulate the surrounding components such as the database and any clients that existed when that issue took place.
You can imagine that, if you had 1,000 clients connected to your application, one of the big benefits of a solution like ReplayDIRECTOR is that you don’t need to generate that load again. The replay will have captured all of that activity from those 1,000 clients, and you simply need to press “play.”
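The record-and-replay idea discussed above can be illustrated with a minimal Python sketch. This is not ReplayDIRECTOR's implementation, which the interview does not detail; the names `RecordingProxy`, `ReplayStub`, `handle_payment`, and `production_db` are hypothetical and exist only for this example. It shows an application function whose database calls are logged once, then answered from the log on replay so that the real database is never touched again.

```python
import json


class RecordingProxy:
    """Hypothetical wrapper: forwards each call to the real dependency and logs it."""

    def __init__(self, real_call):
        self._real_call = real_call
        self.log = []

    def call(self, request):
        response = self._real_call(request)
        self.log.append({"request": request, "response": response})
        return response


class ReplayStub:
    """Hypothetical stand-in: answers calls from a recorded log, in order."""

    def __init__(self, log):
        self._entries = iter(log)

    def call(self, request):
        entry = next(self._entries)
        if entry["request"] != request:
            raise RuntimeError("replay diverged from the recording")
        return entry["response"]


def handle_payment(db, account_id, amount):
    """Application code under test; it only sees an object with a call() method."""
    balance = db.call({"op": "get_balance", "account": account_id})
    if balance < amount:
        return "insufficient funds"
    db.call({"op": "debit", "account": account_id, "amount": amount})
    return "paid"


# A toy "production" database with real side effects.
balances = {"acct-1": 100}


def production_db(request):
    if request["op"] == "get_balance":
        return balances[request["account"]]
    if request["op"] == "debit":
        balances[request["account"]] -= request["amount"]
        return "ok"
    raise ValueError("unknown operation")


# Record once in "production": the debit really happens here.
recorder = RecordingProxy(production_db)
recorded_result = handle_payment(recorder, "acct-1", 30)
saved = json.dumps(recorder.log)  # the recording can be moved to another machine

# Replay in the "lab": same code path, no database, no second debit.
replayed_result = handle_payment(ReplayStub(json.loads(saved)), "acct-1", 30)
```

In this sketch the replay runs the same `handle_payment` code path and gets the same answers, but the account balance changes only during the original recording, which mirrors the online banking point made earlier in the conversation.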
Scott: Do you run into issues when something needs to authenticate? My understanding is that some security systems are specifically designed not to let you replay a response of an access token, since that could allow a malicious user to intercept an authentication request as a means to spoof or hack the system.
I’m just kind of trying to poke around the edges, but do you occasionally run into some scenarios that you can’t replay, because of issues like that?
Jonathan: If you think about your earlier description, where the database is not required on replay, it’s the same for any other server, such as an authentication server. We are actually simulating and making the application think that there’s an authentication server present during replay.
When that request goes out to the authentication server to authenticate, our system will respond, in place of the authentication server, to say, “Yes, you’re OK. Keep going.” That’s what allows you to take that recording out of the environment in which it occurred and replay it anywhere.
Scott: I’m picturing little agents or something similar that do this lightweight monitoring. They are able to capture these inputs and outputs in your production environment, so you can take them back to the lab.
How does your hosted solution plug into that? That is, what’s the difference between doing this on premise versus hosted?
Jonathan: We’re really excited to have announced with Version 3.0 of ReplayDIRECTOR that we are now offering cloud hosting as part of our solution. What that means is, as you described, you’ve got agents that run on your application servers that are recording, either 24 by 7 or however you’d like to set that up.
The data those recordings generate is being sent to the ReplayDIRECTOR server, which can be run behind your firewall and is used to store and manage those recordings. It also provides a web-based interface that lets people share, collaborate, and access those recordings.
In the cloud-hosted solution we’re now offering, the ReplayDIRECTOR servers will run on our premises. We provide that as a hosted service, so the recordings will be stored securely on our servers. That’s the component that allows folks to get up and running really quickly, without having to download and install a server on their own.
Scott: I imagine that these recordings could result in extremely large amounts of data, depending on what’s flowing through the system and what’s required to service a request. Sometimes applications aren’t very efficient, and they’ll make a request to a database that returns a lot more data than it actually needs, and the application will parse through it, rather than correctly pushing the work to the back end.
Especially if you’re going to be pushing this data up to a hosted service, it seems like there must be some kind of tuning you can do to control what’s captured, how long it’s kept, and those sorts of things.
Jonathan: One of the nice things about our technology is that it’s very simple to get, configure, install, and start to use. It’s basically a very binary type, and there’s not a lot of tuning required. We see that as a very good thing.
We’re able to do that because the recording system itself is extremely lightweight. It can be run all the time, even in a production environment, and it’s there when you need it.
We’ve gone through a lot of research and development effort over the last six years, building up this technology so that it’s extremely lightweight and records as little data as humanly possible. The data it does record is actually compressed and encrypted as well, which further reduces the data size that’s generated.
How long the recordings are kept is also tunable. If you want to allocate 500 gigabytes of disk for your recordings, that’s often going to be enough to store more than 30 days’ worth of recordings. You can also tune it to keep a rolling window of the last seven days, for example.
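As a rough illustration of the retention settings mentioned above, the following Python sketch applies a rolling time window and a disk budget to a list of recordings. It is an assumption-laden example, not the product's actual configuration or API: the `Recording` type, the `prune` function, and the 10 GB-per-day sizes are all hypothetical. For scale, 500 GB lasting more than 30 days implies an average of less than about 500 / 30 ≈ 16.7 GB of recordings per day.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

GB = 10**9


@dataclass
class Recording:
    name: str
    created: datetime
    size_bytes: int


def prune(recordings, now, max_age, max_total_bytes):
    """Return the recordings to keep: inside the window, newest first, within budget."""
    fresh = [r for r in recordings if now - r.created < max_age]
    fresh.sort(key=lambda r: r.created, reverse=True)
    kept, total = [], 0
    for r in fresh:
        if total + r.size_bytes > max_total_bytes:
            break  # everything older than this would exceed the disk budget
        kept.append(r)
        total += r.size_bytes
    return kept


now = datetime(2010, 3, 10)
# Ten days of hypothetical recordings, 10 GB each, one per day.
recordings = [
    Recording(f"day-{i}", now - timedelta(days=i), 10 * GB) for i in range(10)
]

# Rolling seven-day window with a generous 500 GB budget.
week = prune(recordings, now, timedelta(days=7), 500 * GB)

# Same window, but a tight 25 GB budget keeps only the newest recordings that fit.
tight = prune(recordings, now, timedelta(days=7), 25 * GB)
```

Keeping the newest recordings first is a design choice made for this sketch; a real system might instead prioritize recordings that are attached to open problem reports.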
Scott: Obviously, the primary scenario is that an error happens in production, or there’s a potential bug, and you need to track it down. Have you had people use it for other things? I could imagine, for example, that people could use it forensically to find out what happened when their web server got hacked.
Jonathan: There are really four areas where this technology has been used successfully. The first is in problem reproduction and resolution. The second is in security, as you mentioned, which allows people to go back and look at any security issue, as it occurred, with as much detail as they would like.
Third, this is a very powerful technology to analyze and resolve performance issues. Finally, it allows any kind of compliance-related issue to be verified, reproduced, and audited. We’ve had applications of our technology outside of problem resolution and with good success.
Scott: Without divulging anything about your customers, obviously, what can you tell us about the spectrum of uses for this technology, maybe from small startups to relatively massive implementations that require a lot more scale?
Jonathan: Let me give you an example of a very large software company that has a lot of products out in the field with thousands of customers. They had a product that was used on a daily basis by large software teams, and every couple of months, they’d get a report of a bug that was corrupting a database.
It was so bad that people would have to go back and actually restore the database from the last backup that they’d made. This was the kind of problem that they could never reproduce in their pre-production environment. They ended up making seven attempted fixes to this problem, and one after another, all seven failed in the field, and another problem report would come in with a corrupted database.
When they started using Replay, they captured the problem with a recording, which allowed them to reproduce the issue in literally about 10 minutes. All they had to do was to access the recording, press “play,” and watch the problem happen again. Their next fix nailed that bug, which it turned out had been in their system for over two years.
Scott: Talk a little bit about the business side of the company. These are interesting times, as they say, economically. What can you tell some of our readers who may aspire to starting a company about your lessons learned and that sort of thing?
Jonathan: In 2004, having identified this as a major pain point for us and for most people in our industry, we decided to found a company. We went out with a prototype of the technology, early on, and we started working with companies directly.
We were fortunate enough to be able to use some of our contacts to start working directly with companies before we even considered raising money or building up the company. At that time, it was just me and my business partner, Jeff.
We built up a small stable of pretty notable customers, including Microsoft, NVIDIA, and Electronic Arts. We were able to use those relationships to bootstrap the company and continue our research and development.
A lot of man-years of research and development have gone into this technology, including about a dozen patents, two of which have now been issued. We’re very excited to have reached this point, but it certainly was a bit of a road.
In 2006, we had built the company up to a handful of folks. We had reached profitability and then decided to go and look for an opportunity to grow the company. We did so by partnering with two venture capital firms in Silicon Valley. One was Hummer Winblad, and the second was Partech.
With them, we were able to grow the company and really expand the breadth of our technology, to build a more ubiquitous solution that could be used by even bigger teams than in the past. We’ve continued that research, and we actually raised our second round of funding in 2008, with Sigma Partners and UV Partners.
Since then, the economy has really started to rebound, and we’re seeing the benefits of that. We had a record 2009, and we have started off very well in 2010. We’re hiring, we’ve expanded our office space, and we intend to meet or exceed our targets for the year.
We’re certainly seeing some good signs in the economy. People are coming back and looking at investing in technology that will help them be more efficient in the new environment in which we find ourselves.
People are looking for solutions to help them with the present complexity of the software environment, which includes cloud computing, and we’re one of the new players on the scene to really address those kinds of issues.
From a business standpoint, people are looking to be lean, mean, and efficient, but they are also looking to be strategic, and we’re benefiting from that as well. Some of our large customers in the financial services space are really taking that attitude, and it’s been good.
Scott: Looking back at the very beginning of your company, there are a couple of tensions that I think any startup faces. That’s especially true if you’re going after a lot of large enterprise customers, and I’d be interested to hear how you dealt with them.
The first of these problems I’ve seen is that a big company might be very interested in your technology, but they could be afraid that your little company won’t be around in a year. They may refuse to make a big investment in it that includes implementation, training people, and so on, because you’re simply not big enough yet.
The other common problem is that they may think the base technology is great, but your product may be missing one or more features that they really need. At that point, you have a conflict in staying true to your vision for the product, while at the same time servicing that kind of one-off feature request. They may take your product in a direction that’s not strategically beneficial to your company as a whole.
Jonathan: There’s no question that we have felt those pressures during the course of our company’s lifetime. We’ve approached that first question, about convincing large organizations to make a financial and workflow investment, by building our software to be extremely seamless and lightweight.
That means that people don’t really have to make significant changes, if they have to make any at all, to the way they build and manage their software. We’re very complementary to the existing workflows, technologies, and tools that folks are using today.
We have painstakingly taken the approach to keep our technology as transparent as possible so it fits into existing workflows. That means that, if they did need to remove our technology from their workflow, it would not cause things to break.
Obviously, it would make their jobs more difficult not having Replay, but it’s not something that they’re betting the farm on, in terms of the way that they go about their daily workflows.
Your second point concerns how to react and adjust to pressure for custom development or feature requests that might not be strategic for our company. We’ve approached that, again, by building our technology to be lightweight and seamless, but also to have a low entry point.
By having a broad appeal, and by virtue of our technology being scalable and robust, we can afford to take on smaller customers, rather than just the $500,000 deals. Our low entry point in terms of price means that we win out on volume and don’t necessarily have to bend over backwards to win specific deals.
We take a long-term view in terms of how we engage with our customers. People might start off by deploying to their pre-production teams in one or two departments, and then they might get comfortable with it and grow with the product.
Scott: I’d also like to talk a bit about your approach of bootstrapping the company for a couple of years, and then going out and looking for venture funding. It seems to me that I have heard a lot of people talking about VC funding really having dried up in the 2007-2008 timeframe.
I think there are a lot of people out there who think that they can walk into a VC with a set of slides to show off a great idea and fairly easily get funding. What advice can you share about walking down that path of bootstrapping versus VC, as well as the right point to switch from one to the other?
Jonathan: Walking into a venture capital office with a slide deck is a very rough hill to climb, unless your last three companies were extraordinarily successful. It’s also the case that today, people can get very substantial infrastructures online with very little capital investment.
It is really a new world in terms of being able to build substantial modern technology, get it online, and start building a customer base or a community around it with very few resources. Then you can start thinking about venture capital or other forms of investments, or not.
There are options in 2010 that didn’t exist in 2004, in terms of your ability to go out with a small team and do incredible things. Our premise was to prove to ourselves that our idea was really viable before anything else. We did that by building up a base of about 15 customers that were really getting value from our technology.
Once we had done that, we felt like we had a proof of concept on which to build a profitable company, so we decided to see if we could expand at a more rapid pace. At some point, you do have to be concerned about other companies coming up and chasing down those technologies.
It is a bit of a balance, and I think today, VCs are more risk-averse than they have ever been before. It is a little bit more difficult to go and attract venture financing, but on the flip side, it is also much easier to do more with less.
I think it is a great time to be thinking like an entrepreneur, and if you have a great idea, go out there and try to make something happen. If you look at what has happened with the iPhone App Store, the Google Apps Store, and soon Microsoft’s Windows Mobile 7, there are many opportunities to go and build new technology and bring it to market.
Scott: The majority of people we talk to are involved with open source projects, or their business is wrapped around an open source project. There is the perception in some quarters that open source is trendy, and there is more interest in a product that has an open source component to it, versus something fully proprietary.
Obviously, you have been successful during a pretty tough economic time, so what is your view about the perceived necessity to have some piece of the puzzle be open?
Jonathan: I think open source makes a tremendous amount of sense for certain sectors, although I am not convinced that the problem of how to monetize successful open source projects has been completely solved.
I think that is an ongoing area of experimentation and research. There certainly have been some good successes with open source companies that have been acquired, but in terms of really generating a lot of revenue in that space, I think there is still a lot of work to be done.
Because our technology is so new, and there really isn’t a lot to compare it to out there, we have decided so far not to open source the core component, although we do open source pieces of it, such as our plug-in.
That is certainly an area we are looking at constantly, and we are big fans of the open source community. We actually are providing our software free of charge to many open source groups, and certainly we will continue to do that and expand our engagement with the open source community. In terms of open sourcing our technology, I am still looking for some good business reasons and benefits to the company to do so.
Scott: Well, we have covered a lot of great ground. Are there any closing thoughts from your end?
Jonathan: It is an exciting time to be developing software and building new technologies. We like to think of ourselves as a great example of a company that is taking advantage of the new paradigm in software. Our technology, we think, is really helping folks to capitalize on that, so we’re hoping that people will check out ReplayDIRECTOR and see what it can do for them.
Scott: Great. Thanks for taking some time to chat.
Jonathan: Thank you.






