In this interview we talk with Keith and David. Specifically, we talk about:
- Background: a high-energy physics project at the South Pole
- Open source software for highly specialized uses
- Sharing code and architecture among research projects
- Systems reliability in a harsh environment
- Cyclical data analysis and system calibration
- Life at the South Pole
Sean Campbell: To get us started, could you each tell us about your background?
Keith: My background is in software development. I have a bachelor’s degree in mathematics and a master’s degree in computer science. I worked at Lawrence Berkeley National Laboratory as a grad student, and after I finished my master’s, I went off and worked in industry for a while, at a couple of dot-coms, didn’t make a million dollars, but did have some really good experiences and made some really good friends.
And then, I decided to go back to LBL, primarily because I enjoy working on science projects, and I knew that I’d have a much greater opportunity to do that, as well as to be involved in computer science research itself. This IceCube project is probably the sexiest project I’ll ever work on in my life–I’ve been able to go to the South Pole twice, and the opportunity to learn about the physics involved is great.
Dave: I also have a computer science background–I worked for the City of St. Paul for a while, then at some Silicon Valley companies, and then I went to work at the CS department at Berkeley; I’ve been in the university environment since then. After I worked at Berkeley for a few years, I worked with the Space Science Center here in Wisconsin, then gradually drifted into IceCube.
Sean: Give us a little bit of background on the IceCube project.
Keith: The IceCube neutrino detector is at the South Pole, buried deep within the two miles of ice there. The idea is that the very dark, clear ice allows the detection of neutrinos, which are subatomic particles that come from a wide variety of sources, many of which we don’t understand. Neutrinos aren’t affected by electromagnetic forces, so from the perspective of a neutrino, nothing is solid, because the electromagnetic force is what makes things feel solid.
Neutrinos are very interesting in the sense that they have come in a straight line from their source, which means that they carry a lot of information about where that source is. The tricky part is being able to detect something that passes through everything. The way this neutrino detector works is that when, by chance, a neutrino hits (I believe) the nucleus of an atom within the ice, there is a chain reaction that results in a little flash of light. That’s why the very dark and clear ice at the South Pole is the perfect place for building a neutrino detector.
We’re drilling about 80 holes and then dropping a string of light detectors into each one. The holes are each about two and a half kilometers deep, and the bottom kilometer of each string holds 60 DOMs (digital optical modules), which detect light. The complete neutrino detector will be roughly a cubic kilometer of ice instrumented with these digital optical modules, one and a half to two and a half kilometers down beneath the South Pole. The digital optical modules will detect those little flashes of light that occur when neutrinos pass through the detector array.
The DOM is basically a basketball-sized sphere, the bottom half of which is a photomultiplier tube (that’s the actual eyeball), and the top half of which is custom-built hardware. The hardware runs some custom software that basically digitizes that signal and sends it up to the surface, where we have the rest of the data-acquisition software running on a cluster of machines.
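To put the scale in perspective, here is a rough sketch in Java (the language the DAQ software is written in) of the detector geometry as Keith describes it. The numbers come straight from the interview; the class and field names are purely illustrative, not actual IceCube code.

```java
// Illustrative sketch only: detector geometry as described in the interview.
// All names are hypothetical; this is not the real IceCube code.
public class DetectorGeometry {
    static final int STRINGS = 80;             // roughly 80 holes/strings when complete
    static final int DOMS_PER_STRING = 60;     // 60 DOMs on the bottom kilometer of each string
    static final double TOP_DEPTH_KM = 1.5;    // instrumented volume starts ~1.5 km down...
    static final double BOTTOM_DEPTH_KM = 2.5; // ...and ends ~2.5 km down

    public static void main(String[] args) {
        int totalDoms = STRINGS * DOMS_PER_STRING;
        System.out.println("Total DOMs in the completed detector: " + totalDoms); // 4800
        System.out.println("Instrumented depth: " + (BOTTOM_DEPTH_KM - TOP_DEPTH_KM) + " km");
    }
}
```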
Sean: Tell us a little about the software you’re using.
Keith: The DOMs themselves have about four megabytes of memory that holds some firmware that we’ve written, which basically digitizes the information that the DOM sees and then stores it. Each string of DOMs is basically a little network, and at the top of the string is a DOM hub machine and what we call a “DOR card,” which is another piece of custom hardware.
It’s almost as if you have an array of peripheral devices inside of a PC, and the peripherals are one and a half to two and a half kilometers away, buried beneath the ice. The DOM hub runs a piece of software that we call the “string hub” that basically polls these DOMs, pulls data up from them, and stores it.
Dave: On top of the string hub, there are a couple of layers of triggers. All these hits come up from the DOMs and get passed into the triggering system. The triggering system tries to differentiate between what’s just noise and what looks like it might be real information, and it identifies time windows where there’s information that might be worth looking at. It passes the time-window data on to the event builder, which polls the individual string hubs to get all the information in the relevant time window. It builds an event out of that information, which gets passed on to the next layer of data processing and eventually shipped up to the Northern Hemisphere.
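As a rough illustration of that data flow, the sketch below mocks up the pipeline Keith and Dave describe: string hubs supply hits, the trigger flags interesting time windows, and the event builder gathers the hits from every string hub for each window. The types and method names here are hypothetical stand-ins, not the real IceCube DAQ code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the DAQ data flow described above.
// All types and methods are made-up stand-ins, not the real IceCube code.
class Hit {
    long timestampNanos;  // when the DOM saw light
    int stringId, domId;  // which DOM on which string
}

class TimeWindow {
    long startNanos, endNanos;  // window the trigger flagged as interesting
}

interface StringHub {
    // Polls the DOMs on one string and returns the hits in a given window.
    List<Hit> hitsInWindow(TimeWindow window);
}

interface Trigger {
    // Scans the incoming hit stream and flags windows that look like more than noise.
    List<TimeWindow> findCandidateWindows(List<Hit> recentHits);
}

class EventBuilder {
    // For each flagged window, gather the hits from every string hub and build one event.
    List<List<Hit>> buildEvents(List<TimeWindow> windows, List<StringHub> hubs) {
        List<List<Hit>> events = new ArrayList<>();
        for (TimeWindow w : windows) {
            List<Hit> event = new ArrayList<>();
            for (StringHub hub : hubs) {
                event.addAll(hub.hitsInWindow(w));
            }
            events.add(event); // passed on to processing/filtering, then north by satellite
        }
        return events;
    }
}
```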
Sean: Which parts of the system do you guys work on?
Keith: We work on the data acquisition (DAQ) software, which consists of the event builder, the triggers, and the string hub. We build a data file out of a whole bunch of events, which gets handed off to another processing and filtering subsystem, and from there, it gets shipped up north over a satellite link–about 30 gigabytes of data per day.
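As back-of-the-envelope arithmetic only, 30 gigabytes per day works out to an average of roughly 2.8 megabits per second if spread across a full day; since the satellite is only visible for part of each day (as Dave mentions later), the rate during a pass has to be higher.

```java
// Back-of-the-envelope only: the average throughput implied by ~30 GB/day.
public class SatelliteLink {
    public static void main(String[] args) {
        double bytesPerDay = 30e9;            // ~30 gigabytes shipped north per day
        double secondsPerDay = 24 * 60 * 60;  // 86,400
        double megabitsPerSecond = bytesPerDay * 8 / secondsPerDay / 1e6;
        // Prints roughly 2.8 Mbit/s, averaged over a full day; the satellite
        // is only up for part of each day, so the rate during a pass is higher.
        System.out.printf("Average throughput: ~%.1f Mbit/s%n", megabitsPerSecond);
    }
}
```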
Scott Swigart: Talk a little bit about what you’re building on top of. What open source software was useful to you as you were putting this system together?
Keith: As much as possible, we use commodity, off-the-shelf hardware. Obviously, the DOMs and the DOR cards aren’t, but the DOM hubs and all the other hardware are. I believe it’s all Intel-based PCs running Red Hat Linux.
I believe that choice was made for the sake of flexibility and not being locked into a commercial software vendor. All the DAQ software is actually written in Java. We also are using Python for a fair amount of the control, monitoring, and information logging.
There have been more than a couple of times where we’ve had problems, and it’s been tremendously valuable to look into the source code and figure out what the heck is going on. We’ve had a few problems where some of the tools haven’t worked the way we want them to, and we’ve been able to crack into the source, make a few changes, and keep going. I think I did send those changes back to the authors, although I’m not sure whether any have actually been incorporated back into the products’ next releases.
Sean: We find that open source is extremely common and popular in research, academia, and the kind of stuff that you guys are working on. Since this is not the only neutrino detector in the world, do you share code with other projects? How much of it is built from scratch–is there a little micro-community of open source around these really specialized research projects?
Keith: There really isn’t that much. I don’t think anyone else is using the software that we’re writing, although I don’t think any of us would have any objection to it.
Dave: The projects are each so different, and the infrastructures are so different, that there is not a lot of value in sharing stuff.
Dave: I would think that it would take a considerable effort to distill out the common elements in order to be able to develop, say, a library that was usable across different projects. Generally, we are just trying to get this thing to work, have it handle the load that it needs to handle, fix bugs that we find, and that sort of thing. At the moment, the extra effort it would take to develop some kind of a reusable set of components isn’t even on the horizon.
Sean: That makes sense, since these are such one-of-a-kind projects. Other than Linux, were there other open source applications or components that you found useful for this work?
Keith: In a previous version of the infrastructure, we did have a PostgreSQL database. We also used JBoss for a while, and there are other people in the project that are using MySQL. The data acquisition system is just one small part of the IceCube project. There is a lot of software that’s written for analysis.
The physicists are generally using something called ROOT, which actually is an example of, I believe, an open source project that is used across a lot of different science disciplines.
Dave: There is some sharing. For example, we’re having some problems with our triggering system, so we talked a little bit about how a couple of other projects do this stuff. In the future, we may try to use some of their ideas, although we aren’t going to use any of their code.
Keith: This architecture that we’ve described to you is based on systems that were designed by other people for previous projects. On the other hand, we’re mostly sharing ideas, not actual code, which is unfortunate. It would be great if someone were able to distill out some kind of generic data-acquisition toolkit. It probably would be very difficult.
Dave: And as far as the funding of these projects, we always have barely enough money to do what we need to do.
Sean: Let me ask you a question in a different area–reliability. What’s the frequency of the events that you guys are tracking?
Keith: Well, that’s kind of a tricky question. You have to get into the physics of it, as there are different kinds of neutrinos, but right now with our 40-string detector, our event rate is about 700 hertz. That means we have 700 events a second, but that doesn’t mean there are 700 neutrinos; as a matter of fact, we won’t actually know until the physicists do their analysis which of those, if any, were neutrinos, and then whether they were of the type we are most interested in. The actual occurrence of finding those high-energy neutrinos is on the order of maybe a handful a year.
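For a sense of scale (back-of-the-envelope arithmetic based only on the figures Keith gives, not project data), 700 events per second is roughly 60 million events a day, against a handful of interesting neutrinos per year.

```java
// Back-of-the-envelope arithmetic from the rates quoted in the interview.
public class EventRates {
    public static void main(String[] args) {
        double eventRateHz = 700;             // ~700 triggered events per second
        double secondsPerDay = 24 * 60 * 60;  // 86,400
        double eventsPerDay = eventRateHz * secondsPerDay;
        double eventsPerYear = eventsPerDay * 365;
        System.out.printf("Events per day:  %.1e%n", eventsPerDay);  // ~6.0e7
        System.out.printf("Events per year: %.1e%n", eventsPerYear); // ~2.2e10
        // ...of which only a handful per year turn out to be the
        // high-energy neutrinos the physicists are most interested in.
    }
}
```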
Sean: Then obviously, uptime is a significant issue. Setting aside the fact that it’s not exactly the most pleasant environment in which to stack a bunch of computer systems, if they were to go down on the one day you were going to get a neutrino hit, you might have lost the equivalent of six months’ worth of opportunity.
Keith: You’re exactly right. And that actually happens. A few weeks ago, we were down for an upgrade or something, and we missed part of a gamma-ray burst. We actually got saved, because there was an auxiliary system that was still taking data, but it was definitely an embarrassment that we were down when this astronomical event happened, and you can’t predict when that’s going to happen.
Sean: It seems like you guys have quite a balancing act to do, because on one hand, you can’t just send all of the data to North America, because there’s so much of it. And even after you do analysis to figure out what’s worth looking at, you still end up having 30 gigabytes a day to send.
Keith: Right–there was just a discussion this morning about writing all of the raw data to tape, which is going to generate about a terabyte a day. So, in the future, if someone discovers some new physics and they want to do some new triggering on it or some new filtering of it, they could theoretically use the data recorded. It’s an open question, though, whether anyone would actually go through 365 terabytes of data for each year of operation.
Just the physical aspect of spinning those tapes and getting them onto a disk will be no small task. Somebody made the point that 10 years from now, if we still have that data, we may have thumb drives with exabytes on them or something like that, but somebody’s still going to have to put all those tapes in what will then be an archaic machine.
Still, some of the physicists in the room were making the argument that we don’t know what we’re going to discover five years from now. There could be a whole new theory, and this data may confirm or rule it out.
Dave: And it’s not easy to just spin up a whole new detector just to answer a question. But on the other hand, it’s a huge amount of data that maybe no one will ever really look at. It’s also sitting around at the South Pole, so you have the problem of shipping it to the northern hemisphere.
Sean: Let me ask you a question in a different area. When people acquire commercial software, there’s a standard argument made that support and the ability to contact someone are part of the value.
You guys are the classic case at the opposite end. You’re not going to get an on-site support visit in Antarctica, right? So, you guys obviously have to leverage the community-based support pretty heavily, because that’s really the only link you have.
What have been your experiences in building this particular project? What were some of the highlights of the community interactions you found around certain open source efforts? Or what were some of the areas where you thought a project could improve?
Dave: I don’t think we’ve had a lot of need for support really, outside of needing to do bug fixes to little subsystems here and there.
Keith: Mostly we’ve just plodded along like everybody else because we’re writing all our own software, for the most part. The Operations guys, who are responsible for the hardware comprising the cluster down there, probably do have some support contracts with hardware manufacturers, though honestly, I don’t know much about that aspect of the project.
Sean: How much do you run into issues because you’re using your systems for data acquisition, which maybe they weren’t developed specifically for?
Dave: One thing that we’ve run into problems with due to the location is our tape backup system. The air is so dry there that there’s more static electricity, so the tape systems are failing at a much higher rate than expected just because things are so dry.
Keith: We also had a problem with disk failures. Believe it or not, we had a problem with keeping the machines cool enough at the South Pole!
The air is very thin down there. The natural elevation is about 9,000 feet, and the effective elevation is more like 10,000 or 11,000.
Sean: So it’s cold, but the air isn’t dense enough to really dissipate big quantities of heat.
Keith: That’s exactly it.
Sean: How much of the analysis has to happen early so it can feed back into changes you might make for better data acquisition?
Keith: It definitely does feed back in that way. There is another whole section of software for offline analysis. I believe it’s all C++ based because they are using the ROOT Toolkit, and it basically takes our output as its input.
Those tools have started to mature. We have just recently enlarged the detector, which we do every year during the Antarctic summer, which is November through February. Each year we add in more strings; last year we had 22 strings, and now we have 40 strings. In the same sense that our data acquisition system is maturing in order to be able to handle a larger detector, their offline software is maturing to be able to allow the physicists to do their analysis sooner.
From a physics perspective, the ultimate output is papers. [laughs] They want to be able to publish results as soon as possible.
Sean: Obviously, it’s an incredibly complex system. Huge amounts of data go through multiple layers of analysis and filtering, and you look at it and try to figure out whether there is anything important there. And if everything works perfectly, you’ll detect something like two actual neutrinos a year.
It seems to me that it would be very, very difficult to know for sure that the system is working properly and that you’re ready for those rare events. How do you engineer the system to provide high confidence?
Dave: Just an hour and a half ago, somebody was talking about checking the time system to make sure that things are calibrated correctly, so that we can be confident when we do get an event that we’re seeing what we think we’re seeing.
Keith: There is a technique that we call “flasher runs.” These DOMs can emit little flashes of light that we use to calibrate the detector itself. That calibration is obviously very important if you want to be able to reconstruct what the detector saw and be confident that you are finding meaningful events.
There are also purely theoretical modeling runs, where based on the current understanding of the physics, the researchers can construct a model of what they would expect the detector to see when a neutrino is detected. They run a simulation based on that model, and then they compare the outcome of the simulation to the actual data to be sure they correspond to one another.
Dave: They essentially have a mathematical model of the way IceCube should work, and they compare it against the actual data to make sure the two line up.
Keith: Right. Those are two ways of checking outcomes in order to be confident that what the detector is seeing is believable in real physics, and I’m sure there are others.
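As a very rough illustration of what that kind of model-versus-data check can look like (a sketch only; the actual analysis uses ROOT and C++ and is far more sophisticated), one might compare binned simulated and observed counts of some reconstructed quantity with a simple chi-square statistic:

```java
// Illustrative sketch: comparing a simulated expectation against observed counts
// with a simple chi-square statistic. The real IceCube analysis (ROOT/C++) is far
// more involved; this only shows the basic idea of model-vs-data comparison.
public class ModelDataCheck {
    // Chi-square over histogram bins: sum of (observed - expected)^2 / expected.
    static double chiSquare(double[] observed, double[] expected) {
        double chi2 = 0;
        for (int i = 0; i < observed.length; i++) {
            if (expected[i] > 0) {
                double diff = observed[i] - expected[i];
                chi2 += diff * diff / expected[i];
            }
        }
        return chi2;
    }

    public static void main(String[] args) {
        // Hypothetical binned counts of some reconstructed quantity.
        double[] simulated = {120, 340, 560, 410, 180};
        double[] observed  = {115, 352, 548, 420, 175};
        System.out.println("chi-square = " + chiSquare(observed, simulated));
    }
}
```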
Dave: People are also looking at the output, from the DOM level all the way up to the other end, making sure that it’s all reasonable and, based on very detailed analysis, verifying that the data we’re getting is usable.
Keith: There is also a certain amount of healthy finger pointing among researchers, where someone may challenge someone else by saying, “Hey, you’re doing this wrong.” The person being challenged then has to either fix the issue or prove that they are actually doing it correctly. Hopefully, all of that amounts to healthy competition and challenging among people that results in good physics.
Sean: Well, we’re getting kind of close to wrapping things up, and it would be a shame not to ask you something about what it’s actually like to be in Antarctica. For instance, when you got down there, what surprised you?
Keith: I guess the thing that surprised me most is that the sun never sets when you are down there in the summer time–it just circles over your head. I went to bed one night at midnight and my shadow was in front of me as I walked in the door. When I woke up the next morning at 9:00, my shadow was in front of me as I walked out the door.
Dave: That was a good thing for me, because I generally worked with the satellite, which was the graveyard shift. The satellite came up about 10:00 pm and went down at about 8:00 am, so that’s when I worked; since it was light out, it wasn’t really a big hassle.
Keith: Of course, you always want to go out and get photos of yourself standing next to the South Pole itself. It’s one of those things that you can do any time, because the light is always going to be pretty good since the sun is always up. It’s amazingly, starkly beautiful down there–just white and perfectly flat everywhere, with this deep, deep blue in the sky.
Dave: Your view is uninterrupted all the way to the horizon.
Keith: Yeah. I got the chance to go out on what they call the Snow Stakes Run, on a little snow-cat-like thing. We got a couple of miles away from the station, and it was pretty scary–perfectly empty, and perfectly silent.
Dave: It’s really, really cold down there.
Keith: But it’s the dryness that is the most difficult to acclimate to, I would say, believe it or not. You can deal with cold by putting on more clothes or staying inside. The altitude is a little bit challenging, but after a couple of days, you acclimate to it. On the other hand, the incredible dryness makes it so you just have to keep drinking water constantly.
Dave: That’s good, of course, because of the effect of the altitude. For the first couple of days, they give you altitude-sickness medicine, which makes you need to urinate more, so you need to drink even more. So, you’re up three or four times a night having to go to the bathroom.
And what’s worse is that when I first got there, I was staying out at what’s called Summer Camp, which is basically like an Army barracks. There is no indoor plumbing, and you have to go to a different building to go to the bathroom. So, in the middle of the night, a couple times a night, you have to walk out in the 30-degrees-below-zero weather to go to the bathroom.
Keith: You start to challenge yourself. [laughter] Do I actually need to put on my overalls and parka to walk the 30 feet?
Dave: And after a few weeks, you decide you’re a tough guy, and you just need a long sleeve shirt and some shoes.
Keith: You can stay warm for 30 feet if you don’t stop.
Keith: There was one time when I thought I was really tough, because I walked over to the bathroom with just my jeans and a long sleeved shirt on. When I got there, I saw one of the drillers–these are the guys who work outside all day long and stay there for three months at a time. He was standing there in the bathroom, which means he must have walked there as well, in shorts and flip flops. I just looked at him and said, “Man, that’s hardcore.”
Dave: We didn’t do it, but there’s also a thing called the 300 Degree Club. There’s a sauna down there, and when it gets down to 100 degrees below zero, the winter-overs (who keep the station going during the winter) get the sauna up to 200 degrees and hang out in it for a while. Then they all go outside naked, run around the South Pole, and come back in.
Sean: So a 300 degree transition in temperature?
Keith: There aren’t very many members of the 300 Club.
Sean: Well, that’s a perfect note to finish on. Thanks for taking the time–this has been great.