Curious pasting in Mail.app

I just discovered something curious. I accidentally pasted when I meant to undo in Apple Mail, and it took the text on the clipboard and created a new mail message with the clipboard text as its contents. Not a new message ready to send, like if I had dragged the text to Mail's icon, but a new unread message in my mailbox that had the text as its body and no headers.

"That's strange," I thought, so I pasted again, into TextEdit, to see if I'd somehow copied a mail message or something - no, just text. I then added a Subject: and From: header line, copied and pasted again. The same thing, only this time with those headers. It only seems to work with plain text pasteboard contents, and I couldn't get it to work reliably in every kind of mail folder, but a local mailbox seems to work most consistently.

I feel like I've found Mail's vestigial tail - this is mostly harmless behavior, but a little confusing, and I can't really think of any use for it. I'm curious how it got in there.

What's hot in CS

Today, a group of graduating PhD students in our department met up to brief one another on what's new and hot in their respective fields, as a reminder of what's going on outside our own specialties. The idea is that when interviewing for jobs, you have to hold up your end of a conversation with professors outside your specialty, and it helps to know a bit about their field.

To quote one professor in our department, "...there is a special circle of hell reserved for grad students interviewing for jobs who are unable to answer questions of the form, 'Oh, you're from UCSD. What's Professor so-and-so up to these days?'."

It took about six hours to get through talks from students working on Architecture, Bioinformatics, Systems, Graphics, Vision, Databases, Security, VLSI and more, and I'm not going to try to repeat any of the details, because frankly I'm numb. I will say that it was a great idea, and if you're a grad student and your department doesn't do something like this, you should start a tradition.

I will mention one thing: automatic detection and adaptation to network attacks is so hot right now.

Talk: Bart Miller and Dyninst

In a recent large-scale systems seminar*, we had Bart Miller from Wisconsin talk about some of the upcoming work on DynInst. DynInst is an API for runtime code patching, which lets you do things like attach to a running program and insert your own code around every network call, or replace procedures with your own versions. You can even do something as insane as calling a function every time the instrumented program accesses memory. (We do roughly that here at PMaC with MetaSim.)
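For flavor, here's a rough sketch of what a tiny DynInst mutator looks like through the BPatch interface. This is my own reconstruction from memory, not something from the talk, so the exact class and method names may not match the current API; record_send is a hypothetical function assumed to already exist in the target process:

    #include "BPatch.h"
    #include "BPatch_process.h"
    #include "BPatch_image.h"
    #include "BPatch_function.h"
    #include "BPatch_point.h"
    #include "BPatch_snippet.h"
    #include <cstdlib>
    #include <vector>

    int main(int argc, char *argv[]) {
        BPatch bpatch;

        // Attach to an already-running process: path to its binary plus its pid.
        BPatch_process *proc = bpatch.processAttach(argv[1], std::atoi(argv[2]));
        BPatch_image *image = proc->getImage();

        // Find the function we want to wrap and the function we want to call.
        std::vector<BPatch_function *> sends, recorders;
        image->findFunction("send", sends);
        image->findFunction("record_send", recorders);  // hypothetical helper in the target

        // Build a call to record_send(0) and splice it in at every entry to send().
        std::vector<BPatch_snippet *> args;
        BPatch_constExpr zero(0);
        args.push_back(&zero);
        BPatch_funcCallExpr call(*recorders[0], args);
        proc->insertSnippet(call, *sends[0]->findPoint(BPatch_entry));

        // Let the program keep running with the instrumentation in place.
        proc->continueExecution();
        while (!proc->isTerminated())
            bpatch.waitForStatusChange();
        return 0;
    }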

Bart talked about some of the challenges they've had to face with DynInst and the directions they're planning to take it in the future. The major news is that they're planning to improve support for binary rewriting, in which you use the same interface to instrument an object file and produce a new executable, instead of just doing it in memory. They are also planning to break it up into a few smaller libraries, so you don't have to link everything in if you're building a tool that doesn't need all of DynInst. Both are good news for users.

He discussed some interesting applications of DynInst (such as trapping and removing calls to license-checking code), and some odd situations that have driven development, like users who needed to instrument binaries that were hundreds of megabytes in size (not counting linked libraries - just the one file). He also highlighted some cases where DynInst really shines, such as needing to instrument a program that you can't even relink, due to lack of source code access or simple overwhelming makefile confusion. The facility for removing instrumentation enabled a clever code coverage tool that dropped the instrumentation on a block after the block had been touched once, which gave a really impressive speedup.
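To make the coverage trick concrete, here's my own sketch of the idea (not their actual tool): keep the handle that insertSnippet returns for each basic-block probe, and once a block has been seen, delete its probe so the steady-state overhead falls toward zero. The hit-checking is left abstract here:

    // Sketch only: assumes 'proc' is the BPatch_process from the earlier example,
    // handles[i] is the BPatchSnippetHandle returned when block i was instrumented,
    // and block_was_hit(i) is a hypothetical check against counters the probes update.
    void remove_covered_probes(BPatch_process *proc,
                               std::vector<BPatchSnippetHandle *> &handles) {
        for (size_t i = 0; i < handles.size(); ++i) {
            if (handles[i] && block_was_hit(i)) {
                proc->deleteSnippet(handles[i]);  // stop paying for this block
                handles[i] = NULL;
            }
        }
    }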

They've also used it to observe viruses without allowing them to write to disk, including nifty tricks like waiting until the virus uncompresses itself, then saving the uncompressed virus for later analysis. The extensive work they've done on binary analysis is important here, because viruses don't really come with symbol tables for handy debugging.

I thought the most interesting part of the talk was about the challenges they've addressed over the course of the project. For instance, it is surprising how often, with production compilers, the symbol tables contain entries that are totally bogus. One example he used: the function size information in most symbol tables is never right. Few tools pay attention to it, so it goes unfixed. Part of the reason for that inaccuracy is also a reason to consider building program analysis tools on DynInst or something like it - object code layout is getting pretty confusing, and they've already done the hard work of analyzing it. For instance, noncontiguous functions are common; apparently that's rampant in Microsoft's products, due to optimizations that reorder hot basic blocks. Other weird code arrangements are common too, including functions sharing object code. Compilers sometimes appear to be active adversaries to program analysis tools, and many tools in common use make assumptions about code layout that are less and less likely to be correct.
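As a concrete example of the kind of assumption that breaks: plenty of quick-and-dirty tools estimate a function's size as the distance to the next symbol, which silently goes wrong once the compiler splits functions into noncontiguous pieces or lets them share code. A toy illustration with made-up addresses:

    #include <cstdio>
    #include <iterator>
    #include <map>
    #include <string>

    int main() {
        // Function start addresses as a naive tool might read them (made up).
        std::map<unsigned long, std::string> starts = {
            {0x1000, "parse_input"},
            {0x1400, "handle_request"},
            {0x1900, "cleanup"},
        };

        // Naive size = next start - this start. This breaks when parse_input's
        // hot blocks have been moved past 0x1900, or when functions overlap,
        // and it often disagrees with the symbol table's own size field anyway.
        for (auto it = starts.begin(); it != starts.end(); ++it) {
            auto next = std::next(it);
            unsigned long size = (next == starts.end()) ? 0 : next->first - it->first;
            std::printf("%-16s start=0x%lx naive_size=%lu\n",
                        it->second.c_str(), it->first, size);
        }
        return 0;
    }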

I asked how symbol table information needs to be improved to let tools keep up, and what more information one needs to get from the compiler, and his response was that expecting many-to-many relationships for mapping code to source is very important if your tools need to deal with real code.

Thanks to Bart Miller for the talk - any errors above are mine, the ideas and hard work described are all theirs!

*I had wanted to post about it right away, but I didn't get to it until two weeks later.

State of the Union

I completely missed this year's State of the Union address, but was pleased to see this quote from the speech:

First, I propose to double the federal commitment to the most critical basic research programs in the physical sciences over the next 10 years. This funding will support the work of America's most creative minds as they explore promising areas such as nanotechnology, supercomputing, and alternative energy sources.

Having supercomputing named specifically by the President ought to be a nice thing to point to when selling our field. It happens less and less these days, but people have definitely asked me whether supercomputing is a dead field - the answer is a resounding no, and this just adds to the list of indicators, alongside the DARPA HPCS program.

Yojimbo, hit and miss

A few people have commented about Yojimbo, including Brent, who gives it a place in his dock. The comments thread on his post has some good points. I've already paid for VoodooPad and put a lot of my brain in there, so I'm not moving my notes anywhere soon. That's worth noting if you're thinking of wading into the note-taking app market - if I can't move my data into your app, it's unlikely that I'm going to bother using it.

It's nice to see a real app shipped using Core Data - although the data model appears to be pretty simple. I wonder if they hit any snags developing it? I'd be interested to hear what kind of effort they put into it.

I have a couple of quick nits to pick, in case anyone cares: if you have a three-pane interface, the detail pane has to scroll when I hit the space bar, especially if it displays web content. Let me say that more clearly: if you display web content, use different keyboard shortcuts than Safari at your own risk.

There's a useful info inspector for the items, but Cmd-i doesn't bring it up - it is still trying to italicize something, even when I'm not selecting text. That's a fit-and-finish issue I was surprised to see in a Bare Bones app.

Finally, it's simple and elegant - it gets the important things right, but I don't know if it's really solving a problem that many people have - casual users can store passwords and bookmarks already, and serious researchers have more powerful tools. Maybe the biggest missed opportunity is that having a single place for all of this data would be a great start toward making it available to other programs, which I think is where the next big leap in computing experience is - think a combination of the iLife media browsers and bookmark/note-taking apps, along with a bit of Onlife. That's what I think the future tastes like...

PLDI Papers I'm interested in, part one

I mentioned that I'd post about some of the papers I found interesting from this year's PLDI conference. Disclaimer: for the most part this is based on reading the abstracts only, so this shouldn't be considered a thorough review.

Session one is Transactions. I will probably look through these, especially the first paper, "The Atomos Transactional Programming Language" [1] from Stanford, because transactional memory and processing seem to be a consensus pick for the next big thing, and Burton Smith once told me that languages using transactional memory and invariants with respect to state are his bet for what can solve the parallel programming problem. (What problem? It's too hard to write good parallel code.) So, I want to see what a transactional language looks like.

There's a paper in the Compilers session that looks like a cool idea for improving analysis - "A Framework for Unrestricted Whole-Program Optimization" [2]. The abstract says they have a way for intra-procedural passes to work on arbitrary subgraphs of the program, so they're not just limited by procedural boundaries, and don't have to rely on inlining to optimize across calls. I'm curious what languages it supports, and how the scheme would work with dynamic languages.

A paper about dynamic software updating, "Practical Dynamic Software Updating for C" [3] (project link), is also interesting, because it seems like a step towards the way things should work. Essentially, they compile a program so that it can be updated without stopping it. They do it in a way that doesn't violate type safety and sounds reasonably efficient. It reminds me of Apple's ZeroLink and Fix & Continue (note that those aren't the first examples of such technology), and I'm curious how similar it is. Certainly I don't think Fix & Continue tries to guarantee type safety.
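I haven't read the paper yet, so this isn't their mechanism, but the simplest ingredient of dynamic updating is easy to sketch: route calls through an indirection that can be repointed while the program runs. Their contribution is doing this kind of thing safely (types, in-flight state), which this toy ignores entirely:

    #include <cstdio>

    static int handle_request_v1(int x) { return x + 1; }
    static int handle_request_v2(int x) { return x + 2; }  // the "update"

    // Calls go through this pointer instead of binding directly to v1,
    // so a new version can be swapped in without stopping the program.
    static int (*handle_request)(int) = handle_request_v1;

    int main() {
        std::printf("%d\n", handle_request(1));  // 2
        handle_request = handle_request_v2;      // apply the update in place
        std::printf("%d\n", handle_request(1));  // 3
        return 0;
    }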

The parallelism session should be interesting, and I'm most curious to see the abstract for "Shared Memory Programming for Large Scale Machines" [4] - I can't tell from the title whether they are introducing a new language or measuring an existing technique. I have a note to myself somewhere to look for a full copy of that paper.

Power has been a big deal in HPC and mobile devices for a while, and now it's everyone's problem, so "Reducing NoC Energy Consumption Through Compiler-Directed Channel Voltage Scaling" [5] caught my eye. I'm always interested to learn about power usage effects of different kinds of code, since I have found it to be satisfyingly unintuitive at times. (Maybe I should've taken more EE classes!) Also, this is a paper from Penn State, and I'm curious what research they've got going on back at my alma mater.

I'll probably read everything in the Runtime Optimization and Profiling session, but "Online Performance Auditing: Using Hot Optimizations Without Getting Burned" [6] is particularly interesting, since I know Brad Calder and his students do really good work, and I honestly didn't know what Jeremy was up to. I should probably be more social around the department. (These guys are at UCSD.)

OK, I'm not out of interesting papers, but I'm going to stop here for now. Check out the program, let me know what you think is cool - am I missing something really great?

References

[1] "The Atomos Transactional Programming Language" Brian D. Carlstrom, JaeWoong Chung, Austen McDonald, Hassan Chafi, Christos Kozyrakis and Kunle Olukotun.

[2] "A Framework for Unrestricted Whole-Program Optimization" Spyridon Triantafyllis, Matthew J. Bridges, Easwaran Raman, Guilherme Ottoni, and David I. August

[3] "Practical Dynamic Software Updating for C" Iulian Neamtiu, Michael Hicks, Gareth Stoyle and Manuel Oriol

[4] "Shared Memory Programming for Large Scale Machines" Christopher Barton, Calin Cascaval, Siddhartha Chatterjee, George Almasi, Yili Zheng, Montse Farreras, Jose Amaral

[5] "Reducing NoC Energy Consumption Through Compiler-Directed Channel Voltage Scaling" Guangyu Chen, Feihui Li, Mahmut Kandemir, Mary Irwin

[6] "Online Performance Auditing: Using Hot Optimizations Without Getting Burned" Jeremy Lau, Matthew Arnold, Michael Hind, Brad Calder

PLDI 2006 Papers

The technical program for PLDI 2006 is out now - there are certainly a lot of interesting papers in there. I'm looking through them now and will probably comment on a few of the ones I think are cool in another post.

PLDI is traditionally a very competitive conference with an emphasis on experimental results, and this year they received 169 submissions and accepted 36. PLDI stands for "Programming Language Design and Implementation", and covers compilers, languages and runtime systems.

There are some interesting workshops co-located with PLDI this year: a Workshop on Transactional Memory Workloads (WTW), a Workshop on Programming Languages and Analysis for Security (PLAS), and the first ACM SIGPLAN Workshop on Languages, Compilers, and Hardware Support for Transactional Computing (TRANSACT). Does it sound like transactional computing is hot these days? Yes it does...

Update: for historical reference, this year's 21% acceptance rate puts it right at the average, according to the ACM's data from 1995-2003.

On Reviewing

Like most students, I've been asked to review papers in my area (and a few that were pretty far outside it), and I always try to do a good job - this is definitely a golden-rule situation. If I don't take it seriously, I am absolutely convinced that karma will get me in the end, denying me a crucial publication that could have pushed me over the edge to tenure.

I've also been lucky enough to have the fascinating experience of helping out with the Program Committee of a major conference, something students don't usually get to do. It is the kind of experience that really gives you perspective, and it has been harder to get upset about disappointing results since then. Important decisions often come down to the quality of the reviewers and practical constraints - for instance, you may have space for 12 papers in the area, and you might be looking at a paper that has two 'strong accept' reviews and one 'weak accept', but is pretty good. It seems like a borderline paper that might get in, right? But there are probably 20 others that got three 'strong accept' reviews - this paper has no practical chance unless someone champions it and the 'weak accept' reviewer wasn't very convincing.

The point of that little anecdote was that every review counts, even student reviews, and good reviews make the program committee's job a lot easier.

Off the top of my head, here are a few rules of thumb to reviewing:

  • Read the whole thing. It's only fair.
  • No matter what the questions on the review form say, always include a summary, in your own words, of the paper's main point and contribution. This really helps the author put your other comments in perspective.
  • Don't be a wimp. If you mean 'reject', say so. If you use 'weak accept', explain why.
  • If there's space to add comments to the program committee, use it. Especially if you could be convinced to change your opinion of the paper. That can be useful if another reviewer had a very different opinion and the committee needs to reconcile them.
  • You're reviewing for a specific venue - if the paper is good, but you can think of a better place for it, say that and name that place - maybe the authors won't have thought of it, and at least it'll soften the blow a bit if it doesn't make it.
  • Take the time to scan the references - if they cite their own or similar work, check it out. This could be the only way you can answer the novelty question - how else will you know if they wrote the same paper six months ago and only added one result just to get to go on a nice trip?
  • Be honest about your expertise - it helps with the decisions, and otherwise it's hard to tell whether you were dismissive because the paper was crap or because you didn't understand its importance.

Does anyone else have a good tip for reviewing? Let me know in the comments.

Focus

Maybe it's a little dramatic to think of it this way, but it has seemed like I have two computing personalities - the one that writes here about Macs, user-app programming and interfaces, goes to WWDC and hangs out with indie developers, and then the other one that actually gets paid - a Ph.D. candidate in Computer Science who works on compilers, performance tools and high-performance computing at UCSD and SDSC. OK, I don't get paid much, but that's who I am in real life.

In order to get real work done, I've had to cut back on the first guy, shelving a few projects I would love to release, and dropping out of sight for months at a time on the BibDesk project, which doesn't seem to miss me, really.

What this means is that I've been pretty silent here lately, which I think is a shame, because I'm hugely vain and love attention. And yet I don't post personal details. This is evidence that I am complex and fascinating. Nevertheless, you really have got to hear what I've got to say.

In order to help you with that, I'm going to start posting about research topics, both my own and good papers I read or talks I go to, and hopefully some of it will be interesting. I have no idea how dedicated I will be to this, and it could get touchy - don't expect anything too controversial, since I do want to get hired and, like a fool, I used my full name as the domain for my blog.

Coming up next - some thoughts on peer review.

Universal I-Search

There are a couple minor improvements in the pipeline, but I wanted to get a universal binary version of the I-Search plugin out before you noticed that it wasn't universal already. It's the same as the last version, just twice as fat. Get it here.

Next up for this project is to move it to the stalled leverage project.