Re-Submissions Among Top Security Conferences: A Problem or a Problematic Solution?

I recently ran into (yet another) discussion about the high number of papers that get re-submitted from one conference to the next in the system security field. The general feeling is that people tend to resubmit the same paper over and over, without ever improving it or making any change according to the reviewers’ comments - hoping that sooner or later they will simply “be lucky”.

If this is true (and we will discuss this later), then we should try to understand why researchers follow this bad practice.

The way I see it, the re-submission strategy seems to make sense for four important reasons:

  • First, resubmissions happen because the calendar allows them.
    The Top Four are (intentionally) perfectly aligned in a loop, with the notification of one conference preceding the deadline of the next by only a few days. This encourages re-submissions and leaves no time for authors to address the comments they receive. Strangely, the very same people who complain about the re-submission problem make sure that the loop is preserved year after year (but maybe it is good that they do, as I will discuss later).

  • As many have already observed, also in other CS fields, the peer-review process in Computer Science is quite random.
    If you have not done so already, please stop and go read the really excellent post on the topic by Eric Price: http://blog.mrtz.org/2014/12/15/the-nips-experiment.html
    It is as sad as it is undeniable. Really bad papers are easily discarded and the few truly exceptional ones get in. For everything else in the middle… it is almost like tossing a coin.
    If we accept these premises, then the conclusion is quite obvious. The only way to increase the probability of winning a lottery is to buy more tickets, and the only way to increase the chances of getting a paper accepted is therefore to submit many papers and keep re-submitting them (see the short sketch after this list for how quickly repeated attempts pay off). However, one important consequence of this approach is that the more we resubmit, the more random the process becomes (because more papers mean larger committees, more reviews to write, a higher probability of getting a non-confident reviewer, etc.).
    This is a dangerous positive feedback loop.

  • Well-written and well-organized papers rarely get very bad scores. So, it is not rare to see reviewers arguing for a paper to be rejected because it has many flaws - while still giving it a weak reject and sometimes even a borderline. Throw in a bit of randomness and a bad paper with no real chance can end up rejected with maybe one weak accept and two weak rejects.
    Unfortunately, this result sends a very misleading message to the authors - who often conclude that "you know what? In the end it was not too bad; if just one of the two weak rejects had been slightly more positive, we could have made it".
    Ask around and most authors will tell you that their paper was rejected because of bad luck or because the reviewers did not really understand it. But if everyone feels that their paper almost got accepted, then making substantial changes or targeting a lower-tier conference does not make much sense.

  • Reviewers typically differ from one submission to the next and their comments are often confusing and sometimes contradictory. We have all found ourselves in the situation in which we added a section to address some comments, only to run into a different reviewer who complained that the part was irrelevant and asked us to remove it.
    Moreover, it is not clear if the extra work required to address all comments actually translates into better chances of getting the paper accepted at the next conference. If not, then again the best strategy is to limit changes to a few cosmetic fixes and just play the game again.
    Anecdotal evidence seems to confirm this: over the years I have seen it all, including papers going from early reject to best paper award.
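
To get a feeling for why the lottery analogy is so compelling, here is a quick back-of-the-envelope sketch. The per-attempt acceptance probability is only an illustrative assumption (roughly in line with the ~16% acceptance rates reported below), and treating each attempt as independent is of course a simplification:

```python
# Back-of-the-envelope: chance of getting a paper in after k blind re-submissions,
# treating each attempt as an independent draw with acceptance probability p.
# Both the independence and the value of p are simplifying assumptions.
p = 0.16  # roughly the per-attempt acceptance rate observed below

for k in range(1, 5):
    at_least_once = 1 - (1 - p) ** k
    print(f"{k} attempt(s): {at_least_once:.0%} chance of acceptance")

# Prints roughly: 16%, 29%, 41%, 50% -- buying more tickets works.
```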

I am no expert in game theory, but in an environment with a favorable calendar, mostly borderline or mildly negative reviews, and a sufficient amount of randomness in the reviewing process, blindly re-submitting sounds like a very good strategy.

If randomness and poor reviews are the problem, then re-submission is the (problematic) solution.

A Look at the Data

In the last couple of years I have been involved in all Top4 conferences as a reviewer. It was a lot of work (over 100 reviews per year, with no delegation) but it also gave me a good view of the papers submitted to these conferences over time. So, I decided to run some experiments to measure the size and impact of re-submissions.

I wrote some scripts to compare the titles and abstracts of submitted papers. If two titles match, well then it is a re-submission. If they do not, I compared the similarity of the abstracts (as lists of words) and if there is a sufficient overlap... well then again it is a re-submission. If the overlap is minimal (let’s say 10-20%) then I manually reviewed the pair to understand if the two seemed to be the same paper (e.g., sometimes the tool had the same name, or the results were the same). It is not rocket science, but the automated process worked very well and I only found a handful of (likely) false matches in the low-similarity group. A minimal sketch of this matching logic is shown below.
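
For concreteness, here is a minimal sketch of the kind of matching heuristic described above. The word-set (Jaccard-style) overlap measure, the thresholds, and the dictionary layout of a submission record are illustrative assumptions, not the exact code I used:

```python
import re

def words(text):
    """Lowercase a title or abstract and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(a, b):
    """Jaccard-style overlap between two word sets."""
    return len(a & b) / len(a | b) if a and b else 0.0

def classify(old, new, high=0.5, low=0.1):
    """Decide if 'new' looks like a re-submission of 'old'.

    Returns 'resubmission', 'manual-check' (the 10-20% gray zone that
    gets inspected by hand), or 'different'. Thresholds are illustrative.
    """
    if words(old["title"]) == words(new["title"]):
        return "resubmission"            # same title: clearly a re-submission
    sim = overlap(words(old["abstract"]), words(new["abstract"]))
    if sim >= high:
        return "resubmission"            # abstracts overlap substantially
    if sim >= low:
        return "manual-check"            # small overlap: inspect manually
    return "different"
```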

Note: Possible Problems

I can see three possible sources of errors in the process:

  • My scripts would miss a re-submission if the authors change the title and completely rewrite the abstract from scratch. This is possible but unlikely, and in that case the difference may be large enough that it would not be correct to count it as a re-submission anyway.

  • When a paper was re-submitted skipping two conferences, this does not necessarily mean that the authors were working to produce an improved version. It may very well be that they just tried some other conference in the meantime (especially for crypto papers, whose authors have other top venues outside the top system security ones, or who may be less interested in a conference like NDSS).

  • I did not have a copy of the abstracts of the camera-ready versions (I could get them from the proceedings, but it sounded like too much work for the moment). So, to know which of the submitted papers were accepted, I could only compare the titles of the submitted papers with those appearing on the conference page. If authors completely changed the title (small differences should be fine) for the camera-ready version, then I would not be able to match it with the re-submitted paper. I am sure it happens, I just hope it is not a very widespread phenomenon.

So, as always you should take the results with a grain of salt.

I applied this process to four conferences (Oakland 16, Usenix 16, CCS 16, and NDSS 17). As I said, there can be a few errors here and there (so the number of re-submissions may be even higher), but I think these errors should not affect the global results too much.

Table 1. Resubmissions Stats

|                                      | Security & Privacy 2016 | Usenix Sec 2016 | ACM CCS 2016 | NDSS 2017 | TOTAL |
|--------------------------------------|-------------------------|-----------------|--------------|-----------|-------|
| Submissions                          | 400                     | 463             | 831          | 423       | 2125  |
| Resubmitted from previous four conf  | 28%                     | 31.3%           | 26%          | 48.2%     | 32%   |
| Rejected                             | 345                     | 391             | 700          | 357       | 1793  |
| Resubmitted in the next four conf    | 43%                     | 49.6%           | 36.3%        | 33%       | 40%   |
| Acceptance rate for resubmissions    | 9.8%                    | 13.8%           | 16.5%        | 16.1%     | 14.7% |
| Acceptance rate for new papers       | 15.3%                   | 16.4%           | 16.5%        | 16.1%     | 16.1% |

It took some time for me to digest these figures. First, the numbers confirm that re-submissions (good or bad) are a large phenomenon that we cannot ignore.

In total, one third of the papers reviewed at the Top4 are re-submissions from previous Top4 conferences. And 40% of the papers we reject get resubmitted over the next year. In particular, in a stunning 78% of these re-submission cases, the authors sent the paper to the first available deadline (so, just a few days after the reject notification). Interestingly, more than half of the re-submissions changed the paper title from one version to the next - often by just a few words, but sometimes to a completely new title that does not share anything with the previous one.

Second, and much more surprising to me, re-submitted papers do not have better chances of being accepted. Actually, they have a slightly lower acceptance rate. I’ll be honest, I was expecting the opposite: these papers were probably in the middle range already (so, not terrible ones), and one would expect them to get better over time. So? How can we explain these results? Even weirder, the papers that skipped a deadline had an even lower acceptance rate (even though in this case I have fewer data points).

I don’t know. Maybe since most re-submissions are close to the borderline area - where previous studies have found that acceptance is more random - small improvements are actually irrelevant. This is almost too sad to believe. Or maybe when reviewers notice that a paper has been re-submitted multiple times, they get a negative feeling that pushes them to reject it again. If you have any other possible explanation, please let me know…

Finally, at Oakland 2017 authors who submitted more than three papers had to prepend the string "BULK SUBMISSION" to the title. So, I decided to test how many of those papers were re-submissions. Strangely, less than 15% of the BULK papers came from the previous Top4, and the vast majority of re-submissions were not part of a bulk submission. This seems to show that it is not only the big groups that re-submit papers - everyone does.

Solving the Re-Submission Problem

I want to close this post with a short discussion on how this problem could be mitigated.

  • Early Rejects - Many conferences adopt a two-round review system and send early-reject notifications if a paper does not reach the second phase. This helps reduce the load on reviewers and can give authors more time to address the comments before the next deadline. It is unclear, however, how many re-submissions fall into the early-reject category. And anyway, given the figures presented above, this does not seem to mitigate the problem at the moment.

  • Oakland Model - The entire post was based on the "old" single-deadline setup of the Top4. Starting this year, Oakland switched to a rolling monthly deadline, with the possibility of asking authors to revise the paper and resubmit it (to the same reviewers). This should reduce the randomness of getting different reviewers every time. But again, I am skeptical it will have a large impact on re-submissions, as in the end the number of accepted papers remains constant. So, accepted papers will be better - but rejected papers will still be resubmitted.

  • Previous reviews - Several conferences have tried asking authors to submit (voluntarily or as a requirement) the reviews obtained in previous submissions, along with a description of how they were addressed. I don’t have data on how many did so (if you are a chair and want to share those numbers, please do), but this could have a real impact on re-submissions. However, it is still unclear whether it helps reduce the impact of a bad review, or whether it instead amplifies its effect. More data is needed to better understand this point.

The more I think about it, the more I see that this is a very complex phenomenon that includes many aspects, all somehow related to each other — but for which it is difficult to distinguish the causes from the consequences, and the problems from the solutions.

For instance, for years I thought that having the deadlines very close together was a big mistake and one of the main reasons authors re-submitted without improving their papers. After seeing the data above, and after having tried the new Oakland system (of which I was a very enthusiastic supporter), I am not sure anymore. I broke the loop by submitting to Oakland, I got some poor reviews, and now the student has to wait six months to be able to resubmit (which in a 3-year Ph.D. program is kind of a big deal). So, maybe close deadlines served the purpose of mitigating poor-quality reviews, by allowing the authors to re-try without delays. Good papers are bashed for the wrong reasons all the time, and short re-submission delays make the decision... if not better, at least more tolerable.

Some say that the only solution is to accept more papers: get all the decent ones in, even if some bad ones will slip through as well. It is an interesting idea which is worth trying. But again, it is hard to foresee all possible consequences. A system that accepts all good papers plus, let’s say, 10% of the bad ones would give authors an incentive to submit all their work (including the bad ones) to top conferences, as there is a chance to get in (especially if you keep re-trying). So, this may actually result in an even larger flood of submissions to the Top4 - with all the attendant problems.

All ideas are good until we try them. So, if you are chairing a conference, please experiment and share the outcome, so that others can benefit from your experience, avoid the same mistakes, and maybe eventually find a way to improve the system.

And if you are thinking that it is what it is, we all know and play the game, and it is not worth the effort to try to change the system… well, you are wrong. In one year alone, the best minds in our field spent over 1000 working days (assuming three reviews per paper and four hours per review) just reading and evaluating re-submitted papers!! This is huge. If we succeed in reducing this largely wasted effort, these researchers could invest their time in actually doing research (or in writing better reviews for other papers).
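
That estimate is a simple back-of-the-envelope calculation based on the totals in Table 1; the eight-hour working day is my own assumption:

```python
# Rough cost of reviewing re-submissions in one Top4 cycle.
# The 8-hour working day is an assumption; the other figures come from Table 1.
submissions  = 2125   # total submissions across the four conferences
resub_ratio  = 0.32   # share of submissions that are re-submissions
reviews_each = 3      # reviews per paper
hours_each   = 4      # hours per review

hours = submissions * resub_ratio * reviews_each * hours_each
print(f"{hours:.0f} review hours, about {hours / 8:.0f} working days")
# -> 8160 review hours, about 1020 working days
```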