I am no expert in gender bias or gender discrimination, but it is not a secret that the system security community is male-dominated and does not have a good reputation for being a friendly environment for women.
There are tons of great initiatives and groups to support women in cybersecurity, and many, many people more knowledgeable than me on the topic. But the Security Circus contains a lot of data - so I thought I could use it to provide some actual measurement of the extent of the problem in our community.
Let’s go.
Base Rate for Comparison
Let’s start by getting some general figures to put our results into context.
Already at the bachelor's level, computer science is doing particularly badly. Pretty much everywhere in the world the percentage of women is below 20%, and this value is considerably lower than in other STEM fields. For instance, the average for CS in Europe is 19.8% (vs 26.7% in Engineering and 54.8% in Natural Sciences and Mathematics). source
However, we can assume most researchers who publish in the Top4 have a Ph.D., or are in the process of getting one. I could not find worldwide figures here, but in the US roughly 20% of the Ph.D. degrees in Computer Science are awarded to women, and this has been more or less stable over the past 20 years (source).
A recent study published in 2020 in the Communications of the ACM ("A Bibliometric Approach for Detecting the Gender Gap in Computer Science", Mattauch et al.) measured the gender distribution among the authors of 19 CS conferences (the only security venue was NDSS) between 2012 and 2017. The authors found that only 9.6% of the authors were women. However, this value could be inaccurate, because the authors removed all Asian names from their analysis, as they were too difficult to classify automatically.
Dataset
For this study, I took all papers published in the Top4 System security conferences (IEEE S&P, Usenix Security, NDSS, ACM CCS) between 2000 and 2020. This includes 4299 papers written by 7524 distinct researchers (if you want to know more about the data, please check the Security Circus page).
I then used Namsor, a popular service that provides APIs to infer the gender of a given name. Since the first name alone is not very indicative, I always considered the pair firstname-lastname. The service returns a likelyGender as well as a probability that indicates the confidence of its classifier. For roughly 40% of the authors, the confidence was above 90%. But still, for a stunning 12% of the authors the confidence was below 55% (so the algorithm was giving almost 50-50 chances).
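For the curious, here is a minimal sketch of what such a query could look like in Python. The endpoint path and the probabilityCalibrated field name are my assumptions based on Namsor's v2 REST API, so double-check the official docs before reusing this:

```python
import requests

# Assumed Namsor v2 endpoint for first/last name pairs; verify against the docs.
NAMSOR_URL = "https://v2.namsor.com/NamSorAPIv2/api2/json/gender/{first}/{last}"
API_KEY = "..."  # your Namsor API key

def infer_gender(first: str, last: str) -> tuple[str, float]:
    """Ask Namsor for the likely gender of a firstname-lastname pair."""
    resp = requests.get(
        NAMSOR_URL.format(first=first, last=last),
        headers={"X-API-KEY": API_KEY},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # likelyGender is "male"/"female"; the probability field (assumed name)
    # expresses the classifier's confidence in that label.
    return data["likelyGender"], data["probabilityCalibrated"]

gender, confidence = infer_gender("Ada", "Lovelace")
if confidence < 0.55:
    print(f"low confidence ({confidence:.2f}) -> queue for manual review")
```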
As already mentioned by Mattauch et al., the problem mostly affected Asian names, which are often gender-neutral. But since those make up a very large part of the dataset, I did not want to simply remove them. Thus… I started a very long and super-tedious process of manually validating roughly 2K names. The process (not very scientific) was the following. I searched for "name lastname" + security on Google Images. If all results were consistent with the result returned by Namsor, I moved on. Otherwise, if a single researcher profile was returned, I took that as the answer. If not, I extracted the most recent affiliation of the researcher from my DB and added that to the query. If multiple profiles were returned, I tried to briefly check their pages to make a decision. The process was difficult, as sometimes two different but perfectly reasonable candidates existed. For instance, Xin Liu (a female professor at UC Davis) published a paper at NDSS 08, while another Xin Liu (a male Ph.D. student at Univ. of Maryland) co-authored one at CCS 21. It was also error-prone and not always successful, as sometimes the query returned no results I could use to make a decision.
Overall, I manually modified 654 entries, 142 from Male to Female and 512 from Female to Male.
The process took many, many evenings, and I am sure there are still quite a few misclassifications in the system, but overall I am confident I addressed most of the mistakes. Or at least I hope so…
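For reference, folding the manual corrections back into the automatic labels is a short pandas step. This is just a sketch: the file names (authors.csv, overrides.csv) and columns are hypothetical, not the actual Security Circus schema:

```python
import pandas as pd

# Hypothetical files: authors.csv holds the Namsor output,
# overrides.csv the manually checked corrections.
authors = pd.read_csv("authors.csv")      # author_id, name, gender, confidence
overrides = pd.read_csv("overrides.csv")  # author_id, gender

# Wherever a manual override exists, it wins over the automatic label.
manual = overrides.set_index("author_id")["gender"]
authors["gender"] = authors["author_id"].map(manual).fillna(authors["gender"])
```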
Results
Overall, women accounted for only 13.8% (1036 out of 7524) of all researchers in the dataset.
Out of 4299 publications, 1502 (34.9%) had a woman as co-author. But only 23 papers over 21 years were written exclusively by women!
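For those who want to reproduce these numbers on their own data, here is a rough pandas sketch (the paper_authors.csv long format and its column names are hypothetical):

```python
import pandas as pd

# Hypothetical long format: one row per (paper, author) pair.
pubs = pd.read_csv("paper_authors.csv")  # paper_id, author_id, gender

# 13.8%: share of women among distinct researchers.
researchers = pubs.drop_duplicates("author_id")
pct_women = (researchers["gender"] == "female").mean() * 100

# Papers with at least one woman co-author, and all-women papers.
is_woman = pubs.assign(w=pubs["gender"].eq("female")).groupby("paper_id")["w"]
n_with_woman = is_woman.any().sum()
n_all_women = is_woman.all().sum()

print(f"{pct_women:.1f}% women, {n_with_woman} papers with >=1 woman, "
      f"{n_all_women} all-women papers")
```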
Breakdown Per Year
The yearly trend seems to suggest that after 2017 the number of women in our conferences increased significantly.
I am optimistic and I like to believe this is in fact the case.
But for those of you who are more pessimistic, another possible explanation is that the number of authors has been increasing exponentially over the years, and over the past five years China became the second country by number of papers (it was almost negligible until 2010). And since most of the errors in the gender classification are on Asian names, predominantly males misclassified as females… small errors in the data could result in an inflation of the women percentage over the past 3-4 years. Just saying.
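The per-year percentages behind this trend boil down to a simple groupby; again a sketch over the same hypothetical long-format table:

```python
import pandas as pd

pubs = pd.read_csv("paper_authors.csv")  # year, paper_id, author_id, gender

# Share of women among the distinct authors appearing in each year.
yearly = (
    pubs.drop_duplicates(["year", "author_id"])
        .assign(w=lambda d: d["gender"].eq("female"))
        .groupby("year")["w"].mean() * 100
)
print(yearly.round(1))
```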
Breakdown Per Conference
| Conference | % Women |
|---|---|
| IEEE S&P | 13.3% |
| Usenix Security | 12.3% |
| NDSS | 12.8% |
| ACM CCS | 13.1% |
While I was not expecting to find much here… there is a slightly higher percentage of women in the conferences that focus less on systems topics and more on other security areas, such as applied crypto.
But I don't know… the difference is small and could just be a random fluctuation.
Breakdown per Country of Affiliation
This shows the percentage of women grouped by country (of affiliation, not origin) for those countries with > 50 papers:
Numbers here vary quite a lot, ranging from 6.7% in Austrian institutions to 21.9% in Canadian ones.
Why? I really wish I knew…
Breakdown per Number of Papers
Finally, I divided the data according to the number of papers each researcher published in the Top4 and re-computed the percentage of women in each group (a sketch of this binning follows the table):
| Papers in Top4 | % Women |
|---|---|
| 10+ papers | 6.5% |
| 5-9 papers | 11.8% |
| 2-4 papers | 12.4% |
| 1 paper | 15.1% |
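Here is the binning sketch, once more over the hypothetical long-format table used above:

```python
import pandas as pd

pubs = pd.read_csv("paper_authors.csv")  # paper_id, author_id, gender

# Count Top4 papers per researcher, then bin researchers by productivity.
per_author = pubs.groupby("author_id").agg(
    n_papers=("paper_id", "nunique"),
    gender=("gender", "first"),
)
bins = pd.cut(per_author["n_papers"], [0, 1, 4, 9, float("inf")],
              labels=["1 paper", "2-4 papers", "5-9 papers", "10+ papers"])
pct = per_author.groupby(bins, observed=True)["gender"].apply(
    lambda g: g.eq("female").mean() * 100
)
print(pct.round(1))
```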
The data shows that the lack of women is much more serious at the top of the scale, where we find mostly professors who lead large groups, than at the bottom, where we find mainly young students!