Unorthodox Intelligence & Personality Research: Gender & Personality
In our sample, four personality traits correctly predict a person's sex nearly 90% of the time.
This email/post is a combination of a non-technical discussion of gender and personality in light of our research and a technical paper of our results. The general discussion will be first and the technical paper will be below.
Non-Technical Overview Of Our Results
We found that women tend to be less intuitive (meaning less interested in abstract ideas), more agreeable, more neurotic (more likely to experience negative emotions like anxiety), and more judging (more likely to plan rather than improvise) than men. We also found that men and women can be fairly cleanly divided by these traits put together, with fairly few women having male personalities or vice versa. We also found, for now very tentatively, that personality is more closely related to neurological sex (a judgment call we made) and to biological sex than it is to gender identity or presentation.
Why Differences Between Men and Women Are Larger Than The Data (Naively) Suggests
In discussing differences between men and women, it's common to note how much overlap there is on many traits. "Sure", the argument goes, "men are less agreeable than women on average" (to use an example from our research), "but there's so much overlap that the general trend doesn't tell you much about specific examples." This fact, while sometimes overstated, often isn't wholly wrong. Obviously some traits, personality and otherwise, will have much more or less overlap, but a quarter of people in our dataset have agreeableness values typical of the opposite sex. That's certainly still a trend, but it's one with many exceptions.
This argument, however, only considers single traits. In our research and others', there's significant overlap between men and women on every one of the MBTI and Big 5 traits. But when considering several personality traits in aggregate, that overlap shrinks significantly: in our dataset, only an eighth of people (3 of 24) have aggregate personalities atypical of their sex. With better tests of personality, this overlap might shrink even further.
It's worth noting that some people view mentioning or emphasizing differences between men and women as a slight against one, or a statement about a supposed inferiority of one sex. This is not necessarily true, and we do ourselves a massive disservice by assuming that pointing out differences is implying a value judgment. Men and women are different in ways deeper than some care to recognize, but that need not be to the detriment of either. Better understanding sex differences is a necessary step in learning how we can best make use of each sex's strengths, and how they can work together to compensate for each other's weaknesses.
Technical Paper
Abstract
We found that a logistic predictor using 4 personality traits from personality tests (Intuitiveness and Judgingness from 16 Personalities and Agreeableness and Neuroticism from the Big 5) is able to correctly predict suspected neurological sex (hereafter NS) in 87.5% of cases (21/24). For this model, we found a 2.2σ significance for those traits correlating with NS. We also found a 2.4σ significance that a 2-trait (Intuitiveness and Agreeableness) predictor model correlates with NS.
Testing Methods
We asked participants to self-report results from two online personality tests: the 16 Personalities test, similar to the MBTI, and the Big 5 test. We used a convenience sample of 24 people, 13 neurological females and 11 neurological males.
Mathematical Methods
We performed a logistic regression to predict NS as a function of Intuitiveness, Judgingness, Agreeableness, and Neuroticism. The predictor is of the form:
$$\mathrm{NSp} = \frac{1}{1 + e^{-(a \cdot \mathrm{iN} + b \cdot \mathrm{Ag} + c \cdot \mathrm{Ne} + d \cdot \mathrm{Ju} + f)}}$$
where iN, Ag, Ne, and Ju are the four personality traits (normalized from the original scale of 0-100 to a scale of 0-1), NSp is the predicted NS value, and a, b, c, d, and f are constants. The model is fit to target values of 1 for male and 0 for female.
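We used maximum likelihood estimation to perform the regression, which is equivalent to finding the predictor coefficients (a, b, c, d, and f) that minimize the total surprisal across all data points i:
$$S = -\sum_{i} \left[ \mathrm{NS}_i \log \mathrm{NSp}_i + (1 - \mathrm{NS}_i) \log (1 - \mathrm{NSp}_i) \right]$$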
For ease of calculation, we used this equivalent expression:
The optimization was performed numerically using Python.
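The fitting step described above can be sketched in plain Python. This is a minimal illustration, not the code used for the study: it assumes natural-log surprisal, uses simple gradient descent as the numerical optimizer (the optimizer actually used is not stated), and runs on synthetic data. The names sigmoid, predict, total_surprisal, and fit are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(weights, traits):
    # weights holds one coefficient per trait, followed by the constant term.
    z = weights[-1] + sum(w * x for w, x in zip(weights, traits))
    return sigmoid(z)

def total_surprisal(weights, rows, labels):
    # Sum of -log(probability assigned to the true label), in nats (assumption).
    eps = 1e-12
    total = 0.0
    for traits, y in zip(rows, labels):
        p = min(max(predict(weights, traits), eps), 1 - eps)
        total -= math.log(p) if y == 1 else math.log(1 - p)
    return total

def fit(rows, labels, steps=2000, rate=0.5):
    # Plain gradient descent on the mean surprisal; a stand-in optimizer.
    n = len(rows[0])
    weights = [0.0] * (n + 1)
    for _ in range(steps):
        grad = [0.0] * (n + 1)
        for traits, y in zip(rows, labels):
            err = predict(weights, traits) - y
            for j, x in enumerate(traits):
                grad[j] += err * x
            grad[-1] += err
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Synthetic stand-in data: 24 people, 4 traits on a 0-1 scale (not the study data).
rng = random.Random(0)
rows = [[rng.random() for _ in range(4)] for _ in range(24)]
labels = [1 if rng.random() < sigmoid(3 * r[0] - 3 * r[1]) else 0 for r in rows]

start = total_surprisal([0.0] * 5, rows, labels)
fitted = fit(rows, labels)
best = total_surprisal(fitted, rows, labels)
print(f"surprisal with all-zero coefficients: {start:.3f}, after fitting: {best:.3f}")
```

With all coefficients at zero, every prediction is 0.5, so the starting surprisal is 24 times log 2 (about 16.6 nats). That gives a rough baseline for judging a fitted value, provided the same log base is used.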
To test significance, we used a permutation test on NS. We repeated the regression at least 10,000 times with the NS values randomly shuffled. This allows us to determine whether the true arrangement of NS values actually produces a better predictor than would be expected from random data.
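A permutation test of this kind might look like the sketch below. It is written under several assumptions: total surprisal (in nats) is the statistic compared across shuffles, the fit uses a short gradient descent, the data are synthetic, and only 100 shuffles are run to keep it fast, where the study used at least 10,000. The function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_surprisal(rows, labels, steps=200, rate=0.5):
    """Fit a logistic model by gradient descent; return its total surprisal."""
    n = len(rows[0])
    w = [0.0] * (n + 1)  # trait coefficients, then the constant term
    for _ in range(steps):
        grad = [0.0] * (n + 1)
        for x, y in zip(rows, labels):
            err = sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - rate * g / len(rows) for wj, g in zip(w, grad)]
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= math.log(p) if y == 1 else math.log(1 - p)
    return total

def permutation_test(rows, labels, n_perm=100, seed=0):
    """Return (observed surprisal, fraction of shuffles fitting at least as well)."""
    rng = random.Random(seed)
    observed = fit_surprisal(rows, labels)
    shuffled = list(labels)
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between traits and labels
        if fit_surprisal(rows, shuffled) <= observed:
            as_good += 1
    return observed, as_good / n_perm

# Synthetic stand-in data: 11 "1" labels and 13 "0" labels, two traits on a
# 0-1 scale, with the first trait clearly shifted between groups.
rng = random.Random(1)
labels = [1] * 11 + [0] * 13
rows = [[(0.7 if y else 0.3) + rng.uniform(-0.15, 0.15), rng.random()]
        for y in labels]

observed, p_value = permutation_test(rows, labels)
print(f"observed surprisal: {observed:.3f}, permutation p-value: {p_value:.3f}")
```

Because every shuffle is refit from scratch, a model with more traits gets the same extra freedom to fit noise in the shuffled runs as in the real one, so the comparison stays fair as traits are added.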
Results
All NSp values above .5 indicate that the model suggests the person is more likely male; below .5 indicates the person is more likely female. As shown in fig. 1, the model correctly predicts 21 out of 24 people in our dataset. This gives an 87.5% correct prediction rate.
Figure 1: Predicted NS (NSp) as a function of NS for all people in our dataset. The line is at y=.5; those above the line are predicted to be male, and those below female. If the prediction were perfect, we'd expect all points to be in either the bottom-left or top-right. The two points in the bottom-right and the one in the top-left are the only incorrect predictions of the model.
The coefficients in the optimal model are 7.387506 (iN), -7.131535 (Ag), -5.329079 (Ne), -3.551129 (Ju), and 3.73431669 (constant). This means that those who are more intuitive, less agreeable, less neurotic, and less judging are more likely to be predicted as male, with the strongest effects from iN and Ag.
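To make the coefficients concrete, the sketch below plugs them into a logistic function. It assumes the standard logistic form, with a positive weighted sum pushing the prediction toward 1 (male), which matches the signs described above. The two trait profiles are invented for illustration and are not from the dataset. The names predict_ns and predicted_label are hypothetical.

```python
import math

# Coefficients reported for the 4-trait model (traits on a 0-1 scale).
COEFFS = {"iN": 7.387506, "Ag": -7.131535, "Ne": -5.329079, "Ju": -3.551129}
CONSTANT = 3.73431669

def predict_ns(iN, Ag, Ne, Ju):
    """Return NSp for trait scores given on the tests' 0-100 scale."""
    scores = {"iN": iN, "Ag": Ag, "Ne": Ne, "Ju": Ju}
    # Normalize each score to 0-1 before applying its coefficient.
    z = CONSTANT + sum(COEFFS[name] * scores[name] / 100.0 for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))

def predicted_label(nsp):
    # Values above .5 are read as male, below .5 as female.
    return "male" if nsp > 0.5 else "female"

# Two invented profiles (hypothetical, not taken from the study data).
profile_a = predict_ns(iN=80, Ag=30, Ne=30, Ju=40)
profile_b = predict_ns(iN=30, Ag=80, Ne=70, Ju=60)
print(predicted_label(profile_a), predicted_label(profile_b))
```

The first profile (high iN, low Ag) lands above .5 and the second (low iN, high Ag) lands below it, in line with the direction of each coefficient.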
The NSp values have a hybrid tau and weighted hybrid tau both of .41 with the NS values. Taking a step function of the NSp values for a fairer comparison (any values above .5 become 1; any below .5 become 0), h.t. and w.h.t. both become .61.
The correct arrangement of NS values performed better than 97.55% of random arrangements, giving a p-value of 2.45% and significance of 2.2σ for the hypothesis of NS being correlated with these personality traits.
A 2-trait model (iN and Ag) only correctly predicted 17/24 people (71%). This worse performance also shows in its optimized total surprisal of 11.67087, compared to 9.748109 for the 4-trait model. Though it fared much worse as a predictor, its correlation with NS had higher significance: the correct NS arrangement performed better than 98.58% of random arrangements, giving a p-value of 1.42% and a significance of 2.4σ.
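The conversion from p-value to σ is not spelled out above. As a sketch, a two-sided normal convention reproduces both quoted figures (a one-sided convention would give smaller values). The function name p_to_sigma is hypothetical.

```python
from statistics import NormalDist

def p_to_sigma(p, two_sided=True):
    """Convert a p-value to an equivalent number of standard deviations."""
    # Two-sided: split the p-value across both tails of the normal curve.
    tail = p / 2 if two_sided else p
    return NormalDist().inv_cdf(1 - tail)

sigma_4_trait = p_to_sigma(0.0245)  # p-value quoted for the 4-trait model
sigma_2_trait = p_to_sigma(0.0142)  # p-value quoted for the 2-trait model
print(f"4-trait: {sigma_4_trait:.2f} sigma, 2-trait: {sigma_2_trait:.2f} sigma")
```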
We also tried 1-, 3-, and 5-trait models. The aforementioned 2-trait model produced the highest significance. We strongly suspect this is because, while four traits form a much stronger predictor, it's also easier to fit four traits to random data, which raises the bar the real data must clear in the permutation test.
Compared to the 4-trait model, 5-trait models failed to give meaningfully better predictions but had drastically reduced significance (around 94%), raising concerns about overfitting. Given these tradeoffs and the size of our dataset, the 4-trait model is generally preferable.
A Note On Our Use Of Neurological Sex
For our analysis, we used suspected neurological sex. In cases where biological sex and gender identity/presentation (everyone in our dataset has matching identity and presentation, so those terms will be used interchangeably) align, this is equivalent to those. In cases where biological sex and identity differ, we made a consensus judgment call about the person's neurological sex. (In order to avoid inadvertently assigning neurological sex to better fit the model, we made these calls prior to seeing the predictors.)
One might wonder whether this judgment call corresponds to something real. This question could be answered by assessing the quality of the NS predictor compared to the others.
Preliminarily, the answer seems to be yes. The NS predictor correctly predicts 21/24 NS values, compared with the biological sex predictor correctly predicting 20 biological sex values, and 19 for gender identity. Additionally, the predictor significance is .9755 for NS, compared to .9711 for biological sex and .8521 for gender identity. Total surprisal breaks from this pattern, but only slightly: it's 9.748109 for NS, 9.538800 for biological sex, and 12.493885 for identity. NS forms the overall best predictor, hence our choice to use it for analysis.
Importantly, we do not yet have enough data to confidently make claims about which of these three would form the best predictor for people not in our dataset, or to answer the related question of which of them captures something real. This question, however, is very worthy of future study.
Possible Future Research On This Subject
Aside from attempting to replicate this study at a larger scale and comparing NS to other measures once we have more data, we have several other avenues we hope to investigate in the near future. We intend to investigate using staged regressions to allow us to incorporate more personality variables without the potential for overfitting and significance issues. Given that we're trying to produce a binary output, we may investigate using a step function instead of a logistic in our regression. And since our primary aim in the larger project is intelligence research, we intend to investigate relationships between intelligence, gender, and personality.
Data
If you want to supply data for our research, please fill out this form.
If you wish to verify our results or perform your own analysis, you can view our full data here.