The original “2016 New Coder Survey” dataset consists of 113 variables. Most of these variables are answers to survey questions, though a few are computer-generated (e.g. respondent ID and survey start/end times). Over 15,000 observations (i.e. respondents) exist.
The str() function output is long and messy, so I won’t print it here. Please consult Free Code Camp’s survey data dictionary. Boolean, numeric, and categorical types are the majority.
I created six new variables from existing variables, using ifelse statements and the cut function on HoursLearning. These new variables bring our total to 119 variables.
## [1] 15620 119
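As a rough illustration of how derived variables like these can be built, here is a minimal sketch on toy data. The data frame contents and the Gender, IsFemale, and HoursBracket names are assumptions for illustration, not the original code; only the bracket boundaries match the (0,10], (10,20], (20,40], and (40,100] labels that appear in this write-up's tables.

```r
# Toy stand-in for the survey data; the values are invented for illustration.
survey <- data.frame(
  Gender = c("male", "female", "female", "trans", "male", NA),
  HoursLearning = c(5, 10, 15, 40, 60, NA)
)

# ifelse() builds a 0/1 indicator (hypothetical column); NA inputs stay NA.
survey$IsFemale <- ifelse(survey$Gender == "female", 1, 0)

# cut() bins hours into right-closed brackets: (0,10], (10,20], (20,40], (40,100].
# A value of exactly 0 falls outside the first bracket and becomes NA.
survey$HoursBracket <- cut(survey$HoursLearning, breaks = c(0, 10, 20, 40, 100))

print(table(survey$HoursBracket))
print(dim(survey))
```

Because the intervals are closed on the right, a respondent reporting exactly 10 hours lands in (0,10], not (10,20].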
646 respondents answered “Data Scientist/Data Engineer” to the question: “Which one of these roles are you most interested in?”
## [1] 646 119
Free Code Camp’s article structure is intentionally mimicked for the purpose of direct comparison. Additional comments are included where the results significantly differ. A few bonus plots are included too!
CodeNewbie and Free Code Camp designed the survey, and dozens of coding-related organizations publicized it to their members.
Of the 646 developing data scientists and data engineers who responded to the survey:
Data science and engineering appear to draw a few more females, as 21% of new coders in general are women.
## female
## 0.2447917
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 14.00 22.00 26.00 27.72 31.25 65.00 74
The median of 26 years is clearer once the long-tail data is log transformed.
This average is 5 months longer than the full survey dataset.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 3.00 8.00 16.17 20.00 360.00 31
Like the age plot, the median programming experience of 8 months is much clearer once logarithmically transformed.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 5.00 10.00 14.41 20.00 80.00 30
Again, log transformation makes the right-skewed data’s distribution clearer. The first quartile, median, and third quartile of 5, 10, and 20, respectively, are easily detectable.
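To make the effect of the log transformation concrete, here is a small sketch on invented right-skewed values (not the survey data). It assumes zeros are dropped before transforming, since log10(0) is -Inf; the original analysis may have handled zeros differently.

```r
# Toy right-skewed values (invented), standing in for weekly hours of learning.
hours <- c(0, 1, 2, 3, 5, 5, 8, 10, 10, 10, 15, 20, 20, 30, 40, 80, NA)

# log10(0) is -Inf, so keep only positive, non-missing values before transforming.
positive <- hours[!is.na(hours) & hours > 0]
log_hours <- log10(positive)

# On the raw scale the long tail drags the mean well above the median...
raw_gap <- mean(positive) - median(positive)
# ...while on the log scale the two nearly coincide.
log_gap <- mean(log_hours) - median(log_hours)

print(summary(positive))
print(c(raw_gap = raw_gap, log_gap = log_gap))

# Bin counts for a histogram on the log scale (plot = FALSE skips drawing).
print(hist(log_hours, plot = FALSE)$counts)
```

With the mean and median close together on the log scale, the quartiles sit near the middle of the plot instead of being squeezed against the left edge.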
Compared to 40% for the full new coder survey, this is a bit shocking. I understand the demand for data scientists and engineers in industry, but I have a hunch these zero counts are caused by the survey’s design. Every respondent that answered the job role of interest question has zero counts for “start your own business” and “freelance.”
The data-related subset has a longer time horizon than the full survey dataset, where 65% are applying within the next year.
Those interested in data science and/or engineering use Coursera, edX, and Udacity more frequently than new coders in general. These companies have a wider range of subject areas than some of the coding-specific resources listed.
64% of developing data scientists and engineers have used at least one of Coursera, edX, or Udacity.
## [1] 0.6393189
Only 46% of new coders in general have used at least one of these resources.
## [1] 0.4591549
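A proportion like the two above can be computed by checking, row by row, whether any of the three resource columns is marked. The sketch below uses toy data, and the column names and the 1/NA coding are assumptions for illustration.

```r
# Toy data: 1 means the respondent used the resource, NA means not selected.
# Column names and coding are assumptions, not confirmed by the write-up.
resources <- data.frame(
  ResourceCoursera = c(1, NA, NA, 1, NA),
  ResourceEdX      = c(NA, NA, 1, 1, NA),
  ResourceUdacity  = c(NA, NA, NA, 1, NA)
)

# Logical matrix of "used this resource"; NA comparisons are ignored by na.rm.
used <- resources == 1
used_any <- rowSums(used, na.rm = TRUE) > 0

# Share of respondents who used at least one of the three resources.
share <- mean(used_any)
print(share)
```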
6% of new coders from the full survey dataset have attended a bootcamp.
The dominating percentage of North Americans should be expected because Free Code Camp is based in the United States.
Compared to 58% for the full new coder survey, the data-focused subset is more skewed towards post-secondary studies.
Diversity amongst majors is greater compared to the full survey, where Computer Science and Information Technology checked in at #1 and #2 with 17% and 5%, respectively.
Two-thirds of the new coder population are currently working.
There is a higher variety of employment fields compared to the full dataset, where 50% of respondents work in software development and IT.
The median current salary for the full dataset is $37k.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 25000 43600 48420 60000 200000 390
The median for the full survey dataset is $50k. With data science/engineering being notoriously lucrative in 2016, some respondents might be seeking higher wages.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 40000 60000 61110 80000 200000 65
## has served in military
## 0.06501548
## has children
## 0.1346749
## financially supporting
## 0.03250774
## no spouse
## 0.2137405
This is 5% higher than new coders in general.
## is underemployed
## 0.4705882
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 76000 150000 194400 240000 1000000 591
This average is $3k more than the full survey dataset.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 10000 20000 36880 45000 1000000 485
Removing the million dollar outlier, the distribution is much clearer with the majority of debt under $75k. I hope that outlier is a joke.
## has high-speed internet
## 0.8573913
## is receiving disability benefits
## 0.02608696
There isn’t really a singular main feature of interest in the “2016 New Coder Survey” dataset. There are several smaller features, but nothing stands out like diamond price and its relationship to carat weight, cut, colour, etc. in the R diamonds dataset, for example. The diamonds dataset covers two time periods (the existence of the diamond pre-sale and post-sale), whereas the survey dataset only covers a single period (the early stages of an individual’s coding career).
If we could fast-forward several years and survey the same respondents, the main feature of interest might be career earnings (adjusted for cost of living, preferably) and/or self-reported career satisfaction. A predictive model using a combination of variables from the 2016 survey could then be built to estimate career success.
If the survey asked “Are you already working as a data scientist/engineer?” instead of “Are you already working as a software developer?”, the current income variable might be a main feature of interest. Unfortunately, the answer to that question cannot be extracted from the existing variables.
Though there isn’t a main feature of interest, we can separate the respondents who answered something other than “Data Scientist/Data Engineer” to the job role interest question and compare the subsets using bivariate and multivariate plots.
I will also explore several of the smaller features, six of them numerical and four of them categorical.
The numerical:
The categorical:
There is a lot of long-tail data that requires transformation to view the details of the distribution. Programming experience, for example, is strongly positively skewed. Some respondents have coded for one month, others for 20+ years.
That no respondents want to freelance or start their own business seems strange. Perhaps a survey design choice caused these zero counts. Every respondent that answered the job role of interest question has zero counts for “start your own business” and “freelance.”
The following operations were performed to tidy, adjust, or change the form of the data:
I used gather() to transform the data from a wide format to a long format. Then I transformed the long data into factor format, using the replicate function with the number of yeses as the multiplier. This data is used to create the code event, resource, and podcast bar charts. The first five operations were performed so bar charts could be created, which wasn’t possible with the original data format. The Americas separation was performed for additional insight.
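The wide-to-long reshaping and the replicate step can be sketched in base R as follows. This swaps gather() for colSums() to stay dependency-free, so it is an approximation of the idea and not the original code; the column names are hypothetical.

```r
# Toy wide data: one column per code event, 1 = attended, NA = not selected.
# Column names are hypothetical.
events <- data.frame(
  CodeEventHackathons = c(1, NA, 1),
  CodeEventMeetups    = c(NA, 1, NA)
)

# Long format: one row per event with its number of "yes" answers.
yes_counts <- colSums(events == 1, na.rm = TRUE)
long <- data.frame(event = names(yes_counts), yeses = as.integer(yes_counts))

# Repeat each event name once per "yes" so a bar chart can count the factor.
event_factor <- factor(rep(long$event, times = long$yeses), levels = long$event)

print(long)
print(table(event_factor))
```

The resulting factor has one element per "yes", which is the shape a counting bar chart expects.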
14974 respondents did not answer “Data Scientist/Data Engineer” to the question: “Which one of these roles are you most interested in?”
## [1] 14974 119
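The two subset sizes add up to the full dataset (646 + 14974 = 15620), which suggests the complement also keeps respondents who skipped the question. A minimal sketch of one way to get that behaviour, on toy data with assumed values:

```r
# Toy data; NA marks a respondent who skipped the job role question.
survey <- data.frame(
  JobRoleInterest = c("Data Scientist / Data Engineer", "Mobile Developer", NA,
                      "Data Scientist / Data Engineer", "Product Manager"),
  stringsAsFactors = FALSE
)

# %in% returns FALSE (not NA) for missing values, so every row lands in
# exactly one of the two subsets.
is_data <- survey$JobRoleInterest %in% "Data Scientist / Data Engineer"
data_subset  <- survey[is_data, , drop = FALSE]
other_subset <- survey[!is_data, , drop = FALSE]

print(c(data = nrow(data_subset), other = nrow(other_subset)))
```

Using `==` instead of `%in%` would propagate NA into the index and produce rows of NAs, so the choice of operator matters here.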
The next two plots are created using pairs.panels() from the psych package. They display a scatter plot matrix (SPLOM), with bivariate scatter plots below the diagonal, histograms on the diagonal, and the Pearson correlation above the diagonal.
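The numbers in the upper triangle of such a matrix are ordinary Pearson correlations. As a dependency-free sketch, the same values can be obtained with base R's cor(); the toy data and column names below are invented, and pairwise handling of missing values is an assumption about how gaps were treated.

```r
# Toy numeric variables with some missing answers (values invented).
vars <- data.frame(
  Age = c(22, 25, 30, 35, 40, NA),
  Income = c(20000, 30000, NA, 50000, 60000, 45000),
  ExpectedEarning = c(40000, 45000, 50000, NA, 70000, 60000)
)

# Pearson correlation for every pair, using whichever rows have both values.
cors <- cor(vars, use = "pairwise.complete.obs", method = "pearson")
print(round(cors, 2))

# With the psych package installed, the plot itself would come from:
# psych::pairs.panels(vars)
```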
For the data science subset of the survey, all correlations are below 0.4, which supports my statement that no main feature exists. The strongest of the correlations are:
The phenomena revealed are intuitive, but not groundbreaking: you tend to make more money when you are older, you tend to expect your next job to have a high salary if your current one does, and expensive schooling can lead to higher income levels.
For the non-data science subset of the survey, all correlations are again below 0.4. Most of the correlations are within 0.1 of the data science subset, except for three:
Interesting. Student debt levels are involved in all three correlations. I bet the aforementioned skew towards post-secondary studies for the data science subset plays a role here, where higher levels of student debt are expected. Expensive schooling has not led to higher salaries as frequently for those not interested in data science and engineering.
Let’s zoom in on the strong age-income correlation, this time for the full survey dataset. Note that the strength exists despite the majority of $200k salaries belonging to respondents under 40.
The earnings vs. age trend, however, isn’t maintained as these individuals prepare to transition to their new job of choice. Younger individuals seem willing to capitalize on lucrative tech salaries and older individuals seem willing to take a pay cut.
We’ll ditch the data science-only focus that complemented Free Code Camp’s univariate exploration of new coders, and switch to profiling new coders in general.
First, we’ll explore hours dedicated to learning per week and expected next salary across gender and continent citizenship. These two variables are dependent upon the quality of the coding resources, whereas the other numerical ones (e.g. age, income, and programming experience) are set previously.
For the following logarithmic boxplots, the horizontal line is the median and the “x” is the mean. The top of the box is the third quartile and the bottom is the first quartile. Each whisker extends to the most extreme data point within 1.5 times the interquartile range of the box.
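The box and whisker positions can be checked numerically with base R's boxplot.stats(). The values below are invented to include one obvious outlier.

```r
# Toy values with one clear outlier (invented for illustration).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

# stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker.
bs <- boxplot.stats(x, coef = 1.5)

# Fences sit 1.5 box lengths beyond the hinges; points outside them are outliers,
# and the whiskers stop at the most extreme points inside them.
box_length <- bs$stats[4] - bs$stats[2]
fences <- c(bs$stats[2] - 1.5 * box_length, bs$stats[4] + 1.5 * box_length)

print(bs$stats)
print(fences)
print(bs$out)
```

Here the upper whisker stops at 9 and the value 50 is drawn as a separate point, because it lies beyond the upper fence of 15.5.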
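Statistical tests and inferences follow the majority of the bivariate plots. Alpha levels of 0.05 are used for the pairwise t-tests. These tests assume the variables analyzed are approximately normally distributed, which is only a rough assumption for the right-skewed variables here, so the p-values should be read as approximate.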
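For readers unfamiliar with the p-value tables that follow, here is a minimal sketch of a pairwise t-test in base R on invented groups. It uses the function's defaults (pooled standard deviation, Holm adjustment); the write-up does not say which adjustment was used, so treat that as an assumption.

```r
# Three toy groups: "a" and "b" overlap heavily, "c" is far away.
values <- c(1:10, 2:11, 101:110)
groups <- factor(rep(c("a", "b", "c"), each = 10), levels = c("a", "b", "c"))

# Every pair of groups is compared; defaults are pooled SD and Holm adjustment.
result <- pairwise.t.test(values, groups)
p <- result$p.value
print(p)

# The lower triangle holds the p-values; unused cells are NA, as in the tables here.
alpha <- 0.05
print(p < alpha)
```

A cell below alpha means the two group means differ by more than chance would plausibly explain; a cell above it means the test is inconclusive, not that the groups are equal.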
Hours dedicated to learning results are nearly identical across genders.
##
## male female genderqueer agender trans
## 10766 2840 66 38 36
Do transgender new coders actually spend more time learning? A pairwise t-test says the difference is not significant, as all trans p-values are greater than the aforementioned alpha value of 0.05.
## male female genderqueer agender
## female 0.0001183222 NA NA NA
## genderqueer 0.7009298469 0.7911540 NA NA
## agender 0.3421146251 0.6519387 0.6006503 NA
## trans 0.3485157466 0.1554507 0.3260318 0.1812888
There is not much differentiation in weekly hours dedicated to learning across continents either. All have a median of 10 hours. Asian and African students have the highest means, at 16.4 and 16.8 hours, respectively.
##
## North America Europe Asia South America Africa
## 6744 3358 2178 567 506
## Oceania
## 301
The higher Asian and African means are both significant compared to the other continents’ means at a significance level of 0.05.
## North America Europe Asia South America
## Europe 0.7366938549 NA NA NA
## Asia 0.0006050985 0.0008652786 NA NA
## South America 0.3219580345 0.4255435379 0.006543667 NA
## Africa 0.0136500683 0.0113202845 0.553219616 0.01015108
## Oceania 0.0958745963 0.1305238713 0.002926231 0.44185526
## Africa
## Europe NA
## Asia NA
## South America NA
## Africa NA
## Oceania 0.003587357
Females actually expect higher salaries than males, with a $9k gap in medians and a $4k gap in means. There is a huge gap in first quartiles, where the 25th percentile female expects $14k more than her male equivalent. As with hours dedicated to learning, transgender new coders have relatively higher expected salaries. Did a particularly ambitious set of trans individuals respond to the survey or are these their true traits?
## Gender: male
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 30000 50000 52620 70000 200000 6763
## --------------------------------------------------------
## Gender: female
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 43650 59000 56620 70000 200000 1532
## --------------------------------------------------------
## Gender: genderqueer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 7000 50000 60000 66970 70000 200000 37
## --------------------------------------------------------
## Gender: agender
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 24000 36000 46500 58220 67500 200000 20
## --------------------------------------------------------
## Gender: trans
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 20000 44250 67500 67230 76250 200000 20
The gap between females and males is significant, according to a pairwise t-test, as are the gaps between genderqueer respondents and males and trans respondents and males. I wonder why minority-gendered respondents expect higher salaries for their next job.
## male female genderqueer agender
## female 2.061149e-05 NA NA NA
## genderqueer 8.998896e-03 0.06141337 NA NA
## agender 4.205845e-01 0.81824355 0.3227977 NA
## trans 4.773836e-02 0.15208424 0.9768457 0.373584
Whoa. Expected next salary by continent varies way more compared to the above three boxplots. Given the previously listed sample sizes for each continent, I would assume nearly all of these gaps are statistically significant. North Americans expect the highest range of salaries, with their interquartile range spanning from $50k to $70k. Europe’s 75th percentile is North America’s 25th percentile (I wonder if some European respondents forgot to convert from pounds or euros to US dollars). Expectations in Asia are all over the board.
A lot of these individuals are using similar, if not the same, online educational resources. Labour market economics are cruel.
The median new coder that dedicates 40+ hours per week expects $10k more than the median new coders from the other brackets.
##
## (0,10] (10,20] (20,40] (40,100]
## 8175 3564 2394 694
This expectations gap, however, might be due to random chance, as the (0,10] and (40,100] means comparison failed to show significance.
## (0,10] (10,20] (20,40]
## (10,20] 0.01094413 NA NA
## (20,40] 0.05523308 0.710986652 NA
## (40,100] 0.10608015 0.003341339 0.008889624
Let’s dig into that 40-100 hour bracket. Less than 5% of new coders are dedicating 40+ hours to learning each week. Below are the most common ages and educational backgrounds for this bracket. The bottom row is number of respondents.
##
## 25 21 26 23 22 20 24 32 30 28
## 43 42 39 37 34 33 33 29 28 27
##
## bachelor's degree
## 270
## some college credit, no degree
## 102
## high school diploma or equivalent (GED)
## 71
## master's degree (non-professional)
## 45
## some high school
## 35
## professional degree (MBA, MD, JD, etc.)
## 24
Respondents in this bracket are most commonly in their early-to-mid twenties, and a bachelor’s degree is the most common educational background. It appears that they are forgoing traditional forms of higher education (like master’s and professional degrees) and using those 40+ hour weeks to learn code.
This is the exact situation I’m in with my personalized data science master’s degree. The quality and affordability of online education in 2016 is incredible, though many still aren’t aware of the existence of resources like Free Code Camp, Udacity, and Coursera. If this survey were performed in a few years, I would expect more respondents to be in the higher brackets.
Let’s now explore job roles of interest, first categorically, then numerically. Again, the most common roles are:
##
## Full-Stack Web Developer Front-End Web Developer
## 2571 1379
## Back-End Web Developer Data Scientist / Data Engineer
## 704 646
## Mobile Developer User Experience Designer
## 414 275
## DevOps / SysAdmin Product Manager
## 219 191
## Quality Assurance Engineer
## 104
User experience designer is by far the most diverse discipline in terms of gender, with 52% males, 46% females, and the highest percentage of agender, genderqueer, and trans respondents (2%). Mobile development is the most male-dominated discipline at 81%, though full-stack and back-end development are close.
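Percentages like these are row-wise proportions of a role-by-gender cross-tabulation. A minimal sketch on invented data (the role labels are shortened and the counts are not from the survey):

```r
# Toy answers (invented): one element per respondent.
role   <- c("UX", "UX", "Mobile", "Mobile", "Mobile", "Mobile")
gender <- c("male", "female", "male", "male", "male", "female")

# Cross-tabulate, then divide each row by its total (margin = 1).
counts <- table(role, gender)
shares <- prop.table(counts, margin = 1)

print(round(shares, 2))
```

Each row sums to 1, so the values read directly as the gender split within a role.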
The highest relative popularity for North America (read: biggest blue bar segment) is user experience design. Europe’s is back-end development. Asia’s, South America’s, and Africa’s is mobile development. Oceania’s is data science/engineering. Mobile developer is the most diverse discipline in terms of citizenship.
The skew towards post-secondary studies for data science and data engineering is much clearer here. Mobile development has the highest percentage of respondents with no, some, or only a high school education, though back-end development is a close second. I wonder if these skews will reflect themselves in the subsequent age boxplot.
Mobile developers are indeed the youngest with a first quartile two years younger than the next youngest role, but back-end developers are not second-youngest. Mobile being a relatively new discipline likely has something to do with this phenomenon. Front-end development is the oldest discipline with a mean age of 29 years.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 11.00 23.00 27.00 28.94 33.00 70.00 294
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 12.00 24.00 27.00 29.08 33.00 64.00 193
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 13.00 22.00 27.00 28.03 32.00 59.00 103
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 14.00 22.00 26.00 27.72 31.25 65.00 74
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 12.0 20.0 24.0 26.2 31.0 54.0 77
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 12.00 22.00 26.00 28.74 32.00 73.00 38
Based on the results of a pairwise t-test, we are inclined to conclude that mobile developers are the youngest, but front-end developers being the oldest might be caused by random chance.
## Full-Stack Developer Front-End Developer
## Front-End Developer 6.362326e-01 NA
## Back-End Developer 1.733113e-02 1.181725e-02
## Data Scientist / Engineer 1.815968e-03 1.384457e-03
## Mobile Developer 2.017238e-08 2.421440e-08
## UX Designer 7.266237e-01 5.663622e-01
## Back-End Developer Data Scientist / Engineer
## Front-End Developer NA NA
## Back-End Developer NA NA
## Data Scientist / Engineer 0.528965428 NA
## Mobile Developer 0.001311138 0.008054328
## UX Designer 0.266165761 0.114106993
## Mobile Developer
## Front-End Developer NA
## Back-End Developer NA
## Data Scientist / Engineer NA
## Mobile Developer NA
## UX Designer 0.0003372604
Data scientists-, data engineers-, and back-end developers-in-training have programmed the longest, with a median experience of eight months (0.67 years; the summaries for this variable are reported in years). UX designers have the lowest first quartile at two months, one month below the next lowest, but front-end developers have the lowest average at 9.5 months. Programming experience is so positively skewed that some of the means are above their third quartile.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.250 0.500 1.043 1.000 40.830 88
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.2500 0.5000 0.7917 1.0000 15.0000 43
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.3333 0.6667 1.2680 1.6250 20.0000 33
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.2500 0.6667 1.3470 1.6670 30.0000 31
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.250 0.500 1.049 1.250 13.330 15
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.1667 0.5000 1.0610 1.0000 36.0000 20
For all disciplines except back-end development, the results of the t-tests suggest that those interested in data science and engineering have indeed programmed for longer. Front-end developers have programmed the least, unanimously, according to p-values.
## Full-Stack Developer Front-End Developer
## Front-End Developer 7.616578e-05 NA
## Back-End Developer 5.808766e-03 7.899438e-08
## Data Scientist / Engineer 3.202567e-04 1.242137e-09
## Mobile Developer 9.513062e-01 1.588332e-02
## UX Designer 8.831383e-01 3.515075e-02
## Back-End Developer Data Scientist / Engineer
## Front-End Developer NA NA
## Back-End Developer NA NA
## Data Scientist / Engineer 0.45064220 NA
## Mobile Developer 0.06479446 0.01348069
## UX Designer 0.13350983 0.04064788
## Mobile Developer
## Front-End Developer NA
## Back-End Developer NA
## Data Scientist / Engineer NA
## Mobile Developer NA
## UX Designer 0.9366481
Full-stack developers dedicate the most time to learning each week, with 25% of respondents dedicating 30+ hours weekly. UX designers spend the least amount of time learning per week with a mean of 12 hours per week.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 10.00 15.00 19.94 30.00 100.00 108
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 6.0 12.0 16.7 20.0 100.0 48
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 8.00 15.00 18.77 25.00 100.00 40
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 5.00 10.00 14.41 20.00 80.00 30
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 5.00 12.00 17.76 25.00 100.00 21
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 5.00 10.00 12.04 15.00 63.00 19
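The pairwise t-tests support UX designers spending the least time learning, with significant differences against every other role. Full-stack developers spending the most is significant against every role except back-end development (p = 0.07).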
## Full-Stack Developer Front-End Developer
## Front-End Developer 2.120848e-10 NA
## Back-End Developer 7.321617e-02 3.656253e-03
## Data Scientist / Engineer 2.701196e-16 1.656064e-03
## Mobile Developer 7.364865e-03 2.170544e-01
## UX Designer 1.038213e-15 5.003092e-06
## Back-End Developer Data Scientist / Engineer
## Front-End Developer NA NA
## Back-End Developer NA NA
## Data Scientist / Engineer 1.942677e-07 NA
## Mobile Developer 2.905685e-01 0.0005175278
## UX Designer 1.021364e-09 0.0331516928
## Mobile Developer
## Front-End Developer NA
## Back-End Developer NA
## Data Scientist / Engineer NA
## Mobile Developer NA
## UX Designer 1.937195e-06
Respondents interested in data science and/or engineering clearly have the highest current salaries. I would be shocked if a pairwise t-test suggested otherwise. Their third quartile of $60k per year is $8k higher than the next highest discipline. There isn’t much income differentiation between the remaining job roles of interest, though all are above the 2014 US median income of $28.9k.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 20000 35000 41010 52000 200000 1508
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 20000 35000 37020 48000 200000 806
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 17750 32000 36990 49250 200000 436
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 25000 43600 48420 60000 200000 390
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 20000 33800 36420 46500 155000 286
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 20000 31500 35730 50000 90000 175
The observation that developing data scientists and engineers have the highest current salaries is, as expected, supported by the pairwise t-tests. UX designers having the lowest average, however, might be caused by random chance.
## Full-Stack Developer Front-End Developer
## Front-End Developer 0.0057234012 NA
## Back-End Developer 0.0345620393 9.869959e-01
## Data Scientist / Engineer 0.0001364637 5.710562e-08
## Mobile Developer 0.0782325486 8.259973e-01
## UX Designer 0.0699231222 6.691043e-01
## Back-End Developer Data Scientist / Engineer
## Front-End Developer NA NA
## Back-End Developer NA NA
## Data Scientist / Engineer 2.781493e-06 NA
## Mobile Developer 8.502432e-01 7.102127e-05
## UX Designer 7.002592e-01 1.145311e-04
## Mobile Developer
## Front-End Developer NA
## Back-End Developer NA
## Data Scientist / Engineer NA
## Mobile Developer NA
## UX Designer 0.8524358
Those interested in data science/engineering expect to earn the most at their next job. Given the aforementioned correlation between current salaries and expected salaries, this is not a surprise. Front-end developers appear to be the least optimistic in terms of next salary, though it is pretty close. Note that expected salaries are higher than current salaries across the board.
## JobRoleInterest: Full-Stack Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 40000 55000 54670 70000 200000 225
## --------------------------------------------------------
## JobRoleInterest: Front-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 30000 50000 48070 60000 200000 118
## --------------------------------------------------------
## JobRoleInterest: Back-End Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 30000 50000 50060 65000 200000 73
## --------------------------------------------------------
## JobRoleInterest: Data Scientist / Engineer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 40000 60000 61110 80000 200000 65
## --------------------------------------------------------
## JobRoleInterest: Mobile Developer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 30000 50000 52740 70000 200000 48
## --------------------------------------------------------
## JobRoleInterest: UX Designer
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6000 40000 51000 55100 70000 200000 40
Interesting. At a significance level of 0.05, future front-end developers expect lower next salaries than every other role except back-end developers, where the gap is not significant (p = 0.16). Data-focused respondents retain their place in the salary-related kingdom.
## Full-Stack Developer Front-End Developer
## Front-End Developer 8.252229e-11 NA
## Back-End Developer 4.106259e-04 1.587079e-01
## Data Scientist / Engineer 1.731529e-06 4.490202e-19
## Mobile Developer 2.377850e-01 6.742147e-03
## UX Designer 8.283726e-01 6.602333e-04
## Back-End Developer Data Scientist / Engineer
## Front-End Developer NA NA
## Back-End Developer NA NA
## Data Scientist / Engineer 4.043780e-11 NA
## Mobile Developer 1.606523e-01 1.595757e-05
## UX Designer 2.332536e-02 7.428145e-03
## Mobile Developer
## Front-End Developer NA
## Back-End Developer NA
## Data Scientist / Engineer NA
## Mobile Developer NA
## UX Designer 0.3314679
The data science/engineering subset of the survey is largely similar to the non-data science/engineering subset, except for three correlations involving student debt owed. The skew towards post-secondary studies for the data-focused subset is the likely culprit.
The correlation between current salary and age is stronger than the correlation between expected next salary and age for new coders in general, and I expect that would be true for the data-focused subset as well.
Hours dedicated to learning doesn’t appear to vary much with gender or continent, with consistent medians of ten hours weekly.
Expected salary for a new coder’s next job varies strongly by continent. Females appear to have a much higher bottom line than males. Those who dedicate more than 40 hours a week to learning might expect higher next salaries, but sample size issues prevent this statement from being definitive.
The majority of new coders for all job roles of interest are male, North American, and have bachelor’s degrees. Age, programming experience, hours dedicated to learning, current salary, and expected next salary all vary depending on job role of interest. One or two of the roles stand out from the pack for each of the five quantitative variables.
No exceedingly strong relationship exists. All correlations are below 0.4.
Current salary and expected next salary have the strongest relationship for both subsets, with correlations of 0.36 and 0.38.
Europe’s 75th percentile for expected next salary is North America’s 25th percentile ($50k USD). Perhaps some European respondents forgot to convert from pounds or euros to US dollars.
Let’s dig deeper into the two salary variables: current salary and expected next salary. Again, the latter salary is for the first new job where the respondent, presumably, will advertise their new coding skills.
For both males vs. females and ethnic majorities vs. minorities, three faceted scatter plots, in succession, follow:
Since respondents that are 65+ years old are the outliers, I removed them to tighten up the linear model.
Female new coders have a higher median current salary ($38k vs. $36k) than males. They are also slightly older (28 vs. 27 years old). Pearson’s r correlations, which measure the strength of the linear relationship between income and age, tell us that male salaries tend to increase with age more so than female salaries do for this new coder dataset.
Despite the abundance of male data points (79% of survey respondents are male), male respondents clearly have a higher-than-the-population-average proportion of $150k+ current salaries. The split, below, is 89/11.
##
## male female
## 0.893617 0.106383
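A split like this comes from filtering to the high earners first and then taking proportions. The sketch below uses invented incomes and variable names; only the $150k threshold comes from the text.

```r
# Toy data (invented): current income and gender for seven respondents.
income <- c(160000, 200000, 155000, 40000, 30000, 180000, NA)
gender <- factor(
  c("male", "male", "female", "female", "male", "male", "female"),
  levels = c("male", "female")
)

# Keep respondents with a known income of at least $150k.
high <- !is.na(income) & income >= 150000

# Gender proportions within that group.
split <- prop.table(table(gender[high]))
print(split)
```

Comparing this split against the gender split of the whole sample shows whether one group is over-represented among top earners.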
As with current salary, females have a higher median expected next salary. Correlations are similar across genders this time, however. Both are low, indicating that there isn’t much of a relationship between expected salary and age. Young new coders, both male and female, expect similar salaries as older new coders.
Male new coders do again have an above average proportion of expected next salaries above $150k. The 82/18 split isn’t as extreme as the previous 89/11 split, however.
##
## male female
## 0.8181818 0.1818182
Plotting the two salary-related variables against each other, we are left with the impression that the gender wage gap does not exist in this dataset. Females have both a higher median current salary and expected next salary. Male new coders do dominate the elite ($150k+) salary lines but also notice the cluster of blue circles and the absence of red circles near the origin. Pearson’s r correlations tell us that a higher percentage of a female new coder’s expected salary can be explained by her current salary. Both correlations are still relatively high.
Plotting current salary vs. age for ethnicity tells a much different story than the plot for gender. Whereas the correlations are both near 0.25 for ethnic majorities and minorities, for males vs. females they were drastically different. For new coders of all races, the degree to which age contributes to current salary is similar. Median ages are both 27, while the 50th percentile minority actually has a higher current salary than their majority equivalent by $4k.
As with gender, there is an abundance of data points for the majority demographic (76/24 is the survey’s ethnic majority/minority split). The split is a bit more extreme as we isolate those with current salaries above $150k.
##
## No Yes
## 0.8105263 0.1894737
Again, the correlations with age are lower for expected next salary than for current salary. There is a bit of a gap between groups this time, indicating that minorities in today’s workplace might expect their salary to increase at a slower rate as they age. Ethnic minorities clearly have a higher median, by $10k. They appear optimistic about the changing diversity landscape in the workplace.
Unlike with gender, the proportion of ethnic majorities vs. minorities remains constant near 76/24 when we isolate those with expected next salaries above $150k.
##
## No Yes
## 0.7692308 0.2307692
We are again left with the impression that the wage gap, this time the racial one, does not exist in this dataset. Minorities have both a higher median current salary and a higher median expected next salary. The majority demographic holds only a slightly higher-than-average share of the $150k+ salaries. The minority’s correlation below is higher as well, suggesting that these new coders expect to convert their current salary to their next salary at a higher rate.
Let’s combine all of the boxplots for job roles of interest (the blue ones) into one radar chart. For each numerical variable, the mean per role is plotted after being normalized to between 0 and 1.
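One way to get means on a 0 to 1 scale is min-max rescaling of the per-role means, sketched below on toy data. The role labels, column names, and the helper `rescale01` are hypothetical, and rescaling the means (rather than averaging rescaled raw values) is an assumption about how the chart was built.

```r
# Toy data; role labels and column names are assumptions.
coders <- data.frame(
  JobRoleInterest = c("Data", "Data", "Front-End", "Front-End",
                      "Mobile", "Mobile"),
  Age             = c(30, 28, 27, 29, 22, 24),
  Income          = c(60000, 50000, 30000, 34000, 28000, 32000)
)

# Mean of each numerical variable per role.
role_means <- aggregate(cbind(Age, Income) ~ JobRoleInterest,
                        data = coders, FUN = mean)

# Min-max rescaling: the lowest role mean maps to 0, the highest to 1.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

normalized <- role_means
normalized[c("Age", "Income")] <- lapply(role_means[c("Age", "Income")],
                                         rescale01)
print(normalized)
```

Because every variable ends up on the same scale, the role with the highest mean for a variable always sits on the outer edge of that axis, whatever the original units were.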
One thing jumps out immediately: developing data scientists/engineers lead the pack for programming experience, current salary, and expected next salary. Beyond that, however, overplotting is an issue, which makes it difficult to internalize other patterns in the data. Let’s fix that by faceting the plot next.
Ah, that’s better. Full-stack developers have high normalized means for age and hours dedicated to learning. They also have middle-of-the-road means for all other variables, which contribute to their notable polygon area. Front-end and mobile developers have the smallest areas, thanks to the lowest programming experience and expected next salary means for the former, and the lowest age and second-lowest current salary means for the latter.
Perception of strength based on overall area is a common misinterpretation of radar plots. Note that we are strictly using this plot to efficiently compare roles across several numerical variables, not to determine which role is better, if such a determination even exists.
The wage gaps, both gender and racial, do not present themselves in this new coder dataset via current and expected next salary medians. Maybe new coders aren’t an accurate representation of the working population in general. Data suggests that both wage gaps still exist in 2016.
It is also interesting that the female correlation between income and age (0.192) is lower than the male one (0.267), while the ethnic minority correlation (0.243) is nearly identical to the ethnic majority one (0.253). I wonder why, all else equal, the minority demographic for race performs better salary-wise as they age compared to the minority demographic for gender.
Though this was previously noted in the bivariate section, the strong current salary vs. age relationship isn’t maintained as new coders across all genders and ethnic representations transition to their next job. Judging by the expected next salary results, younger individuals seem eager to capitalize on lucrative coding-related salaries, and older individuals seem willing to take a pay cut.
This segmented bar chart conveys gender representation across job roles of interest.
Overall, the majority of new coder survey respondents are males. The vast majority are either male or female.
Mobile development leads the way with the highest percentage of males at 81%, though full-stack and front-end development are close, at 79% and 78%, respectively.
User experience design is the most diverse discipline, with 52% males and 46% females, and the highest proportion of agender, genderqueer, and trans respondents (2%). Front-end development has a notable percentage of females as well with 35%, which is 14 percentage points higher than in the full survey dataset.
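The percentages behind a segmented bar chart like this one are row-wise proportions of a role-by-gender cross-tabulation. A minimal sketch on toy data follows; the labels and column names are assumptions, and the commented-out `barplot` call is only one base-graphics way to draw it.

```r
# Toy data; labels and column names are assumptions.
coders <- data.frame(
  JobRoleInterest = c("Mobile", "Mobile", "Mobile", "Mobile",
                      "UX", "UX", "UX", "UX"),
  Gender          = c("male", "male", "male", "female",
                      "male", "male", "female", "female")
)

# Cross-tabulate role by gender, then convert each row to proportions
# (margin = 1), so the shares within every role sum to 1.
counts <- table(coders$JobRoleInterest, coders$Gender)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))

# One segmented bar per role, using base graphics:
# barplot(t(shares), legend.text = TRUE)
```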
This faceted radar chart, where the normalized mean (between 0 and 1) for each numerical variable is plotted for each job role of interest, clarifies the differences between disciplines. A common misinterpretation of radar plots is the perception of strength based on overall area. This plot should strictly be used to efficiently compare roles across numerical variables, not to determine which role is better.
Developing data scientists/engineers make the most money, expect the most money for their next job, and have the most programming experience. Their polygon has the largest area.
Full-stack developers are relatively older and dedicate the most time to learning each week. They also have a large polygon area.
Front-end developers are most green in terms of programming experience and have the lowest salary expectations for their first job where they advertise their new web development skills. They also have relatively low current salaries. These three factors contribute to the smallest polygon area.
Mobile developers are the youngest and currently do not make much money. These characteristics are expected of the discipline with the highest proportion of respondents with no, some, or only a high school education. They have the second smallest polygon area.
This expected next salary vs. current salary scatter plot, faceted by gender, has a best-fit line labeled with Pearson’s r correlation, as well as dashed lines representing the median for each axis.
The dashed median lines inform us that the gender wage gap does not exist in this dataset. Females have a $2k lead in current salary and a $9k lead in expected salary for their next job, post-coding skills acquisition.
Though male new coders do have the highest proportion of elite ($150k+) current salaries and expected next salaries, they also have a notable cluster near the origin that females do not.
The correlation between expected and current salary is stronger for female new coders. This gap suggests that females expect to convert their current salary to a similar salary for their next job at a higher rate than males. They are both strong overall, however, as the correlation between these two salary variables represents the strongest correlation in the “2016 New Coder Survey” dataset.
Developing data scientists and engineers are slightly different from new coders in general.
The two subsets do share plenty of common trends. Most are willing to relocate. Most don’t use podcasts or attend events yet. Similar proportions are ethnic minorities in their country.
Older new coders are willing to take a pay cut when transitioning to a job where they advertise their new coding skills. Younger new coders intend to increase their earning potential by capitalizing on demand for coding.
Weekly hours dedicated to learning don’t differ much across genders or across citizenships by continent. Expected next salary does, however. Most people aren’t replacing the traditional college/university route with full-time online education…yet. Those who are seem to expect higher salaries, though we can’t be sure because of sample size issues.
Gender and continent distributions across job roles of interest vary. Females appear drawn to user experience design. Asians, South Americans, and Africans appear drawn to mobile development. School degree obtained does not vary much by discipline overall, though data science/engineering and mobile development stick out as the most and least seasoned in terms of education, respectively.
Developing data scientists/engineers have the highest current salaries, expect the highest next salaries, and have the most programming experience. Front-end developers are the oldest, but not significantly. Full-stack developers dedicate the most time to learning per week.
Front-end developers are the least experienced coders and expect the lowest next salaries. UX designers spend the fewest hours learning weekly and have the lowest current salaries, but not significantly for the latter. Mobile developers are the youngest.
The gender and racial wage gaps do not present themselves in this dataset. Perhaps new coders aren’t reflective of the working population in general, where data suggests that both wage gaps still exist in 2016.
The successes of this exploration are largely due to the detailed design of the Free Code Camp survey.
The main struggle I encountered in this exploration was the lack of a main feature of interest, like the diamond dataset’s price variable. It would be awesome if we could survey the same respondents in a decade or so. We could combine career earnings and career satisfaction with the 2016 survey’s results to build a predictive model to estimate career success.
These are the people who are learning to code. Free, self-paced learning resources are definitely important.