Exploring Free Code Camp’s “2016 New Coder Survey”

Structure of Dataset

The original “2016 New Coder Survey” dataset consists of 113 variables. Most of these variables are answers to survey questions, though a few are computer-generated (e.g. respondent ID and survey start/end times). Over 15,000 observations (i.e. respondents) exist.

The output of the str function is long and messy, so I won’t print it here; please consult Free Code Camp’s survey data dictionary instead. Most of the variables are Boolean, numeric, or categorical.
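As a compact alternative to the full str output, the variable types can be tallied by class. This is only a minimal sketch: the toy `survey` data frame and its values are invented, and `IsSoftwareDev` is a hypothetical column name used for illustration.

```r
# Toy stand-in for the survey data; all values are invented for illustration.
survey <- data.frame(
  HoursLearning = c(5, 20, 40),                            # numeric
  CountryLive   = factor(c("India", "Brazil", "Canada")),  # categorical
  IsSoftwareDev = c(TRUE, FALSE, FALSE)                    # Boolean; hypothetical name
)

# Count how many columns fall into each class instead of printing str().
type_counts <- table(sapply(survey, function(col) class(col)[1]))
print(type_counts)
```

On the real data, the same call would give a one-line summary of how many variables fall into each type.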

New Variables

I created six new variables from existing variables:

  • ContinentCitizen and ContinentLive from CountryCitizen and CountryLive using Vincent Arel-Bundock’s countrycode R package
  • PodcastPartiallyDerivative, PodcastBecomingDataSci, and PodcastTalkingMachines from PodcastOther using ifelse statements
  • HoursLearningBucket using the cut function on HoursLearning

These new variables bring our total to 119 variables.

## [1] 15620   119
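The three derivation steps listed above could look roughly like the sketch below. It runs on an invented toy data frame, and the text-matching patterns and bucket break points are assumptions rather than the values used in the original analysis. The countrycode step is skipped if the package is not installed.

```r
# Toy stand-in for the survey data; all values are invented.
survey <- data.frame(
  CountryCitizen = c("India", "Brazil", "Canada", "Germany"),
  CountryLive    = c("India", "United States of America", "Canada", "Germany"),
  PodcastOther   = c("Partially Derivative", NA, "talking machines, others", "none"),
  HoursLearning  = c(2, 15, 40, 80),
  stringsAsFactors = FALSE
)

# Continent lookup via countrycode (skipped if the package is not installed).
if (requireNamespace("countrycode", quietly = TRUE)) {
  survey$ContinentCitizen <- countrycode::countrycode(
    survey$CountryCitizen, origin = "country.name", destination = "continent")
  survey$ContinentLive <- countrycode::countrycode(
    survey$CountryLive, origin = "country.name", destination = "continent")
}

# Flag podcasts mentioned in the free-text field (match patterns are assumptions).
mentions <- function(pattern) {
  ifelse(grepl(pattern, survey$PodcastOther, ignore.case = TRUE), 1, 0)
}
survey$PodcastPartiallyDerivative <- mentions("partially derivative")
survey$PodcastBecomingDataSci     <- mentions("becoming a data scientist")
survey$PodcastTalkingMachines     <- mentions("talking machines")

# Bucket learning hours (break points are assumptions).
survey$HoursLearningBucket <- cut(
  survey$HoursLearning,
  breaks = c(0, 10, 20, 40, Inf),
  include.lowest = TRUE
)

dim(survey)
```

Note that grepl returns FALSE for missing text, so respondents who left PodcastOther blank get a 0 here; keeping them as NA instead would be an equally reasonable choice.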

Data Science/Engineering Subset

646 respondents answered “Data Scientist/Data Engineer” to the question: “Which one of these roles are you most interested in?”

## [1] 646 119
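A subset like this can be taken with a single filter on the role question. The sketch below uses an invented toy data frame, and the column name `JobRoleInterest` is an assumption about how the role question is stored.

```r
# Toy stand-in for the survey data; the column name JobRoleInterest is an assumption.
survey <- data.frame(
  JobRoleInterest = c("Data Scientist/Data Engineer", "Front-End Web Developer",
                      NA, "Data Scientist/Data Engineer"),
  Age = c(26, 31, 22, 40),
  stringsAsFactors = FALSE
)

# subset() keeps only rows where the condition is TRUE, so NA answers are dropped.
data_sci <- subset(survey, JobRoleInterest == "Data Scientist/Data Engineer")
dim(data_sci)
```

Because subset() drops rows where the comparison is NA, respondents who skipped the role question are excluded along with those who chose another role.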

The following analysis first explores the characteristics of these developing data scientists/engineers, which complements Free Code Camp’s univariate exploration of new coders in general.

I intentionally mimic the structure of Free Code Camp’s article to allow direct comparison, and add comments where the results differ noticeably. A few bonus plots are included too!

We’ll then dive deeper into the characteristics of new coders in general via bivariate and multivariate exploration.

Univariate Plots

Who Participated

CodeNewbie and Free Code Camp designed the survey, and dozens of coding-related organizations publicized it to their members.

Of the 646 developing data scientists and data engineers who responded to the survey:

A quarter are women.

Data science and engineering appear to draw a slightly larger share of women: about 24% here, versus 21% of new coders in general.

##    female 
## 0.2447917
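A proportion like the one above can be computed from a frequency table. This is a minimal sketch on invented responses. It assumes missing answers are excluded from the denominator, which is the default behavior of table().

```r
# Toy gender responses; values are invented, NA marks a skipped question.
gender <- c("male", "female", "male", "male", NA)

# table() ignores NA by default, so shares are among those who answered.
gender_share <- prop.table(table(gender))
gender_share["female"]
```

If skipped answers should count in the denominator instead, `table(gender, useNA = "ifany")` would keep them as their own category.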

Their median age is 26.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   14.00   22.00   26.00   27.72   31.25   65.00      74

The median of 26 years is easier to see once the long-tailed age data is log-transformed.
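To illustrate why a log transform helps with a long right tail, here is a small sketch on invented ages. The base-10 logarithm and the bin count are assumptions, since the text does not say which were used.

```r
# Toy ages with a long right tail; values are invented.
# An odd count keeps the median a single observed value.
age <- c(14, 19, 22, 24, 25, 26, 29, 31, 38, 52, 65)

summary(age)

# A log transform is monotonic, so the median maps straight across.
log_age <- log10(age)
median_back <- 10^median(log_age)

# Bin counts on the log scale; plot = FALSE keeps this sketch free of graphics output.
log_hist <- hist(log_age, breaks = 8, plot = FALSE)
log_hist$counts
```

The transform compresses the older ages toward the bulk of the data, so a histogram on the log scale looks more symmetric and the median sits nearer its center.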