Review: Johns Hopkins Reproducible Research on Coursera

It’s been a few months, but it is time for the next instalment of the Data Science Specialisation. Next up, off the back of the interestingly named but ultimately mundane Exploratory Data Analysis, is Reproducible Research.

The Course

The main focus of the four weeks is really just the concept of “literate programming”. As a Java developer I find this an interesting concept, as my rule of thumb is – if you can’t understand the code, you as the author should rewrite it. The difference when writing code for statistical analysis, however, is that there is a lot more focus on the “why” rather than the typical comment explaining the “how”.

Literate programming does sit slightly uneasily with me though, as it doesn’t tie in well with reuse. The basic premise is that the document reads like a script. You’re telling the story of the analysis, giving a view into someone’s thought process; however, if you want to break out your repeated code blocks into functions, where do they fit in?

From a technical standpoint you can include them however you do in plain R, at the bottom of the script or in another file. But then the metaphor of “reading into someone’s thoughts” breaks down, as you have to jump around within the document or just skip over parts. Obviously this is just my opinion; no doubt the literate programming propaganda will discredit that statement…

Blending Markdown and R is pretty neat; being a fan of Markdown in general, I think this works really well. Every R program I write from now on will be an Rmd file, I imagine.
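To make the question of where functions fit a bit more concrete, here is a minimal sketch of one way an Rmd file could be laid out. It is only an illustration, not something taken from the course: the function name mean_by_group and the chunk labels are made up for the example, and the data is R's built-in mtcars set. The helper lives in a chunk with include=FALSE, so it still runs but neither the code nor its output appears in the rendered document, and the narrative chunk further down just calls it.

````markdown
---
title: "Fuel economy by cylinder count"
output: html_document
---

```{r helpers, include=FALSE}
# Hypothetical reusable helper, defined once and hidden from the output.
# It could equally live in a separate file and be pulled in with source().
mean_by_group <- function(data, value, group) {
  tapply(data[[value]], data[[group]], mean)
}
```

## Average mpg per cylinder count

The question here is whether cars with fewer cylinders go further per
gallon, so the chunk below averages mpg within each cylinder group.

```{r mpg-by-cyl}
mean_by_group(mtcars, "mpg", "cyl")
```
````

The trade-off is exactly the one described above: the rendered story reads cleanly, but anyone who wants to see what the helper does has to go and find the hidden chunk in the source.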

The course also introduces RPubs, which is a nice little facet of the R community. It allows you to easily publish research from the IDE with next to no effort. Once your work is in the wild, others can comment on it, which is undoubtedly a valuable tool.

With the main positives of the course covered, I ultimately feel that this was another filler course.

Roger spends a lot of time going on about the importance of reproducibility – as if this is not bleeding obvious. There are a few nuances that you perhaps might not have come across, the finer points of reproducible vs replicable for example. But there is just nothing really taught in the module other than “this is what literate programming is” and getting people into the mindset of “reproducibility of analysis is important”.

For me, this and Exploratory Data Analysis could have easily been one module: two weeks on the finer points of plotting with one assignment, followed by two weeks on reproducible research with one assignment using R Markdown. Neither of these topics really needed two assignments, and the lecture material never tops an hour in total for a given week. Week four in both cases tends to be just a lengthy case study discussion video.

The Assignments

The assignments are a decent length. Judging by the discussions on the forum, a lot of people were getting bogged down with cleaning the data. I took the standpoint that there was a whole other module dedicated to that, and it seems that those I peer assessed were also of that ilk. So there is room to go beyond the minimum required if you are looking to practise previously acquired skills.

The other advice I’ll give with assignments is to always read the marking rubric. It is really meant for when you are doing your peer assessments; however, there can be criteria assessed that are not explicitly asked for in the assignment description.

The Quizzes

These are the usual Coursera fare of 5-10 multiple choice questions, with the usual sloppiness, such as asking questions in week one about material that is not discussed in the lecture slides until week three.

Final Thoughts

All in all, I’d put the course currently running at two hits and three misses for the modules so far. See:

R Programming, Getting & Cleaning Data, Exploratory Data Analysis

That said, I consider the big-ticket items to be up next: Statistical Inference, Regression Models and Practical Machine Learning.
