Carla • 6 years ago

I would suggest a discussion of two topics: 1) the importance of subject-matter knowledge, since understanding the big-picture question that the findings are meant to answer guides many of the decisions in the analysis; and 2) legal restrictions on the use of data mining and machine learning tools.

Andrew Ross • 6 years ago

Related to the idea of legal restrictions, I'm hoping you will include some discussion of the ethics of data science, along the lines of "Weapons of Math Destruction" and related books and topics.

David Kane • 6 years ago

> We’ve chosen to focus on the use of R and RStudio in our blog, but other environments (e.g., python) are equally flexible, powerful, and attractive.

Exactly equal? That seems highly unlikely. Impossible, in fact.

I would be interested in reading what you (and others) think about the Python versus R choice for an intro course. Platitudes, while perhaps necessary in something like a National Academies Report, are not interesting in a blog.

Nicholas Horton • 6 years ago

I concur that "equally" wasn't the best choice. The back-and-forth between the Python and R universes has improved both environments, and either provides an excellent foundation for teaching introductory students. We do plan to compare and contrast them in future posts.

Andrew Ross • 6 years ago

I wish I remembered who said it, but I recently read something like this: it's important to learn two or more languages, not just to have the skills in each, but to see how things vary (and stay the same) across languages. That can help you pick up new languages more easily in the future. I suppose it's like the difference between a sample size of n=1 and n=2; only with n=2 can you even start to estimate the variability!

The most important thing for undergraduate students is to understand the underlying concepts and apply them in different circumstances. I think the technology comes next: which tool to use, R or Python. It's essential first to be able to execute all the algorithms using at least one tool, and then to redo the same work using another tool, if that is necessary at all.

Nicholas Horton • 6 years ago

I agree that it's important for students to see more than one tool (so they can start to understand the strengths and limitations of their toolset).

Felista Nganga • 6 years ago

Besides knowing the strengths and limitations of the different tools, I think it's important to let students know that all of these tools matter in the job market. Looking at current data science job postings from different companies and industries, some require R, others Python, and others both. So students should not limit themselves to one; if possible, they should learn both to prepare for the data science job market.

Hunter Glanz • 6 years ago

Thanks for your comment, David. We plan to explicitly discuss R, RStudio, Python, and Jupyter, among many other things, in this blog. So stay tuned!