Data science books - theory and practice

Published Jun 29, 2018

In this post I’d like to share some of my recommended books for learning data science and machine learning, both in theory and in practice.

Theory

These are all foundational textbooks in machine learning. If you study at least one of them in depth, by which I mean formulating models, deriving and implementing the main inference algorithms, and doing the exercises, you’ll have a solid background. The books can be quite technical if you’re new to machine learning, but once you’ve worked through one, you’ll find the others quite accessible.

  • The Elements of Statistical Learning (ESL), by Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman
    One of the classics. There’s also an online course and a newer textbook accompanied by R code.

  • Pattern recognition and machine learning (PRML), by Christopher Bishop
    Similar to ESL, this highly regarded book is another must-read.

  • Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy
    If you study PRML thoroughly, you’ll be familiar with most of the content in Murphy’s book. Nevertheless, it is a fun and comprehensive book with a strong focus on a principled, probabilistic approach to modelling. It also comes with code in Matlab.

  • Probabilistic Graphical Models, by Daphne Koller and Nir Friedman
    Graphical models provide a framework for the representation of, inference in, and learning of probabilistic models. This powerful framework gives a unifying view of many ML models that might otherwise look like a collection of unrelated techniques. There’s also an online course on Coursera.

  • Reinforcement learning, an introduction, by Richard S. Sutton and Andrew G. Barto
    Despite still being a draft, the second edition is well written and motivates the concepts and applications of RL really well.

  • Neural networks and deep learning, by Michael Nielsen

  • Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
    Michael Nielsen’s book is more hands-on and contains some cool interactive content to aid understanding, while Goodfellow et al. is more comprehensive. I recommend reading them in the given order.

Practice

  • Data science for business, by Foster Provost and Tom Fawcett
    This book is accessible to a non-technical audience such as business managers. It also provides some sound principles on how to execute data science projects. Highly recommended.

  • R for data science, by Garrett Grolemund and Hadley Wickham, http://r4ds.had.co.nz/
    This is a must-read, especially for R users.

  • Applied predictive modelling, by Max Kuhn and Kjell Johnson
    Kuhn is the author of the popular R package caret, and this is another must-read. It contains many practical tricks and much advice, not only on modelling but also on data preparation suited to different model classes.

  • Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, by Gordon S. Linoff and Michael J. A. Berry
    Don’t let the title mislead you: this is a good read on data science techniques in general, not just for CRM.

  • Data preparation for data mining, by Dorian Pyle
    Published in 1999 but still very relevant today, this book provides a good checklist of things to inspect when preparing data for analysis.

  • Bandit algorithms for website optimization, by John Myles White
    This book presents standard multi-armed bandit algorithms and comes with implementations in several languages.

  • Practical data science with R, by John Mount and Nina Zumel
    Not as polished as Johnson and Kuhn’s book, but it has a few neat techniques worth knowing.

Discover and read more posts from Trung Nguyen