
Data science from scratch :

Grus, Joel,

Data science from scratch : - First edition. - Sebastopol, CA : O'Reilly, [2015] ©2015 - xvi, 311 pages : illustrations ;

Includes index. Subtitle from cover.

Introduction --
A crash course in Python --
Visualizing data --
Linear algebra --
Statistics --
Probability --
Hypothesis and inference --
Gradient descent --
Getting data --
Working with data --
Machine learning --
k-nearest neighbors --
Naive Bayes --
Simple linear regression --
Multiple regression --
Logistic regression --
Decision trees --
Neural networks --
Clustering --
Natural language processing --
Network analysis --
Recommender systems --
Databases and SQL --
MapReduce --
Go forth and do data science.

1. Introduction : The ascendance of data ; What is data science? ; Motivating hypothetical: DataSciencester : Finding key connectors; Data scientists you may know; Salaries and experience; Paid accounts; Topics of interest ; Onward --
2. A crash course in Python : The basics : Getting Python; The Zen of Python; Whitespace formatting; Modules; Arithmetic; Functions; Strings; Exceptions; Lists; Tuples; Dictionaries; Sets; Control flow ; Truthiness ; The not-so-basics : Sorting; List comprehensions; Generators and iterators; Randomness; Regular expressions; Object-Oriented Programming; Functional tools; enumerate; zip and argument unpacking; args and kwargs; Welcome to DataSciencester! --
3. Visualizing data : matplotlib ; Bar charts ; Line charts ; Scatterplots --
4. Linear algebra : Vectors ; Matrices --
5. Statistics : Describing a single set of data ; Correlation ; Simpson's paradox ; Some other correlational caveats ; Correlation and causation --
6. Probability : Dependence and independence ; Conditional probability ; Bayes's theorem ; Random variables ; Continuous distributions ; The normal distribution ; The central limit theorem --
7. Hypothesis and inference : Statistical hypothesis testing ; Example: Flipping a coin ; p-values ; Confidence intervals ; P-hacking ; Example: Running an A/B test ; Bayesian inference --
8. Gradient descent : The idea behind gradient descent ; Estimating the gradient ; Using the gradient ; Choosing the right step size ; Putting it all together ; Stochastic gradient descent --
9. Getting data : stdin and stdout ; Reading files : The basics of text files; Delimited files ; Scraping the web : HTML and the parsing thereof; Example: O'Reilly books about data ; Using APIs : JSON (and XML); Using an unauthenticated API; Finding APIs ; Example: Using the Twitter APIs : Getting credentials --
10. Working with data : Exploring your data : Exploring one-dimensional data; Two dimensions; Many dimensions ; Cleaning and munging ; Manipulating data ; Rescaling ; Dimensionality reduction --
11. Machine learning : Modeling ; What is machine learning? ; Overfitting and underfitting ; Correctness ; The bias-variance trade-off ; Feature extraction and selection --
12. k-nearest neighbors : The model ; Example: Favorite languages ; The curse of dimensionality --
13. Naive Bayes : A really dumb spam filter ; A more sophisticated spam filter ; Implementation ; Testing our model --
14. Simple linear regression : The model ; Using gradient descent ; Maximum likelihood estimation --
15. Multiple regression : The model ; Further assumptions of the least squares model ; Fitting the model ; Interpreting the model ; Goodness of fit ; Digression: the bootstrap ; Standard errors of regression coefficients ; Regularization --
16. Logistic regression : The problem ; The logistic function ; Applying the model ; Goodness of fit ; Support vector machines --
17. Decision trees : What is a decision tree? ; Entropy ; The entropy of a partition ; Creating a decision tree ; Putting it all together ; Random forests --
18. Neural networks : Perceptrons ; Feed-forward neural networks ; Backpropagation ; Example: Defeating a CAPTCHA --
19. Clustering : The idea ; The model ; Example: Meetups ; Choosing k ; Example: Clustering colors ; Bottom-up hierarchical clustering --
20. Natural language processing : Word clouds ; n-gram models ; Grammars ; An aside: Gibbs sampling ; Topic modeling --
21. Network analysis : Betweenness centrality ; Eigenvector centrality : Matrix multiplication; Centrality ; Directed graphs and PageRank --
22. Recommender systems : Manual curation ; Recommending what's popular ; User-based collaborative filtering ; Item-based collaborative filtering --
23. Databases and SQL : CREATE TABLE and INSERT ; UPDATE ; DELETE ; SELECT ; GROUP BY ; ORDER BY ; JOIN ; Subqueries ; Indexes ; Query optimization ; NoSQL --
24. MapReduce : Example: Word count ; Why MapReduce? ; MapReduce more generally ; Example: Analyzing status updates ; Example: Matrix multiplication ; An aside: Combiners --
25. Go forth and do data science : IPython ; Mathematics ; Not from scratch : NumPy; pandas; scikit-learn; Visualization; R ; Find data ; Do data science : Hacker News; Fire trucks; T-shirts; And you?

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they're also a good way to dive into the discipline without actually understanding data science. In this book, you'll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today's messy glut of data holds answers to questions no one's even thought to ask. This book provides you with the know-how to dig those answers out.

9781491901427 149190142X 9789352130962

Python (Computer program language)
Database management.
Data structures (Computer science)

005.7565 / GRU