Review of Python Courses (Part 6)
Posted by Mark on December 4, 2020 at 06:53 | Last modified: January 21, 2021 13:14

In Part 5, I summarized my DataCamp courses 13-15. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #16 was Introduction to Data Science in Python. This course covers:
- Creating variables
- What is a function?
- What is pandas?
- Selecting columns
- Selecting rows with logic (see the sketch after this list)
- Creating line plots
- Adding labels and legends
- Adding some style (line color, width, style, markers, template)
- Making a scatter plot (marker transparency)
- Making a bar chart (horizontal, error bars, stacked)
- Making a histogram (bins, range, normalizing)
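The pandas and matplotlib basics above fit together in just a few lines. Below is a minimal sketch, assuming a hypothetical sales.csv file with "month", "revenue", and "region" columns; none of these names come from the course itself.

```python
# Minimal sketch of the pandas/matplotlib basics listed above.
# The file and column names (sales.csv, "month", "revenue", "region")
# are hypothetical placeholders, not the course's data.
import pandas as pd
import matplotlib.pyplot as plt

sales = pd.read_csv("sales.csv")             # hypothetical file

# Selecting columns and selecting rows with logic
revenue = sales["revenue"]
east = sales[sales["region"] == "East"]
west = sales[sales["region"] == "West"]

# Line plots with labels, a legend, and some style
plt.plot(east["month"], east["revenue"], color="blue", linewidth=2,
         linestyle="--", marker="o", label="East")
plt.plot(west["month"], west["revenue"], color="green", label="West")
plt.xlabel("Month")
plt.ylabel("Revenue")
plt.legend()
plt.show()

# Scatter plot with marker transparency
plt.scatter(sales["month"], sales["revenue"], alpha=0.3)
plt.show()
```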
My course #17 was Joining Data with Pandas. This course covers:
- Inner join (changing df values with the .loc accessor; see the sketch after this list)
- One-to-many relationships
- Merging multiple DataFrames
- Left join (counting the number of rows with missing data in a column)
- Right and outer joins
- Merging a table to itself (i.e. self join)
- Merging on indexes
- Filtering joins (semi-joins, anti-joins)
- Concatenating DataFrames vertically [.append()]
- Using verify_integrity=True to flag accidental duplicates, while the validate argument helps identify the relationship type
- Using merge_ordered() (for ordered/time-series data and to fill in missing values)
- Using merge_asof() (matches on the nearest value rather than requiring equal values)
- Selecting data with .query()
- Reshaping data with .melt()
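To tie several of these together, here is a minimal sketch of inner and left joins, filtering joins, merge_ordered()/merge_asof(), .query(), and .melt(). The DataFrames, column names, and values are my own made-up examples, not the course's datasets.

```python
# Minimal sketch of a few merge operations listed above, on made-up data.
import pandas as pd

employees = pd.DataFrame({"emp_id": [1, 2, 3],
                          "name": ["Ann", "Bob", "Cam"],
                          "dept_id": [10, 20, 99]})
depts = pd.DataFrame({"dept_id": [10, 20],
                      "dept": ["Sales", "IT"]})

# Inner join keeps only matching rows; a left join keeps every employee
inner = employees.merge(depts, on="dept_id")
left = employees.merge(depts, on="dept_id", how="left")
print(left["dept"].isna().sum())     # count rows left with missing data

# Filtering joins: semi-join keeps matches, anti-join keeps non-matches
semi = employees[employees["dept_id"].isin(depts["dept_id"])]
anti = employees[~employees["dept_id"].isin(depts["dept_id"])]

# merge_ordered() and merge_asof() are top-level pandas functions for
# ordered/time-series data (merge_asof needs sorted key columns)
prices_a = pd.DataFrame({"date": pd.to_datetime(["2020-01-01", "2020-01-03"]),
                         "a": [100, 102]})
prices_b = pd.DataFrame({"date": pd.to_datetime(["2020-01-02", "2020-01-03"]),
                         "b": [50, 51]})
ordered = pd.merge_ordered(prices_a, prices_b, on="date", fill_method="ffill")
nearest = pd.merge_asof(prices_a, prices_b, on="date")

# Selecting data with .query() and reshaping with .melt()
it_staff = inner.query('dept == "IT"')
long_form = employees.melt(id_vars="emp_id", var_name="field",
                           value_name="value")
```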
Introduction to Linear Modeling in Python was my eighteenth course. This covers:
- Introductory concepts about models (interpolation, extrapolation)
- Visualizing linear relationships [object-oriented (OOP) approach to matplotlib]
- Quantifying linear relationships (covariance, correlation, normalization)
- What makes a model linear (Taylor series, overfitting, defining a function to plot a graph)
- Interpreting slope and intercept
- Model optimization (RSS: sum of squared residuals)
- Least-squares optimization (with NumPy, SciPy, and statsmodels; see the sketch after this list)
- Modeling real data
- The limits of prediction
- Goodness of fit (deviations, residuals, and R-squared in code)
- Standard error (RMSE measures the spread of the residuals, whereas SE measures the uncertainty in the model parameters)
- Inferential statistics concepts
- Model estimation and likelihood
- Model uncertainty and sample distributions (bootstrap in code)
- Model errors and randomness
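As a rough illustration of the least-squares, goodness-of-fit, and bootstrap ideas above, here is a minimal NumPy sketch on synthetic data. The numbers are invented for illustration only and are not from the course.

```python
# Minimal sketch: least-squares fit, RSS, RMSE, R-squared, and a bootstrap
# estimate of slope uncertainty, all on synthetic (made-up) data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(0, 1.5, size=x.size)   # "true" slope 2.5, intercept 1.0

# Least-squares fit: slope and intercept
slope, intercept = np.polyfit(x, y, deg=1)
y_model = slope * x + intercept

# Residuals, RSS, RMSE, and R-squared
residuals = y - y_model
rss = np.sum(residuals ** 2)                 # sum of squared residuals
rmse = np.sqrt(np.mean(residuals ** 2))      # spread of the residuals
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - rss / ss_tot
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"RMSE={rmse:.2f}, R^2={r_squared:.3f}")

# Bootstrap resampling to gauge uncertainty in the fitted slope
boot_slopes = []
for _ in range(1000):
    idx = rng.integers(0, x.size, size=x.size)
    b_slope, _ = np.polyfit(x[idx], y[idx], deg=1)
    boot_slopes.append(b_slope)
print("bootstrap standard error of slope:", np.std(boot_slopes))
```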