Review of Python Courses (Part 11)
Posted by Mark on December 21, 2020 at 07:41 | Last modified: February 1, 2021 11:34
In Part 10, I summarized my DataCamp courses 28-30. Today I will continue with the next four.
As a reminder, I introduced you to my recent work learning Python here.
My course #31 was Customer Analytics and A/B Testing in Python. This course covers:
- What is A/B testing?
- Identifying and understanding KPIs
- Exploratory analysis of KPIs
- Calculating KPIs—a practical example
- Working with time series data in pandas
- Creating time series graphs with matplotlib
- Understanding and visualizing trends in customer data
- Events and releases
- Introduction to A/B testing
- Initial A/B test design
- Preparing to run an A/B test
- Calculating sample size
- Analyzing the A/B test results
- Understanding statistical significance (get_pvalue, get_ci)
- Interpreting your test results
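To make the sample-size and significance steps concrete, here is a minimal sketch using only Python's standard library. The function names and signatures (`sample_size_per_group`, `two_proportion_pvalue`, `diff_confidence_interval`) are my own hypothetical stand-ins, not the course's `get_pvalue`/`get_ci` helpers. I assume a conversion-rate KPI, a two-sided pooled two-proportion z-test, and the normal approximation throughout.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect a change from p_control
    to p_treatment with a two-sided two-proportion z-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_treatment - p_control) ** 2)

def two_proportion_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # rate assuming no true difference
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Wald confidence interval for p_b - p_a using the unpooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se
```

For example, with a 10% baseline conversion rate and a hoped-for lift to 12%, `sample_size_per_group(0.10, 0.12)` comes out to roughly 3,840 users per group. Note that these formulas divide by a standard error, so they assume neither group has a conversion rate of exactly 0 or 1.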
My course #32 was Machine Learning with Tree-Based Models in Python. This course covers:
- Decision-tree for classification (from sklearn.tree import DecisionTreeClassifier)
- Classification-tree learning
- Decision-tree for regression
- Generalization error (bias-variance tradeoff)
- Diagnosing bias and variance problems
- Ensemble learning
- Bagging (from sklearn.ensemble import BaggingClassifier)
- Out-of-bag evaluation
- Random forests
- AdaBoost (from sklearn.ensemble import AdaBoostClassifier)
- Gradient boosting (from sklearn.ensemble import GradientBoostingRegressor)
- Stochastic gradient boosting
- Tuning a CART’s hyperparameters
- Tuning an RF’s hyperparameters
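The core of classification-tree learning is a greedy search for the split that most reduces impurity. Below is a simplified, single-feature sketch in plain Python using Gini impurity, which is the default `criterion` of scikit-learn's `DecisionTreeClassifier`. The helpers `gini` and `best_split` are hypothetical names of my own; scikit-learn's real implementation is heavily optimized, searches every feature, and applies this step recursively to grow the full tree.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted_child_gini) for the threshold on one numeric
    feature that minimizes the weighted Gini impurity of the two child nodes.
    Returns (None, parent_gini) if no split improves on the parent node."""
    pairs = sorted(zip(xs, ys), key=lambda pair: pair[0])  # sort by feature value
    n = len(pairs)
    best_threshold, best_score = None, gini(ys)  # a split must beat the parent
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

For instance, `best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])` returns a threshold of 6.5 with a weighted impurity of 0.0, because that split separates the two classes perfectly.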
My course #33 was Introduction to PySpark. This is a data engineering course, and data engineering is a field I did not find myself very enthusiastic about. This course covers:
- What is Spark, anyway?
- Using Spark in Python
- Using dataframes
- Joining
- Machine learning pipelines
- Data types
- Strings and factors
My course #34 was Cleaning Data with PySpark. This course covers:
- Intro to data cleaning with Apache Spark
- Immutability and lazy processing
- Understanding Parquet
- Dataframe column operations
- Conditional dataframe column operations
- User-defined functions
- Partitioning and lazy processing
- Caching
- Improve import performance
- Cluster sizing tips
- Performance improvements
- Introduction to data pipelines
- Data handling techniques
- Data validation
- Final analysis and delivery
I will review more classes next time.
Categories: Python