Review of Python Courses (Part 33)
Posted by Mark on March 9, 2021 at 06:54 | Last modified: February 21, 2021 07:26
In Part 32, I summarized my Datacamp courses 95-97. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #98 was Practicing Statistics Interview Questions in Python. This is a very comprehensive course that covers the following topics:
- Conditional probabilities
- Central limit theorem (from numpy.random import randint)
- Probability distributions
- Descriptive statistics
- Categorical data (from sklearn import preprocessing)
- Two or more variables
- Confidence intervals (import scipy.stats as st; from statsmodels.stats.proportion import proportion_confint)
- Hypothesis testing (from scipy.stats import sem, t, ttest_ind)
- Power and sample size (from statsmodels.stats.power import zt_ind_solve_power)
- Multiple testing (from statsmodels.sandbox.stats.multicomp import multipletests)
- Regression models (from sklearn.linear_model import LinearRegression, LogisticRegression)
- Evaluating models (from sklearn.metrics import mean_squared_error, confusion_matrix, recall_score)
- Missing data and outliers
- Bias-variance tradeoff
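To make the central limit theorem item from the list above concrete, here is a minimal sketch. It uses the standard library's random module rather than numpy.random.randint (my substitution). The die-rolling setup, the sample size of 30, and the 2,000 repetitions are illustrative assumptions, not anything taken from the course.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_mean(n_rolls):
    """Mean of n_rolls fair six-sided die rolls."""
    return statistics.mean(random.randint(1, 6) for _ in range(n_rolls))


# Draw many sample means. The CLT says they cluster around the
# population mean (3.5) with spread close to sigma / sqrt(n).
n_rolls = 30
sample_means = [sample_mean(n_rolls) for _ in range(2000)]

mean_of_means = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
expected_spread = statistics.pstdev(range(1, 7)) / n_rolls ** 0.5

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"observed spread: {spread:.3f}, CLT prediction: {expected_spread:.3f}")
```

The observed spread should land close to the predicted one even though a single die roll is far from normally distributed.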
My course #99 was Intermediate Spreadsheets for Google Sheets. Topics covered in this course include:
- Data types for data science
- Convert or die!
- Common data transformations
- Rounding numbers
- Generating random numbers
- Logical operations
- Flow control
- Blanks, missing values, and errors
- Cell addresses
- Lookups and matching
- Bringing it all together
My course #100 was Practicing Coding Interview Questions in Python. This is probably the most comprehensive and dense course of all. I took a long time getting through this, but the amount of material covered is really incredible. Props to instructor Kirill Smirnov! The course covers:
- What are the main data structures in Python?
- What are common ways to manipulate strings?
- How to write regular expressions in Python?
- What are iterable objects?
- What is a list comprehension?
- What is a zip object?
- What is a generator and how to create one?
- How to pass a variable number of arguments to a function?
- What is a lambda expression?
- What are the functions map(), filter(), and reduce()?
- What is recursion?
- What is the difference between a NumPy array and a list?
- How to use the .apply() method on a dataframe?
- How to use the .groupby() method on a dataframe?
- How to visualize data in Python?
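Several of the questions in the list above (generators, variable numbers of arguments, lambda expressions, map/filter/reduce, recursion, zip objects, comprehensions) fit into one short sketch. The function names and sample values here are my own, chosen only for illustration.

```python
from functools import reduce


def fibonacci(limit):
    """Generator: lazily yield Fibonacci numbers below limit."""
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b


def total(*args, **kwargs):
    """Sum any number of positional arguments, scaled by an optional factor."""
    return sum(args) * kwargs.get("factor", 1)


def factorial(n):
    """Recursion: n! defined in terms of (n - 1)!."""
    return 1 if n <= 1 else n * factorial(n - 1)


fibs = list(fibonacci(50))
evens = list(filter(lambda x: x % 2 == 0, fibs))            # keep even values
squares = list(map(lambda x: x * x, evens))                 # transform each value
sum_of_squares = reduce(lambda acc, x: acc + x, squares, 0)  # fold to one value

pairs = list(zip("abc", [1, 2, 3]))  # a zip object is a lazy iterator of tuples
lengths = {word: len(word) for word in ["zip", "lambda", "generator"]}

print(fibs, evens, squares, sum_of_squares)
print(total(1, 2, 3, factor=2), factorial(5), pairs, lengths)
```

Note that map(), filter(), and zip() are built-ins that return lazy iterators, while reduce() has to be imported from functools.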
I will review more courses next time.
Categories: Python | Comments (0) | Permalink
Review of Python Courses (Part 32)
Posted by Mark on March 4, 2021 at 07:48 | Last modified: February 19, 2021 10:46
In Part 31, I summarized my Datacamp courses 92-94. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #95 was Dimensionality Reduction in Python. This course covers:
- Introduction
- Feature selection vs. feature extraction
- t-SNE visualization of high-dimensional data (from sklearn.manifold import TSNE)
- The curse of dimensionality (from sklearn.model_selection import train_test_split; from sklearn.svm import SVC)
- Features with missing values or little variance (from sklearn.feature_selection import VarianceThreshold)
- Pairwise correlation (hide correlation matrix redundancy)
- Removing highly correlated features
- Selecting features for model performance (from sklearn.feature_selection import RFE)
- Tree-based feature selection (from sklearn.ensemble import RandomForestClassifier)
- Regularized linear regression (from sklearn.linear_model import Lasso)
- Combining feature selectors (from sklearn.linear_model import LassoCV)
- Feature extraction
- Principal component analysis (from sklearn.decomposition import PCA)
- PCA applications (from sklearn.pipeline import Pipeline)
- Principal component selection
My course #96 was Writing Efficient Python Code. Topics covered in this course include:
- Defining efficient
- Building with built-ins
- The power of NumPy arrays
- Examining runtime (%timeit)
- Code profiling for runtime [pip install line_profiler; %load_ext line_profiler; %lprun -f foo foo(args)]
- Code profiling for memory usage (import sys, pip install memory_profiler)
- Efficiently combining, counting, and iterating (from collections import Counter; from itertools import combinations)
- Set theory
- Eliminating loops
- Writing better loops
- Intro to pandas dataframe iteration
- Another iterator method: .itertuples() [faster than .iterrows()]
- Pandas alternative to looping (use .apply() on an entire dataframe)
- Optimal pandas iterating (use .values to get array rather than series)
- Final tips
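As a small sketch of the combining, counting, and set theory items in the list above, the snippet below uses Counter, combinations, and set operators, then times a hand-written loop against the built-in sum() with the timeit module (the script equivalent of %timeit). The sample data and the loop_total helper are hypothetical.

```python
import timeit
from collections import Counter
from itertools import combinations

types = ["grass", "fire", "water", "fire", "grass", "fire"]

# Counting without an explicit loop.
type_counts = Counter(types)

# All unordered pairs of the distinct types.
pairs = list(combinations(["grass", "fire", "water"], 2))

# Set theory replaces nested membership loops.
team_a = {"grass", "fire", "rock"}
team_b = {"fire", "water"}
shared = team_a & team_b   # in both teams
only_a = team_a - team_b   # in team_a but not team_b


def loop_total(values):
    """Deliberately manual summation, for comparison with sum()."""
    total = 0
    for value in values:
        total += value
    return total


values = list(range(10_000))
loop_time = timeit.timeit(lambda: loop_total(values), number=200)
builtin_time = timeit.timeit(lambda: sum(values), number=200)

print(type_counts.most_common(1), pairs, shared, only_a)
print(f"loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```

Timings vary by machine, so I print them rather than state which is faster.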
My course #97 was Machine Learning for Finance in Python. This course covers:
- Predict the future (e.g. stock price changes) with machine learning
- Data transforms, features, and targets (import talib)
- Linear modeling with financial data
- Engineering features (from sklearn.model_selection import ParameterGrid)
- Decision trees (from sklearn.tree import DecisionTreeRegressor)
- Random forests (from sklearn.ensemble import RandomForestRegressor)
- Feature importances and gradient boosting [np.argsort(); from sklearn.ensemble import GradientBoostingRegressor]
- Scaling data and KNN regression (from sklearn.preprocessing import scale)
- Neural networks (from keras.models import Sequential; from keras.layers import Dense)
- Custom loss functions (import tensorflow as tf; import keras.losses)
- Overfitting and ensembling (from keras.layers import Dropout; from sklearn.metrics import r2_score)
- Modern Portfolio Theory (MPT) and efficient frontiers (review this complex code involving covariance)
- Sharpe Ratios, features, and targets
- Machine learning for MPT
I will review more courses next time.
Categories: Python | Comments (0) | Permalink
Review of Python Courses (Part 31)
Posted by Mark on March 1, 2021 at 07:37 | Last modified: February 18, 2021 13:43
In Part 30, I summarized my Datacamp courses 89-91. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #92 was Market Basket Analysis in Python. This course covers:
- What is market basket analysis (using lambda function for string processing)?
- Identifying association rules
- The simplest metric (from mlxtend.preprocessing import TransactionEncoder)
- Confidence and lift
- Leverage and conviction
- Association and dissociation
- Advanced rules
- Aggregation
- The Apriori algorithm (from mlxtend.frequent_patterns import apriori)
- Basic Apriori results: pruning (from mlxtend.frequent_patterns import association_rules)
- Advanced Apriori results: pruning
- Heatmaps
- Scatterplots
- Parallel coordinates plot (from pandas.plotting import parallel_coordinates)
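The metrics named in the list above (support, confidence, lift, leverage, conviction) have simple standard definitions, so here is a plain-Python sketch of them. It does not use mlxtend. The five-basket dataset and the function names are made up for illustration.

```python
# Hypothetical transactions: each basket is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "jam"},
    {"bread", "milk", "jam"},
    {"milk"},
    {"bread", "milk"},
]


def support(items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """P(consequent | antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Values above 1 suggest association, below 1 dissociation."""
    return confidence(antecedent, consequent) / support(consequent)


def leverage(antecedent, consequent):
    """Joint support minus the support expected under independence."""
    joint = support(set(antecedent) | set(consequent))
    return joint - support(antecedent) * support(consequent)


def conviction(antecedent, consequent):
    """Undefined (division by zero) when confidence is exactly 1."""
    return (1 - support(consequent)) / (1 - confidence(antecedent, consequent))


rule = ({"bread"}, {"milk"})
for metric in (confidence, lift, leverage, conviction):
    print(f"{metric.__name__}: {metric(*rule):.4f}")
```

For the rule bread -> milk in this toy data, lift comes out slightly below 1, so the two items are mildly dissociated.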
My course #93 was Winning a Kaggle Competition in Python. Topics covered in this course include:
- Competitions overview
- Prepare your first submission (from sklearn.linear_model import LinearRegression)
- Public vs. private leaderboard
- Understand the problem (from sklearn.metrics import roc_auc_score, f1_score, mean_squared_error, log_loss)
- Initial EDA (create/extract hour feature)
- Local validation (from sklearn.model_selection import KFold, StratifiedKFold)
- Validation usage (from sklearn.model_selection import TimeSeriesSplit)
- Feature engineering
- Categorical features (from sklearn.preprocessing import LabelEncoder)
- Target encoding
- Missing data (from sklearn.impute import SimpleImputer)
- Baseline model (from sklearn.model_selection import train_test_split; from sklearn.ensemble import GradientBoostingRegressor)
- Hyperparameter tuning (from sklearn.linear_model import Ridge)
- Model ensembling
- Final tips
My course #94 was Machine Learning for Time Series in Python. This course covers:
- Kinds of time series and applications
- Machine learning basics [from sklearn.svm import LinearSVC, .reshape()]
- Combining time series data with machine learning (from glob import glob; import librosa as lr)
- Classification and feature engineering (axis = -1)
- Improving the features we use for classification (from sklearn.model_selection import cross_val_score)
- The spectrogram—spectral changes to sound over time (from librosa.core import stft, amplitude_to_db)
- Predicting data over time (from sklearn.metrics import r2_score)
- Cleaning and improving your data (percent_change function, visualizing outlier thresholds)
- Creating features over time (from functools import partial)
- Time-delayed features and auto-regressive models
- Cross-validating time series data (from sklearn.model_selection import ShuffleSplit, TimeSeriesSplit)
- Stationarity and stability (from sklearn.utils import resample)
I will review more courses next time.
Categories: Python | Comments (0) | Permalink
Review of Python Courses (Part 30)
Posted by Mark on February 26, 2021 at 07:32 | Last modified: February 17, 2021 13:32
In Part 29, I summarized my Datacamp courses 86-88. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #89 was a case study in Python machine learning. This course covers:
- Introducing the challenge
- Exploring the data
- Looking at the datatypes (converting dtype for all dataframe categories)
- How do we measure success?
- It’s time to build a model (from sklearn.linear_model import LogisticRegression)
- Making predictions (from sklearn.multiclass import OneVsRestClassifier)
- A very brief introduction to NLP
- Representing text numerically (from sklearn.feature_extraction.text import CountVectorizer)
- Pipelines, feature and text preprocessing (from sklearn.pipeline import Pipeline, FeatureUnion)
- Text features and feature unions (from sklearn.preprocessing import FunctionTransformer, Imputer; note that Imputer was removed in scikit-learn 0.22 in favor of sklearn.impute.SimpleImputer)
- Choosing a classification model (from sklearn.ensemble import RandomForestClassifier)
- Learning from the expert: processing
- Learning from the expert: a stats trick (from sklearn.preprocessing import PolynomialFeatures)
- Learning from the expert: the winning model (from sklearn.feature_extraction.text import HashingVectorizer)
- Next steps and the social impact of your work
My course #90 was Ensemble Methods in Python. Topics covered in this course include:
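- Introduction to ensemble methods (from sklearn.ensemble import MetaEstimator, where MetaEstimator is a placeholder for a concrete class such as VotingClassifier rather than an importable name)
- Voting (from sklearn.ensemble import VotingClassifier, VotingRegressor)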
- Averaging
- The strength of “weak” models
- Bootstrap aggregating
- Bagging classifier: nuts and bolts
- Bagging parameters: tips and tricks
- The effectiveness of gradual learning
- Adaptive boosting: award winning model (from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor)
- Gradient boosting (from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor)
- Gradient boosting flavors (import xgboost as xgb; import lightgbm as lgb; import catboost as cb)
- The intuition behind stacking
- Build your first stacked ensemble
- Let’s mlxtend it (from mlxtend.classifier import StackingClassifier; from mlxtend.regressor import StackingRegressor)!
My course #91 was Data Analysis in Spreadsheets. This course covers:
- First function: ROUND
- Function composition: SQRT
- Functions and ranges: MIN, MAX
- Selecting ranges: SUM, AVERAGE, MEDIAN
- Multiple arguments: RANK
- String manipulation: LEFT, RIGHT
- String information: LEN, SEARCH
- Combining strings: CONCATENATE
- Date functions: WEEKDAY
- Comparing dates
- Combining functions
- Flow control: IF
- Nested logical functions: IF
I will review more courses next time.
Categories: Python | Comments (0) | Permalink
Review of Python Courses (Part 29)
Posted by Mark on February 23, 2021 at 17:38 | Last modified: February 17, 2021 09:27
In Part 28, I summarized my Datacamp courses 83-85. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #86 was Hyperparameter Tuning in Python. This course covers:
- Introduction (visualize Random Forest)
- Hyperparameters overview
- Hyperparameters values
- Introducing grid search
- Grid search with Scikit Learn (from sklearn import metrics)
- Understanding grid search output
- Introducing random search (from itertools import product)
- Random search in Scikit Learn (from sklearn.model_selection import RandomizedSearchCV)
- Comparing grid and random search
- Informed search: coarse to fine
- Informed methods: Bayesian statistics
- Informed methods: genetic algorithms (from tpot import TPOTClassifier)
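The difference between grid search and random search in the list above can be sketched without scikit-learn. Here validation_score is a hypothetical stand-in for a cross-validated model score, and the hyperparameter names and values are assumptions chosen only to show the mechanics.

```python
import random
from itertools import product


def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for a cross-validated score (higher is better)."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (max_depth - 4) ** 2


learning_rates = [0.01, 0.05, 0.1, 0.5]
max_depths = [2, 4, 6, 8]

# Grid search: evaluate every combination of the candidate values.
grid = list(product(learning_rates, max_depths))
best_grid = max(grid, key=lambda combo: validation_score(*combo))

# Random search: evaluate only a random subset of the same grid.
random.seed(1)
sampled = random.sample(grid, 5)
best_random = max(sampled, key=lambda combo: validation_score(*combo))

print(f"grid search tried {len(grid)} combos, best: {best_grid}")
print(f"random search tried {len(sampled)} combos, best: {best_random}")
```

Random search can never beat an exhaustive search over the same grid, but it costs a fraction of the evaluations, which is the trade-off the course compares.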
My course #87 was Case Studies in Statistical Thinking. Topics covered in this course include:
- Activity of zebrafish and melatonin
- Bootstrap confidence intervals
- Hypothesis tests (permutation test)
- Linear regression and pairs bootstrap
- Introduction to swimming data
- Do swimmers go faster in the finals (how to do permutation test exercise)?
- How does the performance of swimmers decline over long events?
- Introduction to the current controversy
- The zigzag effect
- Recap of swimming analysis
- Introduction to statistical seismology
- Timing of major earthquakes
- How are the Parkfield interearthquake times distributed?
- Variations in earthquake frequency and seismicity
- Earthquake magnitudes in Oklahoma
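Bootstrap confidence intervals and permutation tests recur throughout the list above, so here is a minimal standard-library sketch of both. The two groups of numbers are invented, not the zebrafish, swimming, or earthquake data from the course, and the percentile method shown is only one of several ways to build a bootstrap interval.

```python
import random
import statistics

random.seed(42)  # reproducible resampling


def bootstrap_ci(data, stat=statistics.mean, n_reps=2000, level=0.95):
    """Percentile bootstrap confidence interval for stat(data)."""
    reps = sorted(
        stat(random.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_reps)
    )
    lower = reps[int((1 - level) / 2 * n_reps)]
    upper = reps[int((1 + level) / 2 * n_reps) - 1]
    return lower, upper


def permutation_p_value(a, b, n_reps=2000):
    """One-sided p-value for mean(a) - mean(b) under label shuffling."""
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_reps):
        random.shuffle(pooled)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_reps


# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 12.5, 12.9, 12.3, 12.7, 12.0, 12.6]
group_b = [11.2, 11.5, 11.0, 11.9, 11.4, 11.7, 11.1, 11.6]

ci_low, ci_high = bootstrap_ci(group_a)
p_value = permutation_p_value(group_a, group_b)
print(f"95% CI for mean of group_a: ({ci_low:.3f}, {ci_high:.3f})")
print(f"permutation p-value: {p_value:.4f}")
```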
My course #88 was Analyzing Police Activity with Pandas. This course covers:
- Stanford Open Policing Project dataset (count missing values in each column)
- Using proper data types
- Creating a Datetime index
- Do the genders commit different violations?
- Does gender affect who gets a ticket for speeding?
- Does gender affect whose vehicle is searched?
- Does gender affect who is frisked during a search?
- Does time of day affect arrest rate?
- Are drug-related stops on the rise?
- What violations are caught in each district?
- How long might you be stopped for a violation?
- Exploring the weather dataset
- Categorizing the weather
- Merging datasets
- Does weather affect the arrest rate?
I will review more courses next time.
Categories: Python | Comments (0) | Permalink
Review of Python Courses (Part 28)
Posted by Mark on February 18, 2021 at 07:13 | Last modified: February 16, 2021 13:35
In Part 27, I summarized my Datacamp courses 80-82. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #83 was GARCH Models in Python. This course covers:
- Why do we need GARCH models?
- What are ARCH and GARCH (from statsmodels.graphics.tsaplots import plot_acf)?
- How to implement GARCH models in Python (from arch import arch_model)?
- Distribution assumptions
- Mean model specifications
- Volatility models for asymmetric shocks
- GARCH rolling window forecast
- Significance testing of model parameters
- Validation of GARCH model assumptions (from statsmodels.stats.diagnostic import acorr_ljungbox)
- Goodness of fit measures
- GARCH model backtesting (from sklearn.metrics import mean_absolute_error, mean_squared_error)
- VaR in financial risk management
- Dynamic covariance in portfolio optimization
- Dynamic beta in portfolio management
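To keep the core idea of the GARCH topics above in view, here is a sketch of the GARCH(1,1) conditional variance recursion, sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]. It only applies the recursion for given parameters. It does not estimate them, which is what the arch package does. The residuals and parameter values are arbitrary assumptions.

```python
def garch_variance(residuals, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    Starts from the long-run variance omega / (1 - alpha - beta),
    which is only defined when alpha + beta < 1.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    variances = [omega / (1 - alpha - beta)]
    for eps in residuals[:-1]:
        # Today's variance reacts to yesterday's shock and yesterday's variance.
        variances.append(omega + alpha * eps ** 2 + beta * variances[-1])
    return variances


# Hypothetical residuals (returns minus their mean) and parameters.
residuals = [0.5, -1.2, 0.3, 2.0, -0.7]
variances = garch_variance(residuals, omega=0.1, alpha=0.1, beta=0.8)

for eps, var in zip(residuals, variances):
    print(f"residual {eps:+.1f} -> conditional variance {var:.4f}")
```

The variance estimate rises in the step after a large shock, which is the volatility clustering that motivates these models.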
My course #84 was Cleaning Data in Python. Topics covered in this course include:
- Data type constraints (dataframe preprocessing)
- Data range constraints (use .loc[] to assign column values)
- Uniqueness constraints (needs further review)
- Membership constraints
- Categorical variables
- Cleaning text data
- Uniformity
- Cross field validation
- Completeness [count .isna(); import missingno as msno]
- Comparing strings (from fuzzywuzzy import fuzz, process)
- Generating pairs (import recordlinkage)
- Linking dataframes
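For the comparing strings item above, here is a sketch of mapping messy category values onto a clean list. To keep it dependency-free, I swap in the standard library's difflib for fuzzywuzzy, so the similarity scores are not the same as fuzz ratios. The category names and the 0.6 cutoff are assumptions.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a, b):
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


valid_categories = ["asian", "american", "italian"]
raw_values = ["Amurican", "itali", "asiane", "ITALIAN"]

cleaned = []
for value in raw_values:
    # Best match at or above the cutoff, or keep the raw value if none qualifies.
    matches = get_close_matches(value.lower(), valid_categories, n=1, cutoff=0.6)
    cleaned.append(matches[0] if matches else value)

print(cleaned)
print(f"similarity('Amurican', 'american') = {similarity('Amurican', 'american'):.3f}")
```

Choosing the cutoff is the judgment call: too low and unrelated values get merged, too high and obvious typos slip through.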
My course #85 was Quantitative Risk Management in Python. This was deep and needs more study. The course covers:
- What is quantitative risk management?
- Risk management and the financial crisis (import statsmodels.api as sm)
- Modern portfolio theory (from pypfopt.expected_returns import mean_historical_return)
- Measuring risk (from pypfopt.risk_models import CovarianceShrinkage)
- Risk exposure and loss (from scipy.stats import t)
- Risk management using VaR and CVaR (from pypfopt.objective_functions import negative_cvar)
- Portfolio hedging: offsetting risk
- Parametric estimation (from scipy.stats import norm, anderson, skewnorm, skewtest)
- Historical and Monte Carlo simulation
- Structural breaks
- Volatility and extreme values
- Extreme value theory (from scipy.stats import genextreme)
- Kernel density estimation (from scipy.stats import gaussian_kde)
- Neural network risk management (from keras.models import Sequential; from keras.layers import Dense)
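Since VaR and CVaR anchor much of the list above, here is a minimal historical (non-parametric) sketch. Quantile conventions differ between libraries, so the indexing rule below is one reasonable assumption rather than the course's definition. The loss figures are invented.

```python
def historical_var(losses, level=0.95):
    """Empirical loss quantile: the loss at position level * n in sorted order."""
    ordered = sorted(losses)
    index = min(int(level * len(ordered)), len(ordered) - 1)
    return ordered[index]


def historical_cvar(losses, level=0.95):
    """Average of the losses at or beyond the VaR threshold."""
    var = historical_var(losses, level)
    tail = [loss for loss in losses if loss >= var]
    return sum(tail) / len(tail)


# Hypothetical daily portfolio losses in percent (negative means a gain).
losses = [-1.2, 0.4, 2.1, -0.3, 3.5, 0.9, -2.0, 1.1, 4.8, 0.2]

var_80 = historical_var(losses, level=0.8)
cvar_80 = historical_cvar(losses, level=0.8)
print(f"80% VaR: {var_80:.2f}%  80% CVaR: {cvar_80:.2f}%")
```

CVaR is never smaller than VaR at the same level because it averages only the losses in the tail.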
I will review more courses next time.
Categories: Python | Comments (0) | Permalink
Review of Python Courses (Part 27)
Posted by Mark on February 16, 2021 at 06:59 | Last modified: February 15, 2021 17:10
In Part 26, I summarized my Datacamp courses 77-79. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #80 was Model Validation in Python. This course covers:
- Introduction to model validation
- Regression models (from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier)
- Classification models
- Creating train, test, and validation datasets (creating a holdout set)
- Accuracy metrics: regression models (from sklearn.metrics import mean_absolute_error, mean_squared_error)
- Classification metrics (from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score)
- Problems with holdout sets
- Cross-validation (from sklearn.model_selection import KFold)
- cross_val_score (from sklearn.model_selection import cross_val_score; from sklearn.metrics import make_scorer)
- Leave-one-out-cross-validation
- Introduction to hyperparameter tuning
- RandomizedSearchCV
- Selecting your final model
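To show what the cross-validation and leave-one-out items above do mechanically, here is a plain-Python sketch of k-fold index splitting. It is a simplified stand-in for sklearn.model_selection.KFold without shuffling, and k_fold_indices is a hypothetical helper name.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(f"train={train} test={test}")

# Leave-one-out cross-validation is k-fold with k equal to the sample count.
loo = list(k_fold_indices(4, 4))
```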
My course #81 was Image Processing in Python. Topics covered in this course include:
- Make images come alive with scikit-image (from skimage import data, color)
- NumPy for images
- Getting started with thresholding (from skimage.filters import try_all_threshold)
- Jump into filtering (from skimage.filters import sobel, gaussian)
- Contrast enhancement (from skimage import exposure)
- Transformations (from skimage.transform import rotate, rescale, resize)
- Morphology (from skimage import morphology)
- Image restoration (from skimage.restoration import inpaint)
- Noise (from skimage.util import random_noise; from skimage.restoration import denoise_tv_chambolle)
- Superpixels and segmentation (from skimage.segmentation import slic; from skimage.color import label2rgb)
- Finding contours (from skimage import measure)
- Finding the edges with Canny (from skimage.feature import canny)
- Right around the corner (from skimage.feature import corner_harris)
- Face detection (from skimage.feature import Cascade)
- Real-world applications
My course #82 was Recurrent Neural Networks for Language Modeling in Python. This course covers:
- Introduction to the course
- Introduction to RNN inside Keras (from keras.models import Sequential; from keras.layers import Dense)
- Vanishing and exploding gradients
- GRU and LSTM cells (from keras.layers import GRU, Dense; from keras.layers.recurrent import LSTM)
- The embedding layer (from keras.layers import Embedding; from keras.initializers import Constant)
- Sentiment classification revisited
- Data pre-processing (from keras.utils.np_utils import to_categorical)
- Transfer learning for language models (from gensim.models import word2vec, fasttext)
- Multi-class classification models (from sklearn.datasets import fetch_20newsgroups)
- Assessing the model’s performance (from sklearn.metrics import confusion_matrix, f1_score, classification_report)
- Sequence to sequence models
- Text generation models (from keras.preprocessing.text import Tokenizer)
- Neural machine translation (from keras.preprocessing.sequence import pad_sequences)
I will review more courses next time.
Categories: Python | Comments (0) | Permalink
Review of Python Courses (Part 26)
Posted by Mark on February 12, 2021 at 07:29 | Last modified: February 15, 2021 11:54
In Part 25, I summarized my Datacamp courses 74-76. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #77 was Visualizing Time Series Data in Python. This course covers:
- Plot your first time series
- Customize your time series plot
- Clean your time series data (counting missing values in df)
- Plot aggregates of your data
- Summarizing the value in your time series data
- Autocorrelation and partial autocorrelation (from statsmodels.graphics import tsaplots)
- Seasonality, noise, and trend in time series data [from pylab import rcParams, sm.tsa.seasonal_decompose()]
- Working with more than one time series
- Plot multiple time series (adding statistical summaries to your plots)
- Find relationships between multiple time series [sns.heatmap(), sns.clustermap()]
- Apply your knowledge to a new dataset
- Beyond summary statistics
- Decompose time series data
- Compute correlations between time series
My course #78 was Financial Forecasting in Python. Topics covered in this course include:
- Introduction to financial statements
- Calculating sales and the cost of goods sold
- Working with raw datasets
- Introduction to the balance sheet
- Balance sheet efficiency ratios
- Financial periods and how to work with them
- The datetime library and Split function
- Tips and tricks when working with datasets
- Building sensitive forecast models and common forecast assumptions
- Dependencies and sensitivity in financial forecasting
- Working with variances in the forecast
My course #79 was Foundations of Probability in Python. This course covers:
- Let’s flip a coin in Python (from scipy.stats import bernoulli, binom)
- Probability mass and distribution functions
- Expected value, mean, and variance (from scipy.stats import describe)
- Calculating probabilities of two events (from scipy.stats import find_repeats, relfreq)
- Conditional probabilities
- Total probability law
- Bayes’ rule
- Normal distributions (from scipy.stats import norm; import matplotlib.pyplot as plt; import seaborn as sns)
- Risk factors
- Factor models
- Portfolio analysis tools
- Normal probabilities
- Poisson distributions (from scipy.stats import poisson)
- Geometric distributions (from scipy.stats import geom)
- From sample mean to population mean (from scipy.stats import binom, describe)
- Adding random variables
- Linear regression (from sklearn.linear_model import LinearRegression; from scipy.stats import linregress)
- Logistic regression (from sklearn.linear_model import LogisticRegression)
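Three items from the list above (the binomial probability mass function, the total probability law, and Bayes' rule) reduce to a few lines of arithmetic. This sketch uses only the math module instead of scipy.stats. The screening-test numbers are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B) from P(A), P(B | A), and P(B | not A)."""
    # Total probability law: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Two heads in three fair coin flips.
p_two_heads = binom_pmf(2, 3, 0.5)

# Expected value of the binomial, computed from the pmf (equals n * p).
expected_heads = sum(k * binom_pmf(k, 3, 0.5) for k in range(4))

# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)

print(f"P(2 heads in 3 flips) = {p_two_heads}")
print(f"E[heads] = {expected_heads}")
print(f"P(condition | positive test) = {posterior:.4f}")
```

Even with a fairly accurate test, the posterior stays low here because the prior is so small.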
I will review more courses next time.
Categories: Python | Comments (0) | Permalink
Review of Python Courses (Part 25)
Posted by Mark on February 9, 2021 at 07:29 | Last modified: February 12, 2021 09:25
In Part 24, I summarized my Datacamp courses 71-73. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #74 was Writing Functions in Python. Overall, I found this content to be quite challenging. The course covers:
- Docstrings (require string)
- DRY and “do one thing” [standardize function, mean_and_median()]
- Pass by assignment
- Using context managers
- Writing context managers
- Advanced topics
- Functions as objects
- Scope
- Closures
- Decorators
- Real-world examples
- Decorators and metadata (from functools import wraps)
- Decorators that take arguments
- Timeout(): a real-world example
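Because I found this material challenging, a compact sketch of three items from the list above may help: writing a context manager, a decorator that takes arguments, and preserving metadata with functools.wraps. The names timer, run_n_times, and record are my own illustrations, not the course's examples.

```python
import time
from contextlib import contextmanager
from functools import wraps


@contextmanager
def timer(label, results):
    """Context manager: store the elapsed time of the with-block in results."""
    start = time.perf_counter()
    try:
        yield  # the body of the with-block runs here
    finally:
        results[label] = time.perf_counter() - start


def run_n_times(n):
    """Decorator factory: returns a decorator that calls func n times."""
    def decorator(func):
        @wraps(func)  # copy func's __name__ and docstring onto the wrapper
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator


calls = []  # nonlocal state that the decorated function mutates


@run_n_times(3)
def record(value):
    """Append value to the calls list."""
    calls.append(value)
    return len(calls)


timings = {}
with timer("record", timings):
    final_count = record("x")

print(final_count, calls, record.__name__, record.__doc__)
```

Without @wraps, record.__name__ would report "wrapper" and the docstring would be lost.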
My course #75 was AI Fundamentals. Topics covered in this course include:
- What is all the AI fuss about?
- All models are wrong but some are useful
- Three flavors of machine learning
- Supervised learning fundamentals
- Training and evaluating classification models (confusion matrix, true/false positives/negatives)
- Training and evaluating regression models (from sklearn.preprocessing import PolynomialFeatures)
- Dimensionality reduction
- Clustering
- Anomaly detection
- Selecting the right model
- Deep learning and beyond
- Convolutional neural networks
My course #76 was Introduction to Portfolio Analysis in Python. This course covers:
- Welcome to portfolio analysis
- Portfolio returns
- Measuring risk of a portfolio (formatting as percentage)
- Annualized returns
- Risk-adjusted returns (calculating SR)
- Non-normal distribution of returns
- Alternative measures of risk
- Comparing against a benchmark
- Risk factors
- Factor models
- Portfolio analysis tools
- MPT (from pypfopt.efficient_frontier import EfficientFrontier; from pypfopt import risk_models, expected_returns)
- Maximum Sharpe vs. minimum volatility
- Alternative portfolio optimization
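The annualized return and Sharpe ratio (SR) items in the list above can be sketched with the statistics module. The annualization convention used here (mean times periods per year, volatility times the square root of periods per year) is a common simplification that I am assuming, and the monthly returns are invented.

```python
import statistics


def annualized_return(total_return, years):
    """Geometric annualization of a cumulative return."""
    return (1 + total_return) ** (1 / years) - 1


def sharpe_ratio(period_returns, risk_free_rate=0.0, periods_per_year=12):
    """Annualized excess return divided by annualized volatility."""
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in period_returns]
    annual_excess = statistics.mean(excess) * periods_per_year
    annual_volatility = statistics.stdev(excess) * periods_per_year ** 0.5
    return annual_excess / annual_volatility


# A 21% cumulative return over two years is 10% per year.
two_year = annualized_return(0.21, 2)

# Hypothetical monthly returns.
monthly_returns = [0.01, 0.03, -0.02, 0.02]
sr = sharpe_ratio(monthly_returns)

print(f"annualized return: {two_year:.2%}")
print(f"Sharpe ratio: {sr:.3f}")
```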
I will review more courses next time.
Categories: Python | Comments (0) | Permalink
Review of Python Courses (Part 24)
Posted by Mark on February 4, 2021 at 07:41 | Last modified: February 10, 2021 16:23
In Part 23, I summarized my Datacamp courses 68-70. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #71 was Improving Your Data Visualizations in Python. This course covers:
- Highlighting data
- Comparing groups
- Annotations
- Color in visualizations
- Continuous color palettes
- Categorical palettes
- Point estimate intervals
- Confidence bands
- Beyond 95% (visualizing multiple confidence bands at once)
- Visualizing the bootstrap
- Looking at the farmers market data
- Exploring the patterns
- Making your visualizations efficient
- Tweaking your plots
My course #72 was Command Line Automation in Python. Because I don’t use the shell much, I don’t see a whole lot of application here for me and I’m not sure how much I absorbed. In any case, topics covered in this course include:
- Learn the Python interpreter
- Capture IPython shell output
- Automate with SList
- Execute shell commands in subprocess (import subprocess; import os)
- Capture output of shell commands (from subprocess import Popen, PIPE)
- Sending input to processes
- Passing arguments safely to shell commands
- Dealing with file systems
- Find files matching a pattern (from pathlib import Path; import fnmatch, re)
- High-level file and directory operations (from shutil import copytree, ignore_patterns, rmtree, make_archive)
- Using pathlib (from pathlib import Path)
- Using functions for automation (from functools import wraps)
- Understand script input
- Introduction to click (import click)
- Using click to write command line tools (from click.testing import CliRunner)
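For the subprocess and pathlib items in the list above, here is a short sketch that captures the output of a child process and finds files matching a pattern. It launches the current Python interpreter as the child command so it does not depend on any particular shell tool. The file names are placeholders written to a temporary directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Pass the command as a list (no shell=True) so arguments are never
# interpreted by a shell, which is the safer way to pass user input.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
output = result.stdout.strip()

# Build a throwaway directory tree, then search it recursively.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    for name in ["a.csv", "b.csv", "notes.txt"]:
        (root / "data" / name).write_text("placeholder")
    csv_names = sorted(path.name for path in root.rglob("*.csv"))

print(output)
print(csv_names)
```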
My course #73 was Unit Testing for Data Science in Python. This course covers:
- Why unit test?
- Write a simple unit test using pytest
- Understanding test result report
- More benefits and test types
- Mastering assert statements
- Testing for exceptions instead of return values
- The well-tested function
- Test driven development (TDD)
- How to organize a growing set of tests?
- Mastering test execution
- Expected failures and conditional skipping
- Continuous integration and code coverage
- Beyond assertion: setup and teardown
- Mocking (from unittest.mock import call)
- Testing models
- Testing plots
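As a sketch of the simple unit test and mocking items in the list above, the snippet below defines a small function, a few test functions named so that pytest would discover them, and a mock checked with unittest.mock.call. The functions row_to_list and convert_all are hypothetical, and I call the tests directly so the example runs without pytest installed.

```python
from unittest.mock import MagicMock, call


def row_to_list(row):
    """Split a tab-separated row; return None unless it has two non-empty fields."""
    parts = row.rstrip("\n").split("\t")
    if len(parts) != 2 or "" in parts:
        return None
    return parts


def convert_all(rows, converter=row_to_list):
    """Apply converter to every row (the dependency we mock below)."""
    return [converter(row) for row in rows]


def test_row_to_list_clean_row():
    assert row_to_list("2,081\t314,942\n") == ["2,081", "314,942"]


def test_row_to_list_missing_field():
    assert row_to_list("\t293,410\n") is None


def test_convert_all_calls_converter_per_row():
    # The mock returns canned values, so this test isolates convert_all
    # from any bug in row_to_list.
    fake = MagicMock(side_effect=[["1", "2"], None])
    assert convert_all(["1\t2\n", "bad\n"], converter=fake) == [["1", "2"], None]
    assert fake.call_args_list == [call("1\t2\n"), call("bad\n")]


# pytest would collect the test_ functions automatically; run them by hand here.
for test in (
    test_row_to_list_clean_row,
    test_row_to_list_missing_field,
    test_convert_all_calls_converter_per_row,
):
    test()
print("all tests passed")
```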
I will review more courses next time.
Categories: Python | Comments (0) | Permalink