» Python Option Fanatic

Review of Python Courses (Part 33)

Posted by Mark on March 9, 2021 at 06:54 | Last modified: February 21, 2021 07:26

In Part 32, I summarized my Datacamp courses 95-97. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #98 was Practicing Statistics Interview Questions in Python. This is a very comprehensive course that covers the following topics:

Conditional probabilities
Central limit theorem (from numpy.random import randint)
Probability distributions
Descriptive statistics
Categorical data (from sklearn import preprocessing)
Two or more variables
Confidence intervals (import scipy.stats as st; from statsmodels.stats.proportion import proportion_confint)
Hypothesis testing (from scipy.stats import sem, t, ttest_ind)
Power and sample size (from statsmodels.stats.power import zt_ind_solve_power)
Multiple testing (from statsmodels.sandbox.stats.multicomp import multipletests)
Regression models (from sklearn.linear_model import LinearRegression, LogisticRegression)
Evaluating models (from sklearn.metrics import mean_squared_error, confusion_matrix, recall_score)
Missing data and outliers
Bias-variance tradeoff
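
As a rough illustration of the central limit theorem topic above, here is a minimal sketch that is not taken from the course. It swaps in the standard library's random module for numpy.random.randint, and the die-rolling setup and function name are assumptions made for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n rolls of a fair six-sided die."""
    return statistics.mean(random.randint(1, 6) for _ in range(n))

# Distribution of the sample mean for small and large samples
means_small = [sample_mean(5) for _ in range(1000)]
means_large = [sample_mean(50) for _ in range(1000)]

# Both center near the true mean of 3.5; the spread shrinks as n grows
print(statistics.mean(means_small), statistics.stdev(means_small))
print(statistics.mean(means_large), statistics.stdev(means_large))
```

The sample means cluster around 3.5 in both cases, and the larger sample size gives the tighter cluster.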

My course #99 was Intermediate Spreadsheets for Google Sheets. Topics covered in this course include:

Data types for data science
Convert or die!
Common data transformations
Rounding numbers
Generating random numbers
Logical operations
Flow control
Blanks, missing values, and errors
Cell addresses
Lookups and matching
Bringing it all together

My course #100 was Practicing Coding Interview Questions in Python. This is probably the most comprehensive and dense course of all. I took a long time getting through this, but the amount of material covered is really incredible. Props to instructor Kirill Smirnov! The course covers:

What are the main data structures in Python?
What are common ways to manipulate strings?
How to write regular expressions in Python?
What are iterable objects?
What is a list comprehension?
What is a zip object?
What is a generator and how to create one?
How to pass a variable number of arguments to a function?
What is a lambda expression?
What are the functions map(), filter(), and reduce()?
What is recursion?
What is the difference between a NumPy array and a list?
How to use the .apply() method on a dataframe?
How to use the .groupby() method on a dataframe?
How to visualize data in Python?

I will review more courses next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 32)

Posted by Mark on March 4, 2021 at 07:48 | Last modified: February 19, 2021 10:46

In Part 31, I summarized my Datacamp courses 92-94. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #95 was Dimensionality Reduction in Python. This course covers:

Introduction
Feature selection vs. feature extraction
t-SNE visualization of high-dimensional data (from sklearn.manifold import TSNE)
The curse of dimensionality (from sklearn.model_selection import train_test_split; from sklearn.svm import SVC)
Features with missing values or little variance (from sklearn.feature_selection import VarianceThreshold)
Pairwise correlation (hide correlation matrix redundancy)
Removing highly correlated features
Selecting features for model performance (from sklearn.feature_selection import RFE)
Tree-based feature selection (from sklearn.ensemble import RandomForestClassifier)
Regularized linear regression (from sklearn.linear_model import Lasso)
Combining feature selectors (from sklearn.linear_model import LassoCV)
Feature extraction
Principal component analysis (from sklearn.decomposition import PCA)
PCA applications (from sklearn.pipeline import Pipeline)
Principal component selection

My course #96 was Writing Efficient Python Code. Topics covered in this course include:

Defining efficient
Building with built-ins
The power of NumPy arrays
Examining runtime (%timeit)
Code profiling for runtime [pip install line_profiler; %lprun -f foo foo(args)]
Code profiling for memory usage (import sys; pip install memory_profiler)
Efficiently combining, counting, and iterating (from collections import Counter; from itertools import combinations)
Set theory
Eliminating loops
Writing better loops
Intro to pandas dataframe iteration
Another iterator method: .itertuples() [faster than .iterrows()]
Pandas alternative to looping (use .apply() on an entire dataframe)
Optimal pandas iterating (use .values to get array rather than series)
Final tips
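
The combining, counting, and set theory topics above can be sketched with the standard library alone. This is an illustrative example with made-up data, not code from the course.

```python
from collections import Counter
from itertools import combinations

names = ["ann", "bob", "ann", "cy", "bob", "ann"]  # made-up data

# Counter replaces a manual counting loop
counts = Counter(names)

# combinations() replaces nested loops for building unique pairs
unique_pairs = list(combinations(sorted(set(names)), 2))

# Set operations replace loops for comparing collections
list_a = ["ann", "bob", "cy"]
list_b = ["bob", "dee"]
in_both = set(list_a) & set(list_b)
only_in_a = set(list_a) - set(list_b)

print(counts.most_common(1), unique_pairs, in_both, only_in_a)
```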

My course #97 was Machine Learning for Finance in Python. This course covers:

Predict the future (e.g. stock price changes) with machine learning
Data transforms, features, and targets (import talib)
Linear modeling with financial data
Engineering features (from sklearn.model_selection import ParameterGrid)
Decision trees (from sklearn.tree import DecisionTreeRegressor)
Random forests (from sklearn.ensemble import RandomForestRegressor)
Feature importances and gradient boosting [np.argsort(); from sklearn.ensemble import GradientBoostingRegressor]
Scaling data and KNN regression (from sklearn.preprocessing import scale)
Neural networks (from keras.models import Sequential; from keras.layers import Dense)
Custom loss functions (import tensorflow as tf; import keras.losses)
Overfitting and ensembling (from keras.layers import Dropout; from sklearn.metrics import r2_score)
Modern Portfolio Theory (MPT) and efficient frontiers (review this complex code involving covariance)
Sharpe Ratios, features, and targets
Machine learning for MPT

I will review more courses next time.


Review of Python Courses (Part 31)

Posted by Mark on March 1, 2021 at 07:37 | Last modified: February 18, 2021 13:43

In Part 30, I summarized my Datacamp courses 89-91. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #92 was Market Basket Analysis in Python. This course covers:

What is market basket analysis (using lambda function for string processing)?
Identifying association rules
The simplest metric (from mlxtend.preprocessing import TransactionEncoder)
Confidence and lift
Leverage and conviction
Association and dissociation
Advanced rules
Aggregation
The Apriori algorithm (from mlxtend.frequent_patterns import apriori)
Basic Apriori results: pruning (from mlxtend.frequent_patterns import association_rules)
Advanced Apriori results: pruning
Heatmaps
Scatterplots
Parallel coordinates plot (from pandas.plotting import parallel_coordinates)
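
For reference, the support, confidence, and lift metrics listed above can be computed by hand. The sketch below uses plain Python sets rather than mlxtend, and the transactions and function names are invented for the example.

```python
# Made-up transactions; each one is the set of items in a basket
transactions = [
    {"milk", "bread"},
    {"milk", "coffee"},
    {"bread", "coffee"},
    {"milk", "bread", "coffee"},
    {"bread"},
]

def support(items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Values above 1 suggest association; below 1 suggest dissociation."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"milk", "bread"}))
print(confidence({"milk"}, {"bread"}))
print(lift({"milk"}, {"bread"}))
```

In this toy data the lift for milk and bread comes out below 1, which would point toward dissociation rather than association.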

My course #93 was Winning a Kaggle Competition in Python. Topics covered in this course include:

Competitions overview
Prepare your first submission (from sklearn.linear_model import LinearRegression)
Public vs. private leaderboard
Understand the problem (from sklearn.metrics import roc_auc_score, f1_score, mean_squared_error, log_loss)
Initial EDA (create/extract hour feature)
Local validation (from sklearn.model_selection import KFold, StratifiedKFold)
Validation usage (from sklearn.model_selection import TimeSeriesSplit)
Feature engineering
Categorical features (from sklearn.preprocessing import LabelEncoder)
Target encoding
Missing data (from sklearn.impute import SimpleImputer)
Baseline model (from sklearn.model_selection import train_test_split; from sklearn.ensemble import GradientBoostingRegressor)
Hyperparameter tuning (from sklearn.linear_model import Ridge)
Model ensembling
Final tips

My course #94 was Machine Learning for Time Series in Python. This course covers:

Kinds of time series and applications
Machine learning basics [from sklearn.svm import LinearSVC, .reshape()]
Combining time series data with machine learning (from glob import glob; import librosa as lr)
Classification and feature engineering (axis = -1)
Improving the features we use for classification (from sklearn.model_selection import cross_val_score)
The spectrogram—spectral changes to sound over time (from librosa.core import stft, amplitude_to_db)
Predicting data over time (from sklearn.metrics import r2_score)
Cleaning and improving your data (percent_change function, visualizing outlier thresholds)
Creating features over time (from functools import partial)
Time-delayed features and auto-regressive models
Cross-validating time series data (from sklearn.model_selection import ShuffleSplit, TimeSeriesSplit)
Stationarity and stability (from sklearn.utils import resample)

I will review more courses next time.


Review of Python Courses (Part 30)

Posted by Mark on February 26, 2021 at 07:32 | Last modified: February 17, 2021 13:32

In Part 29, I summarized my Datacamp courses 86-88. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #89 was a case study in Python machine learning. This course covers:

Introducing the challenge
Exploring the data
Looking at the datatypes (converting dtype for all dataframe categories)
How do we measure success?
It’s time to build a model (from sklearn.linear_model import LogisticRegression)
Making predictions (from sklearn.multiclass import OneVsRestClassifier)
A very brief introduction to NLP
Representing text numerically (from sklearn.feature_extraction.text import CountVectorizer)
Pipelines, feature and text preprocessing (from sklearn.pipeline import Pipeline, FeatureUnion)
Text features and feature unions (from sklearn.preprocessing import FunctionTransformer, Imputer)
Choosing a classification model (from sklearn.ensemble import RandomForestClassifier)
Learning from the expert: processing
Learning from the expert: a stats trick (from sklearn.preprocessing import PolynomialFeatures)
Learning from the expert: the winning model (from sklearn.feature_extraction.text import HashingVectorizer)
Next steps and the social impact of your work

My course #90 was Ensemble Methods in Python. Topics covered in this course include:

Introduction to ensemble methods (from sklearn.ensemble import MetaEstimator, where MetaEstimator is a placeholder for a specific ensemble class)
Voting (from sklearn.ensemble import VotingClassifier, VotingRegressor)
Averaging
The strength of “weak” models
Bootstrap aggregating
Bagging classifier: nuts and bolts
Bagging parameters: tips and tricks
The effectiveness of gradual learning
Adaptive boosting: award winning model (from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor)
Gradient boosting (from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor)
Gradient boosting flavors (import xgboost as xgb; import lightgbm as lgb; import catboost as cb)
The intuition behind stacking
Build your first stacked ensemble
Let’s mlxtend it (from mlxtend.classifier import StackingClassifier; from mlxtend.regressor import StackingRegressor)!

My course #91 was Data Analysis in Spreadsheets. This course covers:

First function: ROUND
Function composition: SQRT
Functions and ranges: MIN, MAX
Selecting ranges: SUM, AVERAGE, MEDIAN
Multiple arguments: RANK
String manipulation: LEFT, RIGHT
String information: LEN, SEARCH
Combining strings: CONCATENATE
Date functions: WEEKDAY
Comparing dates
Combining functions
Flow control: IF
Nested logical functions: IF

I will review more courses next time.


Review of Python Courses (Part 29)

Posted by Mark on February 23, 2021 at 17:38 | Last modified: February 17, 2021 09:27

In Part 28, I summarized my Datacamp courses 83-85. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #86 was Hyperparameter Tuning in Python. This course covers:

Introduction (visualize Random Forest)
Hyperparameters overview
Hyperparameters values
Introducing grid search
Grid search with Scikit Learn (from sklearn import metrics)
Understanding grid search output
Introducing random search (from itertools import product)
Random search in Scikit Learn (from sklearn.model_selection import RandomizedSearchCV)
Comparing grid and random search
Informed search: coarse to fine
Informed methods: Bayesian statistics
Informed methods: genetic algorithms (from tpot import TPOTClassifier)
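
The difference between grid search and random search can be sketched with itertools.product, as noted in the list above. The hyperparameter names and values here are hypothetical, and no model is actually trained.

```python
import random
from itertools import product

# Hypothetical hyperparameter values, chosen only for illustration
learning_rates = [0.01, 0.1, 0.5]
max_depths = [2, 4, 6, 8]

# Grid search considers every combination
grid = list(product(learning_rates, max_depths))

# Random search samples a subset of that grid without replacement
random.seed(0)
sampled = random.sample(grid, 5)

print(len(grid), sampled)
```

Each tuple in sampled would then be passed to a model for fitting and scoring, which is the step RandomizedSearchCV handles in scikit-learn.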

My course #87 was Case Studies in Statistical Thinking. Topics covered in this course include:

Activity of zebrafish and melatonin
Bootstrap confidence intervals
Hypothesis tests (permutation test)
Linear regression and pairs bootstrap
Introduction to swimming data
Do swimmers go faster in the finals (how to do permutation test exercise)?
How does the performance of swimmers decline over long events?
Introduction to the current controversy
The zigzag effect
Recap of swimming analysis
Introduction to statistical seismology
Timing of major earthquakes
How are the Parkfield interearthquake times distributed?
Variations in earthquake frequency and seismicity
Earthquake magnitudes in Oklahoma
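
The bootstrap confidence interval and permutation test ideas from this course can be sketched in plain Python. This is an assumption-laden illustration: the data are made up, the function names are hypothetical, and the course itself may use different helpers.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Made-up measurements for two groups
group_a = [12.1, 11.8, 12.5, 12.9, 11.6, 12.3, 12.7, 12.0]
group_b = [11.2, 11.9, 11.4, 11.7, 11.0, 11.5, 11.8, 11.3]

def bootstrap_ci(data, n_reps=2000, level=0.95):
    """Percentile bootstrap confidence interval for the mean."""
    means = sorted(
        statistics.mean(random.choices(data, k=len(data)))
        for _ in range(n_reps)
    )
    lower = means[int((1 - level) / 2 * n_reps)]
    upper = means[int((1 + level) / 2 * n_reps) - 1]
    return lower, upper

def permutation_p_value(a, b, n_reps=2000):
    """One-sided p-value for the observed difference of means (a minus b)."""
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = a + b
    hits = 0
    for _ in range(n_reps):
        # Shuffle the pooled data and split it back into two groups
        shuffled = random.sample(pooled, len(pooled))
        diff = (statistics.mean(shuffled[:len(a)])
                - statistics.mean(shuffled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_reps

low, high = bootstrap_ci(group_a)
p = permutation_p_value(group_a, group_b)
print(low, high, p)
```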

My course #88 was Analyzing Police Activity with Pandas. This course covers:

Stanford Open Policing Project dataset (count missing values in each column)
Using proper data types
Creating a Datetime index
Do the genders commit different violations?
Does gender affect who gets a ticket for speeding?
Does gender affect whose vehicle is searched?
Does gender affect who is frisked during a search?
Does time of day affect arrest rate?
Are drug-related stops on the rise?
What violations are caught in each district?
How long might you be stopped for a violation?
Exploring the weather dataset
Categorizing the weather
Merging datasets
Does weather affect the arrest rate?

I will review more courses next time.


Review of Python Courses (Part 28)

Posted by Mark on February 18, 2021 at 07:13 | Last modified: February 16, 2021 13:35

In Part 27, I summarized my Datacamp courses 80-82. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #83 was GARCH Models in Python. This course covers:

Why do we need GARCH models?
What are ARCH and GARCH (from statsmodels.graphics.tsaplots import plot_acf)?
How to implement GARCH models in Python (from arch import arch_model)?
Distribution assumptions
Mean model specifications
Volatility models for asymmetric shocks
GARCH rolling window forecast
Significance testing of model parameters
Validation of GARCH model assumptions (from statsmodels.stats.diagnostic import acorr_ljungbox)
Goodness of fit measures
GARCH model backtesting (from sklearn.metrics import mean_absolute_error, mean_squared_error)
VaR in financial risk management
Dynamic covariance in portfolio optimization
Dynamic beta in portfolio management

My course #84 was Cleaning Data in Python. Topics covered in this course include:

Data type constraints (dataframe preprocessing)
Data range constraints (use .loc[] to assign column values)
Uniqueness constraints (needs further review)
Membership constraints
Categorical variables
Cleaning text data
Uniformity
Cross field validation
Completeness [count .isna(); import missingno as msno]
Comparing strings (from fuzzywuzzy import fuzz, process)
Generating pairs (import recordlinkage)
Linking dataframes

My course #85 was Quantitative Risk Management in Python. This was deep and needs more study. The course covers:

What is quantitative risk management?
Risk management and the financial crisis (import statsmodels.api as sm)
Modern portfolio theory (from pypfopt.expected_returns import mean_historical_return)
Measuring risk (from pypfopt.risk_models import CovarianceShrinkage)
Risk exposure and loss (from scipy.stats import t)
Risk management using VaR and CVaR (from pypfopt.objective_functions import negative_cvar)
Portfolio hedging: offsetting risk
Parametric estimation (from scipy.stats import norm, anderson, skewnorm, skewtest)
Historical and Monte Carlo simulation
Structural breaks
Volatility and extreme values
Extreme value theory (from scipy.stats import genextreme)
Kernel density estimation (from scipy.stats import gaussian_kde)
Neural network risk management (from keras.models import Sequential; from keras.layers import Dense)

I will review more courses next time.


Review of Python Courses (Part 27)

Posted by Mark on February 16, 2021 at 06:59 | Last modified: February 15, 2021 17:10

In Part 26, I summarized my Datacamp courses 77-79. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #80 was Model Validation in Python. This course covers:

Introduction to model validation
Regression models (from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier)
Classification models
Creating train, test, and validation datasets (creating a holdout set)
Accuracy metrics: regression models (from sklearn.metrics import mean_absolute_error, mean_squared_error)
Classification metrics (from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score)
Problems with holdout sets
Cross-validation (from sklearn.model_selection import KFold)
cross_val_score (from sklearn.model_selection import cross_val_score; from sklearn.metrics import make_scorer)
Leave-one-out-cross-validation
Introduction to hyperparameter tuning
RandomizedSearchCV
Selecting your final model

My course #81 was Image Processing in Python. Topics covered in this course include:

Make images come alive with scikit-image (from skimage import data, color)
NumPy for images
Getting started with thresholding (from skimage.filters import try_all_threshold)
Jump into filtering (from skimage.filters import sobel, gaussian)
Contrast enhancement (from skimage import exposure)
Transformations (from skimage.transform import rotate, rescale, resize)
Morphology (from skimage import morphology)
Image restoration (from skimage.restoration import inpaint)
Noise (from skimage.util import random_noise; from skimage.restoration import denoise_tv_chambolle)
Superpixels and segmentation (from skimage.segmentation import slic; from skimage.color import label2rgb)
Finding contours (from skimage import measure)
Finding the edges with Canny (from skimage.feature import canny)
Right around the corner (from skimage.feature import corner_harris)
Face detection (from skimage.feature import Cascade)
Real-world applications

My course #82 was Recurrent Neural Networks for Language Modeling in Python. This course covers:

Introduction to the course
Introduction to RNN inside Keras (from keras.models import Sequential; from keras.layers import Dense)
Vanishing and exploding gradients
GRU and LSTM cells (from keras.layers import GRU, Dense; from keras.layers.recurrent import LSTM)
The embedding layer (from keras.layers import Embedding; from keras.initializers import Constant)
Sentiment classification revisited
Data pre-processing (from keras.utils.np_utils import to_categorical)
Transfer learning for language models (from gensim.models import word2vec, fasttext)
Multi-class classification models (from sklearn.datasets import fetch_20newsgroups)
Assessing the model’s performance (from sklearn.metrics import confusion_matrix, f1_score, classification_report)
Sequence to sequence models
Text generation models (from keras.preprocessing.text import Tokenizer)
Neural machine translation (from keras.preprocessing.sequence import pad_sequences)

I will review more courses next time.


Review of Python Courses (Part 26)

Posted by Mark on February 12, 2021 at 07:29 | Last modified: February 15, 2021 11:54

In Part 25, I summarized my Datacamp courses 74-76. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #77 was Visualizing Time Series Data in Python. This course covers:

Plot your first time series
Customize your time series plot
Clean your time series data (counting missing values in df)
Plot aggregates of your data
Summarizing the value in your time series data
Autocorrelation and partial autocorrelation (from statsmodels.graphics import tsaplots)
Seasonality, noise, and trend in time series data [from pylab import rcParams; sm.tsa.seasonal_decompose()]
Working with more than one time series
Plot multiple time series (adding statistical summaries to your plots)
Find relationships between multiple time series [sns.heatmap(), sns.clustermap()]
Apply your knowledge to a new dataset
Beyond summary statistics
Decompose time series data
Compute correlations between time series

My course #78 was Financial Forecasting in Python. Topics covered in this course include:

Introduction to financial statements
Calculating sales and the cost of goods sold
Working with raw datasets
Introduction to the balance sheet
Balance sheet efficiency ratios
Financial periods and how to work with them
The datetime library and Split function
Tips and tricks when working with datasets
Building sensitive forecast models and common forecast assumptions
Dependencies and sensitivity in financial forecasting
Working with variances in the forecast

My course #79 was Foundations of Probability in Python. This course covers:

Let’s flip a coin in Python (from scipy.stats import bernoulli, binom)
Probability mass and distribution functions
Expected value, mean, and variance (from scipy.stats import describe)
Calculating probabilities of two events (from scipy.stats import find_repeats, relfreq)
Conditional probabilities
Total probability law
Bayes’ rule
Normal distributions (from scipy.stats import norm; import matplotlib.pyplot as plt; import seaborn as sns)
Normal probabilities
Poisson distributions (from scipy.stats import poisson)
Geometric distributions (from scipy.stats import geom)
From sample mean to population mean (from scipy.stats import binom, describe)
Adding random variables
Linear regression (from sklearn.linear_model import LinearRegression; from scipy.stats import linregress)
Logistic regression (from sklearn.linear_model import LogisticRegression)
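
The total probability law and Bayes' rule from the list above fit in a few lines. The factory scenario and all of the numbers below are made up for illustration.

```python
# Made-up numbers: two factories produce parts with different defect rates
p_factory = {"A": 0.6, "B": 0.4}          # P(factory)
p_defect_given = {"A": 0.02, "B": 0.05}   # P(defect | factory)

# Total probability law: P(defect) = sum of P(defect | f) * P(f)
p_defect = sum(p_defect_given[f] * p_factory[f] for f in p_factory)

# Bayes' rule: P(f | defect) = P(defect | f) * P(f) / P(defect)
p_factory_given_defect = {
    f: p_defect_given[f] * p_factory[f] / p_defect for f in p_factory
}

print(p_defect, p_factory_given_defect)
```

Factory B makes fewer parts overall but accounts for most of the defects, which is the kind of reversal Bayes' rule is meant to expose.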

I will review more courses next time.


Review of Python Courses (Part 25)

Posted by Mark on February 9, 2021 at 07:29 | Last modified: February 12, 2021 09:25

In Part 24, I summarized my Datacamp courses 71-73. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #74 was Writing Functions in Python. Overall, I found this content to be quite challenging. The course covers:

Docstrings (require string)
DRY and “do one thing” [standardize function, mean_and_median()]
Pass by assignment
Using context managers
Writing context managers
Advanced topics
Functions as objects
Scope
Closures
Decorators
Real-world examples
Decorators and metadata (from functools import wraps)
Decorators that take arguments
Timeout(): a real-world example
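
Since decorators were the hardest part of this list, here is a minimal sketch of a plain decorator and a decorator that takes an argument, both using functools.wraps. The names timer, run_n_times, add, and greet are hypothetical and not taken from the course.

```python
import time
from functools import wraps

def timer(func):
    """Decorator that records how long the wrapped function took."""
    @wraps(func)  # keeps func.__name__ and func.__doc__ on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = None
    return wrapper

def run_n_times(n):
    """Decorator factory: returns a decorator that calls func n times."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(n)]
        return wrapper
    return decorator

@timer
def add(a, b):
    """Return the sum of a and b."""
    return a + b

@run_n_times(3)
def greet(name):
    return f"hi {name}"

print(add(2, 3), add.__name__, add.__doc__)
print(greet("Mark"))
```

Without @wraps, add.__name__ would report "wrapper" and the docstring would be lost, which is the metadata problem the course topic refers to.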

My course #75 was AI Fundamentals. Topics covered in this course include:

What is all the AI fuss about?
All models are wrong but some are useful
Three flavors of machine learning
Supervised learning fundamentals
Training and evaluating classification models (confusion matrix, true/false positives/negatives)
Training and evaluating regression models (from sklearn.preprocessing import PolynomialFeatures)
Dimensionality reduction
Clustering
Anomaly detection
Selecting the right model
Deep learning and beyond
Convolutional neural networks

My course #76 was Introduction to Portfolio Analysis in Python. This course covers:

Welcome to portfolio analysis
Portfolio returns
Measuring risk of a portfolio (formatting as percentage)
Annualized returns
Risk-adjusted returns (calculating the Sharpe ratio)
Non-normal distribution of returns
Alternative measures of risk
Comparing against a benchmark
Risk factors
Factor models
Portfolio analysis tools
MPT (from pypfopt.efficient_frontier import EfficientFrontier; from pypfopt import risk_models, expected_returns)
Maximum Sharpe vs. minimum volatility
Alternative portfolio optimization
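
The annualized return and risk-adjusted return topics above can be sketched with the standard library. The monthly returns and the risk-free rate below are made-up assumptions, and the course may compute these quantities with pandas instead.

```python
import math
import statistics

# Made-up monthly returns for a portfolio
monthly_returns = [0.02, -0.01, 0.03, 0.015, -0.005, 0.01,
                   0.025, -0.02, 0.01, 0.005, 0.03, -0.01]
risk_free_annual = 0.01  # assumed annual risk-free rate

# Annualized return from compounding the monthly returns
growth = math.prod(1 + r for r in monthly_returns)
months = len(monthly_returns)
annualized_return = growth ** (12 / months) - 1

# Annualized volatility: monthly standard deviation scaled by sqrt(12)
annualized_vol = statistics.stdev(monthly_returns) * math.sqrt(12)

# Sharpe ratio: excess return per unit of volatility
sharpe_ratio = (annualized_return - risk_free_annual) / annualized_vol

# The :.2% format spec prints a fraction as a percentage
print(f"{annualized_return:.2%} {annualized_vol:.2%} {sharpe_ratio:.2f}")
```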

I will review more courses next time.


Review of Python Courses (Part 24)

Posted by Mark on February 4, 2021 at 07:41 | Last modified: February 10, 2021 16:23

In Part 23, I summarized my Datacamp courses 68-70. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #71 was Improving Your Data Visualizations in Python. This course covers:

Highlighting data
Comparing groups
Annotations
Color in visualizations
Continuous color palettes
Categorical palettes
Point estimate intervals
Confidence bands
Beyond 95% (visualizing multiple confidence bands at once)
Visualizing the bootstrap
Looking at the farmers market data
Exploring the patterns
Making your visualizations efficient
Tweaking your plots

My course #72 was Command Line Automation in Python. Because I don’t use the shell much, I don’t see a whole lot of application here for me and I’m not sure how much I absorbed. In any case, topics covered in this course include:

Learn the Python interpreter
Capture IPython shell output
Automate with SList
Execute shell commands in subprocess (import subprocess; import os)
Capture output of shell commands (from subprocess import Popen, PIPE)
Sending input to processes
Passing arguments safely to shell commands
Dealing with file systems
Find files matching a pattern (from pathlib import Path; import fnmatch, re)
High-level file and directory operations (from shutil import copytree, ignore_patterns, rmtree, make_archive)
Using pathlib (from pathlib import Path)
Using functions for automation (from functools import wraps)
Understand script input
Introduction to click (import click)
Using click to write command line tools (from click.testing import CliRunner)
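
A few of the topics above (capturing subprocess output, passing arguments safely, and finding files that match a pattern) can be sketched as follows. This is an illustration only: the file names are invented, and it calls the current Python interpreter rather than a shell tool so that it runs the same way on any platform.

```python
import fnmatch
import subprocess
import sys
import tempfile
from pathlib import Path

# Run a child process and capture its output. Passing the command as a list
# (no shell=True) avoids shell injection through the arguments.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()

# Create a few throwaway files, then find the ones matching a pattern
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["a.csv", "b.csv", "notes.txt"]:
        (root / name).write_text("demo")
    csv_via_glob = sorted(p.name for p in root.glob("*.csv"))
    csv_via_fnmatch = sorted(
        p.name for p in root.iterdir() if fnmatch.fnmatch(p.name, "*.csv")
    )

print(output, csv_via_glob, csv_via_fnmatch)
```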

My course #73 was Unit Testing for Data Science in Python. This course covers:

Why unit test?
Write a simple unit test using pytest
Understanding test result report
More benefits and test types
Mastering assert statements
Testing for exceptions instead of return values
The well-tested function
Test driven development (TDD)
How to organize a growing set of tests?
Mastering test execution
Expected failures and conditional skipping
Continuous integration and code coverage
Beyond assertion: setup and teardown
Mocking (from unittest.mock import call)
Testing models
Testing plots
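
The mocking topic above can be sketched with unittest.mock. The preprocess function and its converter argument are hypothetical stand-ins, not the functions used in the course, and the assertions are written in the plain assert style that pytest collects.

```python
from unittest.mock import Mock, call

def preprocess(rows, convert):
    """Apply `convert` to each row and drop rows it rejects (returns None)."""
    converted = (convert(row) for row in rows)
    return [row for row in converted if row is not None]

# Replace the real converter with a mock that has scripted return values
convert_mock = Mock(side_effect=[1.0, None, 3.0])
cleaned = preprocess(["1", "bad", "3"], convert_mock)

# Check the result and how the dependency was called
assert cleaned == [1.0, 3.0]
assert convert_mock.call_args_list == [call("1"), call("bad"), call("3")]
print(cleaned)
```

Mocking the converter means a bug in the real converter cannot fail the test for preprocess, so each function is tested on its own.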

I will review more courses next time.

