Option Fanatic: February 2021

Review of Python Courses (Part 30)

Posted by Mark on February 26, 2021 at 07:32 | Last modified: February 17, 2021 13:32

In Part 29, I summarized my Datacamp courses 86-88. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #89 was a case study in Python machine learning. This course covers:

Introducing the challenge
Exploring the data
Looking at the datatypes (converting dtype for all dataframe categories)
How do we measure success?
It’s time to build a model (from sklearn.linear_model import LogisticRegression)
Making predictions (from sklearn.multiclass import OneVsRestClassifier)
A very brief introduction to NLP
Representing text numerically (from sklearn.feature_extraction.text import CountVectorizer)
Pipelines, feature and text preprocessing (from sklearn.pipeline import Pipeline, FeatureUnion)
Text features and feature unions (from sklearn.preprocessing import FunctionTransformer, Imputer; newer scikit-learn versions replace Imputer with SimpleImputer from sklearn.impute)
Choosing a classification model (from sklearn.ensemble import RandomForestClassifier)
Learning from the expert: processing
Learning from the expert: a stats trick (from sklearn.preprocessing import PolynomialFeatures)
Learning from the expert: the winning model (from sklearn.feature_extraction.text import HashingVectorizer)
Next steps and the social impact of your work
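The "winning model" lesson relies on HashingVectorizer, which replaces a fitted vocabulary with a hash function. Below is a minimal standard-library sketch of that hashing trick. It makes a few assumptions: a simplified tokenizer, 16 buckets, and `zlib.crc32` standing in for the MurmurHash3 function scikit-learn actually uses, so bucket positions will not match scikit-learn's. The helper names are illustrative and do not come from the course.

```python
import re
import zlib


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def hashed_counts(text, n_features=16):
    """Count tokens into a fixed number of buckets chosen by a hash (the hashing trick)."""
    vector = [0] * n_features
    for token in tokenize(text):
        # crc32 is deterministic across runs, unlike Python's salted hash() for str.
        bucket = zlib.crc32(token.encode("utf-8")) % n_features
        vector[bucket] += 1
    return vector


features = hashed_counts("Teacher salaries and teacher benefits")
print(len(features), sum(features))  # 16 5
```

No vocabulary is stored, so memory stays fixed however many distinct words appear. The price is that unrelated tokens can collide in the same bucket.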

My course #90 was Ensemble Methods in Python. Topics covered in this course include:

Introduction to ensemble methods (the general meta-estimator pattern; MetaEstimator is a placeholder name, not a class in sklearn.ensemble)
Voting (from sklearn.ensemble import VotingClassifier, VotingRegressor)
Averaging
The strength of “weak” models
Bootstrap aggregating
Bagging classifier: nuts and bolts
Bagging parameters: tips and tricks
The effectiveness of gradual learning
Adaptive boosting: award winning model (from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor)
Gradient boosting (from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor)
Gradient boosting flavors (import xgboost as xgb; import lightgbm as lgb; import catboost as cb)
The intuition behind stacking
Build your first stacked ensemble
Let’s mlxtend it (from mlxtend.classifier import StackingClassifier; from mlxtend.regressor import StackingRegressor)!
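Hard voting is the simplest way to combine classifiers, and it is one of the modes VotingClassifier offers. Here is a standard-library sketch that assumes each model's predictions are already available as equal-length lists. The tie-breaking rule (the label encountered first wins) is a choice made for this sketch and may differ from scikit-learn's.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote across models, sample by sample."""
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) gives the top label; ties go to the label encountered first.
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three fitted models on four samples.
model_a = [0, 1, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 0, 1]
print(hard_vote([model_a, model_b, model_c]))  # [0, 1, 0, 0]
```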

My course #91 was Data Analysis in Spreadsheets. This course covers:

First function: ROUND
Function composition: SQRT
Functions and ranges: MIN, MAX
Selecting ranges: SUM, AVERAGE, MEDIAN
Multiple arguments: RANK
String manipulation: LEFT, RIGHT
String information: LEN, SEARCH
Combining strings: CONCATENATE
Date functions: WEEKDAY
Comparing dates
Combining functions
Flow control: IF
Nested logical functions: IF
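When moving between these spreadsheet functions and Python, two behaviors are worth checking:

- Spreadsheet ROUND rounds halves away from zero. Python's built-in round() rounds halves to the nearest even digit.
- WEEKDAY with its default type numbers Sunday as 1. Python's date.weekday() numbers Monday as 0.

Below is a minimal sketch of both conversions plus IF as a conditional expression. The helper names are hypothetical, and ROUND is handled only for non-negative digit counts.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def spreadsheet_round(value, digits=0):
    """Round halves away from zero, like spreadsheet ROUND (digits >= 0 assumed)."""
    quantum = Decimal(1).scaleb(-digits)  # digits=2 -> Decimal("0.01")
    # Going through str() avoids binary floating-point error (1.005 stays 1.005).
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


def spreadsheet_weekday(d):
    """WEEKDAY(d) with type 1: Sunday=1 ... Saturday=7."""
    return (d.weekday() + 1) % 7 + 1


def spreadsheet_if(condition, if_true, if_false):
    # IF(condition, value_if_true, value_if_false)
    return if_true if condition else if_false


print(round(2.5), spreadsheet_round(2.5))        # 2 3.0
print(spreadsheet_round(1.005, 2))               # 1.01
print(spreadsheet_weekday(date(2021, 2, 26)))    # 6 (a Friday)
print(spreadsheet_if(72 >= 60, "Pass", "Fail"))  # Pass
```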

I will review more courses next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 29)

Posted by Mark on February 23, 2021 at 17:38 | Last modified: February 17, 2021 09:27

In Part 28, I summarized my Datacamp courses 83-85. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #86 was Hyperparameter Tuning in Python. This course covers:

Introduction (visualize Random Forest)
Hyperparameters overview
Hyperparameters values
Introducing grid search
Grid search with Scikit Learn (from sklearn import metrics)
Understanding grid search output
Introducing random search (from itertools import product)
Random search in Scikit Learn (from sklearn.model_selection import RandomizedSearchCV)
Comparing grid and random search
Informed search: coarse to fine
Informed methods: Bayesian statistics
Informed methods: genetic algorithms (from tpot import TPOTClassifier)
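The difference between grid and random search comes down to how many points of the search space get evaluated. In the sketch below, itertools.product enumerates the grid (as in the random search lesson), and random search evaluates only a sample of it. The parameter grid and `toy_score` are hypothetical stand-ins for fitting a model and computing a cross-validated score. RandomizedSearchCV can also draw from continuous distributions, which this sketch does not attempt.

```python
import itertools
import random

# Hypothetical search space with Random Forest-style hyperparameter names.
param_grid = {
    "max_depth": [2, 4, 8, 16],
    "min_samples_leaf": [1, 5, 10],
    "n_estimators": [50, 100, 200],
}


def toy_score(params):
    # Hypothetical stand-in for a cross-validated score; a real search
    # would fit and evaluate a model for each combination.
    return (-abs(params["max_depth"] - 8)
            - abs(params["min_samples_leaf"] - 5) / 5
            + params["n_estimators"] / 1000)


def all_combinations(grid):
    """Every combination of values, as in an exhaustive grid search."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]


def grid_search(grid, score):
    return max(all_combinations(grid), key=score)


def random_search(grid, score, n_iter, seed=0):
    # Evaluate only n_iter combinations, drawn without replacement.
    rng = random.Random(seed)
    return max(rng.sample(all_combinations(grid), n_iter), key=score)


best_grid = grid_search(param_grid, toy_score)  # scores all 36 combinations
best_random = random_search(param_grid, toy_score, n_iter=10)
print(best_grid)  # {'max_depth': 8, 'min_samples_leaf': 5, 'n_estimators': 200}
```

Random search trades a guarantee of finding the grid's best point for a fixed, smaller budget. The "coarse to fine" lesson builds on that: a first random pass narrows the ranges for a finer second search.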

My course #87 was Case Studies in Statistical Thinking. Topics covered in this course include:

Activity of zebrafish and melatonin
Bootstrap confidence intervals
Hypothesis tests (permutation test)
Linear regression and pairs bootstrap
Introduction to swimming data
Do swimmers go faster in the finals? (a permutation test exercise)
How does the performance of swimmers decline over long events?
Introduction to the current controversy
The zigzag effect
Recap of swimming analysis
Introduction to statistical seismology
Timing of major earthquakes
How are the Parkfield interearthquake times distributed?
Variations in earthquake frequency and seismicity
Earthquake magnitudes in Oklahoma
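Permutation tests come up repeatedly in these case studies. The idea: if the null hypothesis holds, the group labels are interchangeable, so shuffling the pooled data shows how large a difference in means chance alone produces. Here is a minimal standard-library sketch of a one-sided two-sample test. The data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_reps=10_000, seed=42):
    """One-sided p-value for the alternative mean(a) > mean(b)."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # relabel the data at random
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if mean(perm_a) - mean(perm_b) >= observed:
            hits += 1
    return hits / n_reps


group_a = [10, 11, 12, 13, 14]  # hypothetical measurements
group_b = [1, 2, 3, 4, 5]
print(permutation_test(group_a, group_b))  # well below 0.05
```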

My course #88 was Analyzing Police Activity with Pandas. This course covers:

Stanford Open Policing Project dataset (count missing values in each column)
Using proper data types
Creating a Datetime index
Do the genders commit different violations?
Does gender affect who gets a ticket for speeding?
Does gender affect whose vehicle is searched?
Does gender affect who is frisked during a search?
Does time of day affect arrest rate?
Are drug-related stops on the rise?
What violations are caught in each district?
How long might you be stopped for a violation?
Exploring the weather dataset
Categorizing the weather
Merging datasets
Does weather affect the arrest rate?

I will review more courses next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 28)

Posted by Mark on February 18, 2021 at 07:13 | Last modified: February 16, 2021 13:35

In Part 27, I summarized my Datacamp courses 80-82. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #83 was GARCH Models in Python. This course covers:

Why do we need GARCH models?
What are ARCH and GARCH (from statsmodels.graphics.tsaplots import plot_acf)?
How to implement GARCH models in Python (from arch import arch_model)?
Distribution assumptions
Mean model specifications
Volatility models for asymmetric shocks
GARCH rolling window forecast
Significance testing of model parameters
Validation of GARCH model assumptions (from statsmodels.stats.diagnostic import acorr_ljungbox)
Goodness of fit measures
GARCH model backtesting (from sklearn.metrics import mean_absolute_error, mean_squared_error)
VaR in financial risk management
Dynamic covariance in portfolio optimization
Dynamic beta in portfolio management
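arch_model estimates GARCH parameters by maximum likelihood. Once the parameters are known, the conditional-variance recursion itself is short:

σ²_t = ω + α ε²_{t−1} + β σ²_{t−1}

The sketch below filters that GARCH(1,1) recursion under two assumptions: the parameters are given (the values are made up, not estimates), and the path starts at the unconditional variance ω / (1 − α − β).

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model with known parameters.

    variances[t] is the variance for period t given information through t-1:
        sigma2[t] = omega + alpha * eps[t-1] ** 2 + beta * sigma2[t-1]
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    variances = [omega / (1 - alpha - beta)]  # unconditional (long-run) variance
    for eps in residuals[:-1]:
        variances.append(omega + alpha * eps ** 2 + beta * variances[-1])
    return variances


# Hypothetical residuals and parameters, chosen only to illustrate the recursion.
path = garch11_variance([1.0, -2.0, 0.5], omega=0.1, alpha=0.1, beta=0.8)
print([round(v, 4) for v in path])  # [1.0, 1.0, 1.3]
```

The large shock of -2.0 raises the next period's variance from 1.0 to 1.3. That volatility clustering is what GARCH models are built to capture.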

My course #84 was Cleaning Data in Python. Topics covered in this course include:

Data type constraints (dataframe preprocessing)
Data range constraints (use .loc[] to assign column values)
Uniqueness constraints (needs further review)
Membership constraints
Categorical variables
Cleaning text data
Uniformity
Cross field validation
Completeness [count .isna(); import missingno as msno]
Comparing strings (from fuzzywuzzy import fuzz, process)
Generating pairs (import recordlinkage)
Linking dataframes
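The course compared strings with fuzzywuzzy. As a dependency-free stand-in, difflib.SequenceMatcher from the standard library gives a similar similarity ratio, on a 0 to 1 scale rather than fuzz.ratio's 0 to 100. The sketch below collapses misspelled category values onto a canonical list. It assumes a cutoff of 0.8, an arbitrary value worth tuning on real data. The category values are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # ratio() = 2 * matches / total length, in [0, 1]; compare case-insensitively.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def collapse_categories(values, canonical, threshold=0.8):
    """Replace each value with its most similar canonical label if it is close enough."""
    cleaned = []
    for value in values:
        best = max(canonical, key=lambda label: similarity(value, label))
        cleaned.append(best if similarity(value, best) >= threshold else value)
    return cleaned


canonical = ["asian", "american", "european"]
raw = ["asian", "Asiann", "americann", "eur0pean", "zzz"]
print(collapse_categories(raw, canonical))
# ['asian', 'asian', 'american', 'european', 'zzz']
```

Values that match nothing well (like "zzz") are left alone, so they can be reviewed by hand instead of being forced into a category.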

My course #85 was Quantitative Risk Management in Python. This was deep and needs more study. The course covers:

What is quantitative risk management?
Risk management and the financial crisis (import statsmodels.api as sm)
Modern portfolio theory (from pypfopt.expected_returns import mean_historical_return)
Measuring risk (from pypfopt.risk_models import CovarianceShrinkage)
Risk exposure and loss (from scipy.stats import t)
Risk management using VaR and CVaR (from pypfopt.objective_functions import negative_cvar)
Portfolio hedging: offsetting risk
Parametric estimation (from scipy.stats import norm, anderson, skewnorm, skewtest)
Historical and Monte Carlo simulation
Structural breaks
Volatility and extreme values
Extreme value theory (from scipy.stats import genextreme)
Kernel density estimation (from scipy.stats import gaussian_kde)
Neural network risk management (from keras.models import Sequential; from keras.layers import Dense)
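Historical VaR and CVaR are easy to compute by hand from a list of losses, which makes a useful check on library results. Below is a standard-library sketch. It assumes positive numbers are losses and uses one common convention: VaR is the empirical alpha-quantile, and CVaR is the mean of losses at or beyond VaR. Other conventions interpolate between observations. This is not PyPortfolioOpt's implementation.

```python
import math


def historical_var_cvar(losses, alpha=0.95):
    """Empirical VaR and CVaR at confidence level alpha (0 < alpha < 1)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    ordered = sorted(losses)
    # Smallest loss with at least alpha of observations at or below it;
    # the tiny tolerance guards against floating-point error in alpha * n.
    k = math.ceil(alpha * len(ordered) - 1e-9) - 1
    var = ordered[k]
    tail = [x for x in ordered if x >= var]  # losses at or beyond VaR
    return var, sum(tail) / len(tail)


losses = list(range(1, 101))  # hypothetical losses 1..100
print(historical_var_cvar(losses))  # (95, 97.5)
```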

I will review more courses next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 27)

Posted by Mark on February 16, 2021 at 06:59 | Last modified: February 15, 2021 17:10

In Part 26, I summarized my Datacamp courses 77-79. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #80 was Model Validation in Python. This course covers:

Introduction to model validation
Regression models (from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier)
Classification models
Creating train, test, and validation datasets (creating a holdout set)
Accuracy metrics: regression models (from sklearn.metrics import mean_absolute_error, mean_squared_error)
Classification metrics (from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score)
Problems with holdout sets
Cross-validation (from sklearn.model_selection import KFold)
cross_val_score (from sklearn.model_selection import cross_val_score; from sklearn.metrics import make_scorer)
Leave-one-out-cross-validation
Introduction to hyperparameter tuning
RandomizedSearchCV
Selecting your final model
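The fold bookkeeping behind KFold fits in a few lines. Without shuffling, scikit-learn's KFold splits the indices into consecutive folds and gives the first n_samples % n_splits folds one extra sample. The standard-library sketch below follows the same convention. Setting n_splits equal to the number of samples turns it into leave-one-out cross-validation.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for consecutive, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, remainder = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < remainder else 0)  # first folds absorb the remainder
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in kfold_indices(10, 3):
    print(test)
# [0, 1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
```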

My course #81 was Image Processing in Python. Topics covered in this course include:

Make images come alive with scikit-image (from skimage import data, color)
NumPy for images
Getting started with thresholding (from skimage.filters import try_all_threshold)
Jump into filtering (from skimage.filters import sobel, gaussian)
Contrast enhancement (from skimage import exposure)
Transformations (from skimage.transform import rotate, rescale, resize)
Morphology (from skimage import morphology)
Image restoration (from skimage.restoration import inpaint)
Noise (from skimage.util import random_noise; from skimage.restoration import denoise_tv_chambolle)
Superpixels and segmentation (from skimage.segmentation import slic; from skimage.color import label2rgb)
Finding contours (from skimage import measure)
Finding the edges with Canny (from skimage.feature import canny)
Right around the corner (from skimage.feature import corner_harris)
Face detection (from skimage.feature import Cascade)
Real-world applications
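try_all_threshold compares several global thresholding methods. One of them, Otsu's method, picks the grey level that maximizes the between-class variance of the pixel histogram. Below is a standard-library sketch on a flat list of 8-bit grey levels, using a hypothetical image. scikit-image's threshold_otsu works on NumPy arrays and may return a different value inside a flat plateau, because any threshold between two separated modes scores equally well.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that best splits pixels into <= t and > t.

    Otsu's criterion maximizes w_bg * w_fg * (mean_bg - mean_fg) ** 2,
    the between-class variance up to a constant factor.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their grey levels
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (weighted_total - sum_bg) / w_fg
        score = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical image flattened to a list: dark background, bright object.
image = [10] * 50 + [200] * 50
t = otsu_threshold(image)
mask = [p > t for p in image]
print(t, sum(mask))  # 10 50
```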

My course #82 was Recurrent Neural Networks for Language Modeling in Python. This course covers:

Introduction to the course
Introduction to RNN inside Keras (from keras.models import Sequential; from keras.layers import Dense)
Vanishing and exploding gradients
GRU and LSTM cells (from keras.layers import GRU, Dense; from keras.layers.recurrent import LSTM)
The embedding layer (from keras.layers import Embedding; from keras.initializers import Constant)
Sentiment classification revisited
Data pre-processing (from keras.utils.np_utils import to_categorical)
Transfer learning for language models (from gensim.models import word2vec, fasttext)
Multi-class classification models (from sklearn.datasets import fetch_20newsgroups)
Assessing the model’s performance (from sklearn.metrics import confusion_matrix, f1_score, classification_report)
Sequence to sequence models
Text generation models (from keras.preprocessing.text import Tokenizer)
Neural machine translation (from keras.preprocessing.sequence import pad_sequences)
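Tokenizer and pad_sequences do the bookkeeping that turns text into fixed-length integer sequences. The standard-library sketch below works as follows:

- Words are indexed by descending frequency, starting at 1, so 0 stays free for padding.
- Sequences are padded and truncated at the front, which are pad_sequences' defaults.

The function names are hypothetical, not the Keras API.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to an integer id, most frequent first; 0 is left for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() orders ties by first appearance.
    return {word: i for i, (word, _n) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(texts, word_index):
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


def pad_pre(sequences, maxlen, value=0):
    """Left-pad (and left-truncate) each sequence to exactly maxlen items."""
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # drop items from the front if too long
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


texts = ["the cat sat", "the dog"]
index = build_word_index(texts)
print(index)  # {'the': 1, 'cat': 2, 'sat': 3, 'dog': 4}
print(pad_pre(texts_to_sequences(texts, index), maxlen=3))  # [[1, 2, 3], [0, 1, 4]]
```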

I will review more courses next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 26)

Posted by Mark on February 12, 2021 at 07:29 | Last modified: February 15, 2021 11:54

In Part 25, I summarized my Datacamp courses 74-76. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #77 was Visualizing Time Series Data in Python. This course covers:

Plot your first time series
Customize your time series plot
Clean your time series data (counting missing values in df)
Plot aggregates of your data
Summarizing the value in your time series data
Autocorrelation and partial autocorrelation (from statsmodels.graphics import tsaplots)
Seasonality, noise, and trend in time series data [from pylab import rcParams; sm.tsa.seasonal_decompose()]
Working with more than one time series
Plot multiple time series (adding statistical summaries to your plots)
Find relationships between multiple time series [sns.heatmap(), sns.clustermap()]
Apply your knowledge to a new dataset
Beyond summary statistics
Decompose time series data
Compute correlations between time series
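Each bar on an autocorrelation plot is a sample autocorrelation at one lag. Below is a standard-library sketch of the usual estimator, which divides the lagged cross-products by the total sum of squares. As far as I know, statsmodels' acf uses the same estimator by default. The series is hypothetical.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation at a given lag.

    r_k = sum_{t=k}^{n-1} (x_t - m)(x_{t-k} - m) / sum_{t=0}^{n-1} (x_t - m)^2
    """
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(x))")
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    return num / denom


alternating = [1, -1, 1, -1, 1, -1]
print(autocorrelation(alternating, 0), autocorrelation(alternating, 1))
# 1.0 -0.8333333333333334
```

A strongly negative lag-1 value like this one signals a series that tends to reverse direction from one period to the next.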

My course #78 was Financial Forecasting in Python. Topics covered in this course include:

Introduction to financial statements
Calculating sales and the cost of goods sold
Working with raw datasets
Introduction to the balance sheet
Balance sheet efficiency ratios
Financial periods and how to work with them
The datetime library and Split function
Tips and tricks when working with datasets
Building sensitive forecast models and common forecast assumptions
Dependencies and sensitivity in financial forecasting
Working with variances in the forecast

My course #79 was Foundations of Probability in Python. This course covers:

Let’s flip a coin in Python (from scipy.stats import bernoulli, binom)
Probability mass and distribution functions
Expected value, mean, and variance (from scipy.stats import describe)
Calculating probabilities of two events (from scipy.stats import find_repeats, relfreq)
Conditional probabilities
Total probability law
Bayes’ rule
Normal distributions (from scipy.stats import norm; import matplotlib.pyplot as plt; import seaborn as sns)
Normal probabilities
Poisson distributions (from scipy.stats import poisson)
Geometric distributions (from scipy.stats import geom)
From sample mean to population mean (from scipy.stats import binom, describe)
Adding random variables
Linear regression (from sklearn.linear_model import LinearRegression; from scipy.stats import linregress)
Logistic regression (from sklearn.linear_model import LogisticRegression)
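Two of the ideas above can be checked directly with the standard library:

- The binomial probability mass function, which is what scipy's binom.pmf evaluates.
- Bayes' rule, with the total probability law supplying the denominator.

Below is a minimal sketch. The factory numbers are a hypothetical example, not course data.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def posterior(priors, likelihoods):
    """Bayes' rule over mutually exclusive, exhaustive hypotheses H_i.

    P(H_i | E) = P(E | H_i) P(H_i) / P(E), where the total probability law gives
    P(E) = sum_j P(E | H_j) P(H_j).
    """
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]


print(binom_pmf(2, 4, 0.5))  # 0.375: two heads in four fair coin flips

# Hypothetical: three factories make 50%, 30% and 20% of all parts, with
# defect rates of 1%, 2% and 3%. Given a defective part, which factory made it?
print([round(x, 3) for x in posterior([0.5, 0.3, 0.2], [0.01, 0.02, 0.03])])
# [0.294, 0.353, 0.353]
```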

I will review more courses next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 25)

Posted by Mark on February 9, 2021 at 07:29 | Last modified: February 12, 2021 09:25

In Part 24, I summarized my Datacamp courses 71-73. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #74 was Writing Functions in Python. Overall, I found this content to be quite challenging. The course covers:

Docstrings (require string)
DRY and “do one thing” [standardize function, mean_and_median()]
Pass by assignment
Using context managers
Writing context managers
Advanced topics
Functions as objects
Scope
Closures
Decorators
Real-world examples
Decorators and metadata (from functools import wraps)
Decorators that take arguments
Timeout(): a real-world example
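Decorators that take arguments, metadata preserved with functools.wraps, and context managers are the parts of this course that are easiest to lose track of. Below is a compact sketch of all three. `run_n_times`, `record` and `timer` are illustrative names chosen for this sketch.

```python
import time
from contextlib import contextmanager
from functools import wraps


def run_n_times(n):
    """Decorator factory: the returned decorator calls the function n times."""
    def decorator(func):
        @wraps(func)  # copy __name__, __doc__, etc. from func onto wrapper
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = func(*args, **kwargs)
            return result  # result of the last call
        return wrapper
    return decorator


calls = []


@run_n_times(3)
def record(item):
    """Append item to the calls list and return the list's new length."""
    calls.append(item)
    return len(calls)


@contextmanager
def timer(label, log):
    """Context manager that appends (label, elapsed seconds) to log on exit."""
    start = time.perf_counter()
    try:
        yield  # the body of the with-block runs here
    finally:
        log.append((label, time.perf_counter() - start))


print(record("x"), record.__name__)  # 3 record

timings = []
with timer("sum", timings):
    total = sum(range(1_000))
print(timings[0][0], total)  # sum 499500
```

Without @wraps, record.__name__ would report "wrapper" and its docstring would be lost. That is the metadata problem the "Decorators and metadata" lesson addresses.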

My course #75 was AI Fundamentals. Topics covered in this course include:

What is all the AI fuss about?
All models are wrong but some are useful
Three flavors of machine learning
Supervised learning fundamentals
Training and evaluating classification models (confusion matrix, true/false positives/negatives)
Training and evaluating regression models (from sklearn.preprocessing import PolynomialFeatures)
Dimensionality reduction
Clustering
Anomaly detection
Selecting the right model
Deep learning and beyond
Convolutional neural networks
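The confusion-matrix counts mentioned above (true/false positives/negatives) lead directly to the usual classification metrics. Below is a standard-library sketch with hypothetical labels and predictions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # true positive
            else:
                fp += 1  # false positive
        elif actual == positive:
            fn += 1      # false negative
        else:
            tn += 1      # true negative
    return tp, fp, fn, tn


y_true = [1, 0, 1, 1, 0, 0]  # hypothetical labels
y_pred = [1, 0, 0, 1, 1, 0]  # hypothetical predictions
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
print(tp, fp, fn, tn)  # 2 1 1 2
```

With labels 0 and 1, scikit-learn's confusion_matrix arranges the same four counts as [[tn, fp], [fn, tp]].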

My course #76 was Introduction to Portfolio Analysis in Python. This course covers:

Welcome to portfolio analysis
Portfolio returns
Measuring risk of a portfolio (formatting as percentage)
Annualized returns
Risk-adjusted returns (calculating the Sharpe ratio)
Non-normal distribution of returns
Alternative measures of risk
Comparing against a benchmark
Risk factors
Factor models
Portfolio analysis tools
MPT (from pypfopt.efficient_frontier import EfficientFrontier; from pypfopt import risk_models, expected_returns)
Maximum Sharpe vs. minimum volatility
Alternative portfolio optimization
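Annualized return and the Sharpe ratio underpin most of the comparisons in this course. Below is a standard-library sketch with these assumptions:

- Periodic simple returns as input.
- 252 trading periods per year by default.
- A per-period risk-free rate.
- The sample standard deviation.

Some tools use the population standard deviation or log returns instead, which changes the numbers slightly.

```python
import math
import statistics


def annualized_return(period_returns, periods_per_year=252):
    """Compound the simple returns, then scale the growth to one year."""
    growth = math.prod(1 + r for r in period_returns)
    years = len(period_returns) / periods_per_year
    return growth ** (1 / years) - 1


def sharpe_ratio(period_returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free_per_period for r in period_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


# Hypothetical: +10% then -10% over two half-year periods.
print(round(annualized_return([0.10, -0.10], periods_per_year=2), 4))  # -0.01
print(round(sharpe_ratio([0.01, 0.03], periods_per_year=1), 4))        # 1.4142
```

The first example shows why arithmetic and compounded returns differ: gaining 10% and then losing 10% leaves the portfolio 1% down, not flat.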

I will review more courses next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 24)

Posted by Mark on February 4, 2021 at 07:41 | Last modified: February 10, 2021 16:23

In Part 23, I summarized my Datacamp courses 68-70. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #71 was Improving Your Data Visualizations in Python. This course covers:

Highlighting data
Comparing groups
Annotations
Color in visualizations
Continuous color palettes
Categorical palettes
Point estimate intervals
Confidence bands
Beyond 95% (visualizing multiple confidence bands at once)
Visualizing the bootstrap
Looking at the farmers market data
Exploring the patterns
Making your visualizations efficient
Tweaking your plots
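Several seaborn plot types draw bootstrapped confidence intervals by default, which is what the bootstrap and confidence-band lessons visualize. Below is a standard-library sketch of a percentile bootstrap interval for a mean, using hypothetical data.

```python
import random


def bootstrap_ci(data, n_reps=10_000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recording the mean of each bootstrap sample.
    boot_means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_reps))
    tail = (1 - confidence) / 2
    lower = boot_means[round(tail * n_reps)]
    upper = boot_means[round((1 - tail) * n_reps) - 1]
    return lower, upper


sample = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 5.1, 4.4]  # hypothetical measurements
low, high = bootstrap_ci(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Calling it with several confidence levels (say 0.8, 0.9 and 0.99) produces the nested bands described in "Beyond 95%".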

My course #72 was Command Line Automation in Python. Because I don’t use the shell much, I don’t see many applications for this material in my own work, and I’m not sure how much I absorbed. In any case, topics covered in this course include:

Learn the Python interpreter
Capture IPython shell output
Automate with SList
Execute shell commands in subprocess (import subprocess; import os)
Capture output of shell commands (from subprocess import Popen, PIPE)
Sending input to processes
Passing arguments safely to shell commands
Dealing with file systems
Find files matching a pattern (from pathlib import Path; import fnmatch, re)
High-level file and directory operations (from shutil import copytree, ignore_patterns, rmtree, make_archive)
Using pathlib (from pathlib import Path)
Using functions for automation (from functools import wraps)
Understand script input
Introduction to click (import click)
Using click to write command line tools (from click.testing import CliRunner)
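Capturing a command's output and passing arguments safely are the two subprocess habits most worth keeping from this course. The sketch below makes two choices: it uses sys.executable as the command so it runs on any platform with Python, and it passes arguments as a list with the default shell=False, so characters like ";" are never interpreted by a shell. It also shows shell-style wildcard matching with fnmatch on hypothetical file names.

```python
import fnmatch
import subprocess
import sys


def run_and_capture(args):
    """Run a command given as a list (no shell) and return its stripped stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


output = run_and_capture([sys.executable, "-c", "print('hello from a subprocess')"])
print(output)  # hello from a subprocess

# The last argument reaches the child process intact; no shell sees the ';'.
echoed = run_and_capture(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "hello; echo injected"]
)
print(echoed)  # hello; echo injected

print(fnmatch.filter(["prices.csv", "notes.txt", "volumes.csv"], "*.csv"))
# ['prices.csv', 'volumes.csv']
```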

My course #73 was Unit Testing for Data Science in Python. This course covers:

Why unit test?
Write a simple unit test using pytest
Understanding test result report
More benefits and test types
Mastering assert statements
Testing for exceptions instead of return values
The well-tested function
Test driven development (TDD)
How to organize a growing set of tests?
Mastering test execution
Expected failures and conditional skipping
Continuous integration and code coverage
Beyond assertion: setup and teardown
Mocking (from unittest.mock import call)
Testing models
Testing plots
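The pytest style taught here is plain functions whose names start with test_ and contain bare assert statements. Below is a sketch built around a hypothetical helper, `convert_to_int`, that parses comma-grouped numbers. Under pytest, the exception test would normally be written `with pytest.raises(ValueError):`. Here it uses try/except so the block also runs without pytest installed.

```python
def convert_to_int(text):
    """Hypothetical helper: turn a comma-grouped string like '2,081' into 2081."""
    groups = text.split(",")
    if not all(group.isdigit() for group in groups):
        raise ValueError(f"not a comma-grouped integer: {text!r}")
    if not 1 <= len(groups[0]) <= 3 or any(len(g) != 3 for g in groups[1:]):
        raise ValueError(f"commas in the wrong place: {text!r}")
    return int("".join(groups))


def test_on_string_with_one_comma():
    assert convert_to_int("2,081") == 2081


def test_on_string_with_two_commas():
    assert convert_to_int("1,034,891") == 1034891


def test_on_string_with_misplaced_comma():
    # Testing for an exception instead of a return value.
    try:
        convert_to_int("12,72,891")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


if __name__ == "__main__":
    test_on_string_with_one_comma()
    test_on_string_with_two_commas()
    test_on_string_with_misplaced_comma()
    print("all tests passed")
```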

I will review more courses next time.

Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 23)

Posted by Mark on February 1, 2021 at 07:34 | Last modified: February 10, 2021 10:35

In Part 22, I summarized my Datacamp courses 65-67. Today I will continue with the next three.

As a reminder, I introduced you to my recent work learning Python here.

My course #68 was Linear Classifiers in Python. This course covers:

Introduction (import sklearn.datasets)
Applying logistic regression and SVM (general process, from sklearn.svm import LinearSVC)
Linear decision boundaries
Linear classifiers: prediction equations
What is a loss function (from scipy.optimize import minimize)?
Loss function diagrams
Logistic regression and regularization
Logistic regression and probabilities
Multi-class logistic regression
Support vectors
Kernel SVMs
Comparing logistic regression and SVM (from sklearn.linear_model import SGDClassifier)
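Every linear classifier in this course predicts with the same equation, raw = w·x + b, and differs mainly in the loss it minimizes. Below is a standard-library sketch of the prediction equation and the two losses from the loss-function diagrams. The losses are written in terms of the margin y·raw, with labels in {−1, +1}. The weights are made up.

```python
import math


def raw_model_output(weights, bias, x):
    # Shared prediction equation: raw = w . x + b
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    # The sign of the raw output decides the class (+1 or -1).
    return 1 if raw_model_output(weights, bias, x) > 0 else -1


def logistic_loss(margin):
    """log(1 + exp(-margin)), computed without overflow for large |margin|."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def hinge_loss(margin):
    # SVM loss: exactly zero once the margin reaches 1.
    return max(0.0, 1.0 - margin)


weights, bias = [1.0, -2.0], 0.5  # hypothetical fitted coefficients
print(predict(weights, bias, [3.0, 1.0]), round(logistic_loss(0.0), 4), hinge_loss(0.5))
# 1 0.6931 0.5
```

The hinge loss is flat at zero for confidently correct points, so only points near or past the boundary (the support vectors) shape an SVM. The logistic loss never reaches zero, so every point contributes a little.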

My course #69 was Analyzing Social Media Data in Python. While I found it somewhat interesting, it seemed to involve as much JSON as Python. I have a hard enough time learning one new language, and adding a second on top of it made things even more confusing for me. This course covers:

Analyzing Twitter data
Collecting data through the Twitter API (from tweepy import Stream, OAuthHandler, API)
Understanding Twitter JSON
Processing Twitter text
Counting words
Time series
Sentiment analysis
Twitter networks
Importing and visualizing Twitter networks (import networkx as nx)
Node-level metrics
Maps and Twitter data
Geographical data in Twitter JSON
Creating Twitter maps (from mpl_toolkits.basemap import Basemap)
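Much of the JSON in this course is just nested dictionaries and lists once json.loads has parsed it. Below is a sketch that counts hashtags. The tweets are hypothetical and trimmed to the fields used. The entities/hashtags layout follows the tweet objects the course worked with, as I understand them. The .get() defaults let tweets without those fields pass through.

```python
import json
from collections import Counter

# Hypothetical tweets, one JSON document per line.
raw_tweets = [
    '{"text": "Learning #Python and #DataScience", '
    '"entities": {"hashtags": [{"text": "Python"}, {"text": "DataScience"}]}}',
    '{"text": "More #python today", "entities": {"hashtags": [{"text": "python"}]}}',
    '{"text": "No tags here", "entities": {"hashtags": []}}',
]


def hashtag_counts(json_lines):
    counts = Counter()
    for line in json_lines:
        tweet = json.loads(line)
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1  # treat #Python and #python alike
    return counts


print(hashtag_counts(raw_tweets).most_common())  # [('python', 2), ('datascience', 1)]
```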

My course #70 was Fraud Detection in Python. This course covers:

Introduction to fraud detection
Increasing successful detections using data resampling (from imblearn.over_sampling import RandomOverSampler)
Fraud detection algorithms in action (from imblearn.pipeline import Pipeline)
Review of classification methods
Performance evaluation (from sklearn.metrics import precision_recall_curve, average_precision_score)
More performance evaluation (from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score)
Adjusting your algorithm weights
Performance evaluation (from sklearn.model_selection import GridSearchCV)
Ensemble methods (from sklearn.ensemble import VotingClassifier)
Normal versus abnormal behavior
Clustering methods (from sklearn.preprocessing import MinMaxScaler; from sklearn.cluster import MiniBatchKMeans)
Assigning fraud versus non-fraud
Other clustering fraud detection methods (from sklearn.cluster import DBSCAN)
Using text data (from nltk import word_tokenize; import string)
Text mining to detect fraud (from nltk.corpus import stopwords; from nltk.stem.wordnet import WordNetLemmatizer)
Topic modeling on fraud (from gensim import corpora)
Flagged fraud based on topics (import pyLDAvis.gensim for use with Jupyter Notebooks only)
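RandomOverSampler balances classes by duplicating randomly chosen minority-class rows. Below is a standard-library sketch of the same idea on hypothetical transactions. It is not imbalanced-learn's implementation. Resampling should be applied only to the training split, after the train/test split, so duplicated rows cannot leak into the evaluation data.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each minority class until every
    class is as frequent as the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_rows, new_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [row for row, label in zip(rows, labels) if label == cls]
        extra = rng.choices(members, k=target - count)  # sample with replacement
        new_rows.extend(extra)
        new_labels.extend([cls] * len(extra))
    return new_rows, new_labels


# Hypothetical transactions: (amount, hour); label 1 marks fraud.
rows = [(12.5, 10), (80.0, 14), (5.2, 9), (300.0, 3), (7.9, 11), (15.0, 16)]
labels = [0, 0, 0, 1, 0, 0]
X, y = random_oversample(rows, labels)
print(Counter(y))  # Counter({0: 5, 1: 5})
```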

I will review more courses next time.

Categories: Python | Comments (0) | Permalink
