Review of Python Courses (Part 30)
Posted by Mark on February 26, 2021 at 07:32 | Last modified: February 17, 2021 13:32
In Part 29, I summarized my Datacamp courses 86-88. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #89 was a case study in Python machine learning. This course covers:
- Introducing the challenge
- Exploring the data
- Looking at the datatypes (converting dtype for all dataframe categories)
- How do we measure success?
- It’s time to build a model (from sklearn.linear_model import LogisticRegression)
- Making predictions (from sklearn.multiclass import OneVsRestClassifier)
- A very brief introduction to NLP
- Representing text numerically (from sklearn.feature_extraction.text import CountVectorizer)
- Pipelines, feature and text preprocessing (from sklearn.pipeline import Pipeline, FeatureUnion)
- Text features and feature unions (from sklearn.preprocessing import FunctionTransformer, Imputer; Imputer has since been replaced by sklearn.impute.SimpleImputer)
- Choosing a classification model (from sklearn.ensemble import RandomForestClassifier)
- Learning from the expert: processing
- Learning from the expert: a stats trick (from sklearn.preprocessing import PolynomialFeatures)
- Learning from the expert: the winning model (from sklearn.feature_extraction.text import HashingVectorizer)
- Next steps and the social impact of your work
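As a rough sketch of how several of the imports above fit together, here is a small text-classification pipeline. The toy descriptions and labels are made up for illustration and are not the course's dataset; the step names are arbitrary.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data: short expense descriptions with made-up categories.
texts = [
    "teacher salary regular",
    "substitute teacher wages",
    "bus fuel and repairs",
    "school bus driver wages",
    "textbooks for math class",
    "library books and supplies",
]
labels = ["staff", "staff", "transport", "transport", "supplies", "supplies"]

# Bag-of-words counts feed a one-vs-rest logistic regression.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", OneVsRestClassifier(LogisticRegression())),
])
pipeline.fit(texts, labels)

# Predict categories for two unseen descriptions.
predictions = pipeline.predict(["bus repairs", "math textbooks"])
print(list(predictions))
```

Wrapping the vectorizer and classifier in one Pipeline means the same preprocessing is applied at fit and predict time.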
My course #90 was Ensemble Methods in Python. Topics covered in this course include:
- Introduction to ensemble methods (from sklearn.ensemble import MetaEstimator, where MetaEstimator is a placeholder name rather than an actual scikit-learn class)
- Voting (from sklearn.ensemble import VotingClassifier, VotingRegressor)
- Averaging
- The strength of “weak” models
- Bootstrap aggregating
- Bagging classifier: nuts and bolts
- Bagging parameters: tips and tricks
- The effectiveness of gradual learning
- Adaptive boosting: award winning model (from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor)
- Gradient boosting (from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor)
- Gradient boosting flavors (import xgboost as xgb; import lightgbm as lgb; import catboost as cb)
- The intuition behind stacking
- Build your first stacked ensemble
- Let’s mlxtend it (from mlxtend.classifier import StackingClassifier; from mlxtend.regressor import StackingRegressor)!
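To make the voting and averaging ideas concrete, here is a minimal hand-rolled sketch using only the standard library. It is not how scikit-learn or mlxtend implement their ensembles, and the model predictions are invented for illustration.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions_per_model):
    """Return the most common label for each sample across models."""
    voted = []
    for sample_predictions in zip(*predictions_per_model):
        voted.append(Counter(sample_predictions).most_common(1)[0][0])
    return voted


def average_probabilities(probabilities_per_model):
    """Average the predicted probability for each sample across models."""
    return [mean(sample) for sample in zip(*probabilities_per_model)]


# Hypothetical class predictions from three models on four samples.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
hard_votes = majority_vote([model_a, model_b, model_c])
print(hard_votes)  # [1, 0, 1, 1]

# Hypothetical positive-class probabilities from three models on two samples.
soft_average = average_probabilities([[0.9, 0.2], [0.6, 0.4], [0.3, 0.3]])
print(soft_average)
```

An odd number of voters avoids ties in the binary case.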
My course #91 was Data Analysis in Spreadsheets. This course covers:
- First function: ROUND
- Function composition: SQRT
- Functions and ranges: MIN, MAX
- Selecting ranges: SUM, AVERAGE, MEDIAN
- Multiple arguments: RANK
- String manipulation: LEFT, RIGHT
- String information: LEN, SEARCH
- Combining strings: CONCATENATE
- Date functions: WEEKDAY
- Comparing dates
- Combining functions
- Flow control: IF
- Nested logical functions: IF
I will review more courses next time.
Categories: Python
Review of Python Courses (Part 29)
Posted by Mark on February 23, 2021 at 17:38 | Last modified: February 17, 2021 09:27
In Part 28, I summarized my Datacamp courses 83-85. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #86 was Hyperparameter Tuning in Python. This course covers:
- Introduction (visualize Random Forest)
- Hyperparameters overview
- Hyperparameters values
- Introducing grid search
- Grid search with Scikit Learn (from sklearn import metrics)
- Understanding grid search output
- Introducing random search (from itertools import product)
- Random search in Scikit Learn (from sklearn.model_selection import RandomizedSearchCV)
- Comparing grid and random search
- Informed search: coarse to fine
- Informed methods: Bayesian statistics
- Informed methods: genetic algorithms (from tpot import TPOTClassifier)
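The difference between grid search and random search can be sketched with itertools.product alone. The scoring function below is a hypothetical stand-in for a cross-validated model score, and the hyperparameter values are made up.

```python
import random
from itertools import product


def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for a cross-validated score; peaks at (0.1, 4)."""
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 4) ** 2


learning_rates = [0.01, 0.1, 0.5, 1.0]
max_depths = [2, 4, 6, 8]

# Grid search: evaluate every combination.
grid = list(product(learning_rates, max_depths))
best_grid = max(grid, key=lambda combo: validation_score(*combo))

# Random search: evaluate only a random subset of the same grid.
random.seed(0)
sampled = random.sample(grid, 5)
best_random = max(sampled, key=lambda combo: validation_score(*combo))

print(best_grid)  # (0.1, 4)
print(best_random)
```

Random search evaluates 5 of the 16 combinations here, so it is cheaper but is not guaranteed to find the best one.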
My course #87 was Case Studies in Statistical Thinking. Topics covered in this course include:
- Activity of zebrafish and melatonin
- Bootstrap confidence intervals
- Hypothesis tests (permutation test)
- Linear regression and pairs bootstrap
- Introduction to swimming data
- Do swimmers go faster in the finals (how to do permutation test exercise)?
- How does the performance of swimmers decline over long events?
- Introduction to the current controversy
- The zigzag effect
- Recap of swimming analysis
- Introduction to statistical seismology
- Timing of major earthquakes
- How are the Parkfield interearthquake times distributed?
- Variations in earthquake frequency and seismicity
- Earthquake magnitudes in Oklahoma
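Permutation tests and bootstrap confidence intervals recur throughout this course. Below is a generic two-sample sketch using only the standard library; it is not necessarily the exact procedure used in the course exercises, and the swim times are invented.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=42):
    """One-sided p-value for mean(group_a) - mean(group_b) under label shuffling."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


def bootstrap_ci(data, n_replicates=2000, seed=42):
    """Percentile 95% confidence interval for the mean via resampling."""
    rng = random.Random(seed)
    replicates = sorted(
        mean(rng.choices(data, k=len(data))) for _ in range(n_replicates)
    )
    return replicates[int(0.025 * n_replicates)], replicates[int(0.975 * n_replicates)]


# Hypothetical swim times in seconds.
semifinal_times = [53.2, 54.1, 52.9, 53.8, 54.4, 53.5]
final_times = [52.8, 53.6, 52.7, 53.1, 54.0, 53.0]

p_value = permutation_p_value(semifinal_times, final_times)
low, high = bootstrap_ci(final_times)
print(p_value, low, high)
```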
My course #88 was Analyzing Police Activity with Pandas. This course covers:
- Stanford Open Policing Project dataset (count missing values in each column)
- Using proper data types
- Creating a Datetime index
- Do the genders commit different violations?
- Does gender affect who gets a ticket for speeding?
- Does gender affect whose vehicle is searched?
- Does gender affect who is frisked during a search?
- Does time of day affect arrest rate?
- Are drug-related stops on the rise?
- What violations are caught in each district?
- How long might you be stopped for a violation?
- Exploring the weather dataset
- Categorizing the weather
- Merging datasets
- Does weather affect the arrest rate?
I will review more courses next time.
Categories: Python
Review of Python Courses (Part 28)
Posted by Mark on February 18, 2021 at 07:13 | Last modified: February 16, 2021 13:35
In Part 27, I summarized my Datacamp courses 80-82. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #83 was GARCH Models in Python. This course covers:
- Why do we need GARCH models?
- What are ARCH and GARCH (from statsmodels.graphics.tsaplots import plot_acf)?
- How to implement GARCH models in Python (from arch import arch_model)?
- Distribution assumptions
- Mean model specifications
- Volatility models for asymmetric shocks
- GARCH rolling window forecast
- Significance testing of model parameters
- Validation of GARCH model assumptions (from statsmodels.stats.diagnostic import acorr_ljungbox)
- Goodness of fit measures
- GARCH model backtesting (from sklearn.metrics import mean_absolute_error, mean_squared_error)
- VaR in financial risk management
- Dynamic covariance in portfolio optimization
- Dynamic beta in portfolio management
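For intuition about what a GARCH(1,1) model computes, the conditional variance follows the recursion sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]. The sketch below applies that recursion by hand; it is not the arch package's implementation, and the parameter values and returns are made up rather than fitted.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance path for a GARCH(1,1) model with known parameters."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    # Start from the long-run (unconditional) variance.
    variance = omega / (1.0 - alpha - beta)
    path = [variance]
    # Each step uses the previous return and the previous variance.
    for previous_return in returns[:-1]:
        variance = omega + alpha * previous_return ** 2 + beta * variance
        path.append(variance)
    return path


# Hypothetical daily returns in percent and illustrative parameters.
returns = [0.5, -1.2, 0.3, 2.0, -0.7]
variances = garch11_variance(returns, omega=0.1, alpha=0.1, beta=0.8)
print(variances)
```

A large return (such as 2.0 above) raises the next period's variance, which is the volatility clustering these models are meant to capture.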
My course #84 was Cleaning Data in Python. Topics covered in this course include:
- Data type constraints (dataframe preprocessing)
- Data range constraints (use .loc[] to assign column values)
- Uniqueness constraints (needs further review)
- Membership constraints
- Categorical variables
- Cleaning text data
- Uniformity
- Cross field validation
- Completeness [count .isna(); import missingno as msno]
- Comparing strings (from fuzzywuzzy import fuzz, process)
- Generating pairs (import recordlinkage)
- Linking dataframes
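The string-comparison idea can be tried without installing anything. The sketch below swaps in the standard library's difflib in place of fuzzywuzzy, so the similarity scores will differ from what the course shows; the category names and cutoff are made up.

```python
from difflib import get_close_matches

# Hypothetical set of valid categories and some messy raw values.
valid_cuisines = ["american", "asian", "italian"]
raw_values = ["amercian", "italien", "asiane", "???"]


def clean_category(value, valid_values, cutoff=0.8):
    """Map a raw value to its closest valid category, or None if nothing is close."""
    matches = get_close_matches(value.lower().strip(), valid_values, n=1, cutoff=cutoff)
    return matches[0] if matches else None


cleaned = [clean_category(value, valid_cuisines) for value in raw_values]
print(cleaned)  # ['american', 'italian', 'asian', None]
```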
My course #85 was Quantitative Risk Management in Python. This was deep and needs more study. The course covers:
- What is quantitative risk management?
- Risk management and the financial crisis (import statsmodels.api as sm)
- Modern portfolio theory (from pypfopt.expected_returns import mean_historical_return)
- Measuring risk (from pypfopt.risk_models import CovarianceShrinkage)
- Risk exposure and loss (from scipy.stats import t)
- Risk management using VaR and CVaR (from pypfopt.objective_functions import negative_cvar)
- Portfolio hedging: offsetting risk
- Parametric estimation (from scipy.stats import norm, anderson, skewnorm, skewtest)
- Historical and Monte Carlo simulation
- Structural breaks
- Volatility and extreme values
- Extreme value theory (from scipy.stats import genextreme)
- Kernel density estimation (from scipy.stats import gaussian_kde)
- Neural network risk management (from keras.models import Sequential; from keras.layers import Dense)
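Historical VaR and CVaR can be sketched in a few lines. This uses a simple empirical quantile (conventions for the cutoff index vary between libraries), and the losses are simulated from a normal distribution purely for illustration.

```python
import random


def historical_var_cvar(losses, confidence=0.95):
    """Empirical VaR (loss quantile) and CVaR (mean loss at or beyond VaR)."""
    ordered = sorted(losses)
    cutoff = min(int(confidence * len(ordered)), len(ordered) - 1)
    var = ordered[cutoff]
    tail = ordered[cutoff:]
    return var, sum(tail) / len(tail)


# Hypothetical daily portfolio losses (positive numbers are losses).
rng = random.Random(0)
simulated_losses = [rng.gauss(0.0, 0.02) for _ in range(1000)]

var_95, cvar_95 = historical_var_cvar(simulated_losses)
print(var_95, cvar_95)
```

CVaR is never smaller than VaR at the same confidence level, because it averages only the losses in the tail beyond the VaR cutoff.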
I will review more courses next time.
Categories: Python
Review of Python Courses (Part 27)
Posted by Mark on February 16, 2021 at 06:59 | Last modified: February 15, 2021 17:10
In Part 26, I summarized my Datacamp courses 77-79. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #80 was Model Validation in Python. This course covers:
- Introduction to model validation
- Regression models (from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier)
- Classification models
- Creating train, test, and validation datasets (creating a holdout set)
- Accuracy metrics: regression models (from sklearn.metrics import mean_absolute_error, mean_squared_error)
- Classification metrics (from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score)
- Problems with holdout sets
- Cross-validation (from sklearn.model_selection import KFold)
- cross_val_score (from sklearn.model_selection import cross_val_score; from sklearn.metrics import make_scorer)
- Leave-one-out-cross-validation
- Introduction to hyperparameter tuning
- RandomizedSearchCV
- Selecting your final model
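A minimal sketch of k-fold cross-validation with the imports listed above. The synthetic regression data and the choice of LinearRegression are assumptions for illustration, not taken from the course.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical data: a linear signal plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Five shuffled folds; each observation is held out exactly once.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
mae_scorer = make_scorer(mean_absolute_error)

scores = cross_val_score(LinearRegression(), X, y, cv=kfold, scoring=mae_scorer)
print(scores.mean())
```

When a scorer like this is used for model selection rather than reporting, make_scorer is usually given greater_is_better=False so that lower errors rank higher.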
My course #81 was Image Processing in Python. Topics covered in this course include:
- Make images come alive with scikit-image (from skimage import data, color)
- NumPy for images
- Getting started with thresholding (from skimage.filters import try_all_threshold)
- Jump into filtering (from skimage.filters import sobel, gaussian)
- Contrast enhancement (from skimage import exposure)
- Transformations (from skimage.transform import rotate, rescale, resize)
- Morphology (from skimage import morphology)
- Image restoration (from skimage.restoration import inpaint)
- Noise (from skimage.util import random_noise; from skimage.restoration import denoise_tv_chambolle)
- Superpixels and segmentation (from skimage.segmentation import slic; from skimage.color import label2rgb)
- Finding contours (from skimage import measure)
- Finding the edges with Canny (from skimage.feature import canny)
- Right around the corner (from skimage.feature import corner_harris)
- Face detection (from skimage.feature import Cascade)
- Real-world applications
My course #82 was Recurrent Neural Networks for Language Modeling in Python. This course covers:
- Introduction to the course
- Introduction to RNN inside Keras (from keras.models import Sequential; from keras.layers import Dense)
- Vanishing and exploding gradients
- GRU and LSTM cells (from keras.layers import GRU, Dense; from keras.layers.recurrent import LSTM)
- The embedding layer (from keras.layers import Embedding; from keras.initializers import Constant)
- Sentiment classification revisited
- Data pre-processing (from keras.utils.np_utils import to_categorical)
- Transfer learning for language models (from gensim.models import word2vec, fasttext)
- Multi-class classification models (from sklearn.datasets import fetch_20newsgroups)
- Assessing the model’s performance (from sklearn.metrics import confusion_matrix, f1_score, classification_report)
- Sequence to sequence models
- Text generation models (from keras.preprocessing.text import Tokenizer)
- Neural machine translation (from keras.preprocessing.sequence import pad_sequences)
I will review more courses next time.
Categories: Python
Review of Python Courses (Part 26)
Posted by Mark on February 12, 2021 at 07:29 | Last modified: February 15, 2021 11:54
In Part 25, I summarized my Datacamp courses 74-76. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #77 was Visualizing Time Series Data in Python. This course covers:
- Plot your first time series
- Customize your time series plot
- Clean your time series data (counting missing values in df)
- Plot aggregates of your data
- Summarizing the value in your time series data
- Autocorrelation and partial autocorrelation (from statsmodels.graphics import tsaplots)
- Seasonality, noise, and trend in time series data [from pylab import rcParams; sm.tsa.seasonal_decompose()]
- Working with more than one time series
- Plot multiple time series (adding statistical summaries to your plots)
- Find relationships between multiple time series [sns.heatmap(), sns.clustermap()]
- Apply your knowledge to a new dataset
- Beyond summary statistics
- Decompose time series data
- Compute correlations between time series
My course #78 was Financial Forecasting in Python. Topics covered in this course include:
- Introduction to financial statements
- Calculating sales and the cost of goods sold
- Working with raw datasets
- Introduction to the balance sheet
- Balance sheet efficiency ratios
- Financial periods and how to work with them
- The datetime library and Split function
- Tips and tricks when working with datasets
- Building sensitive forecast models and common forecast assumptions
- Dependencies and sensitivity in financial forecasting
- Working with variances in the forecast
My course #79 was Foundations of Probability in Python. This course covers:
- Let’s flip a coin in Python (from scipy.stats import bernoulli, binom)
- Probability mass and distribution functions
- Expected value, mean, and variance (from scipy.stats import describe)
- Calculating probabilities of two events (from scipy.stats import find_repeats, relfreq)
- Conditional probabilities
- Total probability law
- Bayes’ rule
- Normal distributions (from scipy.stats import norm; import matplotlib.pyplot as plt; import seaborn as sns)
- Normal probabilities
- Poisson distributions (from scipy.stats import poisson)
- Geometric distributions (from scipy.stats import geom)
- From sample mean to population mean (from scipy.stats import binom, describe)
- Adding random variables
- Linear regression (from sklearn.linear_model import LinearRegression; from scipy.stats import linregress)
- Logistic regression (from sklearn.linear_model import LogisticRegression)
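Two of the formulas behind these topics, the binomial probability mass function and Bayes' rule, are short enough to write out by hand. This sketch uses only the standard library rather than scipy.stats, and the test-accuracy numbers are invented.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive) from the prior, P(positive | condition),
    and P(positive | no condition), using the total probability law."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Probability of exactly 2 heads in 3 fair coin flips.
two_heads = binomial_pmf(2, 3, 0.5)
print(two_heads)  # 0.375

# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(round(posterior, 4))
```

Even with a sensitive test, a low prior keeps the posterior small, which is the usual surprise in Bayes' rule examples.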
I will review more courses next time.
Categories: Python
Review of Python Courses (Part 25)
Posted by Mark on February 9, 2021 at 07:29 | Last modified: February 12, 2021 09:25
In Part 24, I summarized my Datacamp courses 71-73. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #74 was Writing Functions in Python. Overall, I found this content to be quite challenging. The course covers:
- Docstrings (require string)
- DRY and “do one thing” [standardize function, mean_and_median()]
- Pass by assignment
- Using context managers
- Writing context managers
- Advanced topics
- Functions as objects
- Scope
- Closures
- Decorators
- Real-world examples
- Decorators and metadata (from functools import wraps)
- Decorators that take arguments
- Timeout(): a real-world example
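Since closures, decorators with arguments, and context managers are the harder parts of this list, here is a small sketch that combines them. The names run_n_times, timer, and greet are made up for illustration.

```python
import time
from contextlib import contextmanager
from functools import wraps


def run_n_times(n):
    """Decorator factory: call the wrapped function n times, return the last result."""
    def decorator(func):
        @wraps(func)  # keep func's name and docstring on the wrapper
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator


@contextmanager
def timer(results):
    """Context manager that appends the elapsed seconds to results."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results.append(time.perf_counter() - start)


calls = []


@run_n_times(3)
def greet(name):
    """Record a greeting and return it."""
    calls.append(name)
    return f"hello {name}"


timings = []
with timer(timings):
    message = greet("Mark")

print(message, len(calls), greet.__name__)
```

Without @wraps, greet.__name__ would report "wrapper" and the docstring would be lost.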
My course #75 was AI Fundamentals. Topics covered in this course include:
- What is all the AI fuss about?
- All models are wrong but some are useful
- Three flavors of machine learning
- Supervised learning fundamentals
- Training and evaluating classification models (confusion matrix, true/false positives/negatives)
- Training and evaluating regression models (from sklearn.preprocessing import PolynomialFeatures)
- Dimensionality reduction
- Clustering
- Anomaly detection
- Selecting the right model
- Deep learning and beyond
- Convolutional neural networks
My course #76 was Introduction to Portfolio Analysis in Python. This course covers:
- Welcome to portfolio analysis
- Portfolio returns
- Measuring risk of a portfolio (formatting as percentage)
- Annualized returns
- Risk-adjusted returns (calculating the Sharpe ratio)
- Non-normal distribution of returns
- Alternative measures of risk
- Comparing against a benchmark
- Risk factors
- Factor models
- Portfolio analysis tools
- Modern portfolio theory (from pypfopt.efficient_frontier import EfficientFrontier; from pypfopt import risk_models, expected_returns)
- Maximum Sharpe vs. minimum volatility
- Alternative portfolio optimization
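Annualized return and the Sharpe ratio are simple enough to compute by hand. The sketch below assumes monthly returns and the common square-root-of-time annualization; the return figures are made up.

```python
from math import sqrt
from statistics import mean, stdev


def annualized_return(total_return, years):
    """Convert a total return over several years into a per-year compound rate."""
    return (1 + total_return) ** (1 / years) - 1


def sharpe_ratio(period_returns, risk_free_per_period=0.0, periods_per_year=12):
    """Annualized Sharpe ratio from per-period returns."""
    excess = [r - risk_free_per_period for r in period_returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)


# A 21% total return over two years is 10% per year compounded.
yearly = annualized_return(0.21, 2)
print(round(yearly, 4))  # 0.1

# Hypothetical monthly portfolio returns.
monthly_returns = [0.02, -0.01, 0.015, 0.03, -0.005, 0.01]
ratio = sharpe_ratio(monthly_returns, risk_free_per_period=0.001)
print(round(ratio, 2))
```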
I will review more courses next time.
Categories: Python
Review of Python Courses (Part 24)
Posted by Mark on February 4, 2021 at 07:41 | Last modified: February 10, 2021 16:23
In Part 23, I summarized my Datacamp courses 68-70. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #71 was Improving Your Data Visualizations in Python. This course covers:
- Highlighting data
- Comparing groups
- Annotations
- Color in visualizations
- Continuous color palettes
- Categorical palettes
- Point estimate intervals
- Confidence bands
- Beyond 95% (visualizing multiple confidence bands at once)
- Visualizing the bootstrap
- Looking at the farmers market data
- Exploring the patterns
- Making your visualizations efficient
- Tweaking your plots
My course #72 was Command Line Automation in Python. Because I don’t use the shell much, I don’t see a whole lot of application here for me and I’m not sure how much I absorbed. In any case, topics covered in this course include:
- Learn the Python interpreter
- Capture IPython shell output
- Automate with SList
- Execute shell commands in subprocess (import subprocess; import os)
- Capture output of shell commands (from subprocess import Popen, PIPE)
- Sending input to processes
- Passing arguments safely to shell commands
- Dealing with file systems
- Find files matching a pattern (from pathlib import Path; import fnmatch, re)
- High-level file and directory operations (from shutil import copytree, ignore_patterns, rmtree, make_archive)
- Using pathlib (from pathlib import Path)
- Using functions for automation (from functools import wraps)
- Understand script input
- Introduction to click (import click)
- Using click to write command line tools (from click.testing import CliRunner)
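A small sketch of two ideas from this list: running a command with subprocess while passing arguments safely as a list, and finding files by pattern with pathlib and fnmatch. The file names are placeholders created in a temporary directory, and the child command simply calls the current Python interpreter.

```python
import fnmatch
import subprocess
import sys
import tempfile
from pathlib import Path

# Pass arguments as a list (no shell=True) so nothing is parsed by a shell.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
    check=True,
)
output = completed.stdout.strip()

# Create a few placeholder files, then find them by pattern.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["a.csv", "b.csv", "notes.txt"]:
        (root / name).write_text("placeholder")
    csv_files = sorted(path.name for path in root.rglob("*.csv"))
    txt_files = fnmatch.filter([path.name for path in root.iterdir()], "*.txt")

print(output, csv_files, txt_files)
```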
My course #73 was Unit Testing for Data Science in Python. This course covers:
- Why unit test?
- Write a simple unit test using pytest
- Understanding test result report
- More benefits and test types
- Mastering assert statements
- Testing for exceptions instead of return values
- The well-tested function
- Test driven development (TDD)
- How to organize a growing set of tests?
- Mastering test execution
- Expected failures and conditional skipping
- Continuous integration and code coverage
- Beyond assertion: setup and teardown
- Mocking (from unittest.mock import call)
- Testing models
- Testing plots
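The testing ideas above can be sketched with plain test functions. The functions convert_to_int and preprocess are hypothetical examples; under pytest the exception check would normally use pytest.raises, but a try/except is used here so the sketch runs with the standard library alone.

```python
from unittest.mock import MagicMock, call


def convert_to_int(text):
    """Convert a string such as '2,081' to an int; raise ValueError on bad input."""
    cleaned = text.replace(",", "")
    if not cleaned.isdigit():
        raise ValueError(f"cannot convert {text!r}")
    return int(cleaned)


def preprocess(rows, converter):
    """Apply converter to every row."""
    return [converter(row) for row in rows]


def test_convert_with_comma():
    assert convert_to_int("2,081") == 2081


def test_convert_raises_on_bad_input():
    # Test for the exception rather than a return value.
    try:
        convert_to_int("abc")
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_preprocess_calls_converter():
    # Mock the dependency so preprocess is tested in isolation.
    mock_converter = MagicMock(side_effect=[1, 2])
    assert preprocess(["1", "2"], mock_converter) == [1, 2]
    assert mock_converter.call_args_list == [call("1"), call("2")]


# pytest would discover these by name; here they are simply called in turn.
for test in (
    test_convert_with_comma,
    test_convert_raises_on_bad_input,
    test_preprocess_calls_converter,
):
    test()
print("all tests passed")
```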
I will review more courses next time.
Categories: Python
Review of Python Courses (Part 23)
Posted by Mark on February 1, 2021 at 07:34 | Last modified: February 10, 2021 10:35
In Part 22, I summarized my Datacamp courses 65-67. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #68 was Linear Classifiers in Python. This course covers:
- Introduction (import sklearn.datasets)
- Applying logistic regression and SVM (general process, from sklearn.svm import LinearSVC)
- Linear decision boundaries
- Linear classifiers: prediction equations
- What is a loss function (from scipy.optimize import minimize)?
- Loss function diagrams
- Logistic regression and regularization
- Logistic regression and probabilities
- Multi-class logistic regression
- Support vectors
- Kernel SVMs
- Comparing logistic regression and SVM (from sklearn.linear_model import SGDClassifier)
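To illustrate what a loss function is, the sketch below fits a one-feature linear classifier by minimizing the logistic loss directly with scipy.optimize.minimize. The tiny dataset is made up (two points are deliberately mislabeled so the minimum is finite), and this is not how scikit-learn's LogisticRegression is implemented internally.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical one-feature data with labels in {-1, +1}.
X = np.array([[-2.0], [-1.0], [1.0], [2.0], [-0.5], [0.5]])
y = np.array([-1, -1, 1, 1, 1, -1])


def logistic_loss(w):
    """Sum of log(1 + exp(-y * raw_model_output)) over all samples."""
    raw_model_output = X @ w
    return np.sum(np.log(1 + np.exp(-y * raw_model_output)))


# Start from zero weights and let the optimizer find the minimum.
start = np.zeros(1)
result = minimize(logistic_loss, x0=start)

print(result.x, result.fun)
```

The fitted coefficient comes out positive, matching the fact that larger feature values mostly carry the +1 label.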
My course #69 was Analyzing Social Media Data in Python. While I found this somewhat interesting, it seemed to incorporate as much JSON as it did Python. I have a hard enough time studying one new language; adding a second on top of that made things even more confusing for me. Topics covered in this course include:
- Analyzing Twitter data
- Collecting data through the Twitter API (from tweepy import Stream, OAuthHandler, API)
- Understanding Twitter JSON
- Processing Twitter text
- Counting words
- Time series
- Sentiment analysis
- Twitter networks
- Importing and visualizing Twitter networks (import networkx as nx)
- Node-level metrics
- Maps and Twitter data
- Geographical data in Twitter JSON
- Creating Twitter maps (from mpl_toolkits.basemap import Basemap)
My course #70 was Fraud Detection in Python. This course covers:
- Introduction to fraud detection
- Increasing successful detections using data resampling (from imblearn.over_sampling import RandomOverSampler)
- Fraud detection algorithms in action (from imblearn.pipeline import Pipeline)
- Review of classification methods
- Performance evaluation (from sklearn.metrics import precision_recall_curve, average_precision_score)
- More performance evaluation (from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score)
- Adjusting your algorithm weights
- Performance evaluation (from sklearn.model_selection import GridSearchCV)
- Ensemble methods (from sklearn.ensemble import VotingClassifier)
- Normal versus abnormal behavior
- Clustering methods (from sklearn.preprocessing import MinMaxScaler; from sklearn.cluster import MiniBatchKMeans)
- Assigning fraud versus non-fraud
- Other clustering fraud detection methods (from sklearn.cluster import DBSCAN)
- Using text data (from nltk import word_tokenize; import string)
- Text mining to detect fraud (from nltk.corpus import stopwords; from nltk.stem.wordnet import WordNetLemmatizer)
- Topic modeling on fraud (from gensim import corpora)
- Flagged fraud based on topics (import pyLDAvis.gensim for use with Jupyter Notebooks only)
I will review more courses next time.
Categories: Python