Review of Python Courses (Part 13)
Posted by Mark on December 29, 2020 at 07:48 | Last modified: February 2, 2021 11:44
In Part 12, I summarized my Datacamp courses 35-37. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #38 was Web Scraping in Python. This gets complicated with some object-oriented stuff that still throws me for a loop (no pun intended). I don’t think I will be using this anytime soon, so I skimmed it in this review:
- Web scraping with Python
- HyperText Markup Language (HTML)
- HTML tags and attributes
- Crash course X
- Off the beaten XPath
- Introduction to the scrapy Selector (from scrapy import Selector)
- “Inspecting the HTML”
- CSS locators
- Attribute and text selection
- Getting ready to crawl
- Scraping for reals
- A classy spider (from scrapy.crawler import CrawlerProcess)
- A request for service
- Move your bloomin’ parse
- Capstone
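To give a flavor of the Selector material, here is a minimal sketch; the HTML string and locators are made up for illustration:

    from scrapy import Selector

    html = '<html><body><p class="intro">Hello</p><p>World</p></body></html>'
    sel = Selector(text=html)

    # the same element reached by XPath and by CSS locator
    print(sel.xpath('//p[@class="intro"]/text()').extract_first())  # 'Hello'
    print(sel.css('p.intro::text').extract_first())                 # 'Hello'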
My course #39 was Working with the Class System in Python. Like #38, this gets thick. The course covers:
- Intro to Object Oriented Programming (OOP) in Python
- Introduction to NumPy internals
- Introduction to objects and classes
- Deep dive on classes
- __Init__ializing a class
- Methods in classes
- Working with a dataset to create dataframes
- Renaming columns and the five-figure summary
- OOP best practices
- Inheritance: is-a versus has-a
- Inheritance with DataShells
- Composition
- Wrapping up OOP
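Here is a bare-bones sketch of the class concepts; the DataShell name comes from the course, but the attributes and methods are my own simplification:

    import numpy as np

    class DataShell:
        def __init__(self, array):          # __init__ializing a class
            self.array = np.asarray(array)

        def five_figure_summary(self):      # a method defined in the class
            return np.percentile(self.array, [0, 25, 50, 75, 100])

    class DataShellChild(DataShell):        # inheritance: an is-a relationship
        pass

    shell = DataShellChild([1, 2, 3, 4, 5])
    print(shell.five_figure_summary())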
My course #40 was Sentiment Analysis in Python. This course covers:
- What is sentiment analysis?
- Sentiment analysis types and approaches (from textblob import TextBlob)
- Let’s build a word cloud (from wordcloud import WordCloud)!
- Bag-of-words (from sklearn.feature_extraction.text import CountVectorizer)
- Getting granular with n-grams
- Build new features from text (from nltk import word_tokenize)
- Can you guess the language (from langdetect import detect_langs)?
- Stop words (from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS)
- Capturing a token pattern [.isalpha(), .isdigit(), .isalnum()]
- Stemming and lemmatization (from nltk.stem import PorterStemmer, WordNetLemmatizer)
- TfIdf: more ways to transform text (from sklearn.feature_extraction.text import TfidfVectorizer)
- Let’s predict the sentiment (from sklearn.linear_model import LogisticRegression)!
- Did we really predict the sentiment well (from sklearn.metrics import accuracy_score, confusion_matrix)?
- Logistic regression: revisited
- Bringing it all together
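As a rough sketch of the bag-of-words-to-logistic-regression flow, with made-up reviews standing in for a real corpus:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    reviews = ["loved this movie", "terrible plot and acting",
               "great fun", "awful, boring film"]
    labels = [1, 0, 1, 0]                          # 1 = positive, 0 = negative

    X = CountVectorizer().fit_transform(reviews)   # bag-of-words features
    clf = LogisticRegression().fit(X, labels)
    print(accuracy_score(labels, clf.predict(X)))  # in-sample accuracy only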
I will review more classes next time.
Review of Python Courses (Part 12)
Posted by Mark on December 24, 2020 at 07:39 | Last modified: February 1, 2021 15:43
In Part 11, I summarized my Datacamp courses 31-34. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #35 was Intermediate Data Visualization with Seaborn. This course covers:
- Introduction to Seaborn [histogram vs. sns.distplot()]
- Using the distribution plot
- Regression plots in Seaborn [sns.regplot(), sns.lmplot()]
- Using Seaborn styles [sns.set_style(), sns.despine()]
- Colors in Seaborn
- Customizing with matplotlib (using Axes)
- Categorical plot types
- Regression plots [sns.regplot(), sns.residplot()]
- Matrix plots [sns.heatmap(pd.crosstab())]
- Using FacetGrid, factorplot, lmplot
- Using PairGrid and pairplot
- Using JointGrid and jointplot
- Selecting Seaborn plots
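A minimal sketch of the regression-plot and styling material; it uses seaborn's example tips dataset, which load_dataset() fetches:

    import matplotlib.pyplot as plt
    import seaborn as sns

    tips = sns.load_dataset("tips")
    sns.set_style("whitegrid")
    ax = sns.regplot(x="total_bill", y="tip", data=tips)
    sns.despine()
    plt.show()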
My course #36 was Introduction to Data Visualization with Seaborn (taking #35 before this was an oversight on my part, but everything ended up okay). This course covers:
- Introduction to Seaborn
- Using pandas with Seaborn
- Adding a third variable with hue
- Introduction to relational plots and subplots
- Customizing scatter plots
- Introduction to line plots
- Count plots and bar plots [sns.catplot()]
- Creating a box plot
- Point plots
- Changing plot style and color
- Adding titles and labels (FacetGrid vs. AxesSubplot)
My course #37 was Unsupervised Learning in Python. This course covers:
- Unsupervised learning (from sklearn.cluster import KMeans)
- Evaluating a clustering
- Transforming features for better clustering (from sklearn.preprocessing import StandardScaler)
- Visualizing hierarchies (from scipy.cluster.hierarchy import linkage, dendrogram)
- Cluster labels in hierarchical clustering
- t-SNE for 2-dimensional maps (from sklearn.manifold import TSNE)
- Visualizing the PCA transformation (from sklearn.decomposition import PCA)
- Intrinsic dimension
- Dimension reduction with PCA (from sklearn.decomposition import TruncatedSVD)
- Non-negative matrix factorization (NMF) (from sklearn.decomposition import NMF)
- NMF learns interpretable parts
- Building recommender systems using NMF (from sklearn.preprocessing import normalize)
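A small sketch of scaling features before k-means, with invented sample points:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    points = np.array([[1.0, 200.0], [1.2, 210.0], [8.0, 900.0], [8.3, 880.0]])
    pipeline = make_pipeline(StandardScaler(), KMeans(n_clusters=2, n_init=10))
    labels = pipeline.fit_predict(points)
    print(labels)        # e.g. [0 0 1 1]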
I will review more classes next time.
Review of Python Courses (Part 11)
Posted by Mark on December 21, 2020 at 07:41 | Last modified: February 1, 2021 11:34
In Part 10, I summarized my Datacamp courses 28-30. Today I will continue with the next four.
As a reminder, I introduced you to my recent work learning Python here.
My course #31 was Customer Analytics and A/B Testing in Python. This course covers:
- What is A/B testing?
- Identifying and understanding KPIs
- Exploratory analysis of KPIs
- Calculating KPIs—a practical example
- Working with time series data in pandas
- Creating time series graphs with matplotlib
- Understanding and visualizing trends in customer data
- Events and releases
- Introduction to A/B testing
- Initial A/B test design
- Preparing to run an A/B test
- Calculating sample size
- Analyzing the A/B test results
- Understanding statistical significance (get_pvalue, get_ci)
- Interpreting your test results
My course #32 was Machine Learning with Tree-Based Models in Python. This course covers:
- Decision-tree for classification (from sklearn.tree import DecisionTreeClassifier)
- Classification-tree learning
- Decision-tree for regression
- Generalization error (bias-variance tradeoff)
- Diagnosing bias and variance problems
- Ensemble learning
- Bagging (from sklearn.ensemble import BaggingClassifier)
- Out of bag evaluation
- Random forests
- AdaBoost (from sklearn.ensemble import AdaBoostClassifier)
- Gradient boosting (from sklearn.ensemble import GradientBoostingRegressor)
- Stochastic gradient boosting
- Tuning a CART’s hyperparameters
- Tuning an RF’s hyperparameters
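A minimal classification-tree sketch using scikit-learn's built-in iris data in place of the course's datasets:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
    dt = DecisionTreeClassifier(max_depth=3, random_state=1)
    dt.fit(X_train, y_train)
    print(dt.score(X_test, y_test))   # test-set accuracy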
My course #33 was Introduction to PySpark. This is a data engineering course, a field I found I am not very enthusiastic about. This course covers:
- What is Spark, anyway?
- Using Spark in Python
- Using dataframes
- Joining
- Machine learning pipelines
- Data types
- Strings and factors
My course #34 was Cleaning Data with PySpark. This course covers:
- Intro to data cleaning with Apache Spark
- Immutability and lazy processing
- Understanding Parquet
- Dataframe column operations
- Conditional dataframe column operations
- User defined functions
- Partitioning and lazy processing
- Caching
- Improve import performance
- Cluster sizing tips
- Performance improvements
- Introduction to data pipelines
- Data handling techniques
- Data validation
- Final analysis and delivery
I will review more classes next time.
Review of Python Courses (Part 10)
Posted by Mark on December 18, 2020 at 07:25 | Last modified: January 29, 2021 14:36
In Part 9, I summarized my Datacamp courses 25-27. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #28 was Supervised Learning with scikit-learn. This course covers:
- Supervised learning
- Exploratory data analysis [pd.plotting.scatter_matrix()]
- The classification challenge (creating arrays, from sklearn.neighbors import KNeighborsClassifier)
- Measuring model performance (from sklearn.model_selection import train_test_split, datasets)
- Introduction to regression (from sklearn.linear_model import LinearRegression)
- The basics of linear regression
- Cross-validation (from sklearn.model_selection import cross_val_score)
- Correlation
- Simple regression (from scipy.stats import linregress) and its limits
- Regularized regression (from sklearn.linear_model import Ridge, Lasso)
- How good is your model (from sklearn.metrics import classification_report, confusion_matrix)?
- Logistic regression and the ROC curve (from sklearn.metrics import roc_curve)
- Area under the ROC curve
- Hyperparameter tuning (from sklearn.model_selection import GridSearchCV)
- Hold-out set for final evaluation
- Preprocessing data [pd.get_dummies(df)]
- Handling missing data (from sklearn.preprocessing import Imputer, from sklearn.pipeline import Pipeline)
- Centering and scaling (from sklearn.preprocessing import scale, StandardScaler)
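A short sketch of the classification workflow; scikit-learn's built-in digits data stands in for the course's datasets:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    print(cross_val_score(pipe, X, y, cv=5).mean())   # cross-validated accuracy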
My course #29 was Introduction to Natural Language Processing in Python. This course covers:
- Introduction to regular expressions
- Introduction to tokenization (from nltk.tokenize import word_tokenize, sent_tokenize)
- Advanced tokenization with regex
- Charting word length with nltk
- Word counts with bag-of-words (from collections import Counter)
- Simple text preprocessing (from nltk.corpus import stopwords, from nltk.stem import WordNetLemmatizer)
- Introduction to gensim (from gensim.corpora.dictionary import Dictionary)
- Tf-idf with gensim (from gensim.models.tfidfmodel import TfidfModel)
- Named entity recognition
- Introduction to SpaCy
- Multilingual NER with polyglot (from polyglot.text import Text)
- Classifying fake news using supervised learning with NLP
- Building word count vectors (from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer)
- Training and testing a classification model with scikit-learn (from sklearn.naive_bayes import MultinomialNB)
- Simple NLP, complex problems
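A tiny bag-of-words sketch with nltk and Counter; the sentence is made up, and nltk's 'punkt' tokenizer models must be downloaded once:

    from collections import Counter
    from nltk.tokenize import word_tokenize
    # import nltk; nltk.download('punkt')   # one-time download, if needed

    text = "The cat sat on the mat. The cat slept."
    tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
    print(Counter(tokens).most_common(3))    # [('the', 3), ('cat', 2), ...]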
My course #30 was Building Chatbots in Python. This course covers:
- Introduction to conversational software (respond function, sleep function from time module)
- Creating a personality
- Text processing with regular expressions
- Understanding intents and entities (re.compile)
- Word vectors
- Intents and classification (from sklearn.svm import SVC)
- Entity extraction
- Robust NLU with Rasa (from rasa_nlu.converters import load_data)
- Virtual assistants and accessing data
- Exploring a DB with natural language
- Incremental slot filling and negation
- Stateful bots
- Asking questions and queuing answers
- Frontiers of dialog technology
I will review more classes next time.
Review of Python Courses (Part 9)
Posted by Mark on December 15, 2020 at 07:23 | Last modified: January 28, 2021 10:21
In Part 8, I summarized my Datacamp courses 22-24. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #25 was Exploratory Data Analysis in Python (Part 2). This course covers:
- Dataframes and series
- Clean and validate (inplace arg)
- Filter and visualize
- Probability mass functions
- Cumulative distribution functions (probability < x)
- Comparing and modeling distributions
- Exploring (scatter plot: transparency, marker size, jittering, zoom) and visualizing relationships (violin, box plot)
- Correlation
- Simple regression (from scipy.stats import linregress) and its limits
- Multiple regression
- Visualizing regression results
- Logistic regression
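A minimal simple-regression sketch with scipy; x and y are invented data:

    import numpy as np
    from scipy.stats import linregress

    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
    result = linregress(x, y)
    print(result.slope, result.intercept, result.rvalue ** 2)   # fit and R-squared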
My course #26 was Regular Expressions in Python. Once into regex, this material gets very complex yet very powerful:
- Introduction to string manipulation
- String operations (selecting portions of a particular word)
- Finding and replacing
- Positional formatting (method to format percentages)
- Formatted string literal (escape sequences)
- Template method (from string import Template)
- Introduction to regular expressions
- Repetitions
- Regex metacharacters
- Greedy vs. non-greedy matching
- Alternation and non-capturing groups
- Backreferences
- Lookaround
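Greedy vs. non-greedy matching and a backreference in a few lines, with invented sample strings:

    import re

    html = "<b>bold</b> and <i>italic</i>"
    print(re.findall(r"<.+>", html))     # greedy: one match spanning the string
    print(re.findall(r"<.+?>", html))    # non-greedy: each tag separately

    # backreference: find a repeated word
    print(re.search(r"(\w+) \1", "the the quick fox").group())   # 'the the'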
My course #27 was Introduction to Deep Learning in Python. This course covers:
- Introduction to deep learning
- Forward propagation
- Activation functions
- Deeper networks
- The need for optimization
- Gradient descent
- Backpropagation [in practice]
- Creating a Keras model
- Compiling and fitting a model
- Classification models
- Using models
- Understanding model optimization
- Model validation
- Thinking about model capacity
- Stepping up to images
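A skeleton of a Keras classification model roughly in the spirit of the course; the layer sizes and input shape are placeholders, and newer installs import Keras through TensorFlow:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential()
    model.add(Dense(32, activation='relu', input_shape=(10,)))
    model.add(Dense(2, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(X_train, y_train, validation_split=0.2, epochs=10)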
I will review more classes next time.
Review of Python Courses (Part 8)
Posted by Mark on December 10, 2020 at 07:34 | Last modified: January 26, 2021 11:22
In Part 7, I summarized my Datacamp courses 19-21. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #22 was Statistical Thinking in Python (Part 2). This course covers:
- Optimal parameters [statistical inference using scipy.stats, statsmodels, or hacker stats with numpy; plt.margins() ]
- Linear regression by least squares [slope, intercept = np.polyfit() ]
- The importance of exploratory data analysis: Anscombe’s quartet (generating and plotting line of best fit)
- Generating bootstrap replicates [ecdf() written in my course (prequel) #14]
- Bootstrap confidence intervals
- Pairs bootstrap
- Formulating and simulating a hypothesis (permutation sample)
- Test statistics and p-values (permutation replicate)
- Bootstrap hypothesis tests
- A/B testing
- Test of correlation
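One way to draw bootstrap replicates with numpy "hacker stats"; the data array is invented, and the course wrapped this logic in helper functions:

    import numpy as np

    rng = np.random.default_rng(42)
    data = np.array([2.3, 1.9, 3.1, 2.8, 2.2, 3.4])

    bs_replicates = np.array(
        [rng.choice(data, size=len(data)).mean() for _ in range(10000)])
    print(np.percentile(bs_replicates, [2.5, 97.5]))   # 95% confidence interval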
My course #23 was Introduction to Financial Concepts in Python. This course covers:
- Fundamental financial concepts (calculating return on investment and compound interest)
- Present and future value [np.pv(), np.fv() ]
- Net present value and cash flows [np.npv(rate= , values=np.array([]) ) ]
- Common profitability analysis methods [np.npv(), np.irr(np.array([]) ) ]
- Weighted average cost of capital
- Comparing two projects of different life spans (EAA)
- Mortgage basics [np.pmt(rate, nper, pv) ]
- Amortization, principal, and interest (simulating periodic mortgage payments)
- Home ownership, equity, and forecasting (cumulative operations in numpy)
- Budgeting project proposal [constant cumulative growth with np.repeat(), calculating monthly expenses]
- Net worth and valuation in your personal financial life
- The power of time and compound interest
My course #24 was Introduction to Portfolio Risk Management in Python. This course covers:
- Financial returns
- Mean, variance, and normal distributions (scaling volatility)
- Skewness and kurtosis (from scipy.stats import skew, kurtosis, Shapiro-Wilk test)
- Portfolio composition (calculating market-cap weights)
- Correlation and covariance (calculating portfolio volatility)
- Markowitz portfolios (MSR and GMV)
- The capital asset pricing model (calculating Beta)
- Alpha and multi-factor models (Fama-French 3-factor model)
- Expanding the 3-factor model (Fama-French 5-factor model)
- Estimating tail risk (historical drawdown, historical/conditional VaR)
- VaR extensions
- Random walks (Monte Carlo simulations)
>
Review of Python Courses (Part 7)
Posted by Mark on December 7, 2020 at 07:19 | Last modified: January 25, 2021 11:26
In Part 6, I summarized my Datacamp courses 16-18. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #19 was Manipulating DataFrames with pandas. This course covers:
- Indexing DataFrames (using square brackets, using .loc, using .iloc, selecting certain columns with [[ ]] )
- Slicing DataFrames (R boundary included with .loc but not .iloc, slicing with one/two brackets gets Series/df)
- Filtering DataFrames
- Transforming DataFrames [vectorized computations in numpy without loops, .map() for index, .apply() for Series]
- Indexed objects and labeled data (name attribute for index and columns attributes)
- Hierarchical indexing (sorting MultiIndex)
- Pivoting DataFrames [.pivot(index= , columns= , values= ) ]
- Stacking and unstacking DataFrames (pivoting doesn’t work well on MultiIndex so unstack to move index to column)
- Melting DataFrames [reverses .pivot() ]
- Pivot tables
- Categoricals and groupby
- Groupby and aggregation/transformation
- Iterating over and filtering groupby object
- Understanding the column labels
- .idxmax() and .idxmin() (row/column label where max/min value located)
- .T attribute (transposes numpy array)
- Reshaping DataFrames for visualization
- Making a histogram (bins, range, normalizing)
My twentieth course was Manipulating Time Series Data in Python. This course has lots of good information for backtesting:
- How to use dates and times with pandas ( [sequences of] timestamp and period objects)
- Indexing and resampling time series [selecting missing ‘price’ values, .asfreq() ]
- Lags, changes, and returns for stock price series [.shift(), n-period % chg, .diff(), .pct_change(), stock price chg in df]
- Compare time series growth rates ( .iloc as abs ref, normalizing series, concat prices and .dropna, perf vs. benchmark)
- Changing the time series frequency: resampling
- Upsampling and interpolation with .resample()
- Downsampling and aggregation (plotting resample data with ax)
- Rolling window functions with pandas (plotting price and moving average, plotting multiple rolling metrics)
- Expanding window functions with pandas (calculating running return, running rate of return)
- Relationships between time series: correlation
- Select index components and import data
- Build a market-cap weighted index
- Evaluate index performance
- Index correlation and exporting to Excel
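A sketch of lags, returns, and resampling on a toy price series; the dates and prices are invented:

    import pandas as pd

    idx = pd.date_range("2020-01-01", periods=6, freq="D")
    prices = pd.Series([100, 101, 103, 102, 105, 107.0], index=idx)

    print(prices.pct_change())             # daily returns
    print(prices.shift(1))                 # lagged prices
    print(prices.resample("2D").last())    # downsample to every other day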
My course #21 was Working with Dates and Times in Python. This course covers:
- Dates in Python
- Math with dates (time delta)
- Turning dates into strings
- Adding time to the mix
- Printing and parsing datetimes (no time printed from datetime object)
- Working with durations
- UTC offsets
- Time zone database (from dateutil import tz)
- Starting Daylight Saving Time
- Ending Daylight Saving Time [ .datetime_ambiguous() and .enfold() for ambiguous times]
- Reading date and time data in Pandas (loading datetimes with parse_dates [or manually with .to_datetime() ])
- Summarizing datetime data in Pandas (alternative to for loop)
- Additional datetime methods in Pandas
- Index correlation and exporting to Excel
Review of Python Courses (Part 6)
Posted by Mark on December 4, 2020 at 06:53 | Last modified: January 21, 2021 13:14
In Part 5, I summarized my Datacamp courses 13-15. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #16 was Introduction to Data Science in Python. This course covers:
- Creating variables
- What is a function?
- What is pandas?
- Selecting columns
- Select rows with logic
- Creating line plots
- Adding labels and legends
- Adding some style (line color, width, style, markers, template)
- Making a scatter plot (marker transparency)
- Making a bar chart (horizontal, error bars, stacked)
- Making a histogram (bins, range, normalizing)
My course #17 was Joining Data with Pandas. This course covers:
- Inner join (changing df values with .loc accessor)
- One to many relationships
- Merging multiple DataFrames
- Left join (count number of rows in a column with missing data)
- Right and outer joins
- Merging a table to itself (i.e. self join)
- Merging on indexes
- Filtering joins (semi-joins, anti-joins)
- Concatenate DataFrames together vertically [.append()]
- verify_integrity=True identifies accidental duplicates while validate arg helps to identify relationship type
- Using merge_ordered() (for ordered/time-series data and to fill in missing values)
- Using .merge_asof() (matches on nearest-value rather than equal-value columns)
- Selecting data with .query()
- Reshaping data with .melt()
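A small sketch of an ordered merge and an as-of merge; both tables are invented:

    import pandas as pd

    a = pd.DataFrame({"date": pd.to_datetime(["2020-01-01", "2020-01-03"]),
                      "gdp": [100, 102]})
    b = pd.DataFrame({"date": pd.to_datetime(["2020-01-02", "2020-01-04"]),
                      "rate": [1.5, 1.7]})

    print(pd.merge_ordered(a, b, on="date", fill_method="ffill"))
    print(pd.merge_asof(b, a, on="date"))   # matches nearest earlier date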
Introduction to Linear Modeling in Python was my eighteenth course. This covers:
- Introductory concepts about models (interpolation, extrapolation)
- Visualizing linear relationships [object-oriented (OOP) approach to matplotlib]
- Quantifying linear relationships (covariance, correlation, normalization)
- What makes a model linear (Taylor series, overfitting, defining function to plot graph)
- Interpreting slope and intercept
- Model optimization (RSS: sum of squared residuals)
- Least-squares optimization (by numpy, Scipy, Statsmodels)
- Modeling real data
- The limits of prediction
- Goodness of fit (deviations, residuals, and R-squared in code)
- Standard error (RMSE measures spread of residuals whereas SE measures uncertainty in model params)
- Inferential statistics concepts
- Model estimation and likelihood
- Model uncertainty and sample distributions (bootstrap in code)
- Model errors and randomness
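Least-squares with numpy and the residual sum of squares computed by hand; x and y are invented data:

    import numpy as np

    x = np.linspace(0, 10, 20)
    y = 3.0 * x + 2.0 + np.random.default_rng(0).normal(0, 1, size=x.size)

    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    print(slope, intercept, np.sum(residuals ** 2))   # RSS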
Review of Python Courses (Part 5)
Posted by Mark on November 20, 2020 at 07:23 | Last modified: January 20, 2021 06:41
In Part 4, I summarized my Datacamp courses 10-12. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My thirteenth course was Pandas Foundations. This course covers:
- Review of pandas DataFrames
- Building DataFrames from scratch
- Importing and exporting data [pd.read_csv() args: header, names, na_values, parse_dates]
- Plotting (arrays, series, DataFrames) with pandas
- Visual exploratory data analysis (line, scatter, box plots, histogram, and different plotting idioms)
- Statistical exploratory data analysis
- Separating populations
- Indexing time series (creating and using a Datetime index)
- Resampling time series data
- Manipulating time series data
- Time series visualization (pandas, not matplotlib)
- Reading and cleaning the data (cleaning and tidying datetime data)
Class #14 for me was Statistical Thinking in Python (Part 1). This course covers:
- Introduction to exploratory data analysis
- Plotting a histogram
- Plot all of your data: Bee swarm plots [sns.swarmplot()]
- Plot all of your data: Empirical Cumulative Distribution Functions (ECDF)
- Introduction to summary statistics: the sample mean and median
- Percentiles, outliers, and box plots
- Variance and standard deviation
- Covariance and the Pearson correlation coefficient
- Probabilistic logic and statistical inference
- Random number generators and hacker statistics
- Probability distributions and stories: the Binomial distribution (binomial PMF and CDF)
- Poisson processes and the Poisson distribution
- Probability density functions
- Introduction to the Normal distribution
- The Normal distribution: properties and warnings
- The Exponential distribution
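The ecdf() helper referred to elsewhere in this series looks roughly like this (my reconstruction, not the course's exact code):

    import numpy as np
    import matplotlib.pyplot as plt

    def ecdf(data):
        """Return x, y for an empirical cumulative distribution function."""
        x = np.sort(data)
        y = np.arange(1, len(data) + 1) / len(data)
        return x, y

    x, y = ecdf(np.random.default_rng(0).normal(size=100))
    plt.plot(x, y, marker='.', linestyle='none')
    plt.show()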
My fifteenth course was Introduction to Data Visualization with Matplotlib. This course covers:
- Adding data to axes
- Customizing your plots (adding markers, setting linestyle, color, axis labels)
- Small multiples with plt.subplots
- Plotting time-series data (using fig/ax, zooming in on datetime range)
- Plotting time-series with different variables (using twin axes, coloring vars and ticks, all-encompassing function)
- Annotating time-series data
- Quantitative comparisons: bar charts (stacking, adding legend, color)
- Quantitative comparisons: histograms
- Statistical plotting
- Quantitative comparisons: scatter plots (encoding time by color)
- Preparing your figures to share with others (choosing plot style)
- Sharing your visualizations with others [fig.savefig()]
- Automating figures from data
Review of Python Courses (Part 4)
Posted by Mark on November 12, 2020 at 07:08 | Last modified: January 19, 2021 10:18
In Part 3, I summarized the 6th through 9th Datacamp courses I took. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
Course #10 was Introduction to Databases in Python. This course covers:
- Databases consist of tables
- Connecting to your database
- Introduction to SQL queries
- Filtering and targeting data
- Ordering query results
- Counting, summing, and grouping data
- SQLAlchemy and pandas for visualization
- Calculating values in a query
- SQL relationships
- Working with hierarchical tables
- Handling large ResultSets
- Creating databases and tables
- Inserting and updating data into a table
- Deleting data from a database
- Census case study
- Populating and querying the database
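A minimal SQLAlchemy flow (connect, reflect a table, run a filtered query) written in the newer 1.4+ style; the SQLite file and column names are placeholders:

    from sqlalchemy import MetaData, Table, create_engine, select

    engine = create_engine("sqlite:///census.sqlite")
    census = Table("census", MetaData(), autoload_with=engine)

    stmt = select(census).where(census.c.state == "New York")
    with engine.connect() as conn:
        results = conn.execute(stmt).fetchall()
    print(len(results))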
Course #11 was Introduction to Statistics in Python. This course covers:
- What is statistics?
- Measures of center
- Measures of spread and outliers (from scipy.stats import iqr)
- What are the chances [probability and dataframe .sample() method from random module]?
- Discrete and continuous distributions
- Generating random numbers according to uniform distribution (from scipy.stats import uniform)
- Computing cumulative distribution functions
- Binomial distribution (from scipy.stats import binom)
- Normal distribution (from scipy.stats import norm)
- Central limit theorem
- Poisson distribution (from scipy.stats import poisson)
- Exponential distribution (from scipy.stats import expon)
- Student’s t-distribution
- Log-normal distribution
- Correlation (and caveats)
- Experimental design and confounders
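A few of the distribution calls covered here, with arbitrary parameters:

    from scipy.stats import binom, norm

    print(binom.pmf(k=3, n=10, p=0.5))        # P(exactly 3 successes in 10 trials)
    print(norm.cdf(x=1.96, loc=0, scale=1))   # ~0.975
    print(norm.rvs(loc=0, scale=1, size=3))   # random draws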
My twelfth course was Introduction to Data Visualization in Python. This course covers:
- Plotting multiple graphs
- Customizing axes
- Legends, annotations, and styles
- Working with 2-D arrays and meshgrid
- Visualizing bivariate functions (color bar, color map, axis tight, and contour plots)
- Visualizing bivariate distributions (rectangular and hexagonal binning)
- Working with images
- Visualizing regressions [sns.lmplot(), hue, col, sns.residplot()]
- Visualizing univariate distributions [sns.stripplot(), sns.swarmplot(), sns.violinplot()]
- Visualizing multivariate distributions [sns.jointplot(), kde, sns.pairplot(), hue, covariance sns.heatplot()]
- Visualizing time series (formatting datetime index)
- Time series with moving windows
- Histogram equalization in images