Review of Python Courses (Part 22)
Posted by Mark on January 29, 2021 at 07:31 | Last modified: February 9, 2021 13:29

In Part 21, I summarized my Datacamp courses 62-64. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #65 was Reshaping Data with pandas. This course covers:
- Wide and long formats
- Reshaping using the pivot method (see the sketch after this list)
- Pivot tables
- Reshaping with melt
- Wide to long function
- Working with string columns
- Stacking dataframes
- Unstacking dataframes
- Working with multiple levels
- Handling missing data
- Reshaping and combining data
- Transforming a list-like column
- Reading nested data into a dataframe (from pandas import json_normalize)
- Dealing with nested data columns
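
Below is a minimal sketch of a few of these reshaping tools on a made-up DataFrame; the column names (patient, weight_2019, weight_2020) and the nested records are my own illustrations, not the course's actual data.

```python
import pandas as pd

# Wide format: one row per patient, one weight column per year (toy data).
wide = pd.DataFrame({
    "patient": ["A", "B"],
    "weight_2019": [70, 82],
    "weight_2020": [68, 85],
})

# melt: gather the year columns into rows (wide -> long).
long = wide.melt(id_vars="patient", var_name="measure", value_name="weight")

# wide_to_long: same idea, but parses the year suffix out of the column names.
tidy = pd.wide_to_long(wide, stubnames="weight", i="patient", j="year", sep="_")

# pivot: spread the long table back out to wide format.
back_to_wide = long.pivot(index="patient", columns="measure", values="weight")

# json_normalize: flatten nested records into ordinary columns.
nested = [{"id": 1, "stats": {"min": 1, "max": 9}}]
flat = pd.json_normalize(nested)  # columns: id, stats.min, stats.max

print(long, tidy, back_to_wide, flat, sep="\n\n")
```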
My course #66 was Building Data Engineering Pipelines in Python. For some reason, these data engineering courses did not sit well with me, and much of this sailed over my head. This course covers:
- Components of a data platform
- Introduction to data ingestion with Singer
- Running an ingestion pipeline with Singer
- Basic introduction to PySpark (from pyspark.sql import SparkSession; see the first sketch after this list)
- Cleaning data
- Transforming data with Spark
- Packaging your application
- On the importance of tests
- Writing unit tests for PySpark
- Continuous testing
- Modern day workflow management
- Building a data pipeline with Airflow (from airflow.operators.bash_operator import BashOperator; see the second sketch after this list)
- Deploying Airflow (from airflow.models import DagBag)
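
For the PySpark portion, here is a minimal cleaning-and-transforming sketch; the toy ratings data, column names, and aggregation are my own illustration, not the course's pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pipeline_sketch").getOrCreate()

# Toy ratings data standing in for the course's ingested files.
ratings = spark.createDataFrame(
    [("course_a", 4), ("course_b", None), ("course_a", 5)],
    ["course_id", "rating"],
)

# Clean: drop rows with missing ratings; transform: average rating per course.
summary = (
    ratings.dropna(subset=["rating"])
    .groupBy("course_id")
    .agg(F.avg("rating").alias("avg_rating"))
)
summary.show()
```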
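And a tiny Airflow sketch using the same 1.x-style import the course mentions; the dag_id, schedule, and bash commands are placeholders of mine.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="example_pipeline",        # placeholder name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
)

# Two stand-in tasks chained so "clean" runs only after "ingest".
ingest = BashOperator(task_id="ingest", bash_command="echo ingest", dag=dag)
clean = BashOperator(task_id="clean", bash_command="echo clean", dag=dag)
ingest >> clean
```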
My course #67 was Importing and Managing Financial Data in Python. This course covers:
- Reading, inspecting, and cleaning data from CSV (parse_dates explained; see the sketch after this list)
- Read data from Excel worksheets
- Combine data from multiple worksheets (importing market data from multiple Excel files)
- The DataReader: access financial data online (from pandas_datareader.data import DataReader)
- Economic data from the Federal Reserve
- Select stocks and get data from Google Finance
- Get several stocks and manage a MultiIndex
- Summarize your data with descriptive stats
- Describe the distribution of your data with quantiles (feeding np.arange() into .describe() for constant-step percentiles)
- Visualize the distribution of your data [ax = sns.distplot(df)]
- Summarize categorical variables
- Aggregate your data by category
- Summary statistics by category with seaborn [sns.countplot()]
- Distributions by category with seaborn [sns.boxplot(), sns.swarmplot()]
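
A short sketch of a few of these techniques follows; the file name, column names, and toy prices are placeholders rather than the course's datasets, and the FRED call is left commented out because it needs a network connection.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# parse_dates converts the listed columns to datetime64 on import
# (file name is a placeholder):
# prices = pd.read_csv("stock_data.csv", parse_dates=["date"], index_col="date")

# DataReader pulls series from online sources such as FRED, e.g.:
# from pandas_datareader.data import DataReader
# gdp = DataReader("GDP", "fred", start="2010-01-01")

# Toy data so the rest of the sketch runs offline.
df = pd.DataFrame({
    "sector": ["Tech", "Tech", "Energy", "Energy", "Tech", "Energy"],
    "price": [120.5, 98.0, 45.2, 51.7, 134.9, 39.8],
})

# Constant-step percentiles: feed np.arange() deciles into .describe().
print(df["price"].describe(percentiles=np.arange(0.1, 1.0, 0.1)))

# Counts per category, then distributions by category.
sns.countplot(x="sector", data=df)
plt.show()
sns.boxplot(x="sector", y="price", data=df)
plt.show()
```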
I will review more courses next time.