Review of Python Courses (Part 19)
Posted by Mark on January 19, 2021 at 07:12 | Last modified: February 6, 2021 04:54
In Part 18, I summarized my DataCamp courses 53-55. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #56 was Writing Efficient Code with pandas. This course covers:
- The need for efficient coding (timing code with time.time(); list comprehensions are faster than for loops)
- Locate rows: .iloc[] (generally faster for rows) and .loc[] (generally faster for columns)
- Select random rows (the pandas .sample() method is faster than picking rows with NumPy's random integer generator)
- Replace scalar values using .replace() (much faster than using .loc[] to find values and reassigning them)
- Replace values using lists (.replace() faster than using .loc[])
- Replace values using dictionaries (faster than using lists)
- Looping with .iterrows() [for loop using range() is faster than the smarter/cleaner/optimized .iterrows()]
- Looping with .apply() (faster when iterating along rows, while the native pandas .sum() is faster along columns)
- Vectorization over pandas Series [vectorization method .apply() works faster than .iterrows()]
- Vectorization with NumPy arrays via the .values attribute (summing arrays is faster than summing Series)
- Data transformation using .groupby().transform (.transform() cleaner and much faster than native Python code)
- Missing value imputation using .transform() (.transform() much faster than native Python code)
- Data filtration using the .filter() function (.groupby().filter() faster than list comprehension + for loop)
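To make a few of these items concrete, here is a minimal sketch of my own (not code from the course). The toy DataFrame and its column names are made up for illustration. It shows .sample(), dictionary-based .replace(), group-wise imputation with .groupby().transform(), and the same row-wise sum computed with .iterrows(), .apply(), and NumPy arrays. Timings are left out because they depend heavily on data size and machine.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "label": ["lo", "hi", "lo", "hi"],
    "x": [1.0, np.nan, 3.0, 5.0],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# Select random rows with the pandas .sample() method.
sampled = df.sample(n=2, random_state=0)

# Replace several values at once by passing a dictionary to .replace().
df["label"] = df["label"].replace({"lo": "low", "hi": "high"})

# Impute missing values with each group's mean via .groupby().transform().
df["x"] = df["x"].fillna(df.groupby("group")["x"].transform("mean"))

# The same row-wise sum computed three ways; all three should agree.
loop_sum = [row["x"] + row["y"] for _, row in df.iterrows()]          # explicit loop
apply_sum = df[["x", "y"]].apply(lambda row: row["x"] + row["y"], axis=1)  # .apply()
array_sum = df["x"].values + df["y"].values                           # NumPy arrays

print(df)
print(array_sum)
```

On a frame this small the three approaches are indistinguishable in speed; the differences the course describes only show up on larger data.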
My course #57 was Credit Risk Modeling in Python. This course covers:
- Understanding credit risk
- Outliers in credit data
- Risk with missing data in loan data (finding, counting, and replacing missing data)
- Logistic regression for probability of default
- Predicting the probability of default
- Credit model performance
- Model discrimination and impact
- Gradient boosted trees with XGBoost
- Column selection for credit risk
- Cross validation for credit models
- Class imbalance in loan data
- Model evaluation and implementation (from sklearn.calibration import calibration_curve)
- Credit acceptance rates
- Credit strategy and maximum expected loss
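As a rough illustration of how several of these topics fit together, here is a minimal sketch of my own, not the course's code. It uses synthetic loan data with made-up feature names, and the loss given default (LGD) of 0.2 and the 85% acceptance rate are assumptions chosen only for the example. It fits a logistic regression for probability of default, applies an acceptance-rate threshold, computes a calibration curve, and estimates expected loss as probability of default × LGD × loan amount.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical loan data (not the course data set).
rng = np.random.default_rng(0)
n = 2000
income = rng.normal(50, 15, n)          # annual income, in thousands
int_rate = rng.normal(12, 4, n)         # interest rate, in percent
loan_amnt = rng.uniform(1000, 20000, n)
logit = -2.0 - 0.03 * (income - 50) + 0.25 * (int_rate - 12)
default = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([income, int_rate])
X_train, X_test, y_train, y_test, amt_train, amt_test = train_test_split(
    X, default, loan_amnt, test_size=0.4, random_state=0, stratify=default
)

# Logistic regression for probability of default (PD).
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
prob_default = clf.predict_proba(X_test)[:, 1]

# Acceptance rate: accept the 85% of loans with the lowest predicted PD.
acceptance_rate = 0.85                  # assumed value for illustration
threshold = np.quantile(prob_default, acceptance_rate)
accepted = prob_default <= threshold
bad_rate = y_test[accepted].mean()      # share of accepted loans that defaulted

# Calibration: observed default fraction vs. mean predicted PD per bin.
frac_defaults, mean_predicted = calibration_curve(y_test, prob_default, n_bins=5)

# Expected loss on accepted loans: PD * LGD * exposure.
lgd = 0.2                               # assumed loss given default
expected_loss = (prob_default[accepted] * lgd * amt_test[accepted]).sum()

print(f"threshold={threshold:.3f}, bad rate={bad_rate:.3f}, "
      f"expected loss={expected_loss:,.0f}")
```

Raising the acceptance rate admits riskier loans, so the bad rate and expected loss would generally be expected to rise with it.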
My course #58 was Analyzing IoT Data in Python. This course covers:
- Introduction to IoT data
- Understand the data
- Introduction to data streams (import paho.mqtt.subscribe as subscribe)
- Perform EDA
- Clean data
- Gather minimalistic incremental data
- Prepare and visualize incremental data
- Combining data sources for further analysis
- Correlation
- Outliers (from statsmodels.graphics import tsaplots)
- Seasonality and trends
- Prepare data for machine learning
- Scaling data for machine learning
- Develop machine learning pipeline (from sklearn.pipeline import Pipeline)
- Apply a machine learning model
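The later steps in this list can be sketched roughly as follows. This is my own minimal example, not course code: the sensor readings are simulated instead of being received over MQTT, and the column names and the "fan_on" label are hypothetical. It resamples the readings, checks the correlation between two sensors, and then scales the data and fits a model inside a scikit-learn Pipeline, using a chronological train/test split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simulated sensor readings, one per minute (stand-in for a real data stream).
rng = np.random.default_rng(42)
idx = pd.date_range("2021-01-01", periods=500, freq="min")
temperature = 20 + 5 * np.sin(np.linspace(0, 8 * np.pi, 500)) + rng.normal(0, 0.5, 500)
humidity = 60 - 2 * (temperature - 20) + rng.normal(0, 2, 500)
data = pd.DataFrame({"temperature": temperature, "humidity": humidity}, index=idx)

# Hypothetical target: whether a fan should be running.
data["fan_on"] = (data["temperature"] > 22).astype(int)

# Downsample to 15-minute averages for a smoother view of the series.
resampled = data[["temperature", "humidity"]].resample("15min").mean()

# Correlation between the two sensors.
corr = data["temperature"].corr(data["humidity"])

# Chronological split: train on the first 80%, test on the final 20%.
split = int(len(data) * 0.8)
features = ["temperature", "humidity"]
train, test = data.iloc[:split], data.iloc[split:]

# Scaling and the model are chained so the scaler is fit on training data only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("logreg", LogisticRegression()),
])
pipe.fit(train[features], train["fan_on"])
accuracy = pipe.score(test[features], test["fan_on"])

print(f"correlation={corr:.2f}, test accuracy={accuracy:.2f}")
```

Splitting by time instead of at random keeps later readings out of the training set, which matters for time-ordered sensor data.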
I will review more classes next time.
Categories: Python