Classification Algorithms
- Logistic Regression
- Decision Trees
- Random Forest
- Support Vector Machines (SVM)
- Naive Bayes
- K-Nearest Neighbors (KNN)
- Neural Networks (e.g., CNNs for image classification)
- Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost)
Classification Metrics
- Accuracy
- Precision, Recall, F1-Score
- Confusion Matrix
- ROC Curve / AUC (Area Under Curve)
- Log Loss (Cross-Entropy Loss)
- Specificity/Sensitivity
Regression Algorithms
- Linear Regression
- Decision Trees
- Random Forest
- Support Vector Regression (SVR)
- Neural Networks
- Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost)
- Polynomial Regression
- Ridge/Lasso Regression
Regression Metrics
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R² Score (Coefficient of Determination)
- Mean Absolute Percentage Error (MAPE)
Classification Metrics
Accuracy: Percentage of correct predictions.
Precision: How many predicted positives were true positives.
Recall (Sensitivity): How many actual positives were identified.
F1-Score: Harmonic mean of Precision and Recall.
ROC-AUC: Measures the trade-off between true positive rate and false positive rate.
Confusion Matrix: Displays counts of TP, FP, FN, TN to evaluate model performance.
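A minimal sketch of how these metrics can be computed with scikit-learn, assuming a binary classifier has already produced hard labels and probabilities (the arrays below are made-up examples):

```python
# Minimal sketch: computing the classification metrics above with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix, log_loss)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # actual labels (made up)
y_pred  = [1, 0, 0, 1, 0, 1, 1, 0]                   # hard predictions
y_proba = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3]   # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_proba))  # uses probabilities, not labels
print("Log loss :", log_loss(y_true, y_proba))
print("Confusion matrix (rows=actual, cols=predicted):")
print(confusion_matrix(y_true, y_pred))
```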
Regression Metrics
MSE: Penalizes large errors by squaring them.
MAE: Average of absolute errors.
RMSE: Square root of MSE for interpretable units.
R² Score: Proportion of variance explained by the model.
MAPE: Measures prediction accuracy as a percentage error.
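A minimal sketch of the regression metrics above, assuming scikit-learn and NumPy; the target and prediction values are made up:

```python
# Minimal sketch: computing the regression metrics above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (made up)
y_pred = np.array([2.8, 5.4, 2.0, 8.0])   # model predictions

mae  = mean_absolute_error(y_true, y_pred)
mse  = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                                        # RMSE = square root of MSE
r2   = r2_score(y_true, y_pred)
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # percentage error

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}  MAPE={mape:.1f}%")
```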
Business goal identification
ML problem framing
Data processing (data collection, data preprocessing, and feature engineering)
Model development (training, tuning, and evaluation)
Model deployment (inference and prediction)
Model monitoring
Model retraining
Defining a narrow use case for the application can help you to select the best model for the application. Governance policies and accountability for monitoring a model are important, but they don't fundamentally impact which model to select for the application.
A big part of preparing for the training process is to first split your data to ensure a proper division between your training and evaluation efforts.
A common strategy is to split all available labeled data into training, validation, and testing subsets, usually with a ratio of 80%, 10%, and 10%. (Another common ratio is 70%, 15%, and 15%.)
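A minimal sketch of an 80/10/10 split using scikit-learn; the placeholder X and y arrays stand in for your own features and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data; replace with your own features and labels.
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# First hold out 20% of the data, then split that 20% in half (10% validation + 10% test).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 800, 100, 100
```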
Feature engineering transforms data into features or inputs that will be valuable for the model.
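A small, hypothetical illustration of feature engineering with pandas (the column names and values below are made up for this sketch):

```python
import pandas as pd

# Hypothetical raw data; column names are made up for illustration.
df = pd.DataFrame({
    "sqft": [1500, 2200, 900],
    "bedrooms": [3, 4, 2],
    "city": ["Austin", "Denver", "Austin"],
})

# Derive a new feature and encode a categorical one.
df["sqft_per_bedroom"] = df["sqft"] / df["bedrooms"]
df = pd.get_dummies(df, columns=["city"])   # one-hot encode the categorical column
print(df.head())
```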
Inclusive and diverse data collection
Inclusiveness and diversity in data collection help ensure that data collection processes are fair and unbiased. Data collection should accurately reflect the diverse perspectives and experiences required for the use case of the AI system, including a diverse range of sources, viewpoints, and demographics. By doing this, the AI system is better able to produce unbiased decisions.
Data augmentation
Use data augmentation techniques to generate new instances of underrepresented groups. This can help to balance the dataset and prevent biases towards more represented groups.
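One simple way to rebalance tabular data is random oversampling of the minority class with scikit-learn's resample utility; this is a hedged sketch on made-up data (real augmentation techniques, such as image transforms or SMOTE, go further):

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical imbalanced dataset: far fewer rows with label == 1.
df = pd.DataFrame({"feature": range(10), "label": [0]*8 + [1]*2})

minority = df[df["label"] == 1]
majority = df[df["label"] == 0]

# Randomly resample the minority class (with replacement) up to the majority size.
minority_upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_upsampled])
print(balanced["label"].value_counts())   # classes are now equally represented
```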
Curating datasets is the process of labeling, organizing, and preprocessing the data so that the model can perform accurately on it.
The curation can help to ensure that the data is representative of the problem at hand and free of biases or other issues that can impact the accuracy of the AI model. Curation helps to ensure that AI models are trained and evaluated on high-quality, reliable data that is relevant to the task they are intended to perform.
Regular auditing
Regularly audit the dataset to ensure it remains balanced and fair. Check for biases and take corrective actions if necessary.
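A minimal auditing sketch with pandas, assuming a hypothetical dataset with a sensitive group column; the point is simply to check representation and label rates per group:

```python
import pandas as pd

# Hypothetical dataset with a sensitive attribute; names and values are made up.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B", "C"],
    "label": [1, 0, 1, 1, 0, 0],
})

# Simple audit: how represented is each group, and how do label rates compare?
print(df["group"].value_counts(normalize=True))   # share of records per group
print(df.groupby("group")["label"].mean())        # positive-label rate per group
```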
Model fit is important for understanding the root cause of poor model accuracy. This understanding will guide you to take corrective steps. You can determine whether a predictive model is underfitting or overfitting the training data by looking at the prediction error on the training data and the evaluation data.
Overfitting is when the model performs well on the training data but does not perform well on the evaluation data. This is because the model memorized the data it has seen and is unable to generalize to unseen examples.
Regularization helps prevent overfitting. The effect of overfitting is that models make good predictions on the data they were trained on but perform poorly on new data.
Underfitting is when the model performs poorly on the training data. This is because the model is unable to capture the relationship between the input examples (often called X) and the target values (often called Y).
If your model is underfitting and performing poorly on the training data, it could be that the model is too simple (the input features are not expressive enough) to describe the target well.
A model that underfits already performs poorly on the training data; this would be detected during model evaluation.
Think about bias as the gap between your predicted value and the actual value, whereas variance describes how dispersed your predicted values are.
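A small sketch that illustrates both ideas: comparing training and validation error to detect overfitting, and applying Ridge regularization to reduce it. The toy data and the degree-15 polynomial are chosen purely to force the overfitting effect:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Toy data: a noisy curve, so a very high-degree polynomial will overfit.
rng = np.random.RandomState(0)
X = np.sort(rng.rand(60, 1), axis=0)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [
    ("degree 15, no regularization", make_pipeline(PolynomialFeatures(15), LinearRegression())),
    ("degree 15, Ridge regularized", make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0))),
]:
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err   = mean_squared_error(y_val, model.predict(X_val))
    # A large gap between training and validation error signals overfitting.
    print(f"{name}: train MSE={train_err:.3f}, validation MSE={val_err:.3f}")
```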
When you use the PutRecord API to add data to a Feature Group in Amazon SageMaker Feature Store, the actual feature values are stored in two locations:
Online Store:
This is a low-latency, high-availability cache for real-time feature retrieval.
It's typically used for serving features to online applications or for real-time inference.
The data is stored in a NoSQL database managed by SageMaker.
Offline Store:
This is designed for batch processing and training datasets.
The data is stored in your specified Amazon S3 bucket within your AWS account.
It uses the Parquet file format for efficient storage and querying.
When you call PutRecord:
The data is immediately written to the Online Store (if enabled).
For the Offline Store, the data is first buffered, then batched, and finally written to S3 within about 15 minutes.
It's important to note:
The Feature Group itself doesn't store the actual feature values, but rather the metadata about the features (like names, types, and definitions).
The Online and Offline stores can be configured independently, allowing you to use one or both depending on your use case.
The Offline Store in S3 includes additional metadata fields like api_invocation_time, write_time, and is_deleted.
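A minimal boto3 sketch of PutRecord and GetRecord, assuming a hypothetical feature group named customer-features that already exists with customer_id as its record identifier and event_time as its event time feature:

```python
import boto3

featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

# Write one record; feature group and feature names below are made up.
featurestore_runtime.put_record(
    FeatureGroupName="customer-features",
    Record=[
        {"FeatureName": "customer_id", "ValueAsString": "12345"},
        {"FeatureName": "event_time", "ValueAsString": "2024-01-01T00:00:00Z"},
        {"FeatureName": "total_purchases", "ValueAsString": "42"},
    ],
)

# The record is immediately readable from the online store (if enabled);
# the offline copy lands in S3 (Parquet) within roughly 15 minutes.
response = featurestore_runtime.get_record(
    FeatureGroupName="customer-features",
    RecordIdentifierValueAsString="12345",
)
print(response["Record"])
```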
Classification is a supervised learning technique used to assign labels or categories to new, unseen data instances based on a trained model. The model is trained on a labeled dataset, where each instance is already assigned to a known class or category. The goal of classification is to learn patterns from the training data and use them to predict the class or category for new unlabeled data instances.
Classification Metrics
Accuracy
Precision
Recall
F1
AUC-ROC
Use cases include the following:
Fraud detection
Image classification
Customer retention
Diagnostics
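A minimal sketch of the classification workflow described above, using scikit-learn's built-in breast cancer dataset and logistic regression as an arbitrary model choice:

```python
# Train a classifier on labeled data, then predict classes for unseen instances.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=5000)   # higher max_iter so the solver converges
clf.fit(X_train, y_train)                 # learn patterns from the labeled data

print(classification_report(y_test, clf.predict(X_test)))  # precision, recall, F1 per class
```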
Regression is a supervised learning technique used for predicting continuous or numerical values based on one or more input variables. It is used to model the relationship between a dependent variable (the value to be predicted) and one or more independent variables (the features or inputs used for prediction).
Regression Metrics
Mean squared error
R squared: a commonly used metric for linear regression problems.
Use cases include the following:
Advertising popularity prediction
Weather forecasting
Market forecasting
Estimating life expectancy
Population growth prediction
Example: predict the sale price of houses based on features such as square footage, number of bedrooms, location, and age of the property (a minimal sketch follows below).
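A minimal regression sketch for the house price example; the features and prices below are made up purely for illustration:

```python
# Predict a continuous value (house price) from numeric features.
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [square footage, bedrooms, age of property]; prices are made up.
X = np.array([[1400, 3, 20], [2000, 4, 5], [900, 2, 40], [1700, 3, 12]])
y = np.array([240_000, 400_000, 150_000, 310_000])

model = LinearRegression().fit(X, y)
print(model.predict([[1600, 3, 10]]))   # predicted price for a new house
```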
Two main subcategories of unsupervised learning are clustering and dimensionality reduction.
Clustering
A common subcategory of unsupervised learning is clustering. This kind of algorithm groups data into different clusters based on similar features or distances between the data points, to better understand the attributes of a specific cluster (see the sketch after the use cases below).
For example, by analyzing customer purchasing habits, an unsupervised algorithm can identify a company as being large or small.
Use cases include the following:
Customer segmentation
Targeted marketing
Recommender systems
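A minimal clustering sketch using k-means on made-up customer data; the two columns (annual spend and number of orders) are hypothetical features:

```python
# Group unlabeled "customer" data points by similarity.
import numpy as np
from sklearn.cluster import KMeans

# Columns: [annual spend, number of orders] for made-up customers.
X = np.array([[200, 5], [220, 6], [5000, 60], [4800, 55], [210, 4], [5100, 58]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each customer
print(kmeans.cluster_centers_)  # the "profile" of each discovered segment
```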
Dimensionality reduction
Dimensionality reduction is an unsupervised learning technique used to reduce the number of features or dimensions in a dataset while preserving the most important information or patterns.
Use cases include the following:
Big data visualization
Meaningful compression
Structure discovery
Feature elicitation
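A minimal dimensionality reduction sketch using PCA to compress the 30-feature breast cancer dataset down to 2 components while keeping most of the variance:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print(X_2d.shape)                              # (569, 2)
print(pca.explained_variance_ratio_)           # variance kept by each component
```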
Unlike the first two approaches (supervised and unsupervised learning), reinforcement learning continuously improves its model by mining feedback from previous iterations. An agent continuously learns through trial and error as it interacts with an environment. Reinforcement learning is broadly useful when the reward of a desired outcome is known, but the path to achieving it isn't, and that path requires a lot of trial and error to discover.
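A tiny trial-and-error illustration using an epsilon-greedy multi-armed bandit (a very simplified form of reinforcement learning); the reward probabilities below are made up:

```python
# An epsilon-greedy agent learns which of three "arms" gives the best average reward.
import random

true_reward_prob = [0.2, 0.5, 0.8]       # unknown to the agent
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
epsilon = 0.1                            # fraction of the time we explore

for step in range(5000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore: try a random arm
    else:
        action = estimates.index(max(estimates))  # exploit: use the best arm so far
    reward = 1 if random.random() < true_reward_prob[action] else 0
    counts[action] += 1
    # Incremental average: move the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)   # should roughly recover [0.2, 0.5, 0.8]
```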
Real-time inference is ideal for inference workloads where you have real-time, interactive, and low latency requirements.
Use batch transform when you need to get inferences from large datasets and don't need a persistent endpoint. You can also use it when you need to preprocess datasets to remove noise or bias that interferes with training or inference from your dataset.
SageMaker asynchronous inference is a capability in SageMaker that queues incoming requests and processes them asynchronously. This option is ideal for requests with large payload sizes (up to 1 GB), long processing times (up to one hour), and near real-time latency requirements.
On-demand serverless inference is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts. It is a purpose-built inference option that you can use to deploy and scale ML models without configuring or managing any of the underlying infrastructure.
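A hedged sketch of how these four options map to the SageMaker Python SDK. The image URI, model artifact, role, instance types, and S3 paths are placeholders you would replace, and in practice you would pick just one deployment style:

```python
import sagemaker
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig
from sagemaker.async_inference import AsyncInferenceConfig

model = Model(
    image_uri="<container-image-uri>",          # placeholder
    model_data="s3://my-bucket/model.tar.gz",   # placeholder
    role="<execution-role-arn>",                # placeholder
    sagemaker_session=sagemaker.Session(),
)

# 1) Real-time: persistent endpoint for low-latency, interactive traffic.
model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# 2) Serverless: no instances to manage; tolerates cold starts between bursts.
model.deploy(serverless_inference_config=ServerlessInferenceConfig(
    memory_size_in_mb=2048, max_concurrency=5))

# 3) Asynchronous: queued requests, large payloads, long processing times.
model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge",
             async_inference_config=AsyncInferenceConfig(
                 output_path="s3://my-bucket/async-output/"))

# 4) Batch transform: one-off inference over a large dataset, no persistent endpoint.
transformer = model.transformer(instance_count=1, instance_type="ml.m5.xlarge")
transformer.transform(data="s3://my-bucket/input-data/", content_type="text/csv")
```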
Transfer learning involves adapting an existing model for a specific application, such as fine-tuning a model to understand a new language.
Evaluation
ROUGE-L uses the longest common subsequence to evaluate the coherence and order of the narrative in the generated text.
BLEU is used more for machine translation. The idea is to compare the machine translation to a human translation that is considered the ground truth. It uses n-grams to do this evaluation, and BLEU is more about precision as opposed to recall.
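A minimal scoring sketch, assuming the nltk and rouge-score packages are installed; the reference and candidate sentences are made-up examples:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "the cat sat on the mat"   # human "ground truth"
candidate = "the cat is on the mat"    # generated text

# BLEU: n-gram precision of the candidate against the human reference.
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-L: based on the longest common subsequence between the two texts.
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True).score(reference, candidate)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```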
ROC curve: ROC stands for Receiver Operating Characteristic.
Precision, Recall, and F1 are classification metrics.
Precision: Measures the proportion of true positives out of all predicted positives (i.e., how many of the predicted churned customers actually churned). Precision is useful when the cost of a false positive is high.
Recall: Measures the proportion of true positives out of all actual positives (i.e., how many of the actual churned customers were correctly identified). Recall is the right metric when the goal is to reduce false negatives, ensuring that as many actual churners as possible are correctly identified.
F1 Score: This is the harmonic mean of precision and recall, balancing both metrics. While it’s useful for imbalanced classes, it doesn’t specifically prioritize false negatives. If the primary focus is reducing false negatives, recall alone is a better choice.
Accuracy: Measures the overall correctness of the model; it doesn't focus on false negatives or any specific class and can be misleading, especially in cases of class imbalance.
For regression, where you measure the difference between the predicted numerical values and the actual values, you use metrics such as R squared, RMSE, or MAE.
RLHF aims to align the model's performance with human goals and improve user satisfaction by incorporating human feedback into the training process.
A data scientist is working on a binary classification problem to predict whether a customer will churn. They want to select the best metric to evaluate their model’s performance, given that the cost of a false negative (predicting a customer will not churn when they will) is much higher than a false positive.
Which evaluation metric should the data scientist use to prioritize reducing false negatives?
Options: Precision, Recall, F1 score, Accuracy
Answer: Given that the cost of a false negative (predicting a customer will not churn when they actually will) is much higher than a false positive, the data scientist should focus on Recall.
The data scientist should use recall as the evaluation metric because it directly focuses on capturing as many actual churners as possible, minimizing false negatives.
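As a hypothetical worked example: suppose that out of 1,000 customers the model yields TP = 80, FN = 120, FP = 40, and TN = 760. Accuracy is (80 + 760) / 1,000 = 0.84, which looks strong, but recall is TP / (TP + FN) = 80 / 200 = 0.40, meaning 60% of actual churners are missed. Optimizing for recall surfaces exactly this failure mode, which is why it is the right metric when false negatives are the most costly errors.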