Classification Algorithms
- Logistic Regression
- Decision Trees
- Random Forest
- Support Vector Machines (SVM)
- Naive Bayes
- K-Nearest Neighbors (KNN)
- Neural Networks (e.g., CNNs for image classification)
- Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost)
Classification Metrics
- Accuracy
- Precision, Recall, F1-Score
- Confusion Matrix
- ROC Curve / AUC (Area Under Curve)
- Log Loss (Cross-Entropy Loss)
- Specificity/Sensitivity
Regression Algorithms
- Linear Regression
- Decision Trees
- Random Forest
- Support Vector Regression (SVR)
- Neural Networks
- Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost)
- Polynomial Regression
- Ridge/Lasso Regression
Regression Metrics
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R² Score (Coefficient of Determination)
- Mean Absolute Percentage Error (MAPE)
Classification Metrics
Accuracy: Percentage of correct predictions.
Precision: How many predicted positives were true positives.
Recall (Sensitivity): How many actual positives were identified.
F1-Score: Harmonic mean of Precision and Recall.
ROC-AUC: Measures the trade-off between true positive rate and false positive rate.
Confusion Matrix: Displays counts of TP, FP, FN, TN to evaluate model performance.
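A minimal sketch of how these metrics can be computed with scikit-learn, assuming a binary classifier has already produced hard labels and probabilities (the arrays below are made-up examples):

```python
# Minimal sketch: computing the classification metrics above with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix, log_loss)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # actual labels (made up)
y_pred  = [1, 0, 0, 1, 0, 1, 1, 0]                   # hard predictions
y_proba = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3]   # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_proba))  # uses probabilities, not labels
print("Log loss :", log_loss(y_true, y_proba))
print("Confusion matrix (rows=actual, cols=predicted):")
print(confusion_matrix(y_true, y_pred))
```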
Regression Metrics
MSE: Penalizes large errors by squaring them.
MAE: Average of absolute errors.
RMSE: Square root of MSE for interpretable units.
R² Score: Proportion of variance explained by the model.
MAPE: Measures prediction accuracy as a percentage error.
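A minimal sketch of the regression metrics above, assuming scikit-learn and NumPy; the target and prediction values are made up:

```python
# Minimal sketch: computing the regression metrics above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (made up)
y_pred = np.array([2.8, 5.4, 2.0, 8.0])   # model predictions

mae  = mean_absolute_error(y_true, y_pred)
mse  = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                                        # RMSE = square root of MSE
r2   = r2_score(y_true, y_pred)
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # percentage error

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}  MAPE={mape:.1f}%")
```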
Business goal identification
ML problem framing
Data processing (data collection, data preprocessing, and feature engineering)
Model development (training, tuning, and evaluation)
Model deployment (inference and prediction)
Model monitoring
Model retraining
Defining a narrow use case for the application can help you to select the best model for the application. Governance policies and accountability for monitoring a model are important, but they don't fundamentally impact which model to select for the application.
A big part of preparing for the training process is to first split your data to ensure a proper division between your training and evaluation efforts.
A common strategy is to split all available labeled data into training, validation, and testing subsets, usually with a ratio of 80%, 10%, and 10%. (Another common ratio is 70%, 15%, and 15%.)
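A minimal sketch of an 80/10/10 split using scikit-learn; the placeholder X and y arrays stand in for your own features and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data; replace with your own features and labels.
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# First hold out 20% of the data, then split that 20% in half (10% validation + 10% test).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 800, 100, 100
```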
Feature engineering transforms data into features or inputs that will be valuable for the model.
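A small, hypothetical illustration of feature engineering with pandas (the column names and values below are made up for this sketch):

```python
import pandas as pd

# Hypothetical raw data; column names are made up for illustration.
df = pd.DataFrame({
    "sqft": [1500, 2200, 900],
    "bedrooms": [3, 4, 2],
    "city": ["Austin", "Denver", "Austin"],
})

# Derive a new feature and encode a categorical one.
df["sqft_per_bedroom"] = df["sqft"] / df["bedrooms"]
df = pd.get_dummies(df, columns=["city"])   # one-hot encode the categorical column
print(df.head())
```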
Inclusive and diverse data collection
Inclusiveness and diversity in data collection help ensure that data collection processes are fair and unbiased. Data collection should accurately reflect the diverse perspectives and experiences required for the use case of the AI system, including a diverse range of sources, viewpoints, and demographics. By doing this, the AI system is better able to produce unbiased decisions.
Data augmentation
Use data augmentation techniques to generate new instances of underrepresented groups. This can help to balance the dataset and prevent biases towards more represented groups.
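One simple way to rebalance tabular data is random oversampling of the minority class with scikit-learn's resample utility; this is a hedged sketch on made-up data (real augmentation techniques, such as image transforms or SMOTE, go further):

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical imbalanced dataset: far fewer rows with label == 1.
df = pd.DataFrame({"feature": range(10), "label": [0]*8 + [1]*2})

minority = df[df["label"] == 1]
majority = df[df["label"] == 0]

# Randomly resample the minority class (with replacement) up to the majority size.
minority_upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_upsampled])
print(balanced["label"].value_counts())   # classes are now equally represented
```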
Curating datasets is the process of labeling, organizing, and preprocessing the data so that the model can perform accurately on it.
The curation can help to ensure that the data is representative of the problem at hand and free of biases or other issues that can impact the accuracy of the AI model. Curation helps to ensure that AI models are trained and evaluated on high-quality, reliable data that is relevant to the task they are intended to perform.
Regular auditing
Regularly audit the dataset to ensure it remains balanced and fair. Check for biases and take corrective actions if necessary.
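A minimal auditing sketch with pandas, assuming a hypothetical dataset with a sensitive group column; the point is simply to check representation and label rates per group:

```python
import pandas as pd

# Hypothetical dataset with a sensitive attribute; names and values are made up.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B", "C"],
    "label": [1, 0, 1, 1, 0, 0],
})

# Simple audit: how represented is each group, and how do label rates compare?
print(df["group"].value_counts(normalize=True))   # share of records per group
print(df.groupby("group")["label"].mean())        # positive-label rate per group
```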
Model fit is important for understanding the root cause of poor model accuracy. This understanding will guide you to take corrective steps. You can determine whether a predictive model is underfitting or overfitting the training data by looking at the prediction error on the training data and the evaluation data.
Overfitting is when the model performs well on the training data but does not perform well on the evaluation data. This is because the model memorized the data it has seen and is unable to generalize to unseen examples.
Regularization helps prevent overfitting. The effect of overfitting is that models make good predictions on the data they were trained on but perform poorly on new data.
Underfitting is when the model performs poorly on the training data. This is because the model is unable to capture the relationship between the input examples (often called X) and the target values (often called Y).
If your model is underfitting and performing poorly on the training data, it could be that the model is too simple (the input features are not expressive enough) to describe the target well.
A model that underfits already performs poorly on the training data; this would be detected during model evaluation.
Think about bias as the gap between your predicted value and the actual value, whereas variance describes how dispersed your predicted values are.
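A small sketch that illustrates both ideas: comparing training and validation error to detect overfitting, and applying Ridge regularization to reduce it. The toy data and the degree-15 polynomial are chosen purely to force the overfitting effect:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Toy data: a noisy curve, so a very high-degree polynomial will overfit.
rng = np.random.RandomState(0)
X = np.sort(rng.rand(60, 1), axis=0)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [
    ("degree 15, no regularization", make_pipeline(PolynomialFeatures(15), LinearRegression())),
    ("degree 15, Ridge regularized", make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0))),
]:
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err   = mean_squared_error(y_val, model.predict(X_val))
    # A large gap between training and validation error signals overfitting.
    print(f"{name}: train MSE={train_err:.3f}, validation MSE={val_err:.3f}")
```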
When you use the PutRecord API to add data to a Feature Group in Amazon SageMaker Feature Store, the actual feature values are stored in two locations:
Online Store:
This is a low-latency, high-availability cache for real-time feature retrieval.
It's typically used for serving features to online applications or for real-time inference.
The data is stored in a NoSQL database managed by SageMaker.
Offline Store:
This is designed for batch processing and training datasets.
The data is stored in your specified Amazon S3 bucket within your AWS account.
It uses the Parquet file format for efficient storage and querying.
When you call PutRecord:
The data is immediately written to the Online Store (if enabled).
For the Offline Store, the data is first buffered, then batched, and finally written to S3 within about 15 minutes.
It's important to note:
The Feature Group itself doesn't store the actual feature values, but rather the metadata about the features (like names, types, and definitions).
The Online and Offline stores can be configured independently, allowing you to use one or both depending on your use case.
The Offline Store in S3 includes additional metadata fields like api_invocation_time, write_time, and is_deleted.
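A minimal boto3 sketch of PutRecord and GetRecord, assuming a hypothetical feature group named customer-features that already exists with customer_id as its record identifier and event_time as its event time feature:

```python
import boto3

featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

# Write one record; feature group and feature names below are made up.
featurestore_runtime.put_record(
    FeatureGroupName="customer-features",
    Record=[
        {"FeatureName": "customer_id", "ValueAsString": "12345"},
        {"FeatureName": "event_time", "ValueAsString": "2024-01-01T00:00:00Z"},
        {"FeatureName": "total_purchases", "ValueAsString": "42"},
    ],
)

# The record is immediately readable from the online store (if enabled);
# the offline copy lands in S3 (Parquet) within roughly 15 minutes.
response = featurestore_runtime.get_record(
    FeatureGroupName="customer-features",
    RecordIdentifierValueAsString="12345",
)
print(response["Record"])
```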
Classification is a supervised learning technique used to assign labels or categories to new, unseen data instances based on a trained model. The model is trained on a labeled dataset, where each instance is already assigned to a known class or category. The goal of classification is to learn patterns from the training data and use them to predict the class or category for new unlabeled data instances.
Classification Metrics
Accuracy
Precision
Recall
F1
AUC-ROC
Use cases include the following:
Fraud detection
Image classification
Customer retention
Diagnostics
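A minimal sketch of the classification workflow described above, using scikit-learn's built-in breast cancer dataset and logistic regression as an arbitrary model choice:

```python
# Train a classifier on labeled data, then predict classes for unseen instances.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=5000)   # higher max_iter so the solver converges
clf.fit(X_train, y_train)                 # learn patterns from the labeled data

print(classification_report(y_test, clf.predict(X_test)))  # precision, recall, F1 per class
```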
Regression is a supervised learning technique used for predicting continuous or numerical values based on one or more input variables. It is used to model the relationship between a dependent variable (the value to be predicted) and one or more independent variables (the features or inputs used for prediction).
Regression Metrics
Mean squared error
R squared: a commonly used metric for linear regression problems.
Use cases include the following:
Advertising popularity prediction
Weather forecasting
Market forecasting
Estimating life expectancy
Population growth prediction
Example: predict the sale price of houses based on features such as square footage, number of bedrooms, location, and age of the property (a minimal sketch follows below).
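A minimal regression sketch for the house price example; the features and prices below are made up purely for illustration:

```python
# Predict a continuous value (house price) from numeric features.
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [square footage, bedrooms, age of property]; prices are made up.
X = np.array([[1400, 3, 20], [2000, 4, 5], [900, 2, 40], [1700, 3, 12]])
y = np.array([240_000, 400_000, 150_000, 310_000])

model = LinearRegression().fit(X, y)
print(model.predict([[1600, 3, 10]]))   # predicted price for a new house
```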
Two main subcategories of unsupervised learning are clustering and dimensionality reduction.
Clustering
A common subcategory of unsupervised learning is clustering. This kind of algorithm groups data into different clusters based on similar features or distances between the data points, to better understand the attributes of a specific cluster (see the sketch after the use cases below).
For example, by analyzing customer purchasing habits, an unsupervised algorithm can identify a company as being large or small.
Use cases include the following:
Customer segmentation
Targeted marketing
Recommender systems
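A minimal clustering sketch using k-means on made-up customer data; the two columns (annual spend and number of orders) are hypothetical features:

```python
# Group unlabeled "customer" data points by similarity.
import numpy as np
from sklearn.cluster import KMeans

# Columns: [annual spend, number of orders] for made-up customers.
X = np.array([[200, 5], [220, 6], [5000, 60], [4800, 55], [210, 4], [5100, 58]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each customer
print(kmeans.cluster_centers_)  # the "profile" of each discovered segment
```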
Dimensionality reduction
Dimensionality reduction is an unsupervised learning technique used to reduce the number of features or dimensions in a dataset while preserving the most important information or patterns.
Use cases include the following:
Big data visualization
Meaningful compression
Structure discovery
Feature elicitation
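A minimal dimensionality reduction sketch using PCA to compress the 30-feature breast cancer dataset down to 2 components while keeping most of the variance:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print(X_2d.shape)                              # (569, 2)
print(pca.explained_variance_ratio_)           # variance kept by each component
```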
Unlike the first two approaches (supervised and unsupervised learning), reinforcement learning continuously improves its model by mining feedback from previous iterations. An agent continuously learns through trial and error as it interacts with an environment. Reinforcement learning is broadly useful when the reward of a desired outcome is known, but the path to achieving it isn't, and that path requires a lot of trial and error to discover.
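A tiny trial-and-error illustration using an epsilon-greedy multi-armed bandit (a very simplified form of reinforcement learning); the reward probabilities below are made up:

```python
# An epsilon-greedy agent learns which of three "arms" gives the best average reward.
import random

true_reward_prob = [0.2, 0.5, 0.8]       # unknown to the agent
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
epsilon = 0.1                            # fraction of the time we explore

for step in range(5000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore: try a random arm
    else:
        action = estimates.index(max(estimates))  # exploit: use the best arm so far
    reward = 1 if random.random() < true_reward_prob[action] else 0
    counts[action] += 1
    # Incremental average: move the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)   # should roughly recover [0.2, 0.5, 0.8]
```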
Real-time inference is ideal for inference workloads where you have real-time, interactive, and low latency requirements.
Use batch transform when you need to get inferences from large datasets and don't need a persistent endpoint. You can also use it when you need to preprocess datasets to remove noise or bias that interferes with training or inference from your dataset.
SageMaker asynchronous inference is a capability in SageMaker that queues incoming requests and processes them asynchronously. This option is ideal for requests with large payload sizes (up to 1 GB), long processing times (up to one hour), and near real-time latency requirements.
On-demand serverless inference is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts. It is a purpose-built inference option that you can use to deploy and scale ML models without configuring or managing any of the underlying infrastructure.
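A hedged sketch of how these four options map to the SageMaker Python SDK. The image URI, model artifact, role, instance types, and S3 paths are placeholders you would replace, and in practice you would pick just one deployment style:

```python
import sagemaker
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig
from sagemaker.async_inference import AsyncInferenceConfig

model = Model(
    image_uri="<container-image-uri>",          # placeholder
    model_data="s3://my-bucket/model.tar.gz",   # placeholder
    role="<execution-role-arn>",                # placeholder
    sagemaker_session=sagemaker.Session(),
)

# 1) Real-time: persistent endpoint for low-latency, interactive traffic.
model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# 2) Serverless: no instances to manage; tolerates cold starts between bursts.
model.deploy(serverless_inference_config=ServerlessInferenceConfig(
    memory_size_in_mb=2048, max_concurrency=5))

# 3) Asynchronous: queued requests, large payloads, long processing times.
model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge",
             async_inference_config=AsyncInferenceConfig(
                 output_path="s3://my-bucket/async-output/"))

# 4) Batch transform: one-off inference over a large dataset, no persistent endpoint.
transformer = model.transformer(instance_count=1, instance_type="ml.m5.xlarge")
transformer.transform(data="s3://my-bucket/input-data/", content_type="text/csv")
```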
Transfer learning involves adapting an existing model for a specific application, such as fine-tuning a model to understand a new language.
Evaluation
ROUGE-L uses the longest common subsequence to evaluate the coherence and order of the narrative in the generated text.
BLEU is used more for machine translation. The idea is to compare the machine translation to a human translation that is considered the ground truth. It uses n-grams to do this evaluation, and BLEU is more about precision as opposed to recall.
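A minimal scoring sketch, assuming the nltk and rouge-score packages are installed; the reference and candidate sentences are made-up examples:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "the cat sat on the mat"   # human "ground truth"
candidate = "the cat is on the mat"    # generated text

# BLEU: n-gram precision of the candidate against the human reference.
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-L: based on the longest common subsequence between the two texts.
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True).score(reference, candidate)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```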
ROC curve: ROC stands for Receiver Operating Characteristic.
Precision, Recall, and F1 are classification metrics.
Precision: Measures the proportion of true positives out of all predicted positives (i.e., how many of the predicted churned customers actually churned). Precision is useful when the cost of a false positive is high.
Recall: Measures the proportion of true positives out of all actual positives (i.e., how many of the actual churned customers were correctly identified). Recall is the right metric when the goal is to reduce false negatives, ensuring that as many actual churners as possible are correctly identified.
F1 Score: This is the harmonic mean of precision and recall, balancing both metrics. While it’s useful for imbalanced classes, it doesn’t specifically prioritize false negatives. If the primary focus is reducing false negatives, recall alone is a better choice.
Accuracy: Measures the overall correctness of the model; it doesn't focus on false negatives or any specific class and can be misleading, especially in cases of class imbalance.
For regression, where you measure the difference between the predicted numerical values and the actual values, you use metrics such as R squared, RMSE, or MAE.
RLHF aims to align the model's performance with human goals and improve user satisfaction by incorporating human feedback into the training process.
A data scientist is working on a binary classification problem to predict whether a customer will churn. They want to select the best metric to evaluate their model’s performance, given that the cost of a false negative (predicting a customer will not churn when they will) is much higher than a false positive.
Which evaluation metric should the data scientist use to prioritize reducing false negatives?
Options: Precision, Recall, F1 score, Accuracy
Answer: Given that the cost of a false negative (predicting a customer will not churn when they actually will) is much higher than a false positive, the data scientist should focus on Recall.
The data scientist should use recall as the evaluation metric because it directly focuses on capturing as many actual churners as possible, minimizing false negatives.
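As a hypothetical worked example: suppose that out of 1,000 customers the model yields TP = 80, FN = 120, FP = 40, and TN = 760. Accuracy is (80 + 760) / 1,000 = 0.84, which looks strong, but recall is TP / (TP + FN) = 80 / 200 = 0.40, meaning 60% of actual churners are missed. Optimizing for recall surfaces exactly this failure mode, which is why it is the right metric when false negatives are the most costly errors.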