1. Definition and Purpose
- Supervised Learning:
  - In supervised learning, the model is trained on labeled data, meaning each input is paired with a correct output.
  - The goal is to learn the mapping between inputs and outputs so that the model can predict the output for new, unseen inputs.
  - Example: Predicting house prices based on features like size, location, and number of bedrooms (where historical prices are known).
- Unsupervised Learning:
  - In unsupervised learning, the model is given data without labeled responses. Instead, it tries to find patterns or structure in the data.
  - The goal is often to explore data, find groups (clustering), or detect outliers.
  - Example: Grouping customers into segments based on purchasing behavior without predefined categories.
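The contrast can be made concrete with a minimal sketch in plain Python. Everything here is an assumption for illustration: the numbers are made up, and the helper names `fit_line` and `two_means` are hypothetical, not taken from any library. The first half learns from (input, output) pairs; the second half receives inputs only and has to find the groups itself.

```python
from statistics import mean

# --- Supervised: every input (size) comes with a known output (price). ---
# Made-up illustrative data, not real prices.
sizes = [50.0, 70.0, 90.0, 110.0]        # square metres
prices = [150.0, 210.0, 270.0, 330.0]    # thousands; here exactly 3 * size


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 100.0 + intercept  # prediction for an unseen size

# --- Unsupervised: only inputs, no labels to learn from. ---
monthly_spend = [12.0, 15.0, 14.0, 95.0, 102.0, 98.0]


def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2, initialised at the min and max value."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearer of the two centers.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


centers, groups = two_means(monthly_spend)
print(predicted_price)
print(groups)
```

Note that nobody told `two_means` which customers are "low" or "high" spenders; the split comes only from the structure of the data, whereas `fit_line` could not work without the known prices.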
2. Types of Problems Addressed
- Supervised Learning:
  - Classification: Categorizing data into classes (e.g., spam vs. not spam in emails).
  - Regression: Predicting continuous values (e.g., stock prices or temperature).
- Unsupervised Learning:
  - Clustering: Grouping similar data points (e.g., market segmentation).
  - Association: Finding rules about which items or variable values tend to occur together (e.g., market basket analysis in retail).
  - Dimensionality Reduction: Reducing the number of features while retaining essential information (e.g., principal component analysis for visualizing data in 2D).
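Of the problem types above, association is probably the least familiar, so here is a small sketch of the two quantities market basket analysis usually starts from: support and confidence. The baskets are made up, and the function names `support` and `confidence` are illustrative helpers written for this example, not a library API.

```python
# Made-up shopping baskets; each basket is a set of items bought together.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions.

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# How often do bread and butter appear together at all?
rule_support = support({"bread", "butter"}, baskets)

# Among baskets that contain bread, how many also contain butter?
rule_confidence = confidence({"bread"}, {"butter"}, baskets)

print(rule_support, rule_confidence)
```

No labels are involved anywhere: the rule "bread suggests butter" is read off the co-occurrence counts alone, which is what makes this an unsupervised task.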
3. Example Algorithms
- Supervised Learning Algorithms:
  - Linear Regression
  - Logistic Regression
  - Decision Trees and Random Forests
  - Support Vector Machines (SVM)
  - Neural Networks (when trained with labeled data)
- Unsupervised Learning Algorithms:
  - K-Means Clustering
  - Hierarchical Clustering
  - Principal Component Analysis (PCA)
  - Association Rule Mining (like the Apriori algorithm)
4. Training Data Requirements
- Supervised Learning: Requires a labeled dataset, which can be costly and time-consuming to collect and label.
- Unsupervised Learning: Works with unlabeled data, which is often more readily available, but without labels there is no ground truth to check the results against, so the findings are harder to interpret and validate.
5. Evaluation Metrics
- Supervised Learning: Can be evaluated against the known labels with standard metrics: accuracy, precision, recall, and F1 score for classification, and mean squared error for regression.
- Unsupervised Learning: Harder to evaluate directly because there are no labels to compare against. For clustering, internal metrics such as the silhouette score or the Davies–Bouldin index measure how compact and well separated the clusters are; otherwise qualitative analysis may be required.
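The difference in evaluation shows up clearly in code. The sketch below uses the standard definitions of these metrics, written out in plain Python on made-up data; the function names are hypothetical helpers, and `silhouette_score_1d` is a simplified one-dimensional version meant only to show the idea. The supervised metrics all take the true labels as an argument, while the silhouette score sees only the points and the cluster assignments.

```python
from statistics import mean


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def silhouette_score_1d(values, labels):
    """Mean silhouette coefficient for 1-D points; needs at least two clusters."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so it adds nothing to the sum).
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            mean(abs(v - u) for u in other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Supervised: compare predictions with known labels.
precision, recall, f1 = precision_recall_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])

# Unsupervised: no labels, only points and a proposed grouping.
points = [1.0, 2.0, 10.0, 11.0]
good_grouping = silhouette_score_1d(points, [0, 0, 1, 1])
bad_grouping = silhouette_score_1d(points, [0, 1, 0, 1])
print(precision, recall, f1, mse, good_grouping, bad_grouping)
```

A silhouette score near 1 means points sit much closer to their own cluster than to the nearest other one, and a negative score suggests points have been put in the wrong cluster. It says nothing about whether the clusters are meaningful for the business question, which is why qualitative review is still needed.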
6. Use Cases
- Supervised Learning: Fraud detection, email classification, medical diagnosis, sales forecasting, and image recognition.
- Unsupervised Learning: Customer segmentation, anomaly detection, topic modeling, and data compression.
In summary:
- Supervised learning requires labeled data and is primarily used for prediction tasks (classification and regression) where the correct outcome is known for the training examples.
- Unsupervised learning doesn’t require labeled data and is mainly used for data exploration, clustering, and finding patterns where the outcome is not predefined.