Supervised and Unsupervised Learning

Published on 14 July 2024 at 12:33

Supervised Learning Overview

Supervised learning is the training of a machine on well-labelled data, meaning data that has already been tagged with the correct answer (Geeksforgeeks, 2023). The supervised learning algorithm analyzes the labelled training data so that the machine can produce the correct outcome when it is later given new examples. A simple way to picture this is a basket containing different types of fruit. The machine must learn the types of fruit one by one. For example, it learns that a long, curved cylinder that is either green or yellow is labelled a banana. Next, it learns that an object that is round, has a depression at its top, and is red or green is labelled an apple.

If another piece of fruit is presented to the machine after training, and it is an apple, the machine will know to place it in the apple category.
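The fruit example can be sketched in a few lines of Python. This is only an illustration, not a method taken from the cited sources: the measurements, the choice of length and width as features, and the `classify` helper are all assumptions, and the nearest-neighbour rule is just one simple way to learn from labelled examples.

```python
import math

# Hypothetical labelled training data: (length_cm, width_cm) -> fruit name.
# The measurements are invented for illustration only.
training_data = [
    ((18.0, 3.5), "banana"),
    ((20.0, 4.0), "banana"),
    ((16.5, 3.2), "banana"),
    ((7.5, 7.0), "apple"),
    ((8.0, 7.8), "apple"),
    ((6.8, 6.5), "apple"),
]


def classify(features, labelled_examples):
    """Return the label of the closest labelled example (1-nearest-neighbour)."""
    _, nearest_label = min(
        labelled_examples,
        key=lambda example: math.dist(features, example[0]),
    )
    return nearest_label


# A new, unlabelled piece of fruit is presented after training.
new_fruit = (7.2, 7.1)
print(classify(new_fruit, training_data))  # prints: apple
```

The new fruit is roughly as wide as it is long, so its closest labelled neighbour is an apple and it is placed in the apple category.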

Supervised learning has two types of algorithms: classification and regression. Both are described below (Johnson, 2023):

  1. Classification – this is when the output variable is a category such as “purple” or “green”. This type of algorithm sorts data into different classes. Classification predictive modeling is the task of approximating a mapping function from input variables (x) to discrete output variables (y). Many classification algorithms also generate a probability score for each input. There are several types of classification algorithms. One of them is the decision tree, which often performs well. However, individual trees that are unconstrained tend to be prone to overfitting.
  2. Regression – this is when the output variable is a real value, such as “dollars”. A regression algorithm uses training data to predict a single continuous output value. One related method, logistic regression, produces outputs that have a probabilistic interpretation, although despite its name it is normally used for classification. It can underperform because it is not flexible and does not capture more complex relationships.
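To make the regression case concrete, the sketch below fits a straight line to a handful of points with ordinary least squares and then predicts a real-valued output for a new input. The house-price data, the `fit_line` helper, and the choice of a single input variable are assumptions made for illustration; they do not come from the cited source.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: floor area (m^2) -> price (thousands of dollars).
areas = [50, 60, 80, 100, 120]
prices = [150, 180, 240, 300, 360]

slope, intercept = fit_line(areas, prices)

# Predict a single real-valued output for an unseen input.
predicted_price = slope * 90 + intercept
print(round(predicted_price, 1))
```

Unlike the classification case, the prediction here is a number on a continuous scale rather than one of a fixed set of categories.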

Supervised learning optimizes performance criteria with experience, and it is used to solve a variety of real-world problems. It produces outputs from previous experience, which is one of its main advantages. However, classifying big data can be challenging, and training often requires a lot of computation time.

Unsupervised Learning Overview

Unsupervised learning is the training of a machine on information that is neither labeled nor classified, which allows the algorithm to work on that information without guidance (Geeksforgeeks, 2023). The machine has the task of grouping unsorted information according to patterns, similarities, and differences, without any prior training on the data. No teacher is provided, so the machine is not trained and is left to find the hidden structure in unlabeled data by itself. A simple example is a machine that is shown an image of cats and mice that it has never seen before. Because the machine has no knowledge of what cats and mice are, it cannot categorize the image as “cats and mice”. However, it can still group what it sees according to patterns, similarities, and differences. It would likely form two categories, one for everything that looks like a cat and another for everything that looks like a mouse. The model works on its own to discover information and patterns that were previously undetected.

Unsupervised learning algorithms fall into two categories: association and clustering. Both are described below (Geeksforgeeks, 2023):

  • Association – this algorithm is used to discover rules that describe large portions of the data, such as customers who buy one product also tending to buy another.
  • Clustering – this algorithm is used to discover inherent groupings in the data, such as grouping customers by purchasing behavior.
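The association idea ("customers who buy this also tend to buy that") can be illustrated with a small sketch. The shopping baskets below are invented, and the `support` and `confidence` helpers are hypothetical names; the two measures themselves are the standard way of scoring an association rule, although the cited source does not spell them out.

```python
# Hypothetical shopping baskets, one set of products per customer visit.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# How strongly does buying bread go together with buying butter?
print(support({"bread", "butter"}, transactions))
print(confidence({"bread"}, {"butter"}, transactions))
```

In this toy data, bread appears in four of the five baskets and butter accompanies it in three of those four, so the rule "bread implies butter" holds for three quarters of the bread buyers. No labels were needed to find it.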

Unsupervised learning learns from the data and groups it without labels; any labels are added after the data has been grouped (Singhal, n.d.). On this view, unsupervised learning is less complex than supervised learning, because nobody has to label the input in advance. Dimensionality reduction is also easier to achieve, which makes unsupervised learning a useful tool for data scientists who work with raw data. It is also described as the approach most like human intelligence, because the model learns slowly and then calculates results. However, a user must spend time interpreting and labeling the classes that the algorithm produces. The input data is not known or labeled by users in advance. In image classification, the spectral properties of a class can also change over time, so the same class information may not carry over from one image to another.

Supervised and Unsupervised Learning Applications

Supervised Learning Applications

In supervised learning, the data is split into two parts called the training set and the testing set. The algorithm learns from the past observations in the training set in order to make predictions. The testing set is held back and works as a real test of how well the algorithm predicts on data it has not seen. This gives a realistic measure of how accurate the predictions are. There are several applications that use supervised learning. Three of the most common are listed below (Smolic, 2022):
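The training and testing split described above can be sketched as follows. Everything here is an assumption made for illustration: the loan records are invented, and `train_test_split`, `fit_threshold`, and `accuracy` are hypothetical helpers written for this example rather than functions from a particular library.

```python
import random

# Hypothetical labelled records: (number of past late payments, loan outcome).
loans = [
    (0, "repaid"), (0, "repaid"), (1, "repaid"),
    (1, "repaid"), (2, "repaid"), (2, "repaid"),
    (5, "default"), (6, "default"), (6, "default"),
    (7, "default"), (7, "default"), (8, "default"),
]


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle the examples and hold back a fraction of them for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train_set):
    """Learn a cut-off halfway between the mean values of the two classes."""
    repaid = [x for x, label in train_set if label == "repaid"]
    default = [x for x, label in train_set if label == "default"]
    threshold = (sum(repaid) / len(repaid) + sum(default) / len(default)) / 2
    return lambda late_payments: "default" if late_payments > threshold else "repaid"


def accuracy(predict, test_set):
    """Share of held-back examples whose predicted label matches the true label."""
    correct = sum(predict(x) == label for x, label in test_set)
    return correct / len(test_set)


train_set, test_set = train_test_split(loans)
model = fit_threshold(train_set)      # learn only from the training set
print(accuracy(model, test_set))      # score only on unseen records
```

The toy data is cleanly separated, so the score on the testing set is perfect here; real data rarely is, which is exactly why the held-back set is needed to check the model.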

  1. Finance – supervised machine learning has become very popular in the finance industry. Predictions are very important in finance, so supervised learning is in demand for its predictive ability. Financial institutions use it to identify fraud, support investment decisions, and forecast stock prices, to name a few uses. It also supports credit scoring, which assesses the creditworthiness of potential borrowers. The financial institution can then predict whether a customer is likely to make late payments or default on a loan.
  2. Forecasting weather – supervised learning has proven useful for weather forecasting because it predicts weather changes based on historical data. Meteorologists consider atmospheric pressure, temperature, wind speed, and humidity when predicting the weather. Supervised machine learning combines this information with radar readings, satellite images, and other data to produce more accurate forecasts.
  3. Retail – retailers use supervised machine learning to predict their customers’ purchasing behavior. This is done by looking at a customer’s previous purchases, the time of day, income level, type of store, and other demographic information. Retailers use the predictions to help with staffing decisions and inventory levels. Predicting customer behavior also helps retail businesses focus their marketing on potential and existing customers while lowering costs. On the operations side, supervised machine learning can help retailers improve stock management, prevent fraud, and reduce waste.

Various professions and industries have found that supervised machine learning helps them make better-informed and more accurate decisions, and many businesses have benefited from using it.

Unsupervised Learning Applications

Unsupervised learning covers different techniques, such as association rules, clustering, and dimensionality reduction. Of these, clustering is the most popular and commonly used. It groups similar pieces of data into clusters that have not been defined beforehand. Below are three of the most common applications of clustering (Altexsoft, 2021):

  1. Anomaly detection – by clustering their data, financial organizations can detect outliers, which helps stop fraudulent transactions. Transportation companies can use anomaly detection for predictive maintenance, identifying obstacles or defective mechanical parts.
  2. Cancer studies – clustering methods make it possible to study cancer gene expression data. This can help identify cancer in its early stages.
  3. Market segmentation – customers who have similar traits can be grouped together with clustering algorithms. The groups are used to create customer personas for marketing campaigns, so that products can be presented to the customers most likely to buy them.
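Market segmentation by clustering can be sketched with k-means, one widely used clustering algorithm. The cited source does not prescribe a specific algorithm, so the choice of k-means is an assumption, as are the customer figures, the `k_means` helper, and the deliberately naive way it picks its starting centroids.

```python
import math

# Hypothetical customers: (store visits per month, average spend in dollars).
customers = [(2, 20), (12, 90), (3, 25), (11, 85), (2, 22), (13, 95)]


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning and re-averaging."""
    # Naive, deterministic start: use the first k points as centroids.
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = k_means(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No labels are supplied, yet the customers fall into an occasional, low-spend group and a frequent, high-spend group. Naming those groups and turning them into personas is still a job for a person.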

Unsupervised learning helps data science teams find similarities and differences in groups of data, which is useful when a data scientist does not yet know what to look for. Unlabeled data is also faster to obtain, and human error is reduced because no manual labeling process is involved.

Supervised and Unsupervised Learning Issues and Solutions

Supervised Learning Issues and Solutions

Although supervised learning offers many advantages, some issues have been observed. The biggest are that it cannot cluster data on its own based on features, it cannot handle very large amounts of data or every complex task, and new data that does not belong to any existing class can end up in the wrong class after classification (Intellipaat, 2023). Because supervised learning relies on predictive models, data placed in an incorrect class can make the results incorrect.

These issues can be addressed by considering dataset size and balance, feature selection, and model choice, all of which can improve the performance of supervised learning algorithms (Lo Duca, 2021). Dataset size is important because a dataset that is too small may not give an accurate result. Data balance means making sure that each output class is represented by roughly the same number of input records. Model choice is important because sometimes simply changing the model improves performance considerably. Choosing the right features is crucial to getting the right results. For example, the Correlation Feature Selection method calculates the correlation between each feature and the output and selects only the features whose correlation is greater than a certain threshold.
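Correlation-based feature selection can be sketched as below. This is a minimal illustration under stated assumptions rather than the implementation from the cited article: the data and feature names are invented, `pearson` and `select_features` are hypothetical helpers, the threshold of 0.5 is arbitrary, and comparing the absolute value of the correlation (so that strong negative relationships are also kept) is a design choice made here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no information about the output
    return covariance / (spread_x * spread_y)


def select_features(features, target, threshold=0.5):
    """Keep the features whose absolute correlation with the target exceeds the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) > threshold
    ]


# Hypothetical data: three candidate features and the output to predict.
features = {
    "floor_area": [1, 2, 3, 4, 5],
    "random_noise": [3, 1, 4, 1, 5],
    "distance_to_centre": [5, 4, 3, 2, 1],
}
target = [10, 20, 30, 40, 50]

print(select_features(features, target, threshold=0.5))
```

The noisy column is only weakly correlated with the output and is dropped, while the two columns that move with the output, in either direction, are kept.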

Unsupervised Learning Issues and Solutions

Unsupervised learning offers many advantages. However, there are some noted issues. Some of the biggest are that the results might not be completely accurate because there is no labeled data to train from, that the algorithm learns from raw data with no prior knowledge to draw on, and that the algorithm is time-consuming because it analyzes and calculates every possibility (Techvidvan, n.d.). These issues matter because accuracy in results is important.

Clustering algorithms are among the unsupervised learning algorithms that work well with raw, unclassified data, and they can help with these issues. There are different types of clustering algorithms, namely exclusive, overlapping, hierarchical, and probabilistic (IBM, n.d.). The best approach is likely to research the different clustering algorithms and choose the one whose results best fit the question being asked.

In conclusion, both supervised and unsupervised learning have great benefits as well as issues. An organization would use one or the other depending on what type of data it has and what type of results it needs. If a retail business wants to predict its customers’ purchasing behavior, or a financial institution wants to forecast stock prices, it would use supervised machine learning. If an organization wishes to build a marketing campaign around groups of customers who are likely to buy the same products, it would use unsupervised machine learning. The key is for organizations to know the size and type of their data, along with the kind of answer they need. This will help them decide which type of machine learning to use.

References

Altexsoft. (2021, April 14). Unsupervised learning algorithms and examples. Altexsoft. https://www.altexsoft.com/blog/unsupervised-machine-learning/

Geeksforgeeks. (2023, January 10). Supervised and unsupervised learning. Geeksforgeeks. https://www.geeksforgeeks.org/supervised-unsupervised-learning/

IBM. (n.d.). What is unsupervised learning? IBM. https://www.ibm.com/topics/unsupervised-learning#:~:text=Hierarchical%20clustering%2C%20also%20known%20as,can%20be%20agglomerative%20or%20divisive.

Intellipaat. (2023, March). What is supervised learning? Intellipaat. https://intellipaat.com/blog/what-is-supervised-learning/

Johnson, D. (2023, January 21). Supervised machine learning: What is, algorithms with examples. Guru99. https://www.guru99.com/supervised-machine-learning.html

Lo Duca, A. (2021, February 8). How to improve the performance of a supervised machine learning algorithm. Towards Data Science. https://towardsdatascience.com/how-to-improve-the-performance-of-a-supervised-machine-learning-algorithm-c9f9f2705a5c

Singhal, V. (n.d.). Advantages and disadvantages of unsupervised learning. Boardinfinity. https://discuss.boardinfinity.com/t/advantages-and-disadvantages-of-unsupervised-learning/4930

Smolic, H. (2022, March 13). Machine learning supervised: 10 popular applications in 2023. Graphite Note. https://graphite-note.com/machine-learning-supervised

Techvidvan. (n.d.). Unsupervised learning – machine learning algorithms. Techvidvan. https://techvidvan.com/tutorials/unsupervised-learning/