In 1959, IBM employee Arthur Samuel coined the term “machine learning” while describing a computer program that estimated each side’s chance of winning at checkers. The ideas behind it go back much further, though, to methods that were more statistical in nature, such as Ordinary Least Squares. From those roots, machine learning has grown into the Large Language Models that power the biggest tech companies of today.
Machine learning (ML) is one of the biggest trends in technology right now, and it is expected to keep growing. The central premise of ML is that if you optimize an algorithm’s performance on a dataset that represents a real-world problem, the algorithm can make accurate predictions on the new data it sees in its ultimate use case.
But before we get into the weeds of machine learning, it is important to define what it is, what it encompasses, and what it can do. In this lesson, you will learn what machine learning is, what it does, and which problems it may be able to solve.
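To make that premise concrete, here is a minimal sketch in plain Python. It fits a one-feature Ordinary Least Squares line on part of a dataset and then checks the error on points the model never saw. The data is synthetic and the helper names (`fit_ols`, `mean_squared_error`) are made up for this illustration; a real project would more likely use an established library.

```python
import random


def fit_ols(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between predictions and true values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): y = 3x + 2 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 points; the model never sees them while fitting.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# "Optimize performance on a dataset": fit the line on the training points only.
slope, intercept = fit_ols(train_x, train_y)

# "Accurate predictions on new data": compare the error on seen and unseen points.
train_mse = mean_squared_error(train_x, train_y, slope, intercept)
test_mse = mean_squared_error(test_x, test_y, slope, intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```

If the two errors are close, the line has captured the underlying trend rather than memorizing the training points, which is what generalization means in practice.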
What is Machine Learning?
According to Wikipedia, Machine Learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data without explicit instructions.
IBM defines ML as the subset of AI focused on algorithms that can “learn” the patterns of training data and, subsequently, make accurate inferences about new and unseen data. Meanwhile, MIT Sloan describes it as the capability of a machine to imitate intelligent human behavior. Lastly, AWS defines ML as a type of AI that performs data analysis tasks without explicit instructions.
What these definitions have in common is that machine learning is a subfield of the broader field of AI, concerned with training algorithms on data so that they can generate predictions for new and unseen data. Three key elements also run through them:
Representation: How you represent reality in a form an algorithm can work with. This covers both the data itself and the kind of model that interprets it, such as regression, classification, or deep learning.
Evaluation: How you judge your hypothesis after training an algorithm on data. This may include model evaluation metrics such as squared error, or likelihood and posterior probabilities.
Application: Machine learning should not end with training and testing datasets. It should have a real-world application that addresses a problem in your field. In chemistry, this could include predicting the properties of certain molecules or searching for lead compounds for new drugs.
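The three elements above can be sketched with a small chemistry example. The snippet below is only an illustration under stated assumptions: each straight-chain alkane is represented by a single feature (its carbon count), the boiling points are approximate literature values that should be checked against a reference before reuse, and the helper names (`fit_line`, `predict`) are hypothetical.

```python
# Representation: each molecule is reduced to one feature, its carbon count.
# Values are (carbon count, approximate normal boiling point in degrees Celsius).
alkanes = {
    "butane": (4, -0.5),
    "pentane": (5, 36.1),
    "hexane": (6, 68.7),
    "heptane": (7, 98.4),
    "octane": (8, 125.6),
    "nonane": (9, 150.8),
    "decane": (10, 174.1),
}

# Split the molecules: five for fitting, two held out for evaluation.
train_names = ["butane", "pentane", "heptane", "octane", "decane"]
test_names = ["hexane", "nonane"]


def fit_line(points):
    """Least-squares line through (x, y) points: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line([alkanes[name] for name in train_names])


def predict(carbons):
    """Predicted boiling point (degrees Celsius) for a given carbon count."""
    return slope * carbons + intercept


# Evaluation: mean absolute error on the molecules the model did not see.
errors = [abs(predict(alkanes[name][0]) - alkanes[name][1]) for name in test_names]
mae = sum(errors) / len(errors)
print(f"held-out mean absolute error: {mae:.1f} degrees C")

# Application: estimate a molecule outside the dataset (undecane, 11 carbons).
# This extrapolates beyond the fitted range, so treat the number with caution.
print(f"estimated boiling point for 11 carbons: {predict(11):.1f} degrees C")
```

A straight line is a deliberately crude representation here. Boiling point does not grow exactly linearly with chain length, which is the kind of mismatch the evaluation step is meant to expose.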
Anatomy of Machine Learning
As stated in the definitions above, machine learning is a subfield of artificial intelligence. Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. AI includes natural language processing, robotics, cognitive computing, as well as software development.
Within AI, machine learning involves algorithms that can learn the patterns of training data and generalize them to new and unseen data. Machine learning is like the brain of artificial intelligence, in charge of helping “AI” gain knowledge and learn from real-world data.
In summary, all machine learning is AI, but not all AI is machine learning.
Machine learning covers learning paradigms such as supervised, unsupervised, and reinforcement learning, tasks such as image classification, and the more specialized subfield of deep learning.
Why Use Machine Learning?
While machine learning can solve many problems, it does not come without limitations. The table below summarizes its main advantages and disadvantages.
| Advantages | Disadvantages |
|---|---|
| Automation: Automates repetitive and tedious tasks, reducing human error and freeing up employees for more complex work. | High cost and resource-intensive: Requires significant investment in computing power and skilled personnel, making it costly to implement and maintain. |
| Data analysis: Can process and analyze massive amounts of data to find patterns, trends, and insights that humans might miss. | Data dependency: Performance is heavily dependent on the availability of large, high-quality, and unbiased datasets, which can be difficult to acquire. |
| Improved accuracy and efficiency: Can become more accurate as it is trained on more data and gains experience, leading to better predictions and outcomes. | Potential for errors and bias: Models can produce incorrect or biased results if the training data is flawed or incomplete, and it can be difficult to trace the source of these errors. |
| Enhanced decision-making: Provides data-driven insights that lead to more informed and accurate business decisions. | Complexity and interpretability: It can be challenging to understand how a machine learning model arrives at a specific decision or prediction, known as the “black box” problem. |
| Wide application: Used across many industries, including healthcare, finance, and entertainment. | Privacy and security risks: The collection and use of large datasets raise concerns about data privacy and the risk of data breaches. |
Applications in Chemistry
Machine learning has many applications in business, finance, marketing, and more. It has also found its way into chemistry, as these examples show.
Drug and materials discovery: ML models can predict the properties of new compounds and generate novel molecules with desired characteristics, such as improved binding affinity.
Reaction design and optimization: Algorithms can design synthetic pathways and optimize chemical processes by predicting reaction outcomes and making real-time adjustments.
Property prediction: ML is used to predict various molecular properties, such as solubility or boiling point, using datasets of known molecules.
Computational chemistry: ML can accelerate quantum-accurate simulations and create models that approximate complex physical behaviors.
Analytical chemistry: ML algorithms can analyze experimental data, for example in smartphone-based analysis of blood samples for disease diagnosis, or in identifying and classifying molecules from spectral data.