Google has recently announced a breakthrough in its effort to build an AI architecture capable of handling millions of tasks, including complex reasoning and learning. The system is known as the Pathways Language Model (PaLM). The PaLM algorithm can outperform the current state-of-the-art AI and even beat average human performance on some reasoning and language tests.
However, the researchers point out that large-scale language models can have limitations that could unintentionally lead to negative ethical outcomes.
Background Information
The following sections provide background information that explains the PaLM algorithm.
Few-Shot Learning
Few-shot learning is the next stage of learning beyond deep learning. Hugo Larochelle, a Google Brain researcher (@hugo_larochelle), recently gave a presentation entitled Generalizing From Few Examples with Meta-Learning. He explained that deep learning is difficult because it requires collecting large amounts of data, which takes significant human labor.
He said that deep learning is unlikely to lead to an AI capable of solving many tasks, because it requires millions upon millions of examples for an AI to learn each task.
Larochelle explains,
“The idea is to attack this problem directly, the problem of few-shot learning, generalizing from small amounts of data.
The main idea behind what I’ll be presenting is simple. Instead of determining what that learning algorithm is by hand, using our intuition to decide the best algorithm for few-shot learning, we should attempt to learn that algorithm entirely.
That’s why we call it learning to learn, or as I prefer to call it, meta-learning.”
This few-shot approach mirrors how people learn, combining different pieces of knowledge to solve new problems.
A machine that can draw on all of its knowledge to solve new problems has a clear advantage. The PaLM algorithm’s ability to explain a joke it has never heard before exemplifies this capability.
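To make this concrete, here is a minimal sketch of few-shot prompting, the form of few-shot learning that PaLM uses: instead of retraining the model on thousands of examples, a handful of worked examples are placed directly in the prompt. The query_model call at the end is a hypothetical stand-in for any large language model API; everything else is plain Python.

```python
# Minimal sketch of few-shot prompting: the model is never retrained;
# it generalizes from the handful of worked examples in the prompt.

def build_few_shot_prompt(examples, new_input):
    """Format a few labeled examples followed by the new, unseen input."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {new_input}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]

prompt = build_few_shot_prompt(examples, "Two hours I will never get back.")
print(prompt)
# response = query_model(prompt)  # hypothetical call to a large language model
```

With only two examples in the prompt, a sufficiently large model can usually label the third review correctly. That is the “generalizing from small amounts of data” Larochelle describes.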
Pathways AI
Google published an article in October 2021 describing the goals of a new AI architecture called Pathways. Pathways represented a new chapter in the ongoing development of AI systems.
The common approach has been to develop algorithms that are each trained to do one thing well. Pathways’ approach is instead to create a single AI model that can solve many problems by learning how each is solved, avoiding the inefficiency of training thousands of algorithms to accomplish thousands of tasks.
As per Google’s Pathways document:
This is how a model trained on one task, such as learning how aerial photos predict the elevation of a landscape, could also learn to predict how floodwaters will flow through that terrain.
Today’s AI models are typically trained to do only one thing. Pathways will allow us to train a single model to do thousands or millions of things.
Today’s models mostly focus on one sense. Pathways will enable multiple senses.
Today’s models are dense and inefficient. Pathways will make them sparse and efficient.
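The last point in that list, dense versus sparse, refers to how much of the network does work for each input: a dense model activates every parameter every time, while a sparsely activated model routes each input to a small subset of “expert” sub-networks. The toy NumPy sketch below illustrates that routing idea; it is our own illustration of the general technique, not Google’s implementation (note that PaLM itself is densely activated; sparsity is part of the longer-term Pathways vision).

```python
import numpy as np

# Toy illustration of sparse activation: route each input to the top-k
# of several "expert" sub-networks instead of running all of them.

rng = np.random.default_rng(0)
num_experts, dim, top_k = 8, 4, 2

# Each expert is just a random linear map in this toy example.
experts = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
router = rng.standard_normal((dim, num_experts))  # learned in a real system

def sparse_forward(x):
    scores = x @ router                   # how relevant is each expert?
    chosen = np.argsort(scores)[-top_k:]  # activate only the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()              # softmax over the chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

x = rng.standard_normal(dim)
print(sparse_forward(x))  # only 2 of the 8 experts did any work
```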
Pathways outlined Google’s plan for AI advancement, and it was designed to bridge the gap between machine and human learning.
Google’s new model, Pathways Language Model (PaLM), represents this next step. According to its research paper, the PaLM algorithm is a significant advancement in AI.
What Makes Google’s PaLM Algorithm Important
PaLM scales up few-shot learning.
As per the PaLM research paper,
“Large language models have been shown to perform well across a wide range of natural language tasks thanks to few-shot learning, which drastically reduces the number of task-specific examples needed to adapt the model to a particular application.
To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated Transformer language model, which we call PaLM.”
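A quick back-of-the-envelope calculation shows why a model of this size requires a system like Pathways to train at all. Assuming two bytes per parameter (the bfloat16 format commonly used for large models; the format choice is our assumption, not a detail from the quote):

```python
# Back-of-the-envelope memory footprint of 540B parameters.
# Assumes 2 bytes per parameter (bfloat16); optimizer state and
# gradients during training multiply this several times over.

params = 540e9          # 540 billion parameters
bytes_per_param = 2     # bfloat16
terabytes = params * bytes_per_param / 1e12
print(f"~{terabytes:.2f} TB just to store the weights")  # ~1.08 TB
```

Over a terabyte for the weights alone is far beyond any single accelerator’s memory, which is why training has to be spread efficiently across thousands of chips.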
Many research papers describe algorithms that are no better than the current state of the art or that show only incremental improvements.
The PaLM algorithm is not one of these models. The researchers claim that PaLM shows significant improvements over current models and even surpasses average human benchmarks, which is what makes it notable.
Researchers write:
“We continue to show the benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
On a number of these tasks, PaLM 540B achieves remarkable performance. It outperforms the current state of the art on multi-step reasoning tasks and outperforms average human performance on the recently released BIG-bench benchmark.
Many BIG-bench tasks showed significant improvements in performance as we scaled up to the largest model.”
The PaLM algorithm is superior to the current state of the art on English natural language processing tasks, and this is what makes PaLM significant.
On a collaborative benchmark called BIG-bench, which includes over 150 tasks related to reasoning, translation, and question answering, the PaLM algorithm outperformed the state of the art, although there were some areas where it fell short.
It is worth noting that the PaLM algorithm performed better than humans on 35% of tasks, especially mathematics-related tasks (see section 6.2, BIG-bench, of the research paper, page 17).
The PaLM algorithm was better at translating from other languages into English than from English into other languages. The researchers noted that this is a common problem that could be addressed by prioritizing multilingual data in training.
Even so, the PaLM algorithm outperformed other language models across the board.
Ability to Reason
PaLM was particularly notable for its performance on commonsense reasoning and arithmetic tasks.
Let’s take an example of an arithmetic task.
Question:
John owns five tennis balls and buys two more cans of tennis balls. Each can contains three tennis balls. How many tennis balls does he have now?
Answer:
The answer to that question is 11: John starts with five balls, and two cans of three balls each add six more (5 + 2 × 3 = 11).
Now let’s take an example of commonsense reasoning:
Question:
Adam was running late to get home. But the light suddenly turned yellow, and he had to do something.
Answer Options:
(a) Take the time
(b) Dawdle
(c) Go slowly
(d) Ocean
(e) Slow down
Answer:
Slow down.
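The PaLM paper attributes much of this reasoning performance to chain-of-thought prompting, in which the worked example in the prompt spells out its intermediate steps instead of jumping straight to the answer. Here is a minimal sketch of such a prompt built around the tennis-ball question; the apple example and the query_model call are ours, for illustration only.

```python
# Sketch of a chain-of-thought prompt: the worked example shows its
# intermediate reasoning, nudging the model to reason step by step too.

prompt = """\
Q: Roger has 3 apples. He buys 2 bags of apples. Each bag holds 4 apples.
How many apples does he have now?
A: Roger started with 3 apples. 2 bags of 4 apples is 8 apples.
3 + 8 = 11. The answer is 11.

Q: John owns five tennis balls and buys two more cans of tennis balls.
Each can contains three tennis balls. How many tennis balls does he have now?
A:"""

print(prompt)
# answer = query_model(prompt)  # hypothetical; expected: 5 + 2 * 3 = 11
```

Prompted this way, the model tends to produce its own step-by-step answer rather than a bare number, which is where the gains on arithmetic tasks come from.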
Generating Explanations
A remarkable example of the power of PaLM is its generation of explanations. Here is how the research paper describes this capability:
“This section aims to demonstrate the capabilities of PaLM in explanatory language generation, for prompts that require a complex combination of logical inference chains, world knowledge, abstract language understanding, and lexicographic language understanding (e.g., puns).”
Explaining Jokes
An example of the language model explaining a joke:
Joke:
I planned to fly to visit my family on April 6th. My mom told me,
“Oh great, that’s your stepdad’s poetry reading night!”
Now, I am flying on April 7th.
Model Explanation:
The joke is that the speaker’s mom is hinting that they should attend their stepdad’s poetry reading, but the speaker isn’t interested in going, so they change their flight to the day after the reading.
Inference Chaining
This is an example of inference chaining: to produce an answer, the model must join multiple concepts or facts.
Input:
Guido was working on a Python program. He kept getting confused about variable scope, so he looked it up on StackOverflow.
Could the Guido in this prompt be a reference to Guido van Rossum?
Model Output:
Guido van Rossum is the original creator of Python, and he wouldn’t need to look up variable scope on StackOverflow. The answer is “no.”
Another example:
Input:
Jennifer looks out of her window and sees a cool cloud below her. She unbuckles her seatbelt and heads for the bathroom. Is Jennifer probably traveling more than 300 miles per hour relative to the earth?
Model Output:
Three hundred miles per hour is about 480 km/h, roughly the speed of a commercial airplane. Clouds are usually below planes, so Jennifer is most likely on an airplane.
The short answer is “yes.”
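The model’s first step, converting 300 miles per hour to kilometers per hour, checks out; a one-line verification (1 mile = 1.609344 km):

```python
# Verify the model's unit conversion from the example above.
mph = 300
kmh = mph * 1.609344  # exact definition of the mile in kilometers
print(f"{mph} mph is about {kmh:.0f} km/h")  # ~483 km/h, the model's "about 480"
```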
Next Generation Search Engine?
This example of PaLM’s ability to perform complex reasoning shows how a next-generation search engine might be able to answer complex questions using knowledge from the Internet.
Google Pathways and PaLM are working toward an AI architecture that can provide answers that reflect the world around us.
The researchers stressed that PaLM is not the end-all solution for AI and search; PaLM is only a first step toward the next type of search engine Pathways envisions.
To understand PaLM, we first need to understand two pieces of jargon:
Modalities
Generalization
Modalities refers to how things are experienced, the form content takes: text that is read, images that are seen, or audio that is listened to.
In machine learning, generalization refers to the ability of a language model to solve new tasks that it hasn’t previously been trained on.
The researchers noted:
“PaLM is just one step in our vision to establish Pathways as the future of ML scaling at Google. We believe PaLM provides a solid foundation for our ultimate goal of developing a large-scale, modularized system with broad generalization capabilities across multiple modalities.”
Ethical Considerations and Real-World Risks
The researchers caution about ethical considerations in this research paper. They claim that large-scale language models based on web data absorb many “toxic” stereotypes, social disparities, and other undesirable influences.
The research paper cites a paper published in 2021 that explores ways in which large-scale language models can cause the following harms:
Discrimination, Exclusion, and Toxicity
Information Hazards
Misinformation Harms
Malicious Uses
Human-Computer Interaction Harms
Access, Automation, and Environmental Harms
Finally, the researchers pointed out that PaLM does reflect toxic social stereotypes, and they make it clear that filtering out these biases is difficult.
The PaLM researchers explain:
“Our analysis shows that our training data and PaLM reflect various social stereotypes and toxic associations around identity terms.
It is not easy to remove these associations. Future work should focus on effectively dealing with such unfavorable biases in data and their impact on model behavior.
While this is happening, real-world PaLM users should conduct further contextualized fairness assessments to determine the potential harms and provide appropriate mitigation and protections.”
PaLM is a glimpse into the future of search. PaLM claims to be the best at what it does, but the researchers state that more work is needed to find ways to reduce misinformation and toxic stereotypes.
Key Features of PaLM
Efficient scaling – PaLM is the first large-scale use of Pathways, a new ML system that enables training a single model across thousands of accelerator chips in a highly efficient manner.
Continued improvements from scaling – The researchers evaluated PaLM across hundreds of natural language, code, and mathematical reasoning tasks and achieved state-of-the-art results on most of these benchmarks, typically by substantial margins.
Breakthrough capabilities – The researchers demonstrate breakthrough capabilities in language understanding and generation across a range of difficult tasks.
Discontinuous improvements – To better understand scaling behavior, the researchers present results at three parameter scales: 8B, 62B, and 540B. Typically, scaling from 62B to 540B yields gains similar to scaling from 8B to 62B, consistent with the “power law” often observed in neural network scaling (see the sketch after this list). For certain tasks, however, scaling to 540B produces a discontinuous jump in accuracy, suggesting new capabilities emerge once a model is large enough.
Multilingual understanding – The work conducts a more thorough evaluation of multilingual benchmarks, including machine translation, summarization, and question answering in multiple languages.
Bias and toxicity – The team also assessed model performance for distributional bias and toxicity, which yielded several important insights.
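The “power law” mentioned under discontinuous improvements means that, for most tasks, performance improves roughly linearly in the logarithm of the parameter count. The sketch below illustrates only the shape of such a curve at PaLM’s three scales; the accuracy numbers are invented for the illustration and are not results from the paper.

```python
import math

# Toy illustration of power-law scaling: performance grows roughly
# linearly in log10(parameters). The accuracies below are invented
# for illustration; they are NOT results from the PaLM paper.

scales = {"8B": 8e9, "62B": 62e9, "540B": 540e9}

def toy_accuracy(params, base=20.0, slope=8.0):
    # Hypothetical log-linear trend: base + slope * log10(params / 1e9)
    return base + slope * math.log10(params / 1e9)

for name, params in scales.items():
    print(f"{name}: ~{toy_accuracy(params):.1f}% (toy number)")

# A "discontinuous improvement" is a task where the 540B result lands
# far ABOVE this log-linear trend line instead of on it.
```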