Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment
The alignment of large language models (LLMs) is crucial for generating helpful and harmless content. Existing approaches leverage preference-based human feedback data to learn a reward function and align the LLM with that feedback. However, these approaches focus on modeling the reward difference between the chosen and rejected demonstrations rather than directly modeling the true reward of each demonstration. Moreover, they assume that the reward is obtained only at the end of the sentence, which overlooks the modeling of intermediate rewards. These issues lead to insufficient use of the training signals in the feedback data, limiting the representational and generalization ability of the learned reward and potentially resulting in reward hacking. In this paper, we formulate LLM alignment as a Bayesian Inverse Reinforcement Learning (BIRL) problem and propose a novel training objective, Approximated Variational Alignment (AVA), to perform LLM alignment through Approximated Variational Reward Imitation Learning (AVRIL). The BIRL formulation facilitates intermediate reward modeling and direct reward modeling on each individual demonstration, which enhances the utilization of the training signals in the feedback data. Experiments show that AVA outperforms existing LLM alignment approaches in reward modeling, RL fine-tuning, and direct optimization.
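
To make intermediate (per-token) reward modeling with a variational posterior concrete, the following is a minimal PyTorch sketch, not the paper's AVA/AVRIL implementation. It assumes a toy GRU encoder standing in for an LLM backbone, a Gaussian posterior over per-token rewards with a standard-normal prior, and a Bradley-Terry preference term on sampled sequence returns; the names TokenRewardModel and elbo_loss and all hyperparameters are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only: per-token reward model with a variational
# Gaussian posterior over rewards; a small GRU replaces the LLM backbone.
class TokenRewardModel(nn.Module):
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # Variational posterior q(r_t) = N(mu_t, sigma_t^2) for every token t.
        self.reward_mu = nn.Linear(hidden, 1)
        self.reward_logvar = nn.Linear(hidden, 1)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))       # (batch, seq, hidden)
        mu = self.reward_mu(h).squeeze(-1)             # per-token reward mean
        logvar = self.reward_logvar(h).squeeze(-1)     # per-token reward log-variance
        return mu, logvar

def elbo_loss(model, chosen, rejected, beta=0.1):
    # Preference likelihood on reparameterised reward samples, regularised by
    # a KL term to a standard-normal reward prior (an ELBO-style objective).
    returns, kl = [], 0.0
    for tokens in (chosen, rejected):
        mu, logvar = model(tokens)
        # KL( N(mu, sigma^2) || N(0, 1) ), averaged over tokens.
        kl = kl - 0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        # Reparameterised sample of the sequence return (sum of token rewards).
        sample = mu + logvar.mul(0.5).exp() * torch.randn_like(mu)
        returns.append(sample.sum(-1))
    nll = -F.logsigmoid(returns[0] - returns[1]).mean()  # Bradley-Terry term
    return nll + beta * kl

if __name__ == "__main__":
    model = TokenRewardModel()
    chosen = torch.randint(0, 1000, (4, 16))     # toy chosen responses
    rejected = torch.randint(0, 1000, (4, 16))   # toy rejected responses
    loss = elbo_loss(model, chosen, rejected)
    loss.backward()
    print(f"ELBO-style loss: {loss.item():.4f}")

Because each token carries its own reward estimate in this sketch, an intermediate reward signal is available at every position of the response rather than only at its end, which is the property the BIRL formulation above is meant to exploit.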

