Utility-Based Preference Training for Effective Synthetic Text Classification

High-quality synthetic text can mitigate annotation scarcity in text classification. However, standard preference optimization often produces samples that are fluent but weakly label-specific. We present Utility-weighted Direct Preference Optimization (U-DPO), a preference-optimization framework for class-conditional synthetic data generation. In U-DPO, a task-specific classifier provides a margin-based external score for each candidate generation, which is combined with an embedding-based internal similarity score to form an overall utility. These utilities are used (i) to mine preference pairs from multiple candidates per class and (ii) to weight each DPO update by the utility gap between the preferred and dispreferred samples. This design encourages the generator to concentrate on informative, label-discriminative preference comparisons rather than treating all pairs equally. Across two multiclass scientific-abstract benchmarks (arXiv and WOS-11967), U-DPO consistently improves downstream SciBERT classification accuracy compared with both vanilla synthetic generation and standard DPO fine-tuning, with gains of up to 0.88 percentage points on arXiv and 0.83 percentage points on WOS-11967, depending on the generator. An additional GPT-4.5-based evaluation also indicates a higher mean quality score and lower variance for U-DPO samples.

MDPI AG

Jiho Gwak, Yuchul Jung

Mathematics

2026
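
The abstract describes three steps: scoring each candidate with a utility, mining preference pairs from those utilities, and weighting each DPO update by the utility gap. The following is a minimal sketch of how these steps might fit together, not the authors' implementation. The abstract does not give the formulas, so several choices here are assumptions made for illustration: the convex combination with a hypothetical `alpha`, cosine similarity to a hypothetical class centroid as the internal score, the `min_gap` threshold for pair mining, and a multiplicative gap weight on the standard DPO loss.

```python
import math
from itertools import combinations


def margin_score(probs, label):
    """External score: target-class probability minus the strongest rival class."""
    rival = max(p for i, p in enumerate(probs) if i != label)
    return probs[label] - rival


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def utility(probs, label, emb, class_centroid, alpha=0.5):
    # ASSUMPTION: a convex combination of the two scores; the abstract only
    # says they are "combined", and alpha and class_centroid are hypothetical.
    external = margin_score(probs, label)
    internal = cosine(emb, class_centroid)
    return alpha * external + (1 - alpha) * internal


def mine_pairs(utilities, min_gap=0.1):
    """Return (chosen_idx, rejected_idx, gap) for candidate pairs of one class.

    ASSUMPTION: pairs are kept only when the utility gap reaches min_gap.
    """
    pairs = []
    for i, j in combinations(range(len(utilities)), 2):
        gap = utilities[i] - utilities[j]
        if abs(gap) >= min_gap:
            chosen, rejected = (i, j) if gap > 0 else (j, i)
            pairs.append((chosen, rejected, abs(gap)))
    return pairs


def weighted_dpo_loss(logp_c, logp_r, ref_logp_c, ref_logp_r, gap, beta=0.1):
    """Standard per-pair DPO loss, scaled by the utility gap.

    ASSUMPTION: the gap enters as a simple multiplicative weight.
    """
    logits = beta * ((logp_c - ref_logp_c) - (logp_r - ref_logp_r))
    # -log(sigmoid(logits)) equals softplus(-logits); computed stably.
    x = -logits
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return gap * softplus


# Three candidates for one class, already scored with utility().
for chosen, rejected, gap in mine_pairs([0.9, 0.2, 0.85]):
    print(chosen, rejected, round(gap, 2))
```

Under these assumptions, a pair with a larger utility gap contributes proportionally more to the loss, so near-tied candidates have little influence on the generator even if they pass the mining threshold.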

