QoE-Driven Reinforcement Learning for Joint Bitrate, Rebuffering, and TTFF Optimization in HLS/DASH

HTTP adaptive streaming over HLS/DASH must balance delivered visual quality against playback interruptions, bitrate variation, and startup delay. In many deployed players, time-to-first-frame (TTFF) is still handled through startup heuristics rather than being optimized jointly with steady-state adaptive bitrate (ABR) decisions. This paper studies a trace-driven controller family that combines a PPO+GAE actor-critic policy with two deployment-oriented constraints: a safety supervisor that caps bitrate by an online throughput estimate and an optional startup cap that operates only before playback begins. We evaluate the controller family on 40 mobile HSDPA throughput traces from MMSys’13 using a simulator with 2 s segments, a 6-level bitrate ladder, and a unified QoE metric that rewards bitrate and penalizes rebuffering, bitrate changes, and TTFF. In the four-way controller comparison on the held-out 8-trace test split, the 750 kbps startup-cap operating point (SafeRL-TTFF-750) achieves the highest mean QoE (136.125 ± 58.994), improves mean TTFF by 16.6% relative to the throughput-based RB baseline, and keeps mean rebuffering at 0.228 ± 0.556 s. On the full 40-trace set, SafeRL-TTFF-750 and RB are effectively tied in mean QoE, with the former trading slightly higher bitrate and lower TTFF for higher rebuffering. An ablation study shows that the safety supervisor is essential, and that stricter startup caps can reduce TTFF further with only small changes in scalar QoE. The results support a practical conclusion: learned ABR can be useful on mobile traces when RL decisions are wrapped in transparent safety and startup controls.
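The two deployment-oriented constraints and the unified QoE metric described above can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives neither the ladder values nor the QoE weights, so the ladder `LADDER_KBPS`, the `safety_factor` parameter, the weights `w_rebuf`, `w_switch`, and `w_ttff`, and the function names `supervise` and `qoe` are all assumptions made for illustration. Only the 6-level ladder size and the 750 kbps startup cap come from the text.

```python
from typing import Optional, Sequence

# Hypothetical 6-level ladder in kbps. The source states only that the
# ladder has 6 levels; these specific values are an assumption.
LADDER_KBPS = (300, 750, 1200, 1850, 2850, 4300)


def supervise(
    rl_choice_kbps: float,
    throughput_est_kbps: float,
    playback_started: bool,
    startup_cap_kbps: Optional[float] = 750,
    ladder: Sequence[int] = LADDER_KBPS,
    safety_factor: float = 1.0,  # assumed knob, not from the source
) -> int:
    """Clamp the RL policy's bitrate choice to a safe ladder rung."""
    # Safety supervisor: never exceed the online throughput estimate.
    cap = safety_factor * throughput_est_kbps
    # Optional startup cap: applies only before playback begins.
    if not playback_started and startup_cap_kbps is not None:
        cap = min(cap, startup_cap_kbps)
    limit = min(cap, rl_choice_kbps)
    allowed = [b for b in ladder if b <= limit]
    # If even the lowest rung exceeds the cap, fall back to the lowest rung.
    return max(allowed) if allowed else min(ladder)


def qoe(
    bitrates_kbps: Sequence[float],
    rebuffer_s: float,
    ttff_s: float,
    w_rebuf: float = 1.0,   # assumed weight
    w_switch: float = 1.0,  # assumed weight
    w_ttff: float = 1.0,    # assumed weight
) -> float:
    """Linear QoE: reward bitrate, penalize rebuffering, switches, and TTFF."""
    quality = sum(b / 1000.0 for b in bitrates_kbps)  # Mbps per segment
    switches = sum(
        abs(b - a) / 1000.0 for a, b in zip(bitrates_kbps, bitrates_kbps[1:])
    )
    return quality - w_rebuf * rebuffer_s - w_switch * switches - w_ttff * ttff_s
```

With these assumed values, a policy request of 4300 kbps under a 1000 kbps throughput estimate is clamped to the 750 kbps rung, and the same request before playback starts is held at the startup cap even when the throughput estimate is much higher. Keeping both rules outside the learned policy is what makes them transparent: each can be inspected and tuned without retraining.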

Scientific Publication Center

Eric Wang, Heyu Wang

Journal of Advanced Computing Systems

2026
