QoE-Driven Reinforcement Learning for Joint Bitrate, Rebuffering, and TTFF Optimization in HLS/DASH
HTTP adaptive streaming over HLS/DASH must balance delivered visual quality against playback interruptions, bitrate variation, and startup delay. In many deployed players, time-to-first-frame (TTFF) is still handled through startup heuristics rather than being optimized jointly with steady-state adaptive bitrate (ABR) decisions. This paper studies a trace-driven controller family that combines a PPO+GAE actor-critic policy with two deployment-oriented constraints: a safety supervisor that caps bitrate by an online throughput estimate and an optional startup cap that operates only before playback begins. We evaluate the controller family on 40 mobile HSDPA throughput traces from MMSys’13 using a simulator with 2 s segments, a 6-level bitrate ladder, and a unified QoE metric that rewards bitrate and penalizes rebuffering, bitrate changes, and TTFF. In the four-way controller comparison on the held-out 8-trace test split, the 750 kbps startup-cap operating point (SafeRL-TTFF-750) achieves the highest mean QoE (136.125 ± 58.994), improves mean TTFF by 16.6% relative to the throughput-based RB baseline, and keeps mean rebuffering at 0.228 ± 0.556 s. On the full 40-trace set, SafeRL-TTFF-750 and RB are effectively tied in mean QoE, with the former trading slightly higher bitrate and lower TTFF for higher rebuffering. An ablation study shows that the safety supervisor is essential, and that stricter startup caps can reduce TTFF further with only small changes in scalar QoE. The results support a practical conclusion: learned ABR can be useful on mobile traces when RL decisions are wrapped in transparent safety and startup controls.
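The two deployment-oriented constraints and the scalar QoE metric described above can be illustrated with a minimal sketch. It is not the paper's implementation. The ladder values, the `safety_margin` parameter, the QoE weights, and all function names are assumptions chosen for illustration; the abstract states only that the ladder has 6 levels, that the supervisor caps bitrate by an online throughput estimate, that the startup cap applies only before playback begins, and that QoE rewards bitrate while penalizing rebuffering, bitrate changes, and TTFF.

```python
from typing import Optional, Sequence

# Hypothetical 6-level ladder in kbps (assumption: the abstract gives only the level count).
LADDER_KBPS = (300, 750, 1200, 1850, 2850, 4300)


def highest_level_at_or_below(cap_kbps: float,
                              ladder: Sequence[int] = LADDER_KBPS) -> int:
    """Index of the highest rung whose bitrate does not exceed cap_kbps.

    Falls back to the lowest rung (index 0) when nothing fits under the cap.
    """
    level = 0
    for i, rate in enumerate(ladder):
        if rate <= cap_kbps:
            level = i
    return level


def supervise(policy_level: int,
              throughput_est_kbps: float,
              playback_started: bool,
              startup_cap_kbps: Optional[float] = 750.0,
              safety_margin: float = 1.0,
              ladder: Sequence[int] = LADDER_KBPS) -> int:
    """Clamp the level proposed by the learned policy.

    Safety supervisor: never exceed the online throughput estimate
    (scaled by an assumed safety_margin).
    Startup cap: before the first frame is shown, additionally cap the
    bitrate at startup_cap_kbps; pass None to disable the cap.
    """
    cap = throughput_est_kbps * safety_margin
    if not playback_started and startup_cap_kbps is not None:
        cap = min(cap, startup_cap_kbps)
    return min(policy_level, highest_level_at_or_below(cap, ladder))


def session_qoe(bitrates_kbps: Sequence[float],
                rebuffer_s: float,
                ttff_s: float,
                w_rebuf: float = 4.3,
                w_switch: float = 1.0,
                w_ttff: float = 1.0) -> float:
    """Linear QoE: bitrate reward minus rebuffering, switching, and TTFF penalties.

    The weights are placeholders, not the values used in the paper.
    Bitrates are converted to Mbps so the terms have comparable scale.
    """
    quality = sum(b / 1000.0 for b in bitrates_kbps)
    switches = sum(abs(b - a) / 1000.0
                   for a, b in zip(bitrates_kbps, bitrates_kbps[1:]))
    return quality - w_rebuf * rebuffer_s - w_switch * switches - w_ttff * ttff_s
```

With this hypothetical ladder, `supervise(5, 2000.0, playback_started=True)` returns level 3 (1850 kbps), while the same call with `playback_started=False` returns level 1 (750 kbps) because the startup cap is still active. Keeping both constraints outside the policy means each can be inspected and tuned without retraining, which is the sense in which the abstract calls the controls transparent.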

