Risk-Sensitive Offline Reinforcement Learning for Stable ABR QoE Improvements on Real HSDPA and LTE Traces

Adaptive bitrate (ABR) algorithms must select per-chunk video quality under substantial network uncertainty. Reinforcement learning (RL) can improve average Quality-of-Experience (QoE), but trace-driven evaluations often reveal heavy-tailed stall events and brittle behavior on high-variance cellular links. This paper presents a risk-sensitive offline-RL ABR design that optimizes the lower tail of the return distribution via Conditional Value-at-Risk (CVaR) computed from a distributional Q-function. We conduct an empirical evaluation on two public real-trace datasets: (i) 12 3G/HSDPA throughput logs from Norwegian mobile streaming sessions (UMass MMSys trace archive), and (ii) 20 4G/LTE bandwidth logs collected along routes in Ghent, Belgium (UGent/IDLab dataset). Using a Pensieve-style chunked streaming simulator and a standard QoE function (bitrate reward, rebuffer penalty, and smoothness penalty), we compare a buffer-based rule (BBA), robust model predictive control (RobustMPC), online tabular actor–critic (A2C), and an offline distributional RL method (Quantile Regression Conservative Q-Learning, QR-CQL) with a CVaR decision rule. Across 400 fixed test episodes on held-out traces, the risk-sensitive policy OfflineQR-CQL(CVaR@0.25) achieves a mean QoE of 104.91 (within 17.6% of the best policy, RobustMPC). Relative to online A2C, the reported change in mean QoE is -8.3% and the reported reduction in mean rebuffer time is -224.2%. Relative to RobustMPC, the reported change in mean QoE is -17.6% and the reported reduction in mean rebuffer time is -79.6%. Bucketed analysis by trace coefficient of variation places the largest QoE difference in the highest-variability quartile (Q4), where the reported margin of OfflineQR-CQL(CVaR@0.25) over RobustMPC is -27.59 QoE points. A CVaR sensitivity sweep shows a controllable risk–reward trade-off governed by the CVaR level α.
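To make the CVaR decision rule concrete, the following is a minimal sketch, not the paper's implementation. It assumes the distributional Q-function outputs N equally weighted quantile estimates of the return for each bitrate action (the usual quantile-regression parameterization), and it approximates CVaR at level α as the mean of the lowest ⌈αN⌉ quantiles. The function names `cvar_from_quantiles` and `select_bitrate` and the toy numbers are hypothetical.

```python
import numpy as np


def cvar_from_quantiles(quantiles: np.ndarray, alpha: float) -> np.ndarray:
    """Approximate lower-tail CVaR at level alpha along the last axis.

    Assumption: the last axis holds N equally weighted quantile estimates
    of the return, so CVaR is the mean of the lowest ceil(alpha * N) values.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    n = quantiles.shape[-1]
    k = max(1, int(np.ceil(alpha * n)))
    # Sort ascending so the first k entries are the worst-case returns.
    lowest = np.sort(quantiles, axis=-1)[..., :k]
    return lowest.mean(axis=-1)


def select_bitrate(quantiles_per_action, alpha: float = 0.25) -> int:
    """Return the index of the action with the highest CVaR at level alpha."""
    q = np.asarray(quantiles_per_action, dtype=float)
    return int(np.argmax(cvar_from_quantiles(q, alpha)))


# Toy example (made-up numbers): action 0 is a safe low bitrate,
# action 1 has a higher mean return but a heavy stall tail.
toy_quantiles = [
    [8.0, 9.0, 10.0, 11.0],
    [-20.0, 10.0, 20.0, 30.0],
]
risk_averse_choice = select_bitrate(toy_quantiles, alpha=0.25)
risk_neutral_choice = select_bitrate(toy_quantiles, alpha=1.0)
```

With α = 1 the rule reduces to choosing the action with the highest mean return, so in this toy case it picks the riskier action 1, while α = 0.25 picks action 0. Lowering α weights the worst outcomes more heavily, which is the risk–reward trade-off the sensitivity sweep refers to.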

Scientific Publication Center

Yunhe Li

Journal of Advanced Computing Systems

2026
