A Sample Decreasing Threshold Greedy-Based Algorithm for Big Data Summarisation
Abstract
As the scales of datasets expand rapidly in big data applications, increasing effort has been made to develop fast algorithms. This paper addresses big data summarisation problems using the submodular maximisation approach and proposes an efficient algorithm for maximising general non-negative submodular objective functions subject to k-extendible system constraints. Leveraging a sampling process and a decreasing-threshold strategy, we develop an algorithm named Sample Decreasing Threshold Greedy (SDTG). The proposed algorithm obtains an expected approximation guarantee of 1/(1+k) − ε for maximising monotone submodular functions, and of k/(1+k)² − ε in non-monotone cases, with expected computational complexity of O(n/((1+k)ε) · ln(r/ε)). Here, r is the largest size of a feasible solution, and ε ∈ (0, 1/(1+k)) is an adjustable design parameter governing the trade-off between the approximation ratio and the computational complexity. The performance of the proposed algorithm is verified through experiments with a movie recommendation system and compared against that of benchmark algorithms.
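The combination of sampling and a geometrically decreasing acceptance threshold described above can be sketched as follows. This is a hypothetical illustration for the simple cardinality-constrained special case (a 1-extendible system), not the paper's exact SDTG procedure; the function `sdtg_sketch`, the sampling probability `p`, and the stopping rule are assumptions made for illustration.

```python
import random

def sdtg_sketch(ground_set, f, r, p, eps):
    """Hypothetical sketch of a sample decreasing-threshold greedy.

    Each unselected element is considered with sampling probability p,
    and the acceptance threshold tau decays geometrically by (1 - eps).
    f is a set function (assumed submodular); r caps the solution size.
    Illustrates the general strategy only, not the paper's exact SDTG.
    """
    S = set()
    # Initial threshold: the largest singleton value.
    d = max(f({e}) for e in ground_set)
    tau = d
    while tau >= eps * d / r and len(S) < r:
        for e in ground_set - S:
            if len(S) >= r:
                break
            if random.random() > p:       # sampling step: skip with prob 1-p
                continue
            gain = f(S | {e}) - f(S)      # marginal gain of adding e
            if gain >= tau:
                S.add(e)
        tau *= (1 - eps)                  # decrease the threshold
    return S
```

The decreasing threshold avoids re-sorting marginal gains after every pick, which is where the ln(r/ε)-type factor in the complexity bound comes from, while the sampling step is what trades a little approximation quality for fewer oracle calls.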
Related Results
Towards an Improved Strategy for Solving Multi-Armed Bandit Problem
The Multi-Armed Bandit (MAB) problem is one of the classical reinforcement learning problems, describing the friction between the agent's exploration and exploitation. This study ex...
SOLVING 0 - 1 KNAPSACK PROBLEM BASED ON HYBRID GREEDY FIREWORKS ALGORITHM
Aiming at the classical knapsack problem in combinatorial optimization, in order to improve the local search ability and global search ability of the basic fireworks algorithm, an ...
The Power of Verification for Greedy Mechanism Design
Greedy algorithms are known to provide, in polynomial time, near optimal approximation guarantees for Combinatorial Auctions (CAs) with multidimensional...
Empowering the middle management: Incorporating data summarisation and visualisation techniques in management information systems
Human–computer interaction (HCI) researchers and practitioners have explored how computers and software can aid users in decision-making. Accurate information is crucial in decisio...
Digital Footprint as a Source of Big Data in Education
The purpose of this study is to consider the prospects and problems of using big data in education. Materials and methods. The research methods include analysis, systematization and...
Real-Time Timeline Summarisation for High-Impact Events in Twitter
Twitter has become a valuable source of event-related information, namely, breaking news and local event reports. Due to its capability of transmitting information in real-time, Tw...
Why Should Big Data-based Price Discrimination be Governed?
The e-commerce platform provides data service for resident merchants for precise marketing, but which also leads to frequent occurrence of big data-based price dis...
Multi-armed Bandit Algorithms for Cournot Games
We investigate using a multi-armed bandit (MAB) setting for modeling repeated Cournot oligopoly games. Agents interact with separate bandit problems. An agent can ...

