Towards Seamless Configuration Tuning of Big Data Analytics

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Standard

Towards Seamless Configuration Tuning of Big Data Analytics. / Fekry, Ayat; Carata, Lucian; Pasquier, Thomas; Rice, Andrew; Hopper, Andy.

2019 IEEE International Conference on Distributed Computing Systems (ICDCS 2019). Institute of Electrical and Electronics Engineers (IEEE), 2019.

Harvard

Fekry, A, Carata, L, Pasquier, T, Rice, A & Hopper, A 2019, Towards Seamless Configuration Tuning of Big Data Analytics. in 2019 IEEE International Conference on Distributed Computing Systems (ICDCS 2019). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/ICDCS.2019.00189

APA

Fekry, A., Carata, L., Pasquier, T., Rice, A., & Hopper, A. (2019). Towards Seamless Configuration Tuning of Big Data Analytics. In 2019 IEEE International Conference on Distributed Computing Systems (ICDCS 2019). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/ICDCS.2019.00189

Vancouver

Fekry A, Carata L, Pasquier T, Rice A, Hopper A. Towards Seamless Configuration Tuning of Big Data Analytics. In 2019 IEEE International Conference on Distributed Computing Systems (ICDCS 2019). Institute of Electrical and Electronics Engineers (IEEE). 2019. https://doi.org/10.1109/ICDCS.2019.00189

Author

Fekry, Ayat ; Carata, Lucian ; Pasquier, Thomas ; Rice, Andrew ; Hopper, Andy. / Towards Seamless Configuration Tuning of Big Data Analytics. 2019 IEEE International Conference on Distributed Computing Systems (ICDCS 2019). Institute of Electrical and Electronics Engineers (IEEE), 2019.

Bibtex

@inproceedings{c9a9e27774f444f5b5aeaefa12ce94f6,
title = "Towards Seamless Configuration Tuning of Big Data Analytics",
abstract = "The execution of distributed data processing workloads (such as those running on top of Hadoop or Spark) in cloud environments presents a unique opportunity to explore multiple trade-offs between elasticity (and types of resources being allocated), overall runtime and total costs. However, beyond high-level constraints and objectives, it's not the end-users who should be mainly concerned with those optimizations, but the cloud providers. They have both the vantage point to collect actionable information, economies of scale and position to adjust parameters when dynamic conditions change, in order to fulfil SLOs that go beyond classic measures of latency and throughput. This is at odds with the existing approach of making software (including the interfaces to the cloud and the processing frameworks) as configurable as possible. We propose that rather than configurability, self-tunability (or the illusion of it as far as the end-user is concerned) is a better long-term goal.",
keywords = "Tuning, Sparks, Cloud computing, Runtime, Optimisation, Pipelines, Measurement, Big data, data analysis, data handling, parallel processing, big data analytics, distributed data workloads, cloud environments, multiple trade-offs, elasticity, high-level constraints, cloud providers, vantage point, actionable information, dynamic conditions change, classic measures, processing frameworks, end-user, seamless configuration tuning, SLO, Configuration Tuning, Data intensive computing",
author = "Ayat Fekry and Lucian Carata and Thomas Pasquier and Andrew Rice and Andy Hopper",
year = "2019",
doi = "10.1109/ICDCS.2019.00189",
language = "English",
booktitle = "2019 IEEE International Conference on Distributed Computing Systems (ICDCS 2019)",
publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
address = "United States",

}

RIS - suitable for import to EndNote

TY - GEN

T1 - Towards Seamless Configuration Tuning of Big Data Analytics

AU - Fekry, Ayat

AU - Carata, Lucian

AU - Pasquier, Thomas

AU - Rice, Andrew

AU - Hopper, Andy

PY - 2019

Y1 - 2019

N2 - The execution of distributed data processing workloads (such as those running on top of Hadoop or Spark) in cloud environments presents a unique opportunity to explore multiple trade-offs between elasticity (and types of resources being allocated), overall runtime and total costs. However, beyond high-level constraints and objectives, it's not the end-users who should be mainly concerned with those optimizations, but the cloud providers. They have both the vantage point to collect actionable information, economies of scale and position to adjust parameters when dynamic conditions change, in order to fulfil SLOs that go beyond classic measures of latency and throughput. This is at odds with the existing approach of making software (including the interfaces to the cloud and the processing frameworks) as configurable as possible. We propose that rather than configurability, self-tunability (or the illusion of it as far as the end-user is concerned) is a better long-term goal.

AB - The execution of distributed data processing workloads (such as those running on top of Hadoop or Spark) in cloud environments presents a unique opportunity to explore multiple trade-offs between elasticity (and types of resources being allocated), overall runtime and total costs. However, beyond high-level constraints and objectives, it's not the end-users who should be mainly concerned with those optimizations, but the cloud providers. They have both the vantage point to collect actionable information, economies of scale and position to adjust parameters when dynamic conditions change, in order to fulfil SLOs that go beyond classic measures of latency and throughput. This is at odds with the existing approach of making software (including the interfaces to the cloud and the processing frameworks) as configurable as possible. We propose that rather than configurability, self-tunability (or the illusion of it as far as the end-user is concerned) is a better long-term goal.

KW - Tuning

KW - Sparks

KW - Cloud computing

KW - Runtime

KW - Optimisation

KW - Pipelines

KW - Measurement

KW - Big data

KW - data analysis

KW - data handling

KW - parallel processing

KW - big data analytics

KW - distributed data workloads

KW - cloud environments

KW - multiple trade-offs

KW - elasticity

KW - high-level constraints

KW - cloud providers

KW - vantage point

KW - actionable information

KW - dynamic conditions change

KW - classic measures

KW - processing frameworks

KW - end-user

KW - seamless configuration tuning

KW - SLO

KW - Configuration Tuning

KW - Data intensive computing

U2 - 10.1109/ICDCS.2019.00189

DO - 10.1109/ICDCS.2019.00189

M3 - Conference contribution

BT - 2019 IEEE International Conference on Distributed Computing Systems (ICDCS 2019)

PB - Institute of Electrical and Electronics Engineers (IEEE)

ER -