Diagnosing data center behavior flow by flow

Abstract

Multi-tenant data centers are complex environments, running thousands of applications that compete for the same infrastructure resources and whose behavior is guided by sometimes divergent configurations. Small workload changes or simple operator tasks may yield unpredictable results and lead to expensive failures and performance degradation. In this paper, we propose a holistic approach for detecting operational problems in data centers. Our framework, FlowDiff, collects information from all entities involved in the operation of a data center - applications, operators, and infrastructure - and continually builds behavioral models of that operation. By comparing current models with pre-computed, known-to-be-stable models, FlowDiff is able to detect many operational problems, ranging from host and network failures to unauthorized access. FlowDiff also identifies common system operations (e.g., VM migration, software upgrades) so that behavior changes can be validated against planned operator tasks. We show that passive measurements of the control traffic sent from programmable switches to a centralized controller are sufficient to build strong behavior models; FlowDiff does not require active measurements or expensive server instrumentation. Our experimental results, obtained on an NEC data center testbed, on Amazon EC2, and in simulations, demonstrate that FlowDiff is effective and robust in detecting anomalous behavior. FlowDiff scales well with the number of applications running in the data center and with their traffic volume. © 2013 IEEE.
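
The core idea in the abstract, comparing a current behavioral model against a known-to-be-stable one, can be illustrated with a minimal sketch. This is not FlowDiff's actual model or algorithm, which the abstract does not specify. It assumes a deliberately simple model (a count of new-flow events per communication edge), and the `Flow` schema, the function names, and the `tolerance` parameter are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """One new-flow event seen on the control channel (hypothetical schema)."""
    src: str
    dst: str
    dst_port: int


@dataclass(frozen=True)
class ModelDiff:
    added: frozenset    # edges seen now but absent from the baseline
    missing: frozenset  # baseline edges that no longer appear
    shifted: frozenset  # edges whose flow count moved beyond the tolerance


def build_model(flows: Iterable[Flow]) -> Counter:
    """Assumed model: number of new-flow events per (src, dst, dst_port) edge."""
    return Counter(flows)


def diff_models(baseline: Counter, current: Counter,
                tolerance: float = 0.5) -> ModelDiff:
    """Compare a current model with a stable baseline (illustrative only)."""
    base_edges, cur_edges = set(baseline), set(current)
    # An edge has "shifted" if its count changed by more than
    # `tolerance` times its baseline count.
    shifted = {
        edge for edge in base_edges & cur_edges
        if abs(current[edge] - baseline[edge]) > tolerance * baseline[edge]
    }
    return ModelDiff(
        added=frozenset(cur_edges - base_edges),
        missing=frozenset(base_edges - cur_edges),
        shifted=frozenset(shifted),
    )


# Usage: a stable baseline versus a window with an unexpected SSH flow,
# a vanished cache dependency, and a sharp drop in database traffic.
baseline = build_model([Flow("web", "db", 3306)] * 10
                       + [Flow("web", "cache", 11211)] * 4)
current = build_model([Flow("web", "db", 3306)] * 2
                      + [Flow("web", "ext", 22)])
report = diff_models(baseline, current)
```

In this toy setting, an added edge might hint at unauthorized access and a missing edge at a host or network failure. A real system would also need to check such changes against planned operator tasks before raising an alarm, as the abstract describes.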

Citation

Arefin, A., Singh, V. K., Jiang, G., Zhang, Y., & Lumezanu, C. (2013). Diagnosing data center behavior flow by flow. In Proceedings - International Conference on Distributed Computing Systems (pp. 11–20). https://doi.org/10.1109/ICDCS.2013.18
