Conference paper

An ETL optimization framework using partitioning and parallelization

In Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015, pp. 1015-1022
From

University of Waterloo, Canada

University College of Northern Denmark

Extract-Transform-Load (ETL) processes handle large amounts of data and manage their workload through dataflows. ETL dataflows are widely regarded as complex and expensive operations in terms of time and system resources. To minimize the time and resources that ETL dataflows require, this paper presents an optimization framework using partitioning and parallelization.

The framework first partitions an ETL dataflow into multiple execution trees according to the characteristics of the ETL constructs. Within each execution tree, pipelined parallelism and a shared cache are then used to optimize the partitioned dataflow. In addition, multi-threading is used for component-based optimization.
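To make the idea of pipelined parallelism concrete, the following is a minimal sketch and not the framework described in the paper. It assumes each transformation in an execution tree can be modeled as a function applied row by row, and it runs every such function in its own thread, with bounded queues between stages. The names `stage`, `run_pipeline`, and `buffer_size` are hypothetical and chosen only for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the row stream


def stage(func, inbox, outbox):
    """Apply func to each row taken from inbox and forward the result to outbox."""
    while True:
        row = inbox.get()
        if row is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(row))


def run_pipeline(rows, funcs, buffer_size=64):
    """Run funcs as a pipeline, one thread per stage, and return the output rows.

    Assumption for this sketch: every func is a pure row-to-row transformation
    that does not raise; error handling is left out for brevity.
    """
    # One bounded queue in front of each stage, plus one for the final output.
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]

    def feed():
        # Feed from a separate thread so the bounded queues cannot deadlock
        # while the caller is busy draining the output queue.
        for row in rows:
            queues[0].put(row)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()

    results = []
    while True:
        row = queues[-1].get()
        if row is _DONE:
            break
        results.append(row)

    for t in workers + [feeder]:
        t.join()
    return results


def parse_amount(row):
    """Hypothetical transformation: convert the amount field to a float."""
    return {**row, "amount": float(row["amount"])}


def add_total(row):
    """Hypothetical transformation: derive a total from the parsed amount."""
    return {**row, "total": row["amount"] * 1.25}


if __name__ == "__main__":
    source = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3"}]
    for out in run_pipeline(source, [parse_amount, add_total]):
        print(out)
```

Because each stage has a single worker and the queues are first-in, first-out, rows leave the pipeline in the order they entered. While one stage works on a row, the stage before it can already process the next one, which is the overlap that pipelining relies on. In CPython this overlap mainly benefits stages that wait on I/O, since the global interpreter lock limits the gain for CPU-bound transformations.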

The experimental results show that the proposed framework runs 4.7 times faster than ordinary ETL dataflows (those that do not use the proposed partitioning and optimization methods), and that its performance is comparable to that of similar ETL tools.

Language: English
Year: 2015
Pages: 1015-1022
Types: Conference paper
DOI: 10.1145/2695664.2695846
