Multilevel hybrid parallel method for big data applications

HUANG Lei1, ZHI Xiaoli1, ZHENG Shengan2

doi:10.3969/j.issn.1007-2861.2015.04.017

Journal of Shanghai University, 2016, Vol. 22, Issue 1: 69-80

1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; 2. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Received date: 2015-11-19

Online published: 2016-02-29

Abstract

Many big data applications require several kinds of parallel data processing. This paper presents a two-layer hybrid parallel method: hybrid parallelism of execution units and hybrid parallelism of computing models. With hybrid parallelism of execution units on the same computing node, the computing power of the infrastructure can be fully tapped, which improves data processing performance. By integrating several computing models into the same execution engine in parallel, diverse heterogeneous processing modes can be applied. Different forms of hybrid parallelism suit different data and computation characteristics and serve different parallel objectives. This paper introduces the basic ideas of the hybrid parallel methods and describes the main implementation mechanisms of hybrid parallelism.

Key words: bulk synchronous parallel (BSP); hybrid parallelism; MapReduce; programming model

Cite this article

HUANG Lei, ZHI Xiaoli, ZHENG Shengan. Multilevel hybrid parallel method for big data applications[J]. Journal of Shanghai University, 2016, 22(1): 69-80. DOI: 10.3969/j.issn.1007-2861.2015.04.017
