Multilevel hybrid parallel method for big data applications

Huang Lei1, Zhi Xiaoli1, Zheng Sheng'an2

doi:10.3969/j.issn.1007-2861.2015.04.017

Journal of Shanghai University (Natural Science Edition), 2016, Vol. 22, Issue 1: 69-80

Big Data

1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; 2. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Received date: 2015-11-19

Online published: 2016-02-29

Funding

Research Program of the Science and Technology Commission of Shanghai Municipality (15DZ1100305)

Cite this article

Huang Lei, Zhi Xiaoli, Zheng Sheng'an. Multilevel hybrid parallel method for big data applications [J]. Journal of Shanghai University (Natural Science Edition), 2016, 22(1): 69-80. DOI: 10.3969/j.issn.1007-2861.2015.04.017

Abstract

Many big data applications need to process their data in several different parallel ways. This paper proposes a two-layer hybrid parallel method: hybrid parallelism of execution units and hybrid parallelism of computing models. Hybrid parallelism of execution units on the same computing node fully taps the computing power of the infrastructure and thereby improves data processing performance. Integrating several computing models into the same execution engine accommodates the diverse, heterogeneous processing patterns of applications. Different hybrid parallel methods fit different data and computation characteristics and serve different parallelization goals. The paper introduces the basic ideas of the hybrid parallel method and, building on the previously developed parallel programming model BSPCloud, describes the main implementation mechanisms of hybrid process-thread parallelism and hybrid BSP-MapReduce parallelism.

Key words: bulk synchronous parallel (BSP); hybrid parallelism; MapReduce; programming model
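The second layer described in the abstract, several computing models inside one execution engine, can be illustrated with a small sketch. It is not the BSPCloud implementation: the names `map_reduce`, `bsp_run`, and `share_max` are hypothetical, the data are made up, and only threads are used, so the process-thread layer is left out. A MapReduce-style phase aggregates records per key, and a BSP-style phase then runs supersteps separated by a barrier so that every worker learns the global maximum.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(records, mapper, reducer, workers=4):
    """MapReduce-style phase: map in parallel, group by key, reduce per key."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, records))  # each item: list of (key, value)
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def bsp_run(states, compute, supersteps):
    """BSP-style phase: one thread per state; messages arrive one superstep later."""
    states = list(states)  # work on a copy of the caller's list
    n = len(states)
    inbox = [[] for _ in range(n)]   # messages readable in the current superstep
    outbox = [[] for _ in range(n)]  # messages sent during the current superstep
    lock = threading.Lock()

    def deliver():
        # Barrier action: runs once per superstep while all workers are waiting.
        for i in range(n):
            inbox[i], outbox[i] = outbox[i], []

    barrier = threading.Barrier(n, action=deliver)

    def send(dest, message):
        with lock:
            outbox[dest].append(message)

    def worker(pid):
        for _ in range(supersteps):
            states[pid] = compute(pid, states[pid], inbox[pid], send, n)
            barrier.wait()  # global synchronization ends the superstep

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in range(n)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return states


def share_max(pid, state, inbox, send, n):
    """Hypothetical compute step: keep the largest value seen, tell the others."""
    best = max([state] + inbox)
    for dest in range(n):
        if dest != pid:
            send(dest, best)
    return best


# Made-up input: (key, value) records.
records = [("a", 3), ("b", 5), ("a", 4), ("c", 1), ("b", 4)]
totals = map_reduce(records, lambda rec: [rec], lambda key, values: sum(values))
maxima = bsp_run([totals[k] for k in sorted(totals)], share_max, supersteps=2)
print(totals, maxima)
```

Both phases share one driver and one in-memory result, which is the point of the sketch: the output of the MapReduce-style step becomes the initial worker state of the BSP-style step without leaving the program.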

