Journal of Shanghai University (Natural Science Edition), 2018, Vol. 24, Issue (5): 730-744. doi: 10.12066/j.issn.1007-2861.1843

• Research Articles •

Application of data provenance in multi-version documents retrieval

CHEN Yue1, DONG Hongbin2, TAN Chengyu1, LIANG Yiwen1

  1. School of Computer Science, Wuhan University, Wuhan 430072, China
  2. International School of Software, Wuhan University, Wuhan 430079, China
  • Received: 2016-09-23 Online: 2018-10-30 Published: 2018-10-26
  • Contact: LIANG Yiwen, E-mail: ywliang@whu.edu.cn

Abstract:

With the arrival of the big data era, the number of document versions is growing rapidly, which makes document retrieval difficult. Related studies show that provenance information is an important cue for helping users find the documents they need. Existing information retrieval research based on data provenance often captures file events, which cannot describe the specific relationships between documents and are therefore of limited use for re-finding documents. This paper presents a content-level provenance model based on PROV and constructs a dedicated vocabulary for multi-version document retrieval. A low-level model is described with the Resource Description Framework (RDF), and a high-level model is formed from queries over the low-level one. Finally, to give users a more accessible way to evaluate information, a method for visualizing the provenance information is proposed. The results show that the model provides users with more valuable cues by using provenance information to expand retrieval results, helping them find target documents quickly and improving retrieval efficiency.

Key words: multi-version documents, document retrieval, data provenance, PROV
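
The two-layer idea in the abstract (low-level RDF statements, with a high-level view built by querying them) can be illustrated with a minimal sketch. This is not the authors' implementation: the document names, agents, and helper functions below are hypothetical, and plain Python tuples stand in for an RDF store. Only the property names prov:wasDerivedFrom and prov:wasAttributedTo come from the W3C PROV vocabulary.

```python
# Hypothetical sketch: documents, agents, and helpers are illustrative only.
PROV_DERIVED = "prov:wasDerivedFrom"      # W3C PROV property
PROV_ATTRIBUTED = "prov:wasAttributedTo"  # W3C PROV property

# Low-level layer: RDF-style (subject, predicate, object) triples.
triples = {
    ("doc:report_v2", PROV_DERIVED, "doc:report_v1"),
    ("doc:report_v3", PROV_DERIVED, "doc:report_v2"),
    ("doc:slides_v1", PROV_DERIVED, "doc:report_v2"),
    ("doc:report_v1", PROV_ATTRIBUTED, "agent:alice"),
    ("doc:report_v3", PROV_ATTRIBUTED, "agent:bob"),
}


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is stored."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(predicate, obj):
    """Return all subjects s such that (s, predicate, obj) is stored."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def ancestors(doc):
    """High-level query: every version that doc transitively derives from."""
    found, stack = [], [doc]
    while stack:
        current = stack.pop()
        for parent in objects(current, PROV_DERIVED):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


def expand_result(doc):
    """Attach provenance cues to a single retrieval hit."""
    return {
        "document": doc,
        "derived_from": ancestors(doc),
        "derivatives": subjects(PROV_DERIVED, doc),
        "attributed_to": objects(doc, PROV_ATTRIBUTED),
    }
```

Calling expand_result("doc:report_v2") would return the hit together with its earlier version and the two documents derived from it, which is one way a retrieval result could be expanded with provenance cues. In practice the low-level layer would more plausibly live in an RDF store queried with SPARQL.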
