Research Articles

Application of data provenance in multi-version documents retrieval

  • 1. School of Computer Science, Wuhan University, Wuhan 430072, China
    2. International School of Software, Wuhan University, Wuhan 430079, China

Received date: 2016-09-23

Online published: 2018-10-26

Abstract

With the advent of the big data era, the number of document versions is growing rapidly, which makes document retrieval difficult. Related studies show that provenance information is an important cue for helping users find the documents they need. However, information retrieval research based on data provenance often captures only file events, which cannot describe the specific relationships between documents and are therefore of limited use for re-finding documents. This paper presents a content-level provenance model based on PROV and constructs a dedicated vocabulary for multi-version document retrieval. A low-level model is described with the Resource Description Framework (RDF), and a high-level model is formed by querying the low-level one. Finally, to give users a more accessible way to evaluate information, a method for visualizing the provenance information is proposed. The results show that, by using provenance information to expand retrieval results, the model provides users with more valuable cues, helps them find the target document quickly, and improves efficiency.
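
The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model or vocabulary: it assumes the low-level provenance is a set of subject-predicate-object triples that borrow PROV-O property names (prov:wasRevisionOf, prov:wasDerivedFrom, prov:wasAttributedTo), and that the high-level view is obtained by querying those triples. All document and agent identifiers, and the helper names match, earlier_versions and related_documents, are hypothetical.

```python
# Low-level provenance as (subject, predicate, object) triples.
# Predicates borrow PROV-O property names; all identifiers are hypothetical.
TRIPLES = {
    ("doc:report_v2", "prov:wasRevisionOf", "doc:report_v1"),
    ("doc:report_v3", "prov:wasRevisionOf", "doc:report_v2"),
    ("doc:summary_v1", "prov:wasDerivedFrom", "doc:report_v2"),
    ("doc:report_v3", "prov:wasAttributedTo", "agent:alice"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def earlier_versions(triples, doc):
    """High-level view: follow prov:wasRevisionOf links back from doc."""
    chain = []
    seen = {doc}
    current = doc
    while True:
        hits = match(triples, s=current, p="prov:wasRevisionOf")
        if not hits:
            break
        current = sorted(hits)[0][2]  # assume one predecessor per version
        if current in seen:  # guard against cyclic data
            break
        seen.add(current)
        chain.append(current)
    return chain


def related_documents(triples, doc):
    """Expand a retrieval hit with documents linked by revision or derivation."""
    links = ("prov:wasRevisionOf", "prov:wasDerivedFrom")
    related = set()
    for s, p, o in triples:
        if p in links:
            if s == doc:
                related.add(o)
            elif o == doc:
                related.add(s)
    return sorted(related)


if __name__ == "__main__":
    print(earlier_versions(TRIPLES, "doc:report_v3"))
    print(related_documents(TRIPLES, "doc:report_v2"))
```

In a real system the triples would live in an RDF store and the high-level view would be produced by a query language such as SPARQL; the plain-Python pattern matching here only stands in for that step.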

Cite this article

CHEN Yue, DONG Hongbin, TAN Chengyu, LIANG Yiwen. Application of data provenance in multi-version documents retrieval[J]. Journal of Shanghai University, 2018, 24(5): 730-744. DOI: 10.12066/j.issn.1007-2861.1843
