Managing large multidimensional hydrologic datasets

A case study comparing NetCDF and SciDB

Journal article (2018)

Authors

H. Liu OLD Department of GIS Technology

P.J.M. van Oosterom OLD Department of GIS Technology

T.P.M. Tijssen OLD Department of GIS Technology

T.J.F. Commandeur Urban Data Science - Architecture and the Built Environment

Wen Wang Hohai University

Research Group

OLD Department of GIS Technology

Keywords

NetCDF, SciDB, Chunked storage structure, Hydrologic dataset

To reference this document use:

http://resolver.tudelft.nl/uuid:de98bc8f-ec52-4784-aba1-7e77ee807f15

Published Date

2018

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Management of large hydrologic datasets, including storage, structuring, clustering, indexing, and querying, is one of the crucial challenges in the era of big data. This research originates from a specific problem: extracting time series at specific locations takes a long time when a large multidimensional (MD) dataset is stored in the NetCDF classic or 64-bit offset format. The root of this issue is the contiguous storage structure adopted by these NetCDF formats. In this research, NetCDF file-based solutions and an MD array database management system (DBMS) that applies a chunked storage structure are benchmarked to determine the best solution for storing and querying large MD hydrologic datasets. Experts were consulted to establish the benchmark sets, and the HydroNET-4 system provided the benchmark environment. The final benchmark tests explore how the data storage configuration, namely chunk size, dimension order (spatio-temporal clustering), and compression, affects query performance. Results indicate that, for managing big hydrologic MD data, the properly chunked NetCDF-4 solution without compression is, in general, more efficient than the SciDB DBMS. However, the benefits of a DBMS should not be neglected: for example, integration with other data types, smart caching strategies, transaction support, scalability, and out-of-the-box support for parallelization.
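The trade-off the abstract describes between time series extraction and chunk layout can be illustrated with a small sketch. It counts how many chunks a query touches under different chunk shapes, which is a rough proxy for I/O cost. The dataset dimensions, the chunk shapes, and the helper name `chunks_touched` are assumptions made for illustration; they are not taken from the article or from any NetCDF or SciDB API.

```python
def chunks_touched(query, chunk_shape):
    """Count the chunks that intersect a hyperslab query.

    query: one half-open (start, stop) index range per dimension.
    chunk_shape: chunk length along each dimension.
    """
    total = 1
    for (start, stop), chunk_len in zip(query, chunk_shape):
        first = start // chunk_len          # index of first chunk hit
        last = (stop - 1) // chunk_len      # index of last chunk hit
        total *= last - first + 1
    return total


# Hypothetical dataset with dimensions (time, y, x) = (8760, 500, 500).
# Time series at a single grid cell across all time steps:
ts_query = [(0, 8760), (250, 251), (100, 101)]
# Full spatial grid at a single time step:
map_query = [(100, 101), (0, 500), (0, 500)]

# Hypothetical chunk shapes, from space-favoring to time-favoring.
layouts = {
    "one time step per chunk": (1, 500, 500),
    "balanced": (146, 50, 50),
    "full time series per chunk": (8760, 10, 10),
}

for name, chunk_shape in layouts.items():
    print(name,
          "| time series:", chunks_touched(ts_query, chunk_shape),
          "| spatial map:", chunks_touched(map_query, chunk_shape))
```

Under these assumed shapes, a layout with one time step per chunk behaves like time-major contiguous storage for a point query: every time step must be visited. A time-favoring layout reverses the cost, and a balanced layout sits between the two, which is why chunk size and dimension order are treated as benchmark parameters.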

Files

Jh0201058.pdf

(pdf | 0.534 Mb)

Unknown license
