HDF5 Core Driver (H5FD_CORE): Loading Selected Dataset(s)
Currently, I load HDF5 data in Python via h5py and read a dataset into memory:
    f = h5py.File('myfile.h5', 'r')
    dset = f['mydataset'][:]
This works, but if 'mydataset' is the only
Solution 1:
I guess this is the same problem you get when you read a file by looping over an arbitrary axis without setting a suitable chunk-cache size.
If you read the file with the core driver, the whole file is read sequentially from disk, and everything else (decompression, converting chunked data to contiguous data, ...) is done entirely in RAM.
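As a minimal sketch of the core-driver approach described above: the helper below opens a file with driver='core' and copies one dataset into a NumPy array. The function names, the demo file and the dataset layout are hypothetical and only serve to make the example self-contained.

```python
import os
import tempfile

import h5py
import numpy as np


def read_with_core_driver(path, dataset_name):
    """Open the file with the core driver and return one dataset as an array."""
    # driver='core' loads the file image into memory when the file is opened,
    # so the subsequent dataset read is served from RAM.
    with h5py.File(path, "r", driver="core") as f:
        return f[dataset_name][:]


def make_demo_file(path):
    """Write a small chunked, compressed dataset (hypothetical demo data)."""
    data = np.arange(100 * 100, dtype=np.float32).reshape(100, 100)
    with h5py.File(path, "w") as f:
        f.create_dataset("Test", data=data, chunks=(10, 100), compression="gzip")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.h5")
        expected = make_demo_file(demo_path)
        loaded = read_with_core_driver(demo_path, "Test")
        print(np.array_equal(expected, loaded))
```

Note that the core driver holds the whole file in memory, so this only makes sense when the file fits comfortably in RAM.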
To write the test data, I used the simplest form of the fancy-slicing example from https://stackoverflow.com/a/48405220/4045774.
import time
import h5py as h5
import h5py_cache as h5c

def Reading():
    File_Name_HDF5 = 'Test.h5'

    # 1) Core driver: the whole file is loaded into RAM on open
    t1 = time.time()
    f = h5.File(File_Name_HDF5, 'r', driver='core')
    dset = f['Test'][:]
    f.close()
    print(time.time() - t1)

    # 2) Default driver with a 500 MB chunk cache (via h5py_cache)
    t1 = time.time()
    f = h5c.File(File_Name_HDF5, 'r', chunk_cache_mem_size=1024**2 * 500)
    dset = f['Test'][:]
    f.close()
    print(time.time() - t1)

    # 3) Default driver with the default chunk cache
    t1 = time.time()
    f = h5.File(File_Name_HDF5, 'r')
    dset = f['Test'][:]
    f.close()  # close before stopping the timer, as in the other two cases
    print(time.time() - t1)

if __name__ == "__main__":
    Reading()
On my machine this takes 2.38 s (core driver), 2.29 s (500 MB chunk cache) and 4.29 s (default chunk cache of 1 MB).
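The chunk-cache comparison can also be sketched without h5py_cache, assuming h5py 2.9 or later, where h5py.File accepts the rdcc_nbytes keyword for the raw-data chunk cache size. The helper name timed_read is hypothetical, and the timings it returns will of course depend on the machine and the file.

```python
import time

import h5py


def timed_read(path, dataset_name, **file_kwargs):
    """Read a whole dataset and return (array, elapsed_seconds).

    file_kwargs are passed straight to h5py.File, e.g. driver='core'
    or rdcc_nbytes=... (assumes h5py >= 2.9 for rdcc_nbytes).
    """
    start = time.perf_counter()
    with h5py.File(path, "r", **file_kwargs) as f:
        data = f[dataset_name][:]
    # The file is closed before the timer stops, so closing is included.
    return data, time.perf_counter() - start


def compare_settings(path, dataset_name):
    """Time the three variants from the answer and return a dict of seconds."""
    settings = {
        "core driver": {"driver": "core"},
        "500 MB chunk cache": {"rdcc_nbytes": 500 * 1024**2},
        "default chunk cache": {},
    }
    return {
        label: timed_read(path, dataset_name, **kwargs)[1]
        for label, kwargs in settings.items()
    }
```

For example, compare_settings('Test.h5', 'Test') returns one elapsed time per setting. Repeated runs on the same file may be affected by the operating system's file cache, so the first run is usually the most telling.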