
Inexplicable Behavior When Using Vlen With H5py

I am using h5py to build a dataset. Since I want to store arrays with different numbers of rows, I use the h5py special_dtype vlen. However, I experience behavior I can't explain.

Solution 1:

I think

train_targets[0] = test

has stored your (11,5) array as an F ordered array in a row of train_targets. According to the (9549,5) shape, that's a row of 5 elements. And since it is vlen, each element is a 1d array of length 11.

That's what you get back in train_targets[0] - an array of 5 arrays, each shape (11,), with values taken from test (order F).

So I think there are 2 issues - what a 2d shape means, and what vlen allows.


My version of h5py is pre v2.3, so I only get string vlen. But I suspect your problem may be that vlen only works with 1d arrays, an extension, so to speak, of byte strings.

Does the 5 in shape=(9549, 5,) have anything to do with 5 in the test.shape? I don't think it does, at least not as numpy and h5py see it.

When I make a file following the string vlen example:

>>> import h5py
>>> f = h5py.File('foo.hdf5', 'a')
>>> dt = h5py.special_dtype(vlen=str)
>>> ds = f.create_dataset('VLDS', (100,100), dtype=dt)

and then do:

ds[0]='this one string'

and look at ds[0], I get an object array with 100 elements, each being this string. That is, I've set a whole row of ds.

ds[0,0]='another'

is the correct way to set just one element.
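The difference between the two assignments can be checked with a small sketch. It assumes a reasonably recent h5py; the file name, the (2, 3) shape, and the `as_text` helper are made up for illustration, and the in-memory `core` driver is used only so that nothing is written to disk.

```python
import h5py


def as_text(value):
    # h5py 3.x returns bytes for vlen strings by default, older versions
    # return str; normalize both cases for the comparison below.
    return value.decode() if isinstance(value, bytes) else value


dt = h5py.special_dtype(vlen=str)

# Hypothetical in-memory file; nothing is saved to disk.
with h5py.File('vlen_str_sketch.hdf5', 'w', driver='core',
               backing_store=False) as f:
    ds = f.create_dataset('VLDS', (2, 3), dtype=dt)

    ds[0] = 'this one string'  # a scalar is broadcast across all of row 0
    ds[1, 0] = 'another'       # only the element at row 1, column 0 is set

    row0 = [as_text(v) for v in ds[0]]
    elem = as_text(ds[1, 0])

print(row0)
print(elem)
```

Indexing with a single integer selects a whole row of a 2d dataset, which is why the first assignment fills every column.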

vlen is 'variable length', not 'variable shape'. While the documentation at https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html is not entirely clear on this, I think you can store 1d arrays with shapes (11,) and (38,) with vlen, but not 2d ones.
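If vlen really is limited to 1d arrays, one possible workaround is to flatten each 2d array before storing it and reshape it after reading. This is only a sketch under that assumption, not something taken from the question: the dataset name, the sample arrays `a` and `b`, and the fixed column count of 5 are illustrative, and it needs an h5py version that supports numeric vlen types.

```python
import h5py
import numpy as np

# Each dataset element is a 1d float64 array of arbitrary length.
dt = h5py.special_dtype(vlen=np.dtype('float64'))

# Illustrative 2d arrays with different row counts but 5 columns each.
a = np.arange(55, dtype='float64').reshape(11, 5)
b = np.arange(190, dtype='float64').reshape(38, 5)

# Hypothetical in-memory file; nothing is saved to disk.
with h5py.File('vlen_sketch.hdf5', 'w', driver='core',
               backing_store=False) as f:
    # One vlen element per array, so the dataset itself is 1d.
    ds = f.create_dataset('targets', (2,), dtype=dt)

    ds[0] = a.ravel()  # stored as a 1d array of length 55
    ds[1] = b.ravel()  # stored as a 1d array of length 190

    # The column count is known, so the row count can be inferred.
    a_back = ds[0].reshape(-1, 5)
    b_back = ds[1].reshape(-1, 5)

print(a_back.shape, b_back.shape)
```

Here the dataset shape is (2,) rather than (2, 5): the 5 columns live inside each stored element instead of being a dimension of the dataset.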


Actually, train_targets output is reproduced with:

In [54]: test1 = np.empty((5,), dtype=object)
In [55]: for i in range(5):
   ....:     test1[i] = test.T.flatten()[i:i+11]

It's 11 values taken from the transpose (F order), but shifted for each sub array.
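The same construction can be run with NumPy alone. The `test` array below is a stand-in, since the question's actual data is not shown; any (11, 5) array would do.

```python
import numpy as np

# Stand-in for the question's (11, 5) array; the real values are unknown.
test = np.arange(55).reshape(11, 5)

# Flattening the transpose walks the original array column by column,
# which is the same as flattening in Fortran (F) order.
flat_f = test.T.flatten()

test1 = np.empty((5,), dtype=object)
for i in range(5):
    # Window of 11 values, starting one element later for each sub array.
    test1[i] = flat_f[i:i + 11]

print(test1[0])  # the first column of test
print(test1[1])  # shifted by one: ends with the first value of column 1
```

Only the first sub array lines up with a column of `test`; each later one starts one element further in, so it runs past the end of the first column and picks up values from the second.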
