1 Introduction

Access to HSDS in AWS will end on August 1, 2018; after that, HDFlab must be used. For the moment I am focusing on the HDF Server that we have control over. HSDS will become open source at the end of 2018.

In its current form, rhdf5client focuses on providing access to HDF5 datasets that correspond to R matrices. The test data that HDF Group provides for HDF Server illustrates many aspects of data structure, data type, and server operations that rhdf5client does not address.

We do not have to be comprehensive in our client design, but it would be good to address a few more functionalities. For example, I do not think we have code that manages a vector as opposed to a matrix. Nor is our code cleanly factored into server operations, data access, and R interfacing.

2 A new class that ‘simplifies’ some additional tasks

This document defines a class that manages dataset attributes as defined by HDF Server.

getClass("H5S_dsattrs")
## Class "H5S_dsattrs" [in ".GlobalEnv"]
## 
## Slots:
##                                                   
## Name:       attrs        src    hrefVec    theCall
## Class:       list H5S_source  character        ANY
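
For readers who want to experiment without the package loaded, the slot layout printed above can be mimicked with a small S4 sketch. Everything suffixed `_demo` is hypothetical: `H5S_source_demo` is only a placeholder for the real `H5S_source` class, whose slots are not shown in this document, and the `dsdims` accessor and the `attrs$shape$dims` path are assumptions based on the `shape.dims` label in the show output further down.

```r
library(methods)

# Placeholder standing in for rhdf5client's H5S_source (hypothetical slots).
setClass("H5S_source_demo", representation(serverURL = "character"))

# Same four slot names and slot classes as printed by getClass("H5S_dsattrs"),
# with the placeholder source class substituted for H5S_source.
setClass("H5S_dsattrs_demo",
  representation(attrs = "list", src = "H5S_source_demo",
                 hrefVec = "character", theCall = "ANY"))

# Hypothetical accessor: assumes the dimensions live at attrs$shape$dims.
setGeneric("dsdims", function(x) standardGeneric("dsdims"))
setMethod("dsdims", "H5S_dsattrs_demo", function(x) x@attrs$shape$dims)

# Build an instance by hand; the URL is a made-up example value.
demo <- new("H5S_dsattrs_demo",
  attrs = list(shape = list(dims = c(9662L, 58037L))),
  src = new("H5S_source_demo", serverURL = "http://example.org"),
  hrefVec = character(0),
  theCall = NULL)

dsdims(demo)
```

Keeping the raw attribute list in one slot and the server handle in another is one way to separate server operations from R-side interfacing.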

We can generate an instance of this class for the GTEx data, and take a relatively unprocessed slice of the content.

tissatt = H5S_attr_for_host( ss, "tissues", 
    prefix="host=", postfix=".h5s.channingremotedata.org")
tissatt
## H5S_dsattrs instance:
## shape.dims:
##  [1]  9662 58037
## A preview URL string is available with prevURL().
getSlice(tissatt,,"[0:5:1,0:3:1]")
## [[1]]
## [1] 339689     30 217552
## 
## [[2]]
## [1]   98669     764 1076085
## 
## [[3]]
## [1]  54697   1290 168577
## 
## [[4]]
## [1]  122656    1018 1034842
## 
## [[5]]
## [1]  483327  159954 1257228
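
The selection string passed to getSlice appears to follow the `[start:stop:step,...]` pattern, one triple per dimension, with zero-based starts and exclusive stops; that reading is consistent with the five vectors of length three returned here. The sketch below is not part of rhdf5client: `make_select` is a hypothetical helper that assembles such a string, and the second half binds the list shown above (values copied from the output) into an ordinary R matrix.

```r
# Hypothetical helper: build a "[start:stop:step,...]" selection string
# from per-dimension 0-based starts, exclusive stops, and steps.
make_select <- function(start, stop, step = rep(1L, length(start))) {
  stopifnot(length(start) == length(stop), length(start) == length(step))
  parts <- sprintf("%d:%d:%d",
                   as.integer(start), as.integer(stop), as.integer(step))
  paste0("[", paste(parts, collapse = ","), "]")
}

sel <- make_select(start = c(0, 0), stop = c(5, 3))
sel

# Values copied from the getSlice() output above, one vector per row.
rows <- list(
  c(339689, 30, 217552),
  c(98669, 764, 1076085),
  c(54697, 1290, 168577),
  c(122656, 1018, 1034842),
  c(483327, 159954, 1257228))

# Stack the row vectors into a 5 x 3 numeric matrix.
m <- do.call(rbind, rows)
dim(m)
```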

Getting back a list of row vectors rather than a matrix would seem to be a step backwards. However, we can get access to more complicated HDF5 data in this way.

lkdt = H5S_attr_for_host( ss, "datatypes.datasettest.test", prefix="host=", postfix=".h5s.channingremotedata.org")
## there are multiple dataset UUIDs for this request
## returning a list of H5S_dsattrs instances
t(sapply(lkdt, function(x) x@attrs$type))
##       base             class        
##  [1,] "H5T_STD_U32LE"  "H5T_INTEGER"
##  [2,] "H5T_STD_I16BE"  "H5T_INTEGER"
##  [3,] "H5T_STD_U64BE"  "H5T_INTEGER"
##  [4,] "H5T_STD_I64LE"  "H5T_INTEGER"
##  [5,] "H5T_STD_U16BE"  "H5T_INTEGER"
##  [6,] "H5T_IEEE_F32LE" "H5T_FLOAT"  
##  [7,] "H5T_IEEE_F32BE" "H5T_FLOAT"  
##  [8,] "H5T_STD_U64LE"  "H5T_INTEGER"
##  [9,] "H5T_IEEE_F64LE" "H5T_FLOAT"  
## [10,] "H5T_STD_I32BE"  "H5T_INTEGER"
## [11,] "H5T_STD_U32BE"  "H5T_INTEGER"
## [12,] "H5T_STD_U16LE"  "H5T_INTEGER"
## [13,] "H5T_STD_I32LE"  "H5T_INTEGER"
## [14,] "H5T_IEEE_F64BE" "H5T_FLOAT"  
## [15,] "H5T_STD_I64BE"  "H5T_INTEGER"
## [16,] "H5T_STD_I16LE"  "H5T_INTEGER"
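
The base names in this table encode signedness, bit width, and byte order, following HDF5's naming of its predefined types. As a rough sketch of how a client might interpret them, the hypothetical `decode_h5t` function below (not part of rhdf5client) splits names of the form seen in the table into those components. It handles only the `H5T_STD_*` and `H5T_IEEE_*` patterns listed here.

```r
# Hypothetical decoder for base type names such as "H5T_STD_U32LE".
# Returns one data.frame row per name: kind, bit width, and byte order.
decode_h5t <- function(base) {
  pattern <- "^H5T_(STD|IEEE)_([UIF])([0-9]+)(LE|BE)$"
  matches <- regmatches(base, regexec(pattern, base))
  kinds <- c(U = "unsigned integer", I = "signed integer", F = "float")
  orders <- c(LE = "little-endian", BE = "big-endian")
  do.call(rbind, lapply(matches, function(p) {
    # p is empty when the name does not fit the pattern.
    if (length(p) == 0) stop("unrecognized type name")
    data.frame(base = p[1],
               kind = kinds[[p[3]]],
               bits = as.integer(p[4]),
               byte_order = orders[[p[5]]],
               stringsAsFactors = FALSE)
  }))
}

types <- decode_h5t(c("H5T_STD_U32LE", "H5T_STD_I16BE", "H5T_IEEE_F64LE"))
types
```

A decoder like this matters for R interfacing because R's native integer type is 32-bit signed, so the 64-bit and unsigned 32-bit types above cannot always be held in an R integer without loss.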

We can also take slices of vectors.