RT Journal Article
SR Electronic
T1 A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 067371
DO 10.1101/067371
A1 Giovanni Delussu
A1 Luca Lianas
A1 Francesca Frexia
A1 Gianluigi Zanetti
YR 2016
UL http://biorxiv.org/content/early/2016/08/02/067371.abstract
AB This work presents a scalable data access layer, called PyEHR, intended for building data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts openEHR formalisms to decouple data descriptions from implementation details, and it exploits structure indexing to speed up searches. Persistence is guaranteed by a driver layer with a common driver interface. Drivers for two NoSQL DBMSs, MongoDB and Elasticsearch, are currently implemented. The scalability of PyEHR has been evaluated experimentally through two types of tests, namely constant load and constant number of records, running queries of increasing complexity on two synthetic datasets of ten million records each. The datasets contain very complex openEHR archetype structures and are distributed over up to ten working nodes.
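The abstract describes two architectural ideas: a driver layer in which every storage backend implements one common interface, and an index over record structures that narrows down which records a query must visit. The following is a minimal sketch of both ideas under stated assumptions, not PyEHR's actual code. The names `DriverInterface`, `InMemoryDriver`, `add_record`, `get_record_by_id`, `find_records` and `archetypes_in` are hypothetical, the in-memory backend merely stands in for a MongoDB or Elasticsearch driver, and the index is simplified to an inverted index from archetype ids to record ids.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List, Set


class DriverInterface(ABC):
    """Hypothetical common interface that every storage backend implements."""

    @abstractmethod
    def add_record(self, record_id: str, record: Dict[str, Any]) -> None: ...

    @abstractmethod
    def get_record_by_id(self, record_id: str) -> Dict[str, Any]: ...

    @abstractmethod
    def find_records(self, archetype_id: str) -> List[Dict[str, Any]]: ...


def archetypes_in(node: Any) -> Iterable[str]:
    """Yield every archetype_id found anywhere in a nested record."""
    if isinstance(node, dict):
        if "archetype_id" in node:
            yield node["archetype_id"]
        for value in node.values():
            yield from archetypes_in(value)
    elif isinstance(node, list):
        for item in node:
            yield from archetypes_in(item)


class InMemoryDriver(DriverInterface):
    """Toy backend standing in (hypothetically) for a real DBMS driver."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}
        # Simplified structure index: archetype_id -> ids of records containing it.
        self._index: Dict[str, Set[str]] = {}

    def add_record(self, record_id: str, record: Dict[str, Any]) -> None:
        self._records[record_id] = record
        for arch in archetypes_in(record):
            self._index.setdefault(arch, set()).add(record_id)

    def get_record_by_id(self, record_id: str) -> Dict[str, Any]:
        return self._records[record_id]

    def find_records(self, archetype_id: str) -> List[Dict[str, Any]]:
        # Only records whose indexed structure contains the archetype are visited.
        ids = sorted(self._index.get(archetype_id, set()))
        return [self._records[i] for i in ids]


# Client code depends only on DriverInterface, so backends are interchangeable.
driver: DriverInterface = InMemoryDriver()
driver.add_record("r1", {
    "archetype_id": "openEHR-EHR-COMPOSITION.encounter.v1",
    "content": [{"archetype_id": "openEHR-EHR-OBSERVATION.blood_pressure.v1",
                 "systolic": 120}],
})
driver.add_record("r2", {"archetype_id": "openEHR-EHR-COMPOSITION.encounter.v1",
                         "content": []})
hits = driver.find_records("openEHR-EHR-OBSERVATION.blood_pressure.v1")
print([r["content"][0]["systolic"] for r in hits])  # [120]
```

Because callers only see the abstract interface, adding another backend in this sketch means writing one more subclass; the index lets `find_records` skip records (such as `r2`) whose structure cannot match the query.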