A Dimensional Bus Model for Integrating Clinical and Research Data. Wade TD, Hum RC and Murphy JR. National Jewish Health

Appendix B

Dimensional Bus Model: Operational Details


Query Performance


We maintain a set of 33 template queries that find basic populations of interest to researchers in our organization. Under conditions that simulate real use (but with lightly loaded servers), these queries take at most a few seconds (mean = 1.689 seconds, Table 1) to return results to the user. We repeated the same tests on a smaller test database containing a random selection of about one-tenth of the observations in the production database. Under comparable conditions, the same template queries took an average of 0.898 seconds on this smaller database. Thus, query response time scales favorably over the range of database sizes we have tried: increasing the amount of data roughly tenfold increases mean response time by a factor of only about 1.9, slightly less than a doubling.
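The scaling claim can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in Table 1; the power-law exponent it prints is our own illustrative summary, not something the measurements establish, since two data points cannot confirm any particular growth curve.

```python
import math

# Figures reported in Table 1 for the 33 template queries.
small_obs, small_mean = 430_500, 0.898      # test database
large_obs, large_mean = 4_352_000, 1.689    # production database

size_ratio = large_obs / small_obs          # about 10.1
time_ratio = large_mean / small_mean        # about 1.88

# Illustrative assumption: if response time grew as size**k,
# this is the k that would pass through both measurements.
k = math.log(time_ratio) / math.log(size_ratio)

print(f"size ratio {size_ratio:.2f}, time ratio {time_ratio:.2f}, exponent {k:.2f}")
```

An exponent well below 1 is what "scales favorably" means here: response time grows much more slowly than the number of observation records.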


The template queries were intended to be the basis for more elaborate queries, and might not reflect the performance that actual end users would see. We therefore tested response time for 27 of the more complex queries that had been developed for beta users. Mean response time was 1.811 seconds (Table 1), low enough to be convenient for interactive data exploration.


Table 1: Query Response Time for Different Database Sizes and Query Sets.

Template queries are “starter” queries upon which users can build. Real end-user queries were created to answer research questions for beta users.

Database size (# obs records)            430,500     4,352,000   4,352,000
Number of queries                        33          33          27
Query set                                template    template    real end-user
Mean response time (secs)                0.898       1.689       1.811
Std. deviation of response time (secs)   0.576       0.572       0.839



Loading

At its current size, reloading the entire database from its source databases takes about eight hours. Testing the fidelity of the load and planning how to mitigate new problems in the next load can take two more workdays. This cycle should shorten as our use of the relatively new (2.5-year-old) EMR stabilizes. Although our Bus includes information to support incremental loading, such as timestamps and hashed original source record keys, we have not decided whether to continue with full reloads or to develop incremental loading. The experience from the I2B2* project is that incremental loading is less desirable because it is hard to reduce the incremental accumulation of data errors. (*Murphy SN. Data warehousing for clinical research. In: Liu L, Özsu MT, editors. Encyclopedia of database systems. New York: Springer; 2009.)
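To make the trade-off concrete, the following is a minimal sketch of how timestamps and hashed source record keys could drive an incremental load. It is not the Bus's actual loader: the function names, the tuple layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime

def source_key_hash(source_system: str, record_key: str) -> str:
    """Hash a source record key (hypothetical scheme: SHA-256 of 'system|key')."""
    return hashlib.sha256(f"{source_system}|{record_key}".encode("utf-8")).hexdigest()

def plan_incremental_load(source_rows, warehouse_index):
    """Classify hashed keys as inserts, updates, or deletes.

    source_rows: iterable of (source_system, record_key, modified_at) tuples.
    warehouse_index: dict mapping hashed key -> modified_at of the loaded copy.
    """
    inserts, updates, seen = [], [], set()
    for system, key, modified_at in source_rows:
        hashed = source_key_hash(system, key)
        seen.add(hashed)
        loaded_at = warehouse_index.get(hashed)
        if loaded_at is None:
            inserts.append(hashed)          # never loaded before
        elif modified_at > loaded_at:
            updates.append(hashed)          # changed since the last load
    # Records that vanished from the source must be detected explicitly.
    deletes = [h for h in warehouse_index if h not in seen]
    return inserts, updates, deletes

# Usage with made-up records: one unchanged, one modified, one new, one gone.
loaded = datetime(2012, 1, 1)
index = {source_key_hash("emr", k): loaded for k in ("1", "2", "3")}
rows = [("emr", "1", loaded),
        ("emr", "2", datetime(2012, 2, 1)),
        ("emr", "4", datetime(2012, 2, 1))]
new_keys, changed_keys, removed_keys = plan_incremental_load(rows, index)
```

A full reload sidesteps this bookkeeping entirely. In the incremental version, any missed correction or deletion in the source persists in the warehouse until someone notices it, which is one way errors can accumulate from load to load.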


Metadata Curation


We have found that reusing study data for other research requires the curation of more metadata than is needed by the study itself because the original researchers often “know” information that is not recorded in the study dictionary. In the period prior to our beta release we probably spent more than half of our resources on analysis of source data and curation of metadata in our Metadata Repository. About two-thirds of that time was spent understanding what our caregivers are entering into the EMR. For a research data warehouse, the largest fraction of the effort is in understanding your source data, regardless of whether you write your own programs and structures or use those written by others.


Integration with Biorepository


Actual management of samples in our institutional biorepository is done by software specialized for the purpose: Freezerworks, from Dataworks Development, Inc. (http://www.freezerworks.com/). To the data warehouse, Freezerworks is just another data source. Our goal was to make the management of the biorepository as simple as possible, focusing only on sample processing, management, storage and retrieval. The biorepository does not have to manage phenotypic data or worry about preserving confidentiality. Therefore:

  • The biorepository does not store any phenotypic information, but relies on the phenotypic information in the research database.

  • The biorepository does not store any direct personal identifiers.

  • It receives the medical record number and patient consent information with a sample. Using a special application, the biorepository enters the association of medical record number, consent form and sample ID into a buffer database (the “consent” database) that is under the control of the research database operation. After these associations are confirmed the biorepository does not retain the medical record number in its own records, and it does not pass the medical record number to any other party.

  • The consent information is then imported from the consent database into the Permissions Complex (SubjectPermit table – see Appendix A) so that research database queries of biorepository data can be filtered according to donor permissions just like any other observations.

  • The research database imports information from Freezerworks (using a SOAP interface) about samples and aliquots, uses the consent database to identify the donor by medical record number, and makes the biosample information available for query as observation tables, with the ASID substituted for the medical record number.
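
The import step in the list above can be sketched as follows. This is a simplified illustration, not the actual implementation: the record layouts, the `ConsentLink` class, the `publish_biosamples` function and the `mrn_to_asid` lookup are hypothetical, and donor permission is reduced to a single boolean, whereas the text describes filtering through the Permissions Complex at query time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConsentLink:
    """One row of the consent database (hypothetical layout)."""
    sample_id: str
    mrn: str                  # medical record number, never exposed to researchers
    permits_research: bool    # simplification of the SubjectPermit information

def publish_biosamples(freezer_samples, consent_links, mrn_to_asid):
    """Build de-identified observation rows from biorepository sample records."""
    links = {link.sample_id: link for link in consent_links}
    observations = []
    for sample in freezer_samples:
        link = links.get(sample["sample_id"])
        if link is None or not link.permits_research:
            continue          # no confirmed consent link: sample is not queryable
        asid = mrn_to_asid.get(link.mrn)
        if asid is None:
            continue          # donor unknown to the research database
        # The ASID replaces the medical record number in the published row.
        observations.append({"asid": asid,
                             "sample_id": sample["sample_id"],
                             "aliquots": sample["aliquots"]})
    return observations

# Usage with made-up identifiers.
samples = [{"sample_id": "S1", "aliquots": 3},
           {"sample_id": "S2", "aliquots": 1},
           {"sample_id": "S3", "aliquots": 2}]
consents = [ConsentLink("S1", "MRN001", True),
            ConsentLink("S2", "MRN002", False)]
rows = publish_biosamples(samples, consents, {"MRN001": "A-17", "MRN002": "A-18"})
```

The point of the design is visible in the output rows: they carry a sample identifier and the ASID, but no medical record number, so the medical record number is needed only inside the research database operation.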


Researchers can therefore construct queries against phenotypic information from medical record and study data and, if their Access Ticket permits it, can include biosample data elements in the query to find out whether sample aliquots are available for patients with the queried phenotypes. Sample identifiers can be output so that the researcher can go to the biorepository and request aliquots for their research. Thus the system acts as an honest broker of biosample information, obscuring identities while allowing phenotypically characterized samples to be found. We think that the phenotypes available in the research database are much richer, more extensive and better quality-assured than would be practical to maintain in the biorepository (Freezerworks) database itself.
