Eureka Protempa ETL

Eureka Protempa ETL is the backend data processing layer of Eureka! Clinical Analytics. It performs clinical phenotyping using the Protempa software framework. It currently loads data and found phenotypes into an i2b2 project, though ultimately it will support other export formats.

Categories: CLI User Interface Eureka Container Microservices
GroupId: org.eurekaclinical
ArtifactId: eureka-protempa-etl
Last Version: 3.0
Type: war
Project Organization: Emory University

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/org.eurekaclinical/eureka-protempa-etl/ -->
<dependency>
    <groupId>org.eurekaclinical</groupId>
    <artifactId>eureka-protempa-etl</artifactId>
    <version>3.0</version>
    <type>war</type>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/org.eurekaclinical/eureka-protempa-etl/
implementation 'org.eurekaclinical:eureka-protempa-etl:3.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/org.eurekaclinical/eureka-protempa-etl/
implementation ("org.eurekaclinical:eureka-protempa-etl:3.0")

Buildr:

'org.eurekaclinical:eureka-protempa-etl:war:3.0'

Ivy:

<dependency org="org.eurekaclinical" name="eureka-protempa-etl" rev="3.0">
  <artifact name="eureka-protempa-etl" type="war" />
</dependency>

Grape:

@Grapes(
@Grab(group='org.eurekaclinical', module='eureka-protempa-etl', version='3.0')
)

SBT:

libraryDependencies += "org.eurekaclinical" % "eureka-protempa-etl" % "3.0"

Leiningen:

[org.eurekaclinical/eureka-protempa-etl "3.0"]

Dependencies

compile (12)

Group / Artifact Type Version
org.eurekaclinical : eureka-common jar 3.0
org.eurekaclinical : aiw-i2b2-etl jar 3.0
org.eurekaclinical : aiw-neo4j-etl jar 3.0
org.eurekaclinical : protempa-bconfigs-ini4j-ini jar 4.0
org.eurekaclinical : protempa-tsb-umls jar 4.0
org.eurekaclinical : protempa-dsb-relationaldb jar 4.0
org.eurekaclinical : protempa-dsb-file jar 4.0
org.apache.poi : poi-ooxml jar 3.9
org.eurekaclinical : javautil jar 4.0
org.apache.commons : commons-csv jar 1.2
commons-io : commons-io jar 2.4
org.eurekaclinical : eurekaclinical-patient-set-client jar 1.0

provided (2)

Group / Artifact Type Version
org.jasig.cas.client : cas-client-core jar 3.2.1
org.eurekaclinical : eurekaclinical-ontology jar 2.0

test (3)

Group / Artifact Type Version
com.sun.jersey.jersey-test-framework : jersey-test-framework-grizzly2 jar 1.19.4
org.eurekaclinical : eureka-common test-jar 3.0
com.h2database : h2 jar 1.4.193

Project Modules

There are no modules declared in this project.

Eureka! Clinical Analytics

Atlanta Clinical and Translational Science Institute (ACTSI), Emory University, Atlanta, GA

What does it do?

It provides tools for electronic health record (EHR) phenotyping, that is, finding patients of interest that match specified patterns in clinical and administrative EHR data. Eureka stores these patterns in computable form, and it computes them rapidly in clinical datasets and databases, including i2b2 clinical data warehouses. It supports building a repository of phenotypes representing best practices in how to find patient populations of interest. See http://www.eurekaclinical.org/docs/analytics/ for more information.

Version history

Version 3.0

Version 3 broke components of Eureka up into microservices. We rewrote the phenotype editing, cohort editing, and job submission screens. In addition, concept browsing is much faster and works better on tablet form factors.

Version 2.5.2

Compared with version 1 of Eureka, version 2 differs primarily in its much more efficient backend code for processing data from relational databases. Spreadsheet data processing is also much faster.

Version 1.9

Version 1.9 updates the UI to use Bootstrap 3, which makes the application more usable on mobile devices. It also adds support for i2b2 1.7 and a new feature that allows BioPortal to be used for ontologies. The codebase was updated to use Java 7. Data element search was added to the phenotype editor and job submission screens. Eureka! can now use OAuth as an authentication mechanism, allowing users to log in through services like Facebook, Twitter, Google Plus, and others. Finally, many improvements were made to ease the process of installing and configuring the software.

Build requirements

We build Eureka regularly on Mac and Linux. It may also build on Windows.

Runtime requirements

Proxied REST APIs

You can call all of eureka's REST APIs through a proxy provided by eureka-webapp. The proxy will forward selected calls to eureka-protempa-etl and eurekaclinical-user-service. All other valid URLs will be forwarded to eureka-services. Replace /protected/api with /proxy-resource in your URLs. See the READMEs for each of these service projects for REST endpoint documentation.

Proxy calls that are forwarded to eureka-protempa-etl

  • /proxy-resource/file
  • /proxy-resource/output

Proxy calls that are forwarded to eurekaclinical-user-service

  • /proxy-resource/users
  • /proxy-resource/roles

Proxy calls that are forwarded to eureka-services

Everything else
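
The routing rules above can be sketched as a small lookup. This is only an illustration, not code from eureka-webapp: the function names are hypothetical, and matching on whole path segments (so that /proxy-resource/file/42 also goes to eureka-protempa-etl) is an assumption. The sketch does not check whether a URL is valid.

```python
# Hypothetical helpers illustrating the proxy routing described above.
# The prefixes come from the lists in this README; everything else is assumed.
ETL_PREFIXES = ("/proxy-resource/file", "/proxy-resource/output")
USER_SERVICE_PREFIXES = ("/proxy-resource/users", "/proxy-resource/roles")


def to_proxy_path(path):
    """Rewrite a direct /protected/api path to its /proxy-resource form."""
    prefix = "/protected/api"
    if path == prefix or path.startswith(prefix + "/"):
        return "/proxy-resource" + path[len(prefix):]
    return path


def _matches(path, prefix):
    # Assumption: a prefix matches the path itself or any sub-path of it.
    return path == prefix or path.startswith(prefix + "/")


def target_service(proxy_path):
    """Return the name of the service a proxied path would be forwarded to."""
    if any(_matches(proxy_path, p) for p in ETL_PREFIXES):
        return "eureka-protempa-etl"
    if any(_matches(proxy_path, p) for p in USER_SERVICE_PREFIXES):
        return "eurekaclinical-user-service"
    # Everything else goes to eureka-services.
    return "eureka-services"
```

For example, target_service(to_proxy_path("/protected/api/users")) yields "eurekaclinical-user-service".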

Building it

The project uses the Maven build tool.

The system tests that are run automatically during build require more RAM than Java's default. Add the MAVEN_OPTS environment variable to your user account's profile, and set the max Java heap size to 4GB. On Linux and Mac, this is specified in your ~/.bash_profile as follows:

export MAVEN_OPTS='-Xmx4g'

Typically, you build it by invoking mvn clean install at the command line. For simple file changes, not additions or deletions, you can usually use mvn install. See https://github.com/eurekaclinical/dev-wiki/wiki/Building-Eureka!-Clinical-projects for more details.

You can build any of the modules separately by appending -pl <module-name> to your Maven command, where <module-name> is the artifact id of the module.

Performing system tests

You can run this project in an embedded Tomcat by executing mvn tomcat7:run after you have built it. You must also be running the eurekaclinical-analytics-webclient project. The eureka-webapp backend calls will then be accessible in your web browser at https://localhost:8000/eureka-webapp/. Your username will be superuser.

Installation

NOTE: we have Ansible provisioning scripts that automate the installation process. Contact us for details. The following describes the steps that those scripts perform. We have omitted general steps such as installation of Tomcat, SSL certificates, and the like.

Database schema creation

The eureka-services and eureka-protempa-etl modules each have a database schema. Each has a Liquibase changelog at src/main/resources/dbmigration/changelog-master.xml for creating the schema's objects. Liquibase 3.3 or greater is required.

eureka-services

Perform the following steps:

  1. Create a schema for eureka-services in your database.
  2. Get a JDBC driver for your database and put it in the Liquibase lib directory.
  3. Run the following:
./liquibase \
      --driver=JDBC_DRIVER_CLASS_NAME \
      --classpath=/path/to/jdbcdriver.jar:/path/to/eureka-services.war \
      --changeLogFile=dbmigration/changelog-master.xml \
      --url="JDBC_CONNECTION_URL" \
      --username=DB_USER \
      --password=DB_PASS \
      update
  4. Add the following Resource tag to Tomcat's context.xml file:
<Context>
...
    <Resource name="jdbc/EurekaService" auth="Container"
            type="javax.sql.DataSource"
            driverClassName="JDBC_DRIVER_CLASS_NAME"
            factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
            url="JDBC_CONNECTION_URL"
            username="DB_USER" password="DB_PASS"
            initialSize="3" maxActive="20" maxIdle="3" minIdle="1"
            maxWait="-1" validationQuery="SELECT 1" testOnBorrow="true"/>
...
</Context>

The validation query above is suitable for PostgreSQL. For Oracle and H2, use SELECT 1 FROM DUAL.

eureka-protempa-etl

Perform the following steps:

  1. Create a schema for eureka-protempa-etl in your database.
  2. Get a JDBC driver for your database and put it in the Liquibase lib directory.
  3. Run the following:
./liquibase \
      --driver=JDBC_DRIVER_CLASS_NAME \
      --classpath=/path/to/jdbcdriver.jar:/path/to/eureka-protempa-etl.war \
      --changeLogFile=dbmigration/changelog-master.xml \
      --url="JDBC_CONNECTION_URL" \
      --username=DB_USER \
      --password=DB_PASS \
      update
  4. Add the following Resource tag to Tomcat's context.xml file:
<Context>
...
    <Resource name="jdbc/EurekaBackend" auth="Container"
            type="javax.sql.DataSource"
            driverClassName="JDBC_DRIVER_CLASS_NAME"
            factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
            url="JDBC_CONNECTION_URL"
            username="DB_USER" password="DB_PASS"
            initialSize="3" maxActive="20" maxIdle="3" minIdle="1"
            maxWait="-1" validationQuery="SELECT 1" testOnBorrow="true"/>
...
</Context>

The validation query above is suitable for PostgreSQL. For Oracle and H2, use SELECT 1 FROM DUAL.
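
Because the Liquibase invocation is the same for both modules except for the WAR path, it can help to generate it from one place. The following is a minimal sketch, assuming a Unix-style classpath separator as in the commands above. The function name liquibase_update_args is hypothetical, and the placeholder values are the ones used in this README, not real settings.

```python
import shlex


def liquibase_update_args(driver_class, driver_jar, war_path,
                          jdbc_url, user, password,
                          liquibase="./liquibase"):
    """Build the argument list for the Liquibase update shown above.

    Only options that appear in this README are used.
    """
    return [
        liquibase,
        f"--driver={driver_class}",
        # The WAR is on the classpath so Liquibase can find the changelog in it.
        f"--classpath={driver_jar}:{war_path}",
        "--changeLogFile=dbmigration/changelog-master.xml",
        f"--url={jdbc_url}",
        f"--username={user}",
        f"--password={password}",
        "update",
    ]


if __name__ == "__main__":
    # Print one command per module; the values are placeholders to replace.
    for war in ("/path/to/eureka-services.war",
                "/path/to/eureka-protempa-etl.war"):
        args = liquibase_update_args(
            "JDBC_DRIVER_CLASS_NAME", "/path/to/jdbcdriver.jar", war,
            "JDBC_CONNECTION_URL", "DB_USER", "DB_PASS")
        print(shlex.join(args))
```

The list form can be passed directly to subprocess.run without going through a shell, which avoids quoting problems in the JDBC URL or password.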

Configuration

Eureka is configured using a properties file located at /etc/eureka/application.properties. It supports the following properties:

  • eurekaclinical.userwebapp.url: https://hostname.of.eurekaclinicaluserwebapp:port/eurekaclinical-user-webapp
  • eurekaclinical.userservice.url: https://hostname.of.eurekaclinicaluserservice:port/eurekaclinical-user-service
  • cas.url: https://hostname.of.casserver:port/cas-server
  • eureka.common.callbackserver: https://hostname:port
  • eureka.common.demomode: true or false depending on whether to act like a demonstration; default is false.
  • eureka.common.ephiprohibited: true or false depending on whether to display that managing ePHI is prohibited; default is true.
  • eureka.webapp.registrationenabled: true or false to enable/disable registering for an account managed by this project; default is true.
  • eureka.support.uri: URI link for contacting support. Could be http, https, or mailto.
  • eureka.support.uri.name: Display name of the URI link for contacting support.
  • eureka.webapp.callbackserver: URL of the server running the webapp; default is https://localhost:8443.
  • eureka.webapp.url: the URL of the webapp; default is https://localhost:8443/eureka-webapp.
  • eureka.webapp.ephiprohibited: true or false depending on whether to display that managing ePHI is prohibited; default is true.
  • eureka.webapp.demomode: true or false depending on whether to act like a demonstration; default is false.
  • eureka.etl.url: URL of the server running the backend; default is https://localhost:8443/eureka-protempa-etl.
  • eureka.etl.threadpool.size: the number of threads in the ETL threadpool; default is 4.
  • eureka.etl.callbackserver: URL of the server running the backend; default is https://localhost:8443.
  • eureka.services.url: URL of the server running the services layer; default is https://localhost:8443/eureka-services.
  • eureka.services.callbackserver: URL of the server running the services layer; default is https://localhost:8443.
  • eureka.services.jobpool.size: the number of threads in the services layer's job pool; default is 5.
  • eureka.services.registration.timeout: timeout in hours for registration request verification; default is 72.
  • eureka.jstree.searchlimit: max number of results returned from a concept search; default is 200.
  • eureka.services.defaultprops: concept subtrees to show in the concept tree; default is Patient PatientDetails Encounter ICD9:Diagnoses ICD9:Procedures ICD10:Diagnoses ICD10:Procedures LAB:LabTest MED:medications VitalSign

A Tomcat restart is required to detect any changes to the configuration file.
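
To check which values will take effect for a given file, the documented defaults can be merged with the file's contents. The sketch below is an illustration only and is not how Eureka itself reads its configuration. It handles plain key=value lines and comments, not the full Java properties syntax (no ":" separators, line continuations, or escapes), and the function names are hypothetical.

```python
# Defaults copied from the property list above; properties without a
# documented default are left out.
DEFAULTS = {
    "eureka.common.demomode": "false",
    "eureka.common.ephiprohibited": "true",
    "eureka.webapp.registrationenabled": "true",
    "eureka.webapp.callbackserver": "https://localhost:8443",
    "eureka.webapp.url": "https://localhost:8443/eureka-webapp",
    "eureka.webapp.ephiprohibited": "true",
    "eureka.webapp.demomode": "false",
    "eureka.etl.url": "https://localhost:8443/eureka-protempa-etl",
    "eureka.etl.threadpool.size": "4",
    "eureka.etl.callbackserver": "https://localhost:8443",
    "eureka.services.url": "https://localhost:8443/eureka-services",
    "eureka.services.callbackserver": "https://localhost:8443",
    "eureka.services.jobpool.size": "5",
    "eureka.services.registration.timeout": "72",
    "eureka.jstree.searchlimit": "200",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_config(text):
    """Overlay the file's values on the documented defaults."""
    return {**DEFAULTS, **parse_properties(text)}


if __name__ == "__main__":
    with open("/etc/eureka/application.properties") as f:
        for key, value in sorted(effective_config(f.read()).items()):
            print(f"{key}={value}")
```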

Tomcat configuration

In the $CATALINA_HOME/bin/setenv.sh file, add the following:

CATALINA_OPTS="${CATALINA_OPTS} -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true 
-Doracle.jdbc.ReadTimeout=43200000 -Djava.security.egd=file:///dev/urandom"
JAVA_OPTS="${JAVA_OPTS} -Xms512m -Xmx6G"

The oracle.jdbc.ReadTimeout system property is needed only if you use an Oracle database with Eureka. The max heap size may need to be increased further depending on the volume of data being processed.

WAR installation

  1. Stop Tomcat.
  2. Remove any old copies of Eureka's three unpacked wars from Tomcat's webapps directory.
  3. Copy the new WAR files into Tomcat's webapps directory.
  4. Start Tomcat.
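
Steps 2 and 3 above can be scripted. The following is a rough sketch, assuming that each unpacked WAR lives in a directory named after the WAR file without its extension. The function name replace_wars is hypothetical. Stopping and starting Tomcat is environment-specific and is left out, so run this only while Tomcat is stopped.

```python
import shutil
from pathlib import Path


def replace_wars(webapps_dir, new_wars):
    """Remove old unpacked copies and copy new WAR files into webapps.

    Assumes Tomcat is already stopped. Returns the installed WAR paths.
    """
    webapps = Path(webapps_dir)
    installed = []
    for war in map(Path, new_wars):
        # Step 2: remove the old unpacked copy, e.g. webapps/eureka-services/.
        unpacked = webapps / war.stem
        if unpacked.is_dir():
            shutil.rmtree(unpacked)
        # Step 3: copy the new WAR, overwriting any old one of the same name.
        target = webapps / war.name
        shutil.copy2(war, target)
        installed.append(target)
    return installed
```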

Developer documentation

Note on licensing

Out of the box, Eureka! Clinical Analytics is available under the Apache License. If you use the Neo4j plugin provided by the aiw-neo4j-etl project, due to the licensing of Neo4j, you cannot use the Apache license anymore. For that reason, Eureka! Clinical Analytics is optionally available under the GPL version 3.

Getting help

Feel free to contact us at [email protected].

org.eurekaclinical

Eureka! Clinical

Microservices for clinical and translational research

Versions

Version
3.0
3.0-Beta-2
3.0-Beta-1
3.0-Alpha-43
3.0-Alpha-42
3.0-Alpha-41
3.0-Alpha-40
3.0-Alpha-39
3.0-Alpha-38
3.0-Alpha-37
3.0-Alpha-36
3.0-Alpha-35
3.0-Alpha-34
3.0-Alpha-33
3.0-Alpha-32
3.0-Alpha-31
3.0-Alpha-30
3.0-Alpha-29
3.0-Alpha-28
3.0-Alpha-27
3.0-Alpha-26
3.0-Alpha-25
3.0-Alpha-24
3.0-Alpha-23
3.0-Alpha-22
3.0-Alpha-21
3.0-Alpha-20
3.0-Alpha-19
3.0-Alpha-18
3.0-Alpha-17
3.0-Alpha-16
3.0-Alpha-15
3.0-Alpha-14
3.0-Alpha-13
3.0-Alpha-12
3.0-Alpha-11
3.0-Alpha-10
3.0-Alpha-9
3.0-Alpha-8
3.0-Alpha-7
3.0-Alpha-6
3.0-Alpha-5
3.0-Alpha-4
3.0-Alpha-3
3.0-Alpha-2
3.0-Alpha-1
2.5.2
2.5.1
2.5
2.4
2.3
2.2
2.1.2
2.1.1
2.1
2.0
2.0-Alpha-44
2.0-Alpha-43
2.0-Alpha-42
2.0-Alpha-41
2.0-Alpha-40
2.0-Alpha-39
2.0-Alpha-38
2.0-Alpha-37
2.0-Alpha-36
2.0-Alpha-35
2.0-Alpha-34
2.0-Alpha-33
2.0-Alpha-32
2.0-Alpha-31
2.0-Alpha-30
2.0-Alpha-29
2.0-Alpha-28
2.0-Alpha-27
2.0-Alpha-26
2.0-Alpha-25
2.0-Alpha-24
2.0-Alpha-23
2.0-Alpha-22
2.0-Alpha-21
2.0-Alpha-20
2.0-Alpha-19
2.0-Alpha-18
2.0-Alpha-17
2.0-Alpha-16
2.0-Alpha-15
2.0-Alpha-14
2.0-Alpha-13
2.0-Alpha-12
2.0-Alpha-11
2.0-Alpha-10
2.0-Alpha-9
2.0-Alpha-8
2.0-Alpha-7
2.0-Alpha-6
2.0-Alpha-5
2.0-Alpha-4