  1. OpenMRS Core
  2. TRUNK-5380

Cohort Membership: Performance Issues



    • Bug
    • Status: Closed
    • TBD
    • Resolution: Won't Fix
    • Undetermined


      In 2.1, a Cohort shifted from being a TreeSet of Integers to a TreeSet of new CohortMembership objects.

      This adds performance overhead to basic Cohort operations. For instance, adding a set of 100,000 patient ids to a Cohort jumps from taking milliseconds to taking a second or two. This may seem insignificant, but Cohorts are used extensively for reporting and may be manipulated dozens of times in a single report, so this can add up.
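
      To make the comparison concrete, here is a rough timing sketch. It is not a measurement of OpenMRS itself: MembershipSketch is a hypothetical stand-in for CohortMembership with only a patient id and two dates, and the class and method names are made up for illustration. It only captures per-element allocation and comparison cost, so it may not reproduce the full slowdown described above.

```java
import java.util.Date;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for CohortMembership; the real OpenMRS class carries more state.
final class MembershipSketch implements Comparable<MembershipSketch> {
    final int patientId;
    final Date startDate;
    final Date endDate; // null means the membership has not ended

    MembershipSketch(int patientId) {
        this.patientId = patientId;
        this.startDate = new Date();
        this.endDate = null;
    }

    @Override
    public int compareTo(MembershipSketch other) {
        // Order by patient id first, then by start date.
        int byPatient = Integer.compare(patientId, other.patientId);
        return byPatient != 0 ? byPatient : startDate.compareTo(other.startDate);
    }
}

final class CohortAddTiming {
    // Old-style storage: a sorted set of patient ids.
    static Set<Integer> fillIds(int n) {
        Set<Integer> ids = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            ids.add(i);
        }
        return ids;
    }

    // New-style storage: a sorted set of membership objects, one per patient.
    static Set<MembershipSketch> fillMemberships(int n) {
        Set<MembershipSketch> members = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            members.add(new MembershipSketch(i));
        }
        return members;
    }

    public static void main(String[] args) {
        int n = 100_000;
        long t0 = System.nanoTime();
        fillIds(n);
        long t1 = System.nanoTime();
        fillMemberships(n);
        long t2 = System.nanoTime();
        // Single-shot timings are noisy (JIT warm-up, GC); treat them as indicative only.
        System.out.printf("ids: %d ms, memberships: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

      A proper comparison would call the real Cohort methods and use a benchmarking harness rather than single-shot timings.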

      We should find a way to maintain the performance of a Cohort for the majority use case that does not require start date and end date.

      I tested switching the storage from a TreeSet to a HashSet... this gave a marginal improvement in performance, certainly not a game-changer.

      Some potential solutions include:

      • Reverting Cohort back to its original design and creating a new CohortWithDateRange object... this would introduce some backwards incompatibility
      • Changing the implementation of Cohort so that it supported an underlying data storage of either Set<Integer> or Set<CohortMembership> and only started to use CohortMembership if the consumer specified a start and/or end date when adding a Patient.  This might work, but seems hacky and error-prone... we'd need to tightly restrict the consumers from accessing the underlying model.
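
      A minimal sketch of the second option, under stated assumptions: HybridCohortSketch, DateRange and every method name here are hypothetical, not the OpenMRS Cohort API. It is also a simplified variant of the idea: instead of switching between Set<Integer> and Set<CohortMembership>, it always keeps the plain id set and lazily allocates a side map of date ranges the first time a caller supplies a date, with at most one range per patient.

```java
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not the OpenMRS Cohort class.
final class HybridCohortSketch {

    // Hypothetical date range; a null bound means open-ended.
    static final class DateRange {
        final Date start;
        final Date end;

        DateRange(Date start, Date end) {
            this.start = start;
            this.end = end;
        }
    }

    // Fast path: every member's id lives here, dated or not.
    private final Set<Integer> patientIds = new TreeSet<>();
    // Slow path: stays null until a caller first supplies a start or end date.
    private Map<Integer, DateRange> ranges;

    void addPatient(int patientId) {
        patientIds.add(patientId);
    }

    void addPatient(int patientId, Date start, Date end) {
        patientIds.add(patientId);
        if (start == null && end == null) {
            return; // no dates given, stay on the fast path
        }
        if (ranges == null) {
            ranges = new HashMap<>();
        }
        ranges.put(patientId, new DateRange(start, end));
    }

    boolean isActive(int patientId, Date asOf) {
        if (!patientIds.contains(patientId)) {
            return false;
        }
        DateRange r = (ranges == null) ? null : ranges.get(patientId);
        if (r == null) {
            return true; // undated members are always active
        }
        boolean started = r.start == null || !asOf.before(r.start);
        boolean notEnded = r.end == null || !asOf.after(r.end);
        return started && notEnded;
    }

    boolean usesDateRanges() {
        return ranges != null;
    }

    int size() {
        return patientIds.size();
    }

    // Read-only view, so consumers cannot mutate the underlying storage directly.
    Set<Integer> getPatientIds() {
        return Collections.unmodifiableSet(patientIds);
    }
}
```

      Keeping both fields private and exposing only a read-only view is one way to address the concern about consumers reaching into the underlying model, at the cost of two code paths to keep consistent.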




                grace Grace Potma
                mogoodrich Mark Goodrich