Uploaded image for project: 'Reporting Module'
  1. Reporting Module
  2. REPORT-873

Expand options of pageable datasets

    XMLWordPrintable

Details

    • Enhancement
    • Status: Closed
    • Must
    • Resolution: Fixed
    • None
    • 1.25.0
    • DataSet Definitions
    • High

    Description

      1. Background

      Each report's data is made of a collection of DataSet. The DataSet superclass only exposes an iterator to go through each row of reported data: https://github.com/openmrs/openmrs-module-reporting/blob/c42a8ec16cf40dc699aa6b03dd4b079c2f1bb83c/api/src/main/java/org/openmrs/module/reporting/dataset/DataSet.java#L32
      Until now virtually all reports out there rely on collecting reports data through one type of implementation for DataSet: SimpleDataSet.

      SimpleDataSet is an all-in-memory dataset (see this map at its core holding all the data rows) that provides further ways beyond its superclass's iterator to access the rows of data. For instance it is possible to get all the rows at once: https://github.com/openmrs/openmrs-module-reporting/blob/c42a8ec16cf40dc699aa6b03dd4b079c2f1bb83c/api/src/main/java/org/openmrs/module/reporting/dataset/SimpleDataSet.java#L106-L113

      It is easy enough to come up with report definitions that will overwhelm SimpleDataSet by bloating its internal map, thereby leading to memory related exceptions (such as Java heap space).

      2. Scope

      We would like to start by addressing the challenge of SQL-based reports, specifically because they are widely used and they can very easily be designed to generate a lot of data.
      Suffice to look at the evaluators of the common SQL-based reports to encounter SimpleDataSet right away:

      • See here for SqlDataSetEvaluator.
      • See here for SqlFileDataSetEvaluator.
      • ...

      But anyway this would be the case for almost every evaluator out there, even those for non-SQL-based reports.

      3. Suggested Solution

      The code base already hints at an alternative with PageableDataSet that anticipate catering for large row-per-patient type of reports.
      The idea would be use this as a source of inspiration for a new triplet of SQL 1) dataset definition, 2) dataset and 3) evaluator that would be put together in such a way that the resulting dataset exposes its rows of data like a SQL cursor would do.

      The evaluator's evaluate(..) method would presumably setup the cursor and would return a dataset implementation whose iterator() method would be the interface to the SQL cursor behind the scenes.

      4. Dev Notes

      To enable cursor-based streaming see the final part of the ResultSet paragraph on this page: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html

      See also this discussion on Slack.

      Gliffy Diagrams

        Attachments

          Activity

            People

              luis.oliveira luis Oliveira
              mksd Dimitri R
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: