Status: In Progress
Affects Version/s: None
Fix Version/s: None
Component/s: DataSet Definitions
Each report's data is made of a collection of DataSet. The DataSet superclass only exposes an iterator to go through each row of reported data: https://github.com/openmrs/openmrs-module-reporting/blob/c42a8ec16cf40dc699aa6b03dd4b079c2f1bb83c/api/src/main/java/org/openmrs/module/reporting/dataset/DataSet.java#L32
Until now virtually all reports out there rely on collecting reports data through one type of implementation for DataSet: SimpleDataSet.
SimpleDataSet is an all-in-memory dataset (see this map at its core holding all the data rows) that provides further ways beyond its superclass's iterator to access the rows of data. For instance it is possible to get all the rows at once: https://github.com/openmrs/openmrs-module-reporting/blob/c42a8ec16cf40dc699aa6b03dd4b079c2f1bb83c/api/src/main/java/org/openmrs/module/reporting/dataset/SimpleDataSet.java#L106-L113
It is easy enough to come up with report definitions that will overwhelm SimpleDataSet by bloating its internal map, thereby leading to memory related exceptions (such as Java heap space).
We would like to start by addressing the challenge of SQL-based reports, specifically because they are widely used and they can very easily be designed to generate a lot of data.
Suffice to look at the evaluators of the common SQL-based reports to encounter SimpleDataSet right away:
But anyway this would be the case for almost every evaluator out there, even those for non-SQL-based reports.
The code base already hints at an alternative with PageableDataSet that anticipate catering for large row-per-patient type of reports.
The idea would be use this as a source of inspiration for a new triplet of SQL 1) dataset definition, 2) dataset and 3) evaluator that would be put together in such a way that the resulting dataset exposes its rows of data like a SQL cursor would do.
The evaluator's evaluate(..) method would presumably setup the cursor and would return a dataset implementation whose iterator() method would be the interface to the SQL cursor behind the scenes.
To enable cursor-based streaming see the final part of the ResultSet paragraph on this page: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
See also this discussion on Slack.