Optimization with datasets
This article describes, at a (very) technical level, how the server handles datasets. Datasets can hold a lot of data, which sometimes makes working with them slow. By understanding what datasets are and how the server handles them, an author can make a model considerably lighter on memory and CPU.
A dataset is just a variable with a specific XML structure. It has the same structure as a graph, with iterations, nodes and data, and it is defined by a graph. Example: there is a graph called gstructure with a node data, and that node has two variables, firstname and lastname. You can create a variable mydataset of the type gstructure and add data to it by declaring mydataset.data.firstname := 'joe'. The XML in the variable mydataset will then look like this:
<dataset name="">
  <graph iteration="0" name="gstructure" tuplestatus="new" visibility="visible">
    <node name="data">
      <data name="firstname">joe</data>
      <data name="lastname"></data>
    </node>
  </graph>
</dataset>
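To make the layout concrete, here is a minimal sketch that reads one field back out of that XML. It is written in Python purely as a neutral illustration and is not how the server does it; the helper name read_field and its parameters are hypothetical, and only the XML itself comes from the example above.

```python
import xml.etree.ElementTree as ET

# The dataset XML from the example above (whitespace added for readability).
DATASET_XML = """
<dataset name="">
  <graph iteration="0" name="gstructure" tuplestatus="new" visibility="visible">
    <node name="data">
      <data name="firstname">joe</data>
      <data name="lastname"></data>
    </node>
  </graph>
</dataset>
"""


def read_field(xml_text, node_name, field_name, iteration=0):
    """Hypothetical helper: return one field of one iteration ('' if empty)."""
    root = ET.fromstring(xml_text.strip())
    # Each <graph> element is one iteration of the dataset.
    graph = root.find(f"graph[@iteration='{iteration}']")
    if graph is None:
        raise KeyError(f"no iteration {iteration}")
    data = graph.find(f"node[@name='{node_name}']/data[@name='{field_name}']")
    if data is None:
        raise KeyError(f"no field {node_name}.{field_name}")
    # An empty element has text None; report it as an empty string.
    return data.text or ""


print(read_field(DATASET_XML, "data", "firstname"))  # joe
```

Every read like this has to parse the text first, which is the cost the rest of this article is about.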
The graph element has four attributes. iteration identifies the iteration (0 in the example). name is the name of the graph the structure is based on. tuplestatus shows whether an iteration was added to the dataset; this information can be used to update a database. visibility is either visible or invisible after a select: if you run a select on a big dataset, the filtered-out data is still there, only marked invisible. As an author you do not have to think about that, because the invisible iterations are hidden.
So a dataset is just text in XML form. To work with a dataset, for instance myname := mydataset.data.firstname, the server has to understand that XML. Internally, the server translates the XML into a structure. A variable (e.g. myname) can hold both the XML and the structure, and it knows which of the two to use when the data is accessed. Translating the XML to the structure, and the structure back to XML, is CPU-intensive, so we should avoid it where possible.
When a dataset is created, e.g. mydataset := graphtodataset(gstructure), graphtodataset generates the XML and the := directive takes that XML and puts it in mydataset. At this point mydataset holds only the XML text and no structure, because nothing has needed one yet. If you then say mysecondset := mydataset, the XML text of mydataset is taken and put in mysecondset. Neither variable has a structure.
Now we want to know how many iterations there are in mydataset, so we say iterations := count(^mydataset). The server needs to understand the XML, so it converts the XML to the structure. The structure and the XML are now equal. The count looks at the structure and can easily return the number. If you now say mysecondset := mydataset, the XML is simply returned, because the structure and the XML are still the same.
After the count, I want to add a lastname to joe, using mydataset.data.lastname := 'doe'. The := sets 'doe' on the variable. The variable already has the structure, so the value of lastname is set there. The structure is now more recent than the XML and takes precedence over it. If I now say mysecondset := mydataset, the structure is first converted to XML and then returned to the :=. So the assignment to mysecondset before the count was much faster than it is now. After the conversion, the XML and the structure are equal again.
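The walk-through above can be modelled with a small sketch. It is written in Python as a neutral illustration and makes no claim about the server's real implementation: the class DatasetVariable, its methods and its two counters are all hypothetical. It only mimics the described behaviour, where the XML is parsed the first time a structure is needed and serialized again only after the structure has changed.

```python
import xml.etree.ElementTree as ET


class DatasetVariable:
    """Hypothetical model of a variable holding XML text plus a lazily built structure."""

    def __init__(self, xml_text):
        self._xml = xml_text          # the text XML, always present after ':='
        self._structure = None        # built only when something needs it
        self._xml_current = True      # False once the structure has been changed
        self.parses = 0               # XML -> structure conversions
        self.serializations = 0       # structure -> XML conversions

    def _need_structure(self):
        if self._structure is None:
            self._structure = ET.fromstring(self._xml)
            self.parses += 1
        return self._structure

    def as_xml(self):
        # Cheap while XML and structure are equal; converts only when stale.
        if not self._xml_current:
            self._xml = ET.tostring(self._structure, encoding="unicode")
            self.serializations += 1
            self._xml_current = True
        return self._xml

    def count(self):
        return len(self._need_structure().findall("graph"))

    def set_field(self, node, name, value, iteration=0):
        graph = self._need_structure().findall("graph")[iteration]
        graph.find(f"node[@name='{node}']/data[@name='{name}']").text = value
        self._xml_current = False     # the structure is now ahead of the XML


XML = (
    '<dataset name=""><graph iteration="0" name="gstructure" '
    'tuplestatus="new" visibility="visible"><node name="data">'
    '<data name="firstname">joe</data><data name="lastname"></data>'
    '</node></graph></dataset>'
)

mydataset = DatasetVariable(XML)                  # mydataset := graphtodataset(gstructure)
mysecondset = DatasetVariable(mydataset.as_xml())  # plain text copy, no conversion
iterations = mydataset.count()                    # first XML -> structure conversion
mysecondset = DatasetVariable(mydataset.as_xml())  # still a plain text copy
mydataset.set_field("data", "lastname", "doe")    # changes the structure only
mysecondset = DatasetVariable(mydataset.as_xml())  # forces structure -> XML
```

In this model the first two copies cost nothing, while the copy after the assignment to lastname pays for one serialization, which matches the order of events described above.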
Optimizing the use of datasets
Take a look at the following code:
1 i := 0;
2 repeat
3   chk_myset := myset;
4   chk_myset := select (chk_myset, (chk_myset.content.language = myset[i].content.language));
5   if count(^chk_myset) > 1 then
6     foutcode := 'double language';
7   i := i + 1;
8 until (foutcode <> '') OR (i = count(myset));
The dataset is checked for duplicate values in chk_myset.content.language. On line 3, chk_myset gets an XML value; its structure is empty. The select on line 4 takes the XML of chk_myset, creates the structure, does the select, and returns the XML by converting the structure back. On line 5 the XML is converted to a structure again to return the number. On line 8, count(myset) creates the structure of myset and returns a number. Then the loop goes back to line 3 and sets the XML again, ignoring the structure, only to recreate it in the select (and convert it back to XML). So in this code the XML-to-structure and structure-to-XML conversions run 3 times per iteration.
The function added to avoid this is selectonset. We now take the dataset once (line 1 of the code below) and make sure we only work with the structure. selectonset uses the structure and does not need to convert to XML. We do have to 'fix' the reset that line 3 performed in the first example. After a select, only the iterations that meet the condition are 'visible'. To set all rows back to visible so that a new select can take place, we reset the structure with resetdatasetonset, which is the same function as resetdataset except that it works directly on the structure. See line 4.
1 chk_myset := myset;
2 i := 0;
3 repeat
4   resetdatasetonset(^chk_myset);
5   selectonset (chk_myset, (chk_myset.content.language = myset[i].content.language));
6   if count(^chk_myset) > 1 then
7     foutcode := 'double language';
8   i := i + 1;
9 until (foutcode <> '') OR (i = count(myset));
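The difference between the two listings can be tallied with a rough sketch. It is written in Python as a neutral illustration and only counts conversions on chk_myset, ignoring myset and the early exit on foutcode. The class TrackedVariable and both functions are hypothetical, and the cost model is an assumption taken from the description above: a := stores XML only, any structure-based operation builds the structure if it is missing, and select returns XML.

```python
class TrackedVariable:
    """Hypothetical model: tracks only whether a structure exists and the conversions run."""

    def __init__(self):
        self.has_structure = False
        self.conversions = 0

    def assign_xml(self):
        # ':=' stores XML text; a structure built earlier is no longer used.
        self.has_structure = False

    def need_structure(self):
        # XML -> structure, but only when the structure is not there yet.
        if not self.has_structure:
            self.conversions += 1
            self.has_structure = True

    def to_xml(self):
        # structure -> XML, needed when a changed structure is returned as XML.
        self.conversions += 1


def loop_with_select(passes):
    """First listing: select returns XML that is assigned back on every pass."""
    chk = TrackedVariable()
    for _ in range(passes):
        chk.assign_xml()       # line 3: chk_myset := myset
        chk.need_structure()   # line 4: select builds the structure
        chk.to_xml()           # line 4: select returns XML
        chk.assign_xml()       # line 4: ':=' stores that XML
        chk.need_structure()   # line 5: count needs the structure again
    return chk.conversions


def loop_with_selectonset(passes):
    """Second listing: every step after line 1 works on the structure."""
    chk = TrackedVariable()
    chk.assign_xml()           # line 1: chk_myset := myset
    for _ in range(passes):
        chk.need_structure()   # line 4: resetdatasetonset (converts on first pass only)
        chk.need_structure()   # line 5: selectonset
        chk.need_structure()   # line 6: count
    return chk.conversions


print(loop_with_select(10))       # 30
print(loop_with_selectonset(10))  # 1
```

Under these assumptions the first listing pays three conversions on every pass, while the second pays a single conversion for the whole loop.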
As seen above, selecting on datasets keeps the datasets big: filtered iterations still exist. You can use purgedataset to remove those invisible iterations, making the dataset a lot smaller. You can also purge directly in a select: selectpurge returns the XML without the invisible iterations (a resetdataset afterwards is then useless). Next to selectpurge there is also selectpurgeonset, which does the select on the structure and deletes the invisible rows.
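How select, reset and purge relate can be sketched with a toy model. It is written in Python as a neutral illustration, and the function names are hypothetical stand-ins, not the server's functions. It assumes that a select only narrows the currently visible iterations and that count looks at visible iterations only, which is how the listings above use it.

```python
def make_rows(languages):
    """Build toy iterations, all visible, with a single 'language' field."""
    return [{"language": lang, "visibility": "visible"} for lang in languages]


def select_on_set(rows, predicate):
    # Rows that fail the condition are hidden, not removed.
    for row in rows:
        if row["visibility"] == "visible" and not predicate(row):
            row["visibility"] = "invisible"


def reset_on_set(rows):
    # Make every row visible again so that a new select can run.
    for row in rows:
        row["visibility"] = "visible"


def purge(rows):
    # Drop the hidden rows for good; a reset can no longer bring them back.
    rows[:] = [row for row in rows if row["visibility"] == "visible"]


def count_visible(rows):
    return sum(1 for row in rows if row["visibility"] == "visible")


rows = make_rows(["nl", "en", "nl", "de"])

select_on_set(rows, lambda row: row["language"] == "nl")
print(count_visible(rows), len(rows))  # 2 4  (two hidden rows still take up space)

reset_on_set(rows)
print(count_visible(rows), len(rows))  # 4 4

select_on_set(rows, lambda row: row["language"] == "nl")
purge(rows)
print(count_visible(rows), len(rows))  # 2 2  (hidden rows are gone)
```

The trade-off is visible in the last step: after a purge the dataset is smaller, but the removed iterations cannot be restored with a reset.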