# Specifying Data

Except for “single” problems, each problem usually represents a large (often infinite) family of cases, called instances, that one may want to solve. Each instance is uniquely identified by some specific data. First, recall that the command to run for generating an XCSP$^3$ instance (file), given a model and some data, is:

python <model_file> -data=<data_values>


where:

• <model_file> (is a Python file that) represents a PyCSP$^3$ model
• <data_values> represents some specific data.

In our context, an elementary value is a value of one of these built-in data types: integer (‘int’), real (‘float’), string (‘str’) and boolean (‘bool’). Specific data can be given as:

• a single elementary value, as in
-data=5

• a list of elementary values, between square (or round) brackets (according to the operating system, one might need to escape brackets) and with comma used as a separator, as in
-data=[9,0,0,3,9]

• a list of named elementary values, between square (or round) brackets and with comma used as a separator, as in
-data=[v=9,b=0,r=0,k=3,l=9]

• a JSON file (possibly given by a URL), as in
-data=Bibd-9-3-9.json

• a text file (i.e., a non-JSON file in any arbitrary format), together with the option -dataparser that gives some Python code to load it, as in
-data=puzzle.txt -dataparser=ParserPuzzle.py


Then, data can be directly used in PyCSP$^3$ models by means of a predefined variable called data. The value of this predefined variable is set as follows:

• if the option -data is not specified, or if it is specified as -data=null or -data=None, then the value of data is None. See, for example, the Sudoku problem.
• if a single elementary value is given (possibly, between brackets), then the value of data is directly this value. See, for example, the Golomb Ruler problem.
• if a JSON file containing a root object with only one field is given, then the value of data is directly this value.
• if a list of (at least two) elementary values is given, then the value of data is a tuple containing those values in sequence. See, for example, the Board Coloration problem.
• if a list of (at least two) named elementary values is given, then the value of data is a named tuple.
• if a JSON file containing a root object with at least two fields is given, then the value of data is a named tuple. Actually, any encountered JSON object in the file is (recursively) converted into a named tuple. See, for example, the Warehouse and Rack Configuration problems.
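
The rules above can be mimicked in plain Python. The sketch below is not the PyCSP$^3$ implementation: the helper `interpret_data` is hypothetical, only handles integers, and assumes that named values come at least in pairs.

```python
from collections import namedtuple


def interpret_data(text):
    """Hypothetical helper mimicking how the value of `data` is chosen."""
    if text is None or text in ("null", "None"):
        return None
    items = text.strip("[]()").split(",")
    if all("=" in item for item in items):
        # Named elementary values: build a named tuple (assumes two or more).
        names, values = zip(*(item.split("=") for item in items))
        return namedtuple("Data", names)(*(int(v) for v in values))
    values = [int(item) for item in items]
    # A single value is returned directly, several values as a tuple.
    return values[0] if len(values) == 1 else tuple(values)


print(interpret_data(None))                     # None
print(interpret_data("5"))                      # 5
print(interpret_data("[9,0,0,3,9]"))            # (9, 0, 0, 3, 9)
print(interpret_data("[v=9,b=0,r=0,k=3,l=9]"))  # Data(v=9, b=0, r=0, k=3, l=9)
```

In a model, the tuple form is typically unpacked (`v, b, r, k, l = data`), whereas the named form also allows access by name (`data.k`).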

For example, for the AllInterval and Bibd problems, we can write:

python AllInterval.py -data=12

python Bibd.py -data=[9,0,0,3,9]


## Storing Data in JSON

Suppose that you would prefer to have a JSON file for storing these data values. You can execute:

python Bibd.py -data=[9,0,0,3,9] -dataexport


You then obtain the following JSON file ‘Bibd-9-0-0-3-9.json’

{
"v":9,
"b":0,
"r":0,
"k":3,
"l":9
}


And now, to generate the same XCSP$^3$ instance (file) as above, you can execute:

python Bibd.py -data=Bibd-9-0-0-3-9.json
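
The export step can be pictured with the standard `json` module. This is only a sketch: the naming scheme (model name followed by the values joined with '-') is inferred from the example above, and the file is written to a temporary folder rather than to the working directory.

```python
import json
import os
import tempfile

values = {"v": 9, "b": 0, "r": 0, "k": 3, "l": 9}

# Assumption: the file name is the model name followed by the values joined with '-'.
filename = "Bibd-" + "-".join(str(x) for x in values.values()) + ".json"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, filename)
    with open(path, "w") as f:
        json.dump(values, f, indent=1)
    with open(path) as f:
        reloaded = json.load(f)

print(filename)            # Bibd-9-0-0-3-9.json
print(reloaded == values)  # True
```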


## Escape Characters

With some command interpreters (shells), you may have to escape the characters ‘[’ and ‘]’, which gives:

python Bibd.py -data=\[9,0,0,3,9\]


You can also use round brackets instead of square brackets:

python Bibd.py -data=(9,0,0,3,9)


If this causes a problem with the command interpreter (shell), you have to escape the characters ‘(’ and ‘)’, which gives:

python Bibd.py -data=\(9,0,0,3,9\)


Remark. At the Windows command line, different escape characters may be needed (for example, depending on whether you use Windows PowerShell or not). However, note that you can always run a command from a batch script file (or use a JSON file).
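
When commands are built from a Python script, the standard function `shlex.quote` can take care of the quoting for POSIX shells. The sketch below only builds and prints the command line; it does not cover the Windows command line.

```python
import shlex

args = ["python", "Bibd.py", "-data=[9,0,0,3,9]"]

# shlex.quote wraps an argument in single quotes when it contains characters
# that a POSIX shell could interpret, such as '[' or '('.
command = " ".join(shlex.quote(arg) for arg in args)
print(command)  # python Bibd.py '-data=[9,0,0,3,9]'
```

Surrounding the whole option with single quotes is an alternative to escaping each bracket individually.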

## Filenames with Formatted Data

As shown above, when data are given as elementary values on the command line, they are integrated into the filename of the generated instance. However, it is sometimes useful to format such filenames. This is possible with the option -dataformat. The principle is that the string passed to this option is used to format the values given in -data. For example,

python Bibd.py -data=[9,0,0,3,9] -dataformat={:02d}-{:01d}-{:01d}-{:02d}-{:02d}


will generate an XCSP$^3$ file with filename ‘Bibd-09-0-0-03-09.xml’

If the same pattern must be applied to all pieces of data, we can write:

python Bibd.py -data=[9,0,0,3,9] -dataformat={:02d}


so as to obtain an XCSP$^3$ file with filename ‘Bibd-09-00-00-03-09.xml’
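
The patterns follow the syntax of Python's `str.format`. As a sketch of the principle (the helper `formatted_name` is hypothetical and is not part of PyCSP$^3$), the two filenames above can be reproduced as follows:

```python
def formatted_name(model, values, pattern):
    """Hypothetical helper: apply a -dataformat-like pattern to the values."""
    if pattern.count("{") == 1:
        # A single field is reused for every piece of data.
        pattern = "-".join([pattern] * len(values))
    return f"{model}-{pattern.format(*values)}.xml"


values = [9, 0, 0, 3, 9]
print(formatted_name("Bibd", values, "{:02d}-{:01d}-{:01d}-{:02d}-{:02d}"))
# Bibd-09-0-0-03-09.xml
print(formatted_name("Bibd", values, "{:02d}"))
# Bibd-09-00-00-03-09.xml
```

Zero-padding is mainly useful so that instance files sort in numerical order when listed alphabetically.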

## About Using Tuple Unpacking on Data

For the BACP problem, an example of data is given by the following JSON file, called ‘Bacp_example.json’:

{
"nPeriods": 4,
"minCredits": 2,
"maxCredits": 5,
"minCourses": 2,
"maxCourses": 3,
"credits": [2,3,1,3,2,3,3,2,1],
"prerequisites": [[2,0],[4,1],[5,2],[6,4]]
}


In the BACP model, in a file called ‘Bacp.py’, it is then possible to use tuple unpacking, and to get all important data in only one statement:

nPeriods, minCredits, maxCredits, minCourses, maxCourses, credits, prereq = data


The command to execute for compiling is then:

python Bacp.py -data=Bacp_example.json


Because tuple unpacking is used, the fields of the root object in the JSON file must be given in exactly this order. If this is not the case, as in:

{
"nPeriods": 4,
"prerequisites": [[2,0],[4,1],[5,2],[6,4]],
"minCredits": 2,
"maxCredits": 5,
"credits": [2,3,1,3,2,3,3,2,1],
"minCourses": 2,
"maxCourses": 3
}


then unpacking data assigns some values to the wrong variables. If you want a safer model (because, for example, you have no guarantee about the way the data are generated), you must refer explicitly to the fields of the named tuple instead:

nPeriods = data.nPeriods
minCredits, maxCredits = data.minCredits, data.maxCredits
minCourses, maxCourses = data.minCourses, data.maxCourses
credits, prereq = data.credits, data.prerequisites
nCourses = len(credits)
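
The difference between the two styles can be reproduced with the standard library alone. In this sketch, the function `load` is a hypothetical stand-in for the loading step, and only three of the fields are kept for brevity.

```python
import json
from collections import namedtuple

ordered = '{"nPeriods": 4, "minCredits": 2, "maxCredits": 5}'
shuffled = '{"nPeriods": 4, "maxCredits": 5, "minCredits": 2}'


def load(text):
    """Hypothetical stand-in: build a named tuple following the order of the JSON fields."""
    fields = json.loads(text)  # a dict keeps the order of the JSON fields
    return namedtuple("Data", fields.keys())(**fields)


# Positional unpacking depends on the order of the fields in the file.
nPeriods, minCredits, maxCredits = load(ordered)
print(minCredits, maxCredits)  # 2 5
nPeriods, minCredits, maxCredits = load(shuffled)
print(minCredits, maxCredits)  # 5 2 (silently swapped)

# Access by name does not depend on the order.
data = load(shuffled)
print(data.minCredits, data.maxCredits)  # 2 5
```

Note that no error is raised in the shuffled case, because the number of fields is unchanged; the mistake only shows up later, in the generated instance.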


## About using a Data Parser

Now, let us suppose that you would like to use the data from this MiniZinc file ‘bacp-data.mzn’:

include "curriculum.mzn.model";
n_courses = 9;
n_periods = 4;
courses_per_period_lb = 2;
courses_per_period_ub = 3;
course_load = [2, 3, 1, 3, 2, 3, 3, 2,1, ];
constraint prerequisite(2, 0);
constraint prerequisite(4, 1);
constraint prerequisite(5, 2);
constraint prerequisite(6, 4);


We need to write a piece of Python code for building the variable data that will be used in our model. After importing everything from pycsp3.problems.data.parsing, we can use some PyCSP$^3$ functions such as next_line(), number_in(), remaining_lines(), etc. Here, we also use the classical function split() of module re to parse information concerning prerequisites. Note that you have to add relevant fields to the predefined dictionary data (because at this stage, data is a dictionary, even if it is automatically converted to a named tuple later), as in the following file ‘Bacp_ParserZ.py’:

from pycsp3.problems.data.parsing import *

nCourses = number_in(next_line())
data["nPeriods"] = number_in(next_line())
data["minCredits"] = number_in(next_line())
data["maxCredits"] = number_in(next_line())
data["minCourses"] = number_in(next_line())
data["maxCourses"] = number_in(next_line())
data["credits"] = numbers_in(next_line())
data["prerequisites"] = [[int(v) - 1
                          for v in re.split(r'constraint prerequisite\(|,|\);', line) if len(v) > 0]
                         for line in remaining_lines(skip_curr=True)]
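
To see what the regular expression does on a single line, here is a sketch that relies only on the standard module `re`. The helper `ints_in` is a hypothetical stand-in for a function extracting the integers of a line; it is not the PyCSP$^3$ function, and no index shift is applied here.

```python
import re

lines = [
    "n_periods = 4;",
    "course_load = [2, 3, 1, 3, 2, 3, 3, 2,1, ];",
    "constraint prerequisite(2, 0);",
]


def ints_in(line):
    """Hypothetical stand-in: extract all the integers occurring in a line."""
    return [int(tok) for tok in re.findall(r"-?\d+", line)]


n_periods = ints_in(lines[0])[0]
course_load = ints_in(lines[1])

# Splitting on the fixed parts of the line leaves only the two arguments
# (and some empty strings, which are filtered out).
parts = re.split(r"constraint prerequisite\(|,|\);", lines[2])
pair = [int(p) for p in parts if len(p) > 0]

print(n_periods)    # 4
print(course_load)  # [2, 3, 1, 3, 2, 3, 3, 2, 1]
print(parts)        # ['', '2', ' 0', '']
print(pair)         # [2, 0]
```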


To generate the XCSP$^3$ instance (file), you have to execute:

python Bacp.py -data=bacp-data.mzn -dataparser=Bacp_ParserZ.py


If you want the same data put in a JSON file, execute:

python Bacp.py -data=bacp-data.mzn -dataparser=Bacp_ParserZ.py -dataexport


You obtain a file called ‘bacp-data.json’ equivalent to the one introduced earlier. If you want to specify the name of the output JSON file, give it as a value to the option -dataexport, as in:

python Bacp.py -data=bacp-data.mzn -dataparser=Bacp_ParserZ.py -dataexport=instance0


The generated JSON file is then called ‘instance0.json’.

The rules that are used when loading a JSON file in order to set the value of the PyCSP$^3$ predefined variable data are as follows.

• For any field f of the root object in the JSON file, we obtain a field f in the generated named tuple data such that:
  • if f is a JSON list (or recursively, a list of lists) containing only integers, the type of data.f is ‘pycsp3.tools.curser.ListInt’ instead of ‘list’; ‘ListInt’ being a subclass of ‘list’. The main interest is that data.f can be directly used as a vector for the global constraint Element. See Mario Problem for an illustration.
  • if f is an object, data.f is a named tuple with the same fields as f. See Rack Configuration Problem for an illustration.
• The rules above apply recursively.
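
The recursive conversion of JSON objects can be sketched with the `object_hook` parameter of `json.loads`. This is an illustration under assumptions, not the PyCSP$^3$ code: the function `to_named_tuple` and the field names of the sample are hypothetical, and lists stay ordinary Python lists (the ‘ListInt’ type is not reproduced).

```python
import json
from collections import namedtuple


def to_named_tuple(obj):
    """Hypothetical hook: turn one JSON object (a dict) into a named tuple."""
    return namedtuple("Data", obj.keys())(**obj)


# Hypothetical sample with a nested object and a list of lists.
text = '{"nItems": 2, "limits": {"low": 1, "high": 3}, "costs": [[1, 2], [3, 4]]}'

# json.loads calls the hook on every object, innermost objects first.
data = json.loads(text, object_hook=to_named_tuple)

print(data.nItems)       # 2
print(data.limits.high)  # 3
print(data.costs[1][0])  # 3
```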

## Special Rule when Building Arrays of Variables

When we define a list (array) x of variables with VarArray(), the type of x is ‘pycsp3.tools.curser.ListVar’ instead of ‘list’. The main interest is that x can be directly used as a vector for the global constraint Element.

## Special Values null and None

When the value null occurs in a JSON file, it becomes None in PyCSP$^3$ after loading the data file.
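
This mirrors the behaviour of Python's standard `json` module, as the following short check shows (the field names are hypothetical):

```python
import json

loaded = json.loads('{"limit": null, "size": 3}')
print(loaded["limit"] is None)  # True
print(loaded["size"])           # 3
```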

## Loading Several JSON Files

It is possible to load data from several JSON files. It suffices to indicate a list of JSON filenames between brackets. For example, let ‘file1.json’ be:

{
"a": 4,
"b": 12
}


let ‘file2.json’ be:

{
"c": 10,
"d": 1
}


and let ‘Test.py’ be:

from pycsp3 import *

a, b, c, d = data

print(a, b, c, d)

...


then, by executing:

python Test.py -data=[file1.json,file2.json]


we obtain the expected values in the four Python variables, because the order of fields is guaranteed (as if the two JSON files had been concatenated); behind the scenes, an OrderedDict is used, and the method update() is called.

## Combining JSON Files and Named Elementary Values

It may be useful to load data from JSON files while updating some (named) elementary values. This means that we can indicate, between brackets, JSON filenames as well as named elementary values. The rule is simple: each field of the variable data takes the value given by the last item that sets it during loading. For example, the command:

python Test.py -data=[file1.json,file2.json,c=5]


defines the variable data from the two JSON files, except that the field c is set to 5. However, the command:

python Test.py -data=[c=5,file1.json,file2.json]


is not appropriate because the value of c will be overridden when considering ‘file2.json’.

Just remember that named elementary values must be given after JSON files.
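
The effect of the ordering can be reproduced with an OrderedDict and its method update(). In this sketch, the dictionaries stand for the contents of the two JSON files, and the function `merge` is a hypothetical stand-in for the loading step.

```python
from collections import OrderedDict, namedtuple

file1 = {"a": 4, "b": 12}  # content of file1.json
file2 = {"c": 10, "d": 1}  # content of file2.json


def merge(*sources):
    """Hypothetical stand-in: merge sources from left to right."""
    merged = OrderedDict()
    for source in sources:
        merged.update(source)  # later sources overwrite earlier ones
    return namedtuple("Data", merged.keys())(**merged)


print(merge(file1, file2, {"c": 5}))  # Data(a=4, b=12, c=5, d=1)
print(merge({"c": 5}, file1, file2))  # Data(c=10, a=4, b=12, d=1)
```

In this sketch, putting c=5 first not only loses the value 5 but also moves c to the first position, which would break a statement such as `a, b, c, d = data`.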

## Loading Several Text Files

It is also possible to load data from several text (non-JSON) files. It suffices to indicate a list of filenames between brackets; the files are then concatenated just before calling the appropriate parser. For example, let ‘file1.txt’ be:

5
2 4 12 3 8


let ‘file2.txt’ be:

3 3
0 1 1
1 0 1
0 0 1


then, when the file ‘Test2_Parser.py’ is executed after typing:

python Test2.py -data=[file1.txt,file2.txt] -dataparser=Test2_Parser.py


we can read a sequence of text lines as if a single file was initially given with content:

5
2 4 12 3 8
3 3
0 1 1
1 0 1
0 0 1


It is even possible to add arbitrary lines to the intermediate concatenated file. For example,

python Test2.py -data=[file1.txt,file2.txt,10] -dataparser=Test2_Parser.py


adds a last line containing the value 10. Because whitespace is not tolerated, one may need to surround additional lines with quotes (or double quotes). For example, when ‘Test2_Parser.py’ is executed after typing:

python Test2.py -data=[file1.txt,file2.txt,10,"3 5",partial] -dataparser=Test2_Parser.py


the sequence of text lines is as follows:

5
2 4 12 3 8
3 3
0 1 1
1 0 1
0 0 1
10
3 5
partial
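
The concatenation can be pictured as follows. The function `concatenated_lines` is hypothetical (it is not the PyCSP$^3$ code) and assumes that any item that is not an existing file is taken as a literal line; the two files are created in a temporary folder.

```python
import os
import tempfile


def concatenated_lines(items):
    """Hypothetical helper: files contribute their lines, other items are literal lines."""
    lines = []
    for item in items:
        if os.path.isfile(item):
            with open(item) as f:
                lines.extend(f.read().splitlines())
        else:
            lines.append(item)
    return lines


with tempfile.TemporaryDirectory() as folder:
    file1 = os.path.join(folder, "file1.txt")
    file2 = os.path.join(folder, "file2.txt")
    with open(file1, "w") as f:
        f.write("5\n2 4 12 3 8\n")
    with open(file2, "w") as f:
        f.write("3 3\n0 1 1\n1 0 1\n0 0 1\n")
    lines = concatenated_lines([file1, file2, "10", "3 5", "partial"])

print("\n".join(lines))
```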


## Default Data

Except for single problems, data must be specified by the user in order to generate specific problem instances. If data are not specified, an error is raised. However, when writing the model, it is always possible to indicate some default data, notably by using the behaviour of the Python operator or. For setting a JSON file as the default data file, we must call the function default_data(). Handling default data is illustrated with the BIBD and BACP problems.

For BIBD, if we replace:

v, b, r, k, l = data


by

v, b, r, k, l = data or (9,0,0,3,9)


then, we can generate the default instance with:

python Bibd.py
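
The idiom relies on the fact that `or` returns its right operand whenever the left one is falsy. A minimal sketch, independent of PyCSP$^3$ (the function `bibd_parameters` is hypothetical):

```python
def bibd_parameters(data):
    # `or` returns its right operand whenever the left one is falsy (None here).
    return data or (9, 0, 0, 3, 9)


print(bibd_parameters(None))             # (9, 0, 0, 3, 9)
print(bibd_parameters((7, 0, 0, 3, 1)))  # (7, 0, 0, 3, 1)
```

Keep in mind that falsy values such as 0 also trigger the default; when a single value like 0 is meaningful, an explicit test `data is None` is safer.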


For BACP, if we replace:

nPeriods, minCredits, maxCredits, minCourses, maxCourses, credits, prereq = data


by

nPeriods, minCredits, maxCredits, minCourses, maxCourses, credits, prereq = data or default_data("Bacp_example.json")


then, we can generate the default instance with:

python Bacp.py


If, for some reason, it is convenient to load some data independently of the option -data, one can call the function load_json_data(). This function accepts a parameter that is the filename of a JSON file (possibly given by a URL), and returns a named tuple containing the loaded data.