Necessary information for a smooth processor integration
To ensure the smooth integration of a processor in the P-Pro environment, the following guidelines shall be considered.
Pre-requisites
The available execution platform is based on the following Linux distributions:
- CentOS 7
- CentOS 6
In principle, any programming language or tool (in the broadest sense) whose source code or compiled code runs on the operating systems above is supported. Examples are:
- Python
- C/C++
- Fortran
- MATLAB
- IDL
- R
- SNAP (gpt)
- GDAL
- QGIS
- … (non-exhaustive list)
General (good) rules
1. Parameters that can be configured or changed before the processor is executed must be written in one or more configuration files, in the generic form:
PARAMETER=value
The processor shall read the configuration file to initialize the parameters.
Absolute paths must be handled in the same way, for example:
AUX_DATA_PATH=/SOME/PATH/HERE
Put them in the configuration file rather than hard-coding them in the processor.
2. The processing splits the main job into several parallel jobs for each step of the workflow (Step 1 -> Step 2 -> Step 3 …). Always keep in mind that a step may run in parallel, on a shared file system, with other instances of the same step that perform the same operations on a different set of input data. During execution, avoid removing or changing files in the processor package that other parallel steps may need: reads and writes are concurrent operations in a parallel system like P-Pro.
3. The code (if source code is available) should be well commented and structured, with a main entry point and several functions or routines that each perform a specific task.
4. The software should produce a log file with INFO, WARNING and ERROR messages, and the execution should return a success or error code.
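Rules 1 and 2 can be illustrated with a minimal Python sketch. It is not part of P-Pro: the function names read_config and write_atomically are hypothetical, and skipping blank lines and lines starting with "#" is an assumption, since the guidelines only define the PARAMETER=value form. The second function shows one possible way to avoid exposing half-written files to parallel jobs: write to a private temporary file, then rename it into place.

```python
import os
import tempfile


def read_config(path):
    """Parse PARAMETER=value lines into a dict of strings.

    Assumption: blank lines and lines starting with '#' are skipped.
    """
    params = {}
    with open(path, encoding="utf-8") as handle:
        for line_no, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if not sep or not key.strip():
                raise ValueError(f"{path}:{line_no}: expected PARAMETER=value")
            params[key.strip()] = value.strip()
    return params


def write_atomically(target_path, text):
    """Write text to a private temporary file, then rename it into place.

    os.replace is atomic on POSIX file systems when source and target are
    in the same directory; the behaviour of a specific shared file system
    should be verified. mkstemp creates the file readable by its owner
    only, so adjust the permissions if other users must read the result.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, target_path)
    except BaseException:
        # Never leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

All values are returned as strings; converting them to numbers or paths is left to the processor. Files shipped with the processor package are only read, never rewritten, so parallel instances of the same step cannot interfere through them.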
Processor design
According to the P-Pro processor definition, a processor workflow shall be organised in a set of steps. The processor developer shall be able to identify the main processing phases of the algorithm and partition its operations accordingly. Each processing phase corresponds to a workflow step, for which the input needed to execute the operations and the output to be produced shall be defined. It is not mandatory to define multiple workflow steps. However, doing so is often beneficial for debugging, because the operations are separated more logically, and it also allows finer control of the processing execution on the P-Pro platform.
A simple processor workflow could be:
- Data preparation step
  - Input: list of products to process and a set of parameters to refine the pre-processing operations according to the user’s needs
  - Output: pre-processed products
- Data processing
  - Input: list of pre-processed products
  - Output: processed products
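The two-step workflow above can be sketched in Python as a chain in which the output of one step is the input of the next. This is only an illustration: the Step class, the run_workflow function, the PRODUCT_SUFFIX parameter and the placeholder operations are all hypothetical, and on the platform the steps would be scheduled as separate jobs rather than called in one process.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """A workflow step: a name and a function (inputs, parameters) -> outputs."""
    name: str
    run: Callable[[List[str], dict], List[str]]


def prepare(products, params):
    # Placeholder pre-processing: keep only products with the requested suffix.
    suffix = params.get("PRODUCT_SUFFIX", "")
    return [p for p in products if p.endswith(suffix)]


def process(products, params):
    # Placeholder processing: derive one output name per pre-processed product.
    return [p + ".out" for p in products]


def run_workflow(steps, products, params):
    # The output list of each step becomes the input list of the next one.
    for step in steps:
        products = step.run(products, params)
    return products


WORKFLOW = [Step("data_preparation", prepare), Step("data_processing", process)]
```

Defining the input and output of each step explicitly, as done here with plain lists, makes it possible to test or rerun a single step in isolation.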
Each processor step will have a different objective but a common design pattern can be identified.
The typical sequence of actions is:
- sourcing/importing the required libraries
- reading the parameter and path values from the configuration file
- executing the step operations
- writing the processing output to the step output folder
- returning success/failure and, possibly, error codes.
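The sequence of actions above could look as follows in a Python step script. This is a sketch under stated assumptions, not the P-Pro interface: the parameter names OUTPUT_PATH and LOG_FILE, the exit code values and the output file result.txt are hypothetical, and the step operation is a placeholder. Only AUX_DATA_PATH and the PARAMETER=value form come from the guidelines.

```python
# 1. Import the required libraries.
import logging
import os
import sys

EXIT_SUCCESS = 0  # assumed convention: 0 for success, non-zero for failure
EXIT_FAILURE = 1


def load_parameters(config_path):
    # 2. Compact PARAMETER=value reader; lines without "=" are ignored.
    with open(config_path, encoding="utf-8") as handle:
        pairs = (line.strip().partition("=") for line in handle)
        return {key.strip(): value.strip() for key, sep, value in pairs if sep}


def make_logger(logfile):
    # Log file with timestamped INFO, WARNING and ERROR messages.
    logger = logging.getLogger("step")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(logfile, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def run_step(params, log):
    # 3. Execute the step operations (placeholder) and
    # 4. write the output to the step output folder.
    output_dir = params["OUTPUT_PATH"]  # hypothetical parameter name
    os.makedirs(output_dir, exist_ok=True)
    result_path = os.path.join(output_dir, "result.txt")
    with open(result_path, "w", encoding="utf-8") as handle:
        handle.write("done\n")
    log.info("Wrote %s", result_path)
    return result_path


def main(argv):
    if len(argv) != 2:
        print("usage: step.py CONFIG_FILE", file=sys.stderr)
        return EXIT_FAILURE
    try:
        params = load_parameters(argv[1])
    except OSError as exc:
        print(f"cannot read configuration: {exc}", file=sys.stderr)
        return EXIT_FAILURE
    log = make_logger(params.get("LOG_FILE", "step.log"))
    try:
        log.info("Step started")
        if "AUX_DATA_PATH" not in params:
            log.warning("AUX_DATA_PATH not set; continuing without auxiliary data")
        run_step(params, log)
    except Exception:
        log.exception("Step failed")  # recorded at ERROR level with traceback
        return EXIT_FAILURE
    log.info("Step finished")
    return EXIT_SUCCESS  # 5. Return success or failure to the caller.

# In a real script the last line would be: sys.exit(main(sys.argv))
```

Returning the code from main and passing it to sys.exit keeps the step testable as a function while still reporting success or failure to whatever launches it.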