Statistics¶
Note
In this chapter we use ref and sec abbreviations when refering to the reference input DEM (input_ref
) and the secondary input DEM (ìnput_sec
) respectively.
Demcompare can compute a wide variety of statistics on either an input DEM, or the difference between two input DEMs. The statistics module can consider different number of inputs:
"output_dir": "./test_output/",
"input_ref": {
"path": "./Gironde.tif",
"nodata": -9999.0,
},
"statistics": {
"remove_outliers": "True",
}
If one single DEM is specified in the configuration, the input or default metrics will be directly computed on the input DEM.
By default, the following metrics will be computed: mean
, median
, max
, min
, sum
, squared_sum
, std
.
The user may specify the required metrics as follows:
"output_dir": "./test_output/",
"input_ref": {
"path": "./Gironde.tif",
"nodata": -9999.0,
},
"statistics": {
"remove_outliers": "True",
"metrics": ["mean", {"ratio_above_threshold": {"elevation_threshold": [1, 2, 3]}}]
}
"output_dir": "./test_output/",
"input_ref": {
"path": "./Gironde.tif",
"nodata": -9999.0,
},
"input_sec": {
"path": "./FinalWaveBathymetry_T30TXR_20200622T105631_D_MSL_invert.TIF",
"nodata": -32768,
}
"statistics": {
"remove_outliers": "True",
}
If two DEMs are specified in the configuration, demcompare will do the reprojection of both DEMs to have the same resolution and size, and the difference between both reprojected DEMs will be considered to compute the input or default metrics.
By default, the following metrics will be computed: mean
, median
, max
, min
, sum
, squared_sum
, std
,
percentil_90
, nmad
, rmse
.
The user may specify the required metrics as follows:
"output_dir": "./test_output/",
"input_ref": {
"path": "./Gironde.tif",
"nodata": -9999.0,
},
"input_sec": {
"path": "./FinalWaveBathymetry_T30TXR_20200622T105631_D_MSL_invert.TIF",
"nodata": -32768,
}
"statistics": {
"remove_outliers": "True",
"metrics": ["mean", {"ratio_above_threshold": {"elevation_threshold": [1, 2, 3]}}]
}
With the coregistration step¶
If both coregistration and statistics steps are present on the input configuration:
In order to evaluate the coregistration effect, the differences between the reprojected DEMs before and after coregistration, named initial_dem_diff and final_dem_diff, will be considered to compute the Probability Density Function and the Cummulative Density Function.
The difference between the reprojected DEMs after coregistration (the final_dem_diff) will be considered to compute the input or default metrics.
"output_dir": "./test_output/",
"input_ref": {
"path": "./Gironde.tif",
"nodata": -9999.0,
},
"input_sec": {
"path": "./FinalWaveBathymetry_T30TXR_20200622T105631_D_MSL_invert.TIF",
"nodata": -32768,
},
"coregistration": {
"coregistration_method": "nuth_kaab_internal",
}
"statistics": {
"remove_outliers": "True",
}
The following metrics will be computed:
On initial_dem_diff and on final_dem_diff:
cdf
,
Note
No classification is considered for the metrics to evaluate the coregistration effect. If classification layers are specified on the input configuration, those will be only be considered for the ‘’Other default metrics’’ computation.
On final_dem_diff:
mean
,median
,max
,min
,sum
,squared_sum
,std
,percentil_90
,nmad
,rmse
.
Note
If the user specifies the required metrics to be computed, those will substitute the default metrics. However, the ‘’metrics to evaluate the coregistration effect’’ will still be computed.
The user may specify the required metrics as follows :
"output_dir": "./test_output/",
"input_ref": {
"path": "./Gironde.tif",
"nodata": -9999.0,
},
"input_sec": {
"path": "./FinalWaveBathymetry_T30TXR_20200622T105631_D_MSL_invert.TIF",
"nodata": -32768,
},
"coregistration": {
"coregistration_method": "nuth_kaab_internal",
}
"statistics": {
"remove_outliers": "True",
"metrics": ["mean", {"ratio_above_threshold": {"elevation_threshold": [1, 2, 3]}}]
}
Metrics¶
The following metrics are currently available on demcompare:
mean
max
min
std
(Standard Deviation)rmse
(Root Mean Squared Error)median
nmad
(Normalized Median Absolute Deviation)sum
squared_sum
percentil_90
Name |
Type |
Parameters |
Type |
Default value |
---|---|---|---|---|
|
vector |
bin_step |
float |
|
output_csv_path |
string |
|
||
output_plot_path |
string |
|
||
|
vector |
bin_step |
float |
|
width |
float |
|
||
filter_p98 |
float |
|
||
output_csv_path |
string |
|
||
output_plot_path |
string |
|
||
|
vector |
elevation_threshold |
List[float, int] |
|
original_unit |
string |
|
||
output_csv_path |
string |
|
Note
The metrics are always computed on valid pixels. Valid pixels are those whose value is different than NaN and the nodata value (-32768 by default if not specified in the input configuration or in the input DEM).
Note
Apart from only considering the valid pixels, the user may also specify the remove_outliers
option
in the input configuration. This option will also filter all DEM pixels outside (mu + 3 sigma) and (mu - 3 sigma),
being mu the mean and sigma the standard deviation of all valid pixels in the DEM.
Classification layers¶
Classification layers are a way to classify the DEM pixels in classes according to different criteria in order to compute specific statistics according to each class.
Four types of classification layers exist:
The global classification is the default classification and is always computed. This layer has a single class where all valid pixels are considered. If no classification layers are specified in the input configuration, only the global classification will be considered.
This type of classification layer considers an input classification mask in order to classify the DEM pixels. The classification mask must be specified with its classes, and linked to one of the input DEMs defined in the input configuration as follows:
"output_dir": "./test_output/",
"input_ref": {
"path": "./Gironde.tif",
"zunit": "m"
},
"input_sec": {
"path": "./FinalWaveBathymetry_T30TXR_20200622T105631_D_MSL_invert.TIF",
"zunit": "m",
"nodata": -9999,
"classification_layers": {
"Status": {
"map_path": "./FinalWaveBathymetry_T30TXR_20200622T105631_Status.TIF"
}
}
}
"statistics": {
"remove_outliers": "False",
"classification_layers": {
"Status": {
"type": "segmentation",
"classes": {"valid": [0],"KO": [1],"Land": [2],"NoData": [3],"Outside_detector": [4]}
}
}
}
On this example, we can see that the classification mask is linked to the secondary DEM.
Regarding the classification_layer configuration,
the type
is specified as segmentation
, and the different classes
are specified as a dictionary containing the different
names and their mask values.
Notice that a class may contain different mask values, for instance:
"statistics": {
"remove_outliers": "False",
"classification_layers": {
"Status": {
"type": "segmentation",
"classes": {"valid": [0, 1], "Land": [2, 3], "NoData": [4, 5]}
}
}
}
If a classification mask is specified for both input_ref and input_sec, the mask classification of the ref DEM is considered for the general statistics computation, whilst the sec mask classification is considered for the intersection and exclusion statistics as explained on The modes.
Note
The input classification mask must be superimposable to its support DEM, meaning that it must have the same size and resolution. It is to be noticed that during execution, all the transformations applied to the support DEM will also be applied to its classification masks to ensure that they continue to be superimposable.
This type of classification computes the slope of the input DEMs and classifies the pixels according to the range on which its slope falls. It is to be noticed that if two DEMs are defined as inputs, then the slope will be computed on both input DEMs separately, and not in the difference between both.
The slope of each DEM is obtained as follows:
, where and are the result of the convolution and of the DEM with the kernels :
The slope will then be classified by the ranges set with the ranges
argument.
Each class will contain all the pixels for whom the slope is contained inside the associated slope range. At the end, there will be a class mask for each slope range.
Regarding the classification_layer configuration,
the type
is specified as slope
, and the different ranges
are specified as a list. A valid slope configuration could be:
"classification_layers": {
"Slope0": {
"type": "slope",
"ranges": [0, 5, 10, 25, 45]
}
}
This type of classification layer is created from two or more existing classification layers, as it is the result of fusing the classes of different classification layers. It is to be noticed that only classification layers belonging to the same support DEM can be fused.
For example, given the two following classification layers with their corresponding classes and mask values:
Slope0: "[0%;5%[", 1
"[5%;10%[", 2
"[10%;inf[", 3
Status: "Sea", 1
"Deep_land", 2
"Coast", 3
The resulting fusion layer would have the following fused classes :
Fusion0: "Status_sea_&_Slope0_[0%;5%[", 1,
"Status_sea_&_Slope0_[5%;10%[", 2,
"Status_sea_&_Slope0_[10%;inf[", 3,
"Status_deep_land_&_Slope0_[0%;5%[", 4,
"Status_deep_land_&_Slope0_[5%;10%[", 5,
"Status_deep_land_&_Slope0_[10%;inf[", 6,
A possible configuration including a fusion classification layer in included here. As one can see the type
is specified as fusion
,
and the support dem of the list of layers to be fused, in this case sec
, must be specified :
"output_dir": "./test_output/",
"input_ref": {
"path": "./Gironde.tif",
"zunit": "m"
},
"input_sec": {
"path": "./FinalWaveBathymetry_T30TXR_20200622T105631_D_MSL_invert.TIF",
"zunit": "m",
"nodata": -9999,
"classification_layers": {
"Status": {
"map_path": "./FinalWaveBathymetry_T30TXR_20200622T105631_Status.TIF"}
}
},
"statistics": {
"classification_layers": {
"Status": {
"type": "segmentation",
"classes": {"valid": [0], "KO": [1], "Land": [2], "NoData": [3], "Outside_detector": [4],
},
"Slope0": {
"type": "slope",
"ranges": [0, 10, 25, 50, 90],
},
"Fusion0": {
"type": "fusion",
"sec": ["Slope0", "Status"]
}
}
}
In the following schema we can see an example case where two different segmentation layers and a slope layer are created, each having a single support:
Segmentation_0 has ref support
Segmentation_1 has sec support
Slope_0 has sec support
Hence, a fusion layer can be created by fusing the two layers that have the same support, in this case Segmentation_1 and Slope_0 with sec support.
The modes¶
As shown in previous section, demcompare will classify stats according to classification layers and classification layer masks must be superimposable to one DEM, meaning that the classification mask and its support DEM must have the same size and resolution.
Whenever a classification layer is given for both DEMs (say one has two DEMs with associated segmentation maps) then it can be possible to observe the metrics for pixels whose classification (segmentation for example) is the same between both DEM or not. These observations are available through what we call mode. Demcompare supports:
Within this mode all valid pixels are considered. It means nan values but also outliers (if remove_outliers
was set to "True"
) and masked ones are discarded.
Note that the nan values can be originated from the altitude differences image and / or the exogenous classification layers themselves (ie. if the input segmentation has NaN values, the corresponding pixels will not be considered for the statistics computation of this classification layer).
These modes are only available if both DEMs (ref and sec) where classified by the same classification layer :
The intersection mode is the mode where only the pixels sharing the same label for both DEMs classification layers are kept.
Say after a coregistration, a pixel P is associated to a ‘grass land’ inside a ref classification layer named land_cover and a road inside the sec classification layer also named land_cover, then pixel P is not intersection for demcompare.
The exclusion mode which is the intersection one complementary.
In the following schema we can see a scenario where two different segmentation layers and a slope layer are created. Both segmentation layers having a single support and the slope layer having two supports.
Segmentation_0 has only ref support, hence the statistics are computed considering the ref segmentation_0_mask.
Segmentation_1 has only sec support, hence the statistics are computed considering the sec segmentation_1_mask.
Slope_0 has both ref and support, hence the statistics are computed considering:
the ref slope_0_mask for the standard mode
the intersection between the ref slope_0_mask and the sec slope_0_mask for the intersection and exclusion modes.
Metric selection¶
The metrics to be computed may be specified at different levels on the statistics configuration:
Global level: those metrics will be computed for all classification layers
Classification layer level: those metrics will be computed specifically for the given classification layer
For instance, with the following configuration we could compute the mean, ratio_above_threshold metrics on all layers, whilst nmad metric would be computed only for the Slope0 layer.
"statistics": { "classification_layers": { "Status": { "type": "segmentation", "classes": { "valid": [0], "KO": [1], "Land": [2], "NoData": [3], "Outside_detector": [4], }, }, "Slope0": { "type": "slope", "ranges": [0, 10, 25, 50, 90], "metrics": ["nmad"], }, "Fusion0": { "type": "fusion", "sec": ["Slope0", "Status"] }, }, "metrics": [ "mean", {"ratio_above_threshold": {"elevation_threshold": [1, 2, 3]}}, ], }
Statistics parameters¶
Here is the list of the parameters of the input configuration file for the statistics step and its associated default value when it exists:
Name |
Description |
Type |
Default value |
Required |
---|---|---|---|---|
|
Remove outliers during statistics
computation
|
string |
|
No |
|
Metrics to be computed |
List |
|
No |
Name |
Description |
Type |
Default value |
Required |
---|---|---|---|---|
|
Classification layer type |
string |
|
Yes |
remove_outliers |
Remove outliers during statistics computation
for this particular classification layer
|
string |
|
No |
|
Classification layer no data value |
float or int |
|
No |
|
Classification layer metrics to be computed
(if metrics have been specified for the whole
stats, they will also be computed for this
classification)
|
List |
|
No |
Name |
Description |
Type |
Default value |
Required |
---|---|---|---|---|
|
Segmentation classes |
Dict |
|
Yes |
Name |
Description |
Type |
Default value |
Required |
---|---|---|---|---|
|
Slope ranges |
List |
No |
Name |
Description |
Type |
Default value |
Required |
---|---|---|---|---|
|
Ref classification layers to fusion |
List |
|
No |
|
Sec classification layers to fusion |
List |
|
No |
Statistics outputs¶
Output files and their required parameters¶
The images and files saved with the statistics
option activated on the configuration :
Name |
Description |
---|---|
dem_for_stats.tif |
DEM on which the statistics have been computed |
ref and sec_rectified_support_map.tif |
Stored on each classification layer folder, the rectified support maps
where each pixel has a class value.
|
stats_results.csv and .json |
Stored on each classification layer folder,
the CSV and Json files storing the computed statistics by class.
|
stats_results_intersection.csv and .json |
Stored on each classification layer folder, the CSV and Json files
storing the computed statistics by class in mode intersection.
|
stats_results_exclusion.csv and .json |
Stored on each classification layer folder, the CSV and Json files
storing the computed statistics by class in mode exclusion.
|
Output directories¶
With the command line execution, the following statistics directories that may store the respective files will be automatically generated.
.output_dir
+-- stats
+-- dem_for_stats.tif
+-- *classification_layer_name*
+-- stats_results.json/csv
+-- stats_results_intersection.json/csv
+-- stats_results_exclusion.json/csv
+-- ref_rectified_support_map.tif
+-- sec_rectified_support_map.tif
Note
Please notice that even if no classification layer has been specified, the results will be stored in a folder called global
, as it
is the classification layer that is always computed and only considers all valid pixels.
Note
Please notice that some data may be missing if it has not been computed for the classification layer (ie. intersection maps are only computed under certain conditions The modes).