Metric

Handle processing of the stored outputs to produce the metrics of the doce module.

class doce.metric.Metric

Stores information about the way evaluation metrics are stored and manipulated.

Each member of this class describes an evaluation metric and the way it may be abstracted. Two namespaces (doce.metric.Metric._unit, doce.metric.Metric._description) are available to provide information about, respectively, the unit of the metric and its semantics.

Each metric may be reduced by any numpy function that operates on a vector, called with its default parameters.
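A reduction directive such as 'mean' or 'std' can be read as the name of a numpy function applied to the metric vector. The sketch below assumes exactly that and nothing more; `reduce_vector` is a hypothetical helper for illustration, not part of the doce API.

```python
import numpy as np


def reduce_vector(values, reduction):
    """Hypothetical helper: apply the numpy function named `reduction`
    to a 1-D metric vector, using that function's default parameters."""
    func = getattr(np, reduction)  # e.g. 'mean' -> np.mean, 'std' -> np.std
    return func(np.asarray(values))


durations = [1.0, 2.0, 3.0, 6.0]
print(reduce_vector(durations, 'mean'))    # 3.0
print(reduce_vector(durations, 'median'))  # 2.5
print(reduce_vector(durations, 'max'))     # 6.0
```

An unknown name raises AttributeError from getattr, which is one simple way to reject invalid directives.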

Two pruning strategies can be added to this description in order to remove some items of the metric vector before it is reduced.

One can select one value of the vector by providing its index.

Examples

>>> import doce
>>> m = doce.metric.Metric()
>>> m.duration = ['mean', 'std']
>>> m._unit.duration = 'second'
>>> m._description.duration = 'duration of the processing'

It is sometimes useful to store complementary data, for example for plotting, that must not be considered during the reduction.

>>> m.metric1 = ['median-0', 'min-0', 'max-0']

In this case, the first value will be removed before reduction.

>>> m.metric2 = ['median-2', 'min-2', 'max-2', '0%']

In this case, the odd values will be removed before reduction, and the last directive selects the first value of the metric vector, expressed in percent by multiplying it by 100.
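The directive syntax described above can be sketched as a small parser. This is an illustration only, not doce's own implementation: `apply_directive` is hypothetical, and it assumes that the suffix '-0' drops the first value, that '-2' drops the values at odd (0-based) indices, that a bare integer selects the value at that index, and that a trailing '%' multiplies the result by 100.

```python
import numpy as np


def apply_directive(values, directive):
    """Hypothetical interpretation of a doce-style reduction directive.

    Assumed forms:
      'mean'      -> np.mean(values)
      'median-0'  -> np.median(values[1:])    (first value removed)
      'median-2'  -> np.median(values[::2])   (odd indices removed)
      '0'         -> values[0]                (selection by index)
      '0%'        -> values[0] * 100          (selection, in percent)
    """
    values = np.asarray(values, dtype=float)
    percent = directive.endswith('%')
    if percent:
        directive = directive[:-1]

    if directive.isdigit():
        # A bare integer selects one value of the vector by its index.
        result = values[int(directive)]
    else:
        name, _, prune = directive.partition('-')
        if prune == '0':
            values = values[1:]   # remove the first value
        elif prune == '2':
            values = values[::2]  # keep even indices, i.e. remove odd ones
        result = getattr(np, name)(values)

    return result * 100 if percent else result


metric2 = [0.25, 9.0, 0.5, 9.0, 0.75]
for directive in ['median-2', 'min-2', 'max-2', '0%']:
    print(directive, apply_directive(metric2, directive))
```

With this interpretation, the large values at odd indices are ignored by the three pruned reductions, and '0%' reports the first value as 25.0.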

Methods

`get_column_header`(plan[, factor_display, ...])	Builds the column header of the reduction setting_description.
`name`()	Returns a list of str with the names of the metrics.
`reduce`(settings, path[, setting_encoding, ...])	Apply the reduction directives described in each member of doce.metric.Metric.
`reduce_from_h5`(settings, path[, ...])	Handle reduction of the metrics when considering h5 storage.
`reduce_from_npy`(settings, path[, ...])	Handle reduction of the metrics when considering numpy storage.

Attributes

significance_status

get_column_header(plan, factor_display='long', factor_display_length=2, metric_display='long', metric_display_length=2, metric_has_data=None, reduced_metric_display='capitalize')

Builds the column header of the reduction setting_description.

This method builds the column header of the reduction setting_description by formatting the factor names from the doce.Plan class and by describing the reduced metrics.

Parameters:

plan: doce.Plan

The doce.Plan describing the factors of the experiment.

factor_display: str (optional)

The expected format of the display of factors. ‘long’ (default) does not lead to any reduction. If factor_display contains ‘short’, a reduction of each word is performed.

‘short_underscore’ assumes python_case delimitation.

‘short_capital’ assumes camelCase delimitation.

‘short’ attempts to perform reduction by guessing the type of delimitation.

factor_display_length: int (optional)

If factor_display has ‘short’, factor_display_length specifies the maximal length of each word of the description of the factor.

metric_has_data: list of bool (optional)

Specifies, for each metric described in the doce.metric.Metric object, whether data has been loaded or not.

reduced_metric_display: str (optional)

If set to ‘capitalize’ (default), the description of the reduced metric is done in a Camel case fashion: metricReduction.

If set to ‘underscore’, the description of the reduced metric is done in a snake case fashion: metric_reduction.

See also

doce.util.compress_description
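How the factor_display and reduced_metric_display options might transform names can be sketched as follows. Both functions are hypothetical stand-ins: doce's actual logic lives in doce.util.compress_description and may differ, and rejoining the truncated words with their original delimiter is an assumption made for illustration.

```python
import re


def shorten_factor_name(name, factor_display='short', factor_display_length=2):
    """Hypothetical sketch of the 'long' / 'short*' factor display modes."""
    if factor_display == 'long':
        return name  # 'long' performs no reduction
    if factor_display == 'short':
        # Guess the delimitation: underscores if present, else capitals.
        factor_display = 'short_underscore' if '_' in name else 'short_capital'
    if factor_display == 'short_underscore':
        words = name.split('_')
        return '_'.join(w[:factor_display_length] for w in words)
    if factor_display == 'short_capital':
        # Split e.g. 'learningRate' into ['learning', 'Rate'].
        words = re.findall(r'[A-Z]?[a-z0-9]+', name)
        return ''.join(w[:factor_display_length] for w in words)
    raise ValueError(f'unknown factor_display: {factor_display}')


def reduced_metric_label(metric, reduction, reduced_metric_display='capitalize'):
    """Hypothetical sketch of the reduced metric naming styles."""
    if reduced_metric_display == 'capitalize':
        return metric + reduction.capitalize()  # metricReduction
    if reduced_metric_display == 'underscore':
        return metric + '_' + reduction         # metric_reduction
    raise ValueError(f'unknown reduced_metric_display: {reduced_metric_display}')


print(shorten_factor_name('learning_rate', 'short_underscore'))  # le_ra
print(shorten_factor_name('learningRate', 'short_capital'))      # leRa
print(reduced_metric_label('duration', 'mean'))                  # durationMean
print(reduced_metric_label('duration', 'mean', 'underscore'))    # duration_mean
```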

name()

Returns a list of str with the names of the metrics.

Returns a list of str with the names of the metrics defined as members of the doce.metric.Metric object.

Examples

>>> import doce
>>> m = doce.metric.Metric()
>>> m.duration = ['mean']
>>> m.mse = ['mean']
>>> m.name()
['duration', 'mse']
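One plausible way to implement name() is to list the public attributes of the object while skipping the private _unit and _description namespaces. The sketch below assumes that convention; `MetricSketch` is a hypothetical class, not doce's implementation.

```python
from types import SimpleNamespace


class MetricSketch:
    """Hypothetical stand-in for doce.metric.Metric: metrics are public
    attributes, while _unit and _description are private namespaces."""

    def __init__(self):
        self._unit = SimpleNamespace()
        self._description = SimpleNamespace()

    def name(self):
        # Keep attributes that do not start with an underscore, in the
        # order they were set (instance __dict__ preserves insertion order).
        return [key for key in vars(self) if not key.startswith('_')]


m = MetricSketch()
m.duration = ['mean']
m.mse = ['mean']
m._unit.duration = 'second'
print(m.name())  # ['duration', 'mse']
```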

reduce(settings, path, setting_encoding=None, factor_display='long', factor_display_length=2, metric_display='long', metric_display_length=2, reduced_metric_display='capitalize', verbose=False)

Applies the reduction directives described in each member of the doce.metric.Metric object to the settings given as parameters.

For each setting in the iterable settings, the available data corresponding to the metrics specified as members of the doce.metric.Metric object are reduced using the specified reduction methods.

Parameters:

settings: doce.Plan

Iterable of settings.

path: str

In the case of .npy storage, a valid path to the main directory. In the case of .h5 storage, a valid path to an .h5 file.

setting_encoding: dict (optional)

Encoding of the setting. See doce.Plan.id for reference.

reduced_metric_display: str (optional)

If set to ‘capitalize’ (default), the description of the reduced metric is done in a Camel case fashion: metricReduction.

If set to ‘underscore’, the description of the reduced metric is done in a snake case fashion: metric_reduction.

factor: doce.Plan

The doce.Plan describing the factors of the experiment.

factor_display: str (optional)

The expected format of the display of factors. ‘long’ (default) does not lead to any reduction. If factor_display contains ‘short’, a reduction of each word is performed.

‘short_underscore’ assumes python_case delimitation.

‘short_capital’ assumes camelCase delimitation.

‘short’ attempts to perform reduction by guessing the type of delimitation.

factor_display_length: int (optional)

If factor_display has ‘short’, factor_display_length specifies the maximal length of each word of the description of the factor.

verbose: bool (optional)

In the case of .npy metric storage, if verbose is set to True, print the file name sought for each metric as well as its time of last modification.

In the case of .h5 metric storage, if verbose is set to True, print the group sought for each metric.

Returns:

setting_description (list of lists of literals): A setting_description, stored as a list of lists of literals of the same size. The main list stores the rows of the setting_description.
column_header (list of str): The column header of the setting_description as a list of str, describing the factors (left side) and the reduced metrics (right side).
constant_setting_description (str): When a factor is equally valued for all the settings, the factor column is removed from the setting_description and stored in constant_setting_description along with its value.
nb_column_factor (int): The number of factors in the column header.
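The rule stated for constant_setting_description (a factor that takes the same value in every setting is moved out of the table) can be sketched as follows. The function is hypothetical and works on plain dicts rather than doce.Plan objects; the `f1: 2` format mirrors the example output below, while the comma separator used when several factors are constant is an assumption.

```python
def split_constant_factors(settings):
    """Hypothetical sketch: separate constant factors from varying ones.

    `settings` is a list of dicts mapping factor names to (hashable)
    values, all with the same keys. Returns the factor rows restricted
    to varying factors, their names, and a description of the constants.
    """
    names = list(settings[0])
    # A factor is constant when it takes a single value across all settings.
    constant = [n for n in names if len({s[n] for s in settings}) == 1]
    varying = [n for n in names if n not in constant]
    rows = [[s[n] for n in varying] for s in settings]
    constant_description = ', '.join(f'{n}: {settings[0][n]}' for n in constant)
    return rows, varying, constant_description


settings = [{'f1': 2, 'f2': 1}, {'f1': 2, 'f2': 2}, {'f1': 2, 'f2': 3}]
rows, header, constant_description = split_constant_factors(settings)
nb_column_factor = len(header)
print(header)                # ['f2']
print(rows)                  # [[1], [2], [3]]
print(constant_description)  # f1: 2
```

In the full method, the reduced metric values would then be appended to each row, to the right of the nb_column_factor factor columns.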

Examples

doce supports metric storage using one .npy file per metric and per setting.

>>> import doce
>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(0)

>>> experiment = doce.experiment.Experiment()
>>> experiment.name = 'example'
>>> experiment.set_path('output', '/tmp/'+experiment.name+'/', force=True)
>>> experiment.add_plan('plan', f1 = [1, 2], f2 = [1, 2, 3])
>>> experiment.set_metric(name = 'm1_mean', output = 'm1', func = np.mean)
>>> experiment.set_metric(name = 'm1_std', output = 'm1', func = np.std)
>>> experiment.set_metric(name = 'm2_min', output = 'm2', func = np.min)
>>> experiment.set_metric(name = 'm2_argmin', output = 'm2', func = np.argmin)
>>> def process(setting, experiment):
...   metric1 = setting.f1+setting.f2+np.random.randn(100)
...   metric2 = setting.f1*setting.f2*np.random.randn(100)
...   np.save(experiment.path.output+setting.identifier()+'_m1.npy', metric1)
...   np.save(experiment.path.output+setting.identifier()+'_m2.npy', metric2)
>>> nb_failed = experiment.perform([], process, progress='')
>>> (setting_description,
... column_header,
... constant_setting_description,
... nb_column_factor,
... modification_time_stamp,
... p_values
... ) = experiment.metric.reduce(experiment._plan.select([1]), experiment.path)

>>> df = pd.DataFrame(setting_description, columns=column_header)
>>> df[column_header[nb_column_factor:]] = df[column_header[nb_column_factor:]].round(decimals=2)
>>> print(constant_setting_description)
f1: 2
>>> print(df)
  f2  m1_mean  m1_std  m2_min  m2_argmin
0   1    2.87   1.00  -4.49        35
1   2    3.97   0.93  -8.19        13
2   3    5.00   0.91 -12.07        98
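For .npy storage, reducing one metric for one setting amounts to locating its file, loading it and applying the reduction function. The sketch below reuses the `<identifier>_<output>.npy` naming from the `process` function above; the helper `reduce_npy_metric` and the sample identifier string are hypothetical, and doce's own reduce_from_npy may proceed differently. With verbose=True it prints the file name sought and its time of last modification, as described for the verbose parameter.

```python
import os
import tempfile
import time

import numpy as np


def reduce_npy_metric(directory, identifier, output, func, verbose=False):
    """Hypothetical sketch: load <directory>/<identifier>_<output>.npy
    and reduce it with `func` (e.g. np.mean, np.argmin)."""
    file_name = os.path.join(directory, f'{identifier}_{output}.npy')
    if verbose:
        # Report the file sought and its time of last modification.
        print(file_name, time.ctime(os.path.getmtime(file_name)))
    return func(np.load(file_name))


with tempfile.TemporaryDirectory() as directory:
    identifier = 'f1=2+f2=1'  # assumed identifier format
    np.save(os.path.join(directory, identifier + '_m1.npy'),
            np.array([1.0, 2.0, 3.0]))
    result = reduce_npy_metric(directory, identifier, 'm1', np.mean,
                               verbose=True)
    print(result)  # 2.0
```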

doce also supports metric storage in a single .h5 file, structured with settings as groups and metrics as leaf nodes.

>>> import doce
>>> import numpy as np
>>> import tables as tb
>>> import pandas as pd
>>> np.random.seed(0)

>>> experiment = doce.experiment.Experiment()
>>> experiment.name = 'example'
>>> experiment.set_path('output', '/tmp/'+experiment.name+'.h5', force=True)
>>> experiment.add_plan('plan', f1 = [1, 2], f2 = [1, 2, 3])
>>> experiment.set_metric(name = 'm1_mean', output = 'm1', func = np.mean)
>>> experiment.set_metric(name = 'm1_std', output = 'm1', func = np.std)
>>> experiment.set_metric(name = 'm2_min', output = 'm2', func = np.min)
>>> experiment.set_metric(name = 'm2_argmin', output = 'm2', func = np.argmin)
>>> def process(setting, experiment):
...   h5 = tb.open_file(experiment.path.output, mode='a')
...   setting_group = experiment.add_setting_group(
...     h5,
...     setting,
...     output_dimension = {'m1':100, 'm2':100}
...   )
...   setting_group.m1[:] = setting.f1+setting.f2+np.random.randn(100)
...   setting_group.m2[:] = setting.f1*setting.f2*np.random.randn(100)
...   h5.close()
>>> nb_failed = experiment.perform([], process, progress='')
>>> h5 = tb.open_file(experiment.path.output, mode='r')
>>> print(h5)
/tmp/example.h5 (File) ''
Last modif.: '...'
    Object Tree:
/ (RootGroup) ''
/f1=1+f2=1 (Group) 'f1=1+f2=1'
/f1=1+f2=1/m1 (Array(100,)) 'm1'
/f1=1+f2=1/m2 (EArray(100,)) 'm2'
/f1=1+f2=2 (Group) 'f1=1+f2=2'
/f1=1+f2=2/m1 (Array(100,)) 'm1'
/f1=1+f2=2/m2 (EArray(100,)) 'm2'
/f1=1+f2=3 (Group) 'f1=1+f2=3'
/f1=1+f2=3/m1 (Array(100,)) 'm1'
/f1=1+f2=3/m2 (EArray(100,)) 'm2'
/f1=2+f2=1 (Group) 'f1=2+f2=1'
/f1=2+f2=1/m1 (Array(100,)) 'm1'
/f1=2+f2=1/m2 (EArray(100,)) 'm2'
/f1=2+f2=2 (Group) 'f1=2+f2=2'
/f1=2+f2=2/m1 (Array(100,)) 'm1'
/f1=2+f2=2/m2 (EArray(100,)) 'm2'
/f1=2+f2=3 (Group) 'f1=2+f2=3'
/f1=2+f2=3/m1 (Array(100,)) 'm1'
/f1=2+f2=3/m2 (EArray(100,)) 'm2'
>>> h5.close()

>>> (setting_description,
... column_header,
... constant_setting_description,
... nb_column_factor,
... modification_time_stamp,
... p_values) = experiment.metric.reduce(experiment.plan.select([0]), experiment.path)

>>> df = pd.DataFrame(setting_description, columns=column_header)
>>> df[column_header[nb_column_factor:]] = df[column_header[nb_column_factor:]].round(decimals=2)
>>> print(constant_setting_description)
f1: 1
>>> print(df)
  f2  m1_mean  m1_std  m2_min  m2_argmin
0   1    2.06   1.01  -2.22        83
1   2    2.94   0.95  -5.32        34
2   3    3.99   1.04  -9.14        89

reduce_from_h5(settings, path, setting_encoding=None, verbose=False)

Handle reduction of the metrics when considering h5 storage.

The method handles the reduction of the metrics when considering h5 storage.

The method doce.metric.Metric.reduce() wraps this method and should be considered the main user interface; please see its documentation for usage.