Experiment

Handle information of an experiment of the doce module.

class doce.experiment.Experiment(**description)

Stores high level information about the experiment and tools to control the processing and storage of data.

The experiment class displays high level information about the experiment such as its name, description, author, author’s email address, and run identification.

Information about the storage of data is specified using the experiment.path namespace. The experiment also stores one or several Plan objects and a Metric object, which specify the experimental plans and the metrics considered in the experiment, respectively.

See also

doce.Plan, doce.metric.Metric

Examples

>>> import doce
>>> e=doce.Experiment()
>>> e.name='my_experiment'
>>> e.author='John Doe'
>>> e.address='john.doe@no-log.org'
>>> e.path.processing='/tmp'
>>> print(e)
  name: my_experiment
  description
  author: John Doe
  address: john.doe@no-log.org
  version: 0.1
  status:
    run_id: ...
    verbose: 0
  selector: []
  parameter
  metric
  path:
    code_raw: ...
    code: ...
    archive_raw:
    archive:
    export_raw: export
    export: export
    processing_raw: /tmp
    processing: /tmp
  host: []

Each level can be complemented with new members to store specific information:

>>> e.specific_info = 'stuff'
>>> import types
>>> e.my_data = types.SimpleNamespace()
>>> e.my_data.info1= 1
>>> e.my_data.info2= 2
>>> print(e)
  name: my_experiment
  description
  author: John Doe
  address: john.doe@no-log.org
  version: 0.1
  status:
    run_id: ...
    verbose: 0
  selector: []
  parameter
  metric
  path:
    code_raw: ...
    code: ...
    archive_raw:
    archive:
    export_raw: export
    export: export
    processing_raw: /tmp
    processing: /tmp
  host: []
  specific_info: stuff
  my_data:
    info1: 1
    info2: 2
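
The nested layout printed above can be imitated outside doce with a small recursive formatter. The following is only a sketch and not doce's implementation: describe is a hypothetical helper, and a plain types.SimpleNamespace stands in for the Experiment object.

```python
import types


def describe(namespace, indent=0):
    """Return one line per member, indenting nested namespaces by two spaces."""
    lines = []
    pad = ' ' * indent
    for key, value in vars(namespace).items():
        if isinstance(value, types.SimpleNamespace):
            # A nested namespace gets a header line, then its own members.
            lines.append(f'{pad}{key}:')
            lines.extend(describe(value, indent + 2))
        else:
            lines.append(f'{pad}{key}: {value}')
    return lines


# Stand-in for an experiment complemented with user-defined members.
e = types.SimpleNamespace(name='my_experiment')
e.specific_info = 'stuff'
e.my_data = types.SimpleNamespace(info1=1, info2=2)

description = '\n'.join(describe(e))
print(description)
```

Members are listed in the order in which they were added, which is why user-defined members appear after the built-in ones in the output above.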

Methods

`add_setting_group`(file_id, setting[, ...])	Add a group to the root of a valid PyTables object in order to store the metrics corresponding to the specified setting.
`clean_data_sink`(path[, selector, reverse, ...])	Perform a cleaning of a data sink (directory or h5 file).
`get_output`([output, selector, path, tag, plan])	Get the output vector from an .npy or a group of a .h5 file.
`perform`(selector[, function, nb_jobs, ...])	Operate the function with parameters on the settings set generated using selector.
`send_mail`([title, body])	Send an email to the email address given in experiment.address.
`set_path`(name, path[, force])	Create the directories described in experiment.path that are not reachable.

add_plan
default
get_current_plan
plans
select
set_metric
skip_setting

__str__(style='str')

Provide a textual description of the experiment.

List all members of the class and their values.

Parameters:

style: str

If ‘str’, return the description as a string.

If ‘html’, return the description in HTML format.

Returns:

description: str

If style == ‘str’: a newline-separated enumeration of the members of the Experiment class.

If style == ‘html’: an HTML version of the description.

Examples

>>> import doce
>>> print(doce.Experiment())
name
description
author: no name
address: noname@noorg.org
version: 0.1
status:
  run_id: ...
  verbose: 0
selector: []
parameter
metric
path:
  code_raw: ...
  code: ...
  archive_raw:
  archive:
  export_raw: export
  export: export
host: []

>>> import doce
>>> doce.Experiment().__str__(style='html')
    '<div>name</div><div>description</div><div>author: no name</div><div>address: noname@noorg.org</div><div>version: 0.1</div><div>status:</div><div>  run_id: ...</div><div>  verbose: 0</div><div>selector: []</div><div>parameter</div><div>metric</div><div>path:</div><div>  code_raw: ...</div><div>  code: ...</div><div>  archive_raw: </div><div>  archive: </div><div>  export_raw: export</div><div>  export: export</div><div>host: []</div><div></div>'
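
Judging from the example output above, the HTML variant appears to wrap each line of the plain-text description in a div element. The sketch below reproduces that layout under this assumption; to_html is a hypothetical helper, not part of doce, and the run_id value is made up for illustration.

```python
def to_html(description):
    """Wrap each line of a plain-text description in a <div> element."""
    return ''.join(f'<div>{line}</div>' for line in description.split('\n'))


# A shortened, made-up description ending with a newline.
text = 'name\nauthor: no name\nstatus:\n  run_id: 1\n'
html = to_html(text)
print(html)
```

The trailing newline of the description yields the final empty div, as in the example above.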

add_setting_group(file_id, setting, output_dimension=None, setting_encoding=None)

Add a group to the root of a valid PyTables object in order to store the metrics corresponding to the specified setting.

The encoding of the setting is used to set the name of the group. For each metric, a floating-point PyTables array is created. For any metric, if no dimension is provided in the output_dimension dict, an expandable array is instantiated. If a dimension is available, a static-size array is instantiated.

Parameters:

file_id: PyTables file object
a valid PyTables file object, pointing to an .h5 file opened with write permission.
setting: doce.Plan
an instantiated Plan object describing a setting.
output_dimension: dict
for metrics for which the dimensionality of the storage vector is known,
each key of the dict is a valid metric name and each corresponding value
is the size of the storage vector.
setting_encoding: dict
Encoding of the setting. See doce.Plan.id for references.

Returns:

setting_group: a PyTables Group where the metrics corresponding to the specified setting are stored.

Examples

>>> import doce
>>> import numpy as np
>>> import tables as tb

>>> experiment = doce.experiment.Experiment()
>>> experiment.name = 'example'
>>> experiment.set_path('output', '/tmp/'+experiment.name+'.h5')
>>> experiment.add_plan('plan', f1 = [1, 2], f2 = [1, 2, 3])
>>> experiment.set_metric(name = 'm1_mean', output = 'm1', func = np.mean)
>>> experiment.set_metric(name = 'm1_std', output = 'm1', func = np.std)
>>> experiment.set_metric(name = 'm2_min', output = 'm2', func = np.min)
>>> experiment.set_metric(name = 'm2_argmin', output = 'm2', func = np.argmin)

>>> def process(setting, experiment):
...  h5 = tb.open_file(experiment.path.output, mode='a')
...  sg = experiment.add_setting_group(h5, setting, output_dimension = {'m1':100})
...  sg.m1[:] = setting.f1+setting.f2+np.random.randn(100)
...  sg.m2.append(setting.f1*setting.f2*np.random.randn(100))
...  h5.close()
>>> nb_failed = experiment.perform([], process, progress='')

>>> h5 = tb.open_file(experiment.path.output, mode='r')
>>> print(h5)
/tmp/example.h5 (File) ''
Last modif.: '...'
Object Tree:
/ (RootGroup) ''
/f1=1+f2=1 (Group) 'f1=1+f2=1'
/f1=1+f2=1/m1 (Array(100,)) 'm1'
/f1=1+f2=1/m2 (EArray(100,)) 'm2'
/f1=1+f2=2 (Group) 'f1=1+f2=2'
/f1=1+f2=2/m1 (Array(100,)) 'm1'
/f1=1+f2=2/m2 (EArray(100,)) 'm2'
/f1=1+f2=3 (Group) 'f1=1+f2=3'
/f1=1+f2=3/m1 (Array(100,)) 'm1'
/f1=1+f2=3/m2 (EArray(100,)) 'm2'
/f1=2+f2=1 (Group) 'f1=2+f2=1'
/f1=2+f2=1/m1 (Array(100,)) 'm1'
/f1=2+f2=1/m2 (EArray(100,)) 'm2'
/f1=2+f2=2 (Group) 'f1=2+f2=2'
/f1=2+f2=2/m1 (Array(100,)) 'm1'
/f1=2+f2=2/m2 (EArray(100,)) 'm2'
/f1=2+f2=3 (Group) 'f1=2+f2=3'
/f1=2+f2=3/m1 (Array(100,)) 'm1'
/f1=2+f2=3/m2 (EArray(100,)) 'm2'

>>> h5.close()
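
The group names in the object tree above follow the pattern factor=modality, joined with +, with one group per combination of modalities. As a rough illustration of how such names can be enumerated (this is not doce's code; setting_identifiers is a hypothetical helper and the plan is represented as a plain dict):

```python
import itertools


def setting_identifiers(plan):
    """Yield one 'factor=modality+...' name per setting, last factor varying fastest."""
    names = list(plan)
    for modalities in itertools.product(*plan.values()):
        yield '+'.join(f'{name}={modality}'
                       for name, modality in zip(names, modalities))


# Same factors and modalities as in the example above.
plan = {'f1': [1, 2], 'f2': [1, 2, 3]}
identifiers = list(setting_identifiers(plan))
print(identifiers)
```

With two modalities for f1 and three for f2, six identifiers are produced, in the same order as the groups listed above.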

clean_data_sink(path, selector=None, reverse=False, force=False, keep=False, wildcard='*', setting_encoding=None, archive_path=None, verbose=0)

Perform a cleaning of a data sink (directory or h5 file).

This method is essentially a wrapper to doce._plan.clean_data_sink().

Parameters:

path: str

If it contains a / or \, a valid path to a directory or .h5 file.

If it contains no / or \, a member of the namespace self.path.

selector: a list of literals or a list of lists of literals (optional)

Selector used to specify the settings set.

reverse: bool (optional)

If False, remove any entry corresponding to the setting set (default).

If True, remove all entries except the ones corresponding to the setting set.

force: bool (optional)

If False, prompt the user before modifying the data sink (default).

If True, do not prompt the user before modifying the data sink.

wildcard: str (optional)

End of the wildcard used to select the entries to remove or to keep (default: ‘*’).

setting_encoding: dict (optional)

Format of the identifier describing the setting. Please refer to doce.Plan.identifier() for further information.

archive_path: str (optional)

If not None, specify an existing directory where the specified data will be moved.

If None, the path doce.Experiment._archive_path is used (default).

See also

doce._plan.clean_data_sink, doce.Plan.id

Examples

>>> import doce
>>> import numpy as np
>>> import os
>>> e=doce.Experiment()
>>> e.set_path('output', '/tmp/test', force=True)
>>> e.add_plan('plan', factor1=[1, 3], factor2=[2, 4])
>>> def my_function(setting, experiment):
...   np.save(f'{experiment.path.output}{setting.identifier()}_sum.npy', setting.factor1+setting.factor2)
...   np.save(f'{experiment.path.output}{setting.identifier()}_mult.npy', setting.factor1*setting.factor2)
>>> nb_failed = e.perform([], my_function, progress='')
>>> os.listdir(e.path.output)
['factor1=1+factor2=4_mult.npy', 'factor1=1+factor2=4_sum.npy', 'factor1=3+factor2=4_sum.npy', 'factor1=1+factor2=2_mult.npy', 'factor1=1+factor2=2_sum.npy', 'factor1=3+factor2=2_mult.npy', 'factor1=3+factor2=4_mult.npy', 'factor1=3+factor2=2_sum.npy']

>>> e.clean_data_sink('output', [0], force=True)
>>> os.listdir(e.path.output)
['factor1=3+factor2=4_sum.npy', 'factor1=3+factor2=2_mult.npy', 'factor1=3+factor2=4_mult.npy', 'factor1=3+factor2=2_sum.npy']

>>> e.clean_data_sink('output', [1, 1], force=True, reverse=True, wildcard='*mult*')
>>> os.listdir(e.path.output)
['factor1=3+factor2=4_sum.npy', 'factor1=3+factor2=4_mult.npy', 'factor1=3+factor2=2_sum.npy']

Here, among the files in the directory /tmp/test whose names match the wildcard *mult*, we remove those that do not correspond to the setting with both the first and the second factor set to their second modality.
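
The interplay of the setting identifiers, reverse and wildcard can be sketched on a plain list of file names. This is an interpretation inferred from the two calls above, not doce's actual implementation; entries_to_remove is a hypothetical helper and the identifiers are passed explicitly instead of being derived from a selector.

```python
from fnmatch import fnmatchcase


def entries_to_remove(entries, identifiers, reverse=False, wildcard='*'):
    """Return the entries that a cleaning pass would delete."""
    # Entries belonging to one of the selected settings and matching the wildcard.
    selected = [entry for entry in entries
                if any(fnmatchcase(entry, identifier + wildcard)
                       for identifier in identifiers)]
    if not reverse:
        return selected
    # Reverse mode: delete the entries matching the wildcard that were NOT selected.
    return [entry for entry in entries
            if fnmatchcase(entry, wildcard) and entry not in selected]


entries = [f'factor1={a}+factor2={b}_{kind}.npy'
           for a in (1, 3) for b in (2, 4) for kind in ('mult', 'sum')]

# Selector [0]: every setting with factor1 set to its first modality.
step1 = entries_to_remove(entries, ['factor1=1+factor2=2', 'factor1=1+factor2=4'])
remaining = [entry for entry in entries if entry not in step1]

# Selector [1, 1] with reverse=True and wildcard='*mult*'.
step2 = entries_to_remove(remaining, ['factor1=3+factor2=4'],
                          reverse=True, wildcard='*mult*')
print(step2)
```

Under these assumptions the second pass deletes only factor1=3+factor2=2_mult.npy, which agrees with the directory listing above.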

>>> import doce
>>> import tables as tb
>>> e=doce.Experiment()
>>> e.set_path('output', '/tmp/test.h5')
>>> e.add_plan('plan', factor1=[1, 3], factor2=[2, 4])
>>> e.set_metric(name = 'sum')
>>> e.set_metric(name = 'mult')
>>> def my_function(setting, experiment):
...   h5 = tb.open_file(experiment.path.output, mode='a')
...   sg = experiment.add_setting_group(
...     h5, setting,
...     output_dimension={'sum': 1, 'mult': 1})
...   sg.sum[0] = setting.factor1+setting.factor2
...   sg.mult[0] = setting.factor1*setting.factor2
...   h5.close()
>>> nb_failed = e.perform([], my_function, progress='')
>>> h5 = tb.open_file(e.path.output, mode='r')
>>> print(h5)
/tmp/test.h5 (File) ''
Last modif.: '...'
Object Tree:
/ (RootGroup) ''
/factor1=1+factor2=2 (Group) 'factor1=1+factor2=2'
/factor1=1+factor2=2/mult (Array(1,)) 'mult'
/factor1=1+factor2=2/sum (Array(1,)) 'sum'
/factor1=1+factor2=4 (Group) 'factor1=1+factor2=4'
/factor1=1+factor2=4/mult (Array(1,)) 'mult'
/factor1=1+factor2=4/sum (Array(1,)) 'sum'
/factor1=3+factor2=2 (Group) 'factor1=3+factor2=2'
/factor1=3+factor2=2/mult (Array(1,)) 'mult'
/factor1=3+factor2=2/sum (Array(1,)) 'sum'
/factor1=3+factor2=4 (Group) 'factor1=3+factor2=4'
/factor1=3+factor2=4/mult (Array(1,)) 'mult'
/factor1=3+factor2=4/sum (Array(1,)) 'sum'
>>> h5.close()

>>> e.clean_data_sink('output', [0], force=True)
>>> h5 = tb.open_file(e.path.output, mode='r')
>>> print(h5)
/tmp/test.h5 (File) ''
Last modif.: '...'
Object Tree:
/ (RootGroup) ''
/factor1=3+factor2=2 (Group) 'factor1=3+factor2=2'
/factor1=3+factor2=2/mult (Array(1,)) 'mult'
/factor1=3+factor2=2/sum (Array(1,)) 'sum'
/factor1=3+factor2=4 (Group) 'factor1=3+factor2=4'
/factor1=3+factor2=4/mult (Array(1,)) 'mult'
/factor1=3+factor2=4/sum (Array(1,)) 'sum'
>>> h5.close()

>>> e.clean_data_sink('output', [1, 1], force=True, reverse=True, wildcard='*mult*')
>>> h5 = tb.open_file(e.path.output, mode='r')
>>> print(h5)
/tmp/test.h5 (File) ''
Last modif.: '...'
Object Tree:
/ (RootGroup) ''
/factor1=3+factor2=4 (Group) 'factor1=3+factor2=4'
/factor1=3+factor2=4/mult (Array(1,)) 'mult'
/factor1=3+factor2=4/sum (Array(1,)) 'sum'
>>> h5.close()

Here, the same operations are conducted on an .h5 file.

get_output(output='', selector=None, path='', tag='', plan=None)

Get the output vector from an .npy or a group of a .h5 file.

Get the output vector as a numpy array from an .npy or a group of a .h5 file.

Parameters:

output: str: The name of the output.
selector: list: Settings selector.
path: str: Name of path as defined in the experiment, or a valid path to a directory in the case of .npy storage, or a valid path to an .h5 file in the case of hdf5 storage.
plan: str: Name of plan to be considered.

Returns:

setting_metric: list of np.Array: stores for each valid setting an np.Array with the values of the metric selected.
setting_description: list of list of str: stores, for each valid setting, a compact description of the modalities of each factor. The factors with the same modality across the whole set of settings are stored in constant_setting_description.
constant_setting_description: str: compact description of the factors with the same modality across the whole set of settings.

Examples

>>> import doce
>>> import numpy as np
>>> import pandas as pd

>>> experiment = doce.experiment.Experiment()
>>> experiment.name = 'example'
>>> experiment.set_path('output', f'/tmp/{experiment.name}/', force=True)
>>> experiment.add_plan('plan', f1 = [1, 2], f2 = [1, 2, 3])
>>> experiment.set_metric(name = 'm1_mean', output = 'm1', func = np.mean)
>>> experiment.set_metric(name = 'm1_std', output = 'm1', func = np.std)
>>> experiment.set_metric(name = 'm2_min', output = 'm2', func = np.min)
>>> experiment.set_metric(name = 'm2_argmin', output = 'm2', func = np.argmin)

>>> def process(setting, experiment):
...  output1 = setting.f1+setting.f2+np.random.randn(100)
...  output2 = setting.f1*setting.f2*np.random.randn(100)
...  np.save(f'{experiment.path.output+setting.identifier()}_m1.npy', output1)
...  np.save(f'{experiment.path.output+setting.identifier()}_m2.npy', output2)
>>> nb_failed = experiment.perform([], process, progress='')

>>> (setting_output,
...  setting_description,
...  constant_setting_description
... ) = experiment.get_output(output = 'm1', selector = [1], path='output')
>>> print(constant_setting_description)
f1=2
>>> print(setting_description)
['f2=1', 'f2=2', 'f2=3']
>>> print(len(setting_output))
3
>>> print(setting_output[0].shape)
(100,)
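
The split between setting_description and constant_setting_description can be illustrated without doce. The sketch below is an assumption-laden reconstruction: select and describe are hypothetical helpers, a selector shorter than the number of factors is taken to leave the remaining factors unconstrained (as the examples suggest), and the space used to join several factors in one description is a guess.

```python
import itertools


def select(plan, selector):
    """Expand a selector into a list of settings, each a dict factor -> modality."""
    choices = []
    for position, (name, modalities) in enumerate(plan.items()):
        if position < len(selector):
            # The selector gives the index of the modality to keep.
            choices.append([(name, modalities[selector[position]])])
        else:
            # Factors beyond the selector keep all their modalities.
            choices.append([(name, modality) for modality in modalities])
    return [dict(combination) for combination in itertools.product(*choices)]


def describe(settings):
    """Separate the factors that vary across settings from the constant ones."""
    names = list(settings[0])
    constant = [n for n in names if len({s[n] for s in settings}) == 1]
    varying = [n for n in names if n not in constant]
    constant_description = ' '.join(f'{n}={settings[0][n]}' for n in constant)
    descriptions = [' '.join(f'{n}={s[n]}' for n in varying) for s in settings]
    return descriptions, constant_description


plan = {'f1': [1, 2], 'f2': [1, 2, 3]}
settings = select(plan, [1])
setting_description, constant_setting_description = describe(settings)
print(constant_setting_description)
print(setting_description)
```

For the selector [1], f1 is fixed to its second modality and therefore ends up in the constant description, while f2 varies across the three settings.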

perform(selector, function=None, *parameters, nb_jobs=1, progress='d', log_file_name='', mail_interval=0, tag='')

Operate the function with parameters on the settings set generated using selector.

Operate a given function on the setting set generated using selector. The setting set can be browsed in parallel by setting nb_jobs>1. If log_file_name is not empty, a faulty setting does not stop the execution: the error is logged and the next setting is executed. Depending on the content of progress, the selector or a textual description of the current setting is displayed as the setting set is browsed.

This function is essentially a wrapper to the function doce.Plan.do().

Parameters:

selector: a list of literals or a list of lists of literals

Selector used to specify the settings set.

function: function(Plan, Experiment, *parameters) (optional)

A function that operates on a given setting within the experiment environment with optional parameters.

If None, a description of the given setting is shown.

*parameters: any type (optional)

Parameters to be given to the function.

nb_jobs: int > 0 (optional)

Number of jobs.

If nb_jobs = 1, the setting set is browsed sequentially in a depth-first traversal of the settings tree (default).

If nb_jobs > 1, the settings set is browsed randomly, and settings are distributed over the different processes.

progress: str (optional)

Display progress of scheduling the setting set.

If the string contains an m, show the selector of the current setting. If it contains a d, show a textual description of the current setting (default).

log_file_name: str (optional)

Path to a file where potential errors will be logged.

If empty, the execution is stopped on the first faulty setting (default).

If not empty, the execution is not stopped on a faulty setting, and the error is logged in the log_file_name file.

mail_interval: float (optional)

Interval for sending email about the status of the run.

If 0, no email is sent (default).

If > 0, an email is sent as soon as a setting is done and the difference between the current time and the time the last mail was sent is larger than mail_interval.

tag: str (optional)

Specify a tag to be added to the output names.

See also

doce.Plan.do

Examples

>>> import time
>>> import random
>>> import doce

>>> e=doce.Experiment()
>>> e.add_plan('plan', factor1=[1, 3], factor2=[2, 5])

>>> # this function displays the sum of the two modalities of the current setting
>>> def my_function(setting, experiment):
...  print(f'{setting.factor1}+{setting.factor2}={setting.factor1+setting.factor2}')

>>> # sequential execution of settings
>>> nb_failed = e.perform([], my_function, nb_jobs=1, progress='')
1+2=3
1+5=6
3+2=5
3+5=8
>>> # arbitrary order execution of settings due to the parallelization
>>> nb_failed = e.perform([], my_function, nb_jobs=3, progress='') 
3+2=5
1+5=6
1+2=3
3+5=8
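
The effect of log_file_name on faulty settings can be sketched with a plain sequential loop. This is a simplified illustration rather than doce's implementation: run_settings is a hypothetical helper, settings are plain dicts, and the format of the log entries is invented.

```python
import os
import tempfile
import traceback


def run_settings(settings, function, log_file_name=''):
    """Apply function to each setting and return the number of failed settings."""
    nb_failed = 0
    for setting in settings:
        try:
            function(setting)
        except Exception:
            if not log_file_name:
                # No log file: stop on the first faulty setting.
                raise
            nb_failed += 1
            # Log the error and carry on with the next setting.
            with open(log_file_name, 'a') as log:
                log.write(f'setting {setting} failed\n{traceback.format_exc()}\n')
    return nb_failed


def inverse(setting):
    return 1 / setting['factor1']


# The second setting triggers a ZeroDivisionError.
settings = [{'factor1': 1}, {'factor1': 0}, {'factor1': 3}]
log_path = os.path.join(tempfile.mkdtemp(), 'errors.log')
nb_failed = run_settings(settings, inverse, log_file_name=log_path)
print(nb_failed)
```

Calling run_settings(settings, inverse) without a log file would instead let the ZeroDivisionError propagate and stop the run at the second setting.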

send_mail(title='', body='')

Send an email to the email address given in experiment.address.

Send an email to the experiment.address email address using the Gmail SMTP service. For privacy, please consider using a dedicated Gmail account by setting experiment._gmail_id and experiment._gmail_app_password. For this, you will need to create a Gmail account, enable 2-Step Verification, and allow connections with an app password.

See https://support.google.com/accounts/answer/185833?hl=en for reference.

Parameters:

title: str: the title of the email in plain text format
body: str: the body of the email in html format

Examples

>>> import doce
>>> e=doce.Experiment()
>>> e.address = 'john.doe@no-log.org'
>>> e.send_mail('hello', '<div> good day </div>')
Sent message entitled: [doce]  id ... hello ...
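
For readers unfamiliar with app passwords, the following sketch shows how an HTML email can be sent through Gmail with the Python standard library. It is not doce's code: build_message and send are hypothetical helpers, the sender address is a placeholder, and the use of port 465 over SSL is one common configuration assumed here. The send function is defined but not called, since it needs real credentials and network access.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, title, body):
    """Build an email with a plain-text title and an HTML body."""
    message = EmailMessage()
    message['From'] = sender
    message['To'] = recipient
    message['Subject'] = title
    message.set_content(body, subtype='html')
    return message


def send(message, gmail_id, gmail_app_password):
    """Send the message through Gmail, authenticating with an app password."""
    with smtplib.SMTP_SSL('smtp.gmail.com', 465) as server:
        server.login(gmail_id, gmail_app_password)
        server.send_message(message)


# Placeholder sender; the recipient is the address used in the example above.
message = build_message('sender@example.com', 'john.doe@no-log.org',
                        'hello', '<div> good day </div>')
print(message['Subject'], message.get_content_type())
```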

set_path(name, path, force=False)

Create the directories described in experiment.path that are not reachable.

For each path set in experiment.path, create the directory if not reachable. The user may be prompted before creation.

Parameters:

force: bool

If True, do not prompt the user before creating the missing directories.

If False, prompt the user before creation of each missing directory (default).

Examples

>>> import doce
>>> import os
>>> e=doce.Experiment()
>>> e.name = 'experiment'
>>> e.set_path('processing', f'/tmp/{e.name}/processing', force=True)
>>> e.set_path('output', f'/tmp/{e.name}/output', force=True)
>>> os.listdir(f'/tmp/{e.name}')
['processing', 'output']
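
The create-if-missing behaviour controlled by force can be approximated as follows. This is only a sketch under assumptions: ensure_directory is a hypothetical helper, the wording of the prompt is invented, and a temporary directory replaces /tmp so that the snippet leaves no files behind in a fixed location.

```python
import os
import tempfile


def ensure_directory(path, force=False):
    """Create path if it is missing; return True if it exists afterwards."""
    if os.path.isdir(path):
        return True
    if not force:
        # Ask before creating anything, as described for force=False.
        answer = input(f'{path} does not exist. Create it? [y/N] ')
        if answer.strip().lower() != 'y':
            return False
    os.makedirs(path, exist_ok=True)
    return True


base = tempfile.mkdtemp()
target = os.path.join(base, 'experiment', 'processing')
created = ensure_directory(target, force=True)
print(created, os.path.isdir(target))
```

With force=True no prompt is shown, which is the convenient choice in scripts and doctests that run unattended.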

class doce.experiment.Path: handles the storage of paths to disk

doce.experiment.get_from_path(metric, settings=None, path='', tag='', setting_encoding=None, verbose=False)

Get the metric vector from an .npy or a group of a .h5 file.

Get the metric vector as a numpy array from an .npy or a group of a .h5 file.

Parameters:

metric: str

The name of the metric. Must be a member of the doce.metric.Metric object.

settings: doce.Plan

Iterable settings.

path: str

In the case of .npy storage, a valid path to the main directory. In the case of .h5 storage, a valid path to an .h5 file.

setting_encoding: dict

Encoding of the setting. See doce.Plan.id for references.

verbose: bool

In the case of .npy metric storage, if verbose is set to True, print the file name sought for the metric.

In the case of .h5 metric storage, if verbose is set to True, print the group sought for the metric.

Returns:

setting_metric: list of np.Array: stores for each valid setting an np.Array with the values of the metric selected.
setting_description: list of list of str: stores, for each valid setting, a compact description of the modalities of each factor. The factors with the same modality across the whole set of settings are stored in constant_setting_description.
constant_setting_description: str: compact description of the factors with the same modality across the whole set of settings.

Examples

>>> import doce
>>> import numpy as np
>>> import pandas as pd

>>> experiment = doce.experiment.Experiment()
>>> experiment.name = 'example'
>>> experiment.set_path('output', f'/tmp/{experiment.name}/', force=True)
>>> experiment.add_plan('plan', f1 = [1, 2], f2 = [1, 2, 3])
>>> experiment.set_metric(name = 'm1_mean', output = 'm1', func = np.mean)
>>> experiment.set_metric(name = 'm1_std', output = 'm1', func = np.std)
>>> experiment.set_metric(name = 'm2_min', output = 'm2', func = np.min)
>>> experiment.set_metric(name = 'm2_argmin', output = 'm2', func = np.argmin)
>>> def process(setting, experiment):
...  metric1 = setting.f1+setting.f2+np.random.randn(100)
...  metric2 = setting.f1*setting.f2*np.random.randn(100)
...  np.save(f'{experiment.path.output}{setting.identifier()}_m1.npy', metric1)
...  np.save(f'{experiment.path.output}{setting.identifier()}_m2.npy', metric2)
>>> nb_failed = experiment.perform([], process, progress='')

>>> (setting_metric,
...  setting_description,
...  constant_setting_description) = doce.experiment.get_from_path(
...      'm1',
...      experiment._plan.select([1]),
...      experiment.path.output)
>>> print(constant_setting_description)
f1=2
>>> print(setting_description)
['f2=1', 'f2=2', 'f2=3']
>>> print(len(setting_metric))
3
>>> print(setting_metric[0].shape)
(100,)