Recursive feature elimination.
A FeaturewiseMeasure computes sensitivity maps for a given dataset, and these maps are in turn used to discard unimportant features. After each feature selection step, the transfer error on a test dataset is computed. The procedure is repeated until a given StoppingCriterion is reached.
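The elimination loop described above can be sketched without PyMVPA. In this minimal, hypothetical version (binary labels 0/1 assumed), the absolute difference of class means per feature stands in for the FeaturewiseMeasure, a nearest-class-mean classifier provides the transfer error, a single least-sensitive feature is discarded per step, and the `stop` callable plays the role of the StoppingCriterion. All function names and the toy data are illustrative assumptions; the real RFE receives these components as constructor parameters.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def subset(X: Matrix, idx: Sequence[int]) -> Matrix:
    """Keep only the feature columns listed in idx."""
    return [[row[j] for j in idx] for row in X]


def mean_diff_sensitivity(X: Matrix, y: Sequence[int]) -> List[float]:
    """Stand-in FeaturewiseMeasure: |mean(class 1) - mean(class 0)| per feature."""
    sens = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        sens.append(abs(sum(b) / len(b) - sum(a) / len(a)))
    return sens


def nearest_mean_error(Xtr: Matrix, ytr: Sequence[int],
                       Xte: Matrix, yte: Sequence[int]) -> float:
    """Stand-in transfer error: nearest-class-mean classifier, fraction wrong."""
    labels = sorted(set(ytr))
    means = {}
    for lab in labels:
        rows = [r for r, l in zip(Xtr, ytr) if l == lab]
        means[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    wrong = 0
    for row, lab in zip(Xte, yte):
        pred = min(labels, key=lambda c: sum((v - m) ** 2
                                             for v, m in zip(row, means[c])))
        wrong += pred != lab
    return wrong / len(yte)


def rfe(Xtr: Matrix, ytr: Sequence[int], Xte: Matrix, yte: Sequence[int],
        nfeatures_min: int = 1,
        stop: Callable[[List[float]], bool] = lambda errors: False,
        ) -> Tuple[List[int], float, List[Tuple[float, List[int]]]]:
    selected = list(range(len(Xtr[0])))
    history: List[Tuple[float, List[int]]] = []
    while True:
        # transfer error of the current feature set on the test data
        err = nearest_mean_error(subset(Xtr, selected), ytr,
                                 subset(Xte, selected), yte)
        history.append((err, list(selected)))
        if len(selected) <= nfeatures_min or stop([e for e, _ in history]):
            break
        # recompute sensitivities on the remaining features (training data only)
        sens = mean_diff_sensitivity(subset(Xtr, selected), ytr)
        # discard the least important feature
        del selected[min(range(len(selected)), key=lambda k: sens[k])]
    # report the feature set with the lowest error (earliest step wins ties)
    best_err, best_idx = min(history, key=lambda h: h[0])
    return best_idx, best_err, history


# toy data: feature 0 separates the classes, features 1 and 2 are noise
Xtr = [[0.0, 0.3, 0.9], [0.2, 0.7, 0.1], [1.0, 0.4, 0.2], [1.2, 0.8, 1.4]]
ytr = [0, 0, 1, 1]
Xte = [[0.3, 0.6, 1.8], [1.1, 0.2, 0.9]]
yte = [0, 1]
best_idx, best_err, history = rfe(Xtr, ytr, Xte, yte)
print(best_idx, best_err)  # [0] 0.0
```

Here the noise features mislead the classifier on one test sample, so the error drops from 0.5 to 0.0 only once both have been eliminated.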
Notes
Available conditional attributes:
(Conditional attributes enabled by default are suffixed with +.)
Examples
There are multiple ways to design an RFE. Here is one example, which relies on a SplitClassifier both to extract sensitivities and to provide an estimate of performance (error):
>>> # Lazy import
>>> from mvpa2.suite import *
>>> import numpy as np
>>> rfesvm_split = SplitClassifier(LinearCSVMC(), OddEvenPartitioner())
>>> # design an RFE feature selection to be used with a classifier
>>> rfe = RFE(rfesvm_split.get_sensitivity_analyzer(
...     # take the sensitivities from each split: L2-normalize them,
...     # average them across splits, and take absolute values
...     postproc=ChainMapper([FxMapper('features', l2_normed),
...                           FxMapper('samples', np.mean),
...                           FxMapper('samples', np.abs)])),
...     # use the error stored in the confusion matrix of the split classifier
...     ConfusionBasedError(rfesvm_split, confusion_state='stats'),
...     # the error is extracted from that confusion matrix, so there is no
...     # need to split the dataset: Repeater(2) just passes it on twice
...     Repeater(2),
...     # select the best 50% of features on each step
...     fselector=FractionTailSelector(
...         0.50,
...         mode='select', tail='upper'),
...     # and stop once the error has not improved for 10 steps
...     stopping_criterion=NBackHistoryStopCrit(BestDetector(), 10),
...     # no need to train the performance measure: the error is
...     # extracted from the existing confusion matrix
...     train_pmeasure=False,
...     # but do recompute the sensitivities on each step
...     update_sensitivity=True)
>>> clf = FeatureSelectionClassifier(
...     LinearCSVMC(),
...     # trained on the features selected via RFE
...     rfe,
...     # custom description
...     descr='LinSVM+RFE(splits_avg)')
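The example configures FractionTailSelector(0.50, mode='select', tail='upper') and NBackHistoryStopCrit(BestDetector(), 10). The following sketch illustrates the behaviour these two functors are meant to provide: keep the upper half of the features ranked by sensitivity, and stop once the best error lies a given number of steps back. The function names are hypothetical, and the rounding rule and tie handling are assumptions rather than PyMVPA's exact semantics.

```python
import math
from typing import List, Sequence


def fraction_tail_select(values: Sequence[float], fraction: float,
                         tail: str = "upper") -> List[int]:
    """Indices of the given fraction of features from one tail of `values`.

    Assumption: round down, but always keep at least one feature.
    """
    nselect = max(1, math.floor(len(values) * fraction))
    # rank features so that the requested tail comes first
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=(tail == "upper"))
    return sorted(order[:nselect])


def n_back_stop(errors: Sequence[float], steps: int = 10) -> bool:
    """True once the lowest error lies at least `steps` entries back.

    min() returns the first minimum, so a later step that merely ties the
    best error does not count as an improvement (an assumption).
    """
    if not errors:
        return False
    best_pos = min(range(len(errors)), key=lambda i: errors[i])
    return len(errors) - 1 - best_pos >= steps


sens = [0.1, 0.9, 0.4, 0.7]
print(fraction_tail_select(sens, 0.5))                # [1, 3]
print(fraction_tail_select(sens, 0.5, tail="lower"))  # [0, 2]
print(n_back_stop([0.5, 0.4, 0.4, 0.4], steps=2))     # True
print(n_back_stop([0.5, 0.4, 0.3], steps=2))          # False
```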
Note: If you rely on cross-validation for the StoppingCriterion, make sure that you have at least 3 chunks, so that the SplitClassifier has at least 2 chunks to split. Otherwise it cannot split any further (a single chunk cannot be split).
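The chunk requirement follows from nesting two splitting steps. This sketch assumes, for illustration, an outer leave-one-chunk-out cross-validation and an inner odd/even partitioning of the remaining chunks, loosely mirroring the OddEvenPartitioner in the example; the function names are hypothetical. With only two chunks, every outer fold leaves a single chunk, which yields no inner split at all.

```python
from typing import Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (training chunks, testing chunks)


def odd_even_splits(chunks: Sequence[int]) -> List[Split]:
    """Hypothetical stand-in for an odd/even chunk partitioner.

    Chunks at even positions of the sorted unique list form one group, those
    at odd positions the other; each group serves once as the testing part.
    """
    uniq = sorted(set(chunks))
    even, odd = uniq[0::2], uniq[1::2]
    if not even or not odd:
        return []  # a single chunk cannot be split into two parts
    return [(even, odd), (odd, even)]


def inner_splits_per_fold(chunks: Sequence[int]) -> Dict[int, List[Split]]:
    """Leave-one-chunk-out outer loop; inner splits of the remaining chunks."""
    uniq = sorted(set(chunks))
    return {held_out: odd_even_splits([c for c in uniq if c != held_out])
            for held_out in uniq}


print(inner_splits_per_fold([1, 2]))     # {1: [], 2: []}
print(inner_splits_per_fold([1, 2, 3]))  # two inner splits per outer fold
```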
Methods
| Method | Description |
|---|---|
| forward(data) | Map data from input to output space. |
| forward1(data) | Wrapper method to map single samples. |
| generate(ds) | Yield processing results. |
| get_postproc() | Returns the post-processing node or None. |
| get_space() | Query the processing space name of this node. |
| reset() | |
| reverse(data) | Reverse-map data from output back into input space. |
| reverse1(data) | |
| set_postproc(node) | Assigns a post-processing node. |
| set_space(name) | Set the processing space name of this node. |
| train(ds) | The default implementation calls _pretrain(), _train(), and finally _posttrain(). |
| untrain() | Reverts changes in the state of this node caused by previous training. |
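The train() entry describes a template-method design: a fixed public method calls overridable hooks in a set order, and subclasses customize only the hooks. A minimal sketch of that pattern follows; the class names SketchNode and CountingNode are hypothetical, and untrain() here merely resets a flag, so PyMVPA's actual classes may differ in detail.

```python
from typing import Any, List


class SketchNode:
    """Hypothetical node illustrating the train() template (not PyMVPA code).

    train() always runs _pretrain() -> _train() -> _posttrain().
    """

    def __init__(self) -> None:
        self.is_trained = False
        self.calls: List[str] = []

    def train(self, ds: Any) -> None:
        self._pretrain(ds)
        self._train(ds)
        self._posttrain(ds)

    def untrain(self) -> None:
        # revert the state changed by training
        self.calls.append("untrain")
        self.is_trained = False

    def _pretrain(self, ds: Any) -> None:
        self.calls.append("_pretrain")

    def _train(self, ds: Any) -> None:
        # subclasses override this hook with the actual learning step
        raise NotImplementedError

    def _posttrain(self, ds: Any) -> None:
        self.calls.append("_posttrain")
        self.is_trained = True


class CountingNode(SketchNode):
    """Example subclass that only overrides the _train() hook."""

    def _train(self, ds: Any) -> None:
        self.calls.append("_train")
        self.nsamples = len(ds)


node = CountingNode()
node.train([1, 2, 3])
print(node.calls, node.is_trained, node.nsamples)
# ['_pretrain', '_train', '_posttrain'] True 3
node.untrain()
print(node.is_trained)  # False
```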
Initialize recursive feature elimination.
Parameters:
fmeasure : FeaturewiseMeasure
pmeasure : Measure
splitter : Splitter
fselector : Functor
update_sensitivity : bool
nfeatures_min : int
enable_ca : None or list of str
disable_ca : None or list of str
bestdetector : Functor
stopping_criterion : Functor
train_pmeasure : bool
filler : optional
auto_train : bool
force_train : bool
space : str, optional
pass_attr : str, list of str|tuple, optional
postproc : Node instance, optional
descr : str