# Imputation missing values

For various reasons, many real-world datasets contain missing values, often encoded as blanks, NaNs or other placeholders.
To handle them, you can use the `Imputer` class, which replaces each missing value with one computed from the known values.

## Constructor Parameters

* $missingValue (mixed) - the placeholder value that will be replaced (default null)
* $strategy (Strategy) - imputation strategy (ready to use: MeanStrategy, MedianStrategy, MostFrequentStrategy)
* $axis (int) - axis along which the strategy is applied, Imputer::AXIS_COLUMN or Imputer::AXIS_ROW
* $samples (array) - array of samples to train on (optional; an alternative to calling `fit`)

```
$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN);
$imputer = new Imputer(null, new MedianStrategy(), Imputer::AXIS_ROW);
```
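
To make the effect of `$axis` concrete, here is a minimal plain-PHP sketch that does not use php-ml at all. The helpers `valuesAlongAxis` and `mean` are hypothetical names introduced only for illustration, and the sketch assumes that missing values are encoded as `null`. It shows which known values would feed the mean for a single missing cell under each axis.

```php
<?php
// Hypothetical helper, not part of php-ml: collects the known values that
// share an axis with the cell at ($row, $col).
function valuesAlongAxis(array $samples, int $row, int $col, bool $byColumn): array
{
    // Column axis: the same feature across all samples.
    // Row axis: all features of the same sample.
    $axis = $byColumn ? array_column($samples, $col) : $samples[$row];

    // Drop missing entries (assumed to be null) and reindex.
    return array_values(array_filter($axis, static function ($value) {
        return $value !== null;
    }));
}

// Hypothetical helper: arithmetic mean of a non-empty list.
function mean(array $values): float
{
    return array_sum($values) / count($values);
}

$data = [
    [1, null, 3, 4],
    [4, 3, 2, 1],
    [null, 6, 7, 8],
    [8, 7, null, 5],
];

// The missing cell at row 0, column 1:
$columnMean = mean(valuesAlongAxis($data, 0, 1, true));  // averages 3, 6, 7
$rowMean    = mean(valuesAlongAxis($data, 0, 1, false)); // averages 1, 3, 4
```

In this sketch the column axis fills the cell from other samples' values for the same feature, while the row axis fills it from the other features of the same sample, so the two choices can give quite different replacements.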

## Strategy

* MeanStrategy - replace missing values using the mean along the axis
* MedianStrategy - replace missing values using the median along the axis
* MostFrequentStrategy - replace missing values using the most frequent value along the axis
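
The three statistics named above can be sketched in plain PHP as follows. This is an illustration of the general definitions, not the library's implementation: `meanOf`, `medianOf` and `mostFrequentOf` are hypothetical names, the input is assumed to be a non-empty list of known (non-missing) values from one axis, and the handling of ties and even-length lists is an assumption of this sketch.

```php
<?php
// Hypothetical helpers illustrating the three statistics; not php-ml code.

function meanOf(array $values): float
{
    return array_sum($values) / count($values);
}

function medianOf(array $values): float
{
    sort($values);
    $count = count($values);
    $middle = intdiv($count, 2);

    // Even count: average the two central values; odd count: take the central one.
    return $count % 2 === 0
        ? ($values[$middle - 1] + $values[$middle]) / 2
        : (float) $values[$middle];
}

function mostFrequentOf(array $values)
{
    // array_count_values() only accepts integers and strings.
    $counts = array_count_values($values);
    arsort($counts); // highest count first; the order among ties is not specified here

    return array_key_first($counts);
}

$known = [3, 6, 7, 6];

$mean   = meanOf($known);         // (3 + 6 + 7 + 6) / 4
$median = medianOf($known);       // sorted: 3, 6, 6, 7
$mode   = mostFrequentOf($known); // 6 appears twice
```

The mean is sensitive to outliers, the median is more robust to them, and the most frequent value is the natural choice for categorical features encoded as integers or strings.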

## Example of use

```
use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Imputer\Strategy\MeanStrategy;

$data = [
    [1, null, 3, 4],
    [4, 3, 2, 1],
    [null, 6, 7, 8],
    [8, 7, null, 5],
];

$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN);
$imputer->fit($data);
$imputer->transform($data);

/*
$data = [
    [1, 5.33, 3, 4],
    [4, 3, 2, 1],
    [4.33, 6, 7, 8],
    [8, 7, 4, 5],
];
*/

```
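
The numbers in the commented result can be checked by hand with a small plain-PHP sketch of column-wise mean imputation. The function `imputeColumnMeans` is a hypothetical stand-in written for this page, not php-ml code. It assumes a rectangular array in which `null` marks a missing value and every column has at least one known value.

```php
<?php
// Hypothetical stand-in for fit() + transform() with a column-wise mean; not php-ml code.
function imputeColumnMeans(array $samples): array
{
    $columnCount = count($samples[0]);

    for ($col = 0; $col < $columnCount; $col++) {
        // "Fit" step: mean of the known values in this column.
        $known = array_filter(array_column($samples, $col), static function ($value) {
            return $value !== null;
        });
        $mean = array_sum($known) / count($known);

        // "Transform" step: fill every missing cell in the column with that mean.
        foreach ($samples as $row => $sample) {
            if ($sample[$col] === null) {
                $samples[$row][$col] = $mean;
            }
        }
    }

    return $samples;
}

$data = [
    [1, null, 3, 4],
    [4, 3, 2, 1],
    [null, 6, 7, 8],
    [8, 7, null, 5],
];

$filled = imputeColumnMeans($data);

// Round only for display, to compare with the two-decimal values shown above.
$roundedCell = round($filled[0][1], 2); // mean of 3, 6, 7
```

In this sketch the filled cells hold unrounded means such as 16 / 3, so they match the two-decimal values in the comment only after rounding.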

You can also pass the training samples through the `$samples` constructor parameter instead of calling the `fit` method:

```
use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Imputer\Strategy\MeanStrategy;

$data = [
    [1, null, 3, 4],
    [4, 3, 2, 1],
    [null, 6, 7, 8],
    [8, 7, null, 5],
];

$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN, $data);
$imputer->transform($data);
```