A fuzzy logic C++ library
This class is intended to hold all the loaded data points for further processing.
#include <data_set.hpp>
Public Member Functions | |
DATA_SET () | |
Constructor. | |
size_t | GetNbPoints () const |
void | Clear () |
Clears the set. | |
void | Print (FILE *f, const char *msg=0, bool PrintRaw=false) const |
Prints data set in file. | |
void | ReadData (DATAFILE_INFO &dfi) |
Reads the whole dataset from the file described by dfi; throws an error on failure. | |
const VALUE_PTR | GetOutValue (size_t sample_idx) const |
Returns the scalar output value of sample sample_idx (starting from 0) | |
void | GetInputValues (size_t sample_idx, std::vector< double > &values) const |
Returns a vector of all the *numerical* input values, for sample sample_idx (starting from 0) | |
size_t | GetOutputIndex () const |
size_t | GetNbInputFields () const |
Returns nb of input fields. | |
size_t | GetNbFields () const |
Returns the number of fields of the dataset. | |
EN_DATA_FIELD_TYPE | GetFieldType (size_t i) const |
Returns field type. | |
void | GetFieldTypeIndexes (EN_DATA_FIELD_TYPE ft, std::vector< size_t > &v) const |
Returns in v the indexes of fields that are of type ft . | |
Getting statistical information about set | |
const DATASET_PROPERTIES & | GetProperties () const |
Returns properties of the dataset. | |
double | GetInMMValue (EN_MinMaxValue mm, size_t i) const |
Returns min/max input value of data set, index is NOT related to original column in file. | |
double | GetOutMMValue (EN_MinMaxValue mm) const |
Returns min/max value of all the output values of the dataset. | |
Retrieving and adding points | |
const DATA_POINT & | GetDataPoint (size_t idx) const |
DATA_POINT & | GetDataPoint (size_t idx) |
void | AddDataPoint (const DATA_POINT &dpt) |
Add data point to the data set. | |
Description related functions | |
void | AssignDescription (const DATA_DESCR &) |
Checks that the description descr is consistent with the dataset. If so, assigns the description to the dataset; otherwise, throws an error. | |
DATA_DESCR | GetDescription () const |
String-data related functions | |
const std::string & | GetStringValue (size_t col_idx, size_t string_index) const |
Returns string value in case of relational string-handling, col_idx is the column index. | |
size_t | GetStringCount (size_t col_idx, size_t string_elem) const |
Returns string counter value in case of relational string-handling, col_idx is the column index. | |
size_t | GetNbClasses (size_t col_idx) const |
Returns the number of different values. | |
size_t | AddStringItem (size_t pointfield_idx, size_t stringfield_idx, const std::string &str_value) |
Adds the string item str_value to the repository of string attributes, and returns the index on it. | |
Producing subsets | |
void | GetSubset (const std::vector< size_t > &v_idx, const INPUT_SETS &inputsets, DATA_SET &subset) const |
Computes a subset of the dataset that contains only the data points that are in the support (fuzzy support) of the input membership functions defined by vector v_idx. | |
void | GetSubset (const std::vector< size_t > &v_idx, const INPUT_SETS &inputsets, double threshold, DATA_SET &subset) const |
Computes a subset of the dataset that contains only the data points that are in the support (fuzzy support) of the input membership functions defined by vector v_idx. | |
void | DivideSet (DATA_SET &subset_A, DATA_SET &subset_B, size_t IntervalIdx, size_t NbIntervals=5) const |
Separates the points of the set into two subsets, subset_A and subset_B, according to IntervalIdx and NbIntervals. | |
Private Member Functions | |
void | p_ComputeProperties () const |
void | p_Init () |
Private Attributes | |
std::vector< std::vector < PAIR_STRING_COUNT > > | _vv_StringData |
Will hold the string items of the dataset. | |
std::vector< int > | _v_StringIndexes |
Holds the indexes of the strings related to the columns. | |
bool | _HasAssignedDescription |
true: means that a description has been assigned to the dataset (i.e. it is not a "generic" description) | |
DATA_DESCR | _data_descr |
std::vector< DATA_POINT > | _v_datapoint |
vector of values | |
bool | _props_are_computed |
DATASET_PROPERTIES | _properties |
This class is intended to hold all the loaded data points for further processing.
It is mainly useful in the learning step for Takagi-Sugeno FIS, where it can be used to get a subset of the points that matches some input interval.
Data is stored as a std::vector of output values, associated with a vector of input values (each of these being itself a vector, of size equal to the number of inputs of the FIS).
It can be filled by reading a file in CSV or ARFF (Weka) format.
Please note that lines of data have a maximum length of BUF_SIZE, defined in helper_functions.hpp.
For string attributes, the values are NOT stored in the DATA_POINT object, but in a separate vector of vectors that holds the string values in a relational way: the data point only holds indexes into this vector.
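The relational storage described above can be illustrated with a minimal, self-contained sketch. It does not use the library: `StringCount`, `StringColumn`, `Intern` and `Value` are hypothetical names, and the assumption that the counter is incremented each time a value is added is ours, not something stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for PAIR_STRING_COUNT: (string value, occurrence count).
using StringCount = std::pair<std::string, std::size_t>;

// Hypothetical repository holding the distinct strings of ONE column.
struct StringColumn {
    std::vector<StringCount> items;

    // Returns the index of `value`, adding it if it is new.
    // Assumption: the counter tracks how many times the value was seen.
    std::size_t Intern(const std::string& value) {
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (items[i].first == value) {
                ++items[i].second;
                return i;
            }
        }
        items.emplace_back(value, 1);
        return items.size() - 1;
    }

    // Resolves an index stored in a data point back to its string.
    const std::string& Value(std::size_t index) const {
        assert(index < items.size());
        return items[index].first;
    }
};
```

A data point would then store only the value returned by `Intern`, and the string is recovered on demand with `Value`.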
See also Dataset handling
Related classes:
slifis::DATA_SET::DATA_SET () [inline]
Constructor.
References p_Init().
size_t slifis::DATA_SET::GetNbPoints () const [inline]
void slifis::DATA_SET::Clear ()
Clears the set.
Referenced by DivideSet(), and GetSubset().
void slifis::DATA_SET::Print (FILE *f, const char *msg=0, bool PrintRaw=false) const
Prints data set in file.
Recall that string fields are stored in a relational way, that is, as indexes. Nevertheless, DATA_POINT::GetValue() returns the value itself (of type DT_NUMERIC or DT_STRING), unless it is called with 'true' as second argument.
References __IN__, __OUT__, slifis::DT_NUMERIC, slifis::DT_STRING, slifis::DT_STRING_INDEX, slifis::ERR_DATA_BAD_TYPE, slifis::DATA_POINT::GetPointId(), slifis::DATA_POINT::GetValue(), and SLIFIS_ERROR_1.
Referenced by main().
void slifis::DATA_SET::ReadData (DATAFILE_INFO &dfi)
Reads the whole dataset from the file described by dfi; throws an error on failure.
If the dfi argument has not been assigned a description, a generic description will be generated for this dataset.
References __IN__, __OUT__, slifis::DATAFILE_INFO::_NbDataPts, slifis::DATAFILE_INFO::CloseFile(), slifis::DATA_DESCR::ComputeIndexesAfterLoading(), slifis::DFT_ARFF, slifis::DFT_CSV, slifis::ERR_IO_ERROR, slifis::DATAFILE_INFO::FileIsGood(), slifis::DATAFILE_INFO::GetDescription(), slifis::DATAFILE_INFO::GetFileType(), slifis::DATAFILE_INFO::GetNbStringFields(), slifis::DATAFILE_INFO::GetTotNbFields(), slifis::DATAFILE_INFO::HasDescription(), slifis::DATAFILE_INFO::IsSet(), slifis::DATAFILE_INFO::OpenFile(), slifis::DATA_POINT::ReadDataFields(), slifis::DATA_DESCR::SetOutputColumn(), SLIFIS_ERROR, SLIFIS_ERROR_1, SLIFIS_ERROR_2, SLIFIS_ERROR_LOG, slifis::ST_DATALINE, and slifis::ST_FAILURE.
Referenced by main().
const DATASET_PROPERTIES & slifis::DATA_SET::GetProperties () const
Returns properties of the dataset.
References __IN__, and __OUT__.
Referenced by main(), and process_numeric().
double slifis::DATA_SET::GetInMMValue (EN_MinMaxValue mm, size_t i) const
Returns min/max input value of data set, index is NOT related to original column in file.
References __IN__, slifis::ERR_DATA_BAD_INDEX, slifis::MM_Max, slifis::MM_Min, and SLIFIS_ERROR_2.
Referenced by main().
double slifis::DATA_SET::GetOutMMValue (EN_MinMaxValue mm) const
Returns min/max value of all the output values of the dataset.
References slifis::MM_Max, and slifis::MM_Min.
const VALUE_PTR slifis::DATA_SET::GetOutValue (size_t sample_idx) const
Returns the scalar output value of sample sample_idx (starting from 0).
The data set must have a description.
References __IN__, __OUT__, slifis::ERR_DATA_BAD_INDEX, slifis::DATA_POINT::GetValue(), SLIFIS_ERROR_2, and VECTOR_ELEM.
Referenced by slifis::SLIFIS::BuildTSRulesFromValues(), slifis::RULE_IDX::ComputeTSError(), and main().
void slifis::DATA_SET::GetInputValues (size_t sample_idx, std::vector< double > &values) const
Returns a vector of all the *numerical* input values, for sample sample_idx (starting from 0).
References __IN__, __OUT__, slifis::ERR_DATA_BAD_INDEX, slifis::DATA_POINT::GetValue(), and SLIFIS_ERROR_2.
Referenced by slifis::SLIFIS::BuildTSRulesFromValues(), and main().
size_t slifis::DATA_SET::GetOutputIndex () const [inline]
References _data_descr, and slifis::DATA_DESCR::GetOutputIndex().
Referenced by main().
size_t slifis::DATA_SET::GetNbInputFields () const [inline]
Returns nb of input fields.
References _data_descr, and slifis::DATA_DESCR::GetNbInputs().
Referenced by slifis::RULE_IDX::ComputeTSError().
size_t slifis::DATA_SET::GetNbFields () const [inline]
Returns the number of fields of the dataset.
If the dataset has a description but no data loaded, it returns the expected number of fields, as specified by the description.
References __IN__, __OUT__, _data_descr, _HasAssignedDescription, _v_datapoint, slifis::DATA_DESCR::GetNbInputs(), and GetNbPoints().
Referenced by slifis::SLIFIS::BuildRuleBaseFromData(), main(), and slifis::DATASET_PROPERTIES::P_ComputeProps().
EN_DATA_FIELD_TYPE slifis::DATA_SET::GetFieldType (size_t idx) const [inline]
Returns field type.
References __IN__, __OUT__, slifis::ERR_DATA_NO_POINTS, GetDataPoint(), slifis::DATA_POINT::GetDataType(), GetNbPoints(), and SLIFIS_ERROR.
Referenced by main(), and slifis::DATASET_PROPERTIES::P_ComputeProps().
void slifis::DATA_SET::GetFieldTypeIndexes (EN_DATA_FIELD_TYPE ft, std::vector< size_t > &v) const
const DATA_POINT & slifis::DATA_SET::GetDataPoint (size_t idx) const [inline]
DATA_POINT & slifis::DATA_SET::GetDataPoint (size_t idx) [inline]
References __IN__, __OUT__, _v_datapoint, slifis::ERR_DATA_BAD_INDEX, GetNbPoints(), and SLIFIS_ERROR_2.
void slifis::DATA_SET::AddDataPoint (const DATA_POINT &dpt) [inline]
Add data point to the data set.
References _v_datapoint.
Referenced by DivideSet(), GetSubset(), and main().
void slifis::DATA_SET::AssignDescription (const DATA_DESCR &descr)
Checks that the description descr is consistent with the dataset. If so, assigns the description to the dataset.
Otherwise, it throws an error.
References __IN__, slifis::ERR_DATA_BAD_TYPE, slifis::DATA_DESCR::GetHighestIndex(), slifis::DATA_DESCR::P_CheckOutputNotInInputs(), SLIFIS_ERROR_1, and SLIFIS_ERROR_2.
Referenced by GetSubset(), and main().
DATA_DESCR slifis::DATA_SET::GetDescription () const [inline]
References _data_descr.
Referenced by slifis::SLIFIS::BuildRuleBaseFromData().
const std::string & slifis::DATA_SET::GetStringValue (size_t col_idx, size_t string_index) const
size_t slifis::DATA_SET::GetStringCount (size_t col_idx, size_t string_elem) const
Returns string counter value in case of relational string-handling, col_idx is the column index.
References __IN__, __OUT__, slifis::ERR_DATA_BAD_INDEX, and SLIFIS_ERROR_2.
Referenced by main().
size_t slifis::DATA_SET::GetNbClasses (size_t col_idx) const
Returns the number of different values.
Referenced by main().
size_t slifis::DATA_SET::AddStringItem (size_t pointfield_idx, size_t stringfield_idx, const std::string &str_value)
Adds the string item str_value to the repository of string attributes, and returns its index.
References __IN__, __OUT__, slifis::ERR_DATA_BAD_INDEX, and SLIFIS_ERROR_2.
void slifis::DATA_SET::GetSubset (const std::vector< size_t > &v_idx, const INPUT_SETS &inputsets, DATA_SET &subset) const
Computes a subset of the dataset that contains only the data points that are in the support (fuzzy support) of the input membership functions defined by vector v_idx.
Say we have a FIS with 3 inputs, and we request as vector v_idx the values 1,0,2. This means we request the values that will be inside MF(1) for the first input, inside MF(0) for the second input, and inside MF(2) for the third input.
Motivation: this function is required for learning from data with a TS-type FIS. It is needed to compute the TS coefficient values from only the subset of points that matches the requirements (expressed by the membership functions). See SLIFIS::BuildTSRulesFromValues().
The support is defined by having a fuzzy value higher than 0.
See also twin function void GetSubset( const std::vector<size_t>& v_idx, const INPUT_SETS& inputsets, double threshold, DATA_SET& subset ) const; (using a different algorithm)
v_idx | requested input vector combination |
inputsets | input sets |
subset | output dataset |
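The selection rule described for this overload can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `TriangleMf`, `Point`, `InputSets` and `SupportSubset` are hypothetical names, and the membership functions are assumed to be triangular, so that the fuzzy value is higher than 0 exactly when the input lies strictly between the first and last points of the function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical triangular membership function: its fuzzy value is
// nonzero strictly between `first` and `last`.
struct TriangleMf {
    double first, peak, last;
    bool InSupport(double x) const { return x > first && x < last; }
};

using Point = std::vector<double>;                       // one value per FIS input
using InputSets = std::vector<std::vector<TriangleMf>>;  // inputSets[i][k]: MF k of input i

// Keeps the points that lie in the support of MF v_idx[i] for every input i.
std::vector<Point> SupportSubset(const std::vector<Point>& points,
                                 const InputSets& inputSets,
                                 const std::vector<std::size_t>& v_idx) {
    assert(v_idx.size() == inputSets.size());
    std::vector<Point> subset;
    for (const Point& p : points) {
        bool inside = true;
        for (std::size_t i = 0; i < v_idx.size() && inside; ++i) {
            inside = inputSets[i][v_idx[i]].InSupport(p[i]);
        }
        if (inside) subset.push_back(p);
    }
    return subset;
}
```

With two inputs and `v_idx` equal to 1,0, a point is kept only if its first value is inside MF(1) of the first input and its second value is inside MF(0) of the second input.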
References __IN__, __OUT__, AddDataPoint(), AssignDescription(), Clear(), slifis::DATA_POINT::FillWithInputValues(), slifis::MEMBFUNC::GetFirstPoint(), slifis::MEMBFUNC::GetLastPoint(), slifis::FUZZY_ROOT::GetMf(), slifis::INPUT_SETS::GetMfSet(), slifis::INPUT_SETS::GetNb(), slifis::FUZZY_ROOT::GetNbMf(), and slifis::MEMBFUNC::IsFinite().
Referenced by slifis::SLIFIS::BuildTSRulesFromValues().
void slifis::DATA_SET::GetSubset (const std::vector< size_t > &v_idx, const INPUT_SETS &inputsets, double threshold, DATA_SET &subset) const
Computes a subset of the dataset that contains only the data points that are in the support (fuzzy support) of the input membership functions defined by vector v_idx.
Say we have a FIS with 3 inputs, and we request as vector v_idx the values 1,0,2. This means we request the values that will be inside MF(1) for the first input, inside MF(0) for the second input, and inside MF(2) for the third input.
The support is defined by having a fuzzy value higher than threshold.
Motivation: this function is required for learning from data with a TS-type FIS. It is needed to compute the TS coefficient values from only the subset of points that matches the requirements (expressed by the membership functions). See SLIFIS::BuildTSRulesFromValues().
See also twin function (uses a different algorithm): void GetSubset( const std::vector<size_t>& v_idx, const INPUT_SETS& inputsets, DATA_SET& subset ) const
v_idx | requested input vector combination |
inputsets | input sets |
threshold | fuzzy threshold. We don't use FUZZYVAL to reduce dependencies between the code handling data and the code related to fuzzy logic. |
subset | output dataset |
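The threshold test used by this overload can be sketched for a single data point. The names `TriangleMf`, `Fuzzify` and `MatchesCombination` below are hypothetical stand-ins (the library's own MEMBFUNC::Fuzzify() is not reproduced here), and a triangular shape with first < peak < last is assumed purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical triangular membership function (assumes first < peak < last).
struct TriangleMf {
    double first, peak, last;

    // Degree of membership of x, in [0, 1].
    double Fuzzify(double x) const {
        if (x <= first || x >= last) return 0.0;
        return x <= peak ? (x - first) / (peak - first)
                         : (last - x) / (last - peak);
    }
};

// True if, for every input i, the fuzzy value of inputs[i] in MF v_idx[i]
// is strictly higher than `threshold`.
bool MatchesCombination(const std::vector<double>& inputs,
                        const std::vector<std::vector<TriangleMf>>& inputSets,
                        const std::vector<std::size_t>& v_idx,
                        double threshold) {
    assert(inputs.size() == inputSets.size() && v_idx.size() == inputSets.size());
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (inputSets[i][v_idx[i]].Fuzzify(inputs[i]) <= threshold) return false;
    }
    return true;
}
```

Passing the threshold as a plain double, as the parameter description above notes, keeps such a filter independent of any fuzzy-value type.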
References __IN__, __OUT__, AddDataPoint(), AssignDescription(), Clear(), slifis::MEMBFUNC::Fuzzify(), slifis::DATA_POINT::GetInputValue(), slifis::FUZZY_ROOT::GetMf(), slifis::INPUT_SETS::GetMfSet(), and slifis::INPUT_SETS::GetNb().
void slifis::DATA_SET::DivideSet (DATA_SET &subset_A, DATA_SET &subset_B, size_t IntervalIdx, size_t NbIntervals=5) const
Separates the points of the set into two subsets, subset_A and subset_B, according to IntervalIdx and NbIntervals.
If the set has n points, it is divided into NbIntervals intervals; the interval designated by IntervalIdx is returned in subset_A, and subset_B is filled with the rest of the points.
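One way to picture this split is the sketch below, which works on a plain std::vector instead of a DATA_SET. The function name `DivideSetSketch` is hypothetical, and the way interval boundaries are computed (contiguous index ranges obtained by integer division) is an assumption: this page does not say how the library places them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits `points` into subset_A (the interval number intervalIdx) and
// subset_B (everything else).
// Assumption: interval k covers indexes [k*n/nb, (k+1)*n/nb).
template <typename T>
void DivideSetSketch(const std::vector<T>& points,
                     std::vector<T>& subset_A,
                     std::vector<T>& subset_B,
                     std::size_t intervalIdx,
                     std::size_t nbIntervals = 5) {
    assert(nbIntervals > 0 && intervalIdx < nbIntervals);
    subset_A.clear();
    subset_B.clear();
    const std::size_t n = points.size();
    const std::size_t begin = intervalIdx * n / nbIntervals;
    const std::size_t end = (intervalIdx + 1) * n / nbIntervals;
    for (std::size_t i = 0; i < n; ++i) {
        if (i >= begin && i < end) {
            subset_A.push_back(points[i]);
        } else {
            subset_B.push_back(points[i]);
        }
    }
}
```

Calling such a function once per value of the interval index yields the successive partitions typically used for cross-validation, with one part held out each time.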
References __IN__, __OUT__, AddDataPoint(), Clear(), slifis::ERR_DATA_BAD_INDEX, and SLIFIS_ERROR_2.
void slifis::DATA_SET::p_ComputeProperties () const [private]
void slifis::DATA_SET::p_Init () [private]
Referenced by DATA_SET().
std::vector< std::vector< PAIR_STRING_COUNT > > slifis::DATA_SET::_vv_StringData [private]
Will hold the string items of the dataset.
For example, if columns 1 and 3 hold string values, then the string items from column 1 will be stored in _vv_StringData[0] (first element) and the string items from column 3 will be stored in _vv_StringData[1] (second element).
The stored type (a std::pair) holds both the string value and the associated counter.
std::vector< int > slifis::DATA_SET::_v_StringIndexes [private]
Holds the indexes of the strings related to the columns.
For example, say we have 6 columns (indexes 0 to 5), with the second and fifth columns holding string values. Then _v_StringIndexes[1]=0 (1: second column) and _v_StringIndexes[4]=1 (4: fifth column), while we will have
_v_StringIndexes[0] = _v_StringIndexes[2] = _v_StringIndexes[3] = _v_StringIndexes[5] = -1
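The mapping in this example can be reproduced with a short sketch. `BuildStringIndexes` and `StringSlot` are hypothetical helpers written for illustration; only the resulting values (-1 for non-string columns, consecutive indexes for string columns) come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the column -> string-repository index map.
// Non-string columns get -1; string columns get 0, 1, 2, ... in column order.
std::vector<int> BuildStringIndexes(const std::vector<bool>& isStringColumn) {
    std::vector<int> indexes(isStringColumn.size(), -1);
    int next = 0;
    for (std::size_t col = 0; col < isStringColumn.size(); ++col) {
        if (isStringColumn[col]) indexes[col] = next++;
    }
    return indexes;
}

// Returns the repository slot of a column that is known to hold strings.
std::size_t StringSlot(const std::vector<int>& indexes, std::size_t col) {
    assert(col < indexes.size() && indexes[col] >= 0);
    return static_cast<std::size_t>(indexes[col]);
}
```

The slot returned for a string column is the position at which that column's strings would be found in a structure laid out like _vv_StringData.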
bool slifis::DATA_SET::_HasAssignedDescription [private]
true: means that a description has been assigned to the dataset (i.e. it is not a "generic" description)
Referenced by GetNbFields().
DATA_DESCR slifis::DATA_SET::_data_descr [private]
Referenced by GetDescription(), GetNbFields(), GetNbInputFields(), and GetOutputIndex().
std::vector< DATA_POINT > slifis::DATA_SET::_v_datapoint [private]
vector of values
Referenced by AddDataPoint(), GetDataPoint(), GetNbFields(), and GetNbPoints().
bool slifis::DATA_SET::_props_are_computed [mutable, private]
DATASET_PROPERTIES slifis::DATA_SET::_properties [mutable, private]