A fuzzy logic C++ library
slifis::DATAFILE_INFO Class Reference

Holds information on a given data file: what columns are numeric, what columns are strings, etc.

#include <datafile_info.hpp>

Public Member Functions

 DATAFILE_INFO (std::string fn)
 DATAFILE_INFO ()
void Clear ()
void GetFileInfo ()
 Returns information on file: how many fields, what type, ...
void AssignDataDescription (const DATA_DESCR &descr)
 Assigns the description of the requested data to the information on file and checks consistency.
bool HasDescription () const
const DATA_DESCR & GetDescription () const
const DATA_DESCR & GetPostReadingDescr () const
const accessors
void Print (FILE *f) const
bool HasAttribNames () const
bool IsSet () const
EN_DF_TYPE GetFileType () const
 Returns the type of file; for string identification, see GetString( EN_DF_TYPE )
size_t GetNbDataPts () const
size_t GetNbNumericFields () const
size_t GetNeededNbFields () const
 Returns the number of fields needed for a data point: either the *real* value, extracted from the file, or the requested value, as given in the description.
size_t GetTotNbFields () const
size_t GetNbStringFields () const
char GetDelimChar () const
EN_DATA_FIELD_TYPE GetFieldType (size_t idx) const
 Returns field type for index idx.
std::string GetAttribName (size_t idx) const
bool FieldIsRequested (size_t idx) const
 Returns true if field idx (0-based) needs to be loaded from the data file.
size_t GetFirstNumeric () const
 Returns the first numeric field.
std::string GetFileName () const
file related
void OpenFile ()
bool FileIsGood ()
bool FileIsOpen ()
void CloseFile ()

Static Public Member Functions

static void SetCSVDelim (char sep)

Private Member Functions

std::string P_ReadLine ()
void P_GetFileInfo_arff ()
 Returns information on arff file: how many fields, what type.
void P_GetFileInfo_csv ()
 Returns information on csv file, assuming comment lines start with '#' and fields are separated by ';'.
bool P_FetchArffCommands (const std::string &buf)
 Fetches ARFF commands from line buf; returns true if finished (i.e. if a DATA command is encountered)

Private Attributes

std::string _input_fn
 file name
std::ifstream _datafile
 the input data file
std::vector< PAIR_ATTRIB_NT > _vAttribNameType
 attributes names and type
size_t _NbDataPts
 Nb of points.
size_t _NbNumeric
 Nb of numeric values in the columns.
size_t _NbString
 Nb of string values in the columns.
EN_DF_TYPE _FileType
 arff, csv, or other
bool _IsSet
 flag that becomes true once the type of file, the number of attributes, and their names are known
bool _HasStringAttr
 true if file has at least one field of string type
bool _HasAttribNames
 true if data file has attribute names (always true for arff)
std::vector< size_t > _vStringColumns
 indexes of columns that hold a string attribute.
std::vector< std::vector< std::string > > _vvStringNames
 set of possible values for columns holding a string attribute
DATA_DESCR _descr_Original
DATA_DESCR _descr_PostReading
bool _dfi_HasDescription

Static Private Attributes

static char s_CSV_sep = ';'
 CSV-files separator, see SetCSVDelim()
static char s_buf [512]

Friends

class DATA_SET
class DATA_POINT

Detailed Description

Holds information on a given data file: what columns are numeric, what columns are strings, etc.

Usage:

  1. create object with file name, and call GetFileInfo() on object
            DATAFILE_INFO dfi( "myfile.csv" );
            dfi.GetFileInfo();
    
  2. use accessors to fetch information

Constructor & Destructor Documentation

slifis::DATAFILE_INFO::DATAFILE_INFO ( std::string  fn) [inline]

References Clear().

slifis::DATAFILE_INFO::DATAFILE_INFO ( )

References Clear().


Member Function Documentation

void slifis::DATAFILE_INFO::Clear ( )

References slifis::DFT_UNKNOWN.

Referenced by DATAFILE_INFO().

void slifis::DATAFILE_INFO::GetFileInfo ( )

Returns information on file: how many fields, what type, ...

  • Takes as its only input the file name stored in DATAFILE_INFO, and fills in all the other members.
  • Throws an error in case of failure.

This function opens the file and reads it entirely, but keeps only the metadata.

References __IN__, __OUT__, slifis::DFT_ARFF, slifis::DFT_CSV, slifis::DFT_UNKNOWN, slifis::DT_NUMERIC, slifis::DT_STRING, slifis::ERR_IO_ERROR, SLIFIS_ERROR_2, and SWITCH_ERROR.

Referenced by main().

void slifis::DATAFILE_INFO::AssignDataDescription ( const DATA_DESCR &  descr)

Assigns the description of the requested data to the information on file and checks consistency.

  • On failure, prints a message to the log file and throws an exception.

References slifis::DATA_DESCR::ComputeIndexesAfterLoading(), slifis::ERR_DATA_BAD_INDEX, slifis::DATA_DESCR::GetInputIndex(), slifis::DATA_DESCR::GetNbInputs(), slifis::DATA_DESCR::GetOutputIndex(), SLIFIS_ERROR, and SLIFIS_ERROR_LOG.

Referenced by main().

bool slifis::DATAFILE_INFO::HasDescription ( ) const [inline]
void slifis::DATAFILE_INFO::SetCSVDelim ( char  sep) [inline, static]

References s_CSV_sep.

void slifis::DATAFILE_INFO::Print ( FILE *  f) const
bool slifis::DATAFILE_INFO::HasAttribNames ( ) const [inline]
bool slifis::DATAFILE_INFO::IsSet ( ) const [inline]

References _IsSet.

Referenced by slifis::DATA_SET::ReadData().

EN_DF_TYPE slifis::DATAFILE_INFO::GetFileType ( ) const

Returns the type of file; for string identification, see GetString( EN_DF_TYPE )

References __IN__, __OUT__, _FileType, _IsSet, slifis::ERR_DATA_INFO_INVALID, and SLIFIS_ERROR.

Referenced by slifis::DATA_SET::ReadData().

size_t slifis::DATAFILE_INFO::GetNbDataPts ( ) const [inline]
size_t slifis::DATAFILE_INFO::GetNbNumericFields ( ) const [inline]
size_t slifis::DATAFILE_INFO::GetNeededNbFields ( ) const [inline]

Returns the number of fields needed for a data point: either the *real* value, extracted from the file, or the requested value, as given in the description.

References _descr_Original, _dfi_HasDescription, slifis::DATA_DESCR::GetNbInputs(), and GetTotNbFields().

Referenced by slifis::DATA_POINT::DATA_POINT(), and slifis::DATA_POINT::ReadDataFields().
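
The choice between the real and the requested field count can be sketched as follows. This is only an illustration, not the slifis source: `Descr` and `FileInfo` are hypothetical stand-ins, and returning the number of inputs as the requested value is an assumption (the library may count other fields as well).

```cpp
#include <cstddef>

// Hypothetical stand-in for the part of DATA_DESCR used in this sketch.
struct Descr {
    std::size_t nbInputs = 0;
    std::size_t GetNbInputs() const { return nbInputs; }
};

// Hypothetical stand-in for DATAFILE_INFO, reduced to what the sketch needs.
struct FileInfo {
    std::size_t totNbFields = 0;  // number of fields actually found in the file
    bool hasDescription = false;  // plays the role of _dfi_HasDescription
    Descr descr;                  // plays the role of _descr_Original

    std::size_t GetTotNbFields() const { return totNbFields; }

    // Requested value if a description was assigned, real value otherwise.
    // Assumption: the requested value is the number of inputs.
    std::size_t GetNeededNbFields() const {
        return hasDescription ? descr.GetNbInputs() : GetTotNbFields();
    }
};
```

With no description assigned, a file holding 5 fields yields 5; once a description requesting 3 inputs is assigned, the sketch yields 3.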

size_t slifis::DATAFILE_INFO::GetTotNbFields ( ) const [inline]
size_t slifis::DATAFILE_INFO::GetNbStringFields ( ) const [inline]
char slifis::DATAFILE_INFO::GetDelimChar ( ) const [inline]

EN_DATA_FIELD_TYPE slifis::DATAFILE_INFO::GetFieldType ( size_t  idx) const

Returns field type for index idx.

References __IN__, __OUT__, slifis::ERR_DATA_BAD_INDEX, and SLIFIS_ERROR_2.

Referenced by slifis::DATA_POINT::ReadDataFields().

std::string slifis::DATAFILE_INFO::GetAttribName ( size_t  idx) const

References __IN__, __OUT__, slifis::ERR_DATA_BAD_INDEX, and SLIFIS_ERROR_2.

Referenced by main().

bool slifis::DATAFILE_INFO::FieldIsRequested ( size_t  idx) const

Returns true if field idx (0-based) needs to be loaded from the data file.

Referenced by slifis::DATA_POINT::ReadDataFields().

size_t slifis::DATAFILE_INFO::GetFirstNumeric ( ) const

Returns the first numeric field.

References __IN__, __OUT__, slifis::DT_NUMERIC, slifis::ERR_DATA_NO_NUMERIC, and SLIFIS_ERROR.
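
A lookup of this kind can be sketched with the standard library. The names `ColType` and `FirstNumeric` are hypothetical, and the sketch assumes that the returned value is a 0-based column index and that a file without any numeric column is reported with an exception (the reference to ERR_DATA_NO_NUMERIC suggests an error path, but its exact form is not shown here).

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified column type (not the slifis enum).
enum class ColType { Numeric, String };

// Returns the 0-based index of the first numeric column.
// Throws std::runtime_error if no column is numeric.
inline std::size_t FirstNumeric(const std::vector<ColType>& types) {
    const auto it = std::find(types.begin(), types.end(), ColType::Numeric);
    if (it == types.end()) {
        throw std::runtime_error("data file has no numeric field");
    }
    return static_cast<std::size_t>(it - types.begin());
}
```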

std::string slifis::DATAFILE_INFO::GetFileName ( ) const [inline]

References _input_fn.

Referenced by slifis::DATA_POINT::ReadDataFields().

bool slifis::DATAFILE_INFO::FileIsGood ( )

References _datafile.

Referenced by main(), and slifis::DATA_SET::ReadData().

bool slifis::DATAFILE_INFO::FileIsOpen ( )

References _datafile.

Referenced by slifis::DATA_POINT::ReadDataFields().

std::string slifis::DATAFILE_INFO::P_ReadLine ( ) [private]

void slifis::DATAFILE_INFO::P_GetFileInfo_arff ( ) [private]

Returns information on arff file: how many fields, what type.

  • Reads the file until the end (the metadata only requires reading up to the DATA command line, but this function also counts the number of data points).

References __IN__, __OUT__, slifis::LineHasContent(), and slifis::TrimCR().

void slifis::DATAFILE_INFO::P_GetFileInfo_csv ( ) [private]

Returns information on csv file, assuming comment lines start with '#' and fields are separated by ';'.

References __IN__, __OUT__, slifis::DT_NUMERIC, slifis::DT_STRING, slifis::ERR_IO_ERROR, slifis::LineHasContent(), SLIFIS_ERROR_2, SLIFIS_ERROR_LOG, slifis::TokensList(), and slifis::TrimCR().
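
The CSV conventions stated above (comment lines starting with '#', fields separated by ';') can be illustrated with a small standalone sketch. It does not use slifis code: `CsvInfo`, `IsNumeric` and `ScanCsv` are hypothetical names, and the rule that a column counts as a string column as soon as one of its values fails to parse as a number is an assumption, not necessarily what P_GetFileInfo_csv() does.

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical summary of a CSV file (not the slifis data structure).
struct CsvInfo {
    std::size_t nbDataPts = 0;    // number of non-empty, non-comment lines
    std::vector<bool> isNumeric;  // one flag per column
};

// True if the whole token parses as a floating-point number.
inline bool IsNumeric(const std::string& tok) {
    if (tok.empty()) return false;
    char* end = nullptr;
    std::strtod(tok.c_str(), &end);
    return end == tok.c_str() + tok.size();
}

// Reads the whole stream but keeps only the metadata.
inline CsvInfo ScanCsv(std::istream& in, char sep = ';') {
    CsvInfo info;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // trim CR
        if (line.empty() || line[0] == '#') continue;               // skip empty and comment lines
        std::istringstream fields(line);
        std::string tok;
        std::size_t col = 0;
        while (std::getline(fields, tok, sep)) {
            // A column starts as numeric and is demoted on the first non-numeric value.
            if (col == info.isNumeric.size()) info.isNumeric.push_back(true);
            if (!IsNumeric(tok)) info.isNumeric[col] = false;
            ++col;
        }
        ++info.nbDataPts;
    }
    return info;
}
```

Passing any `std::istream` (a `std::ifstream` or a `std::istringstream`) keeps the sketch easy to try without a file on disk.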

bool slifis::DATAFILE_INFO::P_FetchArffCommands ( const std::string &  buf) [private]

Fetches ARFF commands from line buf; returns true if finished (i.e. if a DATA command is encountered)

  • This function fills _vAttribNameType with names and types of attributes found in the file
  • Throws an error in case of invalid arff file.

References __IN__, __OUT__, slifis::DT_DATE, slifis::DT_DEFAULT, slifis::DT_NUMERIC, slifis::DT_STRING, slifis::ERR_IO_ERROR, SLIFIS_ERROR_2, and slifis::TokensList().
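
The line-by-line contract described above (collect attribute names and types, report completion on the DATA command, fail on an invalid header) can be sketched as follows. This is not the slifis implementation: `AttrType`, `AttribNameType`, `ToLower` and `FetchArffCommand` are hypothetical names, the type mapping is simplified (nominal value lists are treated as strings), and quoted attribute names containing spaces are not handled.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified attribute type (not the slifis enum).
enum class AttrType { Numeric, String, Date };
using AttribNameType = std::pair<std::string, AttrType>;

inline std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Parses one header line. Appends to `attribs` on an @attribute command and
// returns true only when the @data command is met. Throws on an invalid command.
inline bool FetchArffCommand(const std::string& buf, std::vector<AttribNameType>& attribs) {
    std::istringstream iss(buf);
    std::string cmd;
    if (!(iss >> cmd) || cmd[0] != '@') return false;  // blank line, comment, or other content
    cmd = ToLower(cmd);                                // ARFF keywords are case-insensitive
    if (cmd == "@data") return true;
    if (cmd == "@relation") return false;
    if (cmd != "@attribute") throw std::runtime_error("unknown ARFF command: " + cmd);

    std::string name, type;
    if (!(iss >> name >> type)) throw std::runtime_error("invalid @attribute line: " + buf);
    type = ToLower(type);
    AttrType at = AttrType::String;  // default, also used for nominal "{a,b}" lists
    if (type == "numeric" || type == "real" || type == "integer") at = AttrType::Numeric;
    else if (type == "date") at = AttrType::Date;
    attribs.emplace_back(name, at);
    return false;
}
```

A caller would feed header lines to this function one at a time and stop treating lines as header content once it returns true.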


Friends And Related Function Documentation

friend class DATA_SET [friend]
friend class DATA_POINT [friend]

Member Data Documentation

std::string slifis::DATAFILE_INFO::_input_fn [private]

file name

Referenced by GetFileName().

std::ifstream slifis::DATAFILE_INFO::_datafile [private]

the input data file

Referenced by FileIsGood(), and FileIsOpen().

std::vector<PAIR_ATTRIB_NT> slifis::DATAFILE_INFO::_vAttribNameType [private]

attribute names and types

Referenced by GetTotNbFields().

size_t slifis::DATAFILE_INFO::_NbDataPts [private]

Nb of points.

Referenced by GetNbDataPts(), and slifis::DATA_SET::ReadData().

size_t slifis::DATAFILE_INFO::_NbNumeric [private]

Nb of numeric values in the columns.

Referenced by GetNbNumericFields().

size_t slifis::DATAFILE_INFO::_NbString [private]

Nb of string values in the columns.

Referenced by GetNbStringFields().

EN_DF_TYPE slifis::DATAFILE_INFO::_FileType [private]

arff, csv, or other

Referenced by GetDelimChar(), and GetFileType().

bool slifis::DATAFILE_INFO::_IsSet [private]

flag that becomes true once the type of file, the number of attributes, and their names are known

Referenced by GetDelimChar(), GetFileType(), GetNbDataPts(), GetNbNumericFields(), GetNbStringFields(), GetTotNbFields(), HasAttribNames(), and IsSet().

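bool slifis::DATAFILE_INFO::_HasStringAttr [private]

true if file has at least one field of string type

bool slifis::DATAFILE_INFO::_HasAttribNames [private]

true if data file has attribute names (always true for arff)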

Referenced by HasAttribNames().

std::vector<size_t> slifis::DATAFILE_INFO::_vStringColumns [private]

indexes of columns that hold a string attribute.

std::vector< std::vector<std::string> > slifis::DATAFILE_INFO::_vvStringNames [private]

set of possible values for columns holding a string attribute

Referenced by GetNeededNbFields().

char slifis::DATAFILE_INFO::s_CSV_sep = ';' [static, private]

CSV-files separator, see SetCSVDelim()

Referenced by GetDelimChar(), and SetCSVDelim().

char slifis::DATAFILE_INFO::s_buf [static, private]