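The following two functions were added to the View class. The first reads comma-separated data from a text file line by line and applies a simple transformation: in each line, the data before the first comma gets the suffix 1, the data before the second comma gets the suffix 2, and so on. The second saves the processed result to a new file.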
std::vector<CString> CDataPreprocessView::DataPreprocess()
{
    std::vector<CString> vecResult;
    CStdioFile fsin;
    CString str = "";
    CString strTemp, intToStr;
    int iCommaPosition, iItemCount;
    // open the text file for reading; return an empty vector if it cannot be opened
    if (!fsin.Open("D://Visual Workspace//Data Mining//DataPreprocess//mushrooms.txt", CFile::modeRead | CFile::typeText))
        return vecResult;
    while (fsin.ReadString(str)) // read the data line by line
    {
        iItemCount = 1;
        strTemp = "";
        intToStr = "";
        while ((iCommaPosition = str.Find(",")) != -1)
        {
            // find each comma and append the item number as a suffix to the data before it
            intToStr.Format("%d", iItemCount);
            strTemp = strTemp + str.Left(iCommaPosition) + intToStr + ",";
            str = str.Right(str.GetLength() - iCommaPosition - 1);
            iItemCount++;
        }
        // the last item of the line has no trailing comma
        intToStr.Format("%d", iItemCount);
        strTemp += str + intToStr + "\n";
        vecResult.push_back(strTemp); // temporarily store each processed line in a vector
    }
    fsin.Close(); // close the file
    return vecResult;
}
// function for saving the processed data into a new file
void CDataPreprocessView::OnFileSave()
{
    CStdioFile fsout;
    std::vector<CString> vecData;
    std::vector<CString>::iterator it_vecData;
    vecData = DataPreprocess();
    it_vecData = vecData.begin();
    // create (or overwrite) the output file and open it for writing
    if (!fsout.Open("D://Visual Workspace//Data Mining//DataPreprocess//mushroomsProcessed.txt", CFile::modeCreate | CFile::modeWrite | CFile::typeText))
        return;
    while (it_vecData != vecData.end())
    {
        // write each line in the vector into the file
        fsout.WriteString(*it_vecData);
        ++it_vecData;
    }
    fsout.Close(); // close the file
}
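The per-line transformation can also be tried outside of MFC. The following is only a minimal sketch in standard C++, assuming plain std::string lines; the helper name suffixFields is hypothetical and is not part of the original View class. It applies the same rule as DataPreprocess: every comma-separated item receives its 1-based position as a suffix.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the original code): appends the 1-based
// position of each comma-separated item to that item, for a single line.
// The line is expected without its trailing newline.
std::string suffixFields(const std::string& line)
{
    std::string result;
    std::size_t start = 0; // index where the current item begins
    int index = 1;         // 1-based item number used as the suffix
    while (true) {
        const std::size_t comma = line.find(',', start);
        const bool last = (comma == std::string::npos);
        // copy the current item (up to the comma, or to the end of the line)
        result += line.substr(start, last ? std::string::npos : comma - start);
        result += std::to_string(index);
        if (last)
            break;
        result += ',';
        start = comma + 1;
        ++index;
    }
    return result;
}
```

For example, suffixFields("p,x,s") returns "p1,x2,s3". An empty item still receives a suffix, which matches the behaviour of the loop in DataPreprocess.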
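This article described a method for preprocessing text data: it handles comma-separated text by appending a sequence-number suffix to each field, and it provides a function for saving the processed data to a new file.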