[Written Test] 56. Kingsoft written test, final question: data statistics - CSDN Blog

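This post describes one way to parse a user login log: it reads each user's GUID, city, and login time from the log file, then prints the results ordered by city and by login count.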

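I took Kingsoft's written test just before the National Day holiday but did not finish the last question. It was raining that day and my thinking was muddled, so I could not work it out, and I am making up for that here. My approach to the second-to-last question was actually fine, although when I checked it afterwards I found one spot I had overlooked; I will leave that aside for now. This question should have several solutions, and this post gives mine. A classmate solved it another way, by defining a new struct that holds all three pieces of data together, which is probably simpler. Mine is still one possible approach, just a rather verbose one.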

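Problem:

a) A user login log file contains each user's GUID, city, login time, and other information.
b) Parse the log file and print the result line by line in the format "city, user GUID, login count", in the following order:
* i. Each line of the log file represents one login.
* ii. Cities are ordered by their number of user GUIDs, from most to fewest. When counting GUIDs, a user who logs in several times is counted only once.
* iii. Within a city, user GUIDs are ordered by login count, from most to fewest.

The log file format is as follows (user GUID, city, login time):

72FF6A6C6E7D47AF91F5EEA1AE3CDDB7,ZhuHai,20150831 11:17:13
A3191FB1D8584FB7BD6C6A6A67BB37A7,GuangZhou,20150831 11:17:21
66471546EF024ECA8D0677A959A1D8F6,ShenZhen,20150831 11:17:40
72FF6A6C6E7D47AF91F5EEA1AE3CDDB7,ZhuHai,20150831 12:47:37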
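
The ordering rules in the problem can also be expressed as a single sort over one record per output row, which is roughly the struct-based idea my classmate used. What follows is only a minimal sketch of that idea, not my classmate's actual code: the names `LoginRecord` and `summarize` are made up for illustration, logins are counted per city, and ties (which the problem does not specify) are assumed to be broken alphabetically.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type: one output row of "city,GUID,login count".
struct LoginRecord {
    std::string city;
    std::string guid;
    int count;
};

// Reads "GUID,city,time" lines from `in` and returns the rows in output order.
std::vector<LoginRecord> summarize(std::istream &in) {
    // city -> (GUID -> number of logins recorded for that city)
    std::map<std::string, std::map<std::string, int> > logins;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string guid, city;
        // Only the first two comma-separated fields are needed.
        if (std::getline(fields, guid, ',') && std::getline(fields, city, ',')) {
            ++logins[city][guid];
        }
    }

    // Flatten the nested map into one record per (city, GUID) pair.
    std::vector<LoginRecord> rows;
    for (const auto &c : logins) {
        for (const auto &g : c.second) {
            rows.push_back({c.first, g.first, g.second});
        }
    }

    std::sort(rows.begin(), rows.end(),
        [&logins](const LoginRecord &a, const LoginRecord &b) {
            if (a.city != b.city) {
                // Rule ii: cities with more distinct GUIDs come first.
                std::size_t ua = logins.at(a.city).size();
                std::size_t ub = logins.at(b.city).size();
                if (ua != ub) return ua > ub;
                return a.city < b.city;  // assumed tie-break: city name
            }
            // Rule iii: within a city, higher login counts come first.
            if (a.count != b.count) return a.count > b.count;
            return a.guid < b.guid;      // assumed tie-break: GUID
        });
    return rows;
}
```

To use it, open the log with a `std::ifstream`, pass the stream to `summarize`, and print each row as `city,guid,count`. Nothing is erased while printing, so the collected data stays available afterwards.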

Data.h

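/**
* Problem:
*	a)	A user login log file contains each user's GUID, city, login time, and other information.
*	b)	Parse the log file and print the result line by line in the format "city, user GUID, login count", in the following order:
*		i. Each line of the log file represents one login.
*		ii. Cities are ordered by their number of user GUIDs, from most to fewest. When counting GUIDs, a user who logs in several times is counted only once.
*		iii. Within a city, user GUIDs are ordered by login count, from most to fewest.
*
*	The log file format is as follows:
*		user GUID							city		login time
*72FF6A6C6E7D47AF91F5EEA1AE3CDDB7,ZhuHai,20150831 11:17:13
*A3191FB1D8584FB7BD6C6A6A67BB37A7,GuangZhou,20150831 11:17:21
*66471546EF024ECA8D0677A959A1D8F6,ShenZhen,20150831 11:17:40
*72FF6A6C6E7D47AF91F5EEA1AE3CDDB7,ZhuHai,20150831 12:47:37
* Date: 2015-10-07 20:57:25
* Author: cutter_point
*/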

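#ifndef DATA_H
#define DATA_H

#include <map>
#include <string>
#include <set>

class Data
{
public:
	Data();
	~Data();

	// The class owns raw pointers, so copying is disabled to avoid a double delete
	Data(const Data&) = delete;
	Data& operator=(const Data&) = delete;

	/**
	 *  Reads the data from the file, one line at a time
	 */
	void readData(std::string filename);

	/**
	 * Writes the results to the given file
	 */
	void outputData(std::string filename);
private:
	/**
	 *  Maps each city to the set of distinct GUIDs seen in that city
	 */
	std::map<std::string, std::set<std::string> > *cityGUID;

	/**
	 *  Maps each GUID to its login count
	 */
	std::map<std::string, int> *GUIDNum;

	/**
	 * Processes one line of the log
	 */
	void handleLine(std::string line);

	/**
	 * Parses one line and returns a new[]-allocated array {GUID, city},
	 * or nullptr for an empty line; the caller must delete[] it
	 */
	std::string* pareData(std::string line);

	/**
	 *  Returns the city in cityGUID that has the most GUIDs
	 */
	std::string getMaxNumOfCity();

	/**
	 *  Returns the GUID with the largest count in GUIDNum among the GUIDs of the given city
	 */
	std::string getMaxNumOfGUID(std::string cityname);

	/**
	 * Checks whether a key of GUIDNum is contained in the set cityGUID[cityname],
	 * that is, whether the GUID belongs to this city
	 */
	bool guidIsInCity(std::string cityname, std::string guid);
};

#endif //DATA_H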

Data.cpp

/**
 * Implementation of the Data class.
 * The full problem statement is in the header comment of Data.h.
 * Date: 2015-10-07 20:57:25
 * Author: cutter_point
 */
#include "stdafx.h"
#include "Data.h"

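#include <fstream>
#include <stdexcept>

Data::Data()
{
	this->cityGUID = new std::map<std::string, std::set<std::string> >();
	this->GUIDNum = new std::map<std::string, int>();
}


Data::~Data()
{
	delete cityGUID;
	delete GUIDNum;
}

/**
 *  Reads the data from the file, one line at a time
 */
void Data::readData(std::string filename)
{
	std::ifstream file(filename);
	if (!file.is_open())
	{
		// Throw by value (not with new) so callers can catch const std::exception&;
		// std::runtime_error is the standard exception type that carries a message
		throw std::runtime_error("failed to open file: " + filename);
	}//if

	// Process the file line by line
	std::string line;
	while (std::getline(file, line))
	{
		// Process one line
		this->handleLine(line);
	}//while

	file.close();
}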

/**
* Parses one line and returns the GUID and the city.
* The result is allocated with new[]; the caller must release it with delete[].
*/
std::string* Data::pareData(std::string line)
{
	// Example line: 72FF6A6C6E7D47AF91F5EEA1AE3CDDB7,ZhuHai,20150831 11:17:13
	if (line.empty())
		return nullptr;
	std::string guid;
	std::string city;
	// The GUID ends at the first comma and the city at the second
	std::string::size_type first = line.find(',');
	if (first != std::string::npos)
	{
		guid = line.substr(0, first);
		std::string::size_type second = line.find(',', first + 1);
		if (second != std::string::npos)
		{
			city = line.substr(first + 1, second - first - 1);
		}//if
	}//if
	std::string *result = new std::string[2]();
	result[0] = guid; result[1] = city;

	return result;
}

/**
 * Processes one line of the log
 */
void Data::handleLine(std::string line)
{
	// Split the line into two parts: the GUID and the city
	std::string *result = this->pareData(line);
	if (result == nullptr)
		return;	// empty line, nothing to record
	if (result[0].empty() || result[1].empty())
	{
		// Malformed line without both fields: skip it
		delete[] result;
		return;
	}//if

	// operator[] creates the city's set on first use. Inserting through it
	// updates the set stored in the map; copying the set into a local
	// variable first would silently drop every GUID after the first one
	(*cityGUID)[result[1]].insert(result[0]);

	// GUID -> login count; operator[] starts a new GUID at 0.
	// Note that the count is per GUID, not per (city, GUID) pair
	++(*GUIDNum)[result[0]];

	// pareData allocates the array with new[], so release it here
	delete[] result;
}

/**
 *  Returns the city in cityGUID that has the most GUIDs
 */
std::string Data::getMaxNumOfCity()
{
	// With an empty map begin() == end(), which must not be dereferenced
	if (cityGUID == nullptr || cityGUID->empty())
		return "";
	std::map<std::string, std::set<std::string> >::iterator it = cityGUID->begin();
	std::map<std::string, std::set<std::string> >::iterator max = it;
	for (; it != cityGUID->end(); ++it)
	{
		// Keep the city with the largest set; on a tie the first one
		// in map order (alphabetical) wins
		if (it->second.size() > max->second.size())
		{
			max = it;
		}//if
	}//for

	// Return the city name
	return max->first;
}

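/**
 * Checks whether a key of GUIDNum is contained in the set cityGUID[cityname],
 * that is, whether the GUID belongs to this city
 */
bool Data::guidIsInCity(std::string cityname, std::string guid)
{
	std::map<std::string, std::set<std::string> >::iterator city = cityGUID->find(cityname);
	if (city == cityGUID->end())
		return false;	// unknown city

	// Look the GUID up in the stored set directly, without copying the set
	return city->second.find(guid) != city->second.end();
}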

/**
 *  Returns the GUID with the largest count in GUIDNum among the GUIDs of
 *  the given city, or "" if no GUID of that city is left in GUIDNum
 *  std::map<std::string, int> *GUIDNum;
 */
std::string Data::getMaxNumOfGUID(std::string cityname)
{
	if (GUIDNum == nullptr || cityGUID == nullptr)
		return "";

	std::map<std::string, std::set<std::string> >::iterator city = cityGUID->find(cityname);
	if (city == cityGUID->end())
		return "";
	// Take a reference once instead of copying the set on every iteration
	const std::set<std::string> &sid = city->second;

	// end() marks "nothing found yet", so max is never left uninitialized
	std::map<std::string, int>::iterator max = GUIDNum->end();
	for (std::map<std::string, int>::iterator it = GUIDNum->begin(); it != GUIDNum->end(); ++it)
	{
		// Only consider GUIDs that belong to this city
		if (sid.find(it->first) == sid.end())
			continue;
		// Keep the first match, then replace it only by a strictly larger count
		if (max == GUIDNum->end() || it->second > max->second)
		{
			max = it;
		}//if
	}//for

	return max == GUIDNum->end() ? "" : max->first;
}

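/**
 * Writes the results to the given file.
 * This consumes the collected data: entries are erased as they are written.
 */
void Data::outputData(std::string filename)
{
	using namespace std;
	// Open the file in append mode and add the results to it.
	// A narrow ofstream matches the narrow strings being written
	ofstream out(filename, ios::app | ios::out);
	set<string> sid;	// GUIDs of the city currently being printed
	while (cityGUID->size() > 0)
	{
		string maxCity = this->getMaxNumOfCity();
		sid = cityGUID->find(maxCity)->second;
		while (sid.size() > 0)
		{
			std::string maxGUID = this->getMaxNumOfGUID(maxCity);
			map<string, int>::iterator count = GUIDNum->find(maxGUID);
			if (count == GUIDNum->end())
				break;	// no count left, e.g. the GUID was already printed under another city
			out << maxCity << "," << maxGUID << "," << count->second << endl;

			// Erase the entry once it has been written
			GUIDNum->erase(count);
			sid.erase(maxGUID);
		}//while

		cityGUID->erase(maxCity);
	}//while
}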

main.cpp

/**
 * Entry point: reads log.log and writes the sorted result to newData.log.
 * The full problem statement is in the header comment of Data.h.
 * Date: 2015-10-07 20:57:25
 * Author: cutter_point
 */
#include "stdafx.h"
#include "Data.h"
#include <iostream>
#include <string>

using namespace std;

void test(string filename)
{
	// A stack object is destroyed automatically, so nothing is leaked
	Data d;
	d.readData(filename);
	d.outputData("newData.log");
}

int _tmain(int argc, _TCHAR* argv[])
{
	test("log.log");
	return 0;
}

Input file log.log

72FF6A6C6E7D47AF91F5EEA1AE3CDDB7,ZhuHai,20150831 11:17:13
A3191FB1D8584FB7BD6C6A6A67BB37A7,GuangZhou,20150831 11:17:21
66471546EF024ECA8D0677A959A1D8F6,ShenZhen,20150831 11:17:40
72FF6A6C6E7D47AF91F5EEA1AE3CDDB7,ZhuHai,20150831 12:47:37

Output file newData.log

GuangZhou,A3191FB1D8584FB7BD6C6A6A67BB37A7,1
ShenZhen,66471546EF024ECA8D0677A959A1D8F6,1
ZhuHai,72FF6A6C6E7D47AF91F5EEA1AE3CDDB7,2

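Solving this question completely feels quite hard, so I need to keep working at it. The whole paper was allotted two hours and this was the last question. Even if I had rushed through the earlier questions in one hour, that would not have been enough: this question alone took me a whole day!

I feel really weak...

Keep working hard!

File IO is my weak point.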
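
Since file IO is the part I struggle with, here is a small sketch of the reading pattern on its own, separated from the counting logic. It is only an illustration under my own assumptions: `readLines` is a made-up helper name, and stripping a trailing `'\r'` assumes the log may have been saved with Windows line endings.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads every line of a text file into a vector (hypothetical helper).
std::vector<std::string> readLines(const std::string &path) {
    std::ifstream file(path);
    if (!file) {
        // Throw by value so callers can catch const std::exception&.
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<std::string> lines;
    std::string line;
    // std::getline returns the stream, which converts to false at end of file.
    while (std::getline(file, line)) {
        // Drop the trailing '\r' left behind by Windows line endings.
        if (!line.empty() && line[line.size() - 1] == '\r') {
            line.erase(line.size() - 1);
        }
        lines.push_back(line);
    }
    return lines;  // the ifstream closes itself when it goes out of scope
}
```

Keeping the reading step separate like this makes the parsing and sorting code easier to test, because it can be fed plain strings instead of a real file.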