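Operations always has one request or another. Today they asked me to build a file upload feature. The file is a .txt with one uid per line. The requirement: the admin backend uploads a uid whitelist, and any user whose uid is on the list gets a popup. The list holds roughly 5 million uids (the number varies), and the current file is a little over 60 MB.
My first thought was to store the data in a Redis set. Most uids are 10-digit numbers, so a bitmap is a poor fit: its size would be driven by the largest uid rather than by the number of uids. The data set is large, though, and still takes a lot of memory, because every popup has a separate whitelist for each platform and each language. Operations later said they would not actually run that many at once, so I went with this approach, with the Redis data deleted once a popup goes offline.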
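To make the bitmap remark concrete, here is a rough arithmetic sketch. It assumes the bit offset would be the uid itself and that the largest uid is 9999999999 (an assumed upper bound for a 10-digit number). Redis strings are capped at 512 MB, so bit offsets must stay below 2^32. The helper names are made up for this illustration, and the code assumes 64-bit PHP.

```php
<?php
// Redis strings are capped at 512 MB, so SETBIT offsets must be below 2^32.
const REDIS_MAX_BIT_OFFSET = 4294967295; // 2^32 - 1

// Bytes a bitmap needs when the bit offset is the uid itself (hypothetical helper).
function bitmapBytesNeeded(int $maxUid): int
{
    return intdiv($maxUid, 8) + 1;
}

// Whether the largest uid is still addressable as a bit offset (hypothetical helper).
function fitsInRedisBitmap(int $maxUid): bool
{
    return $maxUid <= REDIS_MAX_BIT_OFFSET;
}

$maxUid = 9999999999; // assumed largest 10-digit uid; needs 64-bit PHP
printf("bitmap size: %.2f GiB\n", bitmapBytesNeeded($maxUid) / (1024 ** 3));
var_dump(fitsInRedisBitmap($maxUid));
```

Under these assumptions the bitmap would need more than 1 GiB no matter how few uids are listed, and most 10-digit uids could not be addressed at all, which is why a set is the simpler choice here.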
Enough talk, on to the code.
The admin page submits a form. Because the file is over 60 MB, it is uploaded in chunks using Baidu's WebUploader: http://fex.baidu.com/webuploader
The main frontend code:
<!-- Chunked file upload -->
<!-- Include the CSS -->
<link rel="stylesheet" type="text/css" href="/plugs/webUploader/webuploader.css">
<!-- Include the JS -->
<script type="text/javascript" src="/plugs/webUploader/webuploader.js"></script>
<div class="form-group wu-example" id="target_common" style="display: none;">
<label class="col-md-3 control-label"></label>
<div class=" col-md-3 btns">
<div id="picker">Select file</div>
<!-- <button id="ctlBtn" class="btn btn-default">Start upload</button>-->
</div>
<!-- Holds the file info -->
<div id="thelist">
<div>
<a href="javascript:;"><?= isset($info['target_user_name']) && $info['target_user_name'] ? $info['target_user_name'] : '' ?></a>
</div>
</div>
<input type="hidden" id="target_common_temp_url" name="target_common_temp_url" />
</div>
<input type="hidden" id="target_user_name" name="target_user_name" value="<?= isset($info['target_user_name']) && !empty($info['target_user_name'])?$info['target_user_name']:''?>"/>
var uploader = WebUploader.create({
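swf: '/plugs/webUploader/Uploader.swf', // path to the swf file
server: '/toolcontent/operation_position/popwindow_target_upload', // server endpoint that receives the file
pick: '#picker', // button that opens the file picker. Optional. Created internally for the current runtime, so it may be an input element or Flash.
resize: false, // do not compress images; by default a JPEG is compressed before upload
chunked: true, // upload large files in chunks
chunkSize:2 * 1024 * 1024, // 2 MB per chunk (the default is 5 MB)
auto: true,
chunkRetry : 2, // automatic retries allowed when a chunk fails because of a network problem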
//runtimeOrder: 'html5,flash',
accept: {
title: 'Files',
extensions: 'txt',
mimeTypes: 'text/plain'
}
});
// When a file is added to the queue
uploader.on( 'fileQueued', function( file ) {
var $list = $("#thelist");
/*$list.append( '<div id="' + file.id + '" class="item">' +
'<h4 class="info">' + file.name + '</h4>' +
'<p class="state">Waiting to upload...</p>' +
'</div>' );*/
// Show only one file at a time
$list.html( '<div id="' + file.id + '" class="item">' +
'<h4 class="info">' + file.name + '</h4>' +
'<p class="state">Waiting to upload...</p>' +
'</div>' );
});
// Create a progress bar and update it live during the upload.
uploader.on( 'uploadProgress', function( file, percentage ) {
var $li = $( '#'+file.id ),
$percent = $li.find('.progress .progress-bar');
// 避免重复创建
if ( !$percent.length ) {
$percent = $('<div class="progress progress-striped active">' +
'<div class="progress-bar" role="progressbar" style="width: 0%">' +
'</div>' +
'</div>').appendTo( $li ).find('.progress-bar');
}
$li.find('p.state').text('Uploading');
$percent.css( 'width', percentage * 100 + '%' );
});
// Handle upload success and failure
uploader.on( 'uploadSuccess', function(file,response ) {
console.log(response);
$("#target_common_temp_url").val(response.filePath);
$("#target_user_name").val(response.oldName);
$( '#'+file.id ).find('p.state').text('Uploaded');
});
uploader.on( 'uploadError', function( file ) {
$( '#'+file.id ).find('p.state').text('Upload failed');
});
// Upload finished (fires after both success and failure)
uploader.on( 'uploadComplete', function( file,response ) {
$( '#'+file.id ).find('.progress').fadeOut();
});
The main backend code:
public function main()
{
$targetDir = '/www/privdata/xxxx/target_user_tmp'; // temp directory for uploaded chunks
$uploadDir = '/www/privdata/xxxx/target_user'; // directory for the merged file
$cleanupTargetDir = true; // Remove old files
$maxFileAge = 5 * 3600; // Temp file age in seconds
// Create the directories if needed
if (!file_exists($targetDir)) {
mkdir($targetDir,0777,true);
}
if (!file_exists($uploadDir)) {
mkdir($uploadDir,0777,true);
}
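// Determine the file name
if (isset($_REQUEST["name"])) {
$fileName = $_REQUEST["name"];
} elseif (!empty($_FILES)) {
$fileName = $_FILES["file"]["name"];
} else {
$fileName = uniqid("file_");
}
// Strip path separators so a crafted name cannot point outside $targetDir
$fileName = str_replace(['/', '\\', "\0"], '', $fileName);
$oldName = $fileName;
$fileName = iconv('UTF-8','gb2312',$fileName);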
$filePath = $targetDir . DIRECTORY_SEPARATOR . $fileName;
$chunk = isset($_REQUEST["chunk"]) ? intval($_REQUEST["chunk"]) : 0;
$chunks = isset($_REQUEST["chunks"]) ? intval($_REQUEST["chunks"]) : 1;
$response = [
'code' => 0,
'msg' => ''
];
// Remove stale chunk files
if ($cleanupTargetDir) {
if (!is_dir($targetDir) || !$dir = opendir($targetDir)) {
$response['msg'] = 'Failed to open temp directory111';
echo json_encode($response);exit;
}
while (($file = readdir($dir)) !== false) {
$tmpfilePath = $targetDir . DIRECTORY_SEPARATOR . $file;
// If temp file is current file proceed to the next
if ($tmpfilePath == "{$filePath}_{$chunk}.part" || $tmpfilePath == "{$filePath}_{$chunk}.parttmp") {
continue;
}
// Remove temp file if it is older than the max age and is not the current file
if (preg_match('/\.(part|parttmp)$/', $file) && (filemtime($tmpfilePath) < time() - $maxFileAge)) {
unlink($tmpfilePath);
}
}
closedir($dir);
}
// Open the temp file for this chunk
if (!$out = fopen("{$filePath}_{$chunk}.parttmp", "wb")) {
$response['msg'] = 'Failed to open output stream222';
echo json_encode($response);exit;
}
if (!empty($_FILES)) {
if ($_FILES["file"]["error"] || !is_uploaded_file($_FILES["file"]["tmp_name"])) {
$response['msg'] = 'Failed to move uploaded file333';
echo json_encode($response);exit;
}
// Read binary input stream and append it to temp file
if (!$in = fopen($_FILES["file"]["tmp_name"], "rb")) {
$response['msg'] = 'Failed to open input stream444';
echo json_encode($response);exit;
}
} else {
if (!$in = fopen("php://input", "rb")) {
$response['msg'] = 'Failed to open input stream555';
echo json_encode($response);exit;
}
}
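// Compare explicitly: a final read of the single byte "0" is falsy in PHP and would be dropped
while (($buff = fread($in, 4096)) !== false && $buff !== '') {
fwrite($out, $buff);
}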
fclose($out);
fclose($in);
rename("{$filePath}_{$chunk}.parttmp", "{$filePath}_{$chunk}.part");
$done = true;
for( $index = 0; $index < $chunks; $index++ ) {
if ( !file_exists("{$filePath}_{$index}.part") ) {
$done = false;
break;
}
}
if ($done) {
$pathInfo = pathinfo($fileName);
$hashStr = substr(md5($pathInfo['basename']),8,16);
$hashName = time() . $hashStr . '.' .$pathInfo['extension'];
$uploadPath = $uploadDir . DIRECTORY_SEPARATOR .$hashName;
if (!$out = fopen($uploadPath, "wb")) {
$response['msg'] = 'Failed to open output stream';
echo json_encode($response);exit;
}
// Hold an exclusive file lock (flock with LOCK_EX) while merging
if ( flock($out, LOCK_EX) ) {
for( $index = 0; $index < $chunks; $index++ ) {
if (!$in = fopen("{$filePath}_{$index}.part", "rb")) {
break;
}
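// Compare explicitly: a final read of the single byte "0" is falsy in PHP and would be dropped
while (($buff = fread($in, 4096)) !== false && $buff !== '') {
fwrite($out, $buff);
}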
fclose($in);
unlink("{$filePath}_{$index}.part");
}
flock($out, LOCK_UN);
}
fclose($out);
$response = [
'code' => 1,
'success'=>true,
'oldName'=>$oldName,
'filePath'=>$uploadPath,
// 'fileSize'=>$data['size'],
'fileSuffixes'=>$pathInfo['extension'], // file extension
// 'file_id'=>$data['id'],
];
echo json_encode($response);exit;
}
$response = [
'code' => 1,
'success'=>true,
];
echo json_encode($response);exit;
}
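The merge step can also be pulled out into a small standalone function, which makes it easier to test on its own. The following is only a sketch under a few assumptions: `mergeChunks` is a hypothetical name, the parts follow the same `{prefix}_{index}.part` naming as above, and, unlike the listing above, a part that cannot be opened makes the whole merge fail instead of leaving a shortened file behind.

```php
<?php
// Merge "{$prefix}_{$i}.part" files (i = 0 .. $chunks - 1) into $dest.
// Returns false and removes a partial $dest if any part is missing or unreadable.
function mergeChunks(string $prefix, int $chunks, string $dest): bool
{
    // Verify that every part exists before touching the destination.
    for ($i = 0; $i < $chunks; $i++) {
        if (!is_file("{$prefix}_{$i}.part")) {
            return false;
        }
    }
    $out = fopen($dest, 'wb');
    if ($out === false) {
        return false;
    }
    $ok = flock($out, LOCK_EX); // exclusive lock while merging
    for ($i = 0; $ok && $i < $chunks; $i++) {
        $in = fopen("{$prefix}_{$i}.part", 'rb');
        if ($in === false) {
            $ok = false;
            break;
        }
        // Copies the whole part without a manual read/write loop.
        $ok = stream_copy_to_stream($in, $out) !== false;
        fclose($in);
    }
    flock($out, LOCK_UN);
    fclose($out);
    if (!$ok) {
        @unlink($dest); // do not leave a truncated file behind
        return false;
    }
    // Only delete the parts once the merged file is complete.
    for ($i = 0; $i < $chunks; $i++) {
        unlink("{$prefix}_{$i}.part");
    }
    return true;
}

// Usage with two small demo parts in the system temp directory.
$prefix = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('demo_');
file_put_contents("{$prefix}_0.part", "uid\n1000000001\n");
file_put_contents("{$prefix}_1.part", "1000000002\n");
var_dump(mergeChunks($prefix, 2, "{$prefix}.txt"));
echo file_get_contents("{$prefix}.txt");
unlink("{$prefix}.txt");
```

Deleting the parts only after a successful merge means a failed merge can be retried without asking the browser to upload again.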
With the large-file upload solved, the whole form is submitted to the server, which then parses the contents of the txt file.
The code:
function input_file($filename) {
$fp = fopen($filename,'r'); // open the file; fopen returns FALSE on failure
if(!$fp){
return false;
}
/*$data = [];
$i = 0;
while (!feof($fp)) {
if ($i == 0) continue;
$i++;
$line = fgets($fp);
$line = str_replace("\n","",$line);
$data[] = $line;
}
fclose($fp);
return $data;*/
$str = '';
$buffer = 1024 * 1024; // read 1024 * 1024 bytes (1 MB) at a time
while (!feof($fp)) {
$str .= fread($fp, $buffer);
}
$arr = explode("\n", $str);
unset($arr[0]); // drop the first line
fclose($fp);
return $arr;
}
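Parsing the txt file on the backend is not much of a load; the heavy part is writing into the Redis set. A Redis set can take multiple values in a single call, though. The code: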
public function setData($data)
{
if ($data) {
// Clear the set first
$this->delData();
if (is_array($data)) {
// Write 1000 uids per sAdd call
$uidArr = [];
$success = $error = 0;
foreach ($data as $uid) {
if (trim($uid)) {
$uid = trim($uid);
$uidArr[] = $uid;
}
if (count($uidArr) >= 1000) {
$res = $this->redis->sAdd($this->key, ...$uidArr);
// sAdd returns the number of newly added members, or false on failure
if ($res !== false) {
$success = $success + count($uidArr);
} else {
$error = $error + count($uidArr);
}
$uidArr = [];
}
}
// Write whatever is left
if ($uidArr) {
$res = $this->redis->sAdd($this->key, ...$uidArr);
if ($res !== false) {
$success = $success + count($uidArr);
} else {
$error = $error + count($uidArr);
}
}
// Set the TTL before returning, otherwise the key never expires
$this->redis->expire($this->key, self::CACHE_TTL);
return ['success'=>$success, 'error' => $error];
} else {
$this->redis->sAdd($this->key, $data);
}
$this->redis->expire($this->key, self::CACHE_TTL);
}
}
public function delData()
{
$this->redis->del($this->key);
}
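The batching idea can be tried without a Redis server. The sketch below is illustrative only: `InMemorySetStore` is a made-up stand-in that mimics the return value of `sAdd` (the number of newly added members), `addUidsInBatches` is a hypothetical helper, and the key name is an assumption. It uses `array_chunk` to split the cleaned uid list and keeps "sent" and "added" apart, since duplicates are sent but not added.

```php
<?php
// Stand-in for the single Redis call used here (an assumption for the demo):
// like sAdd, it returns how many members were newly added to the set.
final class InMemorySetStore
{
    private array $sets = [];

    public function sAdd(string $key, string ...$members)
    {
        $added = 0;
        foreach ($members as $m) {
            if (!isset($this->sets[$key][$m])) {
                $this->sets[$key][$m] = true;
                $added++;
            }
        }
        return $added;
    }

    public function sCard(string $key): int
    {
        return count($this->sets[$key] ?? []);
    }
}

// Trim the lines, drop empty ones, and write them $batchSize at a time.
function addUidsInBatches($store, string $key, array $lines, int $batchSize = 1000): array
{
    $uids = array_values(array_filter(array_map('trim', $lines), 'strlen'));
    $stats = ['sent' => 0, 'added' => 0, 'failed' => 0];
    foreach (array_chunk($uids, $batchSize) as $batch) {
        $res = $store->sAdd($key, ...$batch);
        if ($res === false) { // a real client signals failure with false
            $stats['failed'] += count($batch);
            continue;
        }
        $stats['sent'] += count($batch);
        $stats['added'] += $res;
    }
    return $stats;
}

$store = new InMemorySetStore();
$lines = ["1000000001\n", "1000000002\n", "\n", "1000000001\n"];
print_r(addUidsInBatches($store, 'popup:whitelist', $lines, 2));
```

In this demo the second batch contains only a duplicate, so it adds nothing, yet it is not an error. That is the case a plain truthiness check on the return value would misreport.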
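Problems you may run into:
1. PHP memory limit: in php.ini set memory_limit = 2048M (the default is 128M). Otherwise you get: Allowed memory size of 134217728 bytes exhausted (tried to allocate 4096 bytes) in xxxxxxxx on line 209
2. If the file is sent as a single plain POST (not the case in this post), configure php.ini: set post_max_size = 80M; the default is only 8M
3. nginx configuration (look up the details for your own setup):
client_max_body_size 100m; # maximum allowed request body size (100 MB here)
client_body_buffer_size 100m; # buffer size for reading the request body
proxy_read_timeout 300; # read timeout for the proxied server: how long nginx waits for a response. It applies between two successive read operations, not to the whole response. Default: 60 seconds
4. FastCGI configuration (fastcgi_buffer_size is the buffer for the first part of the FastCGI response; fastcgi_buffers sets the number and size of the buffers for the rest):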
fastcgi_buffer_size 1024k;
fastcgi_buffers 64 1024k;
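Before editing php.ini it helps to see which limits are actually in effect. The snippet below is a small sketch: `iniSizeToBytes` is a hypothetical helper that converts the php.ini shorthand (K, M, G) to bytes, and the three directives are printed with `ini_get`. With chunked upload, each chunk rather than the whole file has to fit under the upload limits.

```php
<?php
// Convert php.ini shorthand ("128M", "2G", "512K") to bytes; "-1" means unlimited.
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $unit = strtoupper(substr($value, -1));
    $number = (int) $value; // the cast keeps the leading digits, e.g. "128M" -> 128
    switch ($unit) {
        case 'G':
            return $number * 1024 ** 3;
        case 'M':
            return $number * 1024 ** 2;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

// Print the limits that are in effect for the current SAPI.
foreach (['memory_limit', 'post_max_size', 'upload_max_filesize'] as $name) {
    $raw = (string) ini_get($name);
    printf("%-20s %-8s %d bytes\n", $name, $raw, iniSizeToBytes($raw));
}
```

Note that `iniSizeToBytes('128M')` gives 134217728, the same number that appears in the "Allowed memory size" error above. The CLI and the web SAPI can load different php.ini files, so run the check in the environment that serves the upload.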
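Improvement: with this much data the Redis write takes too long, and reading the whole file into an array uses too much memory, so I optimized it as follows.
The frontend form submits the path of the chunk-uploaded file to the backend. The backend only needs
if (filesize($target_common_temp_url) == 0) {
$this->ajax_error('The txt file contains no data!');
}
to check whether the file is empty. If it is not, the uids are inserted into Redis using yield together with multi-value inserts. This lowers memory usage and also cuts the run time. The Redis code: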
public function setDataNew($file)
{
if (is_file($file)) {
$this->delData(); // clear the set first
$uidArr = [];
$success = $error = 0;
foreach ($this->getLines($file) as $n => $line) {
if ($n == 0) continue; // skip the first line
$line = trim($line);
if ($line === '') continue; // skip blank lines instead of storing uid 0
$uidArr[] = (int)$line;
if (count($uidArr) >= 20000) {
$res = $this->redis->sAdd($this->key, ...$uidArr);
// sAdd returns the number of newly added members, or false on failure
if ($res !== false) {
$success = $success + count($uidArr);
} else {
$error = $error + count($uidArr);
}
$uidArr = [];
}
}
if ($uidArr) { // write whatever is left
$res = $this->redis->sAdd($this->key, ...$uidArr);
if ($res !== false) {
$success = $success + count($uidArr);
} else {
$error = $error + count($uidArr);
}
}
$this->redis->expire($this->key, self::CACHE_TTL);
return ['success'=>$success, 'error' => $error];
}
}
public function delData()
{
$this->redis->del($this->key);
}
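// Read the file line by line
private function getLines($file)
{
$f = fopen($file, 'r');
try {
// Compare against false so a final line of "0" does not end the loop early
while (($line = fgets($f)) !== false) {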
yield $line;
}
} finally {
fclose($f);
}
}
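The generator approach can be separated from Redis entirely, which makes the memory behaviour easier to see: only one batch of lines is held at a time. The sketch below uses hypothetical helper names (`readUidLines`, `inBatches`) and a temporary demo file; it is one possible arrangement, not the code used in production.

```php
<?php
// Yield trimmed, non-empty lines one at a time, skipping the first $skip lines.
function readUidLines(string $file, int $skip = 1): Generator
{
    $f = fopen($file, 'rb');
    if ($f === false) {
        throw new RuntimeException("cannot open {$file}");
    }
    try {
        $n = 0;
        while (($line = fgets($f)) !== false) {
            if ($n++ < $skip) {
                continue; // header line(s)
            }
            $line = trim($line);
            if ($line !== '') {
                yield $line;
            }
        }
    } finally {
        fclose($f); // runs even if the consumer stops iterating early
    }
}

// Group any iterable into arrays of at most $size items.
function inBatches(iterable $items, int $size): Generator
{
    $batch = [];
    foreach ($items as $item) {
        $batch[] = $item;
        if (count($batch) >= $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch) {
        yield $batch; // the final, possibly smaller batch
    }
}

// Demo: a header line, a blank line, and three uids, batched two at a time.
$file = tempnam(sys_get_temp_dir(), 'uid');
file_put_contents($file, "uid\n1000000001\n\n1000000002\n1000000003");
foreach (inBatches(readUidLines($file), 2) as $batch) {
    echo implode(',', $batch), "\n";
}
unlink($file);
```

Each batch yielded by `inBatches` could be passed straight to a multi-value `sAdd` call, so the batch size becomes the only knob that trades memory for round trips.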
Other helper functions that may come in handy when debugging:
// Check memory usage
//echo $this->formatBytes(memory_get_peak_usage());
function formatBytes($bytes)
{
if ($bytes < 1024) {
return $bytes . "b";
} else if ($bytes < 1048576) {
return round($bytes / 1024, 2) . "kb";
}
return round($bytes / 1048576, 2) . 'mb';
}
/*
* 13-digit timestamp including milliseconds, e.g. 1535423356248
* https://blog.youkuaiyun.com/tcf_jingfeng/article/details/82143440
*/
function msectime()
{
list($msec, $sec) = explode(' ', microtime());
$msectime = (float)sprintf('%.0f', (floatval($msec) + floatval($sec)) * 1000);
return $msectime;
}
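For comparing the two import approaches, a small wrapper that reports elapsed time and peak memory can be convenient. This is only a sketch: `measure` is a hypothetical helper, and it relies on `microtime(true)`, which returns the current time as a float in seconds.

```php
<?php
// Run $fn and report its result, the elapsed wall-clock time in milliseconds,
// and the peak memory of the script so far.
function measure(callable $fn): array
{
    $start = microtime(true); // float seconds with microsecond precision
    $result = $fn();
    return [
        'result'     => $result,
        'elapsed_ms' => (microtime(true) - $start) * 1000,
        'peak_bytes' => memory_get_peak_usage(),
    ];
}

// Demo workload: build a 100000-element array and count it.
$stats = measure(function () {
    return count(range(1, 100000));
});
printf(
    "%d items, %.2f ms, peak %.2f mb\n",
    $stats['result'],
    $stats['elapsed_ms'],
    $stats['peak_bytes'] / 1048576
);
```

Because `memory_get_peak_usage()` reports the peak for the whole script so far, each approach should be measured in a separate run to get comparable numbers.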