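I recently worked on a PHP script with the following requirement: given a folder that contains several thousand subfolders, each holding images plus a Word document, read the contents of every subfolder and insert them into a database.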
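The folder layout described above could be enumerated roughly like this. This is only a sketch: the helper name collectFolders, the 'docs' / 'images' keys, and the list of accepted file extensions are assumptions made for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: map each subfolder of $root to its Word files and images.
function collectFolders(string $root): array
{
    $result = [];
    foreach (scandir($root) as $name) {
        $dir = $root . DIRECTORY_SEPARATOR . $name;
        // skip the "." and ".." entries and anything that is not a directory
        if ($name === '.' || $name === '..' || !is_dir($dir)) {
            continue;
        }
        $entry = ['docs' => [], 'images' => []];
        foreach (scandir($dir) as $file) {
            $path = $dir . DIRECTORY_SEPARATOR . $file;
            if (!is_file($path)) {
                continue;
            }
            // compare extensions case-insensitively (assumed extension list)
            $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
            if ($ext === 'docx') {
                $entry['docs'][] = $path;
            } elseif (in_array($ext, ['jpg', 'jpeg', 'png', 'gif'], true)) {
                $entry['images'][] = $path;
            }
        }
        $result[$dir] = $entry;
    }
    return $result;
}
```

The keys of the returned array are the subfolder paths, so array_keys() of the result gives a flat list of folders that can later be split between worker processes.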
To read the Word documents I used the Composer package phpoffice/phpword. Add it to the require section of composer.json and run composer install to install it:
"require": { "phpoffice/phpword": "^0.16.0" },
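// createReader() with no argument returns the Word2007 (.docx) reader
$reader = \PhpOffice\PhpWord\IOFactory::createReader();
// collected text, one entry per TextRun
$paragraphs = [];
if ($reader->canRead($filepath)) {
    $word = $reader->load($filepath);
    foreach ($word->getSections() as $section) {
        foreach ($section->getElements() as $element) {
            if ($element instanceof \PhpOffice\PhpWord\Element\TextRun) {
                $str = '';
                foreach ($element->getElements() as $text) {
                    // a TextRun can also contain images or line breaks, which have no getText()
                    if ($text instanceof \PhpOffice\PhpWord\Element\Text) {
                        $str .= $text->getText();
                    }
                }
                $paragraphs[] = $str;
            }
        }
    }
}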
Because there was so much data, and the code already lived in a Laravel command, I used multiple processes to handle it. Here is the example code:
$processIds = [];
$count = count($dirs);
// fork up to 10 child processes
$workers = 10;
$block = (int)ceil($count / $workers);
for ($i = 0; $i < $workers; $i++) {
    $left = $block * $i;
    if ($left >= $count) {
        // every directory has already been assigned to a child
        break;
    }
    $deal = array_slice($dirs, $left, $block);
    $processIds[$i] = pcntl_fork();
    switch ($processIds[$i]) {
        case -1:
            echo "fork failed : {$i} \r\n";
            exit(1);
        case 0:
            // child process: read the Word documents and upload the images
            $this->doWork($deal);
            exit(0);
        default:
            // parent process: continue forking
            break;
    }
}
// the parent waits until every child process has exited
while (count($processIds) > 0) {
    // block until a child exits; polling with WNOHANG here would spin the CPU
    $mypid = pcntl_waitpid(-1, $status);
    if ($mypid == -1) {
        // no children left to wait for
        break;
    }
    foreach ($processIds as $key => $pid) {
        if ($mypid == $pid) {
            unset($processIds[$key]);
        }
    }
}
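The slicing arithmetic in the loop above can be checked on its own, without forking anything. The sketch below mirrors the $block and array_slice logic; the function name splitIntoBlocks is hypothetical and only serves this illustration.

```php
<?php
// Hypothetical helper mirroring the $block / array_slice arithmetic of the fork loop.
function splitIntoBlocks(array $dirs, int $workers): array
{
    $count = count($dirs);
    if ($count === 0 || $workers < 1) {
        return [];
    }
    // each worker gets at most ceil($count / $workers) items
    $block = (int) ceil($count / $workers);
    $blocks = [];
    for ($left = 0; $left < $count; $left += $block) {
        $blocks[] = array_slice($dirs, $left, $block);
    }
    return $blocks;
}

// 25 items across 10 workers: block size is 3, so only 9 blocks are produced
$blocks = splitIntoBlocks(range(1, 25), 10);
echo count($blocks), PHP_EOL;      // 9
echo count(end($blocks)), PHP_EOL; // 1
```

Because the block size is rounded up, fewer than $workers children may be started, which is why the loop stops as soon as the offset reaches $count. PHP's built-in array_chunk($dirs, $block) produces the same split.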
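Multi-process PHP only works in PHP CLI mode and requires the pcntl extension; the posix extension is usually enabled alongside it for functions such as posix_getpid(). Run php -m to check whether the extensions are enabled.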
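The same check can also be made from inside the script before any fork is attempted. This is a minimal sketch; the function name canFork is an assumption introduced for illustration.

```php
<?php
// Hypothetical pre-flight check: true only when running under the CLI SAPI
// with the pcntl extension loaded.
function canFork(): bool
{
    return PHP_SAPI === 'cli' && extension_loaded('pcntl');
}

if (!canFork()) {
    echo "pcntl is unavailable, falling back to a single process\r\n";
}
```

A command could use this to process all folders in one process when forking is not possible, instead of failing on an undefined pcntl_fork().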