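<?php
// Fetch $url with cURL, strip the document.write('...') wrapper and echo the result
function geturl($url)
{
    $ch = curl_init();
    $timeout = 5; // connection timeout in seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body as a string instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    // For pages that require user authentication, uncomment the following two lines
    // (US_NAME and US_PWD are constants holding the user name and password):
    //curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    //curl_setopt($ch, CURLOPT_USERPWD, US_NAME . ":" . US_PWD);
    $contents = curl_exec($ch);
    curl_close($ch);
    if ($contents === false) {
        return; // the request failed, so there is nothing to print
    }
    // Remove the JavaScript document.write('...') wrapper and its escape sequences
    $contents = str_replace("document.write('", "", $contents);
    $contents = str_replace("');", "", $contents);
    $contents = str_replace("\\n", "", $contents); // literal \n sequences
    $contents = str_replace("\\", "", $contents);  // remaining backslashes
    echo $contents;
}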
Reposted from: https://www.cnblogs.com/xuehuai/archive/2011/08/01/2124131.html
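This post shows how to fetch the content of a web page with PHP and cURL. Setting the cURL options CURLOPT_URL, CURLOPT_RETURNTRANSFER and CURLOPT_CONNECTTIMEOUT makes the request return the target page's body as a string, with a 5-second connection timeout. It also shows how to clean the fetched data with string replacement.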
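The cleanup half of the approach is easier to check in isolation, without a network request. A minimal sketch, assuming PHP 7 or later for the type declarations: `strip_document_write` is a hypothetical name, not part of the original post. It applies the same four replacements as `geturl()` in the same order, but returns the string instead of echoing it.

```php
<?php
// Hypothetical helper: the same cleanup steps as geturl(), returned instead of echoed.
function strip_document_write(string $js): string
{
    // With an array of search strings, str_replace() applies them left to right,
    // matching the order of the four separate calls in geturl().
    return str_replace(
        ["document.write('", "');", "\\n", "\\"],
        "",
        $js
    );
}

// A literal \n inside the JavaScript string is removed along with the wrapper.
$sample = "document.write('<p>Line one\\n</p>');";
echo strip_document_write($sample); // prints <p>Line one</p>
```

Like the original, this is a blunt text substitution: it also deletes any `');` or backslash that legitimately appears inside the page content, so it suits simple `document.write` feeds rather than arbitrary JavaScript.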