Data Collection Class

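This article presents a PHP class for collecting data from web pages. The class fetches the content of a URL, extracts the text between two marker strings, and matches the content against regular expressions.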


 
<?php
//====================================================
//	FileName:class.inc.php
//	Summary: Data collection class.
//	Author: shenzhe(泽泽时代)
//	CreateTime: 2004-3-30     
//	LastModified:2004-3-30 
//	Copyright (c)2004 shenzhe163@gmail.com
//      Im : qq:31477177 Msn :shenzhe__@hotmail.com
//====================================================


class collect_cls{

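	/**
	*  variables
	*  @access private
	*/
	var $url;             // target URL
	var $content;         // raw content fetched from $url
	var $tmp;             // content remaining after the last matched start marker
	var $tmpflag = true;  // true until getResult() has stored a value in $tmp
	var $result;          // most recent extraction, match, or replacement result
	var $flag=true;       // set to false once getResult() extracts nothing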

	/**
	*  function getUrl
	*  this is a method of the collect_cls class
	*  sets the target URL (despite the "get" in its name)
	*  @access public
	*  @param $url  string  URL of the target page
	*/
	function getUrl($url)
	{
		$this->url = $url;
	}

	/**
	*  function getContent
	*  this is a method of the collect_cls class
	*  fetches the content of $this->url into $this->content
	*  @access public
	*/
	function getContent()
	{
		$this->content = file_get_contents($this->url);
	}

	/**
	*  function returnContent
	*  this is a method of the collect_cls class
	*  returns the full content until getResult() has been used,
	*  and the remaining content ($this->tmp) afterwards
	*  @access public
	*/
	function  returnContent()
	{
		if($this->tmpflag){
			return $this->content;
		}else{
			return $this->tmp;
		}
	}
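	/**
	*  function getResult
	*  this is a method of the collect_cls class
	*  returns the text between a start marker and an end marker
	*  @access public
	*  @param $head       string  the start marker
	*  @param $headInt    int     number of extra characters to skip to the right of $head
	*  @param $foot       string  the end marker
	*  @param $footInt    int     number of characters to trim to the left of $foot
	*  @param $tmpContent string  the content to search
	*  @return string|false       the extracted text, or false when nothing matches
	*/
	function getResult($head,$headInt,$foot,$footInt,$tmpContent)
	{
		if ($this->flag){
			//if(!$tmpContent) $tmpContent=$this->content;
			$tmpresult = stristr($tmpContent,$head);
			if ($tmpresult === false){
				// start marker not found
				$this->flag = false;
				return false;
			}
			$sint = $headInt+strlen($head);
			$this->tmp = substr($tmpresult,$sint);
			if($this->tmpflag){
				$this->tmpflag = false;
			}
			// look for the end marker after the start marker, case-insensitively like stristr()
			$fpos = stripos($tmpresult,$foot,strlen($head));
			// substr() takes a length, so subtract the start offset from the end position
			$eint = ($fpos === false) ? 0 : $fpos-$footInt-$sint;
			$this->result = ($eint > 0) ? substr($tmpresult,$sint,$eint) : '';
			if (strlen($this->result)<1){
				// nothing was extracted: report failure to the caller instead of ending the script
				$this->flag=false;
				return false;
			}else{
				return $this->result;
			}
		}
		return false;
	}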

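	/**
	*  function resultByReg
	*  this is a method of the collect_cls class
	*  returns the result of matching a regular expression against the content
	*  @access public
	*  @param $pattern       string  the regular expression
	*  @param $out           mixed   ignored on input; overwritten with the matches
	*  @param $prama         int     1 = case-insensitive match, pattern given without delimiters
	*                                2 = preg_match, any other value = preg_match_all
	*/
	function resultByReg($pattern,$out,$prama)
	{
		if($prama==1){
			// eregi() was removed in PHP 7; a case-insensitive preg_match() approximates it
			// (POSIX and PCRE syntax differ in a few details)
			preg_match('/'.str_replace('/','\/',$pattern).'/i',$this->content,$out);
		}elseif($prama==2){
			preg_match($pattern,$this->content,$out);
		}else {
			preg_match_all($pattern,$this->content,$out);
		}
		$this->result = $out;
		return $this->result;
	}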

	/**
	*  function replace
	*  this is a method of the collect_cls class
	*  replaces $oldStr with $newStr in the current result and returns it
	*  @access public
	*  @param $oldStr        string  the text to search for
	*  @param $newStr        string  the replacement text
	*/
	function replace($oldStr,$newStr)
	{
 		 
		$this->result = str_replace($oldStr,$newStr,$this->result);
		return $this->result;
	}

	/**
	*  function look
	*  this is a method of the collect_cls class
	*  displays a value in a textarea for inspection
	*  @access public
	*  @param $flag       int  1 = content, 2 = tmp, any other value = result
	*/
	function look($flag=0)
	{
 		if ($flag==1){
			$value = $this->content;
		}elseif($flag==2){
			$value = $this->tmp;
		}else{
			$value = $this->result;
		}
		// escape the value so that markup in the collected data (e.g. </TEXTAREA>) cannot break the form
		$value = htmlspecialchars(is_array($value) ? print_r($value,true) : (string)$value);
		echo "<SCRIPT>function runEx(){var winEx2 = window.open(\"\", \"winEx2\", \"width=500,height=300,status=yes,menubar=no,scrollbars=yes,resizable=yes\"); winEx2.document.open(\"text/html\", \"replace\"); winEx2.document.write(unescape(event.srcElement.parentElement.children[0].value)); winEx2.document.close(); }function saveFile(){var win=window.open('','','top=10000,left=10000');win.document.write(document.all.asdf.innerText);win.document.execCommand('SaveAs','','javascript.htm');win.close();}</SCRIPT><center><TEXTAREA id=asdf name=textfield rows=32  wrap=VIRTUAL cols=\"120\">".$value."</TEXTAREA><BR><BR><INPUT name=Button onclick=runEx() type=button value=\"Preview\">&nbsp;&nbsp;<INPUT name=Button onclick=asdf.select() type=button value=\"Select all\">&nbsp;&nbsp;<INPUT name=Button onclick=\"asdf.value=''\" type=button value=\"Clear\">&nbsp;&nbsp;<INPUT onclick=saveFile(); type=button value=\"Save code\"></center>";
	}
}
?>
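
The marker-based extraction at the heart of getResult can be tried in isolation. The following is a minimal sketch, not part of the original class: the function name extract_between and the sample HTML are assumptions made for illustration, and the sketch leaves out the $headInt and $footInt adjustments. It finds the start marker, searches for the end marker only after it, and passes substr() a length rather than an end position.

```php
<?php
// Hypothetical helper (not part of collect_cls): returns the text between
// the first occurrence of $head and the next occurrence of $foot,
// or null when either marker is missing.
function extract_between(string $content, string $head, string $foot): ?string
{
    // Locate the start marker, case-insensitively like stristr() in getResult.
    $start = stripos($content, $head);
    if ($start === false) {
        return null;
    }
    // Move past the marker itself so it is not part of the result.
    $start += strlen($head);

    // Locate the end marker, searching only after the start marker.
    $end = stripos($content, $foot, $start);
    if ($end === false) {
        return null;
    }

    // substr() expects a length, which is the distance between the two positions.
    return substr($content, $start, $end - $start);
}

// Sample input, invented for this example.
$html  = '<html><head><TITLE>Example page</TITLE></head><body>...</body></html>';
$title = extract_between($html, '<title>', '</title>');
echo $title, "\n";
```

Returning null on a missing marker, rather than ending the script, leaves the decision about how to handle a failed match to the calling code.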