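Update: fixed a command-line escaping vulnerability in the execution of the curl binary on HTTPS fetches (mohrt)
The Snoopy PHP class offers the following features:
1. Easily fetch the contents of a web page
2. Easily fetch the text of a web page (HTML tags stripped)
3. Easily fetch the links from a web page
4. Supports proxy hosts
5. Supports basic user/password authentication
6. Supports setting the user agent, referer, cookies, and header content
7. Supports browser redirects, with control over the redirect depth
8. Expands fetched links into fully qualified URLs (default)
9. Easily submit form data and retrieve the results
10. Supports following HTML frames
11. Supports passing cookies on redirects
A usage example: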
<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->fetchtext("http://www.flashgou.com");
echo $snoopy->results;
?>
This retrieves the text of the target page.
Fetching links works the same way:
<?php
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->fetchlinks("http://www.flashgou.com");
print_r($snoopy->results);
?>
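Snoopy can also submit form data to log in to a site:
A simulated login can be implemented with curl or with raw sockets. However, curl requires the curl module to be enabled on the server, and writing the socket code yourself is fairly tedious. With Snoopy it becomes much simpler.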
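As a rough sketch of such a login, the function below posts a login form, keeps the session cookies, and then requests a page that needs the session. It assumes $snoopy is a Snoopy instance. The function name, the URLs, and the form field names are hypothetical. submit(), setcookies(), and fetch() are Snoopy methods, but check their behavior against the README.txt of your copy.

```php
<?php
// Sketch only: log in by posting a form, then reuse the session cookies.
// login_and_fetch() is a hypothetical helper, not part of Snoopy.
function login_and_fetch($snoopy, $login_url, array $form_vars, $protected_url)
{
    // POST the login form; stop if the request itself fails.
    if (!$snoopy->submit($login_url, $form_vars)) {
        return false;
    }

    // Copy the Set-Cookie values from the response headers into
    // $snoopy->cookies so the next request carries the session.
    $snoopy->setcookies();

    // Request the page that requires a logged-in session.
    if (!$snoopy->fetch($protected_url)) {
        return false;
    }
    return $snoopy->results;
}

// Possible usage (placeholder URLs and field names):
// include "Snoopy.class.php";
// $snoopy = new Snoopy;
// $page = login_and_fetch(
//     $snoopy,
//     "http://www.example.com/login.php",
//     array("username" => "me", "password" => "secret"),
//     "http://www.example.com/members.php"
// );
// if ($page === false) { echo "error: " . $snoopy->error; }
```

Whether the site accepts the login depends on its own form fields and checks, so inspect the returned page rather than assuming success.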
Help (the following can be found in README.txt):
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->fetchtext("http://www.php.net/");
print $snoopy->results;
$snoopy->fetchlinks("http://www.phpbuilder.com/");
print_r($snoopy->results); // fetchlinks() yields an array, so use print_r
$submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html";
$submit_vars["q"] = "amiga";
$submit_vars["submit"] = "Search!";
$submit_vars["searchhost"] = "Altavista";
$snoopy->submit($submit_url,$submit_vars);
print $snoopy->results;
$snoopy->maxframes=5;
$snoopy->fetch("http://www.ispi.net/");
echo "<PRE>\n";
echo htmlentities($snoopy->results[0]);
echo htmlentities($snoopy->results[1]);
echo htmlentities($snoopy->results[2]);
echo "</PRE>\n";
$snoopy->fetchform("http://www.altavista.com");
print $snoopy->results;
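Description:
What is Snoopy?
Snoopy is a PHP class that simulates a web browser. It lets you fetch content from, and send content to, remote servers.
Some of Snoopy's features (the same 11 items listed at the top):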
* easily fetch the contents of a web page
* easily fetch the text from a web page (strip html tags)
* easily fetch the links from a web page
* supports proxy hosts
* supports basic user/pass authentication
* supports setting user_agent, referer, cookies and header content
* supports browser redirects, and controlled depth of redirects
* expands fetched links to fully qualified URLs (default)
* easily submit form data and retrieve the results
* supports following html frames (added v0.92)
* supports passing cookies on redirects (added v0.92)
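Most of the options in this list are set as public properties on the Snoopy object before a request is made. The sketch below shows one way that might look. The helper name configure_snoopy() and every host, credential, and header value are placeholders. The property names follow the Snoopy README, so verify them against your version.

```php
<?php
// Sketch only: set common request options on a Snoopy instance.
// configure_snoopy() is a hypothetical helper; all values are placeholders.
function configure_snoopy($snoopy)
{
    // Route requests through a proxy.
    $snoopy->proxy_host = "proxy.example.com";
    $snoopy->proxy_port = "8080";

    // Credentials for basic user/pass authentication.
    $snoopy->user = "joe";
    $snoopy->pass = "secret";

    // Identify as a particular browser and send a referer.
    $snoopy->agent   = "Mozilla/5.0 (compatible; ExampleBot/1.0)";
    $snoopy->referer = "http://www.example.com/";

    // Extra cookies and raw headers to send with each request.
    $snoopy->cookies["SessionID"] = "238472834723489";
    $snoopy->rawheaders["Pragma"] = "no-cache";

    // Follow at most two redirects and expand links to full URLs.
    $snoopy->maxredir    = 2;
    $snoopy->expandlinks = true;

    return $snoopy;
}

// Possible usage:
// include "Snoopy.class.php";
// $snoopy = configure_snoopy(new Snoopy);
// $snoopy->fetchlinks("http://www.example.com/");
// print_r($snoopy->results);
```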
Requirements:
Snoopy requires PHP with PCRE (Perl Compatible Regular Expressions).
Snoopy was developed and tested with PHP 3.0.12. (A PHP version this old is probably impossible to find nowadays, even in China.)
CLASS METHODS:
fetch($URI)
-----------
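Fetches the entire contents of the page, including HTML tags. $URI is the URL of the target page. The result is available in $this->results. If the target contains frames, $this->results is an array holding the contents of each frame.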
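Because the result can be either a string or an array, and a fetch can fail, a small wrapper can make calling code more predictable. The following is a minimal sketch, assuming $snoopy is a Snoopy instance. fetch_page() is a hypothetical name. It relies on fetch() returning false on failure and on the maxframes and error properties described in the README.

```php
<?php
// Sketch only: fetch a page and always hand back an array of documents.
// fetch_page() is a hypothetical helper, not part of Snoopy.
function fetch_page($snoopy, $url, $max_frames = 0)
{
    // Number of frame levels to follow; 0 leaves frames alone.
    $snoopy->maxframes = $max_frames;

    if (!$snoopy->fetch($url)) {
        // The caller can read $snoopy->error for the reason.
        return null;
    }

    // results is a string for a plain page and an array when frames
    // were followed; normalize both cases to an array.
    if (is_array($snoopy->results)) {
        return $snoopy->results;
    }
    return array($snoopy->results);
}

// Possible usage:
// include "Snoopy.class.php";
// $snoopy = new Snoopy;
// $pages = fetch_page($snoopy, "http://www.example.com/", 5);
// if ($pages === null) {
//     echo "fetch failed: " . $snoopy->error;
// } else {
//     foreach ($pages as $html) { echo htmlentities($html); }
// }
```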
fetchtext($URI)
---------------
Fetches the content of the target page with the HTML tags stripped.
fetchform($URI)
---------------
Returns the form content of the target page. If that is unclear, try it and inspect the result.
fetchlinks($URI)
----------------
Fetches the links on the target page. For example: $snoopy->fetchlinks("http://www.flashgou.com");print_r($snoopy->results);
submit($URL,$formvars)
----------------------
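Simulates a form submission. $formvars holds the form variables (the parameters to be submitted). The examples above should make this clear.
For more detailed help, see README.txt (in English) in the package directory.
Snoopy project site: http://sourceforge.net/projects/snoopy/
Download: http://down1.liehuo.net:8045/soft/1101/Snoopy.rar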