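I'm learning Ruby, and this is the first small program I've written since I started.
The program fetches the first 100 Google search results for "hadoop", writes them to a database, and prints them.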
A few key points:
1. Connecting to the database and inserting data into it.
2. Fetching the data from Google.
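From what I found, this means calling Google's API (the Custom Search JSON API at googleapis.com/customsearch/v1). Before you can use it you have to apply for an API key, and each request also needs a search engine ID, passed as the cx parameter.
To fetch 100 results you also need the start parameter. By default the API returns 10 results per request, and start sets the 1-based index of the first result to return. Requesting start = 1, 11, 21, …, 91 therefore gives 100 results in total.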
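The paging described above can be sketched on its own, without touching the network. This is a minimal sketch, not the program below: `API_KEY` and `ENGINE_ID` are hypothetical placeholders for your own key and cx value, and `page_urls` is a made-up helper name. It only builds the request URLs, using `URI.encode_www_form` so the query string is escaped properly.

```ruby
require "uri"

# Hypothetical placeholders: substitute your own API key and search engine ID (cx).
API_KEY   = "MYKEY"
ENGINE_ID = "YOUR_CX"
BASE      = "https://www.googleapis.com/customsearch/v1"

# Builds one request URL per page needed to cover `total` results,
# `page_size` results at a time. start is 1-based: 1, 11, 21, ..., 91.
def page_urls(query, total: 100, page_size: 10)
  (1..total).step(page_size).map do |start|
    params = { key: API_KEY, cx: ENGINE_ID, q: query, start: start }
    "#{BASE}?#{URI.encode_www_form(params)}"
  end
end

urls = page_urls("hadoop")
puts urls.length   # prints 10
puts urls.last     # ends with "start=91"
```

Generating the start values from a range keeps the page arithmetic in one place, instead of relying on a counter that is incremented once per returned item.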
require "rubygems"   # only needed on Ruby 1.8; must come before requiring gems
require "mysql"
require "json"
require "net/http"
require "openssl"    # for OpenSSL::SSL::VERIFY_NONE
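begin
  dbh = Mysql.real_connect("127.0.0.1", "root", "password", "rubydatebase")
  puts "Connection Successful"
  puts "server version: " + dbh.get_server_info

  count = 1   # number of the next result; also sent as the API's start parameter
  num = 100   # total number of results to fetch
  urlstr = "https://www.googleapis.com/customsearch/v1?key=MYKEY&cx=013036536707430787589:_pqjad5hr1a&q=hadoop&alt=json"

  while count < num
    # Each request returns up to 10 results, beginning at result number `count`
    url = URI.parse(urlstr + "&start=" + count.to_s)
    http = Net::HTTP.new(url.host, url.port)
    http.use_ssl = true
    # Skips certificate verification; acceptable for a local experiment, unsafe in production
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    response = http.request(Net::HTTP::Get.new(url.request_uri))

    items = JSON.parse(response.body)["items"]
    break if items.nil?   # no "items" key: no more results, or an error response

    items.each do |item|
      link    = item["link"]
      title   = item["title"]
      snippet = item["snippet"]
      puts("#{count}:", link, title, snippet)

      # Escape the values before putting them into the SQL string.
      # The table search_results and its columns are hypothetical; adjust to your schema.
      col1 = dbh.escape_string(link.to_s)
      col2 = dbh.escape_string(title.to_s)
      col3 = dbh.escape_string(snippet.to_s)
      dbh.query("INSERT INTO search_results (url, title, snippet) VALUES ('#{col1}', '#{col2}', '#{col3}')")

      count += 1
    end
  end
rescue Mysql::Error => e
  puts "Error #{e.errno}: #{e.error}"
ensure
  dbh.close if dbh
end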
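The parsing step can also be pulled out into a small function. That makes it testable without network access or a database, and makes the "no results" case explicit: a Custom Search response has no "items" key when there is nothing to return, for example past the last page or when the request failed. This is a sketch; `extract_results` is a hypothetical helper name, and the sample JSON strings are made up for illustration (real responses carry many more fields).

```ruby
require "json"

# Extracts [link, title, snippet] triples from one Custom Search response body.
# Returns an empty array when the body has no "items" key.
def extract_results(body)
  items = JSON.parse(body)["items"] || []
  items.map { |item| [item["link"], item["title"], item["snippet"]] }
end

# Made-up sample body shaped like a Custom Search response.
sample = '{"items":[{"link":"https://hadoop.apache.org/","title":"Apache Hadoop","snippet":"The Apache Hadoop software library..."}]}'

results = extract_results(sample)
results.each_with_index do |(link, title, _snippet), i|
  puts "#{i + 1}: #{title} <#{link}>"   # prints "1: Apache Hadoop <https://hadoop.apache.org/>"
end

# A made-up error body has no "items", so the result is an empty array.
p extract_results('{"error":{"code":400}}')   # prints []
```

With this helper, the main loop can stop as soon as `extract_results` returns an empty array instead of crashing on `nil`.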