require 'nokogiri'
require 'rest-client'
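def print_title(url)
  res = RestClient.get(url)
  doc = Nokogiri::HTML(res.body)
  # Print the title attribute of every link inside a .title element
  doc.css('.title a').each do |link|
    puts link['title']
  end
  puts '-' * 100
rescue StandardError => e
  # Covers malformed URLs as well as network and HTTP errors
  puts "invalid url (#{e.message})"
end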
puts 'Please input url, and enter "e" to exit.'
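# gets returns nil at EOF and chomp! returns nil when nothing is stripped, so use chomp
while (line = gets) && (url = line.chomp) != 'e'
  print_title url
end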
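For example, when crawling https://ruby-china.org/topics?page=60, it feels like some data gets filtered out when the same page is visited in a browser....
The crawler seems to receive more data than the browser does, so once the page number gets large, items that the browser shows on page 59 turn up at the top of the crawled page.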
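One way to check the overlap rather than eyeball it is to collect the titles from two neighbouring pages and see which ones repeat. Below is a minimal sketch of that comparison only. `shared_titles` is a hypothetical helper name, and the sample arrays are made-up placeholders, not real scraped data; in practice they would be filled with the titles gathered for pages 59 and 60.

```ruby
# Hypothetical helper: titles that appear on both pages,
# in the order they appear on the later page.
def shared_titles(earlier_page, later_page)
  later_page & earlier_page
end

# Made-up sample data standing in for scraped titles
page_59 = ['Topic A', 'Topic B', 'Topic C']
page_60 = ['Topic C', 'Topic D', 'Topic E']

dupes = shared_titles(page_59, page_60)
puts "#{dupes.size} title(s) from page 59 also appear on page 60:"
dupes.each { |title| puts "  #{title}" }
```

If the duplicates always sit at the top of the later page, that would be consistent with the crawler and the browser seeing lists of different lengths, though it does not by itself show which side is being filtered.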