Newbie question: can anyone download the image at the URL below? I've tried several approaches and none of them worked

fayake · 2015-08-13 · last reply by w7938940 on 2015-08-13 · 2884 views

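http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg opens fine directly in the browser, but from my program the request fails with a 403. I thought this would be simple, but I've spent a whole day on it without getting it to work. If anyone has time to take a look, thanks a lot!!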

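require 'typhoeus'
require 'fileutils'

# Save to <Rails.root>/data/test.jpg (this snippet runs inside a Rails app)
path = "#{Rails.root}/data/"
FileUtils.mkdir_p(path)
request = Typhoeus::Request.new("http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg", followlocation: true)

# The block form closes the file even if the request raises
File.open(path + "test.jpg", 'wb') do |downloaded_file|
  # Stream the response body to disk chunk by chunk
  request.on_body do |chunk|
    downloaded_file.write(chunk)
  end
  request.run
end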

Re #1 @meeasyhappy Thanks! With this gem I can get the file data. I changed your saving code a bit, please take a look. Thanks a lot!

begin
  img_url = "http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg"
  response = Typhoeus.get(img_url)
  # Don't save a 403/404 error page as if it were the image
  raise "HTTP #{response.code}" unless response.success?
  # response.body is already a String (response_body ignores a block)
  img_file = response.body
  file_name = img_url.split('/').last
  # mkdir_p does nothing if the directory already exists
  FileUtils.mkdir_p("public/meizi/")
  File.open("public/meizi/" + file_name, "wb") { |f| f.write(img_file) }
  puts "/public/meizi/" + file_name
rescue => err
  puts err
  return ''
end
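For comparison, here is a stdlib-only sketch of the same saving step. It takes the file name from the URL's path via `URI`, so a query string such as `?v=2` does not end up in the file name, creates the directory, and writes the bytes in binary mode. The helper name `save_image` and its arguments are hypothetical; `body` is assumed to be the bytes your HTTP client already returned.

```ruby
require 'uri'
require 'fileutils'

# Hypothetical helper: write already-downloaded image bytes to dir/<name taken from URL>.
# Returns the path that was written.
def save_image(url, body, dir)
  # URI#path excludes the query string and fragment, unlike url.split('/').last
  file_name = File.basename(URI.parse(url).path)
  FileUtils.mkdir_p(dir)      # no-op when the directory already exists
  path = File.join(dir, file_name)
  File.binwrite(path, body)   # binary mode, so the JPEG bytes are written unchanged
  path
end
```

Usage would look like `save_image(img_url, response.body, "public/meizi")`.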

Re #1 @meeasyhappy Do you know why the following approaches return a 403 error?

RestClient.get 'http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg'
MiniMagick::Image.open("http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg", "jpg")

Re #3 @fayake A 403 usually means the server has hotlink protection on its images.
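Hotlink protection is commonly keyed on request headers that a browser fills in automatically, such as `Referer` and `User-Agent`. This thread does not establish which one this particular server checks, so treat the following as a sketch under that assumption: it builds (without sending) a browser-like GET with Ruby's standard `net/http`. The helper name `browser_like_get` and the header values are illustrative.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a GET request carrying the
# headers that hotlink or anti-crawler checks typically inspect.
def browser_like_get(url, user_agent:, referer: nil)
  req = Net::HTTP::Get.new(URI.parse(url))
  req['User-Agent'] = user_agent       # replaces net/http's default "Ruby"
  req['Referer'] = referer if referer  # only set when a referer is given
  req
end
```

You would then send it with something like `Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }`.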

Here is the request object built by RestClient.get 'http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg' (note that it contains no User-Agent):

@args=
 {:method=>:get,
  :url=>
   "http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg",
  :headers=>{}},
@block_response=nil,
@cookies={},
@headers={},
@max_redirects=10,
@method=:get,
@password=nil,
@payload=nil,
@processed_headers=
 {"Accept"=>"*/*; q=0.5, application/xml",
  "Accept-Encoding"=>"gzip, deflate"},
@raw_response=false,
@ssl_opts=
 {:verify_ssl=>1,
  :cert_store=>
   #<OpenSSL::X509::Store:0x007fe633f959b8
    @chain=nil,
    @error=nil,
    @error_string=nil,
    @time=nil,
    @verify_callback=nil>,
  :ciphers=>
   "!aNULL:!eNULL:!EXPORT:!SSLV2:!LOW:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-DSS-AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA:DHE-DSS-AES128-SHA256:DHE-DSS-AES256-SHA256:DHE-DSS-AES128-SHA:DHE-DSS-AES256-SHA:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:ECDHE-ECDSA-RC4-SHA:ECDHE-RSA-RC4-SHA:RC4-SHA:HIGH:+RC4:RC4-MD5"},
@tf=nil,
@url=
 "http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg",
@user=nil>

By contrast, the request built by

request = Typhoeus::Request.new("http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg", followlocation: true)

does include a User-Agent.
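One nuance worth checking on your own Ruby version: rest-client sends its requests through Ruby's `net/http`, and `net/http` fills in `User-Agent: Ruby` (and `Accept: */*`) when the caller sets none. So the server most likely saw `User-Agent: Ruby` rather than no header at all, and may be rejecting that value. This snippet shows the defaults without sending anything:

```ruby
require 'net/http'
require 'uri'

url = "http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg"

# Building a request object does not touch the network.
default_req = Net::HTTP::Get.new(URI.parse(url))
puts default_req['User-Agent']  # prints Ruby
puts default_req['Accept']      # prints */*

# A header passed explicitly replaces the fallback value.
custom_req = Net::HTTP::Get.new(URI.parse(url), 'User-Agent' => 'Typhoeus - https://github.com/typhoeus/typhoeus')
puts custom_req['User-Agent']
```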

So the fix is:

private_resource = RestClient::Resource.new 'http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg'
private_resource.get "User-Agent"=>"Typhoeus - https://github.com/typhoeus/typhoeus"

Of course, you can also set a different User-Agent string.
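The same fix without rest-client, as a sketch using only Ruby's standard `net/http`: it sends an explicit `User-Agent`, refuses to return anything unless the status is 2xx (so a 403 error page is never saved as a `.jpg`), and leaves writing the bytes to the caller. `fetch_with_user_agent` is a hypothetical name, and whether the site accepts a given User-Agent string is an assumption you would need to verify.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET `url` with an explicit User-Agent and return the body.
# Raises instead of returning an error page when the status is not 2xx.
def fetch_with_user_agent(url, user_agent)
  uri = URI.parse(url)
  req = Net::HTTP::Get.new(uri)
  req['User-Agent'] = user_agent # replaces net/http's default "Ruby"
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end
  raise "HTTP #{res.code} for #{url}" unless res.is_a?(Net::HTTPSuccess)
  res.body
end
```

For example: `File.binwrite("test.jpg", fetch_with_user_agent(img_url, "Mozilla/5.0"))`.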

Re #3 @fayake The site probably blocks crawlers; passing a user_agent header is enough:

RestClient.get 'http://img.sootuu.com/pic/gaojindu/shejibonan/zhongguang/yinliao/1/2006051831.jpg', user_agent: "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4"
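Passing `user_agent:` works because rest-client turns symbol header keys into header names by splitting on underscores and capitalizing each word (this describes rest-client's behaviour as I understand it; check the version you use). The hypothetical helper below re-implements that conversion for illustration only; it is not part of rest-client's API.

```ruby
# Hypothetical re-implementation of the symbol-to-header-name conversion.
def header_name(key)
  return key unless key.is_a?(Symbol) # string keys are passed through unchanged
  key.to_s.split('_').map(&:capitalize).join('-')
end

puts header_name(:user_agent)      # prints User-Agent
puts header_name(:accept_encoding) # prints Accept-Encoding
puts header_name('Referer')        # prints Referer
```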