Ruby: simulating a login to ruby-china works with mechanize, but fails with curl.

flowerwrong · 2014-08-19 · last reply by mok on 2015-01-23 · 11108 views

1. mechanize code

require "mechanize"

agent = Mechanize.new
login_page = agent.get "https://ruby-china.org/account/sign_in"
login_form = login_page.forms[1]

username_field = login_form.field_with(:name => "user[login]")
username_field.value = "[email protected]"
password_field = login_form.field_with(:name => "user[password]")
password_field.value = "xxx"

step2_page = agent.submit login_form
tmp_cookie = agent.cookie_jar

agent.cookie_jar = tmp_cookie
index_page = agent.get("https://ruby-china.org/")
re = /^flowerwrong/
puts index_page.body.to_s[0..5000]

How do I simulate the same login with curl? I'm using the curb library. Could someone take a look and tell me where the problem is? Thanks.

require 'curb'

USER_NAME = "[email protected]"
PASS = "xxx"

user = {"user\[login\]" => USER_NAME, "user\[password\]" => PASS}
login_url = "https://ruby-china.org/account/sign_in"
c = Curl::Easy.new
c.url = login_url
c.enable_cookies = true
c.cookiejar = "cookies.txt"
c.http_post(c.url, user)

puts c.header_str

HTTP header output

flowerwrong@flowerwrong:~/dev/ruby/mechanize$ ruby ruby_china.rb 
HTTP/1.1 200 OK
Server: nginx/1.6.0
Date: Tue, 19 Aug 2014 09:13:31 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Strict-Transport-Security: max-age=31536000
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
ETag: "51a9034a6f409ef3e8944e80113415bb"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: 4f6f03ef-b598-43dd-9fc2-49d388bb3d5c
X-Runtime: 0.039903

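mechanize submits based on the form on the page, and that form contains some hidden fields. Your curl login is missing the authenticity_token; you have to fetch it from the page's meta tag first before you can log in.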
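To make the point above concrete, here is a rough sketch of what a form-aware client does before submitting: it collects the hidden inputs from the page and merges the credentials into them. The markup in SAMPLE_HTML and the helper hidden_fields are made up for illustration (this is not the real ruby-china page), and the regex scan only stands in for a proper HTML parser such as Nokogiri.

```ruby
# Hypothetical markup standing in for a Rails sign-in form (not the real page).
SAMPLE_HTML = <<~HTML
  <form action="/account/sign_in" method="post">
    <input name="authenticity_token" type="hidden" value="sample-token" />
    <input name="user[login]" type="text" />
    <input name="user[password]" type="password" />
    <input name="user[remember_me]" type="hidden" value="0" />
  </form>
HTML

# Collect name => value for every <input type="hidden"> in the markup.
# Assumption: attributes are double-quoted; a real page needs a real parser.
def hidden_fields(html)
  html.scan(/<input\b[^>]*>/i).each_with_object({}) do |tag, fields|
    attrs = tag.scan(/([\w-]+)\s*=\s*"([^"]*)"/).to_h
    next unless attrs["type"] == "hidden" && attrs["name"]
    fields[attrs["name"]] = attrs["value"].to_s
  end
end

# Start from the hidden fields, then add what the user would type.
payload = hidden_fields(SAMPLE_HTML).merge(
  "user[login]"    => "xx",
  "user[password]" => "xxx"
)

puts payload
```

A hand-built POST that only sends user[login] and user[password] skips everything hidden_fields would have returned, which is the gap described in the reply above.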

#1 @saiga I changed it to GET the page first to obtain the token and then POST, but it still doesn't work.

require 'curb'
require 'nokogiri'

USER_NAME = "[email protected]"
PASS = "xxx"
login_url = "https://ruby-china.org/account/sign_in"

c = Curl::Easy.new
c.url = login_url
c.http_get

doc = Nokogiri::HTML(c.body_str)
csrf_token = doc.xpath("//meta[@name='csrf-token']/@content")[0].value.strip
c.url = login_url
c.enable_cookies = true
c.cookiejar = "cookies.txt"

user = {
  "user\[login\]" => USER_NAME,
  "user\[password\]" => PASS,
  "authenticity_token" => csrf_token,
  "user\[remember_me\]" => 0,
  "utf8" => "✓",
  "commit" => "登陆"
}
c.http_post(c.url,user)

puts csrf_token
puts "---" * 20
puts c.header_str

Output

flowerwrong@flowerwrong:~/dev/ruby/mechanize$ ruby ruby_china.rb 
2O85a3ZBPvdTyo3XQNvvvVlg57VLQ/+vhm3ktyIXXbE=
------------------------------------------------------------
HTTP/1.1 200 OK
Server: nginx/1.6.0
Date: Tue, 19 Aug 2014 10:09:45 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Strict-Transport-Security: max-age=31536000
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
ETag: "c889b4d41d8ddd454f2d022cdc8ef457"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: ffce7524-a903-4844-943e-0b54b872ccfc
X-Runtime: 0.042324

I tweaked it for you and it works now...

# -*- coding: utf-8 -*-
require 'curb'
require 'nokogiri'

USER_NAME = "xx"
PASS = "xxx"
login_url = "https://ruby-china.org/account/sign_in"

c = Curl::Easy.new
# Cookies must be enabled from the very start; the auth_token is recorded in the cookie
c.enable_cookies = true
c.cookiejar = "cookies.txt"
c.url = login_url
c.http_get

doc = Nokogiri::HTML(c.body_str)
csrf_token = doc.xpath("//meta[@name='csrf-token']/@content")[0].value.strip


user = {
  "user\[login\]" => USER_NAME,
  "user\[password\]" => PASS,
  "authenticity_token" => csrf_token,
  "user\[remember_me\]" => 0,
  "utf8" => "✓",
  "commit" => "登陆"
}

# See the API docs......
c.http_post c.url, user.map { |k, v| Curl::PostField.content k, v }

puts csrf_token
puts "---" * 20
puts c.header_str

#3 @saiga Thanks a lot, I learned a new trick too. While I'm at it, sharing how to install curb on Windows.

Adding a watir version

# -*- coding: utf-8 -*-
require 'watir-webdriver'
b = Watir::Browser.new

USER_NAME = "[email protected]"
PASS = "xxx"
login_url = "https://ruby-china.org/account/sign_in"



b.goto login_url
b.text_field(:name => 'user[login]').set USER_NAME
b.text_field(:name => 'user[password]').set PASS

b.button(:name => 'commit').click
puts b.text

curl -X POST 'https://ruby-china.org/account/sign_in.json' -d 'username=Tim_Lang&password=password'

#7 @Tim_Lang If the page is GBK-encoded, how do you deal with garbled characters using plain curl?
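For the GBK question, one common approach in Ruby is to tag the raw response bytes as GBK and transcode them to UTF-8. A minimal sketch follows; gbk_to_utf8 is a hypothetical helper name, and the sample bytes are produced locally instead of being fetched from a real page.

```ruby
# Convert a response body that arrived as GBK bytes into a UTF-8 string.
# Bytes that cannot be converted are replaced instead of raising.
def gbk_to_utf8(raw_body)
  raw_body.dup
          .force_encoding(Encoding::GBK)
          .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Simulate the untagged bytes a client would receive from a GBK page.
raw = "中文网页".encode(Encoding::GBK).force_encoding(Encoding::BINARY)

puts gbk_to_utf8(raw)
```

On the command line, the same conversion is usually done by piping curl's output through iconv -f GBK -t UTF-8.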

==========method=get==========
HTTP/1.1 200 OK
Date: Thu, 22 Jan 2015 09:55:51 GMT
Server: IBM_HTTP_Server
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: JSESSIONID=0000nnKUCci1wVHUkTPEdmCSNR7:-1; Path=/
Cache-Control: no-store, no-cache=set-cookie
Keep-Alive: timeout=10, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Content-Language: zh-CN
==========method=post==========
HTTP/1.1 500 Internal Server Error
Date: Thu, 22 Jan 2015 09:55:51 GMT
Server: IBM_HTTP_Server
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
$WSEP:
Content-Length: 13
Set-Cookie: JSESSIONID=0000ugRJQAmTQ9D1dC56Jswkoei:-1; Path=/
Cache-Control: no-store, no-cache=set-cookie
Connection: close
Content-Type: text/html;charset=UTF-8
Content-Language: zh-CN

The cookie I get from the GET and the cookie I get from the POST don't match, which causes the 500 error. How can I keep them consistent?

If I use the following directly:

login_url = "*********"
c = Curl::Easy.new
c.url = login_url
c.http_get
puts c.header_str

I get this response header:

HTTP/1.1 200 OK
Date: Thu, 22 Jan 2015 10:03:14 GMT
Server: IBM_HTTP_Server
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: JSESSIONID=0000OiEy8TiTGJ0t_ikYEP4ExC6:-1; Path=/
Cache-Control: no-store, no-cache=set-cookie
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Content-Language: zh-CN

Clearly, compared with calling Curl.get(login_url) directly, these two headers are missing:

Keep-Alive: timeout=10, max=100
Connection: Keep-Alive

Could someone help explain this? Thanks!!
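The GET and POST dumps show two different JSESSIONID values, which suggests the POST did not send back the cookie that the GET received. With curb, reusing one Curl::Easy handle with enable_cookies set before the first request (as in the fixed script earlier in the thread) is meant to take care of this. The sketch below only illustrates the underlying idea: turning the Set-Cookie values of one response into the Cookie header of the next request. cookie_header is a hypothetical helper, and it deliberately ignores attributes such as Path and expiry.

```ruby
# Build a Cookie request header from the Set-Cookie values of a previous
# response. Only the leading name=value part of each value is kept.
def cookie_header(set_cookie_values)
  set_cookie_values
    .map { |value| value.split(";", 2).first.strip }
    .join("; ")
end

# Value copied from the GET response header shown in the thread.
from_get = ["JSESSIONID=0000nnKUCci1wVHUkTPEdmCSNR7:-1; Path=/"]

# This is what the following POST would need to send as its Cookie header.
puts cookie_header(from_get)
```

With Net::HTTP, an array like from_get can be read from a response via response.get_fields("Set-Cookie").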

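The problem above is solved. Next question: the page I'm requesting has a captcha. I download it locally with

uri = URI('http://xxx/image.jsp')
open("logo.jpg","wb"){|f|f.write(Net::HTTP.get(uri))}

and use RTesseract to recognize the string in the image. But when I submit, it keeps saying the captcha is wrong. I'm sure the recognition itself is fine. Is this a cookie problem? How do I solve it?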
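One likely cause, offered as a guess since the thread does not say how it was resolved: Net::HTTP.get(uri) opens a fresh request without any cookie, so the server may generate the captcha for a different session than the one used to submit the form. A minimal sketch of fetching the image with the session cookie attached is below; the URL, the cookie value, and both helper names are placeholders.

```ruby
require "net/http"
require "uri"

# Build a GET request for the captcha image that carries the session cookie
# obtained when the login page was loaded.
def captcha_request(image_uri, session_cookie)
  request = Net::HTTP::Get.new(image_uri)
  request["Cookie"] = session_cookie
  request
end

# Send the request over an open connection and save the image bytes.
# Not called in this sketch because it needs network access.
def save_captcha(http, request, path)
  response = http.request(request)
  File.binwrite(path, response.body)
end

uri = URI("http://example.invalid/image.jsp")        # placeholder URL
request = captcha_request(uri, "JSESSIONID=placeholder")

puts request.path
puts request["Cookie"]
```

The form submission that follows would then need to carry the same Cookie value, so that the recognized text is checked against the image that was actually downloaded.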

Already solved, thanks.

The form data looks like this:

origins:CTU
dests:HKG
departs:2015-01-30
origins:HKG
dests:BKK
departs:2015-01-31
origins:BKK
dests:HKG
departs:2015-02-01
origins:HKG
dests:CTU
departs:2015-02-02
origins:
dests:
departs:

I build the form content like this:

search_form_data = {
  'origins' => 'CTU', 'dests' => 'HKG', 'departs' => '2015-01-30',
  'origins' => 'HKG', 'dests' => 'BKK', 'departs' => '2015-01-31',
  'origins' => 'BKK', 'dests' => 'HKG', 'departs' => '2015-02-01',
  'origins' => 'HKG', 'dests' => 'CTU', 'departs' => '2015-02-02'
}
puts search_form_data # => {"origins"=>"HKG", "dests"=>"CTU", "departs"=>"2015-02-02"}

How do I submit the form in this case? @saiga
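A Ruby Hash keeps only one value per key, which is why only the last leg survives in the printed result. One way around it, sketched here with the standard library, is to keep the fields as an ordered list of name/value pairs and encode that list; URI.encode_www_form accepts such a list. The legs array and the variable names are illustrative, and the trailing empty origins/dests/departs triple from the captured form data is left out.

```ruby
require "uri"

# Each leg is [origin, destination, departure date], in submission order.
legs = [
  %w[CTU HKG 2015-01-30],
  %w[HKG BKK 2015-01-31],
  %w[BKK HKG 2015-02-01],
  %w[HKG CTU 2015-02-02]
]

# An array of pairs can repeat a field name, unlike a Hash.
pairs = legs.flat_map do |origin, dest, depart|
  [["origins", origin], ["dests", dest], ["departs", depart]]
end

# Encode the pairs as an application/x-www-form-urlencoded body.
body = URI.encode_www_form(pairs)
puts body
```

With curb, the same list could presumably be turned into fields the way the fixed script earlier in the thread does it, for example pairs.map { |k, v| Curl::PostField.content k, v }.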
