最近读了《理解 Unix 进程》,感觉挺不错。前两天又把书里的 Spyglass 源码读了下,有所收获,就顺便想动手写写笔记,记录下读代码的感受,加深下理解,顺便推荐下这本书。 Spyglass 是一个非常简单的 preforking 的 Web 服务器,只适用于研究和学习 Unix 进程的工作方式。 进入代码部分,作者有非常明晰的注释,所以阅读起来非常舒畅。大部分的理解(中文部分)在作者的代码都有注释,所以谈不上分析,只是又根据代码理解,并且梳理了本书所学的理解。
bin/spyglass.rb
#!/usr/bin/env ruby # load和require THIS_FILE = File.symlink?(__FILE__) ? File.readlink(__FILE__) : __FILE__ $LOAD_PATH << File.dirname(THIS_FILE) + '/../lib' require 'rubygems' require 'spyglass' require 'optparse' # 命令行解析,社区里有人推荐的gli也不错。 opts = OptionParser.new do |opts| opts.banner = "Usage: spyglass [options]"
opts.separator "" opts.separator "Ruby options:"
lineno = 1 opts.on("-e", "--eval LINE", "evaluate a LINE of code") { |line| eval line, TOPLEVEL_BINDING, "-e", lineno lineno += 1 }
opts.on("-d", "--debug", "set debugging flags (set $DEBUG to true)") { $DEBUG = true }
opts.on("-w", "--warn", "turn warnings on for your script") { $-w = true }
opts.on("-I", "--include PATH", "specify $LOAD_PATH (may be used more than once)") { |path| $LOAD_PATH.unshift(path.split(":")) }
opts.on("-r", "--require LIBRARY", "require the library, before executing your script") { |library| require library }
opts.separator "" opts.separator "Spyglass options:"
opts.on("-p", "--port PORT", "use PORT (default: 4222)") { |port| Spyglass::Config.port port }
opts.on("-o", "--host HOST", "list on HOST (default: 0.0.0.0)") { |host| Spyglass::Config.host host }
opts.on("-c", "--configru FILE", "Load the rackup file at FILE (default: config.ru in current directory)") { |path| Spyglass::Config.config_ru_path path }
opts.on("-w", "--workers COUNT", "Prefork COUNT workers when handling requests (default: 2)") { |count| Spyglass::Config.workers count.to_i }
opts.on("-t", "--timeout SEC", "Time out the master process after SEC seconds (default: 30)") { |sec| Spyglass::Config.timeout sec.to_i }
opts.on("-v", "--verbose", "Enable verbose output") { |verbose| Spyglass::Config.verbose true }
opts.on("--vverbose", "Enable very verbose output") { |vverbose| Spyglass::Config.vverbose true }
opts.on_tail("-h", "--help", "Show this message") do puts opts abort end
# Another typical switch to print the version. opts.on_tail("--version", "Show version") do puts Spyglass::Version exit end end
opts.parse!(ARGV) # 直接调用 Spyglass::Server.instance 单例的 start 方法 Spyglass::Server.instance.start
>lib/spyglass/server.rb
```ruby
#---
# Excerpted from "Working with Unix Processes",
# published by The Pragmatic Bookshelf.
# Copyrights apply to this code. It may not be used to create training material,
# courses, books, articles, and the like. Contact us if you are in doubt.
# We make no guarantees that this code is fit for any purpose.
# Visit http://www.pragmaticprogrammer.com/titles/jsunix for more book information.
#---
module Spyglass
class Server
include Singleton
include Logging
def start
# Opens the main listening socket for the server. Now the server is responsive to
# incoming connections.
sock = TCPServer.open(Config.host, Config.port)
out "Listening on port #{Config.host}:#{Config.port}"
# 调用Lookout单例的start方法
Lookout.instance.start(sock)
end
end
end
lib/spyglass/lookout.rb
#--- # Excerpted from "Working with Unix Processes", # published by The Pragmatic Bookshelf. # Copyrights apply to this code. It may not be used to create training material, # courses, books, articles, and the like. Contact us if you are in doubt. # We make no guarantees that this code is fit for any purpose. # Visit http://www.pragmaticprogrammer.com/titles/jsunix for more book information. #--- module Spyglass class Lookout include Singleton, Logging
# This method is the main entry point for the Lookout class. It takes # a socket object. def start(socket) # 定义捕获信号 trap_signals
# The Lookout doesn't know anything about the app itself, so there's
# no app related setup to do here.
# 不间断的接受 socket connection
loop do
# Accepts a new connection on our socket. This class won't actually
# do anything interesting with this connection, it will pass it down
# to the Master
class created below to do the actual request handling.
conn = socket.accept
out "Received incoming connection"
# In this block the Lookout forks a new process and invokes a Master, # passing along the socket it received and the connection it accepted # above. # fork 出@master_pid,并且在 Master 进程中调用 start 方法 @master_pid = fork do master = Master.new(conn, socket) master.start end
# The Lookout can now close its handle on the client socket. This doesn't # translate to the socket being closed on the clients end because the # forked Master process also has a handle on the same socket. Since this # handle is now cleaned up it's up to the Master process to ensure that # its handle gets cleaned up. # Master 子进程中有该 socket handle,所以 connection 不会关闭,当该 socket 所有子进程 handle 调用关闭后,connection 才会关闭。 conn.close # Now this process blocks until the Master process exits. The Master process # will only exit once traffic is slow enough that it has reached its timeout # without receiving any new connections. # 阻塞式等待 Process.waitpid(@master_pid)
# The interaction of fork(2)/waitpid(2) above deserve some explanation.
# ### Why fork(2)? Why not just spin up the Master? # The whole point of the Lookout process is to be very lean. The only resource # that it initializes is the listening socket for the server. It doesn't load # any of your application into memory, so its resource footprint is very small.
# The reason that it does a fork(2) before invoking the Master is because once # the Master times out we want the Lookout process to remain lean when accepting # the next connection.
# If it were to load the application code without forking # then there would be no (simple) way for it to later unload the application code.
# By doing a fork(2), then waiting for the Master process to exit, that guarantees # that all resources (notably memory usage) that were in use by the Master process # will be reclaimed by the kernel.
# ### Who knows what your app will demand! # While handling requests your app may require lots of memory. Containing this in a # child process, and exiting that process, is the easiest way to ensure that memory # bloat isn't shared with our simple parent process.
# This allows our Lookout process will to go back around # the loop with nothing more than it started with, just a listening socket.
# The fork(2)/waitpid(2) approach requires little code to implement, and pushes # responsibility down to the kernel to track resource usage and nicely clean up # the Master process when it's finished. end end # 当有中断或者退出信号则 kill @aster_pid def trap_signals [:INT, :QUIT].each do |sig| trap(sig) { begin Process.kill(sig, @master_pid) if @master_pid rescue Errno::ESRCH end exit } end end end end
作者在这里对Lookout主进程中fork出Master进程做了几点解释:
1,因为Lookout很简单,只是有一个socket连接用于监听请求,所以forked Master进程共享的内存也很少。
2,另外使用fork原因是,当Master进程time out或者其他原因关闭时,Lookout接受到新请求,可以再次fork出新的Master。
3,如果不实用fork,在通过time out的方式退出Master进程的时候不能利用Unix系统来管理释放application code。
4,因为所有资源调用fork,都是直接调用系统的fork,所以可以确保Master进程退出后,内存可以很好的回收。
5,确保看顾进程可以使用非常少的内存。Master进程处理实际的请求,会消耗比较多内存,退出后系统可以对Master的内存进行回收。
6,比较简单的使Lookout进程作为守护进程。
回忆下守护进程的创建方式,第一次```exit if fork```,退出当前进程,然将fork出的子进程```Process.setsid```,使子进程变成新的进程组和会话组并且脱离终端控制。然后再次使用```exit if fork```,使再次fork出的进程不是进程组组长,也不是会话领导,同时没有控制终端,变成守护进程。1.9以后直接使用```Process.daemon```即可。
>lib/spyglass/master.rb
```ruby
#---
# Excerpted from "Working with Unix Processes",
# published by The Pragmatic Bookshelf.
# Copyrights apply to this code. It may not be used to create training material,
# courses, books, articles, and the like. Contact us if you are in doubt.
# We make no guarantees that this code is fit for any purpose.
# Visit http://www.pragmaticprogrammer.com/titles/jsunix for more book information.
#---
module Spyglass
class Master
include Logging
def initialize(connection, socket)
@connection, @socket = connection, socket
@worker_pids = []
# The Master shares this pipe with each of its worker processes. It
# passes the writable end down to each spawned worker while it listens
# on the readable end. Each worker will write to the pipe each time
# it accepts a new connection. If The Master doesn't get anything on
# the pipe before `Config.timeout` elapses then it kills its workers
# and exits.
# 通过IO.pipe生成一对关联的IO管道。worker子进程共享write管道
@readable_pipe, @writable_pipe = IO.pipe
end
# This method starts the Master. It enters an infinite loop where it creates
# processes to handle web requests and ensures that they stay active. It takes
# a connection as an argument from the Lookout instance. A Master will only
# be started when a connection is received by the Lookout.
def start
trap_signals
load_app
out "Loaded the app"
# The first worker we spawn has to handle the connection that was already
# passed to us.
# fork worker子进程来处理socket
spawn_worker(@connection)
# The Master can now close its handle on the client socket since the
# forked worker also got a handle on the same socket. Since this one
# is now closed it's up to the Worker process to close its handle when
# it's done. At that point the client connection will perceive that
# it's been closed on their end.
@connection.close
# fork其他workers
# We spawn the rest of the workers.
(Config.workers - 1).times { spawn_worker }
out "Spawned #{Config.workers} workers. Babysitting now..."
loop do
if timed_out?(IO.select([@readable_pipe], nil, nil, Config.timeout))
out "Timed out after #{Config.timeout} s. Exiting."
kill_workers(:QUIT)
exit
else
# Clear the data on the pipe so it doesn't appear to be readable
# next time around the loop.
@readable_pipe.read_nonblock 1
end
end
end
def timed_out?(select_result)
!select_result
end
def spawn_worker(connection = nil)
@worker_pids << fork { Worker.new(@socket, @app, @writable_pipe, connection).start }
end
def trap_signals
# The QUIT signal triggers a graceful shutdown. The master shuts down
# immediately and lets each worker finish the request they are currently
# processing.
# Master进程退出时,将worker子进程退出。
trap(:QUIT) do
verbose "Received QUIT"
kill_workers(:QUIT)
exit
end
# worker子进程退出时
trap(:CHLD) do
# 阻塞式等待退出子进程
dead_worker = Process.wait
# 从列表中移除
@worker_pids.delete(dead_worker)
# 非阻塞的遍历等待其他worker_pid
@worker_pids.each do |wpid|
begin
dead_worker = Process.waitpid(wpid, Process::WNOHANG)
@worker_pids.delete(dead_worker)
rescue Errno::ECHILD
end
end
spawn_worker
end
end
def kill_workers(sig)
@worker_pids.each do |wpid|
Process.kill(sig, wpid)
end
end
def load_app
@app, options = Rack::Builder.parse_file(Config.config_ru_path)
end
end
end
lib/spyglass/worker.rb
#--- # Excerpted from "Working with Unix Processes", # published by The Pragmatic Bookshelf. # Copyrights apply to this code. It may not be used to create training material, # courses, books, articles, and the like. Contact us if you are in doubt. # We make no guarantees that this code is fit for any purpose. # Visit http://www.pragmaticprogrammer.com/titles/jsunix for more book information. #--- require 'time' require 'rack/utils'
# module Spyglass class Worker include Logging
def initialize(socket, app, writable_pipe, connection = nil) @socket, @app, @writable_pipe = socket, app, writable_pipe @parser = Spyglass::HttpParser.new
handle_connection(connection) if connection end
def start trap_signals
loop do handle_connection @socket.accept end end
def handle_connection(conn)
verbose "Received connection"
# This notifies our Master that we have received a connection, expiring
# it's IO.select
and preventing it from timing out.
@writable_pipe.write_nonblock('.')
# This clears any state that the http parser has lying around # from the last connection that was handled. @parser.reset
# The Rack spec requires that 'rack.input' be encoded as ASCII-8BIT. empty_body = '' empty_body.encode!(Encoding::ASCII_8BIT) if empty_body.respond_to?(:encode!)
# The Rack spec requires that the env contain certain keys before being # passed to the app. These are the keys that aren't provided by each # incoming request, server-specific stuff. env = { 'rack.input' => StringIO.new(empty_body), 'rack.multithread' => false, 'rack.multiprocess' => true, 'rack.run_once' => false, 'rack.errors' => STDERR, 'rack.version' => [1, 0] }
# This reads data in from the client connection. We'll read up to # 10000 bytes at the moment. data = conn.readpartial(10000) # Here we pass the data and the env into the http parser. It parses # the raw http request data and updates the env with all of the data # it can withdraw. @parser.execute(env, data, 0)
# Call the Rack app, goes all the way down the rabbit hole and back again. status, headers, body = @app.call(env)
# These are the default headers we always include in a response. We # only speak HTTP 1.1 and we always close the client connection. At # the monment keepalive is not supported. head = "HTTP/1.1 #{status}\r\n" \ "Date: #{Time.now.httpdate}\r\n" \ "Status: #{Rack::Utils::HTTP_STATUS_CODES[status]}\r\n" \ "Connection: close\r\n"
headers.each do |k,v| head << "#{k}: #{v}\r\n" end conn.write "#{head}\r\n"
body.each { |chunk| conn.write chunk } body.close if body.respond_to?(:close) # Since keepalive is not supported we can close the client connection # immediately after writing the body. conn.close
verbose "Closed connection" end
def trap_signals trap(:QUIT) do out "Received QUIT" exit end end end end