Newbie question: How do I parse a file in a specific format, such as Kindle's My Clippings.txt?

toctan · September 1, 2013 · Last reply by WolfLee on September 2, 2013 · 2707 views
Strange Stones: Dispatches from East and West (P.S.) (Hessler, Peter)
- Your Highlight Location 73-73 | Added on Friday, August 16, 2013 2:55:42 AM

life is more interesting if you can step outside of your own world every once in a while.
==========
Strange Stones: Dispatches from East and West (P.S.) (Hessler, Peter)
- Your Highlight Location 115-116 | Added on Friday, August 16, 2013 3:05:40 AM

The joy of nonfiction is searching for balance between storytelling and reporting, finding a way to be both loquacious and observant.
==========
Strange Stones: Dispatches from East and West (P.S.) (Hessler, Peter)
- Your Highlight Location 1306-1306 | Added on Saturday, August 17, 2013 1:50:31 PM

“Russian women get fat because they don’t care,”

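The above is an excerpt from My Clippings.txt. Highlights are separated by ==========, and each one includes the book title, author, location, time, and content. I want to parse this file into an Array of Hashes with the following structure: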

{
  book: 'Strange Stones: Dispatches from East and West',
  author: 'Hessler, Peter',
  location: '1306-1306',
  time: DateTime.parse('Saturday, August 17, 2013 1:50:31 PM'),
  content: 'Russian women get fat because they don’t care.'
}

Could anyone suggest an approach? Thanks.

String#split('==========')
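A minimal sketch of the split suggestion, assuming each record has the title line first, the metadata line second, and the highlighted text last, as in the excerpt. The helper name `split_clippings` is hypothetical, not something from the thread.

```ruby
# Hypothetical helper: split the raw file text into records,
# each record being an array of its non-empty, stripped lines.
def split_clippings(text)
  text.split('==========')
      .map { |chunk| chunk.lines.map(&:strip).reject(&:empty?) }
      .reject(&:empty?) # drop the empty chunk after the final separator
end

sample = <<~TEXT
  Strange Stones: Dispatches from East and West (P.S.) (Hessler, Peter)
  - Your Highlight Location 73-73 | Added on Friday, August 16, 2013 2:55:42 AM

  life is more interesting if you can step outside of your own world every once in a while.
  ==========
TEXT

records = split_clippings(sample)
title, meta, content = records.first
puts title
puts meta
puts content
```

Each record can then be handed to whatever extracts the individual fields, whether that is one regular expression or a few smaller ones.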

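require 'date'

# One clipping: title line with optional "(P.S.)" and the author in the last
# parentheses, then the metadata line, then the highlighted text.
RULE = /(?<book>.*?)(?: \(P\.S\.\))? \((?<author>[^()]*)\)\r?\n.*Location (?<location>\d+-\d+) \|.*?Added on (?<time>.*?)\r?\n\s*(?<content>.*)/

def parse(clippings)
  clippings.split('==========').filter_map do |c|
    m = c.strip.match(RULE)
    next unless m # skip chunks that do not match, e.g. the empty one after the final separator
    { book: m[:book], author: m[:author], location: m[:location],
      time: DateTime.parse(m[:time]), content: m[:content].strip }
  end
end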
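An alternative sketch that avoids one large pattern by matching the title line and the metadata line separately. The names `TITLE_RE`, `META_RE`, and `parse_record` are hypothetical. It assumes the author is the last parenthesized group on the title line and that a trailing "(P.S.)" should be dropped from the title, as in the desired output in the question.

```ruby
require 'date'

# Assumed layouts: "Title (P.S.) (Author)" and
# "- Your Highlight Location A-B | Added on <timestamp>".
TITLE_RE = /\A(?<book>.*?)(?: \(P\.S\.\))? \((?<author>[^()]*)\)\z/
META_RE  = /Location (?<location>\d+-\d+) \| Added on (?<time>.+)\z/

# Hypothetical helper: turn one chunk (the text between separators) into a
# Hash, or nil when the chunk is empty or does not look like a highlight.
def parse_record(chunk)
  title, meta, *body = chunk.lines.map(&:strip).reject(&:empty?)
  t = TITLE_RE.match(title.to_s)
  m = META_RE.match(meta.to_s)
  return nil unless t && m

  { book: t[:book], author: t[:author], location: m[:location],
    time: DateTime.parse(m[:time]), content: body.join("\n") }
end

chunk = <<~TEXT
  Strange Stones: Dispatches from East and West (P.S.) (Hessler, Peter)
  - Your Highlight Location 73-73 | Added on Friday, August 16, 2013 2:55:42 AM

  life is more interesting if you can step outside of your own world every once in a while.
TEXT

record = parse_record(chunk)
puts record[:book]     # prints "Strange Stones: Dispatches from East and West"
puts record[:location] # prints "73-73"
```

Smaller patterns are easier to adjust one at a time if a title line or metadata line turns out to be formatted differently from the excerpt.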