Newbie question: how do I parse a file in a specific format, such as Kindle's My Clippings.txt?

toctan · September 01, 2013 · Last by WolfLee replied at September 02, 2013 · 2716 hits

Strange Stones: Dispatches from East and West (P.S.) (Hessler, Peter)
- Your Highlight Location 73-73 | Added on Friday, August 16, 2013 2:55:42 AM

life is more interesting if you can step outside of your own world every once in a while.
==========
Strange Stones: Dispatches from East and West (P.S.) (Hessler, Peter)
- Your Highlight Location 115-116 | Added on Friday, August 16, 2013 3:05:40 AM

The joy of nonfiction is searching for balance between storytelling and reporting, finding a way to be both loquacious and observant.
==========
Strange Stones: Dispatches from East and West (P.S.) (Hessler, Peter)
- Your Highlight Location 1306-1306 | Added on Saturday, August 17, 2013 1:50:31 PM

“Russian women get fat because they don’t care,”

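The above is an excerpt from My Clippings.txt. Highlights are separated by ========== and each one carries the book, author, location, time, and content. I want to parse this file into an Array of Hashes with the following structure: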

{
  book: 'Strange Stones: Dispatches from East and West',
  author: 'Hessler, Peter',
  location: '1306-1306',
  time: DateTime.parse('Saturday, August 17, 2013 1:50:31 PM'),
  content: 'Russian women get fat because they don’t care.'
}

Could anyone suggest an approach? Thanks.

kikyous #0 September 01, 2013

String#split('==========')
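A minimal sketch of that suggestion, assuming the whole file has already been read into a string. The two-entry sample and the `records` variable are made up for illustration; the only real point is that the chunk after the last separator is blank and needs to be dropped.

```ruby
# Hypothetical two-entry sample in the My Clippings.txt layout.
text = <<~CLIPPINGS
  Book One (Author, A)
  - Your Highlight Location 1-2 | Added on Friday, August 16, 2013 2:55:42 AM

  first highlight
  ==========
  Book Two (Author, B)
  - Your Highlight Location 3-4 | Added on Friday, August 16, 2013 3:05:40 AM

  second highlight
  ==========
CLIPPINGS

# Split on the separator, trim surrounding whitespace, and discard the
# blank chunk that follows the final separator.
records = text.split('==========').map(&:strip).reject(&:empty?)

# Print the title line of each record.
records.each { |record| puts record.lines.first }
```

Each element of `records` is then one highlight as a multi-line string, ready to be broken into fields.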

WolfLee #1 September 02, 2013

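require 'date'

def parse(clippings)
  # Title line, metadata line, then the first line of content. The " (P.S.)" suffix is optional and left out of the title.
  rule = /(?<book>.*?)(?: \(P\.S\.\))? \((?<author>[^()]*)\)\r?\n.*Location (?<location>\d+-\d+) \|.*Added on (?<date>.*?)\r?\n\s*(?<content>.*)/
  clippings.split('==========').map do |c|
    m = c.match(rule)
    next unless m # skip chunks that do not match, e.g. the blank one after the last separator
    { book: m[:book], author: m[:author], location: m[:location], time: DateTime.parse(m[:date]), content: m[:content].strip }
  end.compact
end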
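As an alternative to one large regular expression, each record can be handled line by line. This is only a sketch, not WolfLee's method: the names `parse_clipping` and `parse_clippings` are hypothetical, it assumes the title line ends with the author in parentheses, and it treats everything after the metadata line as content so that multi-line highlights are kept.

```ruby
require 'date'

# Hypothetical helper: parse one record (title line, metadata line, content).
# Returns nil when the chunk does not look like a highlight.
def parse_clipping(chunk)
  lines = chunk.strip.lines.map(&:strip)
  return nil if lines.size < 2

  title, meta, *body = lines
  # Assumption: the author is the last parenthesised group on the title line.
  title_match = title.match(/\A(?<book>.*)\s\((?<author>[^()]*)\)\z/)
  meta_match  = meta.match(/Location (?<location>\d+(?:-\d+)?) \| Added on (?<time>.+)\z/)
  return nil unless title_match && meta_match

  {
    book: title_match[:book].sub(/\s*\(P\.S\.\)\z/, ''), # drop the edition suffix
    author: title_match[:author],
    location: meta_match[:location],
    time: DateTime.parse(meta_match[:time]),
    content: body.reject(&:empty?).join("\n")
  }
end

# Split the file contents on the separator and keep only parsable records.
def parse_clippings(text)
  text.split('==========').map { |chunk| parse_clipping(chunk) }.compact
end
```

Calling `parse_clippings(File.read('My Clippings.txt'))` would then return the Array of Hashes asked for above. Records the patterns do not recognise, such as bookmarks without a highlight, are silently skipped.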
