[Share] Simple recognition of 12306 CAPTCHAs

debbbbie · December 22, 2013 · Last by blackanger replied at December 22, 2013 · 9065 hits

The image CAPTCHA on 12306 is very simple, and a short program is enough to recognize it.

1-1

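Applying Photoshop's posterize (色调分离) technique gives an image that is much easier to read. Posterizing is easy to implement: with a parameter of 3, for example, each of the red, green, and blue channels is reduced to 3 values, so the image ends up with at most 3 x 3 x 3 = 27 distinct colors.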
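To make the arithmetic concrete, here is a small sketch of what "3 values per channel" means. It is a simplification: it assumes the full 0..255 range for every channel rather than the image's own minimum and maximum, and the helper names `posterize_levels` and `quantize` are made up for this illustration.

```python
def posterize_levels(count):
    """Return the output levels one channel can take after posterizing."""
    return [int(i * 255.0 / (count - 1)) for i in range(count)]

def quantize(value, count=3):
    """Map an 8-bit value (0..255) to one of `count` evenly spaced levels."""
    # Which of the `count` equal-width bands the value falls into.
    band = min(value * count // 256, count - 1)
    return int(band * 255.0 / (count - 1))

levels = posterize_levels(3)
print(levels)            # [0, 127, 255]
print(len(levels) ** 3)  # 27 possible RGB combinations
print(quantize(100))     # 127
```

Three levels on each of three channels is why the palette tops out at 27 colors; most CAPTCHA images will use far fewer of them.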

1-2

Then add one simple cleanup step: remove every point whose neighbors directly above and below are both uncolored.

1-3
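The cleanup step could look roughly like the sketch below. It is not the code used for the images in this post: it assumes the picture has already been reduced to a rectangular grid of 1 (ink) and 0 (background), and the name `remove_isolated` is invented for the example.

```python
def remove_isolated(grid):
    """Clear every ink pixel whose neighbours directly above and below are both empty.

    grid is a list of equal-length rows, 1 = ink, 0 = background.
    A new grid is returned; the input is left unchanged.
    """
    h = len(grid)
    out = [row[:] for row in grid]
    for y in range(h):
        for x in range(len(grid[y])):
            if not grid[y][x]:
                continue
            # Pixels outside the image count as background.
            above = grid[y - 1][x] if y > 0 else 0
            below = grid[y + 1][x] if y < h - 1 else 0
            if not above and not below:
                out[y][x] = 0
    return out

noisy = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],  # the 1 in the last column has nothing above or below it
    [0, 1, 0, 0],
]
print(remove_isolated(noisy))
# [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
```

Reading from the original grid and writing to a copy keeps the result independent of scan order, so removing one pixel never causes its neighbour to be removed in the same pass.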

Hand this image to Tesseract-OCR and, believe it or not, you get a 10% recognition rate.
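For anyone who wants to try the OCR step, one way to call Tesseract from Python is through its command-line tool. This is only a sketch under a few assumptions: the `tesseract` binary (version 4 or later, which spells the flag `--psm`) is on the PATH, the cleaned image has been saved to a file, and the helper names `tesseract_cmd` and `ocr` are invented here.

```python
import subprocess

def tesseract_cmd(image_path, psm=7):
    """Build the command line that OCRs one image and prints the text."""
    # "stdout" as the output base makes tesseract print instead of writing a file.
    # --psm 7 asks it to treat the image as a single line of text.
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]

def ocr(image_path):
    """Run tesseract on image_path and return the recognized text."""
    result = subprocess.run(
        tesseract_cmd(image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling `ocr("captcha.png")` (a hypothetical file name) would then return whatever Tesseract reads from the image.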

A 10% success rate is already good enough. Combine it with proxy servers and the useragents gem...

Attached: an implementation of posterization (Python version)

def color_sep(img, count=3):
    """Posterize img in place, reducing each RGB channel to `count` levels.

    img is expected to be a PIL image in RGB mode.
    """
    # Start from mid-gray so the per-channel range can never be zero.
    lo, hi = [127, 127, 127], [128, 128, 128]
    w, h = img.size

    pix = img.load()
    # First pass: find the minimum and maximum of each channel.
    for x in range(w):
        for y in range(h):
            for c in range(3):
                v = pix[x, y][c]
                if v > hi[c]: hi[c] = v
                if v < lo[c]: lo[c] = v

    # Width of one input band per channel, and spacing of the output levels.
    deep_color   = [(hi[c] - lo[c]) * 1.0 / count for c in range(3)]
    target_color = 255.0 / (count - 1)

    # Second pass: map each value to its band. The channel maximum would land
    # in band `count`, so clamp to `count - 1` to keep the result within 0..255.
    for x in range(w):
        for y in range(h):
            pix[x, y] = tuple(
                int(min(int((pix[x, y][c] - lo[c]) / deep_color[c]), count - 1) * target_color)
                for c in range(3))