• 有种感觉:ruby+postgresql 就是魔法 + 魔法。打算学习一波。

  • 试了。瓶颈在磁盘的情况下,用并行 copy 或者 直接 copy 时间差别不大。

    at 30s, row rate 666546.92/sec (period), row rate 666546.92/sec (overall), 2.000000E+07 total rows
    at 1m0s, row rate 653333.72/sec (period), row rate 659940.91/sec (overall), 3.960000E+07 total rows
    at 1m30s, row rate 530000.43/sec (period), row rate 616630.03/sec (overall), 5.550000E+07 total rows
    at 2m0s, row rate 546666.11/sec (period), row rate 599139.82/sec (overall), 7.190000E+07 total rows
    at 2m30s, row rate 546659.98/sec (period), row rate 588644.12/sec (overall), 8.830000E+07 total rows
    at 3m0s, row rate 530004.70/sec (period), row rate 578871.27/sec (overall), 1.042000E+08 total rows
    at 3m30s, row rate 543335.38/sec (period), row rate 573794.86/sec (overall), 1.205000E+08 total rows
    at 4m0s, row rate 523331.66/sec (period), row rate 567487.09/sec (overall), 1.362000E+08 total rows
    

    4 分钟插入了一亿多条数据,13 亿差不多也得 40 分钟左右。

    但是,关闭日志(建表时 unlogged), 速度提升很大:

    at 30s, row rate 1826650.79/sec (period), row rate 1826650.79/sec (overall), 5.480000E+07 total rows
    at 1m0s, row rate 2005766.79/sec (period), row rate 1916228.49/sec (overall), 1.150000E+08 total rows
    at 1m30s, row rate 1990897.15/sec (period), row rate 1941106.77/sec (overall), 1.747000E+08 total rows
    

    一亿多条数据,只需要 1 分钟左右。

    综合看,如果用 HDD,瓶颈大概率在硬盘这里,关闭日志可以提升写性能。原推可能用了比较慢的硬盘。 回头有空,找一台 SSD 的物理服务器测试下。

  • 没注意到。待我一试。

  • 测试环境的物理机,HDD。

    同样配置,我又用 postgresql 试了一下。

    • create table 没加 unlogged
    • cerate index 的 workers 设置到 32 个。

    导入数据用 copy 从 csv 复制,忘记开 \timing,不过大概也是 30-40 分钟。 加索引耗时大概 13 分钟。

    postgres=# CREATE INDEX idx ON test (ip);
    
    Time: 781496.452 ms (13:01.496)
    
  • 是的。磁盘压力比较大。

    70521 be/4 mysql     924.43 K/s  280.63 M/s  0.00 %  0.68 % mysqld --daemonize --pid-file=/run/mysqld/mysqld.pid
    
  • 在 96 核、192G 内存的机器上,Ubuntu 18, mysql-server 5.7,没有调参数优化。13 亿行数据。

    Load 到数据库花了大概 1 小时。

    mysql> LOAD DATA INFILE '/data/mysql/ii.csv' INTO TABLE `ip` FIELDS TERMINATED BY ',' ;
    Query OK, 1280000000 rows affected (1 hour 9 min 55.01 sec)
    Records: 1280000000  Deleted: 0  Skipped: 0  Warnings: 0
    

    加索引花了大概 40 分钟:

    mysql> alter table `ip` add key `idx_ip` (`ip`);
    Query OK, 0 rows affected (43 min 54.54 sec)
    Records: 0  Duplicates: 0  Warnings: 0
    

    观察下来,CPU 核心只用到了两个。主要瓶颈还在磁盘 IO 上。