有种感觉:ruby+postgresql 就是魔法 + 魔法。打算学习一波。
试了。瓶颈在磁盘的情况下,用并行 copy 或者 直接 copy 时间差别不大。
at 30s, row rate 666546.92/sec (period), row rate 666546.92/sec (overall), 2.000000E+07 total rows
at 1m0s, row rate 653333.72/sec (period), row rate 659940.91/sec (overall), 3.960000E+07 total rows
at 1m30s, row rate 530000.43/sec (period), row rate 616630.03/sec (overall), 5.550000E+07 total rows
at 2m0s, row rate 546666.11/sec (period), row rate 599139.82/sec (overall), 7.190000E+07 total rows
at 2m30s, row rate 546659.98/sec (period), row rate 588644.12/sec (overall), 8.830000E+07 total rows
at 3m0s, row rate 530004.70/sec (period), row rate 578871.27/sec (overall), 1.042000E+08 total rows
at 3m30s, row rate 543335.38/sec (period), row rate 573794.86/sec (overall), 1.205000E+08 total rows
at 4m0s, row rate 523331.66/sec (period), row rate 567487.09/sec (overall), 1.362000E+08 total rows
4 分钟插入了一亿多条数据,13 亿差不多也得 40 分钟左右。
但是,关闭日志(建表时 unlogged), 速度提升很大:
at 30s, row rate 1826650.79/sec (period), row rate 1826650.79/sec (overall), 5.480000E+07 total rows
at 1m0s, row rate 2005766.79/sec (period), row rate 1916228.49/sec (overall), 1.150000E+08 total rows
at 1m30s, row rate 1990897.15/sec (period), row rate 1941106.77/sec (overall), 1.747000E+08 total rows
一亿多条数据,只需要 1 分钟左右。
综合看,如果用 HDD,瓶颈大概率在硬盘这里,关闭日志可以提升写性能。原推可能用了比较慢的硬盘。 回头有空,找一台 SSD 的物理服务器测试下。
没注意到。待我一试。
测试环境的物理机,HDD。
同样配置,我又用 postgresql 试了一下。
导入数据用 copy 从 csv 复制,忘记开 \timing,不过大概也是 30-40 分钟。 加索引耗时大概 13 分钟。
postgres=# CREATE INDEX idx ON test (ip);
Time: 781496.452 ms (13:01.496)
是的。磁盘压力比较大。
70521 be/4 mysql 924.43 K/s 280.63 M/s 0.00 % 0.68 % mysqld --daemonize --pid-file=/run/mysqld/mysqld.pid
在 96 核、192G 内存的机器上,Ubuntu 18, mysql-server 5.7,没有调参数优化。13 亿行数据。
Load 到数据库花了大概 1 小时。
mysql> LOAD DATA INFILE '/data/mysql/ii.csv' INTO TABLE `ip` FIELDS TERMINATED BY ',' ;
Query OK, 1280000000 rows affected (1 hour 9 min 55.01 sec)
Records: 1280000000 Deleted: 0 Skipped: 0 Warnings: 0
加索引花了大概 40 分钟:
mysql> alter table `ip` add key `idx_ip` (`ip`);
Query OK, 0 rows affected (43 min 54.54 sec)
Records: 0 Duplicates: 0 Warnings: 0
观察下来,CPU 核心只用到了两个。主要瓶颈还在磁盘 IO 上。