Unit AccumulateInsert;

{ This unit makes it easy to build up SQL statements that use MySQL's
  extended INSERT syntax. The MySQL manual suggests that this can improve
  performance.

  Early testing showed that it could approximately double performance. I
  was inserting alerts into the alerts table as fast as I could, on old
  hardware. I was using INSERT DELAYED, so some data was copied to a
  buffer, but that buffer was finite, so eventually the program sending
  the data would stall.

  More recent testing showed a performance boost of 165 times. I'm not
  certain what made the difference:
  - We are now using InnoDB tables rather than MyISAM tables.
  - This is the candles_5m table, which has less data per row than the
    alerts table.
  - This is much faster hardware; in particular, a RAID 0 (the best
    configuration for write speed) of two 15k SCSI drives. The original
    measurements came from an IDE drive, possibly before I got the DMA
    settings right.
  - These were REPLACE statements, each of which deleted one row and
    added a new row. (The speedup appears to be slightly greater when
    the REPLACE does not have to do the delete.)

  This test was based on real work that we were doing, not a hypothetical
  benchmark. Some databases would spend hours trying to process our
  overnight update. The disk would be busy continuously. (One CPU spent
  100% of its time in a wait state.) This was devastating to the
  replication setup, as all other changes had to wait for these changes.
  The exact time seemed to vary from one machine to the next, but the
  improvement appears to be big in all cases. }

Interface

Type
  TAccumulateInsert =
    Class
    Private
      FCommon : String;
      FAccumulated : String;
      FAccumulatedSize : Integer;
    Public
      // Store the common part, such as
      // "INSERT IGNORE INTO my_table(field1, field2)"
      // or "REPLACE INTO mytable".
      Constructor Create(Common : String);
      // Add new data to be inserted, such as "(now(), 3)".
      Procedure Add(ToInsert : String);
      // If Empty is true there is no data to insert. Do not try to call
      // Get if this is empty. You CAN call Get if and only if Empty
      // returns false.
      Function Empty : Boolean;
      // If Full is true you should call Get as soon as possible. Full
      // being true implies that Empty is false.
      Function Full : Boolean;
      // This returns a full SQL statement. It also clears the buffer, so
      // that Empty will return true and the next call to Get will not
      // return the same data.
      Function Get : String;
    End;

Implementation

Uses Math;

Constructor TAccumulateInsert.Create(Common : String);
Begin
  FCommon := Common
End;

Procedure TAccumulateInsert.Add(ToInsert : String);
Var
  NewMinSize, I : Integer;
Begin
  { It would be nice if strings had a capacity argument, like lists. The
    last time I checked, Delphi had an n-squared problem when processing
    code like this, so I have to manage the capacity myself: grow the
    string geometrically and track the used size separately. }
  NewMinSize := Succ(Length(ToInsert)) + FAccumulatedSize;
  If NewMinSize > Length(FAccumulated) Then
    SetLength(FAccumulated, Max(FAccumulatedSize * 2, NewMinSize));
  If FAccumulatedSize > 0 Then
    Begin
      Inc(FAccumulatedSize);
      FAccumulated[FAccumulatedSize] := ','
    End;
  For I := 1 To Length(ToInsert) Do
    Begin
      Inc(FAccumulatedSize);
      FAccumulated[FAccumulatedSize] := ToInsert[I]
    End
End;

Function TAccumulateInsert.Empty : Boolean;
Begin
  Result := FAccumulatedSize = 0
End;

Function TAccumulateInsert.Full : Boolean;
Begin
  // We dump the query as soon as it gets above 1 MB. All of the database
  // servers are using the default value of max_allowed_packet, which is
  // 16 MB. A long time ago the default was 1 MB, so this limit would
  // have caused problems then.
  Result := FAccumulatedSize >= 1048576
End;

Function TAccumulateInsert.Get : String;
Begin
  Assert(Not Empty);
  Result := FCommon + ' VALUES ' + Copy(FAccumulated, 1, FAccumulatedSize);
  FAccumulatedSize := 0
End;

{ I used the following notes in my performance testing:

  mysqldump --no-create-info --skip-extended-insert --skip-add-locks --skip-add-drop-table -u root -p -h dom --where="end_time > '2006-08-01' and symbol like 'A%'" mydb candles_5m > candles_a.sql

  mysqldump --no-create-info --skip-add-locks --skip-add-drop-table -u root -p -h dom --where="end_time > '2006-08-01' and symbol like 'A%'" mydb candles_5m > candles_b.sql

  date +%s.%N;mysql -u root -p -h dom mydb < candles_r.sql;date +%s.%N

  [phil@guru /tmp]$ date +%s.%N ; mysql -u root -pxxx123 -h dom mydb < candles_r.sql ; date +%s.%N
  1154748723.555193000
  1154749744.198615000
  [phil@guru /tmp]$ (echo 'begin;';cat candles_r.sql ; echo 'commit;') > candles_rt.sql

  Times to load the same data under different settings:
  1020.643422 seconds (17 minutes and 0.6 seconds) - all defaults
    (innodb_log_buffer_size = 8M, innodb_buffer_pool_size = 14G)
  1034.184375 seconds - innodb_log_buffer_size = 32M
  1013.893772 seconds - innodb_buffer_pool_size = 2G
  1013.045068 seconds - put it all into a transaction
    32.054542 seconds - insert ignore
     6.178972 seconds - replace with extended insert syntax
     3.430884 seconds - insert ignore with extended insert syntax
  899.614726 seconds (15 minutes) - put it all between lock and unlock

  After running these tests I changed the BIOS settings on the RAID card
  on a slow server. I switched from write-through to write-back, allowing
  the controller to do more with its cache. That made the writes over 100
  times as fast when we did them one at a time. I haven't tried the
  combined query to see how much faster it is. }

End.
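{ A minimal usage sketch of this class, as a comment. This is hedged:
  RunQuery and NextAlert are hypothetical helpers standing in for whatever
  query-execution and data-source routines the caller has; they are not
  part of this unit.

    Var
      Accumulator : TAccumulateInsert;
    Begin
      Accumulator := TAccumulateInsert.Create(
        'INSERT IGNORE INTO my_table(field1, field2)');
      Try
        While NextAlert Do
          Begin
            Accumulator.Add('(now(), 3)');
            // Flush whenever the buffer nears max_allowed_packet.
            If Accumulator.Full Then
              RunQuery(Accumulator.Get)
          End;
        // Flush the final partial batch, if any.
        If Not Accumulator.Empty Then
          RunQuery(Accumulator.Get)
      Finally
        Accumulator.Free
      End
    End;
}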