
Golang Multi-Threaded CSV Importer

Lightning-Fast CSV Import with Golang: Unlocking Database Performance


Hi Go developers,

Have you ever faced the challenge of importing massive amounts of CSV data into a database and wished there was a way to do it at blazing speeds? Let’s dive into how we can leverage Golang’s concurrency capabilities and database-specific techniques to achieve just that.

In this post, I’ll walk you through a CSV importer I’ve written in Go that works seamlessly with multiple database engines, including SQLite, MySQL, PostgreSQL, and Firebird (an open-source descendant of InterBase).

While the fundamental goal is the same—importing data quickly—the approach varies depending on the database engine’s unique characteristics. Here's a breakdown:


Database-Specific Considerations

SQLite

SQLite supports multiple concurrent connections to the same database file, but its concurrency is limited by its locking mode and the nature of the operations. Due to these constraints, we avoid concurrent writing in our implementation when working with SQLite to ensure data integrity and optimal performance.
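
In Go, one way to enforce that is to cap the database/sql pool at a single connection, so every write is serialized through one handle. Here is a minimal sketch, assuming the mattn/go-sqlite3 driver and a hypothetical database file name (the repository may wire this up differently):

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/mattn/go-sqlite3" // assumed driver; registers the "sqlite3" name
    )

    func main() {
        db, err := sql.Open("sqlite3", "customers.db") // hypothetical file name
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // One open connection means all inserts are serialized, sidestepping
        // SQLite's database-level write lock instead of fighting it.
        db.SetMaxOpenConns(1)
    }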

MySQL and PostgreSQL

Both of these engines shine when it comes to high-speed imports. They support all the techniques we’ll discuss, including transactions, batch inserts, and concurrent connections. With their mature ecosystems and robust performance, they’re ideal for this use case.
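
Since batch inserts do most of the heavy lifting for these two engines, here is a rough sketch of how a multi-row INSERT can be assembled with database/sql. This is my own illustration rather than the repository's code: the customers table, the execer interface, and the ?-placeholders are all assumptions (PostgreSQL would need $1, $2, ... instead):

    package importer

    import (
        "database/sql"
        "strings"
    )

    // execer lets the same helper run against a *sql.DB or a *sql.Tx.
    type execer interface {
        Exec(query string, args ...any) (sql.Result, error)
    }

    // insertBatch sends a whole slice of CSV records in one multi-row INSERT,
    // building "(?,?,...),(?,?,...)" so the statement stays parameterized.
    func insertBatch(e execer, rows [][]any) error {
        if len(rows) == 0 {
            return nil
        }
        var sb strings.Builder
        sb.WriteString("INSERT INTO customers VALUES ") // hypothetical table
        args := make([]any, 0, len(rows)*len(rows[0]))

        for i, row := range rows {
            if i > 0 {
                sb.WriteByte(',')
            }
            placeholders := strings.TrimSuffix(strings.Repeat("?,", len(row)), ",")
            sb.WriteString("(" + placeholders + ")")
            args = append(args, row...)
        }
        _, err := e.Exec(sb.String(), args...)
        return err
    }

With a batch size of 500, as in the measured runs below, this turns 2,000,000 single-row round trips into 4,000 statements.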

Firebird SQL

Firebird presents a unique challenge: it doesn’t support multi-row INSERT statements, so the batch-insert technique is unavailable. This limitation makes it less performant than the other engines, but we’ll still maximize its capabilities within the constraints it imposes, as sketched below.
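
The practical fallback is to prepare a single-row INSERT once and execute it per record inside a transaction, which at least avoids re-parsing the statement two million times. A minimal sketch, assuming a database/sql-compatible Firebird driver such as github.com/nakagami/firebirdsql and a hypothetical three-column table:

    package importer

    import "database/sql"

    // importRowsFirebird inserts one row at a time, but prepares the
    // statement once and wraps the whole run in a single transaction.
    func importRowsFirebird(db *sql.DB, rows [][]any) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback() // becomes a no-op after a successful Commit

        stmt, err := tx.Prepare("INSERT INTO customers VALUES (?, ?, ?)") // hypothetical shape
        if err != nil {
            return err
        }
        defer stmt.Close()

        for _, row := range rows {
            if _, err := stmt.Exec(row...); err != nil {
                return err
            }
        }
        return tx.Commit()
    }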


Techniques to Boost Import Speed

To make the CSV import process lightning fast, we’ll utilize the following techniques:

  1. Database Transactions: Transactions group multiple operations into a single, atomic unit of work. This reduces the overhead of committing every operation individually and keeps the data consistent if something fails mid-import.

  2. Batch Inserts (except for Firebird SQL): Instead of inserting records one by one, batch inserts send multiple rows in a single statement. This drastically reduces the number of database round trips, improving throughput.

  3. Concurrent Connections: By leveraging Go’s concurrency model, we can use multiple connections to insert data simultaneously. This approach is especially effective for engines like MySQL and PostgreSQL, which handle concurrent writes efficiently. A sketch that combines all three techniques follows this list.
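
Here is how the three techniques can fit together in Go. The sketch below is my own illustration of the pattern, not the repository's code: it fans batches of parsed CSV rows out to a pool of workers, each holding its own transaction on its own connection, and reuses the hypothetical insertBatch helper from the earlier sketch:

    package importer

    import (
        "database/sql"
        "sync"
    )

    // importConcurrently distributes batches of rows across `workers`
    // goroutines. Each worker begins its own transaction, which pins one
    // pooled connection for the duration of the import.
    func importConcurrently(db *sql.DB, batches <-chan [][]any, workers int) error {
        db.SetMaxOpenConns(workers) // every worker needs its own connection

        var wg sync.WaitGroup
        errs := make(chan error, workers)

        for i := 0; i < workers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                tx, err := db.Begin()
                if err != nil {
                    errs <- err
                    return
                }
                // Workers pull from the shared channel until it is closed,
                // so the batches spread themselves across the pool.
                for batch := range batches {
                    if err := insertBatch(tx, batch); err != nil {
                        tx.Rollback()
                        errs <- err
                        return
                    }
                }
                errs <- tx.Commit()
            }()
        }

        wg.Wait()
        close(errs)
        for err := range errs {
            if err != nil {
                return err
            }
        }
        return nil
    }

Calling this with workers set to 25 and 500-row batches corresponds to the "25 Connection opened / 25 Transaction started" runs in the measurements below.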


Measuring the Import Speed

Let’s measure the results of the techniques we used. The test was conducted on the following system:

  • Operating System: Ubuntu Linux
  • Processor: Intel® Core™ i7-3770S @ 3.10GHz
  • Storage: SSD

The dataset consists of a customer database with 2 million rows. Here's how the performance stacked up for each database engine:


SQLite

Import Time:

Batch Insert | Transactions | Multiple Connections | Import Time
OFF          | OFF          | OFF                  | 120 minutes 49 seconds
ON           | OFF          | OFF                  | 1 minute 52 seconds
ON           | ON           | OFF                  | 0 minutes 44 seconds
ON           | ON           | INVALID              | (invalid)

Improvement: 7249 seconds -> 44 seconds (164.75 times faster)


MySQL

Import Time:

Batch Insert | Transactions | Multiple Connections | Import Time
OFF          | OFF          | OFF                  | 117 minutes 26 seconds
ON           | OFF          | OFF                  | 1 minute 39 seconds
ON           | ON           | OFF                  | 1 minute 20 seconds
ON           | ON           | ON                   | 0 minutes 53 seconds

Improvement: 7046 seconds -> 53 seconds (133 times faster)


PostgreSQL

Import Time:

Batch Insert | Transactions | Multiple Connections | Import Time
OFF          | OFF          | OFF                  | 50 minutes 50 seconds
ON           | OFF          | OFF                  | 1 minute 38 seconds
ON           | ON           | OFF                  | 1 minute 25 seconds
ON           | ON           | ON                   | 0 minutes 36 seconds

Improvement: 3050 seconds -> 36 seconds (84.7 times faster)


Firebird SQL

Import Time:

Batch Insert | Transactions | Multiple Connections | Import Time
INVALID      | OFF          | OFF                  | 116 minutes 14 seconds
INVALID      | ON           | OFF                  | 35 minutes 54 seconds
INVALID      | ON           | ON                   | 4 minutes 19 seconds

Improvement: 6974 seconds -> 259 seconds (26.9 times faster)

General Summary

Positives:

  • Biggest improvement: SQLite
  • Best-performing database (before optimization): PostgreSQL
  • Best-performing database (after optimization): PostgreSQL

Negatives:

  • Smallest improvement: Firebird SQL
  • Lowest-performing database (before optimization): SQLite (Note: Minimal difference observed with MySQL; results may vary with different measurements.)
  • Lowest-performing database (after optimization): Firebird SQL

Overall Verdict:

The winner is PostgreSQL, while the loser is Firebird SQL.


Conclusion

This demonstration highlights how a combination of techniques—transactions, batch inserts, and concurrency—can significantly improve data import speeds across various database engines.

We’ve seen that by using these techniques, we can reduce the import time from over 2 hours to just 44 seconds: a truly remarkable improvement.

For more details on how to install this Go command-line utility, please visit my GitHub repository. You’ll also find the source code there, so feel free to explore and see how these techniques are implemented.


Reference data: detailed measurements from the console:

SQLite

no transaction, no batch insert
Analyzing CSV...
Found 12 fields
Row count:2000000

1 Connection opened
Importing: 100% Active threads: [ ] 
Done
0 transactions committed
1 connections closed

Full Analysis time: 0 minutes 19 seconds
Full duration time: 120 minutes 30 seconds
Total: 120 minutes 49 seconds

-------

no transaction, batch insert

Analyzing CSV...
Found 12 fields
Row count:2000000

Running in batch insert mode
Batch size is 100
1 Connection opened
Importing: 100% Active threads: [ ] 
Done
0 transactions committed
1 connections closed

Full Analysis time: 0 minutes 18 seconds
Full duration time: 1 minutes 33 seconds
Total: 1 minutes 52 seconds

----

Transaction + Batch insert:

Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in batch insert mode
Batch size is 100
1 Connection opened
1 Transaction started
Importing: 100% Active threads: [ ] 
Done
1 transactions committed
1 connections closed

Full Analysis time: 0 minutes 14 seconds
Full duration time: 0 minutes 30 seconds
Total: 0 minutes 44 seconds

MySQL

Batch OFF, Transaction OFF, Multi Connection OFF
Analyzing CSV...
Found 12 fields
Row count:2000000

1 Connection opened
Importing: 100% Active threads: [ ] 
Done
0 transactions committed
1 connections closed

Full Analysis time: 0 minutes 18 seconds
Full duration time: 117 minutes 8 seconds
Total: 117 minutes 26 seconds

----

Batch ON, Transaction OFF, Multi Connection OFF
Analyzing CSV...
Found 12 fields
Row count:2000000

Running in batch insert mode
Batch size is 500
1 Connection opened
Importing: 100% Active threads: [ ] 
Done
0 transactions committed
1 connections closed

Full Analysis time: 0 minutes 19 seconds
Full duration time: 1 minutes 19 seconds
Total: 1 minutes 39 seconds

----

Batch ON, Transaction ON, Multi Connection OFF

Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in batch insert mode
Batch size is 500
1 Connection opened
1 Transaction started
Importing: 100% Active threads: [ ] 
Done
1 transactions committed
1 connections closed

Full Analysis time: 0 minutes 18 seconds
Full duration time: 1 minutes 1 seconds
Total: 1 minutes 20 seconds

---

Batch ON, Transaction ON, Multi Connection ON

Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in multiple threads mode
Running in batch insert mode
Batch size is 500
25 Connection opened
25 Transaction started
Importing: 100% Active threads: [OOOOOOOOOOOOOOOOOOOOOOOOO] 
Done
25 transactions committed
25 connections closed

Full Analysis time: 0 minutes 14 seconds
Full duration time: 0 minutes 38 seconds
Total: 0 minutes 53 seconds

-----
PostgreSQL:

Batch OFF, Transaction OFF, Multi Connection OFF

Analyzing CSV...
Found 12 fields
Row count:2000000

1 Connection opened
Importing: 100% Active threads: [ ] 
Done
0 transactions committed
1 connections closed

Full Analysis time: 0 minutes 18 seconds
Full duration time: 50 minutes 50 seconds
Total: 51 minutes 9 seconds

----

Batch ON, Transaction OFF, Multi Connection OFF

Analyzing CSV...
Found 12 fields
Row count:2000000

Running in batch insert mode
Batch size is 500
1 Connection opened
Importing: 100% Active threads: [ ] 
Done
0 transactions committed
1 connections closed

Full Analysis time: 0 minutes 16 seconds
Full duration time: 1 minutes 21 seconds
Total: 1 minutes 38 seconds

---

Batch ON, Transaction ON, Multi Connection OFF
Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in batch insert mode
Batch size is 500
1 Connection opened
1 Transaction started
Importing: 100% Active threads: [ ] 
Done
1 transactions committed
1 connections closed

Full Analysis time: 0 minutes 19 seconds
Full duration time: 1 minutes 5 seconds
Total: 1 minutes 25 seconds

---

Batch ON, Transaction ON, Multi Connection ON
Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in multiple threads mode
Running in batch insert mode
Batch size is 500
25 Connection opened
25 Transaction started
Importing: 100% Active threads: [OOOOOOOOOOOO ] 
Done
25 transactions committed
25 connections closed

Full Analysis time: 0 minutes 19 seconds
Full duration time: 0 minutes 17 seconds
Total: 0 minutes 36 seconds


---
Firebird:
Batch Invalid for this type, Transaction OFF, Multi Connection OFF

Analyzing CSV...
Found 12 fields
Row count:2000000

1 Connection opened
Importing: 100% Active threads: [ ] 
Done
0 transactions committed
1 connections closed

Full Analysis time: 0 minutes 19 seconds
Full duration time: 115 minutes 54 seconds
Total: 116 minutes 14 seconds

---

Batch Invalid for this type, Transaction ON, Multi Connection OFF
Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
1 Connection opened
1 Transaction started
Importing: 100% Active threads: [ ] 
Done
1 transactions committed
1 connections closed

Full Analysis time: 0 minutes 19 seconds
Full duration time: 35 minutes 35 seconds
Total: 35 minutes 54 seconds

--- 

Batch Invalid for this type, Transaction ON, Multi Connection ON
Analyzing CSV...
Found 12 fields
Row count:2000000

Running in transactional mode
Running in multiple threads mode
25 Connection opened
25 Transaction started
Importing: 100% Active threads: [OOOOOOOOOOO OOOOOOOOOOOOO] 
Done
25 transactions committed
25 connections closed

Full Analysis time: 0 minutes 20 seconds
Full duration time: 3 minutes 59 seconds
Total: 4 minutes 19 seconds