MySQL load data infile: How to handle duplicate keys, column assignments, and input preprocessing

sibbidersdezrext
Aug 19, 2023
7 min read

Note: In the above example, the command assumes that the employee1.txt file is located under the database directory. For example, if you are executing the above command in the thegeekstuff database, place the file under /var/lib/mysql/thegeekstuff/.

To load values from the input records into specific columns of the table, specify the column names in the LOAD DATA INFILE statement as shown below. The last line of the following command lists the columns that receive the fields of each input record.
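
A minimal sketch of such a statement, built as a string in Python so it can be inspected without a database connection. The employee table and its column names are hypothetical; only employee1.txt comes from the text above, and the file is assumed to use the default tab-separated layout.

```python
def load_into_columns(path, table, columns):
    """Build a LOAD DATA INFILE statement whose last line names the target columns."""
    # No quoting or escaping is applied; the inputs are assumed to be trusted.
    column_list = ", ".join(columns)
    return (
        f"LOAD DATA INFILE '{path}'\n"
        f"INTO TABLE {table}\n"
        f"({column_list});"
    )


# Hypothetical table and column names, for illustration only.
statement = load_into_columns("employee1.txt", "employee", ["id", "name", "dept", "salary"])
print(statement)
```

The fields of each input line are assigned to the listed columns in order; columns not listed receive their default values.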


Give execute permission to the load-data.sh script and run it from the command line to load the data into the table automatically. You can also schedule it as a cron job to load the file into the table at a regular interval.

To import large amounts of data into MySQL, we can either use the technique for inserting multiple rows (outlined above) or use LOAD DATA INFILE instead. As the name suggests, LOAD DATA INFILE loads the data into a table from a file. The basic syntax of the LOAD DATA INFILE query looks like so:
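
As a rough illustration of that syntax, the sketch below assembles a statement for a comma-separated file with a header row. The file path and table name are placeholders, not values from the original article, and the helper only produces text; it does not connect to a server.

```python
def basic_load_statement(path, table, delimiter=",", skip_header=True):
    """Assemble a basic LOAD DATA INFILE statement, one clause per line."""
    clauses = [
        f"LOAD DATA INFILE '{path}'",
        f"INTO TABLE {table}",
        f"FIELDS TERMINATED BY '{delimiter}'",
        "LINES TERMINATED BY '\\n'",  # the SQL text contains the two characters \n
    ]
    if skip_header:
        # Skip the first line of the file, e.g. a row of column headings.
        clauses.append("IGNORE 1 LINES")
    return "\n".join(clauses) + ";"


# Placeholder path and table name.
statement = basic_load_statement("/tmp/data.csv", "demo_table")
print(statement)
```

The clause order matters: FIELDS comes before LINES, and IGNORE n LINES comes after both.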

This statement is very powerful for one simple reason: if our dataset is vast and we don't even need half of the information in it (say, we only need the values of the first column, but not the second, third, etc.), we can tell the statement to skip the fields we don't need by assigning them to a user variable instead of a column. With the IGNORE modifier, many data errors are also downgraded to warnings instead of aborting the load. Talk about saving time.
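
One way to skip unwanted input fields is to assign them to a user variable that is never used. The sketch below generates such a statement; the table, column names, and file path are made up for the example, and @dummy is just a conventional variable name.

```python
def load_selected_columns(path, table, targets):
    """Build a statement that keeps some input fields and discards the rest.

    `targets` has one entry per input field: a column name to keep the field,
    or None to discard it by assigning it to the @dummy user variable.
    """
    mapped = [name if name is not None else "@dummy" for name in targets]
    return (
        f"LOAD DATA INFILE '{path}'\n"
        f"INTO TABLE {table}\n"
        f"({', '.join(mapped)});"
    )


# Keep only the first of three input fields (hypothetical names).
statement = load_selected_columns("/tmp/data.txt", "demo_table", ["first_col", None, None])
print(statement)
```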

When you compare the import time of INSERT INTO and LOAD DATA INFILE statements in MySQL, you are likely to notice that LOAD DATA INFILE loads data way, way faster than INSERT does. The reason is pretty simple: LOAD DATA INFILE gives MySQL less work per row than INSERT statements do, since the server reads the whole file in a single statement instead of parsing one statement after another. It can also skip lines or, if we wish, load data only into specific columns, skipping all others (refer to the example above). The INSERT statement offers no such features; although it can easily insert a couple of rows at once, it is ill-suited to data sets larger than, say, a million rows.

That's not to say that LOAD DATA INFILE is without its flaws, though. Run it with 100 million or more rows, and you will probably notice it slowing down after a while. To avoid that, break up the input: instead of loading 100 or 200 million rows at once, load smaller chunks. Have, for example, ten files with 10 million rows each, generate your LOAD DATA INFILE statements in a loop, and run them one by one. You're done!
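
The chunking idea can be sketched in Python as follows. The function names, chunk file naming, and table name are assumptions made for this example, and the splitter assumes one row per line (no embedded newlines inside fields). It only writes the chunk files and returns the statements; running them against the server is left to your client of choice.

```python
import itertools
from pathlib import Path


def split_into_chunks(source, out_dir, rows_per_chunk):
    """Split a text file into numbered chunk files of at most rows_per_chunk lines."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    # newline="" keeps the original line endings untouched.
    with open(source, "r", encoding="utf-8", newline="") as src:
        for index in itertools.count(1):
            lines = list(itertools.islice(src, rows_per_chunk))
            if not lines:
                break
            chunk_path = out_dir / f"chunk_{index:04d}.txt"
            with open(chunk_path, "w", encoding="utf-8", newline="") as dst:
                dst.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths


def load_statements(chunk_paths, table):
    """Return one LOAD DATA INFILE statement per chunk file, in order."""
    return [
        f"LOAD DATA INFILE '{path.as_posix()}' INTO TABLE {table};"
        for path in chunk_paths
    ]
```

For a non-LOCAL load the chunk files must be readable by the server, so in practice out_dir would be a directory the server is allowed to read from.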

When you execute the LOAD DATA INFILE statement, MariaDB Server attempts to read the input file from its own file system. By contrast, when you execute the LOAD DATA LOCAL INFILE statement, the client attempts to read the input file from its file system, and it sends the contents of the input file to the MariaDB Server. This allows you to load files from the client's local file system into the database.

If you load data from a file into a table that already contains data and has a primary key, you may encounter issues where the statement attempts to insert a row with a primary key that already exists. When this happens, the statement fails with Error 1062 (duplicate entry), protecting the data already in the table. If you want MariaDB to overwrite duplicates, use the REPLACE keyword.

The REPLACE keyword works like the REPLACE statement. Here, the statement attempts to load the data from the file. If the row does not exist, it adds it to the table. If the row contains an existing primary key, it replaces the table data. That is, in the event of a conflict, it assumes the file contains the desired row.

Use the IGNORE keyword when you want to skip any rows that contain a conflicting primary key. Here, the statement attempts to load the data from the file. If the row does not exist, it adds it to the table. If the row contains an existing primary key, it ignores the addition request and moves on to the next. That is, in the event of a conflict, it assumes the table contains the desired row.
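
To show where the two keywords go, here is a small sketch that builds the three variants of the statement. The file path and table name are placeholders, and the helper only produces SQL text.

```python
def load_with_duplicate_policy(path, table, on_duplicate=None):
    """Build a LOAD DATA INFILE statement with an optional REPLACE or IGNORE keyword.

    on_duplicate=None     -> a conflicting key raises a duplicate-key error
    on_duplicate="REPLACE" -> rows from the file overwrite existing rows
    on_duplicate="IGNORE"  -> conflicting rows from the file are skipped
    """
    if on_duplicate not in (None, "REPLACE", "IGNORE"):
        raise ValueError("on_duplicate must be None, 'REPLACE', or 'IGNORE'")
    # The keyword sits between the file name and INTO TABLE.
    modifier = f" {on_duplicate}" if on_duplicate else ""
    return f"LOAD DATA INFILE '{path}'{modifier} INTO TABLE {table};"


for policy in (None, "REPLACE", "IGNORE"):
    print(load_with_duplicate_policy("/tmp/rows.txt", "demo_table", policy))
```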

The statement interprets all fields in the file as having the same character-set, regardless of the column data type. To properly interpret file contents, you must ensure that it was written with the correct character-set. If you write a data file with mysqldump -T or with the SELECT INTO OUTFILE statement with the mysql client, be sure to use the --default-character-set option, so that the output is written with the desired character-set.

In MariaDB, the LOAD DATA INFILE statement supports progress reporting. You may find this useful when dealing with long-running operations. From another client, you can issue a SHOW PROCESSLIST query to check the progress of the data load.

Using mariadb-import/mysqlimport, you can compress the data sent between the client and the server with the --compress option to get better performance over slow networks, provided both the client and server support the compressed protocol. Use the --local option to load from the local file system.

If the cluster is part of an Aurora global database, set this parameter for each Aurora cluster in the global database. Although only the primary cluster in an Aurora global database can load data, another cluster might be promoted by the failover mechanism and become the primary cluster.

The AWS_LOAD_S3_ACCESS role and LOAD FROM S3 privilege are specific to Amazon Aurora and are not available for MySQL databases or RDS for MySQL DB instances. If you have set up replication between an Aurora DB cluster as the replication master and a MySQL database as the replication client, then the GRANT statement for the role or privilege causes replication to stop with an error. You can safely skip the error to resume replication. To skip the error on an RDS for MySQL DB instance, use the mysql.rds_skip_repl_error procedure. To skip the error on an external MySQL database, use the SET GLOBAL sql_slave_skip_counter statement (Aurora MySQL version 1 and 2) or the SET GLOBAL sql_replica_skip_counter statement (Aurora MySQL version 3).

You can use the LOAD DATA FROM S3 statement to load data from any text file format that is supported by the MySQL LOAD DATA INFILE statement, such as text data that is comma-delimited. Compressed files are not supported.

The following statement loads data from an Amazon S3 bucket that is in the same region as the Aurora DB cluster. The statement reads the comma-delimited data in the file customerdata.txt that is in the dbbucket Amazon S3 bucket, and then loads the data into the table store-schema.customer-table.

The following statement loads data from an Amazon S3 bucket that is in a different region from the Aurora DB cluster. The statement reads the comma-delimited data from all files that match the employee-data object prefix in the my-data Amazon S3 bucket in the us-west-2 region, and then loads the data into the employees table.
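
The original statements are not reproduced here, so the following is only a sketch of what the two statements described above might look like. The bucket names, file name, prefix, region, and table names come from the text; the column names are hypothetical, the hyphenated identifiers are wrapped in backticks because MySQL requires quoting for them, and the exact S3 URI forms should be checked against the Aurora documentation.

```python
def load_from_s3(s3_uri, table, columns, prefix=False):
    """Build an Aurora MySQL LOAD DATA FROM S3 statement for comma-delimited text."""
    # PREFIX loads every object whose key starts with the given prefix.
    source = "PREFIX " if prefix else ""
    return (
        f"LOAD DATA FROM S3 {source}'{s3_uri}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ','\n"
        "LINES TERMINATED BY '\\n'\n"
        f"({', '.join(columns)});"
    )


# Same region as the cluster: a single file. Column names are hypothetical.
same_region = load_from_s3(
    "s3://dbbucket/customerdata.txt",
    "`store-schema`.`customer-table`",
    ["id", "firstname", "lastname", "email"],
)

# Different region: all objects matching a prefix, with the region in the URI scheme.
other_region = load_from_s3(
    "s3-us-west-2://my-data/employee-data",
    "employees",
    ["id", "firstname", "lastname", "salary"],
    prefix=True,
)

print(same_region)
print(other_region)
```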

The only difference is the LOCAL option in the statement. If you load a big CSV file, you will see that with the LOCAL option, it will be a little bit slower to load the file because it takes time to transfer the file to the database server.

CSV (Comma-Separated Values) and Excel files are among the most common formats for recording and storing data. As the data in these files grows, it becomes important to store and maintain it in a scalable database management system like MySQL. This allows organizations to manage and analyze their data from a central repository. To load data from a file into a table in MySQL, you can either write SQL statements or use a GUI (Graphical User Interface) such as MySQL Workbench or phpMyAdmin, or another data integration tool.

For non-LOCAL loads, the server reads a file named ./product.txt relative to its data directory, whereas it reads a file named product.txt from the database directory of the default database. For instance, if you run the following LOAD DATA statement while firstdb is the default database, the server reads the test.txt file from the firstdb database directory, even though the statement explicitly loads the file into a table in the seconddb database.

Note that LOCAL works only if both the server and the client are configured to allow it. For instance, if mysqld was started with the local_infile system variable disabled, LOCAL produces an error.

Apart from the LOAD DATA statement to load data from file to table in MySQL, you can use the INSERT statement to insert new rows into an existing table. You can use the INSERT statement to load data from file to table in MySQL in the following forms:

As shown above, you can use NULL directly instead of the \N used in LOAD DATA input files. However, when you have many rows to load from a file into a table in MySQL, you have to execute several INSERT statements, compared to a single LOAD DATA command. Generally, when you want to load data from a text file, the LOAD DATA command is the suggested choice, as it is usually about 20 times faster than the INSERT command.
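
To make the \N convention concrete, the sketch below writes rows in the default LOAD DATA layout (tab-separated fields, newline-terminated lines, backslash as the escape character), turning Python None values into \N. The function names are invented for this example, and only backslashes, tabs, and newlines are escaped.

```python
def to_load_data_field(value):
    """Render one value for a default-format LOAD DATA file; None becomes \\N (NULL)."""
    if value is None:
        return "\\N"
    text = str(value)
    # Escape the backslash first so the escapes added afterwards are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )


def to_load_data_line(row):
    """Join the rendered fields with tabs and terminate the line with a newline."""
    return "\t".join(to_load_data_field(value) for value in row) + "\n"


# A row with a NULL in the second field and a tab inside the third.
print(repr(to_load_data_line([1, None, "a\tb"])))
```

In an INSERT statement the same row would simply use the NULL keyword for the second value.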

MySQL is a popular open-source relational database management system that lets you store and manage data using simple SQL commands or through its graphical interface, MySQL Workbench. To load data from a file into a table in MySQL, check out the following three methods:

Hevo Data, an automated data pipeline, lets you load data from 100+ data sources such as FTP/SFTP and Google Sheets into your desired destination, such as MySQL, data warehouses, or BI tools, in an automated manner.

Using Hevo you can easily upload files in formats such as CSV, JSON, and XML to your MySQL Database within minutes with an easy-to-use no-code interface. Hevo is fully managed and completely automates the process of not only loading data from CSV files but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code.
