Round-trips are only part of the data insertion problem
When inserting data, it’s not all about reducing the number of round-trips between the application and the database: don’t forget to check the SQL optimizer ‘explain plan’ and tune this too.
As the maintainers of Oracle Database drivers, including for Python and Node.js, my team has an understandable focus on the “client side” of the stack. We concern ourselves with driver features and often treat the “server side” (the database) as something that DBA’s worry about. Our primary performance goal for the new features we add, and first suggestion for apps is to reduce the number of ‘round-trips’.
A round-trip is defined as the travel of a message from the driver to the database and back. Calling each python-oracledb (or your favorite language driver) function, or accessing each attribute, will require zero or more round-trips. For example, inserting a simple row involves sending data to the database and getting a success response back. This is a round-trip. Each round-trip has a cost: the network can be slow; the database listener has to wake up; the database has to do some processing. Reducing round-trips can significantly improve performance and scalability. When fetching query results, we suggest tuning the prefetch and array sizes to reduce round-trips. When inserting records we recommend using ‘array DML’ to do batch inserts; this similarly reduces round-trips.
In the real world, you know it is not this simple. I recently read a StackOverflow post Virtual column index not used in execution plan in Oracle database. It was a wakeup reminder to me that even when implementing best practice batch insertion with python-oracledb's cursor.executemany()
call, the database should not be ignored. The post highlights that benefits can be made by checking the SQL optimizer efficiency. The insertion time dropped from 50,000 seconds to 500.
In some other data loading scenarios you may also wish to explore architecture changes, such as loading all data before indexing, using partitioning, or using multiple connections to load different parts of the data. And don’t forget that SQL*Loader exists, and has many data loading capabilities.