Always Use Connection Pools — and How

Christopher Jones
Mar 7, 2022


Here is why you should always use a connection pool, even if your application is small or doesn’t get much load. The conclusions and tips apply to many Oracle Database APIs, including those for Node.js, Python, and Go, since they can all take advantage of the Oracle Call Interface (OCI) Session Pool from the Oracle Client libraries, or have ‘thin’ modes with similar driver pooling behaviors.

Recently I’ve been helping an internal team with the Node.js part of their Oracle Cloud service. Their internal web service, built with the Node.js node-oracledb driver, was created a few years back for a different project that was infrequently invoked.

The team I’m helping runs their Node.js web service in a cluster of four processes with a different load profile — many more users — than the original team. This web service has been successfully running in production for some time. However, a few weeks back, they saw a connection load spike resulting in cascading failures as work backed up. That is when they contacted me and gave a quick briefing about the symptoms.

Allocate Worker Threads for Node.js

My first thought was a Node.js thread shortage. You may recall that Node.js has four background worker threads by default (not to be confused with the new user space worker_threads module). Each node-oracledb 5.x connection needs, and holds onto, a thread when it does any database work. If database work takes some time, no one else will get a thread to do their own work. [Update: with the newer node-oracledb 6, this is still true in ‘Thick’ mode, but is not true in the default ‘Thin’ mode.] When we checked, the number of connections in each web service process exceeded the number of threads available.
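
The relationship can be sketched in a few lines. This is only an illustration of the sizing rule, with an assumed poolMax value; note that on some platforms (notably Windows) the variable must be set in the shell before node starts rather than in the script:

```javascript
// Sketch: size the libuv thread pool to cover the planned pool maximum.
// 'poolMax' here is an assumed application setting, not a driver value.
const poolMax = 10;

// In Thick mode each in-flight database call holds one worker thread, so
// allow one thread per connection plus spares for other libuv work
// (fs, dns.lookup, crypto).  This only takes effect if it runs before the
// thread pool is first used; setting the environment variable before the
// Node.js process starts is the reliable approach.
process.env.UV_THREADPOOL_SIZE = String(poolMax + 4);

console.log(process.env.UV_THREADPOOL_SIZE);
```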

Unfortunately, this wasn’t the sole cause. The team increased the number of threads by bumping UV_THREADPOOL_SIZE and ran a quick load test. This test still exhibited symptoms of potential concern.

Use Connection Pooling

My other main recommendation had been to use a connection pool since pools provide scalability and resiliency. Node-oracledb has a pooling API.

[Diagram: two application processes, each with a session pool of connections with lines to a database tier. Each line has a server process. Some connections are shown active, and some inactive. All are present and using resources, even when not in use.]

The diagram above shows how connection pools provide applications with pre-created connections and database server processes. The connections remain open when they are doing database work (the boxes with gears) and also when the application is not currently using them (the white boxes). Pools provide a readily available resource for applications, removing the need for re-authentication and establishment of a database server process for each application request.

Application connection pools are useful in applications that:

  • Have many user requests that do some database work and then release the connection for reuse.
  • Or use a minimal number of connections (even just 1) that are infrequently used.
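
The mechanics can be sketched with a toy, synchronous pool. This is only an illustration of why reuse is cheap — real code would use oracledb.createPool() and its Promise-based getConnection():

```javascript
// Toy illustration of pooling mechanics; not the node-oracledb API.
// Connections are created once up front and reused across requests.
class ToyPool {
  constructor(size) {
    this.free = [];
    for (let i = 0; i < size; i++) {
      // Stand-in for an expensive connection + authentication step.
      this.free.push({ id: i, queries: 0 });
    }
  }
  getConnection() {
    return this.free.pop();       // reusing an open connection is cheap
  }
  release(conn) {
    this.free.push(conn);         // return it for the next caller
  }
}

const pool = new ToyPool(2);
for (let request = 0; request < 100; request++) {
  const conn = pool.getConnection();
  conn.queries++;                 // pretend to run a query
  pool.release(conn);
}
// 100 requests were served by just 2 long-lived connections.
console.log(pool.free.length);    // 2
```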

Oracle connection pools support Oracle Database features like:

  • Fast Application Notification (FAN)
  • Runtime Load Balancing (RLB)
  • Application Continuity (AC)
  • Database Resident Connection Pooling (DRCP)

Along with these great features, there are also simple benefits. When pool.getConnection() is called in node-oracledb, the pool checks that the network socket of the connection about to be returned to the application hasn’t been killed. If it has been destroyed, then the pool cleans up, and a different connection is returned to the application. The pool.getConnection() call also initiates internal ‘ping’ calls if connections have been idle for some time. These heavier calls go all the way to the database to make sure the database session hasn’t been killed by a resource profile, firewall, or a zealous DBA. If it has been killed, the pool similarly tidies up and returns a different connection to the application.
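
That validation flow can be sketched roughly like this. It is a toy model, not the driver’s actual code; in node-oracledb the idle threshold for the heavier ping is controlled by the pool’s poolPingInterval setting:

```javascript
// Toy sketch of pool-side connection validation; not real driver code.
function getValidConnection(pool, now) {
  while (pool.free.length > 0) {
    const conn = pool.free.pop();
    if (conn.socketClosed) continue;            // cheap local check: discard
    if (now - conn.lastUsed > pool.pingInterval) {
      // Heavier round-trip check, like the driver's internal ping.
      if (!conn.pingOk) continue;               // session killed server-side
    }
    return conn;                                // healthy connection
  }
  return null;                                  // caller would create or wait
}

const pool = {
  pingInterval: 60,
  free: [
    { id: 2, socketClosed: false, lastUsed: 95, pingOk: true  }, // healthy
    { id: 1, socketClosed: false, lastUsed: 0,  pingOk: false }, // killed by DBA
    { id: 0, socketClosed: true,  lastUsed: 0,  pingOk: true  }, // dead socket
  ],
};

// Two stale connections are quietly discarded; the healthy one is returned.
const conn = getValidConnection(pool, 100);
console.log(conn.id);   // 2
```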

Luckily, the original development team had actually implemented connection pooling in the web service; enabling it was just a configuration switch. Running a load test showed no problem. Yay! Problem fixed.

Choose the Best Connection Pool Size

With connection pooling enabled and able to cope with some load, now the task was to find the optimal sizing. Although I have been strongly recommending a fixed size based on the Oracle Real-World Performance team guidelines, the web service team is still evaluating using a dynamic pool. The service has an expected load spike during an extended warm-up phase. The database is shared with other applications. So the team thinks they want the number of connections in the pool to start high-ish and then shrink to a steady-state. They continued, and are continuing, testing to validate this goal.
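
The two strategies differ only in pool creation settings. A sketch using node-oracledb option names — the values, user, and connect string below are placeholders to adapt, not the team’s real configuration:

```javascript
// Fixed size (generally recommended): the pool never grows or shrinks,
// so the database sees a stable, predictable number of sessions.
const fixedPool = {
  user: 'app_user',                            // placeholder
  connectString: 'dbhost.example.com/orclpdb1', // placeholder
  poolMin: 10,
  poolMax: 10,
  poolIncrement: 0,
};

// Dynamic size (what the team is evaluating): start high for the warm-up
// spike, then let idle connections close after poolTimeout seconds.
const dynamicPool = {
  user: 'app_user',
  connectString: 'dbhost.example.com/orclpdb1',
  poolMin: 4,
  poolMax: 40,
  poolIncrement: 2,
  poolTimeout: 60,   // seconds a connection may sit idle before closing
};

// Either object would be passed to oracledb.createPool().
console.log(fixedPool.poolMin === fixedPool.poolMax);   // true for a fixed pool
```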

Application instrumentation is always helpful. With node-oracledb connection pools, gathering the pool statistics at regular intervals or on-demand lets you check connections are being closed (i.e., released back to the pool for someone else to use) and lets you see whether the pool size is correct. The team’s web service logs fortunately included connection pool statistics.
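
In node-oracledb this means creating the pool with enableStatistics: true and reading pool.getStatistics() periodically. The monitoring side can be sketched with a mocked snapshot — the field names here are illustrative, not the driver’s exact property names:

```javascript
// Sketch of a periodic pool-health check; the stats object is mocked.
function checkPoolHealth(stats) {
  const alerts = [];
  if (stats.connectionsInUse === stats.poolMax) {
    alerts.push('pool exhausted: all connections in use');
  }
  if (stats.pendingRequests > 0) {
    alerts.push(`${stats.pendingRequests} getConnection() calls queued`);
  }
  return alerts;
}

// Mocked snapshot of an overloaded pool.
const snapshot = { poolMax: 10, connectionsInUse: 10, pendingRequests: 3 };

for (const alert of checkPoolHealth(snapshot)) {
  console.log(alert);
}
// In a real service this would run on a setInterval() timer and write to
// the application log for later post-processing.
```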

The team ran some load tests, and the first thing they noticed was that the connection pool didn’t shrink to the configured minimum size when the web service was left idle. This is because they were using Oracle Instant Client 19c. In versions before 21c, there is a Catch-22 about pool shrinkage. In the older versions, pool shrinkage is initiated when the pool is accessed. I.e., the pool actually has to be active for idle connections to be closed. This isn’t an issue for most people since their pools always have some activity, and a steady state with minimal connections is readily attained. However, if you have an application with an extended idle period, nothing triggers the connection clean-up. I know some customers with this scenario overnight will send infrequent dummy connection requests to initiate pool clean-up. Luckily, with the Oracle Client 21c libraries, the clean-up will be started by a timer in the background, so that simple workaround is no longer needed.
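
On pre-21c clients, that workaround can be as simple as a timer that touches the pool. A toy sketch — the pool below is a mock whose clean-up only runs when accessed, mimicking the 19c behavior; real code would use pool.getConnection() and connection.close():

```javascript
// Any pool access lets access-triggered shrinkage run (19c behavior).
function keepPoolTidy(pool) {
  const conn = pool.getConnection();
  conn.close();
}

// Mock pool: idle clean-up happens only inside getConnection().
const pool = {
  open: 10,
  min: 2,
  getConnection() {
    this.open = Math.max(this.min, this.open - 4);  // idle clean-up runs now
    return { close() {} };
  },
};

// In a real app this might run from setInterval(..., 60000) overnight.
keepPoolTidy(pool);
console.log(pool.open);   // shrinking towards the configured minimum
```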

Release Unused Connections

With another load test from the team, the pool statistics showed that a large number of connections went to the pool queue. I.e., the app code requested connections, but none were immediately available. This meant the pool was too small. The pool statistics also showed that the length of time those waiting ‘get connection’ requests spent in the queue was very high. This second behavior concerned me.

It turns out that the application does a quick query to the database but then does a non-database-related call that can take some time. The database connection is only closed after that call finishes. So although the application was using connections from a pool, I might argue that it wasn’t really using pooling. Holding onto connections unnecessarily isn’t optimal. Overall, more connections are needed, meaning more database resources get allocated. This design hadn’t been a problem for the light load experienced by the original development team, so it wasn’t identified as an issue. The current team is going to modify the code base to close connections earlier.
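
The fix is a small reordering: close the connection as soon as the database work finishes, before the slow call starts. A synchronous, mock-based sketch of the pattern — real node-oracledb code would await pool.getConnection() and connection.close():

```javascript
// Release the connection BEFORE the slow non-database work.
function handleRequest(pool, slowService) {
  const conn = pool.getConnection();
  let row;
  try {
    row = conn.query('SELECT 1 FROM dual');   // quick database work
  } finally {
    conn.close();            // back to the pool now, not after slowService
  }
  return slowService(row);   // slow call runs without holding a connection
}

// Minimal mock pool that tracks how many connections are held.
const pool = {
  held: 0,
  getConnection() {
    this.held++;
    const self = this;
    return {
      query: (sql) => ({ sql }),
      close() { self.held--; },
    };
  },
};

let heldDuringSlowWork = -1;
handleRequest(pool, (row) => { heldDuringSlowWork = pool.held; return row; });
console.log(heldDuringSlowWork);   // 0
```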

Don’t Terminate “Idle” Connections

In a further load test, the team noticed that the number of connections in the pool shrank below the configured minimum. Sadly, the database is configured to kill database sessions after 5 minutes of idle time. To most application users this would be visible only as a minuscule relative slowdown, since connection pool features automatically reconnect when pool.getConnection() is called. The pool also then grows as needed according to the configured pool increment value. But reconnection is obviously unnecessary overhead. The killing of sessions in the database also doesn’t leave an available set of connections ready to handle a user load spike. Another Oracle Client 21c enhancement plays a part here: restoring the configured minimum number of pooled connections now can happen in the background. With earlier versions, a number of pool.getConnection() calls might be needed for pool re-establishment. Overall I strongly recommend that databases and firewalls should not kill idle connections, or at least should have a very long timeout. I’ve asked the team to remove the idle timeout in the database.

Conclusion

The web service’s use of connections is still undergoing tuning. Still, I’m confident the main issues are identified, that connection pooling is already beneficial, and that the other planned changes will further improve the application’s resiliency.

Along the way I’ve given the team some extra tips. These are listed below along with some general advice to improve availability and reliability that you may find helpful. The node-oracledb documentation has more detail.

You may be interested in the technical brief Application Programming Using Pooling and Caching.

SOME BEST PRACTICES FOR ORACLE DATABASE CONNECTION POOLING

  • With node-oracledb 5.5 and earlier (or when using node-oracledb 6’s Thick mode), make sure UV_THREADPOOL_SIZE is set appropriately. It should be bigger than the maximum number of database connections opened by the Node.js process. The environment variable should be set before the Node.js process starts.
  • Use a connection pool even if you only have a few connections. Although you may not really need a pool now, your app usage may change over time. Connection pools also support high availability features, both big and small. These are beneficial to almost all applications.
  • Don’t hold onto connections when slow, non-database work is being done. Release them back to the pool as soon as possible.
  • Stop DBAs and firewalls from closing ‘idle’ connections. The connection pool is there to do work. You don’t want the overhead of re-creating connections (authentication, opening sockets, allocating session memory, etc.). You don’t want connection storms when lots of connections have to be re-created all at once.
  • In general, a fixed connection pool size is strongly recommended.
  • Update node-oracledb to the latest version:
    - Back in version 5.2 we did some work to reduce thread usage if the application tried to run multiple database tasks on a single connection, such as might be initiated by a JavaScript Promise.all() call. The new behavior of 5.2 helps to avoid thread starvation if an app does a parallel request like this either now or in the future.
    - More directly useful to the web service scenario, node-oracledb 5.2 introduced a pool.reconfigure() function allowing the pool size to be changed at runtime without needing to restart the web service. This is very useful if the pool size doesn’t suit whatever user load eventuates. This function can also be used to enable or disable the gathering of pool statistics. Note that the Node.js thread pool size can’t be changed after the thread pool starts, so don’t increase the maximum number of connections beyond the number of available threads.
    - We also improved ‘dead connection detection’ in node-oracledb 5.2, allowing better clean up and recovery when connections become unusable, such as from network dropouts. Applications can check for the one error message, DPI-1080, to know if a connection has become unusable.
  • Use a connection pool session callback if connections need state set such as NLS settings. Session callbacks can reduce database load by eliminating round-trips to the database that otherwise might be required after each pool.getConnection() call.
  • Implement application tracing and monitor node-oracledb connection pool statistics.
  • Log pool statistics via a timer or generate them on-demand. Do some log post-processing to analyze pool usage and queueing.
  • Implement application administrator alerts if many connections get queued, or if queueTimeout or queueMax errors are seen. The administrator can then take action to prevent system overload.
  • Set a pool.close() drainTime to gracefully allow running database statements to finish during an application restart.
  • For reasons I am still waiting to hear about for the web service, each Node.js process actually begins two connection pools. Possibly one pool would have been better, but I don’t know the original team’s design rationale. If there are concerns about not sharing session state between two different uses of a pool in an application, this can be solved by using a single pool with session tagging (or by connecting as different database users).
  • Set a connectionClass attribute. The web service is running on a small cluster of four nodes, and I’m confident that the number of connections in each node’s pools won’t overload the database. In fact, I believe the pool sizes can be further reduced when the application logic is changed to close connections after database work is complete. However, if the cluster grows, or connection pools were bigger, or if there are a huge number of connections from other applications all resulting in the database hardware running into memory issues, then there is always the possibility of using Database Resident Connection Pooling (DRCP). As shown in the diagram below, DRCP allows the database server processes that handle connections in the database to be shared as needed across all application processes, including when the application is scaled over multiple machines. Only active application connections need to use a server process. The idle application connections do not need to hold onto database resources, but will be allocated a database server process when they need to do database work. DRCP is most efficient when used in conjunction with application connection pools but also can benefit standalone connections. To use DRCP the application tier needs a simple connection string change and the setting of a connectionClass attribute. Setting that attribute now makes it easier to use DRCP in the future. (Update: see my new post Oracle Database DRCP and Node.js and the technical brief Extreme Oracle Database Connection Scalability with Database Resident Connection Pooling (DRCP)).
  • Keep Oracle Client libraries updated. Oracle Client 21c has a couple of nice enhancements for connection pools.
  • Set the ‘client identifier’ and other end-to-end tracing attributes so database behaviors can be correlated with the application code and the application users. Although the database work done by the web service is mostly a single module, and the queries are tightly related to the ‘users’, setting the clientId attribute would still be useful for tracing. And who knows what future application growth will occur or if code will be reused by other projects?
  • Enable FAN on the database service and turn on ‘events’ mode when the pool is created. This will let the database send notifications so unusable connections in the node-oracledb connection pool can be proactively closed. FAN also allows connection pools to get Runtime Load Balancing (RLB) events so the pools can balance work across the nodes of a RAC cluster.
  • Consider setting SQLNET.OUTBOUND_CONNECT_TIMEOUT or TCP.CONNECT_TIMEOUT in the application sqlnet.ora file. These allow the initial creation of connections across to the database to be canceled if there is a delay, such as a network failure. If one of these options is not set, the application may not get a connection establishment failure error until TCP times out, which may be minutes or hours.
  • Use oracledb.callTimeout to limit the time taken for any executing SQL statement.
  • Consider setting EXPIRE_TIME on the application tier to aid with dead connection detection and stop firewalls from killing idle connections. This wasn’t an issue for the web service, but I know some other customer scenarios where it helps. The real solution is to stop the firewall from killing connections, but that isn’t always possible. Set the value to half of the time that the firewall terminates idle connections.
    - With 18c client libraries, it can be added as (EXPIRE_TIME=n) to the DESCRIPTION section of a connect descriptor
    - With 19c client libraries, it can also be used via Easy Connect: host/service?expire_time=n.
    - With 21c client libraries, it can also be used in an application sqlnet.ora configuration file.
  • In the initial scenario I looked at, Oracle Net connection rate limits may not have helped because the app would still be opening lots of connections, and Node.js would still have had a thread shortage. However, in some applications, limits may be useful to prevent connection storms from overloading the database. The database listener.ora RATE_LIMIT or QUEUESIZE settings can be used.
  • Check errors after every SQL execution and take appropriate recovery. Failures can happen at any time, so even if a pool.getConnection() call returns a usable connection, then a problem may occur and prevent you using that connection. With recent versions of node-oracledb, the error DPI-1080 will be thrown if a SQL statement fails because of a connection issue. DPI-1010 will be thrown if you attempt to use a connection that is already known to be unusable. Application-specific recovery can then be programmed.
  • Enable Application Continuity or Transparent Application Continuity so application work can continue uninterrupted even if there is a database node outage. The technical brief on maximum availability has some good content.
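
Several of these tips boil down to pool creation settings. A sketch using node-oracledb createPool() option names — the values, credentials, and connect string are placeholders to adapt, and the session callback body is hypothetical:

```javascript
// Illustrative pool settings pulling several of the tips together.
const poolConfig = {
  user: 'app_user',                                   // placeholder
  password: process.env.DB_PASSWORD,                  // never hard-code
  connectString: 'dbhost.example.com:1521/orclpdb1',  // placeholder

  poolMin: 10,              // fixed size: min === max, increment 0
  poolMax: 10,
  poolIncrement: 0,

  queueTimeout: 5000,       // ms a getConnection() call may wait, then error
  queueMax: 500,            // cap queued requests so overload fails fast

  enableStatistics: true,   // allow pool.getStatistics() for monitoring
  events: true,             // subscribe to FAN notifications

  sessionCallback: initSession,  // set session state once, not per request
};

// Hypothetical callback run when a pooled session is first used, saving
// a round-trip after every subsequent pool.getConnection() call.
function initSession(connection, requestedTag, cb) {
  // e.g. connection.execute("ALTER SESSION SET NLS_DATE_FORMAT = ...", cb)
  cb();
}

// The object would be passed to oracledb.createPool(poolConfig).
console.log(poolConfig.poolMin === poolConfig.poolMax);   // true
```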

Join the Slack discussion about node-oracledb.

For more Oracle Developer content, visit https://developer.oracle.com/


Christopher Jones

Oracle Database Product Manager for language drivers including Python python-oracledb, Node.js node-oracledb, PHP OCI8 and more! On Mastodon: @cjbj@phpc.social