Relative Performance of A-L vs. PKFK vs. JT

The Aggregate-Link (A-L) representation employs two structure relations. The more conventional Primary Key / Foreign Key (PKFK) representation uses no structure relations and the Junction Table (JT) uses one. Intuitively, one would expect the PKFK representation to be fastest, followed by JT and then A-L. In a brief study using a database modeled after the Wisconsin Database, the performance penalty for using the A-L representation is shown to be quite modest.

This web page is the summary of a series of performance experiments. For more detail, see Performance Detail.

Figure 1. PKFK, JT and A-L Relationship Representations.

With respect to performance, the 2-way, 3-way and 4-way joins of the PKFK, JT and A-L structures (see Figure 1) intuitively suggest a corresponding ordering of the response times to retrieve all Parent-Child entity pairs from each representation. To test this hypothesis and obtain some gauge of relative performance, a simple performance test was undertaken.

Using the database from the Wisconsin Benchmark as a guide, a database was designed and populated so that equivalent PKFK, JT and A-L structures could be constructed and queried and their response times compared. The test platform was a widely used commercial RDBM server (not identified here for legal reasons) running on a laptop. Although a laptop is not appropriate for production database applications, the test platform should nonetheless demonstrate any discernable differences in relative performance. No additional indexes were added to improve performance and the serverís SQL optimizing compiler was allowed to choose all execution plans without directives ("hints"). (NOTE: Every relation automatically has a primary key index, and unless the relation is clustered on some other attribute, the RDBMS will cluster the relation on the primary key. When we say "No additional indexes", we mean in addition to this primary key index.) Tests were run first with no limit on system buffer cache memory, then repeated with a 5Mb limit, the minimum amount the RDB Server would allow. Test-to-test bias was removed by rebooting the platform between tests. This voided any disk pages cached by the first test, making them unavailable for the subsequent test. To minimize resource competition, no additional applications were running during the tests.

All tests were based on a 21Mb Parent relation tenk containing 10,000 entity tuples of 2100 bytes and a 210Mb Child relation hundredk containing 100,000 entity tuples, also 2100 bytes each. Using knowledge of the distribution of attribute values, tenk and hundredk were joined on different attribute pairs to retrieve 1-to-M and M-to-M relationships with predictable numbers and ratios of parent-child entity pairs. For 1-to-M relationships, the ratios were 2:50,000 (that is, 2 parent entities with 50,000 child entities each), 20:5000, 100:1000 and 1000:100. For M-to-M relationships, the ratios were 100:1000, 1000:100, 5000:20 and 10,000:10. A total of 100,000 tuple pairs were related within each relationship. The PKFK, JT and A-L representations were built and queried for the four 1-to-M relationships, but only JT and A-L were compared for the four M-to-M relationships. For each relationship, equivalent SQL queries were defined for retrieving all related tenk-hundredk tuple pairs and run five times each. The best and worst elapsed times were discarded and the three remaining times averaged. (See Performance Detail for a more detailed description.)

Table 1 shows the averages of the average elapsed times over these four 1-to-M and four M-to-M relationships. Elapsed times are in seconds.

Table 1. PKFK, JT and A-L Average Response Times (seconds)
CardinalityRepresentationno bcm limit5Mb bcm limit

For 1-to-M relationships, PKFK performed slightly faster than both JT and A-L when buffer cache memory (bcm) was not limited. With bcm limited to 5Mb, the differences were greater. Compared to PKFK, JT incurred a 4% penalty with no bcm limit and 25% with bcm limited to 5Mb. A-L incurred a 11% penalty with no bcm limit and 27% with bcm limited to 5Mb. With no bcm limit for M-to-M relationships, the A-L structure incurred a 4% penalty compared to JT. With bcm limited to 5Mb, the penalty for using A-L was 1% compared to JT.

This limited performance study was not intended to draw sweeping conclusion about the performance of the Aggregate-Link structure compared to more conventional methods. Updates weren't even addressed. But while the A-L method does appear to incur a performance penalty, the magnitude of that penalty is far less than might be expected based only on the number of joins. The very small tuple size in the Aggregate and Link structure relations implies a high ratio of tuples per disk page. As a consequence, the additional steps in the query execution plan for input of the A-L structure relations adds little disk access time to the overall totals compared to the disk access times for the Parent and Child entity relations.

It should be noted that the developers of the test system's query optimizer had never seen the Aggregate-Link representation before so would have no way of exploiting it's unusual characteristics (like the tiny tuple size). Further, the query optimizer appears so biased in favor of PKFK that it treats the JT representation as if it were two PKFKs and the A-L representation as if it were three.

There are obvious and interesting optimization problems associated with the Aggregate-Link representation. We would appreciate hearing from anyone venturing into this area.

Page Content first published: November 13, 2007
Page Content last updated: November 13, 2007