Dynamic-Sql Act III

All RDb queries are structured in a strict syntax. This is a simple axiom which can and will be exploited by our experiment on the most fundamental level: building the tree. The elementary fact is that whatever comes next must be related to something which has preceded it. It is this relationship which we will use to link and unlink our tree.

Using our previous example, we can already see how rapidly an RDb gains entities and relationships, since storing information correctly can require many additional entities. For our “intelligent SQL constructor,” the system will need to weed through the myriad tables to find the fastest path from start to finish.

Now suppose we expand our example yet again and include a second path between Baby and Deceased: “A Baby can be born and soon after become Deceased.” This new relationship throws a monkey wrench into some interfaces, as the most direct path from Baby to Deceased is now a single hop. If the interface abstracts away the fact that a Person will become Deceased, the developer or the user may form erroneous expectations; in either event, this is a discrepancy which will cost us patrons and drive up development costs in the end.

So how can we expose this fact? My personal inclination is to be as explicit as possible. Describing to users what you, as the developer, intended should realistically cut harmful reactions to a minimum. To do this, the system will need to be mathematically aware of how the RDb is structured and what the relationships imply. As proposed in the exposition, knowledge of existing information, its meaning, and its relation to other algorithms should help us greatly.

The first tool we’ll need is a mathematical expectation of how to go from Baby to Deceased. Because of the wonders of modern medicine, users expect that life for a Baby does not end until a significant time has been spent on Earth. From the data-modeling standpoint, this means tracing the relationship from Baby to Deceased through the Person entity. However, some UIs may not account for this and choose the direct, 1-hop relationship.
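Before the system can form any expectation, it has to know which paths exist at all. Here is a minimal sketch, assuming the schema is held as a simple adjacency map; the `SCHEMA` dictionary and the `all_simple_paths` helper are hypothetical names made up for illustration, not part of any framework discussed in this series.

```python
from collections import deque

# Hypothetical schema graph: each entity maps to the entities it joins to directly.
SCHEMA = {
    "Baby": ["Person", "Deceased"],
    "Person": ["Deceased"],
    "Deceased": [],
}

def all_simple_paths(graph, start, goal):
    """Return every cycle-free path from start to goal, shortest first."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # never revisit an entity on the same path
                queue.append(path + [neighbor])
    return paths

paths = all_simple_paths(SCHEMA, "Baby", "Deceased")
for path in paths:
    print(" -> ".join(path))
```

A UI that simply takes the first entry gets the 1-hop relationship; the path through Person is the second entry, which is exactly the choice the user never gets to see.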

From a user’s standpoint, this is misleading. “I’m calculating social security checks for the elderly; why do the results include a dozen records which are extraneous?” The system’s presumption that it must follow the most direct path from Baby to Deceased seems flawed, though, for the criteria given, its data may be perfectly correct. Shouldn’t we instead initially propose to follow the path with the highest probability of success?

In such an example, based on our knowledge of how the system is used, we estimate that paths between Baby and Deceased which go through the Person table hold ten times as many records as paths which bypass it. Taking this into account, defaulting to the Person path almost assuredly retrieves the right result set for social security calculations. Then, for the remaining cases (roughly ten percent), we offer the option to “trim” the Person entity from our query.
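To make the idea concrete, here is a rough sketch of ranking candidate paths by their share of records. The counts in `PATH_COUNTS` are invented to mirror the ten-to-one estimate, and `rank_paths` is a hypothetical helper, so treat this as an illustration of the principle rather than a finished design.

```python
# Hypothetical record counts per path; the 10:1 ratio mirrors the estimate in the text.
PATH_COUNTS = {
    ("Baby", "Deceased"): 100,
    ("Baby", "Person", "Deceased"): 1000,
}

def rank_paths(path_counts):
    """Return (path, share_of_records) pairs, most likely path first."""
    total = sum(path_counts.values())
    if total == 0:
        return []
    ranked = sorted(path_counts.items(), key=lambda item: item[1], reverse=True)
    return [(path, count / total) for path, count in ranked]

ranked = rank_paths(PATH_COUNTS)
default_path, default_share = ranked[0]
# Everything after the default is offered to the user as an alternative ("trim Person").
alternatives = [path for path, _ in ranked[1:]]

print("default:", " -> ".join(default_path), f"({default_share:.0%} of records)")
for path in alternatives:
    print("alternative:", " -> ".join(path))
```

Strictly speaking, a ten-to-one ratio gives the Person path ten elevenths of the records, so the direct path covers a little over nine percent of cases, which is close enough to the ten percent figure for our purposes.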

Additionally, what the system hasn’t exposed here is the semantic relationship it surmised from the entities provided. Perhaps the results were appropriate for what was provided, just not what was expected. This is the second tool we must produce: a means of exposing to the user our presumption of the semantic meaning of their query. This is, admittedly, less for our system to act upon than to influence what the user will input next, or perhaps to prompt them to withdraw and try again.
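One simple way this second tool might look is a sentence built from labels stored alongside each relationship. The labels in `RELATION_LABELS` and the `describe_path` function are hypothetical; the only assumption is that the framework keeps some human-readable phrase per relationship.

```python
# Hypothetical semantic labels, one per relationship in the schema.
RELATION_LABELS = {
    ("Baby", "Person"): "grows into",
    ("Person", "Deceased"): "becomes",
    ("Baby", "Deceased"): "soon after birth becomes",
}

def describe_path(path, labels):
    """Spell out, hop by hop, what the system presumes the query means."""
    clauses = []
    for source, target in zip(path, path[1:]):
        # Fall back to a neutral phrase when no label has been stored.
        verb = labels.get((source, target), "is related to")
        clauses.append(f"{source} {verb} {target}")
    return "Interpreting your query as: " + ", then ".join(clauses) + "."

print(describe_path(["Baby", "Person", "Deceased"], RELATION_LABELS))
print(describe_path(["Baby", "Deceased"], RELATION_LABELS))
```

Showing this sentence next to the results gives the user a chance to notice that the system took a different route than the one they had in mind.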

By now this might sound similar to problems faced in graph theory. At the moment, there are two that I can think of which aptly fit into this scenario: maximum flow and minimum cut. Both are incredibly annoying problems with surprisingly low-tech solutions. They also both make use of the mathematical model we prescribed earlier. An improvement to this theory, which deserves its own article, will be proposed at the end of this one.
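For readers who have not met these two problems, here is a small sketch using the Edmonds-Karp method (breadth-first search for augmenting paths) on our three entities. The capacities stand in for record counts and are invented for illustration; how the capacities should really be chosen is an open question this sketch does not settle.

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp: return (max flow value, edges of one minimum cut)."""
    # Residual graph: forward edges plus zero-capacity reverse edges.
    residual = {}
    for u, edges in capacity.items():
        residual.setdefault(u, {})
        for v, cap in edges.items():
            residual[u][v] = residual[u].get(v, 0) + cap
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    flow = 0
    while True:
        # Breadth-first search for a path that still has spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Entities still reachable from the source form the source side of the cut.
    reachable = set(parent)
    cut = [(u, v) for u in capacity for v in capacity[u]
           if u in reachable and v not in reachable]
    return flow, cut

# Hypothetical capacities standing in for record counts on each relationship.
CAPACITY = {
    "Baby": {"Person": 1000, "Deceased": 100},
    "Person": {"Deceased": 800},
    "Deceased": {},
}

flow, cut = max_flow_min_cut(CAPACITY, "Baby", "Deceased")
print(flow, cut)
```

The two problems are two views of the same answer: the maximum flow always equals the total capacity of a minimum cut, so solving one hands us the other.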

Luckily, there are means to do all of these things when given a well thought-out framework. Semantic ties can be stored within the framework and the mathematical model can be derived quickly from the data via a few SQL queries. (I already anticipate the cries of outrage over additional SQL queries, but those drawbacks will be addressed in the conclusion; so please, save your flames until the end of the series.)
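As a rough idea of what "a few SQL queries" could mean, the sketch below counts the records reachable along each path using an in-memory SQLite database. The table layout, column names, and sample rows are all assumptions invented for this example; the real schema from earlier in the series may well differ.

```python
import sqlite3

# Hypothetical schema and sample rows, held in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE baby (id INTEGER PRIMARY KEY);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    baby_id INTEGER REFERENCES baby(id)
);
CREATE TABLE deceased (
    id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(id),
    baby_id INTEGER REFERENCES baby(id)
);
INSERT INTO baby (id) VALUES (1), (2), (3);
INSERT INTO person (id, baby_id) VALUES (10, 1), (11, 2);
INSERT INTO deceased (id, person_id, baby_id)
    VALUES (100, 10, NULL), (101, 11, NULL), (102, NULL, 3);
""")

# One counting query per candidate path between Baby and Deceased.
PATH_QUERIES = {
    "Baby -> Person -> Deceased": """
        SELECT COUNT(*) FROM baby b
        JOIN person p ON p.baby_id = b.id
        JOIN deceased d ON d.person_id = p.id
    """,
    "Baby -> Deceased": """
        SELECT COUNT(*) FROM baby b
        JOIN deceased d ON d.baby_id = b.id
    """,
}

def path_counts(connection, queries):
    """Run each counting query and return {path name: record count}."""
    return {name: connection.execute(sql).fetchone()[0]
            for name, sql in queries.items()}

counts = path_counts(conn, PATH_QUERIES)
print(counts)
```

These counts are the raw material for the mathematical model: run the queries occasionally, cache the results, and the ranking of paths comes nearly for free.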

