| |
The adjacency list method
is difficult only in that recursion is used to read the indent/level
information from the list.
Some would propose a more 'elegant' solution where the levels, in a way,
are fully laid out and available for a SQL query without resort to recursion;
or at least with resort to flat looping/iteration, rather than recursion.
Author Joe Celko proposed the idea of - nested sets - and where he originally
referred to his index structure as a "modified pre-order traversal tree" (the
'pre-order traversal tree' being something of a construct in theory).
He imagined something like a shoelace (say, a string, even with 'knots' on it) being threaded
though the 'eyelets' on either side of a category.
It would snake down through subcategories, be 'threaded'
back out, and continue down through the list above, and down
to subcats, again, in the same way.
So there is an 'eyelet' to the left of a category, and one to the right;
a pair of index values uniquely identifying each entry in the outline,
and reflecting its order in a list, to boot.
In other words, the complete hierarchy is sort of 'flattened', laid out to view from any
given category, and each category on its own level, its own list, is sorted in the order one chooses.
The 'string' does all that, and holds everything together.
A
a__
| b__
| e
| f__
| k
| g
| c__
| h
| i
| d
|
|
|
B
| Category | | Down | | Up |
| a__ | | 0 | | 19 |
| | b__ | | 1 | | 10 |
| | e | | 2 | | 3 |
| | f__ | | 4 | | 7 |
| | k | | 5 | | 6 |
| | g | | 8 | | 9 |
| | c__ | | 11 | | 16 |
| | h | | 12 | | 13 |
| | i | | 14 | | 15 |
| | d | | 17 | | 18 |
|
You can see an example full outline on the left, Table A.
Table B shows the index values, for each category, that reflect
the outline structure, and the sort order in each list, and which also are
used for retrieval.
Clearly the 'string' is 'threaded' down, increasing by one each time, on the 'down' or left index
until there are no more subcategories.
So here you go, a, b, e, and there are no more subcats at e.
Then it is brought back up to the previous level, then down that list, following
subcats, as before, until you finally work back out to the last category
in the outline.
So e is the bottom of that sequence, move over to Up and then resume at the
category following e.
You can note that this creates an interval under each node,
which takes in not merely the immediate sublist, but all
sublists under that, as well.
The node, b, for example, of (1,10) contains the list below, and the keys for the list below that (which
is only the node, k).
But there is an obvious problem, here.
Any time a category is deleted or added, the entire
set of indices, or at least one of each pair, must be
recalculated for - each item - in the entire hierarchy.
That's fine for small outlines. But as the data grows, so
does the time it takes to update the table.
It's a 'volatile', and constantly changing key structure, in other words.
That is, it's not a bad method if changes to the outline are infrequent.
But changes, most any change, could potentially locked things up on the server
if the lists became very long.
| Tropashko Nested Intervals |
Celko suggested a fix for the problem might be to widen the step when running
down and back out. Instead of adding one each time, one could add 10 or 100, and so on.
Then you would not have to always recalculate the entire set of indices.
It's dirty. But it actually could work, and reduce the frequency of complete index recalculation.
Author and Oracle whiz, Vadim Tropashko,
noted that you could still bunch up in a particular list and that this
'widening' only put off a potential problem.
He suggested, instead, a more flexible - (fractional) nested interval, using integer fractions (rational
numbers - technically, any integer divided by any non-zero natural number), and binary exponentiation.
And, as above, this method of 'tagging' the categories would allow for a fast, and non-recursive,
retrieval of any and all information from the entire tree, without the 'volatile' maintenance
overhead of nested sets.
| Category | |
Path | | Numer | | Denom |
| a__ | |
.1 | | 3 | | 2 |
| | b__ | |
.1.1 | | 7 | | 4 |
| | e | |
.1.1.1 | | 15 | | 8 |
| | f__ | |
.1.1.2 | | 27 | | 16 |
| | k | |
.1.1.2.1 | | 55 | | 32 |
| | g | |
.1.1.3 | | 51 | | 32 |
| | c__ | |
.1.2 | | 11 | | 8 |
| | h | |
.1.2.1 | | 23 | | 16 |
| | i | |
.1.2.2 | | 43 | | 32 |
| | d | |
.1.3 | | 19 | | 16 |
|
For reference,
the path corresponding to some or any category
is specified with integers, separated by periods, as shown (you could use any sort of separator, though).
The denominator is obviously just a binary shift, 2 to the power of x,
for the sum of the digits in the 'path'.
For the numerator, the same is done, except that at each step
of the way, a further amount is added or subtracted, depending.
At each level (l, if you like), the value x at that level (between the periods),
becomes the exponent for 2, left to right:
| Denominator | (2 ^ x) * running total for denominator |
| Numerator | (2 ^ x) * running total for numerator - ((2 ^ x) - 3) |
Both numerator and denominator are 'seeded', or started, at 1.
So with "1.1.2", for example, the denominator is 2 to the sum of the digits, in this case - 2 ^ 4.
As for the numerator, the first "1" gives 3 (2-(-1)), the second is 3 shifted plus 1 (-(-1)), for 7,
and the "2" give that 7 multiplied by (2^2), or 4, and then one (4-3) is subtracted.
However, it should
be noted that there's obviously some potential problems with this method, as well.
First, as with nested sets, a lot of overhead is involved in moving things around.
Perhaps not every category key has to be updated.
But all subcat and subcat under that, and so on, recursively, keys have to
be modified, if things get moved around.
It would be unavoidable with such methods (then again, the sort field in the adjacency list
had to be updated, too).
Second, and more importantly, the binary shifting doesn't really use a lot of numbers
out of the numbers available to the system before it crashes into an overflow condition.
It's a geometric progression, that at a certain point, explodes into gigantic
floating point numbers, even before overflow is reached.
| Tropashko Triangular Navigation |
The designer suggested an alternative, instead.
Combine the 'numer' and 'denom' calculations into a single integer key.
| Category | |
Path | | N (key) |
| a__ | |
.1 | | 1 |
| | b__ | |
.1.1 | | 2 |
| | e | |
.1.1.1 | | 4 |
| | f__ | |
.1.1.2 | | 9 |
| | k | |
.1.1.2.1 | | 18 |
| | g | |
.1.1.3 | | 19 |
| | c__ | |
.1.2 | | 5 |
| | h | |
.1.2.1 | | 10 |
| | i | |
.1.2.2 | | 21 |
| | d | |
.1.3 | | 11 |
|
One property is fairly apparent.
If the category is the first in its list, that is
if the last digit in the Path is, 1, then a) the key is not only an even number,
but b) the key for the superior category is exactly half.
Those in the same list are different.
You can see that N for the next in the list
is double that for the category above, plus 1.
So where ".1.1.1" is, 4, the next item down, ".1.1.2", is double that, plus 1, or 9.
But the first item in the list just below, ".1.1.2" is twice 9, or 18.
Conversely to find the 'parent', or superior category, for any given
(sub)category, one would first subtract 1 and then divide by 2.
Then the result would be checked to see if it's an even number.
If not, then repeat the process.
But if it is an even number, then divide by 2, once more, and the
result will be the key for the superior category.
It turns out there's a simple transform of the previous method, into this one:
The problems of before remain, however.
While it's true that neither denominator nor numerator are calculated,
the same calculation is performed on this single integer key.
The same doubling is performed just going down a list, at the same level.
And 500, 1000, or more categories at the same level isn't necessarily
unreasonable.
It seems reasonable to conclude that the desire to 'flatten',
to make unique keys for every outline category,
involves a lot of maintenance overhead, no matter how you slice it,
for just any addition or deletion to the tree.
In addition, just trying to explicitly specify both the level and list order,
the structure and the sort,
in some fashion, can introduce new problems.
The advantage, on the other hand, of the adjacency method
is that lists of subcategories are 'hidden', isolated from all but the immediately
superior category.
One doesn't need to know the structure to add or delete items, except right at
the affected categories.
There's no overall list to maintain, and no need to update every subcat key, each time.
The trade-off for all that greatly reduced cost
of time in list maintenance overhead, is that perhaps the time to query an adjacency
list could be improved were it either, or simultaneously, represented by one of the
schemes, briefly outlined, above.
CONTINUE
|