The base Iconclass vocabularly has been available as Linked Open Data since 2011. This meant that each notation on the site
could also be viewed as RDF in either RDX-XML or as a simple JSON view by appending .rdf
or .json
behind the URI for a notation.
For example 71F2172
This was done in the Python server software to convert the data for a notation dynamically per-request. There was not a "dump" of the
underlying data in a database that showed snippets, but the relatively complex data structure
was being processed and served.
There is a large difference between the "base" number of notations, and the number of notations including "keys"
What are these "keys"? It is essentially mini sub-trees that can be switched on and off at various point in the main hierarchy.
It adds extra digits between brackets, prefixed with a plus sign. For example, at the item 11R1 personifications ~ the life of man note the children at this point in the tree:
json
["11R1(+0)", "11R1(+1)", "11R1(+2)", "11R1(+3)", "11R1(+4)", "11R1(+5)"]
When we for example add the digit (+1) to the notation it becomes personifications ~ the life of man (+ Holy Trinity)
But when we look at some other part in the tree, and we add the digit (+1) the text is modified in a different way, for example, at various places these are the extra items that are added:
notation | text added with a (+1) |
---|---|
11R1 | (+ Holy Trinity) |
25F | (+ animals used symbolically) |
25FF | (+ parts, limbs and organs larger than normal) |
...and the list is actually quite a bit longer:
ID | count of branches |
---|---|
11k | 14 |
12k | 1 |
23k | 61 |
25D11k | 3 |
25D3k | 8 |
25FFk | 334 |
25FFkq | 18 |
25Fk | 204 |
25Gk | 39 |
25Hk | 2 |
25Ik | 2 |
25Kk | 2 |
25Lk | 2 |
2k | 1 |
31A42k | 151 |
31A42kq | 24 |
31A44k | 151 |
31A44kq | 24 |
31A45k | 13 |
31A54k | 4 |
31E23k | 24 |
31F2k | 7 |
31k | 116 |
32Bk | 12 |
34k | 241 |
3k | 1 |
41B3k | 4 |
41Cm | 12 |
41Cn | 11 |
41Dk | 26 |
41k | 8 |
43Ak | 18 |
43B3k | 11 |
43C1k | 71 |
43Ck | 33 |
44Ak | 13 |
44Gk | 55 |
45k | 61 |
46Ck | 274 |
47k | 353 |
48A98k | 112 |
48k | 174 |
49Bk | 21 |
49k | 37 |
4k | 1 |
5k | 39 |
5kq | 6 |
61Ak | 5 |
61Bk | 24 |
6k | 1 |
7k | 14 |
8k | 1 |
91k | 15 |
92k | 15 |
93k | 15 |
96Ak | 15 |
98k | 24 |
9k | 1 |
So when we expand the tree by adding these keys, the counts look like this:
Count | |
---|---|
Base Notations | 39,956 |
All including keys | 1,346,423 |
A significant increase. But many of these extensions are fairly "dumb" - in the sense that it only takes the base notations and add some repetitive modifier texts. So there is huge amount of redundancy in these additional nodes. Can't we just leave them out?
This is tempting to do. Certain services, like here, do that, as it means the dataset is much smaller. To be fair, that was also the download available on the Iconclass website, but expressly saying that the crucial "keys" part is missing, as it makes the data mushroom to quite large sizes.
What do you miss when you leave out the keys?
My favourite examples are these, compare the results for: lion sleeping and lion sleeping
In the first case we do not add the keys, and miss out on the important - more relevant exact notation for that concept: /25F23%28LION%29%28+46%29.json
To not even mentioned that these notations with keys have been used in various databases all over the world in both analog and digital forms over the last 50 years, and we can not magically wave a wand and make them disappear. (even though as a software developer I have often cursed and wished this were the case 😅 )
And now back to the SKOS/SPARQL question:
Or do we devise some form of expressing these "keys" as extra items in the SKOS representation, and place the onus on knowledgebase users to construct the textual labels including the extra key texts by themselves?
...and typing out the question and explanation like this almost answers the question by itsself.