[data-shapes] consider computational complexity (#321)

VladimirAlexiev has just created a new issue for https://github.com/w3c/data-shapes:

== consider computational complexity ==
It would be nice to have some idea of the implementation and execution complexity of each new feature (or, more realistically, of each bundle of features, i.e. a Profile).

Consider this scenario:
- a database of 1B, 10B or 100B triples (data at rest), which are assumed valid (e.g. parts have been validated, parts come from a trusted valid source)
- a transaction of 1M, 10M or 100M triples (data in motion)
- a shapes graph of 1k or 10k shapes. The shapes refer to both data in motion and data at rest.
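
For a sense of scale, the back-of-envelope sketch below computes how small the transaction is relative to the combined data. It assumes the figures above read as 1B/10B/100B triples at rest and 1M/10M/100M triples in motion; the constant and function names are illustrative only.

```python
from itertools import product

# Scenario sizes, read as 1B/10B/100B and 1M/10M/100M triples (an assumption).
AT_REST_TRIPLES = [1_000_000_000, 10_000_000_000, 100_000_000_000]
IN_MOTION_TRIPLES = [1_000_000, 10_000_000, 100_000_000]


def transaction_share(at_rest: int, in_motion: int) -> float:
    """Fraction of the combined data contributed by the transaction."""
    return in_motion / (at_rest + in_motion)


# One share per (database size, transaction size) combination.
shares = [
    transaction_share(at_rest, in_motion)
    for at_rest, in_motion in product(AT_REST_TRIPLES, IN_MOTION_TRIPLES)
]

print(f"smallest transaction share: {min(shares):.4%}")
print(f"largest transaction share:  {max(shares):.4%}")
```

Under that reading the transaction is always below 10% of the total, so re-validating everything would mostly repeat work on data that is already assumed valid.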

How do you validate this scenario in a reasonable time?
That's a difficult question. 
- Many implementations use in-memory models ("give me data files, give me shape files, I load them in memory and output a validation report").
  - I think all JS implementations are like this, which is not efficient in the large-data scenario described above (@bergos, I'd love it if you proved me wrong!)
- SPARQL opens a "door" to "unknown" complexity. At least, I'm not aware of any work on assessing the practical complexity of SPARQL queries.
  - If you need to run, say, 1M queries coming from `SPARQLConstraint` (part of SHACL-SPARQL in the SHACL 1.0 Recommendation), that will ruin all efficiency.
  - So much so that a good piece of advice is to put the complex query in `SPARQLTarget` (part of SHACL Advanced Features) and keep the `SPARQLConstraint` simple: https://github.com/Sveino/Inst4CIM-KG/tree/develop/shacl-improved#use-complex-sparqltarget-but-simple-sparqlconstraint
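
The effect on query counts can be illustrated with a toy model. The sketch below is neither SHACL nor SPARQL: `CountingStore`, the `ex:` terms and both validation functions are hypothetical stand-ins, and it assumes a naive engine that issues one query per focus node for a constraint (real engines may batch or optimize this). It shows one reading of the advice, in which the target query itself selects only the offending nodes.

```python
# Toy model only: each call to CountingStore.query stands in for one SPARQL
# query sent to an engine. All names here are hypothetical.
RDF_TYPE, PERSON, NAME = "rdf:type", "ex:Person", "ex:name"


class CountingStore:
    """In-memory set of (subject, predicate, object) triples that counts queries."""

    def __init__(self, triples):
        self.triples = frozenset(triples)
        self.query_count = 0

    def query(self, evaluate):
        """Run one 'query' (any function over the triples) and count it."""
        self.query_count += 1
        return evaluate(self.triples)


def validate_per_focus_node(store):
    """Simple target, then one constraint query per focus node."""
    persons = store.query(
        lambda ts: {s for s, p, o in ts if p == RDF_TYPE and o == PERSON}
    )
    violations = set()
    for node in persons:
        has_name = store.query(
            lambda ts, n=node: any(s == n and p == NAME for s, p, _ in ts)
        )
        if not has_name:
            violations.add(node)
    return violations


def validate_with_complex_target(store):
    """One set-based target query that returns only the offending nodes."""

    def persons_without_name(ts):
        persons = {s for s, p, o in ts if p == RDF_TYPE and o == PERSON}
        named = {s for s, p, _ in ts if p == NAME}
        return persons - named

    # Every node the target selects is already a violation, so the
    # constraint that follows is trivial and needs no further query.
    return store.query(persons_without_name)


def sample_triples(n_persons, n_unnamed):
    """Build n_persons persons, the first n_unnamed of which lack ex:name."""
    triples = []
    for i in range(n_persons):
        node = f"ex:p{i}"
        triples.append((node, RDF_TYPE, PERSON))
        if i >= n_unnamed:
            triples.append((node, NAME, f"Person {i}"))
    return triples


store_a = CountingStore(sample_triples(1000, 3))
store_b = CountingStore(sample_triples(1000, 3))
assert validate_per_focus_node(store_a) == validate_with_complex_target(store_b)
print(store_a.query_count, "queries vs", store_b.query_count)
```

With 1,000 focus nodes the per-node variant issues 1,001 queries and the target-driven variant issues one, while both report the same three violations. In this model the gap grows linearly with the number of focus nodes.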

Relevant issues:
- https://github.com/w3c/data-shapes/issues/242, which gives an example of a constant-time shape and a linear-time shape.
  I didn't define it just for its low complexity, though: it would also be useful in a SHACL profile for Modeling.
- https://github.com/w3c/data-shapes/issues/216 that asks for SHACL Profiles. Now I'll add "based on the complexity of features"
- https://github.com/w3c/data-shapes/issues/235 that removed SPARQL from Core
- https://github.com/w3c/data-shapes/issues/312 that asks for structuring the spec along profiles (and https://github.com/w3c/data-shapes/issues/312#issuecomment-2716744035 that comments on complexity)



Please view or discuss this issue at https://github.com/w3c/data-shapes/issues/321 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
