modules/s3dgraphy/sync/graph_projector.py
Overview
This file contains 13 documented elements.
Functions
compute_primary(memberships, priority_order)
Pick exactly one primary group_uuid per us_id, following priority_order.
memberships: list of dicts with at least these keys:
- us_id: graph node_id of the US
- group_uuid: target group node_id
- group_kind: pyarchinit dimension (struttura, area, ..., toponym)
priority_order: list of group_kind names, highest priority first. Toponym is excluded automatically (never primary).
Returns: dict us_id → group_uuid (the primary). A US without any eligible spatial/activity membership gets no entry.
Parameters:
- memberships
- priority_order
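The selection rule can be sketched as below. This is a minimal sketch, not the module's code: compute_primary_sketch is a hypothetical name, and it assumes that group_kind values missing from priority_order are ineligible and that ties within one dimension keep the first membership seen.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def compute_primary_sketch(
    memberships: Iterable[Mapping[str, str]],
    priority_order: List[str],
) -> Dict[str, str]:
    """Return us_id -> primary group_uuid, highest-priority dimension first."""
    # Rank each dimension by its position; toponym is never eligible.
    rank = {kind: i for i, kind in enumerate(priority_order) if kind != "toponym"}
    best: Dict[str, Tuple[int, str]] = {}
    for m in memberships:
        r = rank.get(m["group_kind"])
        if r is None:
            # Assumption: dimensions absent from priority_order are ineligible.
            continue
        us_id = m["us_id"]
        # Strict '<' keeps the first membership seen on ties (assumption).
        if us_id not in best or r < best[us_id][0]:
            best[us_id] = (r, m["group_uuid"])
    return {us_id: group_uuid for us_id, (_, group_uuid) in best.items()}


memberships = [
    {"us_id": "us-1", "group_uuid": "g-area-A", "group_kind": "area"},
    {"us_id": "us-1", "group_uuid": "g-str-1", "group_kind": "struttura"},
    {"us_id": "us-2", "group_uuid": "g-topo", "group_kind": "toponym"},
]
# Hypothetical ordering; the real default is DEFAULT_PRIMARY_PRIORITY.
print(compute_primary_sketch(memberships, ["struttura", "area", "toponym"]))
# → {'us-1': 'g-str-1'}
```

us-2 only belongs to a toponym group, so it gets no entry even though "toponym" appears in the ordering.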
_is_us_node(node)
Return True if node is a stratigraphic unit (US/USM/USVs/...).
Parameters:
- node
_is_epoch_node(node)
Return True if node is an EpochNode. Defensive: it avoids importing EpochNode at module top because that import forces s3dgraphy to load too early.
Parameters:
- node
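Both predicates can be sketched as follows. The node_type attribute and the US_NODE_TYPES set are assumptions (the set reuses the node kinds listed for GraphProjector). For EpochNode the sketch swaps the deferred import for a class-name comparison along the MRO: it avoids the import entirely, at the cost of matching any class that happens to be named EpochNode.

```python
from typing import Any

# Hypothetical US-family type tags; the real module may check classes instead.
US_NODE_TYPES = frozenset({"US", "USM", "USVs", "USVn", "USD"})


def is_us_node_sketch(node: Any) -> bool:
    # Assumption: nodes expose their kind as a string `node_type` attribute.
    return getattr(node, "node_type", None) in US_NODE_TYPES


def is_epoch_node_sketch(node: Any) -> bool:
    # Compare class names along the MRO instead of importing EpochNode, so this
    # check never triggers an s3dgraphy import (subclasses also match).
    return any(cls.__name__ == "EpochNode" for cls in type(node).__mro__)
```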
Classes
ProjectionError
Read-side failure during GraphProjector.populate_graph().
Inherits from: GraphSyncError
GraphProjector
Stratigraphic-layer projection from PyArchInit DB to s3dgraphy Graph.
Usage:

    projector = GraphProjector()
    graph = projector.populate_graph(db_path, sito="Scavo archeologico")
The graph contains StratigraphicUnit / USM / USVs / USVn / USD /
SF / VSF / CON / DOC / Extractor / Combinar / property nodes plus
EpochNodes for the (periodo, fase) tuples present in the filtered
rows. Edges follow the rapporti column conventions decoded by
_RAPPORTI_TO_EDGE_TYPE and _RAPPORTI_SHORTHAND in
graphml_writer.py.
Methods
__init__(self, vocab_provider)
populate_graph(self, db_path, sito, include_paradata, strict_schema, groups, primary_priority)
Build and return an s3dgraphy.Graph populated with the
stratigraphic rows of sito from the SQLite database at db_path.
Args:
db_path: filesystem path to the pyarchinit SQLite DB.
sito: site identifier (us_table.sito value). Mandatory,
since multi-graph projections are out of scope for AI04.
include_paradata: when True (default), merge any
paradata.graphml produced by ParadataStore
for the (db, sito) pair. When False, return the pure
stratigraphic layer (kept for backward compatibility with AI04
callers such as graphml_writer.export_graphml). Read errors are
never fatal: a warning is logged and the projection falls back
to the strat-only layer.
strict_schema: when True (default), require that the
Phase-1 migration has been applied (i.e.
us_table.node_uuid exists) and propagate node-UUID
attributes onto each StratigraphicUnit so AI04 can do a
round-trip. When False, skip both the schema check and
_propagate_node_uuid_and_us. This is useful for the AI03
strat-only export path (graphml_writer.export_graphml),
which only needs labels/edges/swimlanes: node_uuid is
irrelevant there, and the AC-2 fixtures pre-date the
migration.
primary_priority: optional list of dimension names ordered
from highest to lowest priority for the AI07
is_primary selection (compute_primary). When None,
DEFAULT_PRIMARY_PRIORITY is used. Toponym is
always excluded.
Returns:
A populated s3dgraphy.Graph. Empty graph (zero nodes) is
valid: it just means the site has no rows.
Raises:
ProjectionError: on any failure reaching the DB or instantiating the in-memory graph.
_verify_node_uuid_column(self, db_path)
Ensure the Phase-1 migration that added us_table.node_uuid
has been applied. Raises ProjectionError otherwise.
Extracted from populate_graph in AI05 Group C step 2 so the
schema check is testable in isolation and reusable by any future
method that touches strat tables.
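A standalone version of the check might look like the sketch below. It is an assumption-laden illustration, not the module's code: verify_node_uuid_column_sketch is a hypothetical name, ProjectionError is a local stand-in, and the read-only URI is one way to make a missing file fail instead of being silently created by sqlite3.connect.

```python
import sqlite3
from pathlib import Path


class ProjectionError(Exception):
    """Stand-in; the real class inherits from GraphSyncError."""


def verify_node_uuid_column_sketch(db_path: str) -> None:
    # Open read-only via a URI so a missing file raises instead of being created.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        raise ProjectionError(f"cannot open {db_path}: {exc}") from exc
    try:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = {row[1] for row in conn.execute("PRAGMA table_info(us_table)")}
    except sqlite3.Error as exc:
        raise ProjectionError(f"cannot inspect us_table: {exc}") from exc
    finally:
        conn.close()
    # An absent us_table also lands here, because the PRAGMA returns no rows.
    if "node_uuid" not in columns:
        raise ProjectionError(
            "us_table.node_uuid is missing: apply the Phase-1 migration first"
        )
```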
_merge_paradata(self, graph, db_path, sito)
Read paradata.graphml for sito and add its nodes to
graph.
Non-fatal on read errors — logs a warning and returns. The caller still gets the strat layer.
De-duplication: nodes whose node_id already exists on the
target graph are skipped (the strat layer wins). Edges from the
paradata graph are NOT merged here; AI05 Group C does
node-only merging because the paradata graph currently holds only
author/license/embargo nodes with no connecting edges.
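The merge policy (non-fatal read, strat layer wins on collisions, edges ignored) can be sketched with the standard library alone. This is a hypothetical illustration: a plain dict stands in for the s3dgraphy Graph, and the file is assumed to use the standard GraphML namespace.

```python
import logging
import xml.etree.ElementTree as ET
from typing import Dict

log = logging.getLogger(__name__)
GRAPHML_NODE = "{http://graphml.graphdrawing.org/xmlns}node"


def merge_paradata_sketch(nodes: Dict[str, dict], graphml_path: str) -> int:
    """Add paradata nodes to `nodes`, a dict stand-in for the s3dgraphy Graph."""
    try:
        root = ET.parse(graphml_path).getroot()
    except (OSError, ET.ParseError) as exc:
        # Non-fatal: warn and keep the stratigraphic layer untouched.
        log.warning("paradata read failed for %s: %s", graphml_path, exc)
        return 0
    added = 0
    for element in root.iter(GRAPHML_NODE):
        node_id = element.get("id")
        if node_id is None or node_id in nodes:
            continue  # the strat layer wins on node_id collisions
        nodes[node_id] = {"source": "paradata"}
        added += 1
    # <edge> elements are ignored on purpose: this is a node-only merge.
    return added
```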
_propagate_node_uuid_and_us(self, graph, db_path, sito)
Set attributes['node_uuid'], attributes['us'] and the remaining mapped columns on each StratigraphicUnit-family node.
Match nodes by name (the importer emits name=str(us_table.us))
within the requested sito. Idempotent: re-running yields the
same attribute values.
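The name-based match can be sketched as follows. This is a hypothetical illustration: StubNode stands in for a StratigraphicUnit-family node, the function takes an open connection rather than db_path to keep it testable, and only node_uuid and us are copied (the remaining mapped columns are omitted).

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class StubNode:
    """Hypothetical stand-in for a StratigraphicUnit-family node."""
    name: str
    attributes: dict = field(default_factory=dict)


def propagate_node_uuid_sketch(
    nodes: Iterable[StubNode], conn: sqlite3.Connection, sito: str
) -> None:
    rows = conn.execute(
        "SELECT us, node_uuid FROM us_table WHERE sito = ?", (sito,)
    ).fetchall()
    # The importer emits name=str(us_table.us), so key the lookup the same way.
    by_name = {str(us): (us, node_uuid) for us, node_uuid in rows}
    for node in nodes:
        match = by_name.get(node.name)
        if match is None:
            continue  # node has no row in this sito
        us, node_uuid = match
        # Plain assignment (never appending) makes repeated runs idempotent.
        node.attributes["node_uuid"] = node_uuid
        node.attributes["us"] = us
```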
_enrich_into(self, graph, db_path, sito_filter)
Phase 2 / Strategy A: full-class implementation.
The body was moved verbatim from the now-deleted standalone function
formerly named _enrich_pyarchinit_graph in graphml_writer.
Bake epoch swimlanes and topological rapporti edges into graph.
The vendored s3dgraphy 0.1.40 PyArchInitImporter is incomplete: it imports only US columns mapped in the JSON mapping (us_table → StratigraphicNode + PropertyNodes), and does NOT:
- read periodizzazione_table → create EpochNodes
- add has_first_epoch edges from each US to its periodo
- parse the rapporti JSON column → create topological edges
Without those, the GraphMLExporter has no input for swimlanes and none for the TemporalInferenceEngine, so both AI03 acceptance criteria fail. We perform the enrichment here, in the orchestrator's filter+enrich layer, so that the bridge stays a one-call surface and the test fixture remains pure pyArchInit-shaped data.
Mutates the graph in place. No-op if the DB file lacks the expected tables.
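The rapporti step (the third bullet above) can be sketched as a pure parsing function. Everything here is an assumption: RAPPORTI_TO_EDGE is a hypothetical stand-in for _RAPPORTI_TO_EDGE_TYPE and _RAPPORTI_SHORTHAND with placeholder edge-type names, and each entry is assumed to start with the relation label followed by the target US. The sketch accepts JSON and falls back to single-quoted Python-literal lists; EpochNode creation is not covered.

```python
import ast
import json
from typing import List, Optional, Tuple

# Hypothetical mapping: the real tables are _RAPPORTI_TO_EDGE_TYPE and
# _RAPPORTI_SHORTHAND in graphml_writer.py; edge-type names are placeholders.
RAPPORTI_TO_EDGE = {
    "copre": "is_after",
    "coperto da": "is_before",
    "taglia": "is_after",
    "tagliato da": "is_before",
    "uguale a": "has_same_time",
}


def parse_rapporti_sketch(us: str, raw: Optional[str]) -> List[Tuple[str, str, str]]:
    """Return (source_us, edge_type, target_us) triples for one us_table row."""
    if not raw:
        return []
    try:
        entries = json.loads(raw)
    except json.JSONDecodeError:
        try:
            # Fallback for single-quoted, Python-literal lists.
            entries = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            return []
    if not isinstance(entries, list):
        return []
    edges = []
    for entry in entries:
        if not isinstance(entry, (list, tuple)) or len(entry) < 2:
            continue
        edge_type = RAPPORTI_TO_EDGE.get(str(entry[0]).strip().lower())
        if edge_type is not None:  # unknown relations are skipped
            edges.append((us, edge_type, str(entry[1])))
    return edges
```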
_merge_groups(self, graph, db_path, sito, dimensions, primary_priority)
Materialize group nodes per dimension. AI07 dispatches on the dimension: the 6 spatial dimensions produce a LocationNodeGroup linked by is_in_location edges, while attivita produces an ActivityNodeGroup linked by is_in_activity edges (unchanged).
primary_priority: list[str] of dimension names ordered from highest to lowest priority for is_primary selection. None means DEFAULT_PRIMARY_PRIORITY is used.
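The per-dimension dispatch and the is_primary flag can be sketched together. The names dispatch_dimension and plan_membership_edges are hypothetical, the edges are plain dicts rather than s3dgraphy edges, and SPATIAL_DIMENSIONS lists only the two spatial dimensions this page actually names.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical subset: the text mentions six spatial dimensions, but only
# "struttura" and "area" are named in this document.
SPATIAL_DIMENSIONS = frozenset({"struttura", "area"})


def dispatch_dimension(dimension: str) -> Tuple[str, str]:
    """Return (group class name, membership edge type) for one dimension."""
    if dimension == "attivita":
        return "ActivityNodeGroup", "is_in_activity"
    if dimension in SPATIAL_DIMENSIONS:
        return "LocationNodeGroup", "is_in_location"
    raise ValueError(f"unknown dimension: {dimension!r}")


def plan_membership_edges(
    memberships: List[Mapping[str, str]], primary: Dict[str, str]
) -> List[dict]:
    # `primary` is a us_id -> group_uuid map like the one compute_primary returns.
    edges = []
    for m in memberships:
        _, edge_type = dispatch_dimension(m["group_kind"])
        edges.append({
            "source": m["us_id"],
            "target": m["group_uuid"],
            "type": edge_type,
            "is_primary": primary.get(m["us_id"]) == m["group_uuid"],
        })
    return edges
```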
_emit_toponym_chain(self, graph, db_path, sito)
AI07 Stage 3: emit a recursive LocationNodeGroup(kind='toponym') chain from site_table.{nazione,regione,provincia,comune}.
Empty levels are skipped (Q4=c). If all 4 levels are empty, no chain is emitted.
Cross-site dedupe: each (name, "toponym") pair maps to a deterministic group_uuid (sha1) so two sites in the same comune share the node.
Each US in the projected graph gets one is_in_location edge to
the DEEPEST non-empty level (typically comune), always
is_primary=false (toponym never primary).
The chain itself is structured top-down via is_in_location edges: nazione ← regione ← provincia ← comune (each lower level has an is_in_location edge to the next level up).
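The chain construction can be sketched as follows. This is a hypothetical illustration: the "name|toponym" string fed to SHA-1 is an assumed key format (the text only says each (name, "toponym") pair maps to a sha1-based group_uuid), and edges are plain (source, edge_type, target, is_primary) tuples.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

LEVELS = ("nazione", "regione", "provincia", "comune")  # top-down order

Edge = Tuple[str, str, str, bool]  # (source, edge_type, target, is_primary)


def toponym_uuid(name: str) -> str:
    # Hypothetical key format: the text only says (name, "toponym") -> sha1.
    return hashlib.sha1(f"{name}|toponym".encode("utf-8")).hexdigest()


def toponym_chain_sketch(
    site_row: Dict[str, Optional[str]], us_ids: List[str]
) -> Tuple[List[Tuple[str, str]], List[Edge]]:
    # Skip empty levels; keep the remaining names in top-down order.
    names = [str(site_row.get(level) or "").strip() for level in LEVELS]
    names = [n for n in names if n]
    if not names:
        return [], []  # all four levels empty: no chain
    groups = [(toponym_uuid(n), n) for n in names]
    edges: List[Edge] = []
    # Each lower level is_in_location the next level up the list.
    for (child, _), (parent, _) in zip(groups[1:], groups[:-1]):
        edges.append((child, "is_in_location", parent, False))
    deepest = groups[-1][0]
    for us_id in us_ids:
        edges.append((us_id, "is_in_location", deepest, False))  # never primary
    return groups, edges
```

Because the key depends only on the name, two sites in the same comune reuse the same node id, which gives the cross-site dedupe described above. Under this sketch's key, distinct places sharing a name (for example a provincia and its homonymous comune) would also collapse into one node.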