modules/s3dgraphy/sync/graph_ingestor.py¶
Overview¶
This file contains 24 documented elements.
Functions¶
_promote_legacy_activitynodegroup(graph)¶
AI07 Stage B: scan graph for ActivityNodeGroup nodes whose
attributes carry group_kind ∈ SQL_BACKED_KINDS_SPATIAL, and
promote them in-memory to LocationNodeGroup with kind set
per :data:_DIM_TO_KIND. Also rewires incoming is_in_activity
edges to is_in_location.
Detection: looks at node.attributes for either:
- a direct 'group_kind' key (newer AI07 exports), or
- any 'pyarchinit.<dim>' key where <dim> is in SQL_BACKED_KINDS_SPATIAL (legacy 5.5.x exports that retained the data attributes through the importer).
Emits exactly one DeprecationWarning per call (not per node) if
any promotion happens. The warning references AI07 / pyarchinit
5.6.0+ and instructs the user to re-export to migrate the on-disk
representation.
Returns: number of nodes promoted.
Parameters:
- graph
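The promotion rule described above can be sketched roughly as follows. This is an illustration only, not the module's code: `Node`, `Edge`, `detect_kind` and `promote` are hypothetical stand-ins, the contents of `SQL_BACKED_KINDS_SPATIAL` are assumed, the `_DIM_TO_KIND` mapping is replaced by an identity mapping, and "incoming" edges are assumed to be those whose target is the promoted group.

```python
import warnings
from dataclasses import dataclass, field

# Assumed contents; the real constant lives in the module.
SQL_BACKED_KINDS_SPATIAL = {"area", "settore", "ambient", "saggio", "quad_par"}


@dataclass
class Node:  # hypothetical stand-in for an s3dgraphy node
    node_id: str
    node_type: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Edge:  # hypothetical stand-in for an s3dgraphy edge
    source: str
    target: str
    edge_type: str


def detect_kind(attrs):
    """Return the spatial kind found in the attributes, or None."""
    kind = attrs.get("group_kind")
    if kind in SQL_BACKED_KINDS_SPATIAL:
        return kind
    for key in attrs:
        if key.startswith("pyarchinit."):
            dim = key[len("pyarchinit."):]
            if dim in SQL_BACKED_KINDS_SPATIAL:
                return dim  # identity mapping assumed in place of _DIM_TO_KIND
    return None


def promote(nodes, edges):
    """Promote matching groups in place and return how many were promoted."""
    promoted = set()
    for node in nodes:
        if node.node_type != "ActivityNodeGroup":
            continue
        kind = detect_kind(node.attributes)
        if kind is None:
            continue
        node.node_type = "LocationNodeGroup"
        node.attributes["kind"] = kind
        promoted.add(node.node_id)
    # Rewire edges that point at a promoted group.
    for edge in edges:
        if edge.edge_type == "is_in_activity" and edge.target in promoted:
            edge.edge_type = "is_in_location"
    if promoted:
        # One warning per call, regardless of how many nodes were promoted.
        warnings.warn(
            "Legacy ActivityNodeGroup promoted (AI07); re-export with "
            "pyarchinit 5.6.0+ to migrate the on-disk representation.",
            DeprecationWarning,
            stacklevel=2,
        )
    return len(promoted)
```

Emitting the warning after the loop, rather than inside it, is what keeps the count at one per call.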
_apply_group_folders_to_sql(cur, graphml_path, sito)¶
AI07: recursive walker — descend yEd folder-in-folder structures
and apply UPDATE us_table SET <kind>=<group_name> per
SQL-backed folder.
Toponym / unknown / ad-hoc kinds are skipped (AC-14 unchanged).
Cycle detection: a visited set guards against malformed GraphML
where folder A contains folder B contains folder A.
Member US identification: prefer pyarchinit.node_uuid (when
available, byte-identical match to the DB row); fall back to
(pyarchinit.us, pyarchinit.area, sito) (always available
because the AI03 enrichment writes those onto every strat node
regardless of strict_schema).
Parameters:
- cur
- graphml_path
- sito
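A minimal sketch of the recursive descent with cycle detection, under stated assumptions: folders are represented here as plain dicts (a hypothetical shape, not the parsed GraphML), only direct members of a folder receive its value, and the function collects the pending updates instead of executing SQL. `CycleDetectedError` is redefined locally as a stand-in.

```python
class CycleDetectedError(Exception):
    """Local stand-in for the module's CycleDetectedError."""


# The seven SQL-derived kinds named in the populate_list documentation.
SQL_BACKED_KINDS = {
    "area", "struttura", "attivita", "settore", "ambient", "saggio", "quad_par",
}


def collect_updates(folders, root_ids):
    """Return (kind, group_name, member) triples for SQL-backed folders.

    `folders` maps a folder id to a dict with the keys "kind", "name",
    "members" and "children" (hypothetical shape for this sketch).
    """
    updates = []

    def walk(folder_id, visited):
        if folder_id in visited:
            raise CycleDetectedError(f"folder {folder_id!r} contains itself")
        folder = folders[folder_id]
        if folder["kind"] in SQL_BACKED_KINDS:
            for member in folder["members"]:
                updates.append((folder["kind"], folder["name"], member))
        # Toponym, unknown and ad-hoc kinds are skipped, but still descended.
        for child_id in folder["children"]:
            walk(child_id, visited | {folder_id})

    for root_id in root_ids:
        walk(root_id, frozenset())
    return updates
```

Passing a fresh `visited` set down each branch means a folder reached twice through different parents is not mistaken for a cycle; only a folder that appears among its own ancestors raises.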
_values_equal(col, a, b)¶
Loose equality matching the conventions in graphml_writer enrichment. JSON-serialised columns (rapporti) get parse-then-compare.
Parameters:
- col
- a
- b
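The parse-then-compare idea might look like the sketch below. The exact "loose equality" conventions of graphml_writer are not documented here, so the fallback (treat None as an empty string and compare stripped string forms) is an assumption, as is the `JSON_COLUMNS` name.

```python
import json

JSON_COLUMNS = {"rapporti"}  # hypothetical name; rapporti is the documented case


def values_equal(col, a, b):
    """Compare two column values, parsing JSON-serialised columns first."""
    if col in JSON_COLUMNS:
        try:
            return json.loads(a) == json.loads(b)
        except (TypeError, ValueError):
            pass  # not valid JSON: fall through to the loose comparison

    def normalise(value):
        # Assumed convention: None and "" are equal, whitespace is ignored.
        return "" if value is None else str(value).strip()

    return normalise(a) == normalise(b)
```

Comparing the parsed structures makes two serialisations that differ only in whitespace count as equal.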
_is_epoch_node_local(node)¶
_is_epoch_node_local¶
Parameters:
- node
_hydrate_pyarchinit_data_keys(graph, graphml_path)¶
Re-parse the GraphML at graphml_path via lxml and merge the
pyarchinit-specific data keys (pyarchinit.us, pyarchinit.area,
pyarchinit.unita_tipo, etc.) into graph node attributes.
This is the IMPORT-side counterpart of graphml_writer._embed_pyarchinit_data_keys. s3dgraphy's GraphMLImporter strips unknown attributes; we recover the pyarchinit-specific ones by reading the same XML directly.
No-op if the GraphML doesn't contain our custom data keys (older files / files from non-pyarchinit producers).
Parameters:
- graph
- graphml_path
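As an illustration of the recovery step, the sketch below reads the custom keys straight from the XML. It uses the standard library's `xml.etree.ElementTree` rather than lxml to stay self-contained, takes the GraphML text instead of a path, and assumes the keys are declared as `<key attr.name="pyarchinit.…">` elements; `read_pyarchinit_keys` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
NS = {"g": GRAPHML_NS}


def read_pyarchinit_keys(graphml_text):
    """Return {node_id: {"pyarchinit.<name>": value}} from GraphML text."""
    root = ET.fromstring(graphml_text)
    # Map key ids (e.g. "d0") to their declared attribute names.
    key_names = {
        key.get("id"): key.get("attr.name")
        for key in root.findall("g:key", NS)
        if (key.get("attr.name") or "").startswith("pyarchinit.")
    }
    if not key_names:
        return {}  # no custom keys declared: nothing to merge
    result = {}
    for node in root.iter(f"{{{GRAPHML_NS}}}node"):
        attrs = {}
        for data in node.findall("g:data", NS):
            name = key_names.get(data.get("key"))
            if name is not None:
                attrs[name] = data.text or ""
        if attrs:
            result[node.get("id")] = attrs
    return result
```

The returned mapping can then be merged into the matching graph nodes' attributes; how nodes are matched to GraphML ids is left out of this sketch.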
_find_first_epoch(graph, node_uuid)¶
Walk has_first_epoch edges from node_uuid and return the
(periodo, fase) tuple of the linked EpochNode, or (None, None).
s3dgraphy's GraphMLImporter strips most attributes but preserves
edge edge_type. The EpochNode keeps node.name (e.g. "XV secolo")
and any attributes['periodo'] / attributes['fase'] set by the
projector. When attrs are stripped, fall back to parsing from
EpochNode.node_id (which the projector formats as
epoch_<periodo>_<fase>).
Parameters:
- graph
- node_uuid
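The attribute-first, node_id-fallback lookup can be sketched as below. The helper names are hypothetical, values parsed from the id are returned as strings without type conversion, and a periodo that itself contains an underscore is not handled.

```python
def parse_epoch_node_id(node_id):
    """Split an 'epoch_<periodo>_<fase>' id into (periodo, fase)."""
    prefix = "epoch_"
    if not node_id.startswith(prefix):
        return (None, None)
    periodo, sep, fase = node_id[len(prefix):].partition("_")
    if not sep or not periodo or not fase:
        return (None, None)
    return (periodo, fase)


def epoch_key(attributes, node_id):
    """Prefer the projector's attributes; fall back to parsing node_id."""
    periodo = attributes.get("periodo")
    fase = attributes.get("fase")
    if periodo is not None and fase is not None:
        return (periodo, fase)
    return parse_epoch_node_id(node_id)
```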
_rewrite_rapporti_sito(rapporti_str, target_sito)¶
Parse a pyarchinit rapporti list-of-lists string and rewrite the sito (4th element of each rapporto) to target_sito. Returns the re-serialised string. Defensive — returns the input unchanged if parsing fails.
Parameters:
- rapporti_str
- target_sito
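A defensive rewrite along these lines is sketched below. It assumes the string is a Python-literal list of lists that `ast.literal_eval` can parse and that re-serialising with `str()` is acceptable; the module's actual parser and output format may differ.

```python
import ast


def rewrite_rapporti_sito(rapporti_str, target_sito):
    """Set the 4th element of each rapporto to target_sito."""
    try:
        rapporti = ast.literal_eval(rapporti_str)
    except (ValueError, SyntaxError, TypeError):
        return rapporti_str  # unparseable: return the input unchanged
    if not isinstance(rapporti, list):
        return rapporti_str
    rewritten = []
    for rapporto in rapporti:
        if isinstance(rapporto, (list, tuple)) and len(rapporto) >= 4:
            rapporto = list(rapporto)
            rapporto[3] = target_sito
        rewritten.append(rapporto)
    return str(rewritten)
```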
_build_rapporti_from_edges(graph, default_sito)¶
Walk graph.edges and return a dict {source_node_id: rapporti_list}
where each rapporti_list is the pyarchinit list-of-lists serialisation
[[type, target_us, area, sito], …].
The target_us is extracted from the target node's name with the
unita_tipo prefix stripped. area defaults to '1' when the graph
didn't preserve it (compatible with most legacy pyarchinit data).
Parameters:
- graph
- default_sito
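The grouping and the area default can be illustrated with plain dicts standing in for graph edges. The edge shape, the `strip_prefix` parameter, and the use of the raw edge_type as the rapporto type are assumptions of this sketch; the real function presumably maps edge types to the pyarchinit relationship vocabulary.

```python
from collections import defaultdict


def build_rapporti(edges, default_sito, strip_prefix=lambda name: name):
    """Group edges by source id into [[type, target_us, area, sito], ...]."""
    rapporti = defaultdict(list)
    for edge in edges:
        area = edge.get("area") or "1"  # default when the graph lost the area
        rapporti[edge["source"]].append(
            [edge["edge_type"], strip_prefix(edge["target_name"]), area, default_sito]
        )
    return dict(rapporti)
```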
_strip_us_prefix(name)¶
Strip the unita-tipo prefix from a node name.
Examples: "USM6" → "6" "USV102" → "102" "US103a" → "103a" "D.4001" → "4001" "C.900" → "900" "6" → "6" (no prefix → unchanged)
Parameters:
- name
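One way to reproduce the examples above is a single regular expression that removes leading letters, plus an optional dot, when a digit follows. This is a sketch that matches the listed cases, not necessarily the module's implementation.

```python
import re

# Leading letters, an optional dot, and a digit immediately after.
_PREFIX_RE = re.compile(r"^[A-Za-z]+\.?(?=\d)")


def strip_us_prefix(name):
    """Strip a unita_tipo prefix such as 'USM' or 'D.' from a node name."""
    return _PREFIX_RE.sub("", name)
```

The lookahead leaves names without a numeric part untouched, and a trailing letter suffix such as the "a" in "US103a" is preserved.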
_resolve_unita_tipo(node, attrs)¶
Return the unita_tipo for node, preferring attrs (set by GraphProjector) over the s3dgraphy class name (used when a GraphML round-trip has stripped attrs).
Parameters:
- node
- attrs
Classes¶
GraphSyncError¶
Base class for all GraphProjector / GraphIngestor errors.
Inherits from: Exception
GraphIngestError¶
Write-side failure. Always means DB rolled back to pre-call state.
Inherits from: GraphSyncError
CycleDetectedError¶
AI07: recursive walker detected a cycle in yEd folder nesting.
Inherits from: GraphIngestError
SchemaMismatchError¶
us_table.node_uuid column missing (Phase 1 migration not applied).
Hint: run scripts/migrations/2026_05_node_uuid_backfill.py --apply.
Inherits from: GraphIngestError
UnknownUnitaTipoError¶
Graph node has unita_tipo not in the vocabulary.
Hint: run scripts/migrations/2026_05_us_vocabulary_alignment.py --apply.
Inherits from: GraphIngestError
SiteMismatchError¶
Graph contains a node whose attributes['sito'] != populate_list(sito=...).
Inherits from: GraphIngestError
MissingEpochError¶
One or more EpochNodes reference (periodo, fase) not present in periodizzazione_table while create_missing_epochs=False.
The exception carries missing: list[tuple[int, str]] so callers
can show all the missing keys at once instead of one per call.
Inherits from: GraphIngestError
Methods¶
__init__(self, missing)¶
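The hierarchy and the `missing` payload can be pictured with the sketch below. The classes are redefined locally for illustration and the message text is an assumption; only the inheritance chain and the `missing` attribute come from the documentation above.

```python
class GraphSyncError(Exception):
    """Base class for projector and ingestor errors."""


class GraphIngestError(GraphSyncError):
    """Write-side failure."""


class MissingEpochError(GraphIngestError):
    """Carries every missing (periodo, fase) key at once."""

    def __init__(self, missing):
        self.missing = list(missing)
        super().__init__(f"{len(self.missing)} epoch(s) missing: {self.missing}")


def describe(error):
    """Format one line per missing key, e.g. for a dialog box."""
    return [f"periodo={periodo}, fase={fase}" for periodo, fase in error.missing]
```

Because every ingest error derives from GraphSyncError, a caller can catch the base class for generic handling and still inspect `missing` when the specific subclass is raised.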
GraphIngestor¶
Persist an s3dgraphy Graph back to the PyArchInit SQL tables.
Single atomic transaction (BEGIN/COMMIT/ROLLBACK). Idempotent on re-runs against the same input. AI04 always uses ConflictResolution.GRAPH_WINS for value diffs.
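The single-transaction behaviour can be sketched with the standard `sqlite3` module. This assumes an SQLite database (suggested by the `db_path` parameter, but not confirmed here) and assumes that a dry run means "execute, then roll back"; `apply_atomically` is a hypothetical helper, not part of GraphIngestor.

```python
import sqlite3


def apply_atomically(conn, statements, dry_run=False):
    """Run all statements in one transaction; roll back on any failure.

    `conn` must be in autocommit mode (isolation_level=None) so that the
    explicit BEGIN/COMMIT/ROLLBACK below are the only transaction control.
    """
    conn.execute("BEGIN")
    try:
        for sql, params in statements:
            conn.execute(sql, params)
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # restore the pre-call state
        raise
    conn.execute("ROLLBACK" if dry_run else "COMMIT")
```

A connection opened with `sqlite3.connect(path, isolation_level=None)` satisfies the autocommit requirement.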
Methods¶
__init__(self, conflict_resolver)¶
populate_list(self, graph, db_path, sito, dry_run, create_missing_epochs, graphml_path, sql_apply_groups)¶
See spec §3.2 docstring for full contract.
When graphml_path is provided, AI04's custom data-keys
(pyarchinit.us, pyarchinit.area, etc. — see
graphml_writer._embed_pyarchinit_data_keys) are parsed from
the file and merged into graph node attributes, so the
round-trip preserves columns that s3dgraphy's own importer
would otherwise drop.
AI06 D.2: when sql_apply_groups is True (default False), the
importer parses group folder nodes (yfiles.foldertype="group"
with id="grp_...") from the GraphML at graphml_path (or
graph if it is itself a path-like) and queues
UPDATE us_table SET <kind>=<group_name> for every member
US whose folder maps to a SQL-derived group_kind (the
basic 7: area / struttura / attivita / settore / ambient /
saggio / quad_par). Ad-hoc groups (group_kind not in this set)
never touch SQL — they always live in the GroupStore (AC-14).
Convenience: when graph is a Path-like (str or Path) instead of a Graph, the importer auto-loads it as a Graph via s3dgraphy's GraphMLImporter and uses the same path for graphml_path. This lets callers pass just the .graphml file.
_verify_schema(self, db_path)¶
_verify_schema¶
_verify_sito(self, graph, sito)¶
Validate the sito parameter.
Note: AI04.1 changed semantics — we no longer raise on graph nodes carrying a different sito. The user's workflow is "load this graph and ingest into MY sito X", so we treat the parameter as authoritative. The per-node loop overrides each node's sito attribute to sito before INSERT/UPDATE.