Builder
These classes and methods make it easier to construct MetadataChangeProposals and MetadataChangeEvents.
- class datahub.emitter.mcp.MetadataChangeProposalWrapper(entityType='ENTITY_TYPE_UNSET', changeType='UPSERT', entityUrn=None, entityKeyAspect=None, auditHeader=None, aspectName=None, aspect=None, systemMetadata=None)
Bases: object
- Parameters:
entityType (str)
changeType (Union[str, ChangeTypeClass])
entityUrn (Optional[str])
entityKeyAspect (Optional[_Aspect])
auditHeader (Optional[KafkaAuditHeaderClass])
aspectName (Optional[str])
aspect (Optional[_Aspect])
systemMetadata (Optional[SystemMetadataClass])
- entityType: str = 'ENTITY_TYPE_UNSET'
- changeType: Union[str, ChangeTypeClass] = 'UPSERT'
- entityUrn: Optional[str] = None
- entityKeyAspect: Optional[_Aspect] = None
- auditHeader: Optional[KafkaAuditHeaderClass] = None
- aspectName: Optional[str] = None
- aspect: Optional[_Aspect] = None
- systemMetadata: Optional[SystemMetadataClass] = None
- classmethod construct_many(entityUrn, aspects)
- Parameters:
entityUrn (str)
aspects (List[Optional[_Aspect]])
- Return type:
- make_mcp()
- Return type:
- validate()
- Return type:
bool
- to_obj(tuples=False, simplified_structure=False)
- Parameters:
tuples (bool)
simplified_structure (bool)
- Return type:
dict
- classmethod from_obj(obj, tuples=False)
Attempt to deserialize into an MCPW, but fall back to a standard MCP if we’re missing codegen’d classes for the entity key or aspect.
- Parameters:
obj (dict)
tuples (bool)
- Return type:
Union[MetadataChangeProposalWrapper, MetadataChangeProposalClass]
- classmethod try_from_mcpc(mcpc)
Attempts to create a MetadataChangeProposalWrapper from a MetadataChangeProposalClass. Gracefully handles expected but unsupported cases, such as unknown aspect types or a non-JSON content type.
- Raises:
Exception if the generic aspect is invalid, e.g. contains invalid JSON.
- Parameters:
mcpc (MetadataChangeProposalClass)
- Return type:
Optional[MetadataChangeProposalWrapper]
- classmethod from_obj_require_wrapper(obj, tuples=False)
- Parameters:
obj (dict)
tuples (bool)
- Return type:
- as_workunit(*, treat_errors_as_warnings=False)
- Parameters:
treat_errors_as_warnings (bool)
- Return type:
MetadataWorkUnit
Convenience functions for creating MCEs
- datahub.emitter.mce_builder.set_dataset_urn_to_lower(value)
- Parameters:
value (bool)
- Return type:
None
- class datahub.emitter.mce_builder.OwnerType(value)
Bases: Enum
An enumeration.
- USER = 'corpuser'
- GROUP = 'corpGroup'
- datahub.emitter.mce_builder.get_sys_time()
- Return type:
int
- datahub.emitter.mce_builder.make_data_platform_urn(platform)
- Parameters:
platform (str)
- Return type:
str
- datahub.emitter.mce_builder.make_dataset_urn(platform, name, env='PROD')
- Parameters:
platform (str)
name (str)
env (str)
- Return type:
str
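The signature above takes a platform, a dataset name, and an environment that defaults to 'PROD'. As a rough illustration of how those three pieces fit together, the sketch below builds a string in the commonly documented DataHub dataset URN layout. The function name sketch_dataset_urn is hypothetical and is not part of datahub.emitter.mce_builder; it only mirrors the assumed output shape and does not reproduce any validation or lowercasing the library may perform.

```python
def sketch_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Hypothetical stand-in that mirrors the assumed dataset URN layout."""
    # The platform is itself wrapped in a data platform URN.
    platform_urn = f"urn:li:dataPlatform:{platform}"
    # The dataset URN nests the platform URN, the dataset name, and the env.
    return f"urn:li:dataset:({platform_urn},{name},{env})"


urn = sketch_dataset_urn("hive", "analytics.page_views")
print(urn)
```

In real code you would call make_dataset_urn directly; the sketch is only meant to make the relationship between the arguments and the resulting string easier to see.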
- datahub.emitter.mce_builder.make_dataplatform_instance_urn(platform, instance)
- Parameters:
platform (str)
instance (str)
- Return type:
str
- datahub.emitter.mce_builder.make_dataset_urn_with_platform_instance(platform, name, platform_instance, env='PROD')
- Parameters:
platform (str)
name (str)
platform_instance (Optional[str])
env (str)
- Return type:
str
- datahub.emitter.mce_builder.make_schema_field_urn(parent_urn, field_path)
- Parameters:
parent_urn (str)
field_path (str)
- Return type:
str
- datahub.emitter.mce_builder.schema_field_urn_to_key(schema_field_urn)
- Parameters:
schema_field_urn (str)
- Return type:
Optional[SchemaFieldKeyClass]
- datahub.emitter.mce_builder.dataset_urn_to_key(dataset_urn)
- Parameters:
dataset_urn (str)
- Return type:
Optional[DatasetKeyClass]
- datahub.emitter.mce_builder.dataset_key_to_urn(key)
- Parameters:
key (DatasetKeyClass)
- Return type:
str
- datahub.emitter.mce_builder.make_container_urn(guid)
- Parameters:
guid (str)
- Return type:
str
- datahub.emitter.mce_builder.container_urn_to_key(guid)
- Parameters:
guid (str)
- Return type:
Optional[ContainerKeyClass]
- datahub.emitter.mce_builder.datahub_guid(obj)
- Parameters:
obj (dict)
- Return type:
str
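datahub_guid maps a dict to a string identifier. One plausible way to derive a stable identifier from a dict, sketched below, is to serialize it as key-sorted compact JSON and hash the result. This is an assumption about the approach, not a statement of the library's implementation, and sketch_guid is a hypothetical name used only for illustration.

```python
import hashlib
import json
from typing import Any, Dict


def sketch_guid(obj: Dict[str, Any]) -> str:
    """Hypothetical stand-in: hash a dict into a stable hex identifier."""
    # Sorting keys makes the serialized form independent of insertion order.
    payload = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    # MD5 is used here only as a fingerprint, not for security.
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


a = sketch_guid({"platform": "snowflake", "database": "sales"})
b = sketch_guid({"database": "sales", "platform": "snowflake"})
print(a == b)  # same content, different key order
```

The point of the design is determinism: two dicts with the same content should always produce the same identifier, regardless of how they were built.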
- datahub.emitter.mce_builder.make_assertion_urn(assertion_id)
- Parameters:
assertion_id (str)
- Return type:
str
- datahub.emitter.mce_builder.assertion_urn_to_key(assertion_urn)
- Parameters:
assertion_urn (str)
- Return type:
Optional[AssertionKeyClass]
- datahub.emitter.mce_builder.make_user_urn(username)
Makes a user URN if the input is not already a user URN.
- Parameters:
username (str)
- Return type:
str
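The description above implies the function is idempotent: passing an existing user URN returns it unchanged. A minimal sketch of that behaviour follows, assuming the 'corpuser' entity name from OwnerType.USER and the usual urn:li: prefix. sketch_make_user_urn is a hypothetical stand-in, and the real function may recognise additional prefixes.

```python
USER_PREFIX = "urn:li:corpuser:"  # assumed prefix, based on OwnerType.USER


def sketch_make_user_urn(username: str) -> str:
    """Hypothetical stand-in: add the user prefix only when it is missing."""
    if username.startswith(USER_PREFIX):
        return username
    return USER_PREFIX + username


once = sketch_make_user_urn("jdoe")
twice = sketch_make_user_urn(once)
print(once, once == twice)
```

Because a second call is a no-op, callers can normalise values of mixed origin (bare names and full URNs) without checking which form they hold.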
- datahub.emitter.mce_builder.make_group_urn(groupname)
Makes a group URN if the input is not already a group URN.
- Parameters:
groupname (str)
- Return type:
str
- datahub.emitter.mce_builder.make_tag_urn(tag)
Makes a tag URN if the input is not already a tag URN.
- Parameters:
tag (str)
- Return type:
str
- datahub.emitter.mce_builder.make_owner_urn(owner, owner_type)
- Parameters:
owner (str)
owner_type (OwnerType)
- Return type:
str
- datahub.emitter.mce_builder.make_term_urn(term)
Makes a term URN if the input is not already a term URN.
- Parameters:
term (str)
- Return type:
str
- datahub.emitter.mce_builder.make_data_flow_urn(orchestrator, flow_id, cluster='prod', platform_instance=None)
- Parameters:
orchestrator (str)
flow_id (str)
cluster (str)
platform_instance (Optional[str])
- Return type:
str
- datahub.emitter.mce_builder.make_data_job_urn_with_flow(flow_urn, job_id)
- Parameters:
flow_urn (str)
job_id (str)
- Return type:
str
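make_data_flow_urn and make_data_job_urn_with_flow are typically used together: a job URN is built from the URN of the flow it belongs to. The sketch below illustrates that nesting using the commonly documented dataFlow and dataJob URN layouts. Both function names are hypothetical stand-ins, and the handling of platform_instance is deliberately left out because its exact encoding is not described here.

```python
def sketch_data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    """Hypothetical stand-in for the assumed data flow URN layout."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def sketch_data_job_urn_with_flow(flow_urn: str, job_id: str) -> str:
    """Hypothetical stand-in: a job URN embeds the full URN of its flow."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


flow = sketch_data_flow_urn("airflow", "daily_etl")
job = sketch_data_job_urn_with_flow(flow, "load_orders")
print(flow)
print(job)
```

Building the flow URN once and reusing it for every job keeps all jobs of a pipeline consistently attached to the same parent.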
- datahub.emitter.mce_builder.make_data_process_instance_urn(dataProcessInstanceId)
- Parameters:
dataProcessInstanceId (str)
- Return type:
str
- datahub.emitter.mce_builder.make_data_job_urn(orchestrator, flow_id, job_id, cluster='prod', platform_instance=None)
- Parameters:
orchestrator (str)
flow_id (str)
job_id (str)
cluster (str)
platform_instance (Optional[str])
- Return type:
str
- datahub.emitter.mce_builder.make_dashboard_urn(platform, name, platform_instance=None)
- Parameters:
platform (str)
name (str)
platform_instance (Optional[str])
- Return type:
str
- datahub.emitter.mce_builder.dashboard_urn_to_key(dashboard_urn)
- Parameters:
dashboard_urn (str)
- Return type:
Optional[DashboardKeyClass]
- datahub.emitter.mce_builder.make_chart_urn(platform, name, platform_instance=None)
- Parameters:
platform (str)
name (str)
platform_instance (Optional[str])
- Return type:
str
- datahub.emitter.mce_builder.chart_urn_to_key(chart_urn)
- Parameters:
chart_urn (str)
- Return type:
Optional[ChartKeyClass]
- datahub.emitter.mce_builder.make_domain_urn(domain)
- Parameters:
domain (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_primary_key_urn(feature_table_name, primary_key_name)
- Parameters:
feature_table_name (str)
primary_key_name (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_feature_urn(feature_table_name, feature_name)
- Parameters:
feature_table_name (str)
feature_name (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_feature_table_urn(platform, feature_table_name)
- Parameters:
platform (str)
feature_table_name (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_model_urn(platform, model_name, env)
- Parameters:
platform (str)
model_name (str)
env (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_model_deployment_urn(platform, deployment_name, env)
- Parameters:
platform (str)
deployment_name (str)
env (str)
- Return type:
str
- datahub.emitter.mce_builder.make_ml_model_group_urn(platform, group_name, env)
- Parameters:
platform (str)
group_name (str)
env (str)
- Return type:
str
- datahub.emitter.mce_builder.is_valid_ownership_type(ownership_type)
- Parameters:
ownership_type (Optional[str])
- Return type:
bool
- datahub.emitter.mce_builder.validate_ownership_type(ownership_type)
- Parameters:
ownership_type (Optional[str])
- Return type:
str
- datahub.emitter.mce_builder.make_lineage_mce(upstream_urns, downstream_urn, lineage_type='TRANSFORMED')
- Parameters:
upstream_urns (List[str])
downstream_urn (str)
lineage_type (str)
- Return type:
- datahub.emitter.mce_builder.can_add_aspect(mce, AspectType)
- Parameters:
mce (MetadataChangeEventClass)
AspectType (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
bool
- datahub.emitter.mce_builder.assert_can_add_aspect(mce, AspectType)
- Parameters:
mce (MetadataChangeEventClass)
AspectType (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
None
- datahub.emitter.mce_builder.get_aspect_if_available(mce, AspectType)
- Parameters:
mce (MetadataChangeEventClass)
AspectType (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
Optional[TypeVar(Aspect, bound=_Aspect)]
- datahub.emitter.mce_builder.remove_aspect_if_available(mce, aspect_type)
- Parameters:
mce (MetadataChangeEventClass)
aspect_type (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
bool
- datahub.emitter.mce_builder.get_or_add_aspect(mce, default)
- Parameters:
mce (MetadataChangeEventClass)
default (TypeVar(Aspect, bound=_Aspect))
- Return type:
TypeVar(Aspect, bound=_Aspect)
- datahub.emitter.mce_builder.make_global_tag_aspect_with_tag_list(tags)
- Parameters:
tags (List[str])
- Return type:
- datahub.emitter.mce_builder.make_ownership_aspect_from_urn_list(owner_urns, source_type, owner_type='DATAOWNER')
- Parameters:
owner_urns (List[str])
source_type (Union[str, OwnershipSourceTypeClass, None])
owner_type (Union[str, OwnershipTypeClass])
- Return type:
- datahub.emitter.mce_builder.make_glossary_terms_aspect_from_urn_list(term_urns)
- Parameters:
term_urns (List[str])
- Return type:
- datahub.emitter.mce_builder.set_aspect(mce, aspect, aspect_type)
Sets the aspect to the provided value, overwriting any existing aspect of that type. If the passed-in aspect is None, the existing aspect value is removed.
- Parameters:
mce (MetadataChangeEventClass)
aspect (Optional[TypeVar(Aspect, bound=_Aspect)])
aspect_type (Type[TypeVar(Aspect, bound=_Aspect)])
- Return type:
None
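The overwrite-or-remove semantics of set_aspect can be illustrated without a real MetadataChangeEventClass. The sketch below uses a plain list as a stand-in for an MCE's aspect collection and two hypothetical dataclasses, OwnershipStub and TagsStub, as stand-in aspect types. None of these names come from the library; only the behaviour described above is being modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class OwnershipStub:  # hypothetical stand-in for an aspect class
    owner: str


@dataclass
class TagsStub:  # hypothetical stand-in for another aspect class
    tags: List[str]


def sketch_set_aspect(aspects: List[object], aspect: Optional[T], aspect_type: Type[T]) -> None:
    """Replace any aspect of aspect_type; passing None just removes it."""
    # Drop every existing aspect of the requested type, in place.
    aspects[:] = [a for a in aspects if not isinstance(a, aspect_type)]
    # Append the replacement only when one was supplied.
    if aspect is not None:
        aspects.append(aspect)


aspects: List[object] = [OwnershipStub("alice"), TagsStub(["pii"])]
sketch_set_aspect(aspects, OwnershipStub("bob"), OwnershipStub)  # overwrite
sketch_set_aspect(aspects, None, TagsStub)                       # remove
print(aspects)
```

Passing aspect_type separately is what makes removal possible: when aspect is None there is no instance to infer the type from.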
- class datahub.emitter.mcp_builder.DatahubKey(**data)
Bases: BaseModel
- Parameters:
data (Any)
- guid_dict()
- Return type:
Dict[str, str]
- guid()
- Return type:
str
- class datahub.emitter.mcp_builder.PlatformKey(**data)
Bases: DatahubKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
- platform: str
- instance: Optional[str]
- env: Optional[str]
- backcompat_env_as_instance: bool
- guid_dict()
- Return type:
Dict[str, str]
- property_dict()
- Return type:
Dict[str, str]
- class datahub.emitter.mcp_builder.DatabaseKey(**data)
Bases: PlatformKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
database (str)
- database: str
- class datahub.emitter.mcp_builder.SchemaKey(**data)
Bases: DatabaseKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
database (str)
schema (str)
- db_schema: str
- class datahub.emitter.mcp_builder.ProjectIdKey(**data)
Bases: PlatformKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
project_id (str)
- project_id: str
- class datahub.emitter.mcp_builder.MetastoreKey(**data)
Bases: PlatformKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
metastore (str)
- metastore: str
- class datahub.emitter.mcp_builder.CatalogKey(**data)
Bases: MetastoreKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
metastore (str)
catalog (str)
- catalog: str
- class datahub.emitter.mcp_builder.UnitySchemaKey(**data)
Bases: CatalogKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
metastore (str)
catalog (str)
unity_schema (str)
- unity_schema: str
- class datahub.emitter.mcp_builder.BigQueryDatasetKey(**data)
Bases: ProjectIdKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
project_id (str)
dataset_id (str)
- dataset_id: str
- class datahub.emitter.mcp_builder.FolderKey(**data)
Bases: PlatformKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
folder_abs_path (str)
- folder_abs_path: str
- class datahub.emitter.mcp_builder.BucketKey(**data)
Bases: PlatformKey
- Parameters:
data (Any)
platform (str)
instance (str | None)
env (str | None)
backcompat_env_as_instance (bool)
bucket_name (str)
- bucket_name: str
- class datahub.emitter.mcp_builder.DatahubKeyJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: JSONEncoder
- default(obj)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
- Parameters:
obj (Any)
- Return type:
Any
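The default hook described above is the standard json.JSONEncoder extension point. As a small runnable illustration of the pattern, the sketch below defines a hypothetical SortedSetEncoder that serializes sets as sorted lists and defers everything else to the base class. It does not show what DatahubKeyJSONEncoder itself does with its inputs, only how a default override is wired in.

```python
import json
from typing import Any


class SortedSetEncoder(json.JSONEncoder):
    """Hypothetical encoder: emit sets as sorted lists for stable output."""

    def default(self, o: Any) -> Any:
        if isinstance(o, (set, frozenset)):
            return sorted(o)
        # Let the base class raise TypeError for unsupported types.
        return super().default(o)


encoded = json.dumps({"tags": {"pii", "finance"}}, cls=SortedSetEncoder)
print(encoded)  # {"tags": ["finance", "pii"]}
```

An encoder is selected by passing it as the cls argument to json.dumps, so the same override applies to every nested value the default serializer cannot handle.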
- datahub.emitter.mcp_builder.add_domain_to_entity_wu(entity_urn, domain_urn)
- Parameters:
entity_urn (str)
domain_urn (str)
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.add_owner_to_entity_wu(entity_type, entity_urn, owner_urn)
- Parameters:
entity_type (str)
entity_urn (str)
owner_urn (str)
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.add_tags_to_entity_wu(entity_type, entity_urn, tags)
- Parameters:
entity_type (str)
entity_urn (str)
tags (List[str])
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.wrap_aspect_as_workunit(entityName, entityUrn, aspectName, aspect)
- Parameters:
entityName (str)
entityUrn (str)
aspectName (str)
aspect (_Aspect)
- Return type:
MetadataWorkUnit
- datahub.emitter.mcp_builder.gen_containers(container_key, name, sub_types, parent_container_key=None, extra_properties=None, domain_urn=None, description=None, owner_urn=None, external_url=None, tags=None, qualified_name=None, created=None, last_modified=None)
- Parameters:
container_key (TypeVar(KeyType, bound=PlatformKey))
name (str)
sub_types (List[str])
parent_container_key (Optional[PlatformKey])
extra_properties (Optional[Dict[str, str]])
domain_urn (Optional[str])
description (Optional[str])
owner_urn (Optional[str])
external_url (Optional[str])
tags (Optional[List[str]])
qualified_name (Optional[str])
created (Optional[int])
last_modified (Optional[int])
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.add_dataset_to_container(container_key, dataset_urn)
- Parameters:
container_key (TypeVar(KeyType, bound=PlatformKey))
dataset_urn (str)
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.add_entity_to_container(container_key, entity_type, entity_urn)
- Parameters:
container_key (TypeVar(KeyType, bound=PlatformKey))
entity_type (str)
entity_urn (str)
- Return type:
Iterable[MetadataWorkUnit]
- datahub.emitter.mcp_builder.mcps_from_mce(mce)
- Parameters:
mce (MetadataChangeEventClass)
- Return type:
Iterable[MetadataChangeProposalWrapper]
- datahub.emitter.mcp_builder.create_embed_mcp(urn, embed_url)
- Parameters:
urn (str)
embed_url (str)
- Return type: