These classes and functions make it easier to construct MetadataChangeProposals (MCPs) and MetadataChangeEvents (MCEs).

class datahub.emitter.mcp.MetadataChangeProposalWrapper(entityType='ENTITY_TYPE_UNSET', changeType='UPSERT', entityUrn=None, entityKeyAspect=None, auditHeader=None, aspectName=None, aspect=None, systemMetadata=None)

Bases: object

Parameters:
entityType: str = 'ENTITY_TYPE_UNSET'
changeType: Union[str, ChangeTypeClass] = 'UPSERT'
entityUrn: Optional[str] = None
entityKeyAspect: Optional[_Aspect] = None
auditHeader: Optional[KafkaAuditHeaderClass] = None
aspectName: Optional[str] = None
aspect: Optional[_Aspect] = None
systemMetadata: Optional[SystemMetadataClass] = None

classmethod construct_many(entityUrn, aspects)
Parameters:
  • entityUrn (str)

  • aspects (List[Optional[_Aspect]])

Return type:

List[MetadataChangeProposalWrapper]

make_mcp()
Return type:

MetadataChangeProposalClass

validate()
Return type:

bool

to_obj(tuples=False, simplified_structure=False)
Parameters:
  • tuples (bool)

  • simplified_structure (bool)

Return type:

dict

classmethod from_obj(obj, tuples=False)

Attempts to deserialize obj into a MetadataChangeProposalWrapper (MCPW), falling back to a plain MetadataChangeProposalClass (MCP) when the generated (codegen) classes for the entity key or aspect are missing.

Parameters:
  • obj (dict)

  • tuples (bool)

Return type:

Union[MetadataChangeProposalWrapper, MetadataChangeProposalClass]
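The fallback described for from_obj can be pictured with a small, library-independent sketch. Every name here (KNOWN_ASPECTS, TypedProposal, RawProposal, from_obj_sketch) is hypothetical and only mimics the shape of the behaviour; it is not the DataHub implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union

# Hypothetical registry standing in for the generated aspect classes.
KNOWN_ASPECTS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "status": lambda payload: {"removed": bool(payload["removed"])},
}


@dataclass
class TypedProposal:
    """Plays the role of the wrapper (MCPW): the aspect is a typed object."""
    entity_urn: str
    aspect_name: str
    aspect: Any


@dataclass
class RawProposal:
    """Plays the role of the plain MCP: the aspect stays an opaque payload."""
    entity_urn: str
    aspect_name: str
    payload: Dict[str, Any]


def from_obj_sketch(obj: Dict[str, Any]) -> Union[TypedProposal, RawProposal]:
    """Return a typed proposal when the aspect is known, else a raw one."""
    parser = KNOWN_ASPECTS.get(obj["aspectName"])
    if parser is None:
        # No class for this aspect: fall back to the untyped form.
        return RawProposal(obj["entityUrn"], obj["aspectName"], obj["aspect"])
    return TypedProposal(obj["entityUrn"], obj["aspectName"], parser(obj["aspect"]))


known = from_obj_sketch(
    {"entityUrn": "urn:li:tag:pii", "aspectName": "status", "aspect": {"removed": False}}
)
unknown = from_obj_sketch(
    {"entityUrn": "urn:li:tag:pii", "aspectName": "customAspect", "aspect": {"x": 1}}
)
```

Callers of a function with this kind of union return type usually need an isinstance check before touching typed fields.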

classmethod try_from_mcpc(mcpc)

Attempts to create a MetadataChangeProposalWrapper from a MetadataChangeProposalClass. Expected but unsupported inputs, such as an unknown aspect type or a non-JSON content type, are handled gracefully instead of raising an error.

Raises:

Exception if the generic aspect is invalid, e.g. it contains invalid JSON.

Parameters:

mcpc (MetadataChangeProposalClass)

Return type:

Optional[MetadataChangeProposalWrapper]
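As a rough illustration of the contract above (an Optional result for unsupported input, an exception for malformed input), here is a standalone sketch. The function try_parse_generic_aspect and the set SUPPORTED_ASPECTS are hypothetical, and returning None for the unsupported cases is an assumption based on the Optional return type.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical set of aspect names that have a known typed class.
SUPPORTED_ASPECTS = {"status"}


def try_parse_generic_aspect(
    aspect_name: str, content_type: str, value: bytes
) -> Optional[Dict[str, Any]]:
    """Return the decoded aspect, or None for expected-but-unsupported input."""
    if content_type != "application/json":
        return None  # non-JSON content type: unsupported, but not an error
    if aspect_name not in SUPPORTED_ASPECTS:
        return None  # unknown aspect type: unsupported, but not an error
    # Invalid JSON is a real error, so json.JSONDecodeError is allowed to propagate.
    return json.loads(value.decode("utf-8"))


parsed = try_parse_generic_aspect("status", "application/json", b'{"removed": false}')
skipped = try_parse_generic_aspect("status", "application/avro", b"")
```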

classmethod try_from_mcl(mcl)
Parameters:

mcl (MetadataChangeLogClass)

Return type:

Union[MetadataChangeProposalWrapper, MetadataChangeProposalClass]

classmethod from_obj_require_wrapper(obj, tuples=False)
Parameters:
  • obj (dict)

  • tuples (bool)

Return type:

MetadataChangeProposalWrapper

as_workunit(*, treat_errors_as_warnings=False)
Parameters:

treat_errors_as_warnings (bool)

Return type:

MetadataWorkUnit

Convenience functions for creating MCEs

datahub.emitter.mce_builder.set_dataset_urn_to_lower(value)
Parameters:

value (bool)

Return type:

None

class datahub.emitter.mce_builder.OwnerType(value)

Bases: Enum

An enumeration.

USER = 'corpuser'
GROUP = 'corpGroup'

datahub.emitter.mce_builder.get_sys_time()
Return type:

int

datahub.emitter.mce_builder.make_data_platform_urn(platform)
Parameters:

platform (str)

Return type:

str

datahub.emitter.mce_builder.make_dataset_urn(platform, name, env='PROD')
Parameters:
  • platform (str)

  • name (str)

  • env (str)

Return type:

str
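To show what the three arguments contribute, the sketch below rebuilds the conventional dataset URN layout with plain string formatting. The function dataset_urn_sketch is a hypothetical stand-in written for illustration; real code should call make_dataset_urn.

```python
def dataset_urn_sketch(platform: str, name: str, env: str = "PROD") -> str:
    """Mimic the dataset URN layout: platform URN, dataset name, environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# The platform and table name below are made-up sample values.
urn = dataset_urn_sketch("snowflake", "analytics.public.orders")
```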

datahub.emitter.mce_builder.make_dataplatform_instance_urn(platform, instance)
Parameters:
  • platform (str)

  • instance (str)

Return type:

str

datahub.emitter.mce_builder.make_dataset_urn_with_platform_instance(platform, name, platform_instance, env='PROD')
Parameters:
  • platform (str)

  • name (str)

  • platform_instance (Optional[str])

  • env (str)

Return type:

str

datahub.emitter.mce_builder.make_schema_field_urn(parent_urn, field_path)
Parameters:
  • parent_urn (str)

  • field_path (str)

Return type:

str

datahub.emitter.mce_builder.schema_field_urn_to_key(schema_field_urn)
Parameters:

schema_field_urn (str)

Return type:

Optional[SchemaFieldKeyClass]

datahub.emitter.mce_builder.dataset_urn_to_key(dataset_urn)
Parameters:

dataset_urn (str)

Return type:

Optional[DatasetKeyClass]

datahub.emitter.mce_builder.dataset_key_to_urn(key)
Parameters:

key (DatasetKeyClass)

Return type:

str

datahub.emitter.mce_builder.make_container_urn(guid)
Parameters:

guid (Union[str, DatahubKey])

Return type:

str

datahub.emitter.mce_builder.container_urn_to_key(guid)
Parameters:

guid (str)

Return type:

Optional[ContainerKeyClass]

datahub.emitter.mce_builder.datahub_guid(obj)
Parameters:

obj (dict)

Return type:

str
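The reference gives only the signature, so the following is an assumption about the general idea rather than the library's exact algorithm: a deterministic guid can be derived by hashing a canonical JSON encoding of the dictionary. The name guid_sketch and the choice of MD5 are illustrative.

```python
import hashlib
import json


def guid_sketch(obj: dict) -> str:
    """Hash a canonical (sorted, compact) JSON form so key order does not matter."""
    canonical = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


first = guid_sketch({"platform": "hive", "database": "sales"})
second = guid_sketch({"database": "sales", "platform": "hive"})
```

Because the keys are sorted before hashing, the same logical key always maps to the same identifier, which is what makes such a guid usable inside a URN.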

datahub.emitter.mce_builder.make_assertion_urn(assertion_id)
Parameters:

assertion_id (str)

Return type:

str

datahub.emitter.mce_builder.assertion_urn_to_key(assertion_urn)
Parameters:

assertion_urn (str)

Return type:

Optional[AssertionKeyClass]

datahub.emitter.mce_builder.make_user_urn(username)

Makes a user URN if the input is not already a user URN.

Parameters:

username (str)

Return type:

str
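The "make it a URN unless it already is one" behaviour is an idempotent prefixing pattern, which the group, tag, and term helpers in this section describe in the same words. A minimal sketch follows; ensure_prefixed is a hypothetical helper, not part of the library, and the library's exact checks may differ.

```python
def ensure_prefixed(value: str, prefix: str) -> str:
    """Add the URN prefix only when the value does not already carry it."""
    return value if value.startswith(prefix) else prefix + value


user = ensure_prefixed("jdoe", "urn:li:corpuser:")
# Applying the helper a second time leaves the URN unchanged.
same = ensure_prefixed(user, "urn:li:corpuser:")
```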

datahub.emitter.mce_builder.make_group_urn(groupname)

Makes a group URN if the input is not already a group URN.

Parameters:

groupname (str)

Return type:

str

datahub.emitter.mce_builder.make_tag_urn(tag)

Makes a tag URN if the input is not already a tag URN.

Parameters:

tag (str)

Return type:

str

datahub.emitter.mce_builder.make_owner_urn(owner, owner_type)
Parameters:
  • owner (str)

  • owner_type (OwnerType)

Return type:

str
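The OwnerType values ('corpuser', 'corpGroup') match the entity-type segment of user and group URNs, so one plausible reading of make_owner_urn is that it selects the URN type from the enum. The sketch below illustrates that assumption with hypothetical names (OwnerTypeSketch, owner_urn_sketch).

```python
from enum import Enum


class OwnerTypeSketch(Enum):
    """Stand-in mirroring the documented OwnerType members."""
    USER = "corpuser"
    GROUP = "corpGroup"


def owner_urn_sketch(owner: str, owner_type: OwnerTypeSketch) -> str:
    # Assumption: the enum value is used as the entity type of the URN.
    return f"urn:li:{owner_type.value}:{owner}"


user_owner = owner_urn_sketch("jdoe", OwnerTypeSketch.USER)
group_owner = owner_urn_sketch("data-eng", OwnerTypeSketch.GROUP)
```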

datahub.emitter.mce_builder.make_term_urn(term)

Makes a term URN if the input is not already a term URN.

Parameters:

term (str)

Return type:

str

datahub.emitter.mce_builder.make_data_flow_urn(orchestrator, flow_id, cluster='prod', platform_instance=None)
Parameters:
  • orchestrator (str)

  • flow_id (str)

  • cluster (str)

  • platform_instance (Optional[str])

Return type:

str

datahub.emitter.mce_builder.make_data_job_urn_with_flow(flow_urn, job_id)
Parameters:
  • flow_urn (str)

  • job_id (str)

Return type:

str
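A data job URN embeds the URN of its flow, which is why make_data_job_urn_with_flow takes a flow_urn rather than the individual flow fields. The sketch below shows that nesting with two hypothetical stand-in functions; it leaves out platform_instance, whose effect on the URN is not described here.

```python
def data_flow_urn_sketch(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    """Mimic the flow URN layout: orchestrator, flow id, cluster."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def data_job_urn_sketch(flow_urn: str, job_id: str) -> str:
    """Mimic the job URN layout: the parent flow URN followed by the job id."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


# Sample orchestrator, flow, and job names are made up for illustration.
flow = data_flow_urn_sketch("airflow", "daily_etl")
job = data_job_urn_sketch(flow, "load_orders")
```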

datahub.emitter.mce_builder.make_data_process_instance_urn(dataProcessInstanceId)
Parameters:

dataProcessInstanceId (str)

Return type:

str

datahub.emitter.mce_builder.make_data_job_urn(orchestrator, flow_id, job_id, cluster='prod', platform_instance=None)
Parameters:
  • orchestrator (str)

  • flow_id (str)

  • job_id (str)

  • cluster (str)

  • platform_instance (Optional[str])

Return type:

str

datahub.emitter.mce_builder.make_dashboard_urn(platform, name, platform_instance=None)
Parameters:
  • platform (str)

  • name (str)

  • platform_instance (Optional[str])

Return type:

str

datahub.emitter.mce_builder.dashboard_urn_to_key(dashboard_urn)
Parameters:

dashboard_urn (str)

Return type:

Optional[DashboardKeyClass]

datahub.emitter.mce_builder.make_chart_urn(platform, name, platform_instance=None)
Parameters:
  • platform (str)

  • name (str)

  • platform_instance (Optional[str])

Return type:

str

datahub.emitter.mce_builder.chart_urn_to_key(chart_urn)
Parameters:

chart_urn (str)

Return type:

Optional[ChartKeyClass]

datahub.emitter.mce_builder.make_domain_urn(domain)
Parameters:

domain (str)

Return type:

str

datahub.emitter.mce_builder.make_ml_primary_key_urn(feature_table_name, primary_key_name)
Parameters:
  • feature_table_name (str)

  • primary_key_name (str)

Return type:

str

datahub.emitter.mce_builder.make_ml_feature_urn(feature_table_name, feature_name)
Parameters:
  • feature_table_name (str)

  • feature_name (str)

Return type:

str

datahub.emitter.mce_builder.make_ml_feature_table_urn(platform, feature_table_name)
Parameters:
  • platform (str)

  • feature_table_name (str)

Return type:

str

datahub.emitter.mce_builder.make_ml_model_urn(platform, model_name, env)
Parameters:
  • platform (str)

  • model_name (str)

  • env (str)

Return type:

str

datahub.emitter.mce_builder.make_ml_model_deployment_urn(platform, deployment_name, env)
Parameters:
  • platform (str)

  • deployment_name (str)

  • env (str)

Return type:

str

datahub.emitter.mce_builder.make_ml_model_group_urn(platform, group_name, env)
Parameters:
  • platform (str)

  • group_name (str)

  • env (str)

Return type:

str

datahub.emitter.mce_builder.is_valid_ownership_type(ownership_type)
Parameters:

ownership_type (Optional[str])

Return type:

bool

datahub.emitter.mce_builder.validate_ownership_type(ownership_type)
Parameters:

ownership_type (Optional[str])

Return type:

str

datahub.emitter.mce_builder.make_lineage_mce(upstream_urns, downstream_urn, lineage_type='TRANSFORMED')

Note: this function only supports lineage for dataset aspects. It will not update lineage for any other aspect types.

Parameters:
  • upstream_urns (List[str])

  • downstream_urn (str)

  • lineage_type (str)

Return type:

MetadataChangeEventClass

datahub.emitter.mce_builder.can_add_aspect(mce, AspectType)
Parameters:
  • mce (MetadataChangeEventClass)

  • AspectType (Type[TypeVar(Aspect, bound= _Aspect)])

Return type:

bool

datahub.emitter.mce_builder.assert_can_add_aspect(mce, AspectType)
Parameters:
  • mce (MetadataChangeEventClass)

  • AspectType (Type[TypeVar(Aspect, bound= _Aspect)])

Return type:

None

datahub.emitter.mce_builder.get_aspect_if_available(mce, AspectType)
Parameters:
  • mce (MetadataChangeEventClass)

  • AspectType (Type[TypeVar(Aspect, bound= _Aspect)])

Return type:

Optional[TypeVar(Aspect, bound= _Aspect)]

datahub.emitter.mce_builder.remove_aspect_if_available(mce, aspect_type)
Parameters:
  • mce (MetadataChangeEventClass)

  • aspect_type (Type[TypeVar(Aspect, bound= _Aspect)])

Return type:

bool

datahub.emitter.mce_builder.get_or_add_aspect(mce, default)
Parameters:
  • mce (MetadataChangeEventClass)

  • default (TypeVar(Aspect, bound= _Aspect))

Return type:

TypeVar(Aspect, bound= _Aspect)

datahub.emitter.mce_builder.make_global_tag_aspect_with_tag_list(tags)
Parameters:

tags (List[str])

Return type:

GlobalTagsClass

datahub.emitter.mce_builder.make_ownership_aspect_from_urn_list(owner_urns, source_type, owner_type='DATAOWNER')
Parameters:
Return type:

OwnershipClass

datahub.emitter.mce_builder.make_glossary_terms_aspect_from_urn_list(term_urns)
Parameters:

term_urns (List[str])

Return type:

GlossaryTermsClass

datahub.emitter.mce_builder.set_aspect(mce, aspect, aspect_type)

Sets the aspect on the MCE to the provided value, overwriting any existing aspect of that type. If the aspect passed in is None, the existing aspect value is removed.

Parameters:
  • mce (MetadataChangeEventClass)

  • aspect (Optional[TypeVar(Aspect, bound= _Aspect)])

  • aspect_type (Type[TypeVar(Aspect, bound= _Aspect)])

Return type:

None
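The overwrite-or-remove semantics of set_aspect can be sketched on a plain list of aspect objects. The names set_aspect_sketch, StatusSketch, and TagsSketch are hypothetical, and a real MCE holds its aspects inside a snapshot rather than in a bare list.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class StatusSketch:
    removed: bool


@dataclass
class TagsSketch:
    tags: List[str]


def set_aspect_sketch(aspects: List[Any], aspect: Optional[T], aspect_type: Type[T]) -> None:
    """Replace any aspect of aspect_type with `aspect`, or remove it when None."""
    # Drop whatever aspect of this type is currently present (edit in place).
    aspects[:] = [a for a in aspects if not isinstance(a, aspect_type)]
    if aspect is not None:
        aspects.append(aspect)


aspects: List[Any] = [StatusSketch(removed=False), TagsSketch(tags=["pii"])]
set_aspect_sketch(aspects, StatusSketch(removed=True), StatusSketch)  # overwrite
after_overwrite = list(aspects)
set_aspect_sketch(aspects, None, TagsSketch)  # passing None removes the aspect
```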

class datahub.emitter.mcp_builder.DatahubKey(**data)

Bases: BaseModel

Parameters:

data (Any)

guid_dict()
Return type:

Dict[str, str]

guid()
Return type:

str

class datahub.emitter.mcp_builder.ContainerKey(**data)

Bases: DatahubKey

Base class for container guid keys. Most users should use one of the subclasses instead.

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

platform: str
instance: Optional[str]
env: Optional[str]
backcompat_env_as_instance: bool

guid_dict()
Return type:

Dict[str, str]

property_dict()
Return type:

Dict[str, str]

as_urn()
Return type:

str
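The advice to use a subclass makes more sense once the hierarchy is visible: each subclass adds one identifying field on top of its parent, as with DatabaseKey and SchemaKey in this section. The dataclass sketch below imitates that layering. All class names are hypothetical, and dropping unset fields from guid_dict is an assumption rather than documented behaviour.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class ContainerKeySketch:
    """Stand-in for the base key: a platform plus optional scoping fields."""
    platform: str
    instance: Optional[str] = None
    env: Optional[str] = None

    def guid_dict(self) -> Dict[str, str]:
        # Assumption: fields left as None are omitted from the identifying dict.
        return {k: v for k, v in asdict(self).items() if v is not None}


@dataclass
class DatabaseKeySketch(ContainerKeySketch):
    # The default exists only to satisfy dataclass field-ordering rules.
    database: str = ""


@dataclass
class SchemaKeySketch(DatabaseKeySketch):
    schema: str = ""


db_key = DatabaseKeySketch(platform="postgres", env="PROD", database="sales")
schema_key = SchemaKeySketch(
    platform="postgres", env="PROD", database="sales", schema="public"
)
```

In this sketch the schema key's dictionary is a superset of the database key's, which mirrors how a schema container sits inside its database container.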

datahub.emitter.mcp_builder.PlatformKey

alias of ContainerKey

class datahub.emitter.mcp_builder.DatabaseKey(**data)

Bases: ContainerKey

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

  • database (str)

database: str

class datahub.emitter.mcp_builder.SchemaKey(**data)

Bases: DatabaseKey

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

  • database (str)

  • schema (str)

db_schema: str

class datahub.emitter.mcp_builder.ProjectIdKey(**data)

Bases: ContainerKey

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

  • project_id (str)

project_id: str

class datahub.emitter.mcp_builder.MetastoreKey(**data)

Bases: ContainerKey

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

  • metastore (str)

metastore: str

class datahub.emitter.mcp_builder.CatalogKey(**data)

Bases: MetastoreKey

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

  • metastore (str)

  • catalog (str)

catalog: str

class datahub.emitter.mcp_builder.UnitySchemaKey(**data)

Bases: CatalogKey

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

  • metastore (str)

  • catalog (str)

  • unity_schema (str)

unity_schema: str

class datahub.emitter.mcp_builder.BigQueryDatasetKey(**data)

Bases: ProjectIdKey

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

  • project_id (str)

  • dataset_id (str)

dataset_id: str

class datahub.emitter.mcp_builder.FolderKey(**data)

Bases: ContainerKey

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

  • folder_abs_path (str)

folder_abs_path: str

class datahub.emitter.mcp_builder.BucketKey(**data)

Bases: ContainerKey

Parameters:
  • data (Any)

  • platform (str)

  • instance (str | None)

  • env (str | None)

  • backcompat_env_as_instance (bool)

  • bucket_name (str)

bucket_name: str

class datahub.emitter.mcp_builder.DatahubKeyJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
Parameters:

obj (Any)

Return type:

Any
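The inherited docstring above describes the generic JSONEncoder hook. For a key-style object, an encoder of this kind would typically turn the object into a plain dict and defer everything else to the base class. The sketch below shows that pattern with hypothetical names (KeySketch, KeyEncoderSketch); it is not the body of DatahubKeyJSONEncoder.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class KeySketch:
    """Made-up key object used only to exercise the encoder."""
    platform: str
    database: str


class KeyEncoderSketch(json.JSONEncoder):
    def default(self, obj: Any) -> Any:
        # Serialize dataclass instances as plain dicts.
        if is_dataclass(obj) and not isinstance(obj, type):
            return asdict(obj)
        # Anything else goes to the base class, which raises TypeError.
        return super().default(obj)


encoded = json.dumps({"key": KeySketch("hive", "sales")}, cls=KeyEncoderSketch)
decoded = json.loads(encoded)
```

Passing the encoder through the cls argument of json.dumps keeps the serialization rule in one place instead of converting keys by hand at each call site.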

datahub.emitter.mcp_builder.add_domain_to_entity_wu(entity_urn, domain_urn)
Parameters:
  • entity_urn (str)

  • domain_urn (str)

Return type:

Iterable[MetadataWorkUnit]

datahub.emitter.mcp_builder.add_owner_to_entity_wu(entity_type, entity_urn, owner_urn)
Parameters:
  • entity_type (str)

  • entity_urn (str)

  • owner_urn (str)

Return type:

Iterable[MetadataWorkUnit]

datahub.emitter.mcp_builder.add_tags_to_entity_wu(entity_type, entity_urn, tags)
Parameters:
  • entity_type (str)

  • entity_urn (str)

  • tags (List[str])

Return type:

Iterable[MetadataWorkUnit]

datahub.emitter.mcp_builder.gen_containers(container_key, name, sub_types, parent_container_key=None, extra_properties=None, domain_urn=None, description=None, owner_urn=None, external_url=None, tags=None, qualified_name=None, created=None, last_modified=None)
Parameters:
  • container_key (TypeVar(KeyType, bound= ContainerKey))

  • name (str)

  • sub_types (List[str])

  • parent_container_key (Optional[ContainerKey])

  • extra_properties (Optional[Dict[str, str]])

  • domain_urn (Optional[str])

  • description (Optional[str])

  • owner_urn (Optional[str])

  • external_url (Optional[str])

  • tags (Optional[List[str]])

  • qualified_name (Optional[str])

  • created (Optional[int])

  • last_modified (Optional[int])

Return type:

Iterable[MetadataWorkUnit]

datahub.emitter.mcp_builder.add_dataset_to_container(container_key, dataset_urn)
Parameters:
  • container_key (TypeVar(KeyType, bound= ContainerKey))

  • dataset_urn (str)

Return type:

Iterable[MetadataWorkUnit]

datahub.emitter.mcp_builder.add_entity_to_container(container_key, entity_type, entity_urn)
Parameters:
  • container_key (TypeVar(KeyType, bound= ContainerKey))

  • entity_type (str)

  • entity_urn (str)

Return type:

Iterable[MetadataWorkUnit]

datahub.emitter.mcp_builder.mcps_from_mce(mce)
Parameters:

mce (MetadataChangeEventClass)

Return type:

Iterable[MetadataChangeProposalWrapper]

datahub.emitter.mcp_builder.create_embed_mcp(urn, embed_url)
Parameters:
  • urn (str)

  • embed_url (str)

Return type:

MetadataChangeProposalWrapper