Plugins Guide
Plugins are a way to extend the basic DataHub functionality in a custom manner.
Currently, DataHub formally supports two types of plugins: Authentication and Authorization.
Authentication
Note: This feature is in BETA.
It is recommended that you do not do this unless you really know what you are doing.
A custom authentication plugin makes it possible to authenticate DataHub users against any Identity Management System. Choose your Identity Management System and write a custom authentication plugin as described in this section.
Currently, custom authenticators cannot be used to authenticate users of DataHub's web UI. This is because the DataHub web app expects the presence of two special cookies, PLAY_SESSION and actor, which are explicitly set by the server when a login action is performed. Instead, custom authenticators are useful for authenticating API requests to DataHub's backend (GMS), and can be used in addition to the default authentication performed by DataHub, which is based on DataHub-minted access tokens.
A sample authenticator implementation can be found at Authenticator Sample.
Implementing an Authentication Plugin
Add datahub-auth-api as a compileOnly dependency: The Maven coordinates of datahub-auth-api can be found at Maven.
An example Gradle dependency is given below.
dependencies {
    def auth_api = 'io.acryl:datahub-auth-api:0.9.3-3rc3'
    compileOnly "${auth_api}"
    testImplementation "${auth_api}"
}
Implement the Authenticator interface: Refer to Authenticator Sample.
A sample class that implements the Authenticator interface:
public class GoogleAuthenticator implements Authenticator {
    @Override
    public void init(@Nonnull Map<String, Object> authenticatorConfig, @Nullable AuthenticatorContext context) {
        // Plugin initialization code goes here
        // DataHub calls this method at boot time
    }

    @Nullable
    @Override
    public Authentication authenticate(@Nonnull AuthenticationRequest authenticationRequest)
            throws AuthenticationException {
        // DataHub calls this method whenever an authentication decision needs to be made
        // Authenticate the request and return Authentication
    }
}
Use getResourceAsStream to read files: If your plugin reads any configuration file (properties, YAML, JSON, or XML), use this.getClass().getClassLoader().getResourceAsStream("<file-name>") to read that file from the DataHub GMS plugin's class-path. For the DataHub GMS resource look-up behavior, refer to the Plugin Installation section. Sample code for getResourceAsStream is available in the sample Authenticator plugin TestAuthenticator.java.
Bundle your Jar: Use the com.github.johnrengelman.shadow Gradle plugin to create an uber jar. For an example of building an uber jar, see the build.gradle file of the Apache Ranger Plugin.
Exclude signature files as shown in the shadowJar task below.
apply plugin: 'com.github.johnrengelman.shadow'
shadowJar {
    // Exclude files related to the jar signature
    exclude "META-INF/*.RSA", "META-INF/*.SF", "META-INF/*.DSA"
}
Refer to the Plugin Installation section to install the plugin in a DataHub environment.
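The getResourceAsStream step above can be sketched roughly as follows. This is a minimal illustration, not code from the sample plugin: the class name PluginConfigReader and the file name plugin.properties are hypothetical, and only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; not part of datahub-auth-api.
public class PluginConfigReader {

    /**
     * Loads a properties file through the class loader of this class.
     * Returns an empty Properties object when the resource is not found,
     * because getResourceAsStream returns null in that case.
     */
    public static Properties load(String fileName) {
        Properties props = new Properties();
        try (InputStream in =
                 PluginConfigReader.class.getClassLoader().getResourceAsStream(fileName)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // "plugin.properties" is an assumed file name for illustration only.
        Properties props = load("plugin.properties");
        System.out.println("Loaded " + props.size() + " entries");
    }
}
```

Checking for null before reading matters here, since a missing file is reported as a null stream rather than as an exception.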
Enable GMS Authentication
By default, authentication is disabled in DataHub GMS.
Follow the steps below to enable GMS authentication:
Download docker-compose.quickstart.yml: Download the Docker Compose file docker-compose.quickstart.yml.
Set environment variable: Set the METADATA_SERVICE_AUTH_ENABLED environment variable to true.
Redeploy DataHub GMS: The quickstart command below redeploys DataHub GMS.
datahub docker quickstart -f docker-compose.quickstart.yml
Authorization
Note: This feature is in BETA.
It is recommended that you do not do this unless you really know what you are doing.
A custom authorization plugin makes it possible to authorize DataHub users against any Access Management System. Choose your Access Management System and write a custom authorization plugin as described in this section.
A sample authorizer implementation can be found at Authorizer Sample.
Implementing an Authorization Plugin
Add datahub-auth-api as a compileOnly dependency: The Maven coordinates of datahub-auth-api can be found at Maven.
An example Gradle dependency is given below.
dependencies {
    def auth_api = 'io.acryl:datahub-auth-api:0.9.3-3rc3'
    compileOnly "${auth_api}"
    testImplementation "${auth_api}"
}
Implement the Authorizer interface: Refer to Authorizer Sample.
A sample class that implements the Authorizer interface:
public class ApacheRangerAuthorizer implements Authorizer {
    @Override
    public void init(@Nonnull Map<String, Object> authorizerConfig, @Nonnull AuthorizerContext ctx) {
        // Plugin initialization code goes here
        // DataHub calls this method at boot time
    }

    @Override
    public AuthorizationResult authorize(@Nonnull AuthorizationRequest request) {
        // DataHub calls this method whenever an authorization decision needs to be made
        // Authorize the request and return AuthorizationResult
    }

    @Override
    public AuthorizedActors authorizedActors(String privilege, Optional<ResourceSpec> resourceSpec) {
        // Need to add doc
    }
}
Use getResourceAsStream to read files: If your plugin reads any configuration file (properties, YAML, JSON, or XML), use this.getClass().getClassLoader().getResourceAsStream("<file-name>") to read that file from the DataHub GMS plugin's class-path. For the DataHub GMS resource look-up behavior, refer to the Plugin Installation section. Sample code for getResourceAsStream is available in the sample Authenticator plugin TestAuthenticator.java.
Bundle your Jar: Use the com.github.johnrengelman.shadow Gradle plugin to create an uber jar. For an example of building an uber jar, see the build.gradle file of the Apache Ranger Plugin.
Exclude signature files as shown in the shadowJar task below.
apply plugin: 'com.github.johnrengelman.shadow'
shadowJar {
    // Exclude files related to the jar signature
    exclude "META-INF/*.RSA", "META-INF/*.SF", "META-INF/*.DSA"
}
Install the Plugin: Refer to the Plugin Installation section to install the plugin in a DataHub environment.
Plugin Installation
DataHub's GMS service searches for plugins in the container's local directory /etc/datahub/plugins/auth/. This location will be referred to as plugin-base-directory hereafter.
For Docker, docker-compose is set up to mount the ${HOME}/.datahub directory to the /etc/datahub directory within the GMS containers.
Docker
Follow the steps below to install plugins.
Suppose you have created an uber jar for an authorizer plugin named apache-ranger-authorizer.jar, and the class com.abc.RangerAuthorizer implements the Authorizer interface.
1. Create a plugin configuration file: Create a config.yml file at ${HOME}/.datahub/plugins/auth/. For more detail on the configuration, refer to the Config Detail section.
2. Create a plugin directory: Create the plugin directory apache-ranger-authorizer. This directory will be referred to as plugin-home hereafter.
   mkdir -p ${HOME}/.datahub/plugins/auth/apache-ranger-authorizer
3. Copy the plugin jar to plugin-home: Copy apache-ranger-authorizer.jar to plugin-home.
   cp apache-ranger-authorizer.jar ${HOME}/.datahub/plugins/auth/apache-ranger-authorizer
4. Update the plugin configuration file: Add the entry below to config.yml. The plugin can take any arbitrary configuration under the "configs" block; in our example, there is a username and a password.
   plugins:
     - name: "apache-ranger-authorizer"
       type: "authorizer"
       enabled: "true"
       params:
         className: "com.abc.RangerAuthorizer"
         configs:
           username: "foo"
           password: "fake"
5. Restart the datahub-gms container:
On startup, the DataHub GMS service performs the following steps:
- Load config.yml.
- Prepare the list of plugins where enabled is set to true.
- Look for a directory matching the plugin name in plugin-base-directory. In this case it is /etc/datahub/plugins/auth/apache-ranger-authorizer/; this directory becomes plugin-home.
- Look for the params.jarFileName attribute; otherwise look for a jar named <plugin-name>.jar. In this case it is /etc/datahub/plugins/auth/apache-ranger-authorizer/apache-ranger-authorizer.jar.
- Load the class given in the plugin's params.className attribute from the jar; here, load class com.abc.RangerAuthorizer from apache-ranger-authorizer.jar.
- Call the init method of the plugin.
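The directory and jar resolution in the startup steps above can be sketched as follows. This is an illustrative approximation of the naming rules described in the text, not DataHub's actual loader code; the class and method names (PluginPaths, pluginHome, jarPath) are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that mirrors the naming rules described above.
public class PluginPaths {

    // plugin-base-directory inside the GMS container, as stated in the text.
    public static final Path PLUGIN_BASE_DIRECTORY = Paths.get("/etc/datahub/plugins/auth");

    /** plugin-home is the directory under plugin-base-directory named after the plugin. */
    public static Path pluginHome(String pluginName) {
        return PLUGIN_BASE_DIRECTORY.resolve(pluginName);
    }

    /**
     * Uses params.jarFileName when it is set; otherwise falls back to
     * <plugin-name>.jar inside plugin-home.
     */
    public static Path jarPath(String pluginName, String jarFileName) {
        String jar = (jarFileName == null || jarFileName.isEmpty())
            ? pluginName + ".jar"
            : jarFileName;
        return pluginHome(pluginName).resolve(jar);
    }

    public static void main(String[] args) {
        System.out.println(jarPath("apache-ranger-authorizer", null));
    }
}
```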
When `getResourceAsStream` is called, the DataHub GMS service looks for the resource in the following order:
1. Look for the requested resource in the plugin jar file. If found, return the resource as an InputStream.
2. Look for the requested resource in the `plugin-home` directory. If found, return the resource as an InputStream.
3. Look for the requested resource in the application class loader. If found, return the resource as an InputStream.
4. Return `null`, as the requested resource was not found.
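The lookup order described above amounts to trying a chain of sources and returning the first hit. The sketch below illustrates that idea with plain JDK types; it is not DataHub's class loader implementation, and the names ResourceLookup, fromDirectory, and fromClassLoader are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of an ordered resource lookup.
public class ResourceLookup {

    /** Tries each source in order and returns the first non-null stream, or null if none match. */
    public static InputStream find(String name, List<Function<String, InputStream>> sources) {
        for (Function<String, InputStream> source : sources) {
            InputStream in = source.apply(name);
            if (in != null) {
                return in;
            }
        }
        return null; // requested resource was not found in any source
    }

    /** A source backed by a directory, for example plugin-home. */
    public static Function<String, InputStream> fromDirectory(Path dir) {
        return name -> {
            Path candidate = dir.resolve(name);
            if (!Files.isRegularFile(candidate)) {
                return null;
            }
            try {
                return Files.newInputStream(candidate);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    /** A source backed by a class loader, for example the plugin jar or the application class loader. */
    public static Function<String, InputStream> fromClassLoader(ClassLoader loader) {
        return loader::getResourceAsStream;
    }
}
```

Under these assumptions, a caller would pass the sources in the documented order: the plugin jar's loader, then the plugin-home directory, then the application class loader.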
By default, authentication is disabled in DataHub GMS. Follow the Enable GMS Authentication section to enable it.
Kubernetes
Helm support is coming soon.
Config Detail
A sample config.yml can be found at config.yml.
config.yml structure:
Field | Required | Type | Default | Description |
---|---|---|---|---|
plugins[].name | ✅ | string | | Name of the plugin |
plugins[].type | ✅ | enum[authenticator, authorizer] | | Type of plugin; possible values are authenticator or authorizer |
plugins[].enabled | ✅ | boolean | | Whether this plugin is enabled or disabled. DataHub GMS does not process a disabled plugin |
plugins[].params.className | ✅ | string | | Fully qualified class name of the Authenticator or Authorizer implementation |
plugins[].params.jarFileName | | string | plugins[].name + .jar | Jar file name in plugin-home |
plugins[].params.configs | | map<string,object> | empty map | Runtime configuration required for the plugin |
plugins[] is an array of plugins, where you can define multiple authenticator and authorizer plugins. Each plugin name must be unique within the plugins array.
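The required fields, defaults, and uniqueness rule described above can be checked with a small amount of code. The following is a sketch under stated assumptions, not DataHub's own configuration loader: the class names PluginConfigCheck and PluginEntry are hypothetical, and the entry is built by hand rather than parsed from YAML.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model of one plugins[] entry, following the table above.
public class PluginConfigCheck {

    public enum PluginType { AUTHENTICATOR, AUTHORIZER }

    public static final class PluginEntry {
        public final String name;
        public final PluginType type;
        public final boolean enabled;
        public final String className;
        public final String jarFileName;
        public final Map<String, Object> configs;

        public PluginEntry(String name, String type, boolean enabled, String className,
                           String jarFileName, Map<String, Object> configs) {
            if (name == null || type == null || className == null) {
                throw new IllegalArgumentException("name, type and params.className are required");
            }
            this.name = name;
            // Throws IllegalArgumentException for anything other than authenticator/authorizer.
            this.type = PluginType.valueOf(type.toUpperCase(Locale.ROOT));
            this.enabled = enabled;
            this.className = className;
            // Defaults taken from the table: <name>.jar and an empty map.
            this.jarFileName = jarFileName != null ? jarFileName : name + ".jar";
            this.configs = configs != null ? configs : Collections.<String, Object>emptyMap();
        }
    }

    /** Rejects a plugins array in which two entries share the same name. */
    public static void requireUniqueNames(List<PluginEntry> plugins) {
        Set<String> seen = new HashSet<>();
        for (PluginEntry plugin : plugins) {
            if (!seen.add(plugin.name)) {
                throw new IllegalArgumentException("Duplicate plugin name: " + plugin.name);
            }
        }
    }
}
```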
Plugin Permissions
Adhere to the plugin access controls below to keep your plugin forward compatible.
- A plugin should read and write files only in the plugin-home directory. Refer to step 2 of Plugin Installation for the definition of plugin-home.
- A plugin should access only port 80, port 443, or ports higher than 1024.
All other access is forbidden for the plugin.
Disclaimer: In the BETA version your plugin can access any port and can read/write to any location on the file system. However, you should implement the plugin as per the access permissions above to keep it compatible with upcoming releases of DataHub.
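Since the BETA version does not enforce these rules, a plugin author may want to check them in the plugin's own code. The sketch below shows one way to do that with JDK path handling; it is an assumption about how such a self-check could look, not a DataHub API, and the names PluginAccessRules, isPortAllowed, and isInsidePluginHome are hypothetical. It does not resolve symbolic links.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical self-check helpers for the access rules listed above.
public class PluginAccessRules {

    /** Allowed ports per the rules above: 80, 443, or any port higher than 1024. */
    public static boolean isPortAllowed(int port) {
        return port == 80 || port == 443 || (port > 1024 && port <= 65535);
    }

    /**
     * Returns true when target, resolved against pluginHome, stays inside pluginHome.
     * Relative segments such as ".." are normalized before the comparison.
     */
    public static boolean isInsidePluginHome(Path pluginHome, Path target) {
        Path home = pluginHome.toAbsolutePath().normalize();
        Path resolved = home.resolve(target).normalize();
        return resolved.startsWith(home);
    }

    public static void main(String[] args) {
        Path home = Paths.get("/etc/datahub/plugins/auth/apache-ranger-authorizer");
        System.out.println(isInsidePluginHome(home, Paths.get("conf/ranger.properties")));
        System.out.println(isPortAllowed(443));
    }
}
```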
Migration Of Plugins From application.yml
If you have any custom Authentication or Authorization plugin defined in the authorization or authentication section of application.yml, migrate it as per the steps below.
Implement Plugin: For an Authentication Plugin, follow the steps in Implementing an Authentication Plugin; for an Authorization Plugin, follow the steps in Implementing an Authorization Plugin.
Install Plugin: Install the plugins as per the steps in Plugin Installation. Here you need to map the configuration from application.yml to the configuration in config.yml. This mapping from application.yml to config.yml is described below.
Mapping for Authenticators
a. In config.yml, set plugins[].type to authenticator.
b. authentication.authenticators[].type is mapped to plugins[].params.className.
c. authentication.authenticators[].configs is mapped to plugins[].params.configs.
Example Authenticator Plugin configuration in config.yml:
plugins:
  - name: "apache-ranger-authenticator"
    type: "authenticator"
    enabled: "true"
    params:
      className: "com.abc.RangerAuthenticator"
      configs:
        username: "foo"
        password: "fake"
Mapping for Authorizer
a. In config.yml, set plugins[].type to authorizer.
b. authorization.authorizers[].type is mapped to plugins[].params.className.
c. authorization.authorizers[].configs is mapped to plugins[].params.configs.
Example Authorizer Plugin configuration in config.yml:
plugins:
  - name: "apache-ranger-authorizer"
    type: "authorizer"
    enabled: "true"
    params:
      className: "com.abc.RangerAuthorizer"
      configs:
        username: "foo"
        password: "fake"
Move any other configuration files of your plugin to the plugin-home directory. The plugin-home directory is described in the Plugin Installation section.
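The field mapping from application.yml to config.yml described above can be expressed as a small transformation. This is a sketch only: it assumes the legacy entry has already been parsed into a Map, and the class name PluginConfigMigration, the method toPluginEntry, and the pluginName parameter are hypothetical. The mapping itself does not define a plugin name, so the caller has to supply one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that applies the a/b/c mapping rules listed above.
public class PluginConfigMigration {

    /**
     * Builds one plugins[] entry from one legacy authenticators[] or authorizers[] entry.
     *
     * @param pluginName  name chosen by the caller (not present in application.yml)
     * @param pluginType  "authenticator" or "authorizer"
     * @param legacyEntry parsed legacy entry with "type" and optional "configs" keys
     */
    public static Map<String, Object> toPluginEntry(String pluginName, String pluginType,
                                                    Map<String, Object> legacyEntry) {
        Map<String, Object> params = new LinkedHashMap<>();
        // Legacy "type" holds the implementation class and becomes params.className.
        params.put("className", legacyEntry.get("type"));
        Object configs = legacyEntry.get("configs");
        if (configs != null) {
            // Legacy "configs" is carried over unchanged to params.configs.
            params.put("configs", configs);
        }

        Map<String, Object> plugin = new LinkedHashMap<>();
        plugin.put("name", pluginName);
        plugin.put("type", pluginType);
        plugin.put("enabled", "true"); // written as in the example configuration above
        plugin.put("params", params);
        return plugin;
    }
}
```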