Federated query in Athena
In a big-data world, data lives in many different environments. The hardest challenge is getting a single place from which to query all of them. AWS has come up with a federated querying capability to address this.
As shown in the picture below, Athena can make use of this functionality to query different environments.
Currently this feature is available as a preview, and to use it the Athena workgroup must be named "AmazonAthenaPreviewFunctionality".
Making use of the default connectors available in Athena:
In this section I explain how to make use of some of the default connectors available in Athena.
To explain this use case I am taking DynamoDB as an example.
Select ```query a data source``` and under that select Amazon DynamoDB.
Select the dynamo_connect Lambda function and provide a catalog name. I have given dynamo_catalog.
Then connect to the specific data source.
Now you can query the DynamoDB table by qualifying it with the catalog name and database name.
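To make the qualification concrete, here is a minimal sketch in Python of how the query could be built and submitted. The catalog name dynamo_catalog comes from the step above; the database name default and the table name orders are placeholders I made up for illustration, and the S3 output location is something you would supply yourself. The boto3 call is only reached if you call run_query, so the string helpers work without the AWS SDK.

```python
def qualified_table(catalog: str, database: str, table: str) -> str:
    """Return "catalog"."database"."table" with each part double-quoted."""
    parts = (catalog, database, table)
    for part in parts:
        # Reject empty names and embedded quotes rather than trying to escape them.
        if not part or '"' in part:
            raise ValueError(f"invalid identifier: {part!r}")
    return ".".join(f'"{part}"' for part in parts)


def build_query(catalog: str, database: str, table: str, limit: int = 10) -> str:
    """Build a simple SELECT against the fully qualified table."""
    return f"SELECT * FROM {qualified_table(catalog, database, table)} LIMIT {int(limit)}"


def run_query(sql: str, output_location: str,
              workgroup: str = "AmazonAthenaPreviewFunctionality") -> str:
    """Submit the query to Athena and return the query execution id."""
    import boto3  # imported lazily so the helpers above need no AWS SDK

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        WorkGroup=workgroup,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # "default" and "orders" are hypothetical names; replace them with your own.
    print(build_query("dynamo_catalog", "default", "orders"))
```

The same statement can be pasted into the Athena console; the only part that matters is the catalog.database.table qualification.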
Enable custom connectors.
Clone the repository: https://github.com/awslabs/aws-athena-query-federation
Modify the available connectors or add your own.
Then run ```mvn clean install```
For example, if you are trying to add athena-jdbc:
PWD : Documents/athena_connector/aws-athena-query-federation/athena-jdbc
Run ```sh ../tools/publish.sh <S3_BUCKET> athena-jdbc us-east-1``` (the last argument is the region)
to publish the connector to your private AWS Serverless Application Repository. The S3_BUCKET in the command is where a copy of the connector's code will be stored for the Serverless Application Repository to retrieve. This allows users with the right permissions to deploy instances of the connector via a 1-Click form. Then navigate to the Serverless Application Repository.
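If you want to script the build and publish steps, a rough Python sketch could look like the following. It only wraps the commands already listed above. The working directory and bucket name are placeholders, and running mvn from the repository root is my assumption since the steps above do not say which directory to use.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/awslabs/aws-athena-query-federation"


def publish_steps(workdir: str, s3_bucket: str,
                  connector: str = "athena-jdbc",
                  region: str = "us-east-1"):
    """Return (cwd, argv) pairs for clone, build, and publish, in order."""
    base = Path(workdir)
    repo = base / "aws-athena-query-federation"
    return [
        (base, ["git", "clone", REPO_URL]),
        # Assumption: the Maven build is run from the repository root.
        (repo, ["mvn", "clean", "install"]),
        # publish.sh is called from inside the connector directory.
        (repo / connector, ["sh", "../tools/publish.sh", s3_bucket, connector, region]),
    ]


def run_steps(steps) -> None:
    """Run each step in its directory and stop at the first failure."""
    for cwd, argv in steps:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the plan only; call run_steps(...) yourself to execute it.
    for cwd, argv in publish_steps("Documents/athena_connector", "my-connector-bucket"):
        print(cwd, " ".join(argv))
```

Here my-connector-bucket is a made-up bucket name; use the bucket where you want the connector code to be stored.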
Steps to connect to the data sources from Athena
Select the ```connect data source``` link.
Select ```query a data source```.
Choose the Lambda function you configured and provide a unique name for the catalog.
This catalog name is used in a Lambda environment variable to configure the connection.
Configure the environment variable in the selected Lambda function. The name of the variable needs to be <catalog_name>_connection_string
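The naming rule for the environment variable is easy to get wrong, so here is a small Python sketch of it. The function name and catalog name in the usage note are hypothetical, and I deliberately leave the connection string value to the caller, since its format is described in the athena-jdbc README linked in the references. The boto3 part reads the existing variables first because updating a function's environment replaces the whole set.

```python
def connection_env_var(catalog_name: str) -> str:
    """Return the variable name <catalog_name>_connection_string."""
    if not catalog_name:
        raise ValueError("catalog_name must not be empty")
    return f"{catalog_name}_connection_string"


def set_connection_string(function_name: str, catalog_name: str,
                          connection_string: str) -> dict:
    """Add or update the connection variable on the given Lambda function."""
    import boto3  # imported lazily so connection_env_var needs no AWS SDK

    client = boto3.client("lambda")
    current = client.get_function_configuration(FunctionName=function_name)
    # Keep the variables that are already set; the update call replaces them all.
    variables = dict(current.get("Environment", {}).get("Variables", {}))
    variables[connection_env_var(catalog_name)] = connection_string
    client.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": variables},
    )
    return variables


if __name__ == "__main__":
    # "mysql_catalog" is a hypothetical catalog name.
    print(connection_env_var("mysql_catalog"))
```

For a catalog registered as mysql_catalog, the variable to set on the Lambda function would be mysql_catalog_connection_string.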
The documentation says we can make use of AWS Secrets Manager to store the credentials.
They can then be retrieved at run time in the Lambda function by specifying the credential in the fashion below.
However, this functionality didn't work for me, and I wanted to call that out. Please comment on this article if you are able to get it working for JDBC connectivity other than the default connectors available with Athena.
References : https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-jdbc
https://aws.amazon.com/blogs/big-data/query-any-data-source-with-amazon-athenas-new-federated-query/