Federated query in Athena

Jerryverghesecheruvathoor
4 min read · Sep 6, 2020


In the big-data world, data is spread across different environments. The hardest challenge is finding a single place from which to query all of them. AWS has addressed this with a federated query capability.

Athena can use this functionality to query data that lives in different environments.

This feature is currently available as a preview. To use it, the Athena workgroup you run queries in must be named “AmazonAthenaPreviewFunctionality”.

Using the default connectors available in Athena:

In this section I explain how to use some of the default connectors available in Athena.

To illustrate this use case, I take DynamoDB as an example.

Select ```query a data source``` and, under that, select Amazon DynamoDB.

Select the dynamo_connect Lambda function and provide a catalog name. I used dynamo_catalog.

Then connect to the specific data source.

Now you can query the DynamoDB table by qualifying it with the catalog name and the database name.
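To make the qualification concrete, here is a minimal sketch of how such a query could be submitted from Python with boto3. The table name my_table, the database name default, and the S3 results location are assumptions for illustration only; dynamo_catalog and the preview workgroup name come from the steps above. The request is built separately from the call so it can be inspected before anything is sent to AWS.

```python
def build_query_request(catalog, database, table, output_location,
                        workgroup="AmazonAthenaPreviewFunctionality", limit=10):
    """Build the arguments for Athena's StartQueryExecution call."""
    # Qualify the table as "catalog"."database"."table".
    query = f'SELECT * FROM "{catalog}"."{database}"."{table}" LIMIT {limit}'
    return {
        "QueryString": query,
        "WorkGroup": workgroup,
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    client = boto3.client("athena")
    return client.start_query_execution(**request)["QueryExecutionId"]


# Hypothetical values: "default", "my_table" and the bucket are placeholders.
request = build_query_request(
    "dynamo_catalog", "default", "my_table", "s3://my-athena-results/"
)
print(request["QueryString"])
```

Calling run_query(request) would start the query in the preview workgroup and return its execution ID.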

Enabling custom connectors:

Git clone: https://github.com/awslabs/aws-athena-query-federation

Modify the available connectors or add your own.

Then run ```mvn clean install```

For example, if you are adding athena-jdbc:

Working directory: Documents/athena_connector/aws-athena-query-federation/athena-jdbc

Run sh ../tools/publish.sh <S3_BUCKET> athena-jdbc us-east-1 (the last argument is the region)

This publishes the connector to your private AWS Serverless Application Repository. The S3_BUCKET in the command is where a copy of the connector’s code is stored so that the Serverless Application Repository can retrieve it. Users with the appropriate permission can then deploy instances of the connector through a 1-Click form. Next, navigate to the Serverless Application Repository.

Steps to connect to the data sources from Athena

Select the ```connect data source``` link.

Select ```query a data source```.

Choose the Lambda function you configured and provide a unique name for the catalog.

This catalog name is used in a Lambda environment variable to configure the connection.

Configure the environment variable in the selected Lambda function. The variable must be named <catalog_name>_connection_string.
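The naming rule can be sketched in a few lines of Python. The catalog name mysql_catalog, the host, and the credentials are hypothetical placeholders. The value format (database type followed by the JDBC URL) is my reading of the athena-jdbc README and should be treated as an assumption. The boto3 helper merges the new variable into the existing ones because update_function_configuration replaces the whole environment.

```python
def connection_env_var(catalog_name, db_type, jdbc_url):
    """Return the environment variable the connector looks up for a catalog."""
    name = f"{catalog_name}_connection_string"
    # Assumed value format: <db_type>://<jdbc_url>
    value = f"{db_type}://{jdbc_url}"
    return {name: value}


def apply_to_lambda(function_name, variables):
    """Merge variables into a Lambda function's environment (needs AWS access)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    client = boto3.client("lambda")
    current = client.get_function_configuration(FunctionName=function_name)
    merged = dict(current.get("Environment", {}).get("Variables", {}))
    merged.update(variables)
    # This call overwrites the full variable set, hence the merge above.
    client.update_function_configuration(
        FunctionName=function_name, Environment={"Variables": merged}
    )
    return merged


# Hypothetical catalog, host and credentials, for illustration only.
variables = connection_env_var(
    "mysql_catalog",
    "mysql",
    "jdbc:mysql://db.example.com:3306/sales?user=USER&password=PASSWORD",
)
print(variables)
```

The same variable can of course be set by hand in the Lambda console; the helper only shows which name and value end up there.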

The documentation says we can use AWS Secrets Manager to store the credentials.

The credentials can then be retrieved at run time in the Lambda function.
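As a rough sketch of what this might look like: as far as I understand the athena-jdbc README linked under References, the connection string can carry a ${secret_name} placeholder that the connector swaps for the stored username and password. The helper name, host, and secret name below are hypothetical, and the exact placeholder format is an assumption to verify against the README.

```python
def connection_string_with_secret(db_type, jdbc_url, secret_name):
    """Build a connection string that references a Secrets Manager secret by name."""
    # Assumed format: <db_type>://<jdbc_url>?${<secret_name>}
    return f"{db_type}://{jdbc_url}?${{{secret_name}}}"


# Hypothetical host and secret name, for illustration only.
value = connection_string_with_secret(
    "postgres", "jdbc:postgresql://db.example.com:5432/sales", "my/rds/secret"
)
print(value)
```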

However, this functionality didn't work for me, and I wanted to call that out. Please comment on this article if you get it working for JDBC connectivity other than the default connectors available with Athena.

References: https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-jdbc

https://aws.amazon.com/blogs/big-data/query-any-data-source-with-amazon-athenas-new-federated-query/
