Skip to main content
Marcel Krčah

AWS Athena: Adding yyyy/mm/dd partitions to an S3 table

Published on , in ,

Assume data in S3:

s3://mybucket/foo/2021/07/27/data.gz
s3://mybucket/foo/2021/07/28/data.gz
s3://mybucket/foo/2021/07/29/data.gz
s3://mybucket/foo/2021/07/30/data.gz

And AWS Athena table linked to the S3 bucket, with the partitions Year, Month, and Day.

CREATE external TABLE foos (...)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE ...
LOCATION  's3://mybucket/foo';

After creating a table, partitions must be created. The docs suggests the following command:

MSCK REPAIR TABLE foos;

However, this won't work. The query will succeed but with warnings:

Partitions not in metastore:
foo:2021/07/27,
foo:2021/07/28,
foo:2021/07/29,
foo:2021/07/30.

This is because MSCK REPAIR TABLE foos supports only partitioning with key-value pairs in file paths, such as:

s3://mybucket/foo/year=2021/month=07/day=27/data.gz
s3://mybucket/foo/year=2021/month=07/day=28/data.gz
s3://mybucket/foo/year=2021/month=07/day=29/data.gz
s3://mybucket/foo/year=2021/month=07/day=30/data.gz

Partitions without the key-value pairs must be added manually:

ALTER TABLE foos ADD
  PARTITION (year='2021',month='07',day='27')
  	location 's3://mybucket/foo/2021/07/27'
  PARTITION (year='2021',month='07',day='28')
  	location 's3://mybucket/foo/2021/07/28'
  PARTITION (year='2021',month='07',day='29')
  	location 's3://mybucket/foo/2021/07/29'
  PARTITION (year='2021',month='07',day='30')
  	location 's3://mybucket/foo/2021/07/30';

More info:

This blog is written by Marcel Krcah, an independent consultant for product-oriented software engineering. If you like what you read, sign up for my newsletter