AWS Athena: Adding yyyy/mm/dd partitions to an S3 table
Assume data in S3:
s3://mybucket/foo/2021/07/27/data.gz
s3://mybucket/foo/2021/07/28/data.gz
s3://mybucket/foo/2021/07/29/data.gz
s3://mybucket/foo/2021/07/30/data.gz
And AWS Athena table linked to the S3 bucket, with the partitions Year, Month, and Day.
CREATE external TABLE foos (...)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE ...
LOCATION 's3://mybucket/foo';
After creating a table, partitions must be created. The docs suggests the following command:
MSCK REPAIR TABLE foos;
However, this won't work. The query will succeed but with warnings:
Partitions not in metastore:
foo:2021/07/27,
foo:2021/07/28,
foo:2021/07/29,
foo:2021/07/30.
This is because MSCK REPAIR TABLE foos supports only partitioning with key-value pairs in file paths, such as:
s3://mybucket/foo/year=2021/month=07/day=27/data.gz
s3://mybucket/foo/year=2021/month=07/day=28/data.gz
s3://mybucket/foo/year=2021/month=07/day=29/data.gz
s3://mybucket/foo/year=2021/month=07/day=30/data.gz
Partitions without the key-value pairs must be added manually:
ALTER TABLE foos ADD
PARTITION (year='2021',month='07',day='27')
location 's3://mybucket/foo/2021/07/27'
PARTITION (year='2021',month='07',day='28')
location 's3://mybucket/foo/2021/07/28'
PARTITION (year='2021',month='07',day='29')
location 's3://mybucket/foo/2021/07/29'
PARTITION (year='2021',month='07',day='30')
location 's3://mybucket/foo/2021/07/30';
More info:
- Amazon Forum answer from 2017
- StackOverflow
- AWS Partitioning Docs
- Alter Table Add Partition Synopsis
- ← Previous post: PostgreSQL vs Aurora PostgreSQL vs Aurora PostgreSQL Serverless
- → Next post: Functional programming: driving team adoption
This blog is written by Marcel Krcah, an independent consultant for product-oriented software engineering. If you like what you read, sign up for my newsletter