AWS – Amazon EMR announces S3A as the default connector
AWS announces Amazon EMR S3A, a new Amazon S3 connector that optimizes performance for Apache Hadoop, Apache Spark, and Apache Hive workloads on Amazon EMR. This new connector enhances the open source S3A architecture with AWS-specific optimizations to help organizations process large-scale data more efficiently. With direct integration support for S3 Express One Zone, S3 Glacier, and AWS Outposts, EMR S3A helps customers leverage different storage options in AWS to optimize both data access speed and storage cost on their EMR workloads.
Additionally, the EMR S3A connector delivers security features and performance capabilities that extend beyond open source S3A. Key improvements include built-in fine-grained access control support for Apache Spark, an enhanced S3A credentials resolver, MagicCommitter V2 for optimized file writes, and accelerated S3 prefix listing for columnar file formats. These enhancements are available starting with EMR release 7.10 and maintain compatibility with existing applications.
The Amazon EMR S3A connector is available in all AWS Regions where Amazon EMR is available and comes pre-configured with Amazon EMR release version 7.10 and later. To learn more about Amazon EMR S3A, see the Amazon EMR documentation.
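Because the connector ships with release 7.10 and later, a script that needs to know whether a cluster has it must compare release versions numerically: a plain string comparison would rank "7.9" above "7.10". The following is a minimal sketch of such a check, assuming release labels of the form `emr-7.10.0`; the function names `parse_release_label` and `ships_emr_s3a` are hypothetical helpers for illustration, not part of any AWS SDK.

```python
import re

# Per the announcement, EMR S3A is pre-configured from release 7.10 onward.
S3A_DEFAULT_SINCE = (7, 10)


def parse_release_label(label: str) -> tuple:
    """Split a label such as 'emr-7.10.0' into integer (major, minor, patch).

    Assumes the 'emr-X.Y.Z' label format; raises ValueError otherwise.
    """
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized EMR release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def ships_emr_s3a(label: str) -> bool:
    """Return True if the release is 7.10 or later (hypothetical helper)."""
    major, minor, _patch = parse_release_label(label)
    # Tuple comparison is numeric, so (7, 10) correctly sorts after (7, 9).
    return (major, minor) >= S3A_DEFAULT_SINCE


if __name__ == "__main__":
    for label in ("emr-6.15.0", "emr-7.9.0", "emr-7.10.0"):
        print(label, ships_emr_s3a(label))
```

Comparing integer tuples rather than strings is the point of the sketch; the release label itself would come from wherever the cluster configuration is stored.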