Reading an Excel file from AWS S3 using Python?
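To read an Excel file stored in AWS S3 with Python, you need two main libraries: boto3, which calls the Amazon S3 API to fetch the file, and pandas, which parses the spreadsheet into a DataFrame. For .xlsx files, pandas also needs an Excel engine such as openpyxl to be installed.
Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python. It lets Python programs use AWS services such as Amazon S3 and EC2, and it reads credentials from the standard AWS sources, such as environment variables, the ~/.aws/credentials file, or an IAM role attached to the machine.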
Here is an example of how you can use these libraries to read and load an Excel file from S3:
import boto3
import pandas as pd

# Create an S3 client
s3 = boto3.client('s3')

# Set the name of the bucket and the file key
bucket_name = 'BUCKET_NAME'
file_key = 'FILE_KEY'

# Download the file from S3
s3.download_file(bucket_name, file_key, 'local_file.xlsx')

# Load the data from the downloaded file into a pandas DataFrame
df = pd.read_excel('local_file.xlsx')

# Do something with the data
print(df)
This code creates an S3 client with the boto3 library, then uses the client's download_file method to download the Excel file from the specified S3 bucket and save it to the local filesystem as local_file.xlsx. It then uses the pandas read_excel function to load the data from that file into a pandas DataFrame.
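If you would rather not write a file to disk, the object can also be read straight into memory. The sketch below is one way to do that, assuming the file fits comfortably in memory: it calls the client's get_object method, reads the response body as bytes, and wraps them in io.BytesIO before handing them to pd.read_excel. The function name read_excel_from_s3 is hypothetical, and the client is passed in as a parameter so any boto3 S3 client (or a stand-in for testing) can be used.

```python
import io

import pandas as pd


def read_excel_from_s3(s3_client, bucket, key, **read_excel_kwargs):
    """Read an Excel object from S3 into a DataFrame without a local file.

    s3_client is assumed to be a boto3 S3 client (e.g. boto3.client('s3'))
    or any object with a compatible get_object(Bucket=..., Key=...) method.
    """
    # get_object returns a dict whose "Body" entry is a file-like stream
    response = s3_client.get_object(Bucket=bucket, Key=key)
    data = response["Body"].read()
    # Wrap the raw bytes in BytesIO so pandas gets a seekable in-memory file
    return pd.read_excel(io.BytesIO(data), **read_excel_kwargs)
```

With the client from the example above, this would be called as read_excel_from_s3(s3, bucket_name, file_key). Extra keyword arguments are passed straight to pd.read_excel, so for instance sheet_name=None returns a dict of DataFrames keyed by sheet name instead of only the first sheet.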
Note: Don’t forget to replace BUCKET_NAME and FILE_KEY with the name of your S3 bucket and the key (the path of the file inside that bucket).
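S3 locations are often written as a single URI such as s3://my-bucket/reports/sales.xlsx, where the part after s3:// up to the first slash is the bucket name and everything after that slash is the file key. A small helper like the hypothetical split_s3_uri below (a sketch, not part of boto3) can split such a URI into the two values the code above expects.

```python
def split_s3_uri(uri):
    """Split an 's3://bucket/key' URI into (bucket, key).

    Hypothetical helper for illustration; it is not part of boto3.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Everything before the first "/" is the bucket; the rest is the key,
    # kept verbatim so keys containing characters like "?" or "#" survive
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI must contain both a bucket and a key: {uri!r}")
    return bucket, key
```

For example, bucket_name, file_key = split_s3_uri("s3://my-bucket/reports/sales.xlsx") gives "my-bucket" and "reports/sales.xlsx".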