GetOData

Save To S3 – Effortlessly Archive Task Data to Your S3 Bucket

2 min read

Intro:

The Save To S3 actor from Apify.com allows users to effortlessly save datasets directly to an Amazon S3 bucket. This is particularly useful for developers and data engineers who need to store web scraping results or other processed data in a reliable cloud storage solution.

🔍 What Is Save To S3?

The Save To S3 actor automatically sends the default dataset from an Apify run to an Amazon S3 bucket, improving data management and accessibility. It integrates smoothly with other Apify actors, particularly those using API v2, allowing scraped data to be stored efficiently in a secure and scalable environment.

✨ Features

  • Automated Data Storage: Automatically save datasets to S3 upon completion of an actor run.
  • Webhook Integration: Responds to ACTOR.RUN.SUCCEEDED webhooks for real-time data handling (see the sketch after this list).
  • Customizable Input: Allows configuration for AWS credentials and dataset format directly within the actor’s task.
  • Support for API v2: Optimized for integration with newer versions of Apify's API.
  • Data Format Flexibility: Choose from various formats for saving datasets (e.g. JSON, CSV).
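
Hypothetically, wiring that webhook up from code with the Apify Python client might look like the sketch below. The token, actor ID, and task ID are placeholders, and the keyword arguments are assumptions based on the client's webhook API, so verify them against the official docs:

```python
# A sketch, not the official setup flow: attach an ACTOR.RUN.SUCCEEDED webhook
# to a scraper so each successful run triggers your Save To S3 task.
# pip install apify-client
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")  # placeholder token

client.webhooks().create(
    event_types=["ACTOR.RUN.SUCCEEDED"],  # fire on successful runs
    actor_id="YOUR_SCRAPER_ACTOR_ID",     # the scraper being watched (placeholder)
    # When the webhook fires, start the Save To S3 task (task ID is a placeholder):
    request_url="https://api.apify.com/v2/actor-tasks/YOUR_SAVE_TO_S3_TASK_ID/runs?token=YOUR_APIFY_TOKEN",
)
```

The same webhook can also be configured without code from the webhook settings of your scraper's task in the Apify Console.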

🛠️ How to Use It

Step-by-step tutorial:

  1. Go to the tool’s page: Save To S3
  2. Click “Try for free” or “Run actor”.
  3. Fill in the required input fields:
    • accessKeyId: AWS access key ID
    • secretAccessKey: AWS secret access key
    • region: AWS region of your bucket (e.g., eu-west-2)
    • bucket: Your S3 bucket name
    • objectKeyFormat: Template for the S3 object key (supports placeholders such as ${resource.id}; see the sample input below)
    • format: Choose the dataset format (e.g., JSON, CSV)
    • clean: Set to true to save only clean dataset items (non-empty items with hidden fields removed)
    • datasetOptions: Any additional parameters for dataset handling
    • debugLog: Set to true to enable debug logging
  4. Click “Run” and wait for results.
  5. Download results or send them to a specified webhook.
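
If you prefer to skip the UI, the same run can be started programmatically. Here is a minimal sketch using the Apify Python client; the actor ID is assumed from the store page and the credentials are placeholders:

```python
# Run the Save To S3 actor once with the inputs from step 3.
# pip install apify-client
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")  # placeholder token

run = client.actor("drinksight/save-to-s3").call(  # actor ID assumed; verify on the store page
    run_input={
        "accessKeyId": "your-access-key-id",
        "secretAccessKey": "your-secret-access-key",
        "region": "eu-west-2",
        "bucket": "your-bucket-name",
        # ...remaining fields as in the sample input below
    }
)
print(run["status"])  # SUCCEEDED once the dataset has been copied to S3
```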

🧪 Sample Input (JSON)

```json
{
  "accessKeyId": "your-access-key-id",
  "secretAccessKey": "your-secret-access-key",
  "region": "eu-west-2",
  "bucket": "your-bucket-name",
  "objectKeyFormat": "${resource.id}_${resource.startedAt}.json",
  "format": "json",
  "clean": true,
  "datasetOptions": {},
  "debugLog": false
}
```
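
For illustration, here is roughly how that objectKeyFormat template expands into an S3 object key. The substitution happens inside the actor; the run values below are invented:

```python
# Invented run metadata standing in for the webhook's resource object.
resource = {"id": "HG7ML7M8z78YcAPEB", "startedAt": "2024-01-15T10:30:00.000Z"}

# Mimic the ${resource.id}_${resource.startedAt}.json template by hand.
object_key = f"{resource['id']}_{resource['startedAt']}.json"
print(object_key)  # HG7ML7M8z78YcAPEB_2024-01-15T10:30:00.000Z.json
```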

📤 Output Data (Fields)

  • resource.id: Unique identifier for the actor run
  • resource.startedAt: Timestamp of when the actor run started
  • Other dataset fields based on your scraping results
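
Once a run finishes, you can confirm the object landed in the bucket. A minimal sketch with boto3, reusing the region and bucket name from the sample input:

```python
# List objects in the target bucket to verify the saved dataset.
# pip install boto3 (credentials picked up from your AWS environment)
import boto3

s3 = boto3.client("s3", region_name="eu-west-2")
response = s3.list_objects_v2(Bucket="your-bucket-name")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```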

💰 Pricing
This actor is priced at $0.01 per run on a pay-as-you-go model. A free tier is also available for users to test its functionality.

👨‍💻 Built By
Drinksight — from Apify.com

✅ Final Thoughts
The Save To S3 actor is perfect for developers, data engineers, and businesses that rely on Apify for web scraping. It simplifies the workflow of storing scraped data securely in the cloud and is a must-try for anyone looking to streamline their data management process.

🔗 Try the Actor Now
👉 Save To S3