save-to-s3

An Apify actor to save the default dataset of a run to an S3 bucket.

It is designed to be called from the ACTOR.RUN.SUCCEEDED webhook of the actor that has generated the dataset.

This actor is compatible with API v2. I made it because I couldn't get the Crawler Results To S3 actor to work with v2 actors.

Usage

AWS credentials and options for formatting the dataset are set on this actor's input, which is merged with the webhook's POST data. You'll therefore need to create a task for your uploads so you can save common config such as your AWS credentials and dataset format details.

1. Create the task

Create a new task using the save-to-s3 actor. This allows you to specify input to use every time the task is run. The webhook's POST data will be merged with this at runtime: the values are those from the get actor run API endpoint, all grouped under a resource property.
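For illustration, a trimmed ACTOR.RUN.SUCCEEDED payload might look something like the sketch below. This is an assumption for clarity, not the exact payload; the fields under resource are whatever the get actor run endpoint returns for your run, and the dataset ID here is a placeholder:

```json
{
  "eventType": "ACTOR.RUN.SUCCEEDED",
  "resource": {
    "id": "SBNgQGmp87LtspHF1",
    "status": "SUCCEEDED",
    "startedAt": "2019-05-15T07:25:00.414Z",
    "defaultDatasetId": "YOUR_DATASET_ID"
  }
}
```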

The properties you can specify in your Input for the task:

| Property | Description |
| --- | --- |
| `accessKeyId` | The access key for the AWS user to connect with |
| `secretAccessKey` | The secret access key for the AWS user to connect with |
| `region` | The AWS region your bucket is located in (e.g. `eu-west-2`) |
| `bucket` | The bucket name to save files to |
| `objectKeyFormat` | A string to specify the key (i.e. filename) for the S3 object you will save. You can reference any property of the input object using dot notation, in a syntax similar to JavaScript template literals. For example, the default value `${resource.id}_${resource.startedAt}.${format}` will yield an S3 object with a name something like `SBNgQGmp87LtspHF1_2019-05-15T07:25:00.414Z.json`. |
| `format` | Maps to the `format` parameter of the get dataset items API endpoint and accepts any of its valid string values |
| `clean` | Maps to the `clean` parameter of the get dataset items API endpoint |
| `datasetOptions` | An object that allows you to specify any of the other parameters of the get dataset items API endpoint; for example, `{ "offset": "10" }` is the equivalent of setting `?offset=10` in the API call |
| `debugLog` | A boolean indicating whether to use debug-level logging |
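Putting these together, the input for your task might look something like this sketch. The credentials, region, and bucket name are placeholders, and objectKeyFormat is shown with its default value:

```json
{
  "accessKeyId": "AKIAXXXXXXXXXXXXXXXX",
  "secretAccessKey": "YOUR_SECRET_ACCESS_KEY",
  "region": "eu-west-2",
  "bucket": "YOUR-BUCKET",
  "objectKeyFormat": "${resource.id}_${resource.startedAt}.${format}",
  "format": "json",
  "clean": true
}
```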

2. Create the webhook

Go to your save-to-s3 task's API tab and copy the URL for the Run Task endpoint, which will be in the format: https://api.apify.com/v2/actor-tasks/TASK_NAME_HERE/runs?token=YOUR_TOKEN_HERE

Go to either the actor or (more likely) the actor task you want to add save-to-s3 functionality to. In the Webhooks tab, add a webhook with the URL you just copied. For Event types, select ACTOR.RUN.SUCCEEDED. Then Save.
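Once the webhook fires, its POST data is merged into the task's input described above. Under the same placeholder assumptions as the earlier examples, save-to-s3 would then run with an input along these lines, and the default objectKeyFormat would resolve to an object named SBNgQGmp87LtspHF1_2019-05-15T07:25:00.414Z.json:

```json
{
  "accessKeyId": "AKIAXXXXXXXXXXXXXXXX",
  "secretAccessKey": "YOUR_SECRET_ACCESS_KEY",
  "region": "eu-west-2",
  "bucket": "YOUR-BUCKET",
  "objectKeyFormat": "${resource.id}_${resource.startedAt}.${format}",
  "format": "json",
  "resource": {
    "id": "SBNgQGmp87LtspHF1",
    "startedAt": "2019-05-15T07:25:00.414Z",
    "defaultDatasetId": "YOUR_DATASET_ID"
  }
}
```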

Security

Because you store your AWS user's key and secret as part of this actor's input, it is strongly recommended that you create an AWS IAM user specifically for Apify, and only grant access to the specific bucket you are using.

An example policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetBucketLocation", "s3:ListAllMyBuckets"],
      "Resource": "arn:aws:s3:::*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": ["arn:aws:s3:::YOUR-BUCKET", "arn:aws:s3:::YOUR-BUCKET/*"]
    }
  ]
}
```
