Amazon S3 connector for Simpplr enterprise search

Updated 15 days ago

Introduction

The Amazon S3 connector allows Simpplr Enterprise Search to index objects from your AWS S3 buckets, making files and metadata discoverable in enterprise search alongside other connected systems.

With this connector, you can:

  • Unify S3 content with other enterprise sources in one search experience.

  • Power Smart Answers with S3 documents.

  • Scope indexing to specific buckets configured in the connector.

| Area | Support |
| --- | --- |
| Content types | S3 objects (files) |
| Metadata | Object key, bucket, size, owner display name, storage class, last modified time, URL |
| Permissions | No per-document access control from S3 IAM; all indexed objects are searchable by all enterprise search users (see Permissions model) |
| Indexing | Full sync (list and enumerate objects); incremental sync is not available for this connector |
| Scope | One or more bucket names in connector configuration, or * for all buckets the IAM principal can list |
| Search features | Keyword and semantic search depend on your index, ingest pipeline, and ML configuration |

Objects and content supported

Object types

| Type | Description |
| --- | --- |
| S3 object | One search document per object key within the configured scope |

Metadata captured

For each indexed object, the connector records:

| Field | Description |
| --- | --- |
| filename | Full object key in the bucket |
| bucket | Bucket name |
| size_in_bytes | Object size |
| owner | S3 owner display name when returned by the API (otherwise empty) |
| storage_class | e.g. STANDARD |
| _timestamp | Object last modified time (ISO-8601 UTC) |
| url | Link to open the object (see Search experience) |

When the file type and size allow extraction, the ingest pipeline may also populate body (searchable text), file_extension, mime_type, and related fields.

What is not indexed

  • Bucket-level configuration (policies, lifecycle rules), which is not indexed as separate documents

  • Empty “folder” prefixes with no object at that key

  • Object versions other than the current version (versioning is not enumerated separately)

  • Content blocked by connector file-size or file-type rules (metadata may still be indexed)

Permissions model

S3 bucket and object permissions in AWS are not mirrored into per-user document security in Enterprise Search. Every object that is successfully indexed is treated as visible to all users who can use enterprise search.

Implication: Only index buckets whose contents are appropriate for organization-wide discovery. Use bucket IAM policies and the AWS Buckets setting to limit what enters the index.

Versions and editions supported

| Category | Details |
| --- | --- |
| Supported | Amazon S3 in AWS commercial regions; access via IAM user access key and secret key |
| Not supported | Per-object ACL sync into search; incremental change feeds (see Crawling and sync behavior); on-premises S3-compatible endpoints unless explicitly supported by your deployment |

Prerequisites

Before you begin, ensure the following:

Source system permissions

  • An IAM user or role that the connector will use, with permission to list and read objects in the target buckets.

  • Ability to create and manage IAM credentials (access key ID and secret access key) for that principal.

  • For direct HTTPS URLs on public buckets: additional read permissions on bucket configuration APIs (public access block, bucket policy status, bucket ACL). See Known limitations.

Typical IAM actions (adjust to your least-privilege policy):

  • s3:ListAllMyBuckets — when using ["*"] for buckets or for connection validation

  • s3:GetBucketLocation

  • s3:ListBucket — on configured buckets (and prefixes if restricted in IAM)

  • s3:GetObject — on objects to be indexed and downloaded for text extraction

  • s3:GetBucketPublicAccessBlock, s3:GetBucketPolicyStatus, s3:GetBucketAcl — optional, for public-bucket URL detection

Application credentials

  • AWS Access Key ID and AWS Secret Access Key for the connector IAM principal.

  • List of bucket names to index, or * to discover all buckets in the account (subject to IAM).
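As an illustration of how the bucket scope could be interpreted, here is a minimal Python sketch. It assumes the AWS Buckets value arrives as a comma-separated string; the function name `resolve_buckets` and the `list_all_buckets` callback are hypothetical and are not part of the Simpplr product.

```python
from typing import Callable, Iterable, List


def resolve_buckets(
    aws_buckets: str,
    list_all_buckets: Callable[[], Iterable[str]],
) -> List[str]:
    """Turn an AWS Buckets setting into an ordered, de-duplicated list of names."""
    # Split on commas and drop empty entries and surrounding whitespace.
    names = [part.strip() for part in aws_buckets.split(",") if part.strip()]
    if "*" in names:
        # Wildcard: defer to whatever the IAM principal is allowed to list.
        names = list(list_all_buckets())
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return list(dict.fromkeys(names))
```

With boto3, the callback could be something like `lambda: [b["Name"] for b in boto3.client("s3").list_buckets()["Buckets"]]`, which only returns buckets when the principal has s3:ListAllMyBuckets.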

Network and firewall

  • Outbound HTTPS from the connector runtime to AWS S3 endpoints in the regions of your buckets.

  • No inbound connectivity from AWS to your network is required.

Authentication

Authentication mechanism

The connector connects to AWS using the AWS SDK with credentials supplied in the connector configuration:

| Field | Required | Description |
| --- | --- | --- |
| AWS Access Key Id | Yes | Access key for the IAM principal |
| AWS Secret Key | Yes | Secret key for that principal (stored as a sensitive field) |
| AWS Buckets | Yes | Bucket names to crawl, or * to list all buckets the principal can access |

Setup and configuration

Step 1 — Prepare AWS

  1. Sign in to the AWS Management Console (or use your IAM tooling) and go to IAM → Users.

  2. Create or select an IAM user for the connector.

  3. Attach a policy using one of the following options:
    Option A: Attach the AWS managed policy AmazonS3ReadOnlyAccess, which grants read-only access to every bucket in the account, including the S3 actions the connector requires.
    Option B: Create a custom policy limited to the buckets you plan to index. In the IAM console sidebar, click Policies, then click Create policy. Open the JSON editor, clear any default code, and paste the following JSON block:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3ConnectorListBucketsWhenUsingWildcard",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    },
    {
      "Sid": "S3ConnectorReadConfiguredBuckets",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetBucketPublicAccessBlock",
        "s3:GetBucketPolicyStatus",
        "s3:GetBucketAcl"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR-BUCKET-1",
        "arn:aws:s3:::YOUR-BUCKET-1/*",
        "arn:aws:s3:::YOUR-BUCKET-2",
        "arn:aws:s3:::YOUR-BUCKET-2/*"
      ]
    }
  ]
}

Note - Replace YOUR-BUCKET-1 and YOUR-BUCKET-2 with your actual S3 bucket names. If you wish to target all buckets across your entire AWS account automatically, you can simplify the policy by changing the resource line to "Resource": "*".
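If you manage many buckets, you may prefer to generate the policy document instead of editing it by hand. The following Python sketch builds a policy with the same two statements, using the action names listed under Typical IAM actions. The function name `build_connector_policy` and its `include_public_checks` flag are illustrative assumptions, not something the connector provides.

```python
import json
from typing import List


def build_connector_policy(buckets: List[str], include_public_checks: bool = True) -> dict:
    """Build a least-privilege read policy for the given bucket names."""
    actions = ["s3:GetBucketLocation", "s3:ListBucket", "s3:GetObject"]
    if include_public_checks:
        # Optional actions, only needed for public-bucket URL detection.
        actions += [
            "s3:GetBucketPublicAccessBlock",
            "s3:GetBucketPolicyStatus",
            "s3:GetBucketAcl",
        ]
    resources = []
    for name in buckets:
        resources.append(f"arn:aws:s3:::{name}")    # bucket-level actions
        resources.append(f"arn:aws:s3:::{name}/*")  # object-level actions
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3ConnectorListBucketsWhenUsingWildcard",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
            {
                "Sid": "S3ConnectorReadConfiguredBuckets",
                "Effect": "Allow",
                "Action": actions,
                "Resource": resources,
            },
        ],
    }


if __name__ == "__main__":
    # Print a policy document that can be pasted into the IAM JSON editor.
    print(json.dumps(build_connector_policy(["YOUR-BUCKET-1", "YOUR-BUCKET-2"]), indent=2))
```

Passing `include_public_checks=False` drops the three optional actions; in that case, expect Console links even for public buckets.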

  4. Create an access key for that principal and store the access key ID and secret access key securely.

  5. Confirm that the principal can list and get objects in each target bucket.

  6. In AWS Organizations, go to Service Control Policies.

  7. Verify that no Service Control Policy denies GetBucketPublicAccessBlock.
    Note: In each attached policy, confirm that the GetBucketPublicAccessBlock action is not marked as denied.

Step 2 — Create the connector in Simpplr Enterprise Search

  1. In Simpplr, go to Enterprise Search → Connectors → Add connector.

  2. Select Amazon S3.

  3. Enter basic information:

    • Name: e.g. Corporate S3 documents

    • Description: Optional

  4. Provide authentication and scope:

    • AWS Access Key Id and AWS Secret Key

    • AWS Buckets: comma-separated list of bucket names, or * for all buckets the principal can list

  5. Configure audience-based filtering:

    • Include audiences

    • Exclude audiences

  6. Save the configuration and run Test connection (validates credentials via ListBuckets).

  7. Start a full sync and monitor progress on the connector dashboard.

Step 3 — Monitor

  1. Open the connector in Enterprise Search and review Health / sync status.

  2. Confirm that the last sync completed successfully and that the Documents indexed count increases as expected.

  3. Run a test search for a known object key or extracted text.

Crawling and sync behavior

Initial full crawl

On each sync, the connector:

  1. Resolves target buckets from the configured AWS Buckets list or *.

  2. Determines each bucket’s AWS region and uses a regional S3 client.

  3. Lists objects in pages (default 100 per page) without loading entire buckets into memory.

  4. For every object, indexes metadata (key, bucket, size, timestamps, URL, etc.).

  5. For supported file types and sizes, downloads the object and runs text extraction so body is searchable.
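Steps 2 and 3 above can be sketched in Python as follows. This is a minimal illustration, not the connector's actual code: it assumes a boto3-style S3 client is passed in (so boto3 itself is not imported here), and the helper names `bucket_region` and `iter_objects` are hypothetical.

```python
from typing import Any, Dict, Iterator


def bucket_region(s3_client: Any, bucket: str) -> str:
    """Resolve a bucket's region from GetBucketLocation."""
    location = s3_client.get_bucket_location(Bucket=bucket).get("LocationConstraint")
    if not location:
        return "us-east-1"  # S3 reports no constraint for us-east-1
    if location == "EU":
        return "eu-west-1"  # legacy value still returned for some old buckets
    return location


def iter_objects(s3_client: Any, bucket: str, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield one listing entry per object, fetching a single page at a time."""
    # FetchOwner is needed because ListObjectsV2 omits the owner by default.
    kwargs = {"Bucket": bucket, "MaxKeys": page_size, "FetchOwner": True}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page (or the whole bucket) is empty.
        yield from page.get("Contents", [])
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

Because `iter_objects` is a generator, only one page of listing entries is held in memory at any time, which matches the paging behavior described in step 3.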

Incremental updates

| Item | Current behavior |
| --- | --- |
| Incremental sync | Not supported. The connector does not implement cursor- or event-based incremental listing. |
| Why | S3 list APIs do not support efficient “changed since” queries across a whole bucket without scanning keys. |
| Expected latency | Time from an upload to appearance in search depends on sync schedule, bucket size, and extraction load. Under normal conditions, expect updates after the next successful full sync, not in real time. |

Field mapping and search experience

Default field mapping

| Source (S3 / connector) | Index / search concept |
| --- | --- |
| Hash of bucket + key | Document id |
| Object key | filename (full path in bucket) |
| Bucket name | bucket |
| LastModified | _timestamp |
| Size | size_in_bytes |
| Owner display name | owner |
| Storage class | storage_class |
| Computed link | url |
| Extracted text | body |

connector_type = s3
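To make the default mapping concrete, here is a hedged Python sketch that converts one S3 listing entry into a search document. The hash function (SHA-256 over "bucket/key"), the `_id` field name, and the function name `to_search_document` are assumptions chosen for illustration; the connector's actual document id scheme is not documented here.

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Dict


def to_search_document(bucket: str, obj: Dict[str, Any], url: str) -> Dict[str, Any]:
    """Map a ListObjectsV2 entry to the fields in the default field mapping."""
    key = obj["Key"]
    # Assumption: SHA-256 of "bucket/key"; the real connector may hash differently.
    doc_id = hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).hexdigest()
    modified: datetime = obj["LastModified"]
    return {
        "_id": doc_id,  # hypothetical field name for the document id
        "filename": key,
        "bucket": bucket,
        "size_in_bytes": obj["Size"],
        # Owner is optional in listings, so fall back to an empty string.
        "owner": obj.get("Owner", {}).get("DisplayName", ""),
        "storage_class": obj.get("StorageClass", ""),
        # Normalize to UTC and serialize as ISO-8601.
        "_timestamp": modified.astimezone(timezone.utc).isoformat(),
        "url": url,
        "connector_type": "s3",
    }
```

Deriving the id from the bucket and key means a re-sync of the same object updates the existing document instead of creating a duplicate.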

Search experience

  • Result layout: Typically shows title/path from filename, bucket, modified time, size, and an Open action using url.

  • Useful filters: Source = Amazon S3, file type (from extension/mime), bucket, owner (when present), last modified.

  • Autocomplete: Often wired to filename and bucket in search configuration.

  • Smart Answers / semantic search: Supported when your deployment indexes body and enables those features on this source.

Open behavior (url field)

Every document includes a url. The connector chooses the format based on whether the bucket is treated as public:

| Bucket treatment | URL type | User experience |
| --- | --- | --- |
| Private (default) | AWS Console object link | Opens the object in the AWS Console; user must sign in to AWS with access to the bucket |
| Public | Direct HTTPS S3 object URL | Opens or downloads the file in the browser if the object is anonymously readable |

Public detection (once per bucket per sync) checks, in order: Block Public Access settings, bucket policy public status, then bucket ACL for “Everyone” read. If checks fail (e.g. AccessDenied), the connector uses a Console link.
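The detection order above can be sketched as a pure Python function over the three API responses. This is an approximation under stated assumptions: the exact rules the connector applies are not documented here, a response of `None` stands for a failed call such as AccessDenied, and the Console URL pattern shown is one commonly used form that may differ from what the connector emits. The names `is_public_bucket` and `object_url` are hypothetical.

```python
from typing import Any, Dict, Optional
from urllib.parse import quote

ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def is_public_bucket(
    public_access_block: Optional[Dict[str, Any]],
    policy_status: Optional[Dict[str, Any]],
    acl: Optional[Dict[str, Any]],
) -> bool:
    """Return True only when every check succeeded and one indicates public read."""
    if public_access_block is None or policy_status is None or acl is None:
        return False  # a failed check falls back to private (Console link)
    pab = public_access_block.get("PublicAccessBlockConfiguration", {})
    # 1. Block Public Access can neutralize public policies and public ACLs.
    policy_allowed = not pab.get("RestrictPublicBuckets", False)
    acl_allowed = not pab.get("IgnorePublicAcls", False)
    # 2. Bucket policy public status.
    if policy_allowed and policy_status.get("PolicyStatus", {}).get("IsPublic", False):
        return True
    # 3. Bucket ACL grant to the AllUsers ("Everyone") group.
    if acl_allowed:
        for grant in acl.get("Grants", []):
            grantee = grant.get("Grantee", {})
            if grantee.get("URI") == ALL_USERS_URI and grant.get("Permission") in ("READ", "FULL_CONTROL"):
                return True
    return False


def object_url(bucket: str, key: str, region: str, public: bool) -> str:
    """Build the Open link: direct HTTPS for public buckets, Console link otherwise."""
    encoded_key = quote(key)  # keeps "/" so the key path stays readable
    if public:
        return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"
    # Assumed Console URL pattern, for illustration only.
    return f"https://s3.console.aws.amazon.com/s3/object/{bucket}?region={region}&prefix={encoded_key}"
```

With boto3, the three inputs would come from `get_public_access_block`, `get_bucket_policy_status`, and `get_bucket_acl`. A caller would need to decide how to treat "no configuration present" errors; this sketch assumes those are passed as empty dictionaries, not `None`.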

Known limitations

| Limitation | Details |
| --- | --- |
| No document-level security | All indexed objects are visible to all enterprise search users; S3 IAM is not enforced at query time |
| No incremental sync | Each sync re-lists objects in scope; large buckets take longer and consume more API calls |
| File size and type | Very large or unsupported types may be metadata-only (no body) per platform extraction rules |
| Static IAM keys | Authentication uses long-lived IAM user access keys; update the connector configuration whenever a key is rotated |
| Public URL detection | Requires extra bucket configuration APIs; missing permissions force Console URLs even for public buckets |
| Versioning | Current object version only |

Monitoring and troubleshooting

Connector health

In Enterprise Search, open the connector and review:

| Metric | Description |
| --- | --- |
| Last sync status | Success / Warning / Failed |
| Last sync time | When the last job finished |
| Next scheduled sync | If scheduling is configured |
| Documents indexed | Approximate count in the index |
| Errors / warnings | From sync logs |

Common issues and resolutions

Issue: Authentication failed

Symptoms: Test connection or sync fails immediately; invalid credentials errors in logs.

Possible causes:

  • Incorrect access key ID or secret key

  • IAM principal disabled or key rotated

  • Missing s3:ListAllMyBuckets (when using *) or bucket-specific permissions

Resolution:

  1. Verify credentials in the connector configuration.

  2. Confirm the IAM policy allows required actions on target buckets.

  3. Re-test connection, then re-run sync.

Issue: Sync completes but no documents (or partial bucket)

Possible causes:

  • Wrong bucket names in configuration

  • IAM s3:ListBucket or s3:GetObject denied on configured buckets

  • Errors logged per bucket during list (connector logs a warning and skips further objects in that bucket)

Resolution:

  1. Validate bucket names in the connector configuration against the AWS console.

  2. Test list/get with the same IAM principal (AWS CLI or console).

  3. Review connector logs for bucket-specific warnings.
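For resolution step 2, a small script run with the connector's credentials can show whether listing or reading is the failing permission. The sketch below assumes a boto3-style S3 client is passed in; `check_bucket_access` is a hypothetical helper, and it catches any exception broadly for simplicity (boto3 raises `botocore.exceptions.ClientError` for AccessDenied).

```python
from typing import Any, Dict


def check_bucket_access(s3_client: Any, bucket: str) -> Dict[str, str]:
    """Report whether the principal can list the bucket and read one object."""
    result = {"list": "not tested", "get": "not tested"}
    try:
        # Requires s3:ListBucket on the bucket.
        page = s3_client.list_objects_v2(Bucket=bucket, MaxKeys=1)
        result["list"] = "ok"
    except Exception as exc:
        result["list"] = f"failed: {exc}"
        return result
    contents = page.get("Contents", [])
    if not contents:
        result["get"] = "skipped: bucket is empty"
        return result
    try:
        # HeadObject is authorized by s3:GetObject and transfers no body.
        s3_client.head_object(Bucket=bucket, Key=contents[0]["Key"])
        result["get"] = "ok"
    except Exception as exc:
        result["get"] = f"failed: {exc}"
    return result
```

A result of `list: ok` with `get: failed` points at a missing s3:GetObject permission on the object resource (`arn:aws:s3:::bucket/*`), which would explain metadata-free or partial indexing.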

Issue: Rate limit or throttling

Possible causes:

  • Very large buckets or high sync frequency

  • AWS S3 request rate limits for the bucket/prefix

Resolution:

  1. Index fewer buckets (remove buckets from configuration or avoid *).

  2. Increase sync interval.

  3. Review AWS S3 metrics and request rate guidance for hot prefixes.
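When weighing these options, a rough estimate of listing traffic can help. The sketch below assumes the default page size of 100 objects and counts only list requests; object downloads for text extraction and the per-bucket configuration checks add to the total. The function name `list_requests_per_day` is illustrative.

```python
import math


def list_requests_per_day(object_count: int, syncs_per_day: float, page_size: int = 100) -> int:
    """Lower-bound estimate of list requests generated by full syncs of one bucket."""
    # Even an empty bucket costs one list request per sync.
    pages_per_sync = max(1, math.ceil(object_count / page_size))
    return math.ceil(pages_per_sync * syncs_per_day)
```

For example, a bucket with 1,000,000 objects needs 10,000 list requests per full sync, so moving from hourly to daily syncs cuts listing traffic from 240,000 to 10,000 requests per day.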

Issue: Open link goes to Console but bucket is public

Possible causes:

  • Connector IAM user lacks GetBucketPublicAccessBlock, GetBucketPolicyStatus, or GetBucketAcl

  • Organization SCP denies those APIs

  • Bucket is not actually public-readable at the object level

Resolution:

  1. Grant read-only bucket configuration permissions to the connector principal.

  2. Verify SCPs do not deny public access block / policy / ACL reads.

  3. Confirm anonymous GetObject works for a test key.

When to contact support

Contact Simpplr Support if:

  • Authentication errors persist after verifying IAM and credentials

  • Sync is stuck or failed for several hours with no progress

  • Indexed counts are far below expected for a stable bucket scope

  • Search results lack Open links or extracted text for files that should be supported

Include:

  • Connector name and instance ID (if available)

  • Organization / tenant URL

  • Approximate time of the issue

  • Error messages or screenshots from the connector UI

  • Steps already tried from this guide

Frequently asked questions

Q. Can I index multiple buckets?
Ans: Yes. List multiple bucket names in AWS Buckets, or use * to discover all buckets the IAM principal can list.

Q. How often does the connector sync?
Ans: Scheduling is configured in Enterprise Search. Because only full sync is supported today, daily sync is a common recommendation for active buckets.

Q. Is version history indexed?
Ans: No. One document per current object key; S3 versioning history is not enumerated.

Q. Does the connector index content from public or shared links?
Ans: It indexes objects your IAM principal can list and read. Public anonymous access does not add separate “guest” security in search—all indexed docs remain visible to all search users.

Q. What happens when a user loses access to AWS?
Ans: Enterprise Search does not re-check S3 IAM per user at query time. Restrict indexed content by bucket scope and IAM on the connector principal.

Q. Can I exclude paths or file types?
Ans: Not in the current product configuration. Choose buckets whose contents are safe to expose in search, and use IAM on the connector principal to restrict which objects can be listed and read.


