Amazon S3 connector for Simpplr enterprise search

Updated 15 days ago

Introduction

The Amazon S3 connector allows Simpplr Enterprise Search to index objects from your AWS S3 buckets, making files and metadata discoverable in enterprise search alongside other connected systems.

With this connector, you can:

Unify S3 content with other enterprise sources in one search experience.
Power Smart Answers with S3 documents.
Scope indexing to specific buckets configured in the connector.

Area	Support
Content types	S3 objects (files)
Metadata	Object key, bucket, size, owner display name, storage class, last modified time, URL
Permissions	No per-document access control from S3 IAM; all indexed objects are searchable by all enterprise search users (see Permissions model)
Indexing	Full sync (list and enumerate objects); Incremental sync is not available for this connector
Scope	One or more bucket names in connector configuration, or * for all buckets the IAM principal can list
Search features	Keyword and semantic search depend on your index, ingest pipeline, and ML configuration

Objects and content supported

Object types

Type	Description
S3 object	One search document per object key within the configured scope

Metadata captured

For each indexed object, the connector records:

Field	Description
filename	Full object key in the bucket
bucket	Bucket name
size_in_bytes	Object size
owner	S3 owner display name when returned by the API (otherwise empty)
storage_class	e.g. STANDARD
_timestamp	Object last modified time (ISO-8601 UTC)
url	Link to open the object (see Search experience)

When the file type and size allow extraction, the ingest pipeline may also populate body (searchable text), file_extension, mime_type, and related fields.

What is not indexed

Bucket-level configuration only (policies, lifecycle rules) — not separate documents
Empty “folder” prefixes with no object at that key
Object versions other than the current version (versioning is not enumerated separately)
Content blocked by connector file-size or file-type rules (metadata may still be indexed)

Permissions model

S3 bucket and object permissions in AWS are not mirrored into per-user document security in Enterprise Search. Every object that is successfully indexed is treated as visible to all users who can use enterprise search.

Implication: Only index buckets whose contents are appropriate for organization-wide discovery. Use bucket IAM policies and the AWS Buckets setting to limit what enters the index.

Versions and editions supported

Category	Details
Supported	Amazon S3 in AWS commercial regions; access via IAM user access key and secret key
Not supported	Per-object ACL sync into search; incremental change feeds (see Crawling and sync behavior); on-premises S3-compatible endpoints unless explicitly supported by your deployment

Prerequisites

Before you begin, ensure the following:

Source system permissions

An IAM user or role that the connector will use, with permission to list and read objects in the target buckets.
Ability to create and manage IAM credentials (access key ID and secret access key) for that principal.
For direct HTTPS URLs on public buckets: additional read permissions on bucket configuration APIs (public access block, bucket policy status, bucket ACL). See Known limitations.

Typical IAM actions (adjust to your least-privilege policy):

s3:ListAllMyBuckets (when AWS Buckets is set to *, and for connection validation)
s3:GetBucketLocation (to determine each bucket's region)
s3:ListBucket (on configured buckets, and on prefixes if restricted in IAM)
s3:GetObject (on objects to be indexed and downloaded for text extraction)
s3:GetBucketPublicAccessBlock, s3:GetBucketPolicyStatus, s3:GetBucketAcl (optional, for public-bucket URL detection)

Application credentials

AWS Access Key ID and AWS Secret Access Key for the connector IAM principal.
List of bucket names to index, or * to discover all buckets in the account (subject to IAM).

Network and firewall

Outbound HTTPS from the connector runtime to AWS S3 endpoints in the regions of your buckets.
No inbound connectivity from AWS to your network is required.

Authentication

Authentication mechanism

The connector connects to AWS using the AWS SDK with credentials supplied in the connector configuration:

Field	Required	Description
AWS Access Key Id	Yes	Access key for the IAM principal
AWS Secret Key	Yes	Secret key for that principal (stored as a sensitive field)
AWS Buckets	Yes	Bucket names to crawl, or * to list all buckets the principal can access

Setup and configuration

Step 1 — Prepare AWS

Sign in to the AWS Management Console (or use your IAM tooling) and go to IAM -> Users.
Create or select an IAM user for the connector.
Attach a policy to that user, using one of the following options:
Option A: Attach the AWS managed policy AmazonS3ReadOnlyAccess, which grants read-only access to all S3 buckets in the account.
Option B: Create a custom policy limited to the buckets you plan to index. In the IAM console sidebar, click Policies, then click Create policy.
Click the JSON tab (editor view).
Clear any default code and paste the following JSON block:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3ConnectorListBucketsWhenUsingWildcard",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    },
    {
      "Sid": "S3ConnectorReadConfiguredBuckets",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetBucketPublicAccessBlock",
        "s3:GetBucketPolicyStatus",
        "s3:GetBucketAcl"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR-BUCKET-1",
        "arn:aws:s3:::YOUR-BUCKET-1/*",
        "arn:aws:s3:::YOUR-BUCKET-2",
        "arn:aws:s3:::YOUR-BUCKET-2/*"
      ]
    }
  ]
}

Note: Replace YOUR-BUCKET-1 and YOUR-BUCKET-2 with your actual S3 bucket names. To cover every bucket in the AWS account (for example, when AWS Buckets is set to *), you can simplify the policy by changing the resource list of the second statement to "Resource": "*".
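
If you maintain the bucket list in code, the policy document can be generated rather than edited by hand. The following is a minimal sketch, not part of the connector: the function name build_connector_policy is hypothetical, and the action names are the ones given under Typical IAM actions.

```python
import json


def build_connector_policy(bucket_names):
    """Return a least-privilege policy document for the given bucket names."""
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # bucket-level actions
        resources.append(f"arn:aws:s3:::{name}/*")  # object-level actions
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3ConnectorListBucketsWhenUsingWildcard",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
            {
                "Sid": "S3ConnectorReadConfiguredBuckets",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:GetBucketPublicAccessBlock",
                    "s3:GetBucketPolicyStatus",
                    "s3:GetBucketAcl",
                ],
                "Resource": resources,
            },
        ],
    }


if __name__ == "__main__":
    policy = build_connector_policy(["YOUR-BUCKET-1", "YOUR-BUCKET-2"])
    print(json.dumps(policy, indent=2))
```

Paste the printed JSON into the IAM policy editor. Each bucket contributes two ARNs because list and configuration actions apply to the bucket itself, while s3:GetObject applies to the objects inside it.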

Create an access key for that user and store the access key ID and secret access key securely.
Confirm the principal can list and get objects in each target bucket.
If your account is part of an AWS Organization, go to Service Control Policies (SCPs).
Verify that no Service Control Policy denies GetBucketPublicAccessBlock.
Note: Check every attached policy, not only the first one, and confirm that none lists the GetBucketPublicAccessBlock action under a Deny effect.

Step 2 — Create the connector in Simpplr Enterprise Search

In Simpplr, go to Enterprise Search → Connectors → Add connector.
Select Amazon S3.
Enter basic information:
- Name: e.g. Corporate S3 documents
- Description: Optional
Provide authentication and scope:
- AWS Access Key Id and AWS Secret Key
- AWS Buckets: comma-separated list of bucket names, or * for all buckets the principal can list
Configure audience-based filtering:
- Include audiences
- Exclude audiences
Save the configuration and run Test connection (validates credentials via ListBuckets).
Start a full sync and monitor progress on the connector dashboard.
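
To see how an AWS Buckets value could translate into a list of buckets, here is a minimal sketch. It is an illustration under assumptions, not the connector's code: parse_bucket_scope and resolve_buckets are hypothetical names, and client is assumed to be a boto3-style S3 client (for example, boto3.client("s3")) created with the connector's access key.

```python
def parse_bucket_scope(raw_value):
    """Split a comma-separated AWS Buckets value into bucket names.

    Returns ["*"] when the value contains the wildcard.
    """
    names = [part.strip() for part in raw_value.split(",") if part.strip()]
    if "*" in names:
        return ["*"]
    return names


def resolve_buckets(client, raw_value):
    """Return the bucket names to crawl for the given AWS Buckets value."""
    names = parse_bucket_scope(raw_value)
    if names == ["*"]:
        # ListBuckets requires s3:ListAllMyBuckets on the connector principal.
        response = client.list_buckets()
        return [bucket["Name"] for bucket in response.get("Buckets", [])]
    return names
```

With the wildcard, the result depends entirely on what the IAM principal is allowed to list, so an explicit bucket list is the more predictable way to control scope.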

Step 3 — Monitor

Open the connector in Enterprise Search and review Health / sync status.
Confirm that the last sync completed successfully and that the Documents indexed count increases as expected.
Run a test search for a known object key or extracted text.

Crawling and sync behavior

Initial full crawl

On each sync, the connector:

Resolves target buckets from the configured AWS Buckets list or *.
Determines each bucket’s AWS region and uses a regional S3 client.
Lists objects in pages (default 100 per page) without loading entire buckets into memory.
For every object, indexes metadata (key, bucket, size, timestamps, URL, etc.).
For supported file types and sizes, downloads the object and runs text extraction so body is searchable.
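
The region lookup and paged listing described above can be sketched as follows. This is an illustration under assumptions rather than the connector's implementation: bucket_region and iter_objects are hypothetical names, and client is assumed to be a boto3-style S3 client.

```python
def bucket_region(client, bucket):
    """Return the bucket's region; GetBucketLocation reports us-east-1 as None."""
    response = client.get_bucket_location(Bucket=bucket)
    return response.get("LocationConstraint") or "us-east-1"


def iter_objects(client, bucket, page_size=100):
    """Yield object entries page by page instead of loading the whole bucket."""
    kwargs = {"Bucket": bucket, "MaxKeys": page_size}
    while True:
        page = client.list_objects_v2(**kwargs)
        for entry in page.get("Contents", []):
            yield entry
        if not page.get("IsTruncated"):
            break
        # Continue from where the previous page stopped.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

Because iter_objects is a generator, only one page of entries is held in memory at a time, which matches the paging behavior described in the steps above.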

Incremental updates

Item	Current behavior
Incremental sync	Not supported. The connector does not implement cursor- or event-based incremental listing.
Why	S3 list APIs do not support efficient “changed since” queries across a whole bucket without scanning keys.

Expected latency

Time from an upload to appearance in search depends on sync schedule, bucket size, and extraction load. Under normal conditions, expect updates after the next successful full sync, not in real time.

Field mapping and search experience

Default field mapping

Source (S3 / connector)	Index / search concept
Hash of bucket + key	Document id
Object key	filename (full path in bucket)
Bucket name	bucket
LastModified	_timestamp
Size	size_in_bytes
Owner display name	owner
Storage class	storage_class
Computed link	url
Extracted text	body
—	connector_type = s3
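
As an illustration of the mapping above, the sketch below turns one S3 listing entry into a search document. Several details are assumptions made for the example: the function name to_search_document, the _id field name, and the use of SHA-256 over "bucket/key" (the connector's actual hash is not documented here). The entry is assumed to have the shape returned by a boto3-style list_objects_v2 call.

```python
import hashlib
from datetime import datetime, timezone


def to_search_document(bucket, entry, url):
    """Map one object listing entry to the index fields in the table above."""
    key = entry["Key"]
    # Assumption: SHA-256 of "bucket/key" stands in for the real document id hash.
    doc_id = hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).hexdigest()
    # LastModified is expected to be a timezone-aware datetime.
    last_modified = entry["LastModified"].astimezone(timezone.utc)
    return {
        "_id": doc_id,  # hypothetical field name for the document id
        "filename": key,
        "bucket": bucket,
        "size_in_bytes": entry["Size"],
        # Owner is only present when the listing was requested with FetchOwner=True.
        "owner": entry.get("Owner", {}).get("DisplayName", ""),
        "storage_class": entry.get("StorageClass", ""),
        "_timestamp": last_modified.isoformat(),
        "url": url,
        "connector_type": "s3",
    }
```

Deriving the id from the bucket and key means a re-synced object overwrites its earlier document instead of creating a duplicate.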

Search experience

Result layout: Typically shows title/path from filename, bucket, modified time, size, and an Open action using url.
Useful filters: Source = Amazon S3, file type (from extension/mime), bucket, owner (when present), last modified.
Autocomplete: Often wired to filename and bucket in search configuration.
Smart Answers / semantic search: Supported when your deployment indexes body and enables those features on this source.

Open behavior (url field)

Every document includes a url. The connector chooses the format based on whether the bucket is treated as public:

Bucket treatment	URL type	User experience
Private (default)	AWS Console object link	Opens the object in the AWS Console; user must sign in to AWS with access to the bucket
Public	Direct HTTPS S3 object URL	Opens or downloads the file in the browser if the object is anonymously readable

Public detection (once per bucket per sync) checks, in order: Block Public Access settings, bucket policy public status, then bucket ACL for “Everyone” read. If checks fail (e.g. AccessDenied), the connector uses a Console link.
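
The ordered checks and the fallback to a Console link can be sketched as below. This is a simplified illustration, not the connector's logic: is_public_bucket and object_url are hypothetical names, client is assumed to be a boto3-style S3 client, treating any enabled Block Public Access setting as private is a conservative assumption, and the Console URL pattern is an assumed format.

```python
from urllib.parse import quote

ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def is_public_bucket(client, bucket):
    """Run the three checks in order; any failure means 'treat as private'."""
    try:
        block = client.get_public_access_block(Bucket=bucket)
        config = block["PublicAccessBlockConfiguration"]
        # Assumption: any enabled Block Public Access setting counts as private.
        if any(config.values()):
            return False
        status = client.get_bucket_policy_status(Bucket=bucket)
        if status["PolicyStatus"]["IsPublic"]:
            return True
        acl = client.get_bucket_acl(Bucket=bucket)
        return any(
            grant.get("Grantee", {}).get("URI") == ALL_USERS_URI
            and grant.get("Permission") in ("READ", "FULL_CONTROL")
            for grant in acl.get("Grants", [])
        )
    except Exception:
        # For example AccessDenied or an SCP deny: fall back to the Console link.
        return False


def object_url(bucket, key, region, public):
    """Build a direct HTTPS URL for public buckets, else a Console link."""
    if public:
        return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"
    # Assumed Console URL pattern; verify against your AWS Console before relying on it.
    return (
        "https://s3.console.aws.amazon.com/s3/object/"
        f"{bucket}?region={region}&prefix={quote(key, safe='')}"
    )
```

Note that this sketch treats every error the same way. A bucket with no Block Public Access configuration or no bucket policy also raises an error in AWS, so a production implementation would likely distinguish "not configured" from "access denied".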

Known limitations

Limitation	Details
No document-level security	All indexed objects are visible to all enterprise search users; S3 IAM is not enforced at query time
No incremental sync	Each sync re-lists objects in scope; large buckets take longer and consume more API calls
File size and type	Very large or unsupported types may be metadata-only (no body) per platform extraction rules
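Static IAM keys	The connector authenticates with long-lived IAM access keys (access key ID and secret access key); when a key is rotated in AWS, update the connector configuration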
Public URL detection	Requires extra bucket configuration APIs; missing permissions force Console URLs even for public buckets
Versioning	Current object version only

Monitoring and troubleshooting

Connector health

In Enterprise Search, open the connector and review:

Metric	Description
Last sync status	Success / Warning / Failed
Last sync time	When the last job finished
Next scheduled sync	If scheduling is configured
Documents indexed	Approximate count in the index
Errors / warnings	From sync logs

Common issues and resolutions

Issue: Authentication failed

Symptoms: Test connection or sync fails immediately; invalid credentials errors in logs.

Possible causes:

Incorrect access key ID or secret key
IAM principal disabled or key rotated
Missing s3:ListAllMyBuckets (when using *) or bucket-specific permissions

Resolution:

Verify credentials in the connector configuration.
Confirm the IAM policy allows required actions on target buckets.
Re-test connection, then re-run sync.

Issue: Sync completes but no documents (or partial bucket)

Possible causes:

Wrong bucket names in configuration
IAM s3:ListBucket or s3:GetObject denied on configured buckets
Errors logged per bucket during list (connector logs a warning and skips further objects in that bucket)

Resolution:

Validate bucket names in the connector configuration against the AWS console.
Test list/get with the same IAM principal (AWS CLI or console).
Review connector logs for bucket-specific warnings.
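
If you prefer a script over the AWS CLI for the list/get test, the following minimal sketch checks one bucket. It is an illustration, not a Simpplr tool: check_bucket_access is a hypothetical name, and client is assumed to be a boto3-style S3 client created with the connector's access key.

```python
def check_bucket_access(client, bucket):
    """Report whether the principal can list the bucket and read one object."""
    result = {"bucket": bucket, "can_list": False, "can_read": None, "error": None}
    try:
        page = client.list_objects_v2(Bucket=bucket, MaxKeys=1)
        result["can_list"] = True
        contents = page.get("Contents", [])
        if contents:
            # HeadObject is authorized by s3:GetObject and transfers no body.
            client.head_object(Bucket=bucket, Key=contents[0]["Key"])
            result["can_read"] = True
    except Exception as exc:  # e.g. AccessDenied or NoSuchBucket
        if result["can_list"]:
            result["can_read"] = False
        result["error"] = str(exc)
    return result
```

A can_read value of None means the bucket was listed but contained no object to test against.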

Issue: Rate limit or throttling

Possible causes:

Very large buckets or high sync frequency
AWS S3 request rate limits for the bucket/prefix

Resolution:

Index fewer buckets (remove buckets from configuration or avoid *).
Increase sync interval.
Review AWS S3 metrics and request rate guidance for hot prefixes.

Issue: Open link goes to Console but bucket is public

Possible causes:

Connector IAM user lacks GetBucketPublicAccessBlock, GetBucketPolicyStatus, or GetBucketAcl
Organization SCP denies those APIs
Bucket is not actually public-readable at the object level

Resolution:

Grant read-only bucket configuration permissions to the connector principal.
Verify SCPs do not deny public access block / policy / ACL reads.
Confirm anonymous GetObject works for a test key.

When to contact support

Contact Simpplr Support if:

Authentication errors persist after verifying IAM and credentials
Sync is stuck or failed for several hours with no progress
Indexed counts are far below expected for a stable bucket scope
Search results lack Open links or extracted text for files that should be supported

Include:

Connector name and instance ID (if available)
Organization / tenant URL
Approximate time of the issue
Error messages or screenshots from the connector UI
Steps already tried from this guide

Frequently asked questions

Q. Can I index multiple buckets?
Ans: Yes. List multiple bucket names in AWS Buckets, or use * to discover all buckets the IAM principal can list.

Q. How often does the connector sync?
Ans: Scheduling is configured in Enterprise Search. Because only full sync is supported today, daily sync is a common recommendation for active buckets.

Q. Is version history indexed?
Ans: No. One document per current object key; S3 versioning history is not enumerated.

Q. Does the connector index content from public or shared links?
Ans: It indexes objects your IAM principal can list and read. Public anonymous access does not add separate “guest” security in search—all indexed docs remain visible to all search users.

Q. What happens when a user loses access to AWS?
Ans: Enterprise Search does not re-check S3 IAM per user at query time. Restrict indexed content by bucket scope and IAM on the connector principal.

Q. Can I exclude paths or file types?
Ans: Not in the current product configuration. Choose buckets whose contents are safe to expose in search, and use IAM on the connector principal to restrict which objects can be listed and read.
