The Amazon S3 connector allows Simpplr Enterprise Search to index objects from your AWS S3 buckets, making files and metadata discoverable in enterprise search alongside other connected systems.
With this connector, you can:
Unify S3 content with other enterprise sources in one search experience.
Power Smart Answers with S3 documents.
Scope indexing to specific buckets configured in the connector.
Area | Support |
|---|---|
Content types | S3 objects (files) |
Metadata | Object key, bucket, size, owner display name, storage class, last modified time, URL |
Permissions | No per-document access control from S3 IAM; all indexed objects are searchable by all enterprise search users (see Authentication and security) |
Indexing | Full sync (list and enumerate objects); incremental sync is not available for this connector |
Scope | One or more bucket names in connector configuration, or * for all buckets the IAM principal can list |
Search features | Keyword and semantic search depend on your index, ingest pipeline, and ML configuration |
Type | Description |
|---|---|
S3 object | One search document per object key within the configured scope |
For each indexed object, the connector records:
Field | Description |
|---|---|
filename | Full object key in the bucket |
bucket | Bucket name |
size_in_bytes | Object size |
owner | S3 owner display name when returned by the API (otherwise empty) |
storage_class | e.g. STANDARD |
_timestamp | Object last modified time (ISO-8601 UTC) |
url | Link to open the object (see Search experience) |
When the file type and size allow extraction, the ingest pipeline may also populate body (searchable text), file_extension, mime_type, and related fields.
Bucket-level configuration only (policies, lifecycle rules) — not separate documents
Empty “folder” prefixes with no object at that key
Object versions other than the current version (versioning is not enumerated separately)
Content blocked by connector file-size or file-type rules (metadata may still be indexed)
S3 bucket and object permissions in AWS are not mirrored into per-user document security in Enterprise Search. Every object that is successfully indexed is treated as visible to all users who can use enterprise search.
Implication: Only index buckets whose contents are appropriate for organization-wide discovery. Use bucket IAM policies and the AWS Buckets setting to limit what enters the index.
Category | Details |
|---|---|
Supported | Amazon S3 in AWS commercial regions; access via IAM user access key and secret key |
Not supported | Per-object ACL sync into search; incremental change feeds (see Crawling and sync behavior); on-premises S3-compatible endpoints unless explicitly supported by your deployment |
Before you begin, ensure the following:
An IAM user or role that the connector will use, with permission to list and read objects in the target buckets.
Ability to create and manage IAM credentials (access key ID and secret access key) for that principal.
For direct HTTPS URLs on public buckets: additional read permissions on bucket configuration APIs (public access block, bucket policy status, bucket ACL). See Known limitations.
Typical IAM actions (adjust to your least-privilege policy):
s3:ListAllMyBuckets — when using ["*"] for buckets or for connection validation
s3:GetBucketLocation
s3:ListBucket — on configured buckets (and prefixes if restricted in IAM)
s3:GetObject — on objects to be indexed and downloaded for text extraction
s3:GetBucketPublicAccessBlock, s3:GetBucketPolicyStatus, s3:GetBucketAcl — optional, for public-bucket URL detection
AWS Access Key ID and AWS Secret Access Key for the connector IAM principal.
List of bucket names to index, or * to discover all buckets in the account (subject to IAM).
Outbound HTTPS from the connector runtime to AWS S3 endpoints in the regions of your buckets.
No inbound connectivity from AWS to your network is required.
The connector connects to AWS using the AWS SDK with credentials supplied in the connector configuration:
Field | Required | Description |
|---|---|---|
AWS Access Key Id | Yes | Access key for the IAM principal |
AWS Secret Key | Yes | Secret key for that principal (stored as a sensitive field) |
AWS Buckets | Yes | Bucket names to crawl, or * to list all buckets the principal can access |
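To illustrate how an AWS Buckets value might be interpreted, the following minimal Python sketch normalizes a comma-separated entry into a list of bucket names or the wildcard. The function name `parse_bucket_scope` and its handling of whitespace, duplicates, and a list that mixes names with * are assumptions made for illustration; they are not the connector's actual parsing logic.

```python
def parse_bucket_scope(raw: str) -> list[str]:
    """Turn an AWS Buckets value into bucket names, or ["*"] for all buckets."""
    names = [part.strip() for part in raw.split(",") if part.strip()]
    if not names:
        raise ValueError("AWS Buckets must name at least one bucket or be *")
    if "*" in names:
        # Assumption: a wildcard anywhere in the list means
        # "every bucket the principal can list".
        return ["*"]
    # Drop duplicates while keeping the order in which names were entered.
    return list(dict.fromkeys(names))
```

For example, `parse_bucket_scope("finance-docs, hr-docs")` returns the two names as a list, while `parse_bucket_scope("*")` returns `["*"]`.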
Sign in to the AWS Management Console (or use your IAM tooling) and go to IAM → Users.
Create or select an IAM user for the connector.
Attach a policy. Either attach the AWS managed policy AmazonS3ReadOnlyAccess, which grants read-only access to all S3 buckets in the account and covers the required S3 actions,
or create a custom policy limited to the buckets you plan to index:
In the IAM console sidebar, click Policies, then click Create policy.
Click on the JSON tab/editor view.
Clear any default code and paste the following JSON block:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "S3ConnectorListBucketsWhenUsingWildcard",
"Effect": "Allow",
"Action": "s3:ListAllMyBuckets",
"Resource": "*"
},
{
"Sid": "S3ConnectorReadConfiguredBuckets",
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket",
"s3:GetObject",
"s3:GetBucketPublicAccessBlock",
"s3:GetBucketPolicyStatus",
"s3:GetBucketAcl"
],
"Resource": [
"arn:aws:s3:::YOUR-BUCKET-1",
"arn:aws:s3:::YOUR-BUCKET-1/*",
"arn:aws:s3:::YOUR-BUCKET-2",
"arn:aws:s3:::YOUR-BUCKET-2/*"
]
}
]
}
Note: Replace YOUR-BUCKET-1 and YOUR-BUCKET-2 with your actual S3 bucket names. To cover every bucket in your AWS account instead of listing them individually, change the resource entry of the second statement to "Resource": "*".
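If you maintain the policy for many buckets, it can be generated rather than edited by hand. The sketch below is one possible way to do that in Python, using the action names from the list of typical IAM actions earlier in this guide. The helper name `build_connector_policy` is hypothetical, and the output should still be reviewed against your own least-privilege requirements.

```python
import json


def build_connector_policy(bucket_names: list[str]) -> dict:
    """Build a read-only IAM policy document for the given bucket names."""
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # bucket-level actions
        resources.append(f"arn:aws:s3:::{name}/*")  # object-level actions
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3ConnectorListBucketsWhenUsingWildcard",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
            {
                "Sid": "S3ConnectorReadConfiguredBuckets",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:GetBucketPublicAccessBlock",
                    "s3:GetBucketPolicyStatus",
                    "s3:GetBucketAcl",
                ],
                "Resource": resources,
            },
        ],
    }


if __name__ == "__main__":
    policy = build_connector_policy(["YOUR-BUCKET-1", "YOUR-BUCKET-2"])
    print(json.dumps(policy, indent=2))
```

Running the script prints a JSON document that can be pasted into the IAM policy editor.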
Create an access key for that principal and save the access key ID and secret access key securely.
Confirm the principal can list and get objects in each target bucket.
Go to Service Control Policies.
Verify that no Service Control Policy denies GetBucketPublicAccessBlock.
Note: For example, in the attached policy, the GetBucketPublicAccessBlock action is not marked as denied. Ensure the same is true for all other attached policies.
In Simpplr, go to Enterprise Search → Connectors → Add connector.
Select Amazon S3.
Enter basic information:
Name: e.g. Corporate S3 documents
Description: Optional
Provide authentication and scope:
AWS Access Key Id and AWS Secret Key
AWS Buckets: comma-separated list of bucket names, or * for all buckets the principal can list
Configure audience-based filtering:
Include audiences
Exclude audiences
Save the configuration and run Test connection (validates credentials via ListBuckets).
Start a full sync and monitor progress on the connector dashboard.
Open the connector in Enterprise Search and review Health / sync status.
Confirm that the last sync completed successfully and that the Documents indexed count increases as expected.
Run a test search for a known object key or extracted text.
On each sync, the connector:
Resolves target buckets from the configured AWS Buckets list or *.
Determines each bucket’s AWS region and uses a regional S3 client.
Lists objects in pages (default 100 per page) without loading entire buckets into memory.
For every object, indexes metadata (key, bucket, size, timestamps, URL, etc.).
For supported file types and sizes, downloads the object and runs text extraction so body is searchable.
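The listing part of these steps can be sketched in Python as follows. This is a simplified illustration, not the connector's source: it assumes a boto3-style S3 client (the method names `list_buckets`, `get_bucket_location`, and the `list_objects_v2` paginator are boto3 calls), and `client_for_region` is a hypothetical factory supplied by the caller. Text extraction is left out.

```python
from typing import Any, Callable, Iterator

PAGE_SIZE = 100  # default page size described in the steps above


def resolve_buckets(client: Any, configured: list[str]) -> list[str]:
    """Return the configured names, or every listable bucket for ["*"]."""
    if configured == ["*"]:
        return [bucket["Name"] for bucket in client.list_buckets()["Buckets"]]
    return configured


def bucket_region(client: Any, bucket: str) -> str:
    """Look up the bucket's region so a regional client can be used."""
    location = client.get_bucket_location(Bucket=bucket).get("LocationConstraint")
    if location == "EU":  # legacy value that S3 may return for eu-west-1
        return "eu-west-1"
    return location or "us-east-1"  # S3 reports no constraint for us-east-1


def iter_objects(regional_client: Any, bucket: str) -> Iterator[dict]:
    """Yield object entries page by page instead of loading the whole bucket."""
    paginator = regional_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket, PaginationConfig={"PageSize": PAGE_SIZE}
    )
    for page in pages:
        for entry in page.get("Contents", []):
            yield entry


def crawl(
    client: Any,
    configured: list[str],
    client_for_region: Callable[[str], Any],
) -> Iterator[tuple[str, str, dict]]:
    """Yield (bucket, region, object entry) for every object in scope."""
    for bucket in resolve_buckets(client, configured):
        region = bucket_region(client, bucket)
        regional_client = client_for_region(region)
        for entry in iter_objects(regional_client, bucket):
            yield bucket, region, entry
```

With boto3 installed, `client_for_region` could be `lambda region: boto3.client("s3", region_name=region)`. Because `crawl` is a generator, only one page of keys is held in memory at a time.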
Item | Current behavior |
|---|---|
Incremental sync | Not supported. The connector does not implement cursor- or event-based incremental listing. |
Why | S3 list APIs do not support efficient “changed since” queries across a whole bucket without scanning keys. |
Time from an upload to appearance in search depends on sync schedule, bucket size, and extraction load. Under normal conditions, expect updates after the next successful full sync, not in real time.
Source (S3 / connector) | Index / search concept |
|---|---|
Hash of bucket + key | Document id |
Object key | filename (full path in bucket) |
Bucket name | bucket |
LastModified | _timestamp |
Size | size_in_bytes |
Owner display name | owner |
Storage class | storage_class |
Computed link | url |
Extracted text | body |
— | connector_type = s3 |
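The mapping in this table can be expressed as a small Python function. This is a sketch under stated assumptions: the guide only says the document id is a hash of bucket and key, so the use of SHA-256 over `bucket/key` is an illustrative choice, as are the function names and the `_id` field name. The `entry` argument is assumed to have the shape of one item returned by an S3 object listing (`Key`, `LastModified`, `Size`, and optionally `Owner` and `StorageClass`).

```python
import hashlib
from datetime import datetime, timezone


def document_id(bucket: str, key: str) -> str:
    # Assumption: SHA-256 over "bucket/key"; the actual hash is not documented.
    return hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).hexdigest()


def to_search_document(bucket: str, entry: dict, url: str) -> dict:
    """Map one S3 listing entry to the search fields listed in the table."""
    modified: datetime = entry["LastModified"]
    return {
        "_id": document_id(bucket, entry["Key"]),  # hypothetical field name
        "filename": entry["Key"],
        "bucket": bucket,
        "size_in_bytes": entry["Size"],
        # Owner is only present when the listing API returns it.
        "owner": entry.get("Owner", {}).get("DisplayName", ""),
        "storage_class": entry.get("StorageClass", ""),
        "_timestamp": modified.astimezone(timezone.utc).isoformat(),
        "url": url,
        "connector_type": "s3",
    }
```

Deriving the id from bucket and key means that re-indexing the same object on a later full sync updates the existing document instead of creating a duplicate.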
Result layout: Typically shows title/path from filename, bucket, modified time, size, and an Open action using url.
Useful filters: Source = Amazon S3, file type (from extension/mime), bucket, owner (when present), last modified.
Autocomplete: Often wired to filename and bucket in search configuration.
Smart Answers / semantic search: Supported when your deployment indexes body and enables those features on this source.
Every document includes a url. The connector chooses the format based on whether the bucket is treated as public:
Bucket treatment | URL type | User experience |
|---|---|---|
Private (default) | AWS Console object link | Opens the object in the AWS Console; user must sign in to AWS with access to the bucket |
Public | Direct HTTPS S3 object URL | Opens or downloads the file in the browser if the object is anonymously readable |
Public detection (once per bucket per sync) checks, in order: Block Public Access settings, bucket policy public status, then bucket ACL for “Everyone” read. If checks fail (e.g. AccessDenied), the connector uses a Console link.
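One way to picture the detection order and the fallback is the Python sketch below. It assumes a boto3-style S3 client (`get_public_access_block`, `get_bucket_policy_status`, and `get_bucket_acl` are boto3 calls). The exact decision rules, the function names, and both URL formats are assumptions made for illustration; the connector's real logic and link formats may differ.

```python
from typing import Any
from urllib.parse import quote

ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"
BLOCK_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def bucket_is_public(client: Any, bucket: str) -> bool:
    """Check Block Public Access, then policy status, then the bucket ACL."""
    try:
        block = client.get_public_access_block(Bucket=bucket)
        config = block["PublicAccessBlockConfiguration"]
        # Assumption: any enabled Block Public Access flag means "private".
        if any(config.get(flag) for flag in BLOCK_FLAGS):
            return False
        status = client.get_bucket_policy_status(Bucket=bucket)
        if status["PolicyStatus"]["IsPublic"]:
            return True
        grants = client.get_bucket_acl(Bucket=bucket)["Grants"]
        return any(
            grant.get("Grantee", {}).get("URI") == ALL_USERS_URI
            and grant.get("Permission") in ("READ", "FULL_CONTROL")
            for grant in grants
        )
    except Exception:
        # Any failed check (for example AccessDenied) falls back to private.
        return False


def object_url(bucket: str, key: str, region: str, is_public: bool) -> str:
    """Build a direct HTTPS link for public buckets, else a Console link."""
    encoded_key = quote(key, safe="/")
    if is_public:
        return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"
    return (
        "https://s3.console.aws.amazon.com/s3/object/"
        f"{bucket}?region={region}&prefix={encoded_key}"
    )
```

Since detection runs once per bucket per sync, the result of `bucket_is_public` would be computed once and reused for every object in that bucket.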
Limitation | Details |
|---|---|
No document-level security | All indexed objects are visible to all enterprise search users; S3 IAM is not enforced at query time |
No incremental sync | Each sync re-lists objects in scope; large buckets take longer and consume more API calls |
File size and type | Very large or unsupported types may be metadata-only (no body) per platform extraction rules |
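Static IAM keys | The connector authenticates with long-lived IAM access keys; store them securely and update the connector configuration whenever a key is rotated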
Public URL detection | Requires extra bucket configuration APIs; missing permissions force Console URLs even for public buckets |
Versioning | Current object version only |
In Enterprise Search, open the connector and review:
Metric | Description |
|---|---|
Last sync status | Success / Warning / Failed |
Last sync time | When the last job finished |
Next scheduled sync | If scheduling is configured |
Documents indexed | Approximate count in the index |
Errors / warnings | From sync logs |
Issue: Authentication failed
Symptoms: Test connection or sync fails immediately; invalid credentials errors in logs.
Possible causes:
Incorrect access key ID or secret key
IAM principal disabled or key rotated
Missing s3:ListAllMyBuckets (when using *) or bucket-specific permissions
Resolution:
Verify credentials in the connector configuration.
Confirm the IAM policy allows required actions on target buckets.
Re-test connection, then re-run sync.
Issue: Sync completes but no documents (or partial bucket)
Possible causes:
Wrong bucket names in configuration
IAM s3:ListBucket or s3:GetObject denied on configured buckets
Errors logged per bucket during list (connector logs a warning and skips further objects in that bucket)
Resolution:
Validate bucket names in the connector configuration against the AWS console.
Test list/get with the same IAM principal (AWS CLI or console).
Review connector logs for bucket-specific warnings.
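To test list and get access with the connector's credentials outside of Enterprise Search, a short script along these lines may help. It is a sketch that assumes a boto3-style S3 client created with the same access key and secret key; `check_bucket_access` is a hypothetical helper, not part of any Simpplr or AWS tooling.

```python
from typing import Any


def check_bucket_access(client: Any, bucket: str) -> dict:
    """Report whether the principal can list the bucket and read one object."""
    result = {"bucket": bucket, "can_list": False, "can_get": None, "error": None}
    try:
        listing = client.list_objects_v2(Bucket=bucket, MaxKeys=1)
        result["can_list"] = True
        contents = listing.get("Contents", [])
        if contents:
            # HeadObject is authorized by s3:GetObject and avoids a download.
            client.head_object(Bucket=bucket, Key=contents[0]["Key"])
            result["can_get"] = True
    except Exception as exc:
        result["error"] = f"{type(exc).__name__}: {exc}"
        if result["can_list"]:
            result["can_get"] = False
    return result
```

A result with `can_list` set to `False` points to a wrong bucket name or a missing s3:ListBucket permission, while `can_get` set to `False` points to s3:GetObject. A `can_get` value of `None` means the bucket was empty, so read access could not be tested.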
Issue: Rate limit or throttling
Possible causes:
Very large buckets or high sync frequency
AWS S3 request rate limits for the bucket/prefix
Resolution:
Index fewer buckets (remove buckets from configuration or avoid *).
Increase sync interval.
Review AWS S3 metrics and request rate guidance for hot prefixes.
Issue: Open link goes to Console but bucket is public
Possible causes:
Connector IAM user lacks GetBucketPublicAccessBlock, GetBucketPolicyStatus, or GetBucketAcl
Organization SCP denies those APIs
Bucket is not actually public-readable at the object level
Resolution:
Grant read-only bucket configuration permissions to the connector principal.
Verify SCPs do not deny public access block / policy / ACL reads.
Confirm anonymous GetObject works for a test key.
Contact Simpplr Support if:
Authentication errors persist after verifying IAM and credentials
Sync is stuck or failed for several hours with no progress
Indexed counts are far below expected for a stable bucket scope
Search results lack Open links or extracted text for files that should be supported
Include:
Connector name and instance ID (if available)
Organization / tenant URL
Approximate time of the issue
Error messages or screenshots from the connector UI
Steps already tried from this guide
Q. Can I index multiple buckets?
Ans: Yes. List multiple bucket names in AWS Buckets, or use * to discover all buckets the IAM principal can list.
Q. How often does the connector sync?
Ans: Scheduling is configured in Enterprise Search. Because only full sync is supported today, daily sync is a common recommendation for active buckets.
Q. Is version history indexed?
Ans: No. One document per current object key; S3 versioning history is not enumerated.
Q. Does the connector index content from public or shared links?
Ans: It indexes objects your IAM principal can list and read. Public anonymous access does not add separate “guest” security in search—all indexed docs remain visible to all search users.
Q. What happens when a user loses access to AWS?
Ans: Enterprise Search does not re-check S3 IAM per user at query time. Restrict indexed content by bucket scope and IAM on the connector principal.
Q. Can I exclude paths or file types?
Ans: Not in the current product configuration. Choose buckets whose contents are safe to expose in search, and use IAM on the connector principal to restrict which objects can be listed and read.