Confluence connector for Simpplr Enterprise Search

Updated 4 months ago

Introduction

The Confluence connector allows Simpplr Enterprise Search to index Documents and Spaces that you have created in Confluence, making them easily discoverable and searchable directly within Simpplr.

With this connector, you can:

Bring Confluence content into Simpplr Enterprise Search so users can find Confluence spaces and documents alongside intranet content in one place.
Respect Confluence permissions so users only see content they already have access to in Confluence.
Use advanced features like autocomplete, hybrid ranking, and Smart Answers on top of Confluence content.

Content indexed from Confluence is available in:

Main search listing
Smart answers

Capabilities at a glance

Content types	Pages.
Metadata	Name, Type, Created time and last modified time, Owner, Author, Ancestors, URL, Space Details, Status, Permissions (users and groups level access)
Permissions	Account and Group based permissions
Indexing	Initial full crawl when the connector is created, followed by a weekly full crawl. Incremental updates run every 4 hours, and ACL (permission) sync runs every hour.
Multiple instances support	Multiple Confluence accounts are currently not supported within a single connector. Users can create a separate connector for each Confluence account.
Search features	Pre-ingestion filters: space-based filters (admins can exclude content from indexing based on Space) and audience filters (admins can include or exclude documents from indexing based on audiences). Filtering follows an after-scan approach, meaning the entire dataset is scanned first and then filtered based on the specified fields. Search capabilities: keyword search, hybrid / semantic ranking, autocomplete suggestions, and use of Confluence page content in Smart Answers.

Objects and content supported

Objects - The following object types are indexed:

Space
Page

Metadata - For each indexed item, the connector captures the following, depending on the object type:

Name
Type
Created time and last modified time
Owner
Author
Ancestors
URL
Space Details
- Space Name
- Space URL
- Space ID
Status
Permissions (users and groups level access)

Permissions model - Permissions are read from Confluence and enforced in Simpplr Enterprise Search.

How user and group permissions are synchronized

Confluence user and group memberships are fetched and stored in the ACL index.
When a user is added to or removed from a Confluence group, the ACL index is updated the next time the ACL sync runs (by default, every hour).

How public or link-shared content is handled

Content that is only available via anonymous or public shared links is not indexed in the current version.

What happens when access is removed for a specific document

When a user loses access to a Space or Page in Confluence, the updated permissions are applied during the next incremental or ACL sync.
The item no longer appears in that user’s Simpplr search results after that sync completes.

Versions and editions supported

Supported Confluence account editions: All.
Not supported: N/A

Prerequisites

Before you begin, ensure the following:

Source system permissions

You need access to the Confluence Cloud Developer Console to create and configure your application.

Application / service account

The user must have Product Admin access (required to sync access control information).
The user must be able to generate an API token from the Security section of their profile.

Confluence documentation:

Authentication and security

Authentication mechanism
Simpplr Enterprise Search connects to Confluence as follows:

Auth type: API token
Scopes or permissions required:
- The user needs to have Product Admin access

Data security
1. Data storage and residency: Indexed content and ACLs from Confluence are stored within your Simpplr Enterprise Search environment, in the same region as your Simpplr tenant.
2. Encryption in transit: SSL (TLS 1.2 or higher), TLS encryption in Kafka. Auth: OAuth 2.0 Bearer tokens (client-credential).
3. Encryption at rest: Server-side encryption with Amazon S3 managed keys (SSE-S3).
4. Permission enforcement: Confluence access controls (users and groups) are stored in the ACL index and applied at query time. Search results are always filtered by the signed-in user’s identity and Confluence group memberships.
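
To illustrate query-time permission enforcement, here is a minimal sketch of filtering results against a user's identity and group memberships. The `IndexedPage` shape and the `filter_by_acl` function are hypothetical names used only for illustration; they do not describe Simpplr's internal implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexedPage:
    """Hypothetical shape of an indexed Confluence page with its ACL."""
    title: str
    allowed_users: frozenset = field(default_factory=frozenset)
    allowed_groups: frozenset = field(default_factory=frozenset)


def filter_by_acl(results, user_id, user_groups):
    """Keep only pages the user may see directly or through a group."""
    groups = set(user_groups)
    return [
        page for page in results
        if user_id in page.allowed_users or groups & page.allowed_groups
    ]


pages = [
    IndexedPage("Roadmap", allowed_groups=frozenset({"product"})),
    IndexedPage("Payroll", allowed_users=frozenset({"alice"})),
    IndexedPage("Handbook", allowed_groups=frozenset({"everyone"})),
]

# "bob" belongs to two groups but has no direct access to "Payroll".
visible = filter_by_acl(pages, user_id="bob", user_groups=["everyone", "product"])
print([p.title for p in visible])
```

Because the filter runs at query time, a group change only affects results once the ACL index has been refreshed by the next ACL sync.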

Setup and Configuration

There are two steps to set up this connector:

Set up credentials in the connector source (Confluence) - This step yields an API token that will be used in the next step.
Set up the connector in Simpplr Intranet

Step 1 - Prepare Confluence source

Step 1a - Setup Prerequisites

Log in to Confluence Cloud using the Admin account.
Make sure the user has Product Admin access (required to sync access control information).
If the user is new or does not yet have Product Admin access, grant it as follows:
- Go to Atlassian Administration by clicking the tiles at the top left of the screen.

Go to Directory and click the user.
Under Product Access, click Grant access and add Product Admin.

Step 1b - Configuring the API Token

Navigate to Security → API Tokens → Create and Manage API Tokens
There are two token options available. The Confluence connector supports both Scoped Tokens and General (Classic) Tokens. You may choose either type based on your access requirements.
1. Scoped Token:
  Scoped API tokens allow you to specify what actions a token can perform in Confluence Cloud. Unlike classic tokens, which grant all permissions available to the user, scoped tokens restrict access to only the selected scopes, reducing risk if a token is compromised.
  Scoped tokens use the api.atlassian.com domain instead of your site-specific URL.

2. General Token (Classic):
  Classic tokens grant all permissions available to the user who generated them.
  Because of this broad access, classic tokens pose a greater security risk if compromised: a leaked classic token could expose all the content and data that user can access.

You can set up the connector in two ways:

Scoped API Token: More complex, but gives you more control
Classic API Token: Quick and simple

Scoped API Token

Select Create API token with scopes.

Give your API token a name that describes its purpose.
Select an expiration date for the API token.
Token expiration must be between 1 and 365 days.

Select the Confluence app for the API token to access.

Select the scopes listed below.

Note:

A total of 12 permissions are required.

Filter the Scope Actions: Read, Read Only, Search

Scope Type: Granular Permissions (1):

read:content-details:confluence

Tip: Search by scope name read:content-details:confluence

Scope Type: Classic Permissions (11):

read:account
read:confluence-content.all
read:confluence-content.permission
read:confluence-content.summary
read:confluence-groups
read:confluence-props
read:confluence-space.summary
read:confluence-user
read:me
readonly:content.attachment:confluence
search:confluence

Tip: Filter Scope Type: Classic

Select Next, review the token, and then select Create.

Select Copy to clipboard, then paste the token into your script or save it somewhere safe.
You can't recover the API token after you’ve completed this step. We recommend saving it in a password manager.
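
The 12 scopes listed above can be kept as a checklist and compared against what was actually selected when creating the token. The sketch below is only an illustration; `missing_scopes` is a hypothetical helper, and the scope names are copied from the list in this section.

```python
# The 12 scopes listed in this article: 1 granular + 11 classic.
REQUIRED_SCOPES = frozenset({
    "read:content-details:confluence",  # granular
    "read:account",
    "read:confluence-content.all",
    "read:confluence-content.permission",
    "read:confluence-content.summary",
    "read:confluence-groups",
    "read:confluence-props",
    "read:confluence-space.summary",
    "read:confluence-user",
    "read:me",
    "readonly:content.attachment:confluence",
    "search:confluence",
})


def missing_scopes(selected):
    """Return the required scopes that are absent from `selected`, sorted."""
    return sorted(REQUIRED_SCOPES - set(selected))


# Example: a token created without the two group/user scopes.
selected = REQUIRED_SCOPES - {"read:confluence-groups", "read:confluence-user"}
print(missing_scopes(selected))
```

An empty list means every scope from the checklist was selected.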

Classic API Token

Generate and copy the API Token.

Step 1c - Extracting the Confluence Cloud ID

The Cloud ID can be accessed by navigating to this URL:

<your-tenant-url>/_edge/tenant_info

For example, if your Confluence application URL is https://simpplr.atlassian.net/wiki/home, your tenant URL is https://simpplr.atlassian.net

Finally, your cloud ID can be accessed using this URL:
https://simpplr.atlassian.net/_edge/tenant_info
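
If you prefer to script this lookup, the following sketch derives the tenant info URL from any Confluence page URL and reads the Cloud ID from the response. It assumes the endpoint returns a JSON object with a `cloudId` field; the function names are hypothetical and only for illustration.

```python
import json
from urllib.parse import urlsplit
from urllib.request import urlopen


def tenant_info_url(app_url):
    """Build <tenant-url>/_edge/tenant_info from any Confluence page URL."""
    parts = urlsplit(app_url)
    return f"{parts.scheme}://{parts.netloc}/_edge/tenant_info"


def parse_cloud_id(body):
    """Extract the Cloud ID, assuming a JSON body with a 'cloudId' field."""
    return json.loads(body)["cloudId"]


def fetch_cloud_id(app_url, timeout=10):
    """Fetch the tenant info over HTTPS and return the Cloud ID."""
    with urlopen(tenant_info_url(app_url), timeout=timeout) as resp:
        return parse_cloud_id(resp.read().decode("utf-8"))


print(tenant_info_url("https://simpplr.atlassian.net/wiki/home"))
```

Calling `fetch_cloud_id("https://simpplr.atlassian.net/wiki/home")` performs the actual network request; replace the example URL with your own site.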

Step 1 Output

API Token
Confluence Cloud ID

Step 2 - Create the connector in Enterprise Application

Step 2a - Setting up the configuration

In Simpplr, go to: Manage Features → Enterprise Search → Add Source.
Search and Select “Atlassian Confluence Cloud”.
Enter basic information:
- Connection Name: (Connector Name for this instance)
Provide authentication details:
- Confluence Cloud account email (email)
- Confluence Cloud API token (copied from the earlier step)
- Confluence URL: The host URL must be in the following format (use the Cloud ID from the earlier step):
  https://api.atlassian.com/ex/confluence/<Cloud ID>

Click “Save Configuration” and “confirm”.
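
The values entered in this step can be sanity-checked before saving. The sketch below builds the host URL from a Cloud ID and the HTTP Basic credentials that Atlassian API tokens use (account email plus token). It is an illustration only: the helper names are hypothetical, and how Simpplr sends these credentials internally is not documented here.

```python
import base64


def host_url(cloud_id):
    """Build the host URL in the format required by the connector."""
    return f"https://api.atlassian.com/ex/confluence/{cloud_id}"


def basic_auth_header(email, api_token):
    """Build an HTTP Basic Authorization header from email and API token."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder values for illustration; use your own Cloud ID and credentials.
print(host_url("<Cloud ID>"))
print(sorted(basic_auth_header("admin@example.com", "example-token")))
```

Keep the real token out of scripts and logs; the placeholder values above are not valid credentials.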

Step 2b - Setting up the Data & Audience Filters

Configure inclusion rules:
- Not configurable in the current version
Configure exclusion rules:
- Exclude Spaces

Configure Audience based filtering.
- Include audiences
- Exclude audiences
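
The after-scan filtering described for these rules can be pictured as follows: every page is scanned first, and the space and audience rules are applied afterwards. This is a minimal sketch under assumptions; the page dictionaries, the `after_scan_filter` function, and the exact include/exclude semantics are hypothetical.

```python
def after_scan_filter(pages, excluded_spaces=(), include_audiences=(),
                      exclude_audiences=()):
    """Apply space and audience rules to an already scanned list of pages."""
    excluded_spaces = set(excluded_spaces)
    include = set(include_audiences)
    exclude = set(exclude_audiences)
    kept = []
    for page in pages:  # the whole dataset has been scanned at this point
        audiences = set(page["audiences"])
        if page["space"] in excluded_spaces:
            continue  # space is on the exclusion list
        if audiences & exclude:
            continue  # page targets an excluded audience
        if include and not (audiences & include):
            continue  # include rule set, but no audience matches
        kept.append(page)
    return kept


scanned = [
    {"title": "Build guide", "space": "ENG", "audiences": ["staff"]},
    {"title": "Leave policy", "space": "HR", "audiences": ["staff"]},
    {"title": "Vendor notes", "space": "ENG", "audiences": ["contractors"]},
]

kept = after_scan_filter(scanned, excluded_spaces=["HR"],
                         exclude_audiences=["contractors"])
print([p["title"] for p in kept])
```

Because filtering happens after the scan, excluding a space reduces what is indexed but not what is read from Confluence.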

Step 2c - Save & Sync

Default schedule: Full crawl at first setup and once a week, incremental sync every 4 hours, ACL sync every hour
Configuration options:
- No option to configure the sync schedule, however sync can be paused and resumed manually

Step 2d - Monitor

Monitor the initial full sync status (starts automatically) in the connector dashboard.

Crawling and sync behavior

How the connector works over time:

Initial full crawl

All the content present in the Confluence account is indexed during the first run.
How long it may take: Depends on the size of the content.

Incremental updates

Mechanism: Based on Timestamp of previous sync.
What changes trigger reindexing:
- New docs created
- Existing doc updated
- Permissions changed
- Docs moved or renamed
- Docs deleted
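
A timestamp-based incremental pass can be sketched as selecting only the items modified after the previous sync. This is a simplified illustration, not the connector's actual code: the item shape and `changed_since` are hypothetical, and deletions would need separate handling because a deleted item no longer has a timestamp to compare.

```python
from datetime import datetime, timezone


def changed_since(items, previous_sync):
    """Return items whose last-modified time is later than the previous sync."""
    return [item for item in items if item["last_modified"] > previous_sync]


previous_sync = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)

items = [
    {"title": "Old page", "last_modified": datetime(2023, 12, 30, 9, 0, tzinfo=timezone.utc)},
    {"title": "Edited page", "last_modified": datetime(2024, 1, 1, 9, 15, tzinfo=timezone.utc)},
    {"title": "New page", "last_modified": datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)},
]

to_reindex = changed_since(items, previous_sync)
print([item["title"] for item in to_reindex])
```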

Deletion and permission changes

Deleted items are removed from index at next sync.
Permission changes are updated at the next sync cycle.

Expected latency

With the default schedule (incremental sync every 4 hours and ACL sync every hour), changes made to Confluence content are generally reflected in Simpplr search results within 4 hours of the update. The permission lag can also be up to 4 hours, because the incremental sync runs every 4 hours. In certain cases, a permission change can take up to 7 days to apply (when it is only picked up by the weekly full sync), subject to content volume and system load.
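
The latency figures above follow from simple schedule arithmetic. As a rough sketch, assuming each sync type repeats at a fixed interval counted from its first run and ignoring how long a run takes, the next run that can pick up a change is computed below. `next_run_after` and the example timestamps are hypothetical.

```python
import math
from datetime import datetime, timedelta

# Default intervals as stated in this article.
INTERVALS = {
    "incremental": timedelta(hours=4),
    "acl": timedelta(hours=1),
    "full": timedelta(days=7),
}


def next_run_after(change_time, first_run, interval):
    """Return the first scheduled run at or after change_time.

    Assumes runs repeat at a fixed interval starting from first_run;
    the duration of the run itself is ignored.
    """
    cycles = max(0, math.ceil((change_time - first_run) / interval))
    return first_run + cycles * interval


first_run = datetime(2024, 1, 1, 0, 0)
change = datetime(2024, 1, 1, 9, 30)

for sync_type, interval in INTERVALS.items():
    run = next_run_after(change, first_run, interval)
    print(f"{sync_type}: picked up at {run}, lag {run - change}")
```

The worst case for each sync type is one full interval, which is where the 4-hour and 7-day figures come from.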

Field mapping and search experience

Search experience - How content from Confluence connector appears in search:

Result layout: (Icon, Connector Type (Confluence), Space, Title (name) as link, Body(excerpt), owner, Updated Date, Content Type, Access Detail)
Available filters and facets:
- Sources = Confluence Cloud
- File type
- Owner
- Created Date

Participation in advanced features:
- Smart Answers / Q&A: Yes
- Autocomplete: Yes
- Recommendations / “Suggested for you”: N/A
- Trending / popular results: N/A
- Semantic / hybrid ranking: Yes

Limits and known limitations

Maximum file size indexed	Files bigger than 10 MB won’t be extracted.
Unsupported file types	Compressed files are not supported, e.g., an archive file containing a set of PDFs (The file content is not searchable, however, the users can still search via file title.)
Rate limits	N/A
Preview limitations	No preview is available for Excel or media files.
Permission edge case	Permission changes are not synced unless ACL sync is run.
Other known limitations	Text is extracted from PDFs. However, text that appears inside an image is not extracted.

Monitoring and troubleshooting

Connector health and monitoring - Admins can see status information at:

Enterprise Search -> Connector name
Available metrics:
- Last sync status (Success / Warning / Failed)
- Last sync time
- Next scheduled sync
- Sync Type
- Total items indexed count

Common issues and resolutions:

Issue: Authentication failed. Failed to authorize with the given API token (invalid token).
Possible causes:
- Incorrect token
- User does not have Product Admin access
Resolution:
- Verify and re-enter the credentials
- Confirm that the user has Product Admin access

Issue: 404 error, URL not found.
Possible causes:
- Incorrect Host URL
- Incorrect Cloud ID
Resolution:
- Verify the Cloud ID and the Host URL format.
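
Before re-entering the Host URL, its format can be checked with a short script. The sketch below assumes the Cloud ID is a 36-character UUID-style value, which is an assumption and not something this article states; `check_host_url` is a hypothetical helper.

```python
import re

EXPECTED_PREFIX = "https://api.atlassian.com/ex/confluence/"

# Assumption: the Cloud ID is a 36-character UUID-style string.
HOST_URL_RE = re.compile(
    r"^https://api\.atlassian\.com/ex/confluence/[0-9a-fA-F-]{36}$"
)


def check_host_url(url):
    """Return a list of likely problems with the Host URL (empty if none)."""
    problems = []
    cleaned = url.strip().rstrip("/")
    if not cleaned.startswith(EXPECTED_PREFIX):
        problems.append("Host URL does not start with " + EXPECTED_PREFIX)
    elif not HOST_URL_RE.match(cleaned):
        problems.append("Cloud ID part does not look like a UUID")
    return problems


# A site-specific URL is a common mistake; the connector expects api.atlassian.com.
print(check_host_url("https://simpplr.atlassian.net/wiki"))
```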

When to contact Support:

Authentication error persists even after trying the resolutions above
Sync is stuck in the Pending state
Sync is in progress but no documents are being ingested
Sync fails with a cancelled error (when not cancelled manually)
Sync is incomplete or partial

When contacting Support, include:

Connector name and instance ID (if available)
Organization URL
Approximate time and date of the issue
Error messages or screenshots
Steps you already tried

Frequently asked questions (FAQ)

Q1. Can I connect multiple Confluence tenants or domains?
A. Functionality to connect multiple domains into the single connector has not been implemented yet.

Q2. How often does Confluence sync data?
A. The connector runs a full crawl on first setup and then once per week. By default, incremental sync runs every 4 hours and ACL (permission) sync runs every hour.

Q3. Are comments, revisions, or version history indexed?
A. Comments and individual versions are not indexed as separate items. The connector indexes the latest file metadata, including the last updated time and updated-by user.

Q4. Does the connector index content from external guests or shared links?
A. No. The connector indexes Confluence documents and spaces only from the configured tenant.

Q5. What happens when a user loses access to an item in Confluence?
A. The updated access permissions will be indexed during the next sync.

Note:

The permission lag can be up to 4 hours, because the incremental sync runs every 4 hours. In certain cases, the permission sync can take up to 7 days (when the change is only picked up by the weekly full sync).

Q6. Can I exclude certain sites/teams/folders from being indexed?
A. Documents can be excluded based on the Confluence spaces. Documents can also be excluded based on file extension, size, and age. Additionally, documents can be included or excluded based on audiences.

Q7. How are deletions handled?
A. Objects deleted from the source are permanently deleted from the index.
