Box connector for Simpplr Enterprise Search

Table of Contents

Introduction

Capabilities at a glance

Objects and content supported

How public or link-shared content is handled

What happens when access is removed for a specific document?

Versions and editions supported

Prerequisites

Authentication and security

Setup and configuration

Crawling and sync behavior

Field mapping and search experience

Known limitations

Monitoring and troubleshooting

Frequently asked questions (FAQ)

 

Who is this for? Search and IT administrators

Introduction

The Box connector allows Simpplr Enterprise Search to index Box Storage content, making it easily discoverable and searchable directly within Simpplr.

With this connector, you can:

  • Bring Box content into Simpplr Enterprise Search so users can find files alongside intranet content in one place.
  • Respect Box permissions so users only see files they already have access to in Box.
  • Use advanced features like autocomplete, hybrid ranking, and Smart Answers on top of Box content.

Indexed content from Box is available in:

  • Main search listing
  • Smart Answers

Capabilities at a glance

Content types: Folders and files (PDF, PPT, DOC/Word, Excel, CSV, text, etc.)
Metadata: Title, URL, Owner, Created time and last modified time, Parent URL, File type, extension and size, MIME type, Permissions (user- and group-level access)
Permissions: User- and group-based permissions
Indexing: Initial full crawl when the connector is created, followed by a weekly full crawl. Incremental updates run every 4 hours, and ACL (permission) sync runs every hour.
Multiple instances support: Multiple Box connections can be configured in the Simpplr environment.
Search features:
  • Pre-ingestion filters - Admins can include/exclude files from indexing based on:
    • File extension (e.g., .zip, .exe)
    • File size
    • Document age
  • Audience filters - Admins can include/exclude documents from indexing based on audiences.

Filtering follows an after-scan approach, meaning the entire dataset is scanned first and then filtered based on the specified fields.

  • Keyword search
  • Hybrid / semantic ranking
  • Autocomplete suggestions
  • Filters for source, file type, owner, created date
  • Box content can be used in Smart Answers


 

Objects and content supported

  1. Objects - The following object types are indexed:
    • Files
    • Folders
  2. Metadata - For each indexed item, the connector captures:
    • Title
    • URL / link
    • Owner
    • Created time and last modified time
    • Parent URL
    • File type, extension and size
    • MIME type
    • Permissions (user- and group-level access)
  3. Permissions model - Permissions are read from Box and enforced in Simpplr Enterprise Search. User and group permissions are synchronized as follows:
    • Box user and group memberships are fetched and stored in the ACL index.
    • When a user is added to or removed from a Box group, the ACL index is updated the next time the ACL sync runs (by default, every hour).

How public or link-shared content is handled

Content that is only available via anonymous or public shared links is not indexed in the current version of this connector.

What happens when access is removed for a specific document?

  • When a user loses access to a file or folder in Box, the updated permissions are applied during the next ACL sync.
  • The file will no longer appear in that user’s Simpplr search results after the ACL sync completes.

Versions and editions supported

  • Supported Box account editions: Enterprise account.
  • Not supported: Free account

Prerequisites

Before you begin, ensure the following:

  1. Source system permissions
    • You need access to the Box Developer Console to create and configure your application.
  2. Application / service account
    • Ability to register a new Custom App that uses Server Authentication (Client Credentials Grant) in the Box Developer Console.
    • Ability to authorize the application from the Box Admin Console.

Box documentation:

Authentication and security

How does Simpplr Enterprise Search connect to Box?

  • Auth type: Server Authentication (Client credentials)
  • Scopes or permissions required (examples):
    • "Write all files and folders stored in Box" in Application Scopes
    • "Make API calls using the as-user header" in Advanced Features
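
For reference, the token exchange behind Server Authentication (Client Credentials Grant) can be sketched as follows. This is a minimal illustration based on Box's public OAuth 2.0 token endpoint, not Simpplr's actual implementation. The helper name `build_token_request` and the placeholder credential values are hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Box's public OAuth 2.0 token endpoint.
BOX_TOKEN_URL = "https://api.box.com/oauth2/token"


def build_token_request(client_id, client_secret, enterprise_id):
    """Build (but do not send) a Client Credentials Grant token request.

    Hypothetical helper for illustration only.
    """
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Authenticate as the enterprise's service account.
        "box_subject_type": "enterprise",
        "box_subject_id": enterprise_id,
    }).encode("ascii")
    return Request(
        BOX_TOKEN_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values; replace with the Client ID, Client Secret, and
# Enterprise ID copied from the Box Developer Console.
request = build_token_request("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "YOUR_ENTERPRISE_ID")
```

A successful response to such a request contains a bearer access token. If this exchange fails, the usual suspects are an incorrect Client ID or Client Secret, or an app that has not yet been authorized by the Box admin.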

Data security

  1. Data storage and residency: Indexed content and ACLs from Box are stored within your Simpplr Enterprise Search environment, in the same region as your Simpplr tenant.
  2. Encryption in transit: SSL (TLS 1.2 or higher) and TLS encryption in Kafka. API calls are authenticated with OAuth 2.0 Bearer tokens (client credentials).
  3. Encryption at rest: Server-side encryption with Amazon S3 managed keys (SSE-S3).
  4. Permission enforcement: Box access controls (users and groups) are stored in the ACL index and applied at query time. Search results are always filtered by the signed-in user’s identity and Box group memberships.
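
Query-time permission enforcement can be pictured with the short sketch below. It is an illustration only: the `allowed` field, the `user:` / `group:` prefixes, and the function name are hypothetical and do not describe the actual ACL index schema.

```python
def visible_results(results, user_id, user_groups):
    """Keep only results whose ACL names the user or one of the user's groups."""
    principals = {user_id} | set(user_groups)
    return [r for r in results if principals & set(r["allowed"])]


# Hypothetical search hits, each carrying the principals allowed to see it.
results = [
    {"title": "Q3 plan.pdf", "allowed": ["group:finance"]},
    {"title": "Handbook.docx", "allowed": ["user:ana@example.com", "group:hr"]},
]

titles = [
    r["title"]
    for r in visible_results(results, "user:ana@example.com", ["group:hr"])
]
```

In this example the user belongs only to the HR group, so `titles` contains "Handbook.docx" and the finance-only file is filtered out.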

Setup and configuration

Step 1 - Prepare your Box source

  1. Go to your Box dev console, click Create Platform App, and then select Custom App.
  2. Fill in the app details:
    • App name: Simpplr Box Enterprise Search (or any name you prefer)
    • Purpose: “Integration”
    • Categories: “Productivity” (or any categories that apply)
    • Which external system are you integrating with? “Simpplr”
  3. Click Next.
  4. Select Server Authentication (Client Credentials Grant), then click Create App.
  5. On the “Configuration” screen, scroll down to “OAuth 2.0 Credentials” and copy the Client ID. Save it; you will enter it in the Simpplr configuration later.
  6. Click Fetch Client Secret and copy the Client Secret. Save it; you will enter it in the Simpplr configuration later.
  7. Note the Enterprise ID shown under the “General Settings” tab. You will enter it in the Simpplr configuration later.
  8. Check the following permissions:
    • "Write all files and folders stored in Box" in Application Scopes
    • "Make API calls using the as-user header" in Advanced Features
  9. Select App + Enterprise Access in App Access Level.
  10. Click Save Changes.
  11. Authorize your application from the admin console. Go to the "Authorization" tab and click Review and Submit. If you don't have permission, you may need to submit the application for authorization.
  12. Enter an App Description, then click Submit.
  13. Your Box admin should receive an email about the request. The admin opens the approval page from the email and clicks Authorize to authorize the app.
  14. Once the app is authorized, the status in the Authorization tab changes to “Authorized”.

Step 2 - Create the connector in Simpplr

  1. From your Simpplr home dashboard, go to: Manage features > Enterprise Search > Add source.


  2. Search for and select Box.

  3. Enter basic information:
    • Name: (Connector Name for this instance)
  4. Provide authentication details (copied from the Box app created in Step 1):
    • Client ID
    • Client Secret
    • Enterprise ID


  5. Click Save, then Confirm.

Step 3 - Define sync scope

  • Configure inclusion rules:
    • Not configurable in the current version
  • Configure exclusion rules:
    • File extension (e.g., .zip, .exe)
    • File size above a specified threshold
    • Document age (e.g., older than X days)


  • Configure Audience based filtering.
    • Include audiences
    • Exclude audiences
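
The exclusion rules above can be sketched as a simple predicate applied after items are scanned. This is an illustration under stated assumptions, not the connector's code: the function name, the item fields (`name`, `size`, `modified_at`), the threshold values, and the choice to measure document age from the last modified time are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_excluded(item, excluded_extensions, max_size_bytes, max_age_days, now):
    """Return True if the item matches any exclusion rule."""
    name = item["name"].lower()
    if any(name.endswith(ext) for ext in excluded_extensions):
        return True  # excluded by file extension
    if item["size"] > max_size_bytes:
        return True  # excluded by file size threshold
    # Assumption: document age is measured from the last modified time.
    return now - item["modified_at"] > timedelta(days=max_age_days)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
items = [
    {"name": "report.pdf", "size": 2_000_000, "modified_at": datetime(2024, 12, 1, tzinfo=timezone.utc)},
    {"name": "backup.zip", "size": 1_000, "modified_at": datetime(2024, 12, 1, tzinfo=timezone.utc)},
    {"name": "old.docx", "size": 1_000, "modified_at": datetime(2022, 1, 1, tzinfo=timezone.utc)},
    {"name": "huge.pptx", "size": 50_000_000, "modified_at": datetime(2024, 12, 1, tzinfo=timezone.utc)},
]

# After-scan filtering: every item is scanned first, then the rules are applied.
kept = [
    item["name"]
    for item in items
    if not is_excluded(item, [".zip", ".exe"], 20_000_000, 365, now)
]
```

With these example thresholds, only "report.pdf" is kept; the other three items are excluded by extension, age, and size respectively.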


Step 4 - Configure sync schedule

Default schedule: Full crawl at first setup and then once a week, incremental sync every 4 hours, and ACL sync every hour.

Configuration options:

  • The sync schedule is not configurable; however, a sync can be paused and resumed manually.
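
To make the default cadence concrete, the sketch below computes when each sync type would next run from a common starting point. The interval values come from the default schedule; everything else (the names, and the assumption that each run is scheduled relative to the previous run's start) is hypothetical and may not match how the scheduler actually behaves.

```python
from datetime import datetime, timedelta, timezone

# Default intervals as documented for this connector.
DEFAULT_INTERVALS = {
    "full": timedelta(weeks=1),
    "incremental": timedelta(hours=4),
    "acl": timedelta(hours=1),
}


def next_runs(last_runs):
    """Return the next expected run time for each sync type."""
    return {kind: last_runs[kind] + interval for kind, interval in DEFAULT_INTERVALS.items()}


# Assume all three sync types last started at connector creation.
start = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)
upcoming = next_runs({"full": start, "incremental": start, "acl": start})
```

Here `upcoming["acl"]` is one hour after `start`, `upcoming["incremental"]` is four hours after, and `upcoming["full"]` is one week after.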

Step 5 - Monitor the sync

  1. Monitor the initial full sync status (starts automatically) in the connector dashboard.

Crawling and sync behavior

How the connector works over time:

  1. Initial full crawl
    • All the content present in the storage account is indexed during the first run.
    • How long it may take: Depends on the size of the content.
  2. Incremental updates
    • Mechanism: Based on the timestamp of the previous sync.
    • What changes trigger reindexing:
      • New items created
      • Existing items updated
      • Permissions changed
      • Items moved or renamed
      • Items deleted or archived
  3. Deletion and permission changes
    • Deleted items are removed from the index at the next sync.
    • Permission changes are applied at the next sync cycle.
  4. Expected latency
    • With the default schedule (incremental sync every 4 hours and ACL sync every hour), changes made to Box content are generally reflected in Simpplr search results within 4 hours of the update. Although the ACL sync runs every hour, permission changes can take up to 4 hours to take effect because they depend on the incremental sync. In certain cases, permission changes can take up to 7 days to take effect (until the next weekly full sync runs), subject to content volume and system load.
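
The timestamp-based mechanism for incremental updates can be illustrated with a short sketch. It covers only created or updated items, and the item fields and function name are hypothetical; how the connector detects deletions, moves, and permission changes is not described here.

```python
from datetime import datetime, timezone


def changed_since(items, last_sync):
    """Return items created or modified after the previous sync timestamp."""
    return [item for item in items if item["modified_at"] > last_sync]


last_sync = datetime(2025, 1, 6, 8, 0, tzinfo=timezone.utc)
items = [
    {"name": "old.pdf", "modified_at": datetime(2025, 1, 5, 12, 0, tzinfo=timezone.utc)},
    {"name": "new.docx", "modified_at": datetime(2025, 1, 6, 9, 30, tzinfo=timezone.utc)},
]

to_reindex = [item["name"] for item in changed_since(items, last_sync)]
```

Only "new.docx" was modified after the previous sync, so it is the only item selected for reindexing in this example.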

Field mapping and search experience

Default field mapping

Source field (Box Storage) > Index field (Simpplr)

Title > name
URL > url
Owner / created by > created_by.login
File type > file_type
Last modified > _timestamp
Created date > created_at
Size > size
Permissions / access control > _allow_access_control
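
As an illustration of how the mapping above might be used, the sketch below reads the listed index fields from a document and returns them under display labels. The index field names are taken from the mapping; the label keys, the helper names, the nested-dictionary shape of the document, and all sample values are assumptions made for this example.

```python
# Display label -> index field, following the default field mapping.
FIELD_MAP = {
    "title": "name",
    "url": "url",
    "owner": "created_by.login",
    "file_type": "file_type",
    "last_modified": "_timestamp",
    "created_date": "created_at",
    "size": "size",
    "permissions": "_allow_access_control",
}


def lookup(document, dotted_path):
    """Follow a dotted path such as 'created_by.login' through nested dicts."""
    value = document
    for key in dotted_path.split("."):
        value = value[key]
    return value


def display_fields(index_document):
    """Return the mapped fields of an indexed document keyed by display label."""
    return {label: lookup(index_document, path) for label, path in FIELD_MAP.items()}


# Hypothetical indexed document with made-up values.
doc = {
    "name": "Handbook.docx",
    "url": "https://example.com/file/123",
    "created_by": {"login": "ana@example.com"},
    "file_type": "docx",
    "_timestamp": "2025-01-06T09:30:00Z",
    "created_at": "2024-11-02T14:00:00Z",
    "size": 48213,
    "_allow_access_control": ["user:ana@example.com", "group:hr"],
}

fields = display_fields(doc)
```

In this example, `fields["owner"]` resolves the nested `created_by.login` value, which is why the lookup helper walks dotted paths rather than reading top-level keys only.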

Search experience - how content from this connector appears in search:

Result layout: (Icon, Connector name, title (name) as link, body(excerpt), owner, Created Date, File Type icon, file type)


  • Available filters and facets:
    • Sources = Box
    • File type
    • Owner
    • Created Date
  • Participation in advanced features:
    • Smart Answers / Q&A: Yes
    • Autocomplete: Yes
    • Recommendations / “Suggested for you”: N/A
    • Trending / popular results: N/A
    • Semantic / hybrid ranking: Yes

Known limitations

Maximum file size indexed: Content is not extracted from files larger than 10 MB.
Unsupported file types: Compressed files are not supported (e.g., an archive file containing a set of PDFs). The file content is not searchable; however, users can still find the file by its title.
Rate limits: N/A
Preview limitations: No preview is available for Excel or media files.
Permission edge case: Permission changes are not synced until the ACL sync runs.
Other known limitations: Text in PDFs is extracted. However, text inside images is not extracted.

Monitoring and troubleshooting

Connector health and monitoring - Admins can see status information in:

  • Enterprise Search > Connector name
  • Available metrics:
    • Last sync status (Success / Warning / Failed)
    • Last sync time
    • Next scheduled sync
    • Sync Type
    • Total items indexed count


Common issues and resolutions:

  1. Issue: Authentication failed; failed to generate the access token (invalid credentials or missing scopes)
    Possible causes:
    1. Incorrect Client ID or Client Secret
    2. App not granted the required permissions
    3. App not authorized by the Box admin
    Resolution:
    1. Verify and re-enter the credentials
    2. Confirm the required scopes are granted
    3. Confirm the app is authorized
  2. When to contact Simpplr Support:
    1. The authentication error persists even after trying the resolutions above
    2. Sync is stuck in the Pending state
    3. Sync is in progress but no documents are being ingested
    4. Sync fails with a cancelled error (when not cancelled manually)
    5. Sync is incomplete or partial
  3. When contacting Support, include:
  • Connector name and instance ID (if available)
  • Organization URL
  • Approximate time and date of the issue
  • Error messages or screenshots
  • Steps you already tried

Frequently asked questions (FAQ)

Q. Can I connect multiple Box tenants or domains?
A. Multiple Box connections can be configured in the Simpplr environment. However, connecting multiple tenants or domains through a single connector is not yet supported.

Q. How often does the Box connector sync data?
A. The connector runs a full crawl on first setup and then once per week. Incremental sync runs every 4 hours and ACL (permission) sync runs every hour by default.

Q. Are comments, revisions, or version history indexed?
A. Comments and individual versions are not indexed as separate items. The connector indexes the latest file metadata, including the last updated time and updated-by user.

Q. Does the connector index content from external guests or shared links?
A. No. This is not implemented yet.

Q. What happens when a user loses access to an item in Box Storage?
A. The updated access permissions are indexed during the next sync.

Note: Although the ACL sync runs every hour, permission changes can take up to 4 hours to take effect because they depend on the incremental sync, which runs every 4 hours. In certain cases, permission changes can take up to 7 days to take effect (until the next weekly full sync runs).

Q. Can I exclude certain sites/teams/folders from being indexed?
A. Documents can only be excluded based on file extension, size, and age. Additionally, documents can be included or excluded based on audiences.

Q. How are deletions handled?
A. Objects deleted from the source are permanently deleted from the index.

Note: Some features may not be available in your instance, depending on packaging and pricing. To learn which features are available to your organization and how they are bundled in Simpplr One packaging, contact your CSM or Account Manager.
