Table of Contents
Field mapping and search experience
Monitoring and troubleshooting
When to contact Simpplr Support
Overview
The SharePoint connector allows Simpplr Enterprise Search to index Microsoft SharePoint storage content, making it easily discoverable and searchable directly within Simpplr.
With this connector, you can:
- Bring SharePoint content into Simpplr Enterprise Search so users can find files alongside intranet content in one place.
- Respect SharePoint access permissions so users only see files they already have access to in SharePoint.
- Use advanced features like autocomplete, hybrid ranking, and Smart Answers on top of SharePoint content.
Indexed content from SharePoint is available in:
- The main search listing
- Smart Answers
Capabilities at a glance
| Content types | Folders and files (PDF, PPT, DOC, Word, Excel, CSV, text, etc.) |
| Metadata | Name, URL, owner details, updated-by details, created time and updated time, parent reference, object type, extension and size, MIME type, permissions (user- and group-level access) |
| Permissions | User and Group based permissions |
| Indexing | Initial full crawl when the connector is created, followed by a weekly full crawl. Incremental updates run every 4 hours, and ACL (permission) sync runs every hour. |
| Multiple instances support | Multiple SharePoint connections can be configured in the Simpplr environment. |
| Search features | Pre-ingestion filters. Filtering follows an after-scan approach, meaning the entire dataset is scanned first and then filtered based on the specified fields. |
Objects and content supported
- Objects - The following object types are indexed:
- Files
- Folders
- Metadata - For each indexed item, the connector captures:
- Name
- URL / link
- Owner details
- Updated by details
- Created time and updated time
- Parent details
- Object type
- File extension and size
- MIME type
- Permissions (user- and group-level access)
- Permissions model - Permissions are read from SharePoint and enforced in Simpplr Enterprise Search:
- How user and group permissions are synchronized
- SharePoint user and group memberships are fetched and stored in the ACL index.
- When a user is added to or removed from a SharePoint group, the ACL index is updated the next time the ACL sync runs (by default, every hour).
- How public or link-shared content is handled
- Content that is only available via anonymous or public shared links is not indexed in the current version.
- What happens when access is removed for a specific document
- When a user loses access to a file or folder in SharePoint, the updated permissions are applied during the next ACL sync.
- The file will no longer appear in that user’s Simpplr search results after the ACL sync completes.
Versions and editions supported
- Supports Microsoft tenants in which the SharePoint storage service is enabled.
Prerequisites
Before you begin, ensure the following:
- Source system access
- Access to a Microsoft Entra user account
- Application/service account permissions
- Sufficient permissions to register an application with your Microsoft Entra tenant and to assign the application a role in your Azure subscription. To complete these tasks, you need the Application.ReadWrite.All permission.
- Ability to grant admin consent to the application from the admin console. (If you are not an admin, ask your admin to grant consent through the Azure portal.)
SharePoint source documentation
Authentication and security
- Authentication mechanism - How does Simpplr Enterprise Search connect to SharePoint?
- Auth type: Application Authentication (Client credentials)
- Scopes or permissions required:
- Graph API
- - Sites.Read.All
- - Files.Read.All
- - Group.Read.All
- - User.Read.All
- SharePoint
- - Sites.Read.All
- Admin consent granted for the permissions above
- Data security
- Data storage and residency: Indexed content and ACLs from SharePoint are stored within your Simpplr Enterprise Search environment, in the same region as your Simpplr tenant.
- Encryption in transit: SSL (TLS 1.2 or higher), TLS encryption in Kafka. Auth: OAuth 2.0 client credentials.
- Encryption at rest: Server-side encryption with Amazon S3 managed keys (SSE-S3).
- Permission enforcement: SharePoint access controls (users and groups) are stored in the ACL index and applied at query time. Search results are always filtered by the signed-in user’s identity and SharePoint group memberships.
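The query-time permission filtering described above can be pictured with a minimal sketch. This is an illustration only, not Simpplr's implementation: the `IndexedDocument` shape, the `visible_results` helper, and the sample users and groups are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexedDocument:
    """A search hit plus the principals allowed to read it (hypothetical shape)."""
    title: str
    allowed_users: frozenset = field(default_factory=frozenset)
    allowed_groups: frozenset = field(default_factory=frozenset)


def visible_results(hits, user_email, user_groups):
    """Keep only hits the signed-in user may see, directly or through a group."""
    groups = set(user_groups)
    return [
        doc for doc in hits
        if user_email in doc.allowed_users or groups & doc.allowed_groups
    ]


hits = [
    IndexedDocument("Q3 plan.docx", allowed_users=frozenset({"ana@example.com"})),
    IndexedDocument("Handbook.pdf", allowed_groups=frozenset({"All Staff"})),
    IndexedDocument("Payroll.xlsx", allowed_groups=frozenset({"Finance"})),
]

# ana is in "All Staff" but not in "Finance", so Payroll.xlsx is filtered out.
titles = [d.title for d in visible_results(hits, "ana@example.com", ["All Staff"])]
print(titles)
```

Because the check runs on every query, a document disappears from a user's results as soon as the stored ACL no longer lists that user or any of their groups.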
Setup and configuration
Step 1 - Prepare Microsoft SharePoint source
- Go to the Azure portal and sign in with your Azure account.
- Search for and navigate to the App registrations service.
- Click New registration to register a new application.
- Provide a name for your app, and optionally select the supported account types (e.g., single tenant, multi-tenant) based on your Microsoft Entra ID setup.
- Click Register to create the app registration.
- After the registration is complete, you are redirected to the app's overview page. Note the Application (client) ID and Directory (tenant) ID values, as you'll need them later.
- Navigate to the Manage > API permissions section.
- Click Add a permission and, in the Request API permissions pane, select Microsoft Graph as the API.
- Choose Application permissions and select the following permissions under the Application tab:
- Graph API
- - Sites.Read.All
- - Files.Read.All
- - Group.Read.All
- - User.Read.All
- SharePoint
- - Sites.Read.All
- Similarly, add any remaining permissions from the list above.
- Click Add permissions to add the selected permissions to your app. Finally, click Grant admin consent to grant the required permissions to the app. This step requires administrative privileges.
- Navigate to Manage > Certificates & secrets and open Client secrets. Generate a new client secret and note the string shown in the Value column.
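To sanity-check the values collected in this step before entering them in Simpplr, you can build a client-credentials token request against the Microsoft identity platform. The sketch below only constructs the request and does not send it; the `build_token_request` helper and the placeholder IDs are hypothetical, and this is not necessarily how the connector itself requests tokens.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" asks for the application permissions already consented for Graph.
        "scope": "https://graph.microsoft.com/.default",
    }
    return Request(url, data=urlencode(form).encode("ascii"), method="POST")


# Placeholder values; substitute the IDs and secret recorded in the steps above.
request = build_token_request(
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="11111111-1111-1111-1111-111111111111",
    client_secret="your_client_secret",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it; a successful response is a JSON document containing an `access_token` field, while an error response usually points to a wrong ID, a wrong secret, or missing consent.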
Step 2 - SharePoint Authentication
- Go to the tenant URL in your browser. The URL follows this pattern: https://<admin_tenant_or_domain>/_layouts/15/appinv.aspx. This loads the SharePoint admin center page.
- In the App ID box, enter the application ID that you recorded earlier, and then click Lookup. The application name will appear in the Title box.
- In the App Domain and Redirect URL boxes, enter the following values:
App Domain: www.localhost.com
Redirect URL: http://www.localhost.com
- In the App’s Permission Request XML box, type the following XML string:
<AppPermissionRequests AllowAppOnlyPolicy="true">
<AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
<AppPermissionRequest Scope="http://sharepoint/social/tenant" Right="Read" />
</AppPermissionRequests>
- Set DisableCustomAppAuthentication to false.
- Open Azure Cloud Shell. If the account does not have an Azure subscription, you need a Windows machine with PowerShell installed. The SharePoint Online module doesn't work on Linux or macOS machines: the connect command executes, but nothing happens under the hood.
- On Windows, open Windows PowerShell.
- Before performing these steps, temporarily disable MFA for the admin account.
# Some of these steps might not be required in Azure Cloud Shell.
# Install the SharePoint Online module.
Install-Module -Name Microsoft.Online.SharePoint.PowerShell
# Import the SharePoint Online module.
Import-Module Microsoft.Online.SharePoint.PowerShell
# Set the credentials for the admin account.
$username = "HenriettaM@smplrdev.onmicrosoft.com"
$password = "your_password"
$cred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $username, $(ConvertTo-SecureString $password -AsPlainText -Force)
# Running this in a non-Windows PowerShell returns a 400 error. Supplying credentials might not be required in Azure Cloud Shell.
# Replace your_url with your SharePoint admin URL, e.g. https://smplrdev-admin.sharepoint.com/
Connect-SPOService -Url your_url -Credential $cred
# Allow custom app authentication for the tenant.
Set-SPOTenant -DisableCustomAppAuthentication $false
Step 3 - Create the connector in Simpplr
- From your Simpplr intranet, go to Manage features > Enterprise search > Add source.
- Search for and select Microsoft SharePoint.
- Enter basic information:
- Connection Name: a name for this connector instance
- Provide the authentication details copied from the app registration:
- Tenant ID
- Tenant Name
- Client ID
- Secret value
- Click Save, then Confirm.
Step 4 - Define sync scope
- Configure site-based rules:
- Include specific sites: Include the files belonging to the specified sites (link or ID)
- Exclude specific sites: Exclude the files belonging to the specified sites (link or ID)
- Configure common filter rules (exclusion only):
- File extension (e.g., .zip, .exe)
- File size above a specified threshold
- Document age (e.g., older than a specified date)
- Configure audience-based filtering:
- Include audiences
- Exclude audiences
Step 5 - Configure sync schedule
- Default schedule: Full crawl at first setup and once a week, incremental sync every 4 hours, ACL sync every hour
- Configuration options:
- The sync schedule cannot be configured; however, a sync can be paused and resumed manually.
Step 6 - Monitor the sync
- Monitor the initial full sync status (starts automatically) in the connector dashboard.
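The site, extension, size, and age rules from the sync scope step can be reasoned about with a small sketch of an after-scan filter: every scanned item is checked against the rules, and only matching items are kept. The `ScannedItem` and `SyncScope` types are hypothetical, and using the last-modified date for document age is an assumption, not documented connector behavior.

```python
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ScannedItem:
    name: str
    site: str
    size_bytes: int
    modified: datetime


@dataclass
class SyncScope:
    include_sites: frozenset = frozenset()      # empty means "all sites"
    exclude_sites: frozenset = frozenset()
    excluded_extensions: frozenset = frozenset()
    max_size_bytes: Optional[int] = None
    exclude_older_than: Optional[datetime] = None


def in_scope(item, scope):
    """Return True if a scanned item passes every configured rule."""
    if scope.include_sites and item.site not in scope.include_sites:
        return False
    if item.site in scope.exclude_sites:
        return False
    if os.path.splitext(item.name)[1].lower() in scope.excluded_extensions:
        return False
    if scope.max_size_bytes is not None and item.size_bytes > scope.max_size_bytes:
        return False
    # Assumption: "document age" is measured from the last-modified time.
    if scope.exclude_older_than is not None and item.modified < scope.exclude_older_than:
        return False
    return True


scanned = [
    ScannedItem("plan.docx", "hr", 1_000, datetime(2024, 5, 1)),
    ScannedItem("backup.zip", "hr", 1_000, datetime(2024, 5, 1)),
    ScannedItem("old.pdf", "hr", 1_000, datetime(2019, 1, 1)),
    ScannedItem("notes.txt", "sandbox", 1_000, datetime(2024, 5, 1)),
]
scope = SyncScope(
    exclude_sites=frozenset({"sandbox"}),
    excluded_extensions=frozenset({".zip"}),
    exclude_older_than=datetime(2020, 1, 1),
)
kept = [item.name for item in scanned if in_scope(item, scope)]
print(kept)
```

Audience-based rules are left out of the sketch because the article does not describe how audiences map to items.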
Crawling and sync behavior
The information below explains how the connector works over time:
- Initial full crawl
- All the content present in the SharePoint storage account is indexed during the first run
- How long it may take: Depends on the size of the content
- Incremental updates
- Mechanism: Based on timestamp of previous sync
- What changes trigger reindexing:
- New items created
- Existing items updated
- Permissions changed
- Items moved or renamed
- Items deleted or archived
- Deletion and permission changes
- Deleted items are removed from the index at the next sync
- Permission changes are applied at the next sync cycle
- Expected latency
- With the default schedule (incremental sync every 4 hours and ACL sync every hour), changes made to SharePoint content are generally reflected in Simpplr search results within 4 hours of the update. Although the ACL sync runs every hour, the permission lag in the system can be up to 4 hours, because the incremental sync runs every 4 hours. In certain cases, a permission change is picked up only by the weekly full sync and can take up to 7 days, subject to content volume and system load.
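As a worked example of the latency figures above, the sketch below computes when the next scheduled run would pick up a change. It assumes runs are evenly spaced from a known first run; the `next_run_after` helper, the anchor time, and that alignment are assumptions for illustration, not documented connector behavior.

```python
import math
from datetime import datetime, timedelta


def next_run_after(change_time, first_run, interval):
    """Return the first scheduled run at or after change_time."""
    if change_time <= first_run:
        return first_run
    # timedelta / timedelta yields a float number of intervals.
    runs = math.ceil((change_time - first_run) / interval)
    return first_run + runs * interval


first_run = datetime(2024, 1, 1, 0, 0)   # hypothetical schedule anchor
change = datetime(2024, 1, 1, 9, 30)     # a file is edited at 09:30

content_sync = next_run_after(change, first_run, timedelta(hours=4))
acl_sync = next_run_after(change, first_run, timedelta(hours=1))

print(content_sync - change)  # wait until the next incremental sync
print(acl_sync - change)      # wait until the next ACL sync
```

Under these assumptions the wait is never longer than one interval, which matches the "within 4 hours" figure for content; the actual time until results change also depends on how long the run itself takes.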
Field mapping and search experience
Default field mapping
Source field SharePoint > Index field Simpplr
| Title | name |
| URL | webUrl |
| Owner/created by | createdBy.user.email |
| File type | |
| Last modified | lastModifiedDateTime |
| Created date | createdDateTime |
| Size | size |
| Permissions/access control | _allow_access_control |
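The dotted names in the right-hand column read like property paths into a nested item record. As a minimal sketch under that assumption, the snippet below resolves each path against a sample item; the `FIELD_MAP` dictionary, the helper functions, and the sample values are hypothetical, and rows without a path (File type) or with an index-internal name (permissions) are omitted.

```python
FIELD_MAP = {
    "Title": "name",
    "URL": "webUrl",
    "Owner/created by": "createdBy.user.email",
    "Last modified": "lastModifiedDateTime",
    "Created date": "createdDateTime",
    "Size": "size",
}


def get_path(record, dotted_path):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def map_item(item):
    """Produce a flat record keyed by the labels in the mapping table."""
    return {label: get_path(item, path) for label, path in FIELD_MAP.items()}


sample_item = {
    "name": "Handbook.pdf",
    "webUrl": "https://contoso.sharepoint.com/sites/hr/Handbook.pdf",
    "createdBy": {"user": {"email": "ana@example.com"}},
    "lastModifiedDateTime": "2024-05-01T10:00:00Z",
    "createdDateTime": "2024-01-15T08:30:00Z",
    "size": 2048,
}

mapped = map_item(sample_item)
print(mapped["Title"], mapped["Owner/created by"], mapped["Size"])
```

Returning `None` for a missing path keeps one incomplete item from failing the whole mapping.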
Search experience - how content from this connector appears in search:
- Result layout: (Icon, Connector name, Folder name, title (name) as link, body(excerpt), Created Date, File Type icon, file type)
- Available filters and facets:
- Sources = SharePoint
- File type
- Owner
- Created Date
- Participation in advanced features:
- Smart Answers/ Q&A: Yes
- Autocomplete: Yes
- Recommendations/“Suggested for you”: N/A
- Trending/popular results: N/A
- Semantic/hybrid ranking: Yes
Limits and known limitations
| Maximum file size indexed | Content is not extracted from files larger than 10 MB. |
| Unsupported file types | Compressed files are not supported, e.g., an archive file containing a set of PDFs. The file content is not searchable; however, users can still search by file title. |
| Rate limits | N/A |
| Preview limitations | No preview is available for Excel or media files. |
| Permission edge case | Permission changes are not synced until the ACL sync runs. |
| Other known limitations | Text is extracted from PDFs. However, text that appears within an image is not extracted. |
Monitoring and troubleshooting
- Connector health and monitoring - Follow the steps below to see status information:
- Enterprise Search > Connector name
- Available metrics:
- Last sync status (Success/Warning/Failed)
- Last sync time
- Next scheduled sync
- Sync type
- Total items indexed count
- Common issues and resolutions:
- Issue: Authentication failed, Failed to fetch the access token (invalid credentials or missing scopes)
- Possible causes:
- Incorrect client ID or secret
- App not granted the required permissions
- App not granted admin consent
- Resolution:
- Verify and re-enter the credentials
- Confirm that the required permissions are granted
- Confirm that admin consent has been granted to the app
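When checking whether the required permissions were actually granted, one diagnostic is to inspect the `roles` claim of an app-only access token, which lists the application permissions the token carries. The sketch below decodes the token payload without verifying the signature, so it is suitable for troubleshooting only; the helper names and the demo token are hypothetical.

```python
import base64
import json

REQUIRED_ROLES = {"Sites.Read.All", "Files.Read.All", "Group.Read.All", "User.Read.All"}


def token_roles(access_token):
    """Return the 'roles' claim of a JWT payload (signature is NOT verified)."""
    payload_b64 = access_token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return set(payload.get("roles", []))


def missing_roles(access_token):
    """Return the required Graph permissions that the token does not carry."""
    return REQUIRED_ROLES - token_roles(access_token)


def fake_token(roles):
    """Build an unsigned demo token so the example runs without network access."""
    def encode(obj):
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return ".".join([encode({"alg": "none"}), encode({"roles": roles}), ""])


demo = fake_token(["Sites.Read.All", "Files.Read.All"])
still_missing = sorted(missing_roles(demo))
print(still_missing)
```

An empty result suggests the Graph permissions and admin consent are in place; a non-empty result names the permissions to add and consent to in the app registration.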
When to contact Simpplr Support
- The authentication error persists after trying the resolutions above.
- Sync is stuck in the Pending state.
- Sync is in progress but no documents are being ingested.
- Sync fails with a cancelled error (when not cancelled manually).
- Sync is incomplete or partial.
- When contacting Support, include:
- Connector name and instance ID (if available)
- Organization URL
- Approximate time and date of the issue
- Error messages or screenshots
- Steps you already tried
Frequently asked questions (FAQ)
Q1. Can I connect multiple SharePoint tenants or domains?
A. Multiple SharePoint connections can be configured in the Simpplr environment. However, connecting multiple tenants or domains through a single connector is not yet supported.
Q2. How often does the SharePoint connector sync data?
A. The connector runs a full crawl on first setup and then once per week. Incremental sync runs every 4 hours and ACL (permission) sync runs every hour by default.
Q3. Are comments, revisions, or version history indexed?
A. Comments and individual versions are not indexed as separate items. The connector indexes the latest file metadata, including the last updated time and the user who last updated the file.
Q5. What happens when a user loses access to an item in SharePoint Storage?
A. The updated access permissions are indexed during the next sync.
Note: Although the access control sync runs every hour, the permission lag in the system can be up to 4 hours, because the incremental sync runs every 4 hours. In certain cases, the permission sync can take up to 7 days (when the full sync runs).
Q6. Can I exclude certain sites/teams/folders from being indexed?
A. Documents can be included or excluded by site (link or ID). Documents can also be excluded based on file extension, size, and age. Additionally, documents can be included or excluded based on audiences.
Q7. How are deletions handled?
A. Objects deleted from the source are permanently deleted from the index.
Q8. Are image files searchable?
A. No, images are not searchable.