Google Drive Connection for Enterprise Search

Table of Contents

Overview

Prerequisites

Authentication and security

Setup and configuration

Field mapping and search experience

Monitoring and troubleshooting

FAQ

Overview

This article walks you through connecting Google Drive to Simpplr's Enterprise Search feature.

With this connector, you can:

  • Bring Google Drive content into Simpplr Enterprise Search so users can find files alongside intranet content in one place.
  • Respect Google Drive access permissions so users only see files they already have access to in Google Drive.
  • Use advanced features like autocomplete, hybrid ranking, and Smart Answers on top of Google Drive content.

Indexed Google Drive content is available in Simpplr Enterprise Search in:

  • Main search listing
  • Smart answers

Capabilities at a glance

Content types: Folders and files (pdf, ppt, doc, word, excel, csv, text, etc.)
Metadata: Name, URL, author details, created time and updated time, parent details, file type, extension and size, MIME type, trashed status, permissions (user-level and group-level access)
Permissions: User-based and group-based permissions
Indexing: Initial full crawl when the connector is created, followed by a weekly full crawl. Incremental updates run every 4 hours, and ACL (permission) sync runs every hour.
Multiple instances support: Multiple Google Drive connections can be configured in the Simpplr environment.

Pre-ingestion filters

  • Folder-based filters - Admins can include and exclude files from indexing based on:
    • Folder path
  • Common filters - Admins can exclude files from indexing based on:
    • File extension (e.g., .zip, .exe)
    • File size
    • Document age
  • Audience filters - Admins can include or exclude documents from indexing based on audiences.

Filtering follows an after-scan approach, meaning the entire dataset is scanned first and then filtered based on the specified fields.

Search features

  • Keyword search
  • Hybrid / semantic ranking
  • Autocomplete suggestions
  • Filters for source, file type, owner, created date
  • Google Drive content can be used in Smart Answers

Objects and content supported

  1. Objects - The following object types are indexed:
    • Files
    • Folders
  2. Metadata - For each indexed item, the connector captures:
    • Name
    • URL / link
    • Author name and email address
    • Created time and updated time
    • Parent details
    • File type, extension and size
    • MIME type
    • Is trashed
    • Permissions (user-level and group-level access)
  3. Permissions model - Permissions are read from Google Drive and enforced in Simpplr Enterprise Search:
    1. How user and group permissions are synchronized
      • Google Drive user and group memberships are fetched and stored in the ACL index.
      • When a user is added to or removed from a Google Drive group, the ACL index is updated the next time the ACL sync runs (by default, every hour).
    2. How public or link-shared content is handled
      • Files shared with the user are indexed.
    3. What happens when access is removed for a specific document
      • When a user loses access to a file or folder in Google Drive, the updated permissions are applied during the next ACL sync.
      • The file will no longer appear in that user’s Simpplr search results after the ACL sync completes.
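
The permissions model above can be pictured as a filter applied to search results. The following is a minimal sketch, not Simpplr's implementation: the `IndexedFile` class, the `visible_files` function, and the sample addresses are all hypothetical names chosen for illustration. It assumes each indexed file carries the users and groups allowed to see it, and that only direct group memberships count.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    """Hypothetical shape of an indexed file with its ACL entries."""
    name: str
    allowed_users: set[str] = field(default_factory=set)
    allowed_groups: set[str] = field(default_factory=set)


def visible_files(files, user_email, user_groups):
    """Return only the files the user may see, by direct user or group grant."""
    groups = set(user_groups)
    return [
        f for f in files
        if user_email in f.allowed_users or groups & f.allowed_groups
    ]


files = [
    IndexedFile("budget.xlsx", allowed_users={"ana@example.com"}),
    IndexedFile("handbook.pdf", allowed_groups={"all-staff@example.com"}),
    IndexedFile("board-notes.doc", allowed_groups={"board@example.com"}),
]

# Hypothetical group memberships, as they might look after an ACL sync.
result = visible_files(files, "ana@example.com", ["all-staff@example.com"])
print([f.name for f in result])
```

If a user is removed from a group, the file drops out of this filter only once the stored memberships are refreshed, which mirrors the ACL sync delay described above.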

Versions and editions supported

  • Google Drive Storage is supported.

Prerequisites

Before you begin, ensure you have the following:

Source system access

  • An organization admin (super admin) account for the domain.

Application/service account permissions:

    • Sufficient permissions to register service accounts and enable APIs. 
    • Ability to grant domain-wide delegation of authority to the client application.

Google Drive source documentation:

Authentication and security

  1. Authentication mechanism
    • Simpplr Enterprise Search connects to Google Drive as follows:
      • Auth type: Service Account JSON Key
      • Scopes or permissions required:
        • https://www.googleapis.com/auth/admin.directory.group.readonly
        • https://www.googleapis.com/auth/admin.directory.user.readonly
        • https://www.googleapis.com/auth/calendar.readonly
        • https://www.googleapis.com/auth/drive.readonly
        • https://www.googleapis.com/auth/drive.metadata.readonly
        • https://www.googleapis.com/auth/drive.photos.readonly
  2. Data security
    1. Data storage and residency: Indexed content and ACLs from Google Drive are stored within your Simpplr Enterprise Search environment, in the same region as your Simpplr tenant.
    2. Encryption in transit: SSL (TLS 1.2 or higher), including TLS encryption in Kafka.
    3. Encryption at rest: Server-side encryption with Amazon S3 managed keys (SSE-S3).
    4. Authentication: The connector uses a service account key to sign a JSON Web Token (JWT) and exchanges it for an access token.
    5. Permission enforcement: Google Drive access controls (users and groups) are stored in the ACL index and applied at query time. Search results are always filtered by the signed-in user’s identity and Google Drive group memberships.
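
To make the JWT step more concrete, here is a minimal sketch of the claim set that a service account with domain-wide delegation typically presents to Google's OAuth 2.0 token endpoint. It is an illustration only and does not show how Simpplr builds its requests. The function names and sample email addresses are hypothetical. The sketch stops before signing: the result must still be signed with the private key from the JSON key file using RS256, which needs a cryptography library outside the Python standard library.

```python
import base64
import json
import time

TOKEN_URL = "https://oauth2.googleapis.com/token"
SCOPES = [
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
    "https://www.googleapis.com/auth/calendar.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/drive.metadata.readonly",
    "https://www.googleapis.com/auth/drive.photos.readonly",
]


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_unsigned_jwt(client_email, admin_email, scopes, now=None):
    """Return the 'header.claims' part of the JWT (the signing input)."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": client_email,        # service account email from the key file
        "sub": admin_email,         # user the service account acts on behalf of
        "scope": " ".join(scopes),  # space-delimited in the JWT
        "aud": TOKEN_URL,
        "iat": issued_at,
        "exp": issued_at + 3600,    # tokens are valid for at most one hour
    }
    parts = (header, claims)
    return ".".join(
        b64url(json.dumps(p, separators=(",", ":")).encode("utf-8")) for p in parts
    )


# Hypothetical identities, for illustration only.
signing_input = build_unsigned_jwt(
    "connector@my-project.iam.gserviceaccount.com",
    "admin@example.com",
    SCOPES,
    now=1700000000,
)
print(signing_input.count("."))
```

Once signed, the JWT is posted to the token endpoint with the grant type `urn:ietf:params:oauth:grant-type:jwt-bearer`, and the response contains the access token.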

Setup and configuration

Step 1 - Prepare Google Drive source

  1. Log in to Google Cloud Platform and go to the Console.

  2. Create a Google Cloud project. Give your project a name, edit the project ID if needed, and click Create.

  3. Select the created project.

  4. Enable Google APIs. Choose APIs & Services from the left menu and click Enable APIs and Services. Search for the “Google Drive” API and enable it.

  5. Create a service account. In the APIs & Services section, click Credentials, then click Create credentials to create a service account. Give your service account a name and a service account ID. The ID works like an email address and identifies your service account in the future. Click Done to finish creating the service account.

  6. Create a key file.
    a. In the Cloud Console, go to the IAM and Admin > Service accounts page.
    b. Click the email address of the service account that you want to create a key for.
    c. Click the Keys tab. Click the Add key drop-down menu, then select Create new key.
    d. Select JSON as the Key type and then click Create. This downloads a JSON file that contains the service account credentials. This key is required during the Simpplr connector configuration.

  7. Grant domain-wide delegation of authority. To access user data on a Google Workspace domain, the service account that you created must be granted access by a super administrator for the domain.
    a. Enable Google APIs. Choose APIs & Services from the left menu and click Enable APIs and Services. Enable the Admin SDK API (as you did earlier for the Google Drive API).
    b. Go to admin.google.com and log in with the organization admin account (super admin).
    c. Navigate to the main menu, then Security > Access and data control > API controls > Domain Wide Delegation.
    d. Click Add new.
    e. In the Client ID field, enter the client ID of the service account created above.
    f. Grant the following OAuth scopes to your service account:
      • https://www.googleapis.com/auth/admin.directory.group.readonly
      • https://www.googleapis.com/auth/admin.directory.user.readonly
      • https://www.googleapis.com/auth/calendar.readonly
      • https://www.googleapis.com/auth/drive.readonly
      • https://www.googleapis.com/auth/drive.metadata.readonly
      • https://www.googleapis.com/auth/drive.photos.readonly
    g. Click Authorize.
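
Before pasting the key into Simpplr, it can help to confirm that the downloaded file is a service account key and to read the client ID needed for domain-wide delegation. The sketch below is an optional local check, not part of the Simpplr setup. The function name `check_service_account_key` and the sample values are hypothetical; the field names (`type`, `client_id`, `client_email`, `private_key`) are those found in Google service account JSON key files.

```python
import json

REQUIRED_KEY_FIELDS = ("type", "client_id", "client_email", "private_key")


def check_service_account_key(raw_json: str) -> dict:
    """Validate the key file contents and return the identifiers used in setup."""
    key = json.loads(raw_json)
    missing = [name for name in REQUIRED_KEY_FIELDS if not key.get(name)]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("unexpected key type: " + repr(key["type"]))
    return {"client_id": key["client_id"], "client_email": key["client_email"]}


# Placeholder values for illustration; a real key file holds a full private key.
sample_key = json.dumps({
    "type": "service_account",
    "client_id": "123456789012345678901",
    "client_email": "connector@my-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n(placeholder)\n-----END PRIVATE KEY-----\n",
})

info = check_service_account_key(sample_key)
print(info["client_id"])
```

The `client_id` value printed here is the one to enter in the Client ID field of the Domain Wide Delegation page.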

Step 2 - Create the connector in Simpplr

  1. In Simpplr, go to Manage features > Enterprise Search > Add source.



  2. Search for and select Google Drive.


  3. Enter basic information:
    • Connection Name: (Connector Name for this instance).
  4. Provide authentication details (copied from the Admin console):
    • Google Workspace Admin email (account used to create the service account)
    • Google Drive Service Account JSON (service account JSON key downloaded in Step 1, step 6.d)

  5. Click Save and Confirm.

Step 3 - Define sync scope

  • Configure folder-based rules:
    • Include specific folders: Include the files belonging to specific folders and drives (link or ID).
    • Exclude specific folders: Exclude the files belonging to specific folders and drives (link or ID).

  • Configure common filter rules (exclusion only):
    • File extension (e.g., .zip, .exe)
    • File size above a specified threshold
    • Document age (e.g., older than a specified date)

  • Configure audience-based filtering:
    • Include audiences
    • Exclude audiences
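
The common filter rules can be read as a predicate applied to each scanned file. The following is a minimal sketch under stated assumptions, not the connector's code: `ScannedFile`, `ExclusionRules`, and `is_excluded` are hypothetical names, the 50 MB threshold and cutoff date are example values, and document age is assumed here to be measured from the file's created time.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ScannedFile:
    name: str
    size_bytes: int
    created: datetime


@dataclass
class ExclusionRules:
    extensions: frozenset      # lowercase, with leading dot
    max_size_bytes: int        # files above this size are excluded
    created_before: datetime   # files older than this date are excluded


def is_excluded(item: ScannedFile, rules: ExclusionRules) -> bool:
    """Return True if any exclusion rule matches the scanned file."""
    extension = os.path.splitext(item.name)[1].lower()
    return (
        extension in rules.extensions
        or item.size_bytes > rules.max_size_bytes
        or item.created < rules.created_before
    )


# Example rule values, chosen for illustration only.
rules = ExclusionRules(
    extensions=frozenset({".zip", ".exe"}),
    max_size_bytes=50 * 1024 * 1024,
    created_before=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
scanned = [
    ScannedFile("plan.pdf", 200_000, datetime(2024, 3, 1, tzinfo=timezone.utc)),
    ScannedFile("backup.ZIP", 5_000, datetime(2024, 3, 1, tzinfo=timezone.utc)),
    ScannedFile("old-policy.doc", 40_000, datetime(2018, 6, 1, tzinfo=timezone.utc)),
]

# After-scan approach: every file is scanned first, then the rules are applied.
to_index = [f.name for f in scanned if not is_excluded(f, rules)]
print(to_index)
```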

Step 4 - Configure sync schedule

  • Default schedule: Full crawl at first setup and once a week, incremental sync every 4 hours, and ACL sync every hour.
  • Configuration options:
    • The sync schedule cannot be configured; however, sync can be paused and resumed manually.

Step 5 - Monitor the sync

  1. Monitor the initial full sync status (starts automatically) in the connector dashboard.

Crawling and sync behavior

How the connector works over time:

  1. Initial full crawl
    • All the content present in the Google Drive account is indexed during the first run.
    • How long it may take: Depends on the size of the content.
  2. Incremental updates
    • Mechanism: Based on the timestamp of the previous sync.
    • What changes trigger reindexing:
      • New items created
      • Existing items updated
      • Permissions changed
      • Items renamed
      • Items deleted
  3. Deletion and permission changes
    • Deleted items are removed from the index at the next sync.
    • Permission changes are updated at the next sync cycle.
  4. Expected latency
    • With the default schedule (incremental sync every 4 hours and ACL sync every hour), changes made to Google Drive content are generally reflected in Simpplr search results within 4 hours of the update. The permission lag can also be up to 4 hours, because the incremental sync runs every 4 hours. In certain cases, the permission sync can take up to 7 days (when the full sync runs), subject to content volume and system load.
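
As a rough illustration of a timestamp-based incremental sync, the sketch below selects only the items updated after the previous sync time. It is a simplified picture, not the connector's actual mechanism: the `changed_since` function and the dictionary shape of each item are assumptions made for this example, and deleted items would need separate handling because they no longer appear in a listing.

```python
from datetime import datetime, timezone


def changed_since(items, last_sync):
    """Return the items whose update time is later than the previous sync."""
    return [item for item in items if item["updated"] > last_sync]


# Hypothetical previous sync time and item listing.
last_sync = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
items = [
    {"name": "roadmap.doc", "updated": datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)},
    {"name": "archive.csv", "updated": datetime(2024, 4, 2, 12, 0, tzinfo=timezone.utc)},
]

to_reindex = changed_since(items, last_sync)
print([item["name"] for item in to_reindex])
```

With a 4-hour interval, an item changed just after one run waits close to 4 hours for the next one, which is where the latency figure above comes from.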

Field mapping and search experience

Default field mapping

Source field (Google Drive) > Index field (Simpplr)

title > name
url > url
owner / created by > created_by_email
file type > type
last modified > last_updated
created date > created_at
size > size
permissions / access control > _allow_access_control
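
The mapping above can be expressed as a simple lookup table. This is a sketch for illustration only: the index field names come from the list above, while the source keys are this article's descriptive labels rather than Google Drive API field names, and `to_index_document` is a hypothetical helper.

```python
# Source label (as written in this article) -> Simpplr index field.
FIELD_MAP = {
    "title": "name",
    "url": "url",
    "owner": "created_by_email",
    "file type": "type",
    "last modified": "last_updated",
    "created date": "created_at",
    "size": "size",
    "permissions": "_allow_access_control",
}


def to_index_document(source_record: dict) -> dict:
    """Rename mapped source fields to index fields and drop unmapped ones."""
    return {
        FIELD_MAP[key]: value
        for key, value in source_record.items()
        if key in FIELD_MAP
    }


# Hypothetical source record.
record = {
    "title": "Q3 plan.pdf",
    "owner": "ana@example.com",
    "file type": "pdf",
    "starred": True,  # not in the mapping, so it is dropped
}
print(to_index_document(record))
```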

Search experience - how content from this connector appears in search:

  • Result layout: (Icon, Connector name, Folder name, name as link, body (excerpt), Created By, Created Date, File Type icon, File Type, Permission)

  • Available filters and facets:
    • Sources = Google Drive
    • File type
    • Owner
    • Created Date
  • Participation in advanced features:
    • Smart Answers / Q&A: Yes
    • Autocomplete: Yes
    • Recommendations / “Suggested for you”: N/A
    • Trending / popular results: N/A
    • Semantic / hybrid ranking: Yes

Limits and known limitations

Maximum file size indexed: Content is not extracted from files larger than 10 MB.
Unsupported file types: Compressed files are not supported, e.g., an archive file containing a set of PDFs. (The file content is not searchable; however, users can still search by file title.)
Rate limits: N/A
Preview limitations: No preview is available for Excel or media files.
Permission edge case: Permission changes are not synced until the ACL sync runs.
Other known limitations:
  • Text in a PDF is extracted. However, text within an image is not extracted.
  • Nested group permissions are not assigned.
  • When a filtering rule includes a folder by ID, only the root level of that folder is ingested, not its subfolders or their contents.
  • Inclusion filters for file type, owner, and created date are not supported.

Monitoring and troubleshooting

  1. Connector health and monitoring - Admins can see status information in:
    • Enterprise Search > Connector name
    • Available metrics:
      • Last sync status (Success/Warning/Failed)
      • Last sync time
      • Next scheduled sync
      • Sync type
      • Total items indexed count

  2. Common issues and resolutions
    1. Issue: AuthError: Unauthorized Content: 'unauthorized_client'
      Possible causes:
      • Incorrect service account JSON key
      • Service account not granted the required permissions
      • Client ID not granted domain-wide delegation of authority
      Resolution:
      • Verify and re-enter the credentials.
      • Confirm that the required permissions are granted.
      • Confirm that the Client ID is granted domain-wide delegation of authority.
    2. Issue: ACL sync not working
      Possible causes:
      • Client ID not granted domain-wide delegation of authority
      Resolution:
      • Confirm that the Client ID is granted domain-wide delegation of authority.
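
When checking the delegation settings, one quick way to spot a missing scope is to compare the scopes entered in the Admin console with the required list. The sketch below is a local helper for that comparison, not a Simpplr or Google tool. The `missing_scopes` function is hypothetical, and it assumes the granted scopes are copied as a comma-delimited string.

```python
REQUIRED_SCOPES = {
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
    "https://www.googleapis.com/auth/calendar.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/drive.metadata.readonly",
    "https://www.googleapis.com/auth/drive.photos.readonly",
}


def missing_scopes(granted_csv: str) -> list:
    """Return the required scopes absent from a comma-delimited scope string."""
    granted = {scope.strip() for scope in granted_csv.split(",") if scope.strip()}
    return sorted(REQUIRED_SCOPES - granted)


# Hypothetical value copied from the delegation entry, with two scopes granted.
granted = (
    "https://www.googleapis.com/auth/drive.readonly, "
    "https://www.googleapis.com/auth/drive.metadata.readonly"
)
for scope in missing_scopes(granted):
    print("missing:", scope)
```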

When to contact support:

    1. The authentication error persists even after trying the resolutions above.
    2. Sync is stuck in the Pending state.
    3. Sync is in progress but no documents are being ingested.
    4. Sync fails with a cancelled error (when not cancelled manually).
    5. Sync is incomplete or partial.

When contacting Support, include:

  • Connector name and instance ID (if available)
  • Organization URL
  • Approximate time and date of the issue
  • Error messages or screenshots
  • Steps you already tried

Frequently asked questions (FAQ)

Q1. Can I connect multiple Google Drive tenants or domains?
A. Multiple Google Drive connections can be configured in the Simpplr environment. However, connecting multiple tenants or domains through a single connector is not supported yet.

Q2. How often does the Google Drive connector sync data?
A. The connector runs a full crawl on first setup and then once per week. Incremental sync runs every 4 hours and ACL (permission) sync runs every hour by default.

Q3. Are comments, revisions, or version history indexed?
A. Comments and individual versions are not indexed as separate items. The connector indexes the latest file metadata, including the last updated time and updated by user.

Q4. Does the connector index content from external guests or shared links?
A. Files shared with the user are indexed.

Q5. What happens when a user loses access to an item in Google Drive Storage?
A. The updated access permissions are indexed during the next sync.

Although the ACL sync runs every hour, the permission lag can be up to 4 hours, because the incremental sync runs every 4 hours. In certain cases, the permission sync can take up to 7 days (when the full sync runs).

Q6. Can I exclude certain sites/teams/folders from being indexed?
A. Documents can be included and excluded based on the Folder paths and IDs. Documents can also be excluded based on file extension, size, and age. Additionally, documents can be included or excluded based on audiences.

Q7. How are deletions handled?
A. Objects deleted from the source are permanently deleted from the index.

Q8. Are image files searchable?
A. No, images are not searchable.

Note: Some features may not be available in your instance, depending on packaging and pricing. To learn which features are available to your org and how they are bundled with Simpplr One packaging, contact your CSM or Account Manager.
