ConversationAI: Importing Documents


Objectives

  1. Develop Python code that uses the Discovery Engine (Agent Builder) client libraries to import documents into an Agent Builder data store from Cloud Storage
  2. Deploy the Python code as a Cloud Function
  3. Read a document from Cloud Storage into the Vertex AI Agent Builder data store
  4. Test that the Dialogflow CX conversation chatbot can respond with the new data

I am going to add data about the enterprise architect role. Dialogflow CX is not able to provide a result yet, since that information is not in its data store.

[Image: Dialogflow CX chatbot unable to answer the enterprise architect question]

We will use automation to import the data and will then try again.

1. Python Client

These are the main library imports. I did not find much help from the library itself, so I had to read the documentation to figure this out.

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1
from google.cloud.discoveryengine_v1 import ImportDocumentsRequest, GcsSource
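These classes ship in the google-cloud-discoveryengine package on PyPI, so make sure it is listed in the Cloud Function's requirements.txt.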

Some setup, which we can read from the request at runtime, with defaults as fallback:

# If a parameter is not provided, use the default value.
project_id = request.args.get('project_id', "XXXX")
location = request.args.get('location', "us")
data_store_id = request.args.get('data_store_id', "all-githhub-as-text_XXXXX0")
gcs_uri = request.args.get('gcs_uri', "gs://XXXdatastore-dialogflowcx/enterprisearchitect.txt")

API Endpoint

# Determine the API endpoint based on the location.
# If the location is not "global", create a ClientOptions object with the regional endpoint.
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a Discovery Engine Document Service client using the client_options.
client = discoveryengine_v1.DocumentServiceClient(client_options=client_options)

# Construct the full path to the branch within the specified data store.
# This path is used as the 'parent' for the import request.
parent = f"projects/{project_id}/locations/{location}/dataStores/{data_store_id}/branches/default_branch"

The ImportDocumentsRequest

# Define the GCS source from which documents will be imported.
# 'input_uris' is a list of GCS URIs pointing to the files you want to import.
# 'data_schema' specifies the format of the documents, such as "content" or "csv".
gcs_source = discoveryengine_v1.GcsSource(
    input_uris=[gcs_uri],
    data_schema="content",  # Replace with "csv" if the GCS file is in CSV format.
)

# Create an ImportDocumentsRequest object, specifying the parent path,
# the GCS source, and the reconciliation mode.
# The reconciliation mode "INCREMENTAL" means that only new or updated documents are imported.
import_request = ImportDocumentsRequest(
    parent=parent,
    gcs_source=gcs_source,
    reconciliation_mode="INCREMENTAL",
)

Call the Discovery Engine API

# Call the Discovery Engine API to import documents based on the provided request.
# 'operation.result()' blocks until the long-running import process completes.
operation = client.import_documents(request=import_request)
operation.result()  # Wait for the import to complete.
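If you also want to report how many documents were imported or failed, the operation's metadata can be parsed into an ImportDocumentsMetadata message. A minimal sketch, reusing the operation object from above:

# Wait for completion, then inspect the outcome of the long-running operation.
# ImportDocumentsMetadata carries success_count and failure_count.
response = operation.result()
metadata = discoveryengine_v1.ImportDocumentsMetadata(operation.metadata)
print(response)
print(f"Imported: {metadata.success_count}, failed: {metadata.failure_count}")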

2. Cloud Function

Due to a lack of time, I did not add any triggers to the Cloud Function; I have another demo for that. I will trigger the Cloud Function via curl.
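For reference, here is a minimal sketch of how the snippets above fit together as an HTTP-triggered Cloud Function. This is not the exact code from my deployment: the entry-point name import_documents and the functions-framework wrapper are assumptions for illustration; the default values are the redacted ones from above.

import functions_framework
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1
from google.cloud.discoveryengine_v1 import ImportDocumentsRequest

@functions_framework.http
def import_documents(request):
    # Read parameters from the HTTP request, falling back to defaults.
    project_id = request.args.get("project_id", "XXXX")
    location = request.args.get("location", "us")
    data_store_id = request.args.get("data_store_id", "all-githhub-as-text_XXXXX0")
    gcs_uri = request.args.get("gcs_uri", "gs://XXXdatastore-dialogflowcx/enterprisearchitect.txt")

    # Use the regional endpoint unless the location is "global".
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )
    client = discoveryengine_v1.DocumentServiceClient(client_options=client_options)

    parent = f"projects/{project_id}/locations/{location}/dataStores/{data_store_id}/branches/default_branch"

    gcs_source = discoveryengine_v1.GcsSource(
        input_uris=[gcs_uri],
        data_schema="content",
    )
    import_request = ImportDocumentsRequest(
        parent=parent,
        gcs_source=gcs_source,
        reconciliation_mode="INCREMENTAL",
    )

    # Start the import and block until the long-running operation finishes.
    operation = client.import_documents(request=import_request)
    operation.result()
    return f"Import from {gcs_uri} completed", 200

Once deployed, a call like the following triggers it (the function URL is a placeholder):

curl "https://REGION-PROJECT_ID.cloudfunctions.net/import_documents?gcs_uri=gs://XXXdatastore-dialogflowcx/enterprisearchitect.txt"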

[Image: the deployed Cloud Function in the console]

3. Read Document from Cloud Storage

[Image: the document in the Cloud Storage bucket]

The text is taken from the Wikipedia article on enterprise architect.

[Image: contents of enterprisearchitect.txt]

Running the Cloud Function

[Image: triggering the Cloud Function]

Checking the data store to confirm the document was uploaded

Checking the Activity tab:

[Image: the data store Activity tab showing the completed import]

Checking the document:

[Image: the imported document listed in the data store]

4. Validating that Dialogflow CX picks up the update and generates a result

Now we have an answer.

[Image: Dialogflow CX now answering the enterprise architect question]

I would like to compare the answer with the original text, just for kicks.

[Image: the original Wikipedia text for comparison]