ConversationAI: Importing Documents
Objectives
- Develop Python code that uses the Discovery Engine (Agent Builder) client libraries to import documents into an Agent Builder data store from Cloud Storage
- Deploy the Python code as a Cloud Function
- Read a document from Cloud Storage into the Vertex AI Agent data store
- Test that the Dialogflow CX conversational chatbot can respond with the new data
I am going to add data about enterprise architects. Dialogflow CX is not able to provide a result yet, since it does not have this information in its data store. We will use automation to import the data and will then try again.
1. Python Client
These are the main imports. I did not find much help in the client library itself, so I had to read the documentation to figure this out.
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1
from google.cloud.discoveryengine_v1 import ImportDocumentsRequest, GcsSource
Some setup parameters, which we can read from the request at runtime:
# If a parameter is not provided, use the default value.
project_id = request.args.get('project_id', "XXXX")
location = request.args.get('location', "us")
data_store_id = request.args.get('data_store_id', "all-githhub-as-text_XXXXX0")
gcs_uri = request.args.get('gcs_uri', "gs://XXXdatastore-dialogflowcx/enterprisearchitect.txt")
API Endpoint
# Determine the API endpoint based on the location.
# If the location is not 'global', create a client_options object with the appropriate API endpoint.
client_options = (
ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
if location != "global"
else None
)
# Create a Discovery Engine Document Service client using the client_options.
client = discoveryengine_v1.DocumentServiceClient(client_options=client_options)
# Construct the full path to the branch within the specified data store.
# This path is used as the 'parent' for the import request.
parent = f"projects/{project_id}/locations/{location}/dataStores/{data_store_id}/branches/default_branch"
The ImportDocumentsRequest
# Define the GCS source from which documents will be imported.
# 'input_uris' is a list of GCS URIs pointing to the files you want to import.
# 'data_schema' specifies the format of the documents, such as "content" or "csv".
gcs_source = discoveryengine_v1.GcsSource(
input_uris=[gcs_uri],
data_schema="content", # Replace with "csv" if the GCS file is in CSV format.
)
# Create an ImportDocumentsRequest object, specifying the parent path,
# the GCS source, and the reconciliation mode.
# The reconciliation mode "INCREMENTAL" means that only new or updated documents are imported.
import_request = ImportDocumentsRequest(
parent=parent,
gcs_source=gcs_source,
reconciliation_mode="INCREMENTAL"
)
Call the Discovery Engine API
# Call the Discovery Engine API to import documents based on the provided request.
# 'operation.result()' waits for the import process to complete before proceeding.
operation = client.import_documents(request=import_request)
operation.result() # Wait for the import to complete.
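If you want to confirm the import actually succeeded rather than just block on it, the long-running operation also carries a response and metadata. A minimal sketch, continuing from the operation above (the types match discoveryengine_v1, but verify the error-handling details against the current API reference):
# Wait for the import and inspect the outcome.
response = operation.result()
# The response carries a sample of per-document errors, if any occurred.
if response.error_samples:
    for error in response.error_samples:
        print(f"Import error: {error.message}")
# Operation metadata reports success/failure counts for the import.
metadata = discoveryengine_v1.ImportDocumentsMetadata(operation.metadata)
print(f"Imported: {metadata.success_count}, failed: {metadata.failure_count}")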
2. Cloud Function
Due to a lack of time I did not add any triggers to the Cloud Function; I have another demo for that. I will trigger the Cloud Function via curl, using an HTTP entry point along the lines of the sketch below.
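For context, here is how the snippets from section 1 could be assembled into an HTTP-triggered Cloud Function. This is a minimal sketch: the function name import_documents and the example trigger URL are my assumptions, not values from the original deployment.
import functions_framework
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1
from google.cloud.discoveryengine_v1 import ImportDocumentsRequest

@functions_framework.http
def import_documents(request):
    # Read parameters from the query string, falling back to defaults.
    project_id = request.args.get('project_id', "XXXX")
    location = request.args.get('location', "us")
    data_store_id = request.args.get('data_store_id', "all-githhub-as-text_XXXXX0")
    gcs_uri = request.args.get('gcs_uri', "gs://XXXdatastore-dialogflowcx/enterprisearchitect.txt")

    # Regional endpoints need an explicit api_endpoint; 'global' uses the default.
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )
    client = discoveryengine_v1.DocumentServiceClient(client_options=client_options)

    parent = f"projects/{project_id}/locations/{location}/dataStores/{data_store_id}/branches/default_branch"
    gcs_source = discoveryengine_v1.GcsSource(input_uris=[gcs_uri], data_schema="content")

    import_request = ImportDocumentsRequest(
        parent=parent,
        gcs_source=gcs_source,
        reconciliation_mode="INCREMENTAL",
    )
    operation = client.import_documents(request=import_request)
    operation.result()  # Block until the import finishes.
    return "Import complete"

# Example trigger (hypothetical URL):
# curl "https://us-central1-XXXX.cloudfunctions.net/import_documents?gcs_uri=gs://XXXdatastore-dialogflowcx/enterprisearchitect.txt"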
3. Read Document from Cloud Storage
The text is from the Wikipedia article on enterprise architect.
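If you prefer to stage the file from code rather than through the console, the Cloud Storage client library can upload it. A minimal sketch, assuming the bucket name from the defaults above and a local file named enterprisearchitect.txt:
from google.cloud import storage

# Upload the source text file into the bucket the Cloud Function reads from.
storage_client = storage.Client(project="XXXX")
bucket = storage_client.bucket("XXXdatastore-dialogflowcx")
blob = bucket.blob("enterprisearchitect.txt")
blob.upload_from_filename("enterprisearchitect.txt")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")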
Running the Cloud Function
Checking the data store to see if the document was uploaded
Checking the Activity tab
Checking the document
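The same check can be done programmatically by listing the documents in the branch. A minimal sketch with the Document Service, assuming the same project, location, and data store IDs as above:
from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1

client = discoveryengine_v1.DocumentServiceClient(
    client_options=ClientOptions(api_endpoint="us-discoveryengine.googleapis.com")
)

# List the documents in the branch to confirm the import landed.
parent = "projects/XXXX/locations/us/dataStores/all-githhub-as-text_XXXXX0/branches/default_branch"
for document in client.list_documents(parent=parent):
    # Each imported file shows up as a Document whose content points at its GCS URI.
    print(document.id, document.content.uri)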
4. Validating that Dialogflow CX has the update and generates a result
Now we have an answer.
I would like to compare it with the original, just for kicks.
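You can also run this check against the agent programmatically with the Dialogflow CX Sessions API instead of the console simulator. A minimal sketch; the agent ID, region, and question below are placeholders I am assuming, not values from the original setup:
import uuid

from google.api_core.client_options import ClientOptions
from google.cloud import dialogflowcx_v3

# Region-specific endpoint for a regional agent (hypothetical region).
client = dialogflowcx_v3.SessionsClient(
    client_options=ClientOptions(api_endpoint="us-central1-dialogflow.googleapis.com")
)

# AGENT_ID is a placeholder for the Dialogflow CX agent's ID.
session = client.session_path("XXXX", "us-central1", "AGENT_ID", str(uuid.uuid4()))

query_input = dialogflowcx_v3.QueryInput(
    text=dialogflowcx_v3.TextInput(text="What does an enterprise architect do?"),
    language_code="en",
)
response = client.detect_intent(
    request=dialogflowcx_v3.DetectIntentRequest(session=session, query_input=query_input)
)

# Print whatever text responses the agent returns, now grounded in the new document.
for message in response.query_result.response_messages:
    if message.text:
        print(" ".join(message.text.text))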