RAG solution on Amazon Bedrock - Part 2: Build the MCQ orchestrator using Bedrock Converse API
Generative AI Demo using the Well-Architected Machine Learning Lens PDF to prepare for Machine Learning Engineer Associate (MLA-C01) Certification
In the first part of this series, we set up Amazon OpenSearch Serverless using AWS CDK to build a vector store for our Retrieval-Augmented Generation (RAG) solution. In this second part, we will build the Multiple Choice Questions (MCQ) orchestrator using Amazon Bedrock’s Converse API. This orchestrator will generate and evaluate MCQs based on the Well-Architected Machine Learning Lens PDF to help candidates prepare for the AWS Machine Learning Engineer Associate (MLA-C01) certification.

Pre-requisites
- AWS Account
- AWS CLI
- NodeJS
- Python
- AWS CDK
- Bootstrap CDK with:
cdk bootstrap aws://ACCOUNT-NUMBER/REGION
- Visual Studio Code (or your favourite Editor)
- Amazon Bedrock Access
CDK Project Implementation
Ensure you have the AWS CDK installed and initialized as per the pre-requisites.
The complete source code for this project is available in our GitHub repository. Clone the v0.2.1 release branch and install the dependencies with the following commands:
git clone -b v0.2.1 https://github.com/awsdataarchitect/opensearch-bedrock-rag-cdk.git && cd opensearch-bedrock-rag-cdk
npm i
Ensure you have deployed the Amazon OpenSearch Serverless (AOSS) collection from the CDK project and indexed the AWS Well-Architected Machine Learning Lens PDF, as described in the first part of this series.
Explanation of Key Components in this release:
- app.py: This Streamlit application takes user input, sends it to the answer_query function, and displays the generated MCQs (a brief sketch follows this list).
- query_against_openSearch.py: This script calls Amazon Bedrock to generate embeddings, performs a KNN search on OpenSearch, and uses the Bedrock Converse API to generate and format the MCQs.
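As a rough illustration, the Streamlit front end might look like the minimal sketch below. The answer_query import reflects the description above; the exact function signature and widget layout in the repository may differ.

# app.py - illustrative sketch, not the exact repository code
import streamlit as st
from query_against_openSearch import answer_query  # assumed module and function names

st.title("MCQ Generator - Well-Architected ML Lens")

# Collect the exam topic from the user
topic = st.text_input("Enter a topic from the exam guide:")

if st.button("Generate MCQs") and topic:
    with st.spinner("Generating questions..."):
        # Delegate retrieval and MCQ generation to the orchestrator script
        mcqs = answer_query(topic)
    st.write(mcqs)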
Key Points
- Embedding Generation: The get_embedding function uses Amazon Bedrock to generate embeddings for the input text.
- KNN Search: Performs a KNN search on the OpenSearch vector index to find similar documents.
- Prompt Engineering: Constructs a prompt for generating MCQs based on the retrieved documents.
- Conversation Orchestrator: Uses the Bedrock Converse API to manage the conversation context and generate responses (see the sketch after this list).
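To make these steps concrete, here is a minimal sketch of the overall flow in query_against_openSearch.py. The model IDs, index name, vector field name, and collection endpoint below are assumptions for illustration; the actual values live in the repository and in the first part of this series.

# Illustrative sketch of the retrieval-and-generation flow; model IDs, index name,
# vector field name, and the collection endpoint are assumptions.
import json
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

REGION = "us-east-1"                                        # assumption
INDEX_NAME = "ml-lens-index"                                # assumption
EMBED_MODEL_ID = "amazon.titan-embed-text-v1"               # assumption
TEXT_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"   # assumption

session = boto3.Session()
bedrock = session.client("bedrock-runtime", region_name=REGION)
auth = AWSV4SignerAuth(session.get_credentials(), REGION, "aoss")
aoss = OpenSearch(hosts=[{"host": "<collection-endpoint>", "port": 443}],
                  http_auth=auth, use_ssl=True,
                  connection_class=RequestsHttpConnection)

def get_embedding(text):
    # Generate an embedding for the input text with a Bedrock embedding model
    response = bedrock.invoke_model(modelId=EMBED_MODEL_ID,
                                    body=json.dumps({"inputText": text}))
    return json.loads(response["body"].read())["embedding"]

def answer_query(topic):
    # 1. Embed the topic and run a KNN search against the vector index
    embedding = get_embedding(topic)
    hits = aoss.search(index=INDEX_NAME, body={
        "size": 3,
        "query": {"knn": {"vectors": {"vector": embedding, "k": 3}}},
    })["hits"]["hits"]
    context = "\n".join(hit["_source"]["text"] for hit in hits)  # "text" field is an assumption

    # 2. Construct the MCQ prompt from the retrieved passages
    prompt = ("Using only the context below, generate multiple-choice questions with "
              "four options, the correct answer, and an explanation.\n\n"
              f"Context:\n{context}\n\nTopic: {topic}")

    # 3. Generate the questions with the Bedrock Converse API
    response = bedrock.converse(modelId=TEXT_MODEL_ID,
                                messages=[{"role": "user", "content": [{"text": prompt}]}])
    return response["output"]["message"]["content"][0]["text"]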
Follow the steps below to use the application
Step 1: Set Up the Python Virtual Environment
First, set up a Python virtual environment in the root directory of this Proof of Concept (POC). Ensure you are using Python 3.10. Run the following commands:
pip install virtualenv
python3.10 -m venv venv
A virtual environment is essential for managing dependencies and ensuring consistency across development environments. If you need more guidance on setting one up, refer to the official Python venv documentation.
After creating the virtual environment, activate it with the following command:
source venv/bin/activate
Step 2: Install Required Packages
Once your virtual environment is active, install the required packages listed in the requirements.txt file. Run this command in the root directory of the POC:
pip install -r requirements.txt
Step 3: Configure Environment Variables
Next, configure your environment variables. Create a .env file in the root directory of the repository and add the following line:
profile_name=<CLI_profile_name>
Ensure your AWS CLI profile has access to Amazon Bedrock.
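For reference, the .env value is typically loaded with python-dotenv and used to build a boto3 session, roughly like this (the exact loading code in the repository may differ):

# Illustrative sketch of reading the CLI profile from .env
import os
import boto3
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory
profile_name = os.getenv("profile_name")

# Bedrock and OpenSearch calls are then signed with this profile
session = boto3.Session(profile_name=profile_name)
bedrock = session.client("bedrock-runtime")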
Step 4: Run the Application
After cloning the repository, creating and activating the virtual environment, installing the required packages, and setting up the .env file, your application is ready to go. Start it with the following command:
streamlit run app.py
Once the application is up and running in your browser, you can begin asking questions and generating natural language responses. The Amazon Bedrock Converse API will manage the conversation history and context.
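Under the hood, multi-turn context with the Converse API comes from passing the accumulated messages list back on every call. Here is a minimal sketch of that pattern; the model ID and helper function are assumptions, not the exact repository code.

# Illustrative sketch of carrying conversation context across turns with the Converse API
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # assumption
messages = []  # grows with each turn so the model sees the full exchange

def ask(user_text):
    messages.append({"role": "user", "content": [{"text": user_text}]})
    response = bedrock.converse(modelId=MODEL_ID, messages=messages)
    assistant_message = response["output"]["message"]
    messages.append(assistant_message)  # keep the reply so the next turn has context
    return assistant_message["content"][0]["text"]

print(ask("Generate 3 MCQs on offline and online model evaluation."))
print(ask("Make the second question harder."))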
Once the official study guide for the AWS Machine Learning Engineer Associate (MLA-C01) certification is released on August 13, 2024, you can use topic names from that guide to generate MCQs with our app. For now, I used the topic “Perform offline and online model evaluation (A/B testing)” from the MLS-C01 (AWS Machine Learning Specialty) exam study guide, as shown in the screenshot below:

Here are all the questions generated by the Bedrock Converse API for the sample topic using our RAG solution:
Question #1) Which of the following is NOT a reason to evaluate a model?
A) To determine if the model is overfitting or underfitting
B) To determine if the model is more sensitive or specific
C) To determine if the model is more accurate than a previous model
D) To determine if the model is more accurate than a random guess
Correct Answer:
D)
Explanation:
The model should be evaluated to determine if it is more accurate than a previous model, not a random guess.
Question #2) Which of the following is NOT a method for evaluating a model?
A) A/B testing
B) Cross-validation
C) Holdout validation
D) Random sampling
Correct Answer:
D)
Explanation:
Random sampling is not a method for evaluating a model, but rather a method for selecting a subset of data for training or testing.
Question #3) Which of the following is NOT a type of error rate that can be calculated for a multiclass model?
A) False positive rate
B) False negative rate
C) True positive rate
D) True negative rate
Correct Answer:
D)
Explanation:
In a multiclass model, there is no concept of a true negative rate, as there is no single "negative" class.
Question #4) Which of the following is NOT a type of evaluation that can be performed on a model?
A) Offline evaluation
B) Online evaluation
C) Sensitivity analysis
D) Specificity analysis
Correct Answer:
C)
Explanation:
Sensitivity analysis is a method for understanding how changes in input variables affect the output of a model, not a type of evaluation.
Question #5) Which of the following is NOT a benefit of A/B testing?
A) It allows for direct comparison of different models
B) It allows for testing of models with live data
C) It allows for testing of models with historical data
D) It allows for testing of models with synthetic data
Correct Answer:
D)
Explanation:
A/B testing is typically performed with live or historical data, not synthetic data.
Question #6) Which of the following is NOT a step in the A/B testing process?
A) Split the data into two groups
B) Train two models on the split data
C) Test the models on the split data
D) Compare the performance of the two models
E) Replace the old model with the new model if it performs better
Correct Answer:
D)
Explanation:
The step of comparing the performance of the two models is not explicitly mentioned in the A/B testing process.
Question #7) Which of the following is NOT a reason to perform offline evaluation?
A) To evaluate the model's performance on a subset of the data
B) To evaluate the model's performance on historical data
C) To evaluate the model's performance on live data
D) To evaluate the model's performance on synthetic data
Correct Answer:
C)
Explanation:
Offline evaluation is typically performed on historical or synthetic data, not live data.
Question #8) Which of the following is NOT a benefit of online evaluation?
A) It allows for testing of models with live data
B) It allows for direct comparison of different models
C) It allows for evaluation of the model's performance over time
D) It allows for evaluation of the model's performance on a subset of the data
Correct Answer:
D)
Explanation:
Online evaluation typically involves testing the model on all available data, not a subset of the data.
Question #9) Which of the following is NOT a type of evaluation metric that can be used to evaluate a model?
A) Precision
B) Recall
C) F1 score
D) Mean squared error
Correct Answer:
D)
Explanation:
Mean squared error is a type of loss function, not an evaluation metric.
Question #10) Which of the following is NOT a method for evaluating the performance of a multiclass model?
A) Confusion matrix
B) ROC curve
C) Precision-recall curve
D) Accuracy
Correct Answer:
D)
Explanation:
Accuracy is not a method for evaluating the performance of a multiclass model, but rather a metric that can be calculated from the results of other evaluation methods.
Question #11) Which of the following is NOT a type of evaluation that can be performed on a model?
A) Sensitivity analysis
B) Specificity analysis
C) Precision analysis
D) Recall analysis
Correct Answer:
A)
Explanation:
Sensitivity analysis is a method for understanding how changes in input variables affect the output of a model, not a type of evaluation.
Question #12) Which of the following is NOT a benefit of offline evaluation?
A) It allows for evaluation of the model's performance on a subset of the data
B) It allows for evaluation of the model's performance on historical data
C) It allows for testing of models with synthetic data
D) It allows for evaluation of the model's performance over time
Correct Answer:
D)
Explanation:
Offline evaluation typically involves testing the model on a fixed dataset, not over time.
Question #13) Which of the following is NOT a benefit of A/B testing?
A) It allows for direct comparison of different models
B) It allows for testing of models with live data
C) It allows for evaluation of the model's performance over time
D) It allows for testing of models with historical data
Correct Answer:
C)
Explanation:
A/B testing typically involves testing the model on a fixed dataset, not over time.
Question #14) Which of the following is NOT a type of evaluation that can be performed on a model?
A) Sensitivity analysis
B) Specificity analysis
C) Precision analysis
D) Recall analysis
Correct Answer:
A)
Explanation:
Sensitivity analysis is a method for understanding how changes in input variables affect the output of a model, not a type of evaluation.
Logging response metadata and token counts
The Converse method also returns metadata about the API call. We log the usage property, which includes the input and output token counts; this can help you understand the charges for your API call. The latencyMs value under metrics gives the latency of the call to Converse, in milliseconds.
You will see a response similar to the following in the Streamlit’s background screen:
usage: {'inputTokens': 579, 'outputTokens': 1600, 'totalTokens': 2179}
latencyMs: {'latencyMs': 34931}
Note that the displayed usage numbers are only for the last API call from our app. You can use these token counts to track the cost of the API call. You can read more about token-based pricing on the Amazon Bedrock pricing page.
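If you want to log these values yourself, a minimal sketch of reading them from the Converse response looks like this (the usage and metrics fields come from the API response; the logger setup here is illustrative):

# Illustrative sketch of logging token usage and latency from a Converse response
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_converse_metadata(response):
    # response is the dictionary returned by bedrock.converse(...)
    token_usage = response["usage"]                 # inputTokens, outputTokens, totalTokens
    latency_ms = response["metrics"]["latencyMs"]   # end-to-end call latency in milliseconds
    logger.info("usage: %s", token_usage)
    logger.info("latencyMs: %s", latency_ms)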
Clean-up
To delete all the resources, run the cdk destroy command. This removes every resource defined in the stack, freeing up the allocated infrastructure and eliminating the associated costs.
Conclusion
In this part of the series, we created a Streamlit application that leverages Amazon Bedrock’s Converse API to generate and evaluate multiple-choice questions. This MCQ orchestrator will help candidates prepare for the AWS Machine Learning Engineer Associate (MLA-C01) certification by providing targeted practice based on the Well-Architected Machine Learning Lens PDF. Stay tuned for the next part of this series, where we will further enhance our RAG solution.